Dave's Virtual Machine: Now with labels

In my previous article, I came to the conclusion that programming in DVM01 is, in fact, quite troublesome. The trickiest part, by far, was getting the correct addresses when you need to jump to another code location. The stack implementation (link above) used a bunch of NOPs to help with aiming the jumps. While this helps a bit, it’s still only a hack. So I decided to give you…

DVM02 - An upgraded VM whose “compiler” supports jump labels!

Okay, so what changed? Syntactically, not much. Let’s have a look at the fibonacci implementation. It’s simple enough, with one loop and a couple of “variables”.

The parser/compiler part now recognizes two new token types: a jump label, and a label identifier. Okay, the tokenizer doesn’t care about the identifier, it’s still just a regular word at that stage. So what does the code look like? DOES HE LOOK LIKE A BITCH? - No.

# This is a fibonacci implementation for Dave's second virtual machine (DVM02).

# First, initialize a nop slide for "registers". The first one is still
# the program counter, and we're just labeling it as that.  Instead of
# documenting everything, We're now using jump labels, even for things
# that we won't be using for jumps.

pc:        START
acc1:      NOP
acc2:      NOP
temp:      NOP
loopcount: NOP

# Initialisation: Loop counter and accumulators
    SET loopcount 30
    SET acc1       1
    SET acc2       1

loop:
    SET temp @acc1
    ADD temp @acc2
    WRITEI @acc1
    WRITEC 10       # Newline
    SET acc1 @acc2
    SET acc2 @temp

    SUB loopcount 1

# Jump to beginning of the loop, unless the counter is zero.
    JNZ loop @loopcount

# If the counter was zero, the jump didn't happen, and the program
# must end.

    END

Aaah, if you compare it with the previous version, this really looks much more readable, or at least understandable, don’t you agree?

Down the rabbit hole: The tokenizer

So what exactly changed in the parser? As it turns out, not really much. In fact, what changed was only the tokenizer and parser. Everything else (ie. the interpreter) is still exactly the same as with DVM01.

The tokenizer needed to learn a new token type, because we need to treat jump labels differently: The main idea behind jump labels was that the resulting “bytecode”, if we can already call it that way, should still be compatible with the old implementation.

So, jump labels got some special treatment in the tokenizer. Any word that is followed by a colon is converted to a jump label. Tadaa, Tokenizer done!

Even further down: The parser

The changes in the parser are a bit more complicated. As it turns out, I needed to double-pass the token stream. Why? Because otherwise, forward jumps would not be possible.

Let me explain: When first going through the tokens, you don’t know what’s coming up ahead. If you stumble upon a reference to a jump label that’s not defined yet, you’re screwed - You can’t convert it to an address, because the address is not known yet.

Therefore, a double-pass.

In the first pass, we simply look for tokens of type “jumplabel” and store their position in the token stream in a dictionary. Remember, the position in the stream equals the address in our current implementation.

So, now in the actual parse pass, if we find a reference to a label, we can simply look it up. If it’s not there, it doesn’t exist, and we simply step out of the computer and slap the programmer. And a real Archie slap at that!

Back to business.

So what exactly is a reference, and where can we use it? Simple: Anything operand to any operation can now be a reference. If it’s not a number, it’s a reference - that easy. Note that operand modifiers like “@” are still treated exactly the same. If we find such a reference, it’s simply replaced with the number stored in our label dictionary from pass one. Then, processing continues as usual.

Result: A whole lot less guesswork, that’s what!

As usual, you can find the code on Github. Look at the DVM02 subdirectory!

And if you really bother to look, you can even see the beginning of a python implementation. It’s not really working yet, but maybe I’ll have a go at it soon. We’ll see :)