Writing a programming language: Step 1 - a simple virtual machine, called DVM01

Many times, I started hacking up a specification of the core features of “my” language, only to discover that such a spec is a huge project in itself: the more spec I wrote, the more details turned up that I just couldn’t keep track of. In the end, I always got demotivated and moved on, only to try again a few weeks or months later.

So, this time, I’ve decided to try more of a bottom-up approach: Before even starting on another language specification, I wanted to implement something simple, something which could then evolve with my knowledge of the domain.

A plan!

As that idea rolled around in my head, I finally set out to specify some very simple, assembly-like vocabulary and a concept of a virtual machine that could run it. I thought of the minimum requirements so I could later write some “useful” program with it.

After a little bit of thinking, I decided on the following set of operations:

  • READ - to get input from the user
  • WRITE - to output something
  • SET - to initialize a memory location with a value
  • JNZ - to jump to an address conditionally (here: jump if some value is non-zero)
  • All four basic math operators (+, -, *, /)

I also decided that my VM should have a von Neumann-style memory architecture (data and program share the same address space).

After some more thought, I figured that I would need indirect access to memory as well, since it makes writing programs for the VM much simpler.
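To see why indirect access helps, here is a minimal JavaScript sketch of a single shared address space. It is not taken from the DVM01 code; the array size and the cell addresses (8, 9, and 10 to 12) are arbitrary choices for illustration. One pointer cell lets a loop walk over a range of memory without hard-coding every address:

```javascript
// One flat array of integers stands in for the VM's single address space.
// All addresses below are arbitrary, chosen only for this illustration.
const memory = new Array(16).fill(0);

// Three data cells that we want to add up.
memory[10] = 4;
memory[11] = 5;
memory[12] = 6;

memory[8] = 10; // pointer cell: holds the address of the next value to read
memory[9] = 0;  // accumulator cell

// Indirect access: memory[memory[8]] reads the cell that the pointer names.
while (memory[8] <= 12) {
  memory[9] += memory[memory[8]];
  memory[8] += 1; // advancing the pointer is ordinary integer arithmetic
}

console.log(memory[9]); // 15
```

Without the pointer cell, each of the three reads would need its own hard-coded address, and the loop could not be written at all.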

More details

After some polishing, I finally got around to writing up a spec that covers the above. Here are the main points:

  • No functions
  • No variables (addresses only)
  • No registers - you can use a NOP slide at the beginning of the program to free some well-defined space. Then you can use that to simulate registers
  • One address space
  • Program counter is directly modifiable (stored at address 0)
  • I/O: only stdin and stdout are available, in a very simplified form: you can write Unicode characters or integers, and read Unicode characters. The internal representation is always an integer.
  • Only datatype: integer
  • Variable-length opcodes
  • Operands can be direct or indirect. A direct operand is used literally (as an address or as a value, depending on the operation); an indirect operand is treated as an address, and the value stored there is used instead:

    • SET 80 5 # set address 80 to value 5
    • WRITEI @80 # output integer stored at address 80
  • In other words: if a value is prefixed with an @, it is interpreted as an address, and the value at this address is used. This also works for non-address operands, as the following examples show:

    • JZ 0 0 # always jumps to addr 0 (restart program).
    • JZ 0 @0 # Jump to addr 0 if value at that addr is 0
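The direct/indirect rule can be sketched in a few lines of JavaScript. This is only an illustration under assumptions: the function name `resolveOperand` and the idea of resolving textual tokens against an array are hypothetical, and the actual DVM01 implementation may handle operands differently.

```javascript
// Hypothetical helper: turn one textual operand into the integer the
// operation will actually work with. `memory` is a plain array of integers.
function resolveOperand(token, memory) {
  const indirect = token.startsWith("@");
  const literal = Number.parseInt(indirect ? token.slice(1) : token, 10);
  if (Number.isNaN(literal)) {
    throw new Error(`not an integer operand: ${token}`);
  }
  // "@80" means: look at address 80 and use whatever is stored there.
  // "80" means: use 80 itself (as an address or a value, per operation).
  return indirect ? memory[literal] : literal;
}

const memory = new Array(100).fill(0);
memory[80] = 5; // the effect of: SET 80 5

console.log(resolveOperand("80", memory));  // 80
console.log(resolveOperand("@80", memory)); // 5
```

Whether the resolved number is then used as an address or as a value is up to the individual operation.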

Here’s a full list of all the operations that are supported.

Opcode            Description
START             Every program must start with this. It is just an alias for NOP, but it is required to reserve space for the program counter, which sits at address 0.
NOP               Does nothing.
READC addr        Reads a character from stdin and stores its integer (Unicode) value at addr.
WRITEC addr       Writes the integer at addr to stdout as a Unicode character.
WRITEI addr       Writes the integer at addr to stdout in decimal representation.
SET addr n        Sets the value at addr to n.
ADD addr n        Adds n to the value at addr.
SUB addr n        Subtracts n from the value at addr.
MUL addr n        Multiplies the value at addr by n.
DIV addr n        Divides the value at addr by n.
JZ addr n         Jumps to the opcode at addr if n is zero.
JNZ addr n        Jumps to the opcode at addr if n is not zero.
JGT addr n1 n2    Jumps to the opcode at addr if n1 is greater than n2.
COPY addr1 addr2  Writes the content of addr2 to addr1 (use this to dereference pointers).
DUMP addr1 addr2  Dumps the memory area between addr1 and addr2 (useful for debugging).
END               Exits the program.
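To show how a program counter stored at address 0 and variable-length opcodes can fit together, here is a minimal JavaScript sketch of a fetch/execute loop for a small subset of the operations (START/NOP, SET, ADD, SUB, JNZ, END). Everything about the encoding is an assumption made for this example, not the DVM01 format: the numeric opcode values, the layout of each operand as a (mode, value) pair where mode 1 stands for `@`, the memory size, the step limit, and the function name `run`.

```javascript
// Hypothetical numeric opcodes; START shares its value with NOP (an alias).
const OP = { START: 0, NOP: 0, SET: 1, ADD: 2, SUB: 3, JNZ: 4, END: 5 };
// Number of operands per opcode, which makes instructions variable-length.
const ARITY = { [OP.NOP]: 0, [OP.SET]: 2, [OP.ADD]: 2, [OP.SUB]: 2, [OP.JNZ]: 2, [OP.END]: 0 };

// Assumed layout: one opcode cell, then a (mode, value) pair per operand.
// Mode 0 is a direct operand, mode 1 is an indirect ("@") operand.
function run(memory, maxSteps = 10000) {
  for (let step = 0; step < maxSteps; step++) {
    const pc = memory[0]; // the program counter lives at address 0
    const op = memory[pc];
    if (!(op in ARITY)) throw new Error(`unknown opcode ${op} at address ${pc}`);
    const args = [];
    for (let i = 0; i < ARITY[op]; i++) {
      const mode = memory[pc + 1 + 2 * i];
      const raw = memory[pc + 2 + 2 * i];
      args.push(mode === 1 ? memory[raw] : raw);
    }
    // Advance first, so that a jump can simply overwrite address 0.
    memory[0] = pc + 1 + 2 * ARITY[op];
    switch (op) {
      case OP.NOP: break;
      case OP.SET: memory[args[0]] = args[1]; break;
      case OP.ADD: memory[args[0]] += args[1]; break;
      case OP.SUB: memory[args[0]] -= args[1]; break;
      case OP.JNZ: if (args[1] !== 0) memory[0] = args[0]; break;
      case OP.END: return;
    }
  }
  throw new Error("step limit exceeded");
}

// Adds 3 + 2 + 1 into address 41, using address 40 as a counter.
const program = [
  OP.START,              // addr 0:  START (this cell becomes the program counter)
  OP.SET, 0, 40, 0, 3,   // addr 1:  SET 40 3
  OP.SET, 0, 41, 0, 0,   // addr 6:  SET 41 0
  OP.ADD, 0, 41, 1, 40,  // addr 11: ADD 41 @40
  OP.SUB, 0, 40, 0, 1,   // addr 16: SUB 40 1
  OP.JNZ, 0, 11, 1, 40,  // addr 21: JNZ 11 @40
  OP.END,                // addr 26: END
];
const memory = program.concat(new Array(64 - program.length).fill(0));
run(memory);
console.log(memory[41]); // 6
console.log(memory[40]); // 0
```

In this sketch START is encoded as 0, so the cell at address 0 also reads as an initial program counter of 0: the first step executes it as a NOP and moves on to address 1. A jump is nothing more than a write to address 0.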

Implementation

Of course, a good idea shouldn’t just be described, it should be built. So I hacked up a simple web GUI and started writing an implementation in JavaScript. Why JavaScript? I think it’s simply the most accessible language (for you, the reader!): you don’t need to install any fancy programming language (which you may or may not have), and you don’t need any compiler or build tool knowledge to give it a shot.

You can download the current version from GitHub. As you can see, I’ve already set up a directory structure that hints at more stuff to come. That’s right! I hope to develop this virtual machine further in the future, adding features and improving it, until the main goal is reached: Dave’s own virtual machine :)

But for now, go grab that code and try it for yourself!