Building an Assembler

A while back, I decided to take the Nand to Tetris course to help fill in some knowledge gaps. The final project of part 1 of the course asks students to write an assembler that translates code written in the HACK assembly language into binary executable code for the HACK hardware platform. You can read more about the requirements for this project on the site's project page.

Writing the Assembler

Another personal goal for this project was to familiarize myself with PHP. I had written a few small programs in PHP, but had never spent much time with the language beyond that. So I decided to write my implementation of the HACK assembler in PHP. Overall, adding this challenge to the project definitely made the experience much more enjoyable, but it also made things a bit messy 😅.

My aim was a minimum viable solution, so I won’t be spending time refactoring it down the line. I still wanted to share my results because it was a very fun and rewarding project.

You can find my completed project files and instructions on how to get things up and running on GitHub.

How it Works

I won’t dive too deep into the details for this since you can just have a look at the project files, but as a general overview, the assembler is broken up into three main parts:

  • symbols.php includes the logic for adding special symbols written in the target file to a symbol table to be referenced later when generating the final output. This file also contains logic for assigning the correct address to each symbol using the SymbolTable class.
  • parser.php includes the logic for parsing through the target file and generating a multidimensional array containing each of the commands to be translated in the target file.
  • translator.php is responsible for translating the commands contained in the elements of the multidimensional array generated by parser.php into their respective 16-bit counterparts.
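To make the translation step more concrete, here is a minimal sketch of how a parsed command could be turned into its 16-bit form. It is written in Python rather than PHP and is not taken from translator.php; the names (translate_a, translate_c, translate) and the ("A" / "C", text) command shape are assumptions made for illustration. The bit tables follow the HACK machine language specification from the course.

```python
# Hypothetical sketch of the translation step; not the code in translator.php.

# comp bits (c1..c6) for computations that use the A register.
COMP_A = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}
# Prefix the a-bit: 0 for the A-register forms, 1 for the matching M forms.
COMP = {k: "0" + v for k, v in COMP_A.items()}
COMP.update({k.replace("A", "M"): "1" + v for k, v in COMP_A.items() if "A" in k})

# The list index of each mnemonic is its 3-bit code ("" means "not given").
DEST = ["", "M", "D", "MD", "A", "AM", "AD", "AMD"]
JUMP = ["", "JGT", "JEQ", "JGE", "JLT", "JNE", "JLE", "JMP"]


def translate_a(value):
    """A-instruction: a leading 0 followed by the 15-bit address."""
    return format(int(value), "016b")


def translate_c(text):
    """C-instruction of the form dest=comp;jump, where dest and jump are optional."""
    dest, rest = text.split("=") if "=" in text else ("", text)
    comp, jump = rest.split(";") if ";" in rest else (rest, "")
    return ("111" + COMP[comp]
            + format(DEST.index(dest), "03b")
            + format(JUMP.index(jump), "03b"))


def translate(commands):
    """Translate a list of (type, text) pairs into 16-bit strings."""
    return [translate_a(text) if kind == "A" else translate_c(text)
            for kind, text in commands]
```

For example, translate([("A", "2"), ("C", "D=A")]) gives ["0000000000000010", "1110110000010000"]. The sketch assumes symbols have already been replaced by numeric addresses before translation.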

Putting all of these together, assembler.php does the following:

  1. Opens a new file matching the filename of the target with a .hack extension
  2. Parses the target file searching for special symbols to add to a symbol table
  3. Parses the target file a second time, this time assigning an address to each symbol based on file location
  4. Parses the target a final time to create a multidimensional array containing each command's type and actual instruction
  5. Iterates through this multidimensional array and writes to the output file created in step 1
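The symbol-handling passes in the steps above can be sketched roughly as follows. This is a Python illustration, not the author's PHP, and it is organized a little differently from the list: one loop records label addresses and a second loop assigns variable addresses while building the command list. The function names and the (type, text) command shape are assumptions; the predefined symbols and the starting variable address of 16 come from the HACK specification. File handling (steps 1 and 5) is left out.

```python
# Hypothetical sketch of symbol resolution; not the code in symbols.php or parser.php.

# Predefined symbols from the HACK specification.
PREDEFINED = {"SP": 0, "LCL": 1, "ARG": 2, "THIS": 3, "THAT": 4,
              "SCREEN": 16384, "KBD": 24576}
PREDEFINED.update({f"R{i}": i for i in range(16)})


def clean(lines):
    """Strip comments and surrounding whitespace, dropping empty lines."""
    for line in lines:
        line = line.split("//")[0].strip()
        if line:
            yield line


def resolve_symbols(lines):
    """Return a list of (type, text) commands with all symbols resolved."""
    symbols = dict(PREDEFINED)
    source = list(clean(lines))

    # Labels: (NAME) refers to the address of the next real instruction.
    address = 0
    for line in source:
        if line.startswith("("):
            symbols[line[1:-1]] = address
        else:
            address += 1

    # Variables: each new name gets the next free RAM address, starting at 16.
    next_variable = 16
    commands = []
    for line in source:
        if line.startswith("("):
            continue  # label declarations produce no output
        if line.startswith("@"):
            name = line[1:]
            if not name.isdigit():
                if name not in symbols:
                    symbols[name] = next_variable
                    next_variable += 1
                name = str(symbols[name])
            commands.append(("A", name))
        else:
            commands.append(("C", line))
    return commands
```

Labels have to be recorded before variables are assigned, because a name like LOOP may be used in an @LOOP instruction before its (LOOP) declaration appears in the file.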

What I Learned

While this project may not be very impressive, I definitely had a lot of fun writing it. From the course, I learned a ton about how computers work, from logic gates to low-level machine language. I highly recommend it to anyone who, like me, is self-taught and wants to go back and learn some of the fundamentals of computer science.