Building a Mythological Programming Language Compiler For an x86 CPU (NASM) — Part I — Hades
An Educational Passion Project to Understand The Complexities of Building a Compiler to NASM For a Custom Language
To kick off this series, we will discuss what a compiler is and what its general structure looks like. Throughout the series, we will implement a fairly basic compiler for a new programming language, following the classic skeleton of a compiler. The immediate goal is a compiled language in which one can solve a couple of basic LeetCode problems and even write some scripting-level programs.
To make it more exotic and interesting, I will be blending technology with mythology by naming everything in the language with a Hades theme (God of the Underworld).
This follows a current trend I’ve been on with articles such as The Code Purgatorio: Ascension to Clean Code and Plato’s Republic Of Software Engineering: A Philosophical Perspective.
If you like this content, please consider clapping, following and subscribing to my newsletter.
Compilers
They are marvels of software engineering. People did amazing things with technology even before compilers made high-level programming possible and lowered the barrier to entry.
If you are as old as me, you have perhaps played the good old RollerCoaster Tycoon game, the creator of which wrote it entirely in assembly by hand! You can find a super cool open-sourced version here, called OpenRCT2.
To learn more about compilers, have a look at this CppNow talk:
Why Yet Another Compiler
Primarily for my learning, and to hopefully leave a path others can follow for a fun approach to understanding compilers more in-depth and not taking high-level programming languages for granted.
Why Hades?
Hades is the mythological god of the human underworld, just as compilers are the gods of the programming underworld.
Here’s what the code will look like:
hero a = 2; # variable
hero b = 3; # variable
styx a; # prints a
bestow a; # returns a
Hero
These are the equivalent of variables, where there is no set type per variable name. It is similar to a simple assignment in Python or TypeScript’s: let a = 2.
Styx
This is the equivalent of print or console.log. The name refers to a river of the underworld, and similarly, it will be a river for our output to the console.
Bestow
This is a playful take on the good old return statement.
What Makes Up a Compiler
The job of a compiler, in our case, is to translate a program written in our language into lower-level code (here, assembly) that can then be assembled into instructions the CPU executes.
To accomplish this feat, the general process is to tokenize the input, convert it to an abstract syntax tree (AST), and then convert the AST into assembly. A good compiler will also do sanity checks throughout the process to validate grammar and syntax alongside possibly the safety of the program and other considerations (think compiler warnings).
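As a rough illustration of the first stage, here is a minimal tokenizer sketch in Python for the sample program shown earlier. It is not the implementation used in this series: the token kinds (HERO, IDENT, NUMBER and so on), the `Token` type and the `tokenize` function are hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # e.g. "HERO", "IDENT", "NUMBER" (hypothetical names)
    text: str  # the exact characters matched in the source


# Ordered list of (kind, regex) pairs. Comments and whitespace are
# recognized so they can be skipped, not emitted as tokens.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("SKIP", r"\s+"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("EQUALS", r"="),
    ("SEMICOLON", r";"),
]
KEYWORDS = {"hero", "styx", "bestow"}
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source: str) -> list[Token]:
    """Turn Hades source text into a flat list of tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # A basic sanity check: reject characters the language does not know.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("COMMENT", "SKIP"):
            continue
        if kind == "IDENT" and text in KEYWORDS:
            kind = text.upper()  # keywords get their own token kind
        tokens.append(Token(kind, text))
    return tokens
```

Calling `tokenize("hero a = 2; # variable")` yields four tokens (HERO, IDENT, EQUALS, NUMBER) followed by a SEMICOLON token, with the comment dropped. Raising an error on an unknown character is one small example of the sanity checks mentioned above.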
Let’s make this less abstract and more visual:
In this diagram, we can observe a program converted to an array of tokens and then into a potential abstract syntax tree. There are many ways one could do this, but this is what I have decided for this demonstration.
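In code, the step from tokens to a tree might look like the following Python sketch. It assumes tokens arrive as (kind, text) pairs with hypothetical kinds such as HERO, IDENT and NUMBER, and it only handles the three statement forms shown earlier. The node classes (`Declare`, `Print`, `Return`, `Number`, `Name`) are illustrative names, not the ones used later in the series.

```python
from dataclasses import dataclass


@dataclass
class Number:
    value: int


@dataclass
class Name:
    ident: str


@dataclass
class Declare:  # hero <name> = <expr>;
    name: str
    value: object


@dataclass
class Print:  # styx <expr>;
    value: object


@dataclass
class Return:  # bestow <expr>;
    value: object


def parse(tokens):
    """Parse a list of (kind, text) pairs into a list of statement nodes."""
    pos = 0

    def expect(kind):
        # Consume one token of the given kind or report a syntax error.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        text = tokens[pos][1]
        pos += 1
        return text

    def expression():
        # In this sketch an expression is just a number or a variable name.
        if pos < len(tokens) and tokens[pos][0] == "NUMBER":
            return Number(int(expect("NUMBER")))
        return Name(expect("IDENT"))

    program = []
    while pos < len(tokens):
        kind = tokens[pos][0]
        if kind == "HERO":
            expect("HERO")
            name = expect("IDENT")
            expect("EQUALS")
            node = Declare(name, expression())
        elif kind == "STYX":
            expect("STYX")
            node = Print(expression())
        elif kind == "BESTOW":
            expect("BESTOW")
            node = Return(expression())
        else:
            raise SyntaxError(f"unexpected token {tokens[pos]}")
        expect("SEMICOLON")
        program.append(node)
    return program
```

Here the "tree" is a flat list of statement nodes whose children are small expression nodes. A real grammar with operators and flow control would need a deeper tree, but the shape of the code (look at the next token, decide which rule applies, consume what that rule expects) stays the same.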
Netwide Assembler (NASM)
We will convert the Abstract Syntax Tree to x86 assembly, and then use the Netwide Assembler (NASM) to assemble it into machine code.
I will also primarily focus on macOS 64-bit, but with the understanding gained here, one can change the final step of the pipeline to generate code for any other operating system. It would not be trivial, but many things would not need to change. NASM was designed to be simple and easy to understand, with a syntax close to the standard Intel syntax, which makes it familiar to those who have worked with x86 assembly language.
NASM is also well-documented, which makes it easier for newcomers to learn how to use it and for experienced developers to reference.
Next Steps
I will publish articles on how I have gone about the initial steps of allowing the language to return and print values, and eventually store variables and do operations on them, alongside flow control (if, for, etc.).
The next few articles will focus on generating tokens, an AST and assembly for a basic example in which we return an integer from the program and set that number as the exit status code ($?) once the program closes. This will give us an end-to-end working compiler that we can then expand in complexity.
echo 'bestow 3;' > program.hades
./compile.sh program.hades
./program
echo $?
> 3
The above will be the first result of tokenizing, parsing and converting the language to assembly.
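To give a feel for the last step of that pipeline, here is a small Python sketch that emits NASM source for a program that does nothing but exit with a given status, which is all `bestow 3;` needs. The function name `emit_exit_program` is hypothetical, and the entry symbol `_main` is an assumption that depends on how the object file is linked. The exit system call number (0x2000001) and the use of rdi for the first argument follow the macOS x86-64 convention.

```python
def emit_exit_program(status: int) -> str:
    """Return NASM source for a macOS x86-64 program that exits with `status`."""
    # Shells report the exit status as a value from 0 to 255,
    # so reject anything that would not round-trip through `echo $?`.
    if not 0 <= status <= 255:
        raise ValueError("exit status must be between 0 and 255")
    lines = [
        "global _main            ; entry symbol (assumption, depends on linking)",
        "section .text",
        "_main:",
        "    mov rax, 0x2000001  ; macOS x86-64 system call number for exit",
        f"    mov rdi, {status}",
        "    syscall",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the assembly for `bestow 3;` so it can be saved to a .asm file.
    print(emit_exit_program(3), end="")
```

A full code generator would walk the AST rather than take a bare integer, but for this first milestone the generated assembly is just a template with one number filled in.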
Let’s go ahead and build a Tokenizer For a Simple Program in Part II of this series!
Dear Reader,
Please clap, follow and join my newsletter if you like this content and you would like to support 🙇🏻♂️. You can connect with me on X (Twitter), Linkedin, Instagram & Github!
If you liked this article, you may also enjoy:
- Dante’s Code Hell Inferno: the Nine Layers
- The Code Purgatorio: Ascension to Clean Code
- How To Not Be a Run-of-the-Mill Software Engineer
- Plato’s Republic Of Software Engineering: A Philosophical Perspective
- Code 💻: The Modern Driving Skill — Why It’s as Crucial Today as Driving 🚙Was in the Past
- Diversify your Experience as you would your Investment Portfolio!