Building a Mythological Programming Language Compiler For an x86 CPU (NASM) — Part II — Tokenizer For a Simple Program

Creating Tokens For a Hypothetical Yet Working Programming Language to Understand Compilers Better

Adrian Nenu 😺
5 min read · Nov 16, 2023

In the previous part of this series, "Building a Mythological Programming Language Compiler For an x86 CPU (NASM) — Part I — Hades", we covered what we want to accomplish and the general structure of a compiler:

Code => Tokens => Parsed Tokens as an Abstract Syntax Tree => Assembly Code (which NASM assembles into machine code the CPU runs)

It is time to dive into the nitty-gritty and get our hands dirty by implementing a basic tokenizer in C++.

CPU city — generated by Midjourney

The Hades Tokenizer

Every part of a compiler can be made arbitrarily complex, so we need to decide where to draw the proverbial line in the sand and limit the scope of our implementation.

To get started, our tokenizer will not handle the edge cases we would expect a real language to support; instead, we will settle for a system that can recognize the following tokens:

enum TokenType
{
BESTOW,
STYX,
NUMBER,
SEMICOLON…

Adrian Nenu 😺

Software Engineer @ Google. Photographer and writer on engineering, personal reflection, and creativity - nenuadrian.com.