Building a Mythological Programming Language Compiler For an x86 CPU (NASM) — Part II — Tokenizer For a Simple Program
Creating Tokens For a Hypothetical Yet Working Programming Language to Understand Compilers Better
In the previous part of this series (Building a Mythological Programming Language Compiler For an x86 CPU (NASM) — Part I — Hades), we covered what we want to accomplish and the general structure of a compiler:
Code => Tokens => Parsed Tokens as an Abstract Syntax Tree => Assembly Code the CPU Understands
It is time to deep-dive into the nitty-gritty and get our hands dirty by implementing a basic tokenizer in C++.
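Before narrowing the scope, here is a minimal sketch of what the first arrow in that pipeline (Code => Tokens) might look like in C++. Everything in it is an assumption made for illustration: the names `SketchKind`, `SketchToken`, and `tokenizeSketch` are hypothetical, the token kinds are generic rather than the Hades token set, and the input string in the usage note is invented and not real Hades syntax.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch only (not the Hades token set).
enum class SketchKind { Word, Number, Semicolon, Unknown };

// A token pairs its kind with the exact text it was built from.
struct SketchToken {
    SketchKind kind;
    std::string text;
};

// Walks the input once, left to right, and groups characters into tokens.
// Whitespace separates tokens and is otherwise discarded.
std::vector<SketchToken> tokenizeSketch(const std::string& source) {
    std::vector<SketchToken> tokens;
    std::size_t i = 0;

    // Small helpers that avoid passing a negative char to <cctype> functions.
    auto isAlphaAt = [&](std::size_t p) {
        return std::isalpha(static_cast<unsigned char>(source[p])) != 0;
    };
    auto isDigitAt = [&](std::size_t p) {
        return std::isdigit(static_cast<unsigned char>(source[p])) != 0;
    };

    while (i < source.size()) {
        if (std::isspace(static_cast<unsigned char>(source[i]))) {
            ++i;  // skip whitespace between tokens
        } else if (isAlphaAt(i)) {
            const std::size_t start = i;
            while (i < source.size() && isAlphaAt(i)) ++i;  // consume letters
            tokens.push_back({SketchKind::Word, source.substr(start, i - start)});
        } else if (isDigitAt(i)) {
            const std::size_t start = i;
            while (i < source.size() && isDigitAt(i)) ++i;  // consume digits
            tokens.push_back({SketchKind::Number, source.substr(start, i - start)});
        } else if (source[i] == ';') {
            tokens.push_back({SketchKind::Semicolon, ";"});
            ++i;
        } else {
            // Anything else is kept as a one-character Unknown token.
            tokens.push_back({SketchKind::Unknown, std::string(1, source[i])});
            ++i;
        }
    }
    return tokens;
}
```

Calling `tokenizeSketch("word 42;")` returns three tokens: a `Word` with the text `word`, a `Number` with the text `42`, and a `Semicolon`. A real tokenizer would go further and map specific words to keyword tokens, which this sketch deliberately leaves out.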
The Hades Tokenizer
Every part of a compiler can grow arbitrarily complex, so we need to decide where to draw the proverbial line in the sand and limit the scope of our implementation.
To start simple, our tokenizer will not handle the edge cases we would expect a full language to support. Instead, we will be satisfied with a system that can recognize the following tokens:
enum TokenType
{
BESTOW,
STYX,
NUMBER,
SEMICOLON…