Building the EVM from scratch

I recently built my own (incomplete) implementation of the Ethereum Virtual Machine in TypeScript. This was made possible by the EVM-From-Scratch practical challenge built by W1nt3r.eth.

evm-from-scratch repository: full source code of the project (github.com)

First principles: What is the EVM?

You can skip this section if you want to jump to the implementation details!

Let's start with simple principles: if the Ethereum protocol were a computer, the EVM would be its operating system, and smart contracts would be the programs running on it. Computers are deterministic: if you give a computer exactly the same input, you will always get the same output. This is also why generating random numbers is so hard.

For Ethereum smart contracts to work on a huge network of potentially hundreds of millions of computers across the world, we need to be sure that the software running on every single one is deterministic and gives the expected output every time. For this to hold on any machine, regardless of its processor architecture, memory layout or other hardware differences, we need to abstract the program's execution away from the hardware. This is where virtual machines come into play.

In particular, the Ethereum Virtual Machine contains the set of instructions that are supported by the protocol and the rules that control the interactions between different smart contracts, transfer of value, memory or storage access, and more.

How is the EVM laid out?

The official source of truth for the EVM specification is the Ethereum yellow paper. There are a lot of useful resources explaining its layout, as well as multiple complete implementations in different languages.

At its core, the EVM is a stack-based virtual machine with a 32-byte word size and a maximum stack depth of 1024 items. The word size was chosen because it exactly fits a Keccak-256 hash (256 bits), which is the hash primitive used by the Ethereum protocol, and because it allows straightforward 256-bit integer arithmetic.
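To make the stack description concrete, here is a minimal sketch rather than the repository's actual code. The names `Stack`, `MAX_STACK_DEPTH`, `WORD_MASK` and `add` are illustrative assumptions; words are modeled as `bigint` values restricted to 256 bits.

```typescript
// Illustrative sketch of an EVM-style stack (names are assumptions, not the repository's API).
const MAX_STACK_DEPTH = 1024;
const WORD_MASK = (1n << 256n) - 1n; // largest value that fits in a 32-byte word

class Stack {
  private items: bigint[] = [];

  push(value: bigint): void {
    if (value < 0n || value > WORD_MASK) {
      throw new Error("value does not fit in 256 bits");
    }
    if (this.items.length >= MAX_STACK_DEPTH) {
      throw new Error("stack overflow");
    }
    this.items.push(value);
  }

  pop(): bigint {
    const value = this.items.pop();
    if (value === undefined) {
      throw new Error("stack underflow");
    }
    return value;
  }

  get depth(): number {
    return this.items.length;
  }
}

// ADD pops two words and pushes their sum, wrapping around modulo 2^256.
function add(stack: Stack): void {
  const a = stack.pop();
  const b = stack.pop();
  stack.push((a + b) & WORD_MASK);
}
```

Using `bigint` keeps the arithmetic exact; masking with `WORD_MASK` after each operation is one simple way to emulate fixed-width 256-bit overflow.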

Other than the stack, a single EVM execution context includes several other components, such as memory, storage, transaction data, block data, call data, return data and more.

The memory is defined as an ephemeral, infinitely extensible byte array which starts empty at the beginning of each execution. In practice, the gas cost of expanding memory grows quadratically with its size, so the amount of memory a single execution can use is bounded by the gas available to it.
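The quadratic cost can be sketched with the yellow paper's memory cost formula, which charges 3 gas per 32-byte word plus the square of the word count divided by 512. The function names `memoryCost` and `expansionCost` below are illustrative assumptions, not part of any library.

```typescript
// Total gas cost of having `sizeInBytes` of memory active:
// 3 * words + floor(words^2 / 512), where a word is 32 bytes.
function memoryCost(sizeInBytes: number): bigint {
  const words = BigInt(Math.ceil(sizeInBytes / 32));
  return 3n * words + (words * words) / 512n; // bigint division rounds down
}

// Gas charged when an operation grows memory from oldSize to newSize bytes.
// Memory never shrinks during an execution, so shrinking costs nothing.
function expansionCost(oldSize: number, newSize: number): bigint {
  if (newSize <= oldSize) return 0n;
  return memoryCost(newSize) - memoryCost(oldSize);
}
```

For small sizes the linear term dominates (1024 bytes cost 98 gas in total), while 1 MiB already costs 2,195,456 gas, most of it from the quadratic term.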

The storage is a persistent key-value store attached to each account and looked up by the account's address. Both keys and values are 256-bit words. The storage is part of the state of the blockchain and is updated by transactions. The global state is tracked by storing the root hash of the state trie (a Merkle Patricia trie) in each block header. I wrote more about this here.
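As a rough illustration of the storage model (ignoring the trie and root hash entirely), a per-account word-to-word map could look like the sketch below. `ContractStorage`, `load` and `store` are hypothetical names chosen for this example; slots that were never written read as zero, as they do in the EVM.

```typescript
type Address = string; // hex-encoded account address (assumption for this sketch)

// Simplified in-memory stand-in for account storage: address -> (key -> value).
class ContractStorage {
  private slots = new Map<Address, Map<bigint, bigint>>();

  // SLOAD-like read: unset slots return 0.
  load(address: Address, key: bigint): bigint {
    return this.slots.get(address)?.get(key) ?? 0n;
  }

  // SSTORE-like write: creates the account's slot map on first use.
  store(address: Address, key: bigint, value: bigint): void {
    let account = this.slots.get(address);
    if (account === undefined) {
      account = new Map<bigint, bigint>();
      this.slots.set(address, account);
    }
    account.set(key, value);
  }
}
```

A real client would persist these slots and commit them to the state trie; this sketch only captures the key-value behavior seen by a contract.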

What operations can the EVM do?

A full reference of all the EVM assembly instructions can be found here. For my own implementation, I followed this categorization of instructions:

  • Arithmetic: basic integer arithmetic
  • Bitwise Logic: simple logic gates
  • Block: access block information
  • Comparison: compare integer values
  • Control Flow: alter the program counter conditionally
  • Environmental: access calldata, code, return data, balance and other transaction info
  • Keccak: hash functions
  • Logging: on-chain event logs creation
  • Memory: read or write from memory
  • Stack: stack operations such as push or pop
  • Storage: read or write from persistent contract storage
  • System: call sub-execution contexts, create new Ethereum accounts
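To show how a few of these categories fit together, here is a deliberately tiny interpreter loop. It is a sketch under simplifying assumptions, not the repository's implementation: it supports only STOP, ADD, MUL, POP, JUMP, JUMPDEST and PUSH1, does not meter gas or enforce the stack depth limit, and `run` and `UINT256_MAX` are names made up for this example.

```typescript
const UINT256_MAX = (1n << 256n) - 1n;

// Executes bytecode and returns the final stack (bottom first).
function run(code: Uint8Array): bigint[] {
  const stack: bigint[] = [];
  let pc = 0; // program counter

  const pop = (): bigint => {
    const value = stack.pop();
    if (value === undefined) throw new Error("stack underflow");
    return value;
  };

  while (pc < code.length) {
    const opcode = code[pc];
    pc += 1;
    switch (opcode) {
      case 0x00: // STOP (System)
        return stack;
      case 0x01: // ADD (Arithmetic), wraps modulo 2^256
        stack.push((pop() + pop()) & UINT256_MAX);
        break;
      case 0x02: // MUL (Arithmetic), wraps modulo 2^256
        stack.push((pop() * pop()) & UINT256_MAX);
        break;
      case 0x50: // POP (Stack)
        pop();
        break;
      case 0x56: { // JUMP (Control Flow)
        const dest = Number(pop());
        // Simplified check: a full EVM also rejects 0x5b bytes inside PUSH data.
        if (code[dest] !== 0x5b) throw new Error("invalid jump destination");
        pc = dest;
        break;
      }
      case 0x5b: // JUMPDEST (Control Flow), a marker with no effect
        break;
      case 0x60: // PUSH1 (Stack), pushes the next byte of the code
        stack.push(BigInt(code[pc] ?? 0));
        pc += 1;
        break;
      default:
        throw new Error(`unsupported opcode 0x${opcode.toString(16)}`);
    }
  }
  return stack;
}

// PUSH1 2, PUSH1 3, ADD, STOP leaves 5 on the stack.
const result = run(new Uint8Array([0x60, 0x02, 0x60, 0x03, 0x01, 0x00]));
```

Grouping opcodes by category, as in the list above, mostly pays off once the `switch` grows: each category can then move into its own module of handler functions.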

Development journal

I have documented the building process thoroughly day by day here.

Here is a list of all the journal entries:

  • Day 0: Research of relevant learning material & tools to get started
  • Day 1: Gathering more resources & reading Mastering Ethereum chapter 13
  • Day 2: Setting up the EVM-from-scratch challenge & EVM class
  • Day 3: Reading the yellow paper & EVM inception (EVM inside EVM)
  • Day 4: Stack & memory implementation & first Opcodes
  • Day 5: PUSH, POP, SUB Opcodes, MachineState context struct
  • Day 6: Most Arithmetic, Comparison, Bitwise operations & JUMP Opcodes
  • Day 7: Memory structure & related Opcodes
  • Day 8: TxData, globalState, Block data & related Opcodes
  • Day 9: More Environmental Opcodes & CALLDATALOAD
  • Day 10: CALLDATASIZE, CALLDATACOPY, CODESIZE, CODECOPY Opcodes
  • Day 11: EXTCODESIZE, EXTCODECOPY, SELFBALANCE Opcodes
  • Day 12: Research & study on the Storage / data layer of the Ethereum protocol
  • Day 13: Simple Storage implementation, SSTORE, SLOAD, RETURN, REVERT Opcodes
  • Day 14: Upgraded test file & refactored code, added GAS, LOG Opcodes
  • Day 15: Major EVM class refactoring & started CALL Opcode
  • Day 16: Final CALL implementation & RETURNDATASIZE, RETURNDATACOPY Opcodes
  • Day 17: Opcode runners refactoring, DELEGATECALL, STATICCALL Opcodes
  • Day 18: CREATE, SELFDESTRUCT Opcodes. Challenge completed!