I recently built my own (incomplete) implementation of the Ethereum Virtual Machine in TypeScript. This was made possible by the EVM-From-Scratch practical challenge built by W1nt3r.eth.
evm-from-scratch repository
Full source code of the project
First principles: What is the EVM?
You can skip this section if you want to jump to the implementation details!
Let's start with simple principles: if the Ethereum protocol were a computer, then the EVM would be its operating system, and smart contracts would be the programs running on it. Computers are deterministic: if you give a computer exactly the same input, you will always get the same output. This is also why generating random numbers is so hard.
For Ethereum smart contracts to work on a huge network of potentially hundreds of millions of computers across the world, we need to be sure that the software running on every single one is deterministic and gives the same output every time. For this to hold on any machine, regardless of its processor architecture, memory layout or other hardware differences, we need to abstract the program's execution away from the hardware. This is where virtual machines come into play.
In particular, the Ethereum Virtual Machine contains the set of instructions that are supported by the protocol and the rules that control the interactions between different smart contracts, transfer of value, memory or storage access, and more.
How is the EVM laid out?
The official source of truth for the EVM implementation is the Ethereum yellow paper. There are a lot of useful resources explaining its layout, as well as multiple complete implementations in different languages:
- go-ethereum: Go
- py-evm: Python
- evmone: C++
- ethereumjs-evm: TypeScript
- revm: Rust
At its core, the EVM is a stack-based virtual machine with a 32-byte word size and a maximum stack depth of 1024 items. The word size was chosen because it fits a Keccak-256 hash (256 bits) exactly, Keccak-256 being the hash primitive used throughout the Ethereum protocol, and because it allows easy 256-bit integer arithmetic.
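To make the stack rules concrete, here is a minimal sketch of a 256-bit word stack. The names `EvmStack`, `MAX_STACK_DEPTH` and `WORD_MASK` are my own illustrative choices, not necessarily what the project uses, and I am assuming words are represented as `bigint` values.

```typescript
// Illustrative sketch only: EvmStack, MAX_STACK_DEPTH and WORD_MASK are hypothetical names.
const MAX_STACK_DEPTH = 1024;
const WORD_MASK = (1n << 256n) - 1n; // 2^256 - 1, the largest 32-byte word

class EvmStack {
  private items: bigint[] = [];

  // Push a value, wrapping it into the 256-bit range first.
  push(value: bigint): void {
    if (this.items.length >= MAX_STACK_DEPTH) {
      throw new Error("stack overflow");
    }
    this.items.push(value & WORD_MASK);
  }

  // Pop the top word, failing if the stack is empty.
  pop(): bigint {
    const value = this.items.pop();
    if (value === undefined) {
      throw new Error("stack underflow");
    }
    return value;
  }

  get depth(): number {
    return this.items.length;
  }
}
```

Masking on `push` keeps every word inside 32 bytes, so arithmetic that overflows simply wraps around modulo 2^256.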
Besides the stack, a single EVM execution context includes several other components, such as memory, storage, transaction data, block data, call data and return data.
The memory is defined as an ephemeral, infinitely extensible byte array that starts empty at the beginning of each execution. In practice, the gas cost of expanding memory grows quadratically with its size, so an execution can only ever use a limited amount of it.
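As a rough illustration of both ideas, the sketch below shows a byte array that grows in 32-byte words, together with the memory cost function as I understand it from the yellow paper (3 gas per word plus words² / 512). The names `EvmMemory`, `memoryCost` and `expansionCost` are hypothetical, and the sketch does not charge gas itself.

```typescript
// Illustrative sketch only: memoryCost, expansionCost and EvmMemory are hypothetical names.

// Total gas cost of holding `byteSize` bytes of memory:
// 3 gas per 32-byte word, plus a quadratic term (words^2 / 512, rounded down).
function memoryCost(byteSize: number): bigint {
  const words = BigInt(Math.ceil(byteSize / 32));
  return 3n * words + (words * words) / 512n;
}

// Gas charged when memory grows from oldSize to newSize bytes.
function expansionCost(oldSize: number, newSize: number): bigint {
  if (newSize <= oldSize) return 0n;
  return memoryCost(newSize) - memoryCost(oldSize);
}

class EvmMemory {
  private data = new Uint8Array(0);

  get size(): number {
    return this.data.length;
  }

  // Grow the array (zero-filled) so that [offset, offset + length) fits,
  // rounding the new size up to a multiple of 32 bytes.
  private expand(offset: number, length: number): void {
    if (length === 0) return;
    const needed = Math.ceil((offset + length) / 32) * 32;
    if (needed > this.data.length) {
      const grown = new Uint8Array(needed);
      grown.set(this.data);
      this.data = grown;
    }
  }

  write(offset: number, bytes: Uint8Array): void {
    this.expand(offset, bytes.length);
    this.data.set(bytes, offset);
  }

  // Reading past the current size also expands memory.
  read(offset: number, length: number): Uint8Array {
    this.expand(offset, length);
    return this.data.slice(offset, offset + length);
  }
}
```

With this formula, 1 KiB of memory costs 98 gas in total, while 1 MiB costs 2,195,456 gas, which is why large allocations quickly become impractical.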
The storage is a persistent key-value store attached to each account, with accounts indexed by address. Both keys and values are 256-bit words. The storage is part of the blockchain state and is updated by transactions. The global state is tracked by keeping the root hash of the state Merkle tree (a Merkle Patricia trie) in each block header. I wrote more about this here.
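Leaving the Merkle tree aside, the key-value part can be sketched with nested maps. This is a simplification under my own assumptions: `AccountStorage` is a hypothetical name, addresses are plain hex strings, and nothing here is hashed or committed to a state root.

```typescript
// Illustrative sketch only: AccountStorage is a hypothetical name and no state root is computed.
class AccountStorage {
  // address -> (256-bit slot key -> 256-bit value)
  private accounts = new Map<string, Map<bigint, bigint>>();

  // Slots that were never written read as zero.
  load(address: string, key: bigint): bigint {
    return this.accounts.get(address.toLowerCase())?.get(key) ?? 0n;
  }

  store(address: string, key: bigint, value: bigint): void {
    const normalized = address.toLowerCase();
    let slots = this.accounts.get(normalized);
    if (slots === undefined) {
      slots = new Map<bigint, bigint>();
      this.accounts.set(normalized, slots);
    }
    if (value === 0n) {
      // Writing zero is equivalent to clearing the slot.
      slots.delete(key);
    } else {
      slots.set(key, value);
    }
  }
}
```

Each account gets its own independent slot map, so the same key in two different accounts never collides.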
What operations can the EVM do?
A full reference of all the EVM assembly instructions can be found here. For my own implementation, I followed this categorization of instructions:
- Arithmetic: basic integer arithmetic
- Bitwise Logic: simple logic gates
- Block: access block information
- Comparison: compare integer values
- Control Flow: alter the program counter, conditionally or unconditionally
- Environmental: access calldata, code, return data, balance and other transaction info
- Keccak: compute Keccak-256 hashes
- Logging: create on-chain event logs
- Memory: read or write from memory
- Stack: stack operations such as push or pop
- Storage: read or write from persistent contract storage
- System: call sub-execution contexts, create new Ethereum accounts
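To show how instructions from these categories are executed, here is a minimal sketch of a fetch-decode-execute loop covering only STOP (0x00), ADD (0x01), SUB (0x03), POP (0x50) and PUSH1 (0x60). The function name `execute` is hypothetical, and gas accounting and the 1024-item stack limit are deliberately left out.

```typescript
// Illustrative sketch only: `execute` is a hypothetical name; gas and stack limits are omitted.
const WORD_MASK = (1n << 256n) - 1n; // keep results within 256 bits

function execute(code: Uint8Array): bigint[] {
  const stack: bigint[] = [];
  let pc = 0; // program counter

  const pop = (): bigint => {
    const value = stack.pop();
    if (value === undefined) throw new Error("stack underflow");
    return value;
  };

  while (pc < code.length) {
    const opcode = code[pc];
    pc += 1;
    switch (opcode) {
      case 0x00: // STOP: halt execution
        return stack;
      case 0x01: { // ADD: a + b, wrapping modulo 2^256
        const a = pop();
        const b = pop();
        stack.push((a + b) & WORD_MASK);
        break;
      }
      case 0x03: { // SUB: top of stack minus the item below it
        const a = pop();
        const b = pop();
        stack.push((a - b) & WORD_MASK);
        break;
      }
      case 0x50: // POP: discard the top item
        pop();
        break;
      case 0x60: // PUSH1: push the next code byte (zero if past the end)
        stack.push(BigInt(code[pc] ?? 0));
        pc += 1;
        break;
      default:
        throw new Error(`unsupported opcode 0x${opcode.toString(16)}`);
    }
  }
  return stack;
}

// PUSH1 0x02, PUSH1 0x03, ADD, STOP leaves a single word, 5, on the stack.
const result = execute(Uint8Array.of(0x60, 0x02, 0x60, 0x03, 0x01, 0x00));
console.log(result); // [ 5n ]
```

A `switch` keeps the sketch short; with the full instruction set, a lookup table from opcode to handler function is probably easier to maintain.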
Development journal
I have documented the building process thoroughly day by day here.
Here is a list of all the journal entries:
Day 0
Research of relevant learning material & tools to get started
Day 1
Gathering more resources & reading Mastering Ethereum chapter 13
Day 2
Setting up the EVM-from-scratch challenge & EVM class
Day 3
Reading the yellow paper & EVM inception (EVM inside EVM)
Day 4
Stack & memory implementation & first Opcodes
Day 5
PUSH, POP, SUB Opcodes, MachineState context struct
Day 6
Most Arithmetic, Comparison, Bitwise operations & JUMP Opcodes
Day 7
Memory structure & related Opcodes
Day 8
TxData, globalState, Block data & related Opcodes
Day 9
More Environmental Opcodes & CALLDATALOAD
Day 10
CALLDATASIZE, CALLDATACOPY, CODESIZE, CODECOPY Opcodes
Day 11
EXTCODESIZE, EXTCODECOPY, SELFBALANCE Opcodes
Day 12
Research & study on the Storage / data layer of the Ethereum protocol
Day 13
Simple Storage implementation, SSTORE, SLOAD, RETURN, REVERT Opcodes
Day 14
Upgraded test file & refactored code, added GAS, LOG Opcodes
Day 15
Major EVM class refactoring & started CALL Opcode
Day 16
Final CALL implementation & RETURNDATASIZE, RETURNDATACOPY Opcodes
Day 17
Opcode runners refactoring, DELEGATECALL, STATICCALL Opcodes
Day 18
CREATE, SELFDESTRUCT Opcodes. Challenge completed!