Runtime code generation and execution in Go: Part 1

2024-05-19

Disclaimer: I won’t expand on why or when you would want to do this kind of hack, and you should not do this unless you know exactly what you are doing. Everything I talk about here is completely unsafe and might not be accurate for future Go versions. This is not a recommendation, but more of a fun story.

A few days ago, I posted about the idea of a blog post on runtime code generation and execution in Go:

feel like i should write a blog post about how to write JIT engine in pure Go and really weird bugs I encountered in the development of wazero's compiler if anyone wants to read
— Takeshi Yoneda(マスタケ) (@mathetake) May 18, 2024

It got way more attention than I expected, so I decided to do it. I don’t think a single post is enough to cover everything I want to share, so I’ll split it into multiple posts (please hope I won’t die before I finish it).

First of all, who am I? In case you don’t know me, which is likely the case for most of you, I’m an open source software engineer working for a startup called Tetrate.io. For the last few years, I have been knee-deep in WebAssembly and its ecosystem, as well as service-mesh software like Envoy and Istio(¹,²). I’ve mostly written Go/C++ at work, but I also like to use Rust and Zig³ in side projects.

More importantly, I am the creator of the wazero WebAssembly runtime, and that’s where I learned tons of things about runtime code generation and execution in Go. wazero started as part of a hobby project, but luckily it has been part of my job for the last 2.5 years thanks to the support of my employer.

wazero is an extremely unique and rare piece of production software in the Go ecosystem, in the sense that it generates semantically equivalent x86-64 and AArch64 machine code from WebAssembly bytecode at runtime, and then provides the API to execute and interact with it with zero dependencies, hence without CGo. At GopherCon 2022, I gave a talk on wazero, so if you are curious about wazero itself, please take a look at my talk⁴ as well as wazero’s website. It has neat documentation about how its optimizing compiler works⁵.

This post is decoupled from wazero itself, and I’ll focus on the general concept of runtime code generation and execution in Go. In the subsequent posts, if I have enough time and energy, I’ll dive into the quirky bugs I encountered in the development of wazero’s compiler. I hope you enjoy the post, and feel free to ask me anything on X/Twitter. I spend most of my spare time hacking on weird low-level stuff that you can find on my GitHub, so check it out if you are interested.

From here, I assume readers have a basic understanding of Go as well as the concepts of the stack and function calls in low-level programming.

Terminology: Runtime Code Generation and Execution vs (JIT, AOT)

Okay, first things first: let me clarify wtf I mean by “runtime code generation and execution”. I intentionally stick to the phrase “runtime code generation and execution” instead of the simpler “JIT” (Just-In-Time) or “AOT” (Ahead-Of-Time) compilation. The latter two are more common terms, but I find them confusing and sometimes misused⁶.

AOT generally refers to the process of compiling the source code into machine code before the execution of the program. In contrast, JIT refers to the process of compiling the source code into machine code during the execution of the program.

But this creates confusion: what if we compile a piece of source program in a process and then execute it in the same process? Do you call it AOT or JIT? Sure, it is clear that this is not JIT in the same sense as the JIT in JVMs, because it doesn’t compile while the compiled program is executing, but on the other hand, it does “compilation during execution (of the host program)”. In the WebAssembly community in general, people sometimes mistakenly call this kind of “runtime code generation and execution” JIT. Actually, wazero used to call itself a JIT runtime, but later we decided to avoid the term. As far as I understand, most of the “WebAssembly runtimes” out there are not JIT in the sense of JVMs⁷; they are more like AOT.

So anyway, I stick to the term “runtime code generation and execution” to avoid the confusion, though it is verbose and not a standard term. In other words, what I am going to talk about is a pure Go program that generates machine code and executes it in the same process. I might call the generated machine code “JITed code”, but that’s the only exception.

Prior art

So I guess “runtime code generation and execution” sounds terrible and pretty crazy to you and to normal Go developers. I was also one of you until I started to work on wazero. But actually, there is some prior art in the Go ecosystem that does similar things, or at least there have been some attempts to do so. Basically, I am definitely not the only crazy person who wanted to do this kind of stuff in Go. With a quick search on the web, I found the following projects besides wazero:

Note that all of them were trying to do it without CGo since it’s clearly possible to do runtime code generation and execution with CGo. You can do whatever you want with CGo, but you know that’s not what we want.

Overview

Basically, the runtime code generation and execution in Go can be broken down into the following steps:

1. Generate the machine code as a []byte slice that contains architecture-specific instructions.
2. Mark the machine code as executable and readable, usually using mmap on Unix-like systems.
3. Take the first address of the machine code as unsafe.Pointer(&slice[0]).
4. Call the “trampoline” Go Asm function with the address of the machine code as an argument.⁸
5. Jump to the machine code from the trampoline function.

Steps 1 and 2 are the “code generation” part, and the rest is the “execution” part. To be clear, these steps are almost the same in any language, but in the case of pure Go, we really really really need to be careful about the Go runtime’s behavior and its implementation details to ensure the execution won’t piss off the runtime. That affects the design of the code generation part as well as the execution part.
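To make step 1 concrete, here is a minimal sketch of producing machine code as a []byte. It is not wazero’s encoder: the helper name emit is made up for illustration, and the two constants are the standard AArch64 encodings of NOP and RET. Nothing is executed here; the program only builds the buffer.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// emit appends one 32-bit AArch64 instruction to code in little-endian byte
// order. The name is hypothetical and only exists for this sketch.
func emit(code []byte, inst uint32) []byte {
	return binary.LittleEndian.AppendUint32(code, inst)
}

func main() {
	var code []byte
	code = emit(code, 0xd503201f) // NOP
	code = emit(code, 0xd65f03c0) // RET
	fmt.Printf("% x\n", code)     // prints "1f 20 03 d5 c0 03 5f d6"
}
```

A real code generator would compute each 32-bit word from an intermediate representation instead of hard-coding it, but the output is the same kind of buffer.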

How serious is it? Well, let me give you a terrifying example of what can happen if you introduce a bug in the code generation part. Suppose you make a change like the following:

diff --git a/internal/engine/wazevo/backend/isa/arm64/abi.go b/internal/engine/wazevo/backend/isa/arm64/abi.go
index 6615471c..1747eafa 100644
--- a/internal/engine/wazevo/backend/isa/arm64/abi.go
+++ b/internal/engine/wazevo/backend/isa/arm64/abi.go
@@ -19,9 +19,8 @@ var regInfo = &regalloc.RegisterInfo{
        AllocatableRegisters: [regalloc.NumRegType][]regalloc.RealReg{
                // We don't allocate:
                // - x18: Reserved by the macOS: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms#Respect-the-purpose-of-specific-CPU-registers
-               // - x28: Reserved by Go runtime.
                // - x27(=tmpReg): because of the reason described on tmpReg.
-               regalloc.RegTypeInt: {
+               regalloc.RegTypeInt: {x28,
                        x8, x9, x10, x11, x12, x13, x14, x15,
                        x16, x17, x19, x20, x21, x22, x23, x24, x25,
                        x26, x29, x30,

Here I mistakenly allow the use of AArch64’s x28 register in the machine code generated by wazero. The register is reserved, and its value must stay the same throughout the execution to play nicely with the Go runtime.⁹ If you run the tests, you get errors like the following (the details really depend on the kind of bug you made and the platform you are on):

traceback: unexpected SPWRITE function runtime.morestack
fatal error: traceback

runtime stack:
runtime.throw({0x100ecb49d?, 0x100d05fa0?})
	/usr/local/go/src/runtime/panic.go:1023 +0x40 fp=0x1729aed40 sp=0x1729aed10 pc=0x100ccd1b0
runtime.(*unwinder).resolveInternal(0x1729aee90, 0x0?, 0xee?)
	/usr/local/go/src/runtime/traceback.go:364 +0x318 fp=0x1729aedc0 sp=0x1729aed40 pc=0x100cfa108
runtime.(*unwinder).next(0x1729aee90)
	/usr/local/go/src/runtime/traceback.go:512 +0x160 fp=0x1729aee50 sp=0x1729aedc0 pc=0x100cfa2b0
runtime.(*_panic).nextFrame.func1()
	/usr/local/go/src/runtime/panic.go:938 +0x8c fp=0x1729aef10 sp=0x1729aee50 pc=0x100cccddc
runtime.systemstack(0x7ff000)
	/usr/local/go/src/runtime/asm_arm64.s:243 +0x6c fp=0x1729aef20 sp=0x1729aef10 pc=0x100d05f0c

which is totally cryptic and hard to debug. This is just one example, and there are many other ways to make the Go runtime angry. So in other words, the generated machine code must be tailored to the Go runtime behavior, and that’s the most challenging part of the runtime code generation and execution in Go.
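Since the failure mode is so opaque, one cheap defense is to check the allocatable register set against a deny list before any code is generated. The following is only a sketch of that idea, not something taken from wazero: the reserved map and checkAllocatable are hypothetical names, and the reasons are copied from the comments in the diff above.

```go
package main

import "fmt"

// reserved lists AArch64 registers that generated code must not clobber,
// according to the comments in the diff. The map itself is hypothetical.
var reserved = map[int]string{
	18: "reserved by macOS",
	27: "temporary register (tmpReg)",
	28: "reserved by the Go runtime",
}

// checkAllocatable returns an error if any register number handed to the
// register allocator is in the reserved set.
func checkAllocatable(regs []int) error {
	for _, r := range regs {
		if why, ok := reserved[r]; ok {
			return fmt.Errorf("x%d must not be allocatable: %s", r, why)
		}
	}
	return nil
}

func main() {
	good := []int{8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 25, 26, 29, 30}
	bad := append([]int{28}, good...)
	fmt.Println(checkAllocatable(good)) // prints "<nil>"
	fmt.Println(checkAllocatable(bad))  // prints "x28 must not be allocatable: reserved by the Go runtime"
}
```

A check like this turns a cryptic runtime crash into a readable error at startup or in a unit test.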

Tiny demo

The following is a tiny demo of runtime code generation and execution in Go. I assume you are on an AArch64 machine running a Unix-like system such as Linux or macOS, and that you have Go installed.

First, we prepare the source files:

$ ls
go.mod          main.go         main_arm64.s

main.go is the main Go source file, and main_arm64.s is the Go Assembly source file. main.go is the following:

// main.go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// exec is implemented as a Go Assembly function in main_arm64.s
// entrypoint is the initial address of the machine code.
func exec(entrypoint uintptr)

func main() {
	// 1. Allocate memory for machine code via mmap. At this point, the memory is not executable, but read-writable.
	machineCodeBuf := mustAllocateByMMap()

	// 2. TODO: Write machine code to machineCodeBuf.

	// 3. Mark the memory region as executable. This marks the memory region as read-executable.
	mustMarkAsExecutable(machineCodeBuf)

	// 4. Execute the machine code.
	entrypoint := uintptr(unsafe.Pointer(&machineCodeBuf[0]))
	fmt.Printf("entrypoint: %#x\n", entrypoint)
	exec(entrypoint)

	fmt.Println("ok")
}

// mustAllocateByMMap returns a memory region that is read-writable via mmap.
func mustAllocateByMMap() []byte {
	machineCodes, err := syscall.Mmap(-1, 0,
		// For the purpose of blog post, we allocate 10 pages of memory. That should be enough.
		syscall.Getpagesize()*10,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE,
	)
	if err != nil {
		panic(err)
	}
	return machineCodes
}

// mustMarkAsExecutable marks the memory region as read-executable via mprotect.
func mustMarkAsExecutable(machineCodes []byte) {
	if err := syscall.Mprotect(machineCodes, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
		panic(err)
	}
}

What the main function is supposed to do is the following:

1. Allocate memory for machine code via mmap. At this point, the memory is not executable, but read-writable.
2. Write machine code to the allocated memory region. At this point, I left it as TODO.
3. Mark the memory region as executable. This marks the memory region as read-executable.
4. Execute the machine code.

For why mmap is needed and how it works in general, please refer to the wonderful article by @elibendersky: How to JIT - an introduction. In my blog posts, I won’t go into the details of that, and will focus on the code generation and execution part.¹⁰
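The allocate, write, and protect steps can also be folded into one helper that sizes the mapping from the code itself. This is a minimal sketch under my own assumptions, not the demo’s code: mapExecutable and roundUpToPage are hypothetical names, the input must be non-empty, and the helper only prepares the memory without jumping into it.

```go
package main

import (
	"fmt"
	"syscall"
)

// roundUpToPage rounds n up to a multiple of pageSize, which must be a power of two.
func roundUpToPage(n, pageSize int) int {
	return (n + pageSize - 1) &^ (pageSize - 1)
}

// mapExecutable copies code into a fresh anonymous mapping and then flips the
// mapping from read-write to read-execute. The caller owns the returned slice
// and should release it with syscall.Munmap. code must not be empty.
func mapExecutable(code []byte) ([]byte, error) {
	size := roundUpToPage(len(code), syscall.Getpagesize())
	buf, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE, syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	copy(buf, code)
	if err := syscall.Mprotect(buf, syscall.PROT_READ|syscall.PROT_EXEC); err != nil {
		_ = syscall.Munmap(buf) // do not leak the mapping on failure
		return nil, fmt.Errorf("mprotect: %w", err)
	}
	return buf, nil
}

func main() {
	buf, err := mapExecutable([]byte{0xc0, 0x03, 0x5f, 0xd6}) // bytes of an AArch64 RET
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer syscall.Munmap(buf)
	fmt.Printf("mapped %d bytes, first instruction bytes: % x\n", len(buf), buf[:4])
}
```

Rounding up to the page size matters because mprotect works at page granularity; the demo sidesteps this by always allocating ten whole pages.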

The exec function is implemented as a Go Assembly function in main_arm64.s as follows¹¹:

// main_arm64.s
#include "funcdata.h"
#include "textflag.h"

TEXT ·exec(SB), NOSPLIT|NOFRAME, $0-8
    // Load the entry point of the executable into R27.
    MOVD entrypoint+0(FP), R27 
    // Jump to the entry point of the executable stored in R27.
    JMP  (R27)

That’s it. Let’s compile and run the program:

$ go run .
entrypoint: 0x1051c4000
SIGILL: illegal instruction
PC=0x1051c4000 m=0 sigcode=2
instruction bytes: 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0

goroutine 1 gp=0x140000021c0 m=0 mp=0x104e3a4c0 [running]:
runtime: g 1 gp=0x140000021c0: unknown pc 0x1051c4000
stack: frame={sp:0x1400010aec0, fp:0x0} stack=[0x1400010a000,0x1400010b000)

The error you are observing is something you would never encounter in a normal Go program (if you do encounter this with normal Go code, it is highly likely a bug in the Go compiler!). But fear not, this is the expected behavior. If you take a closer look at the error message, you can see that the program tried to execute the machine code at the address 0x1051c4000, which is the address of the memory we allocated via mmap. But the machine code is not written yet, so the CPU tried to execute the zero-filled memory region, and that’s why you got the SIGILL error: the AArch64 instruction encoded as 0x00000000 is the “Undefined/UDF” instruction.
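You can reproduce that reading of the crash report by hand. The sketch below decodes the first four “instruction bytes” as a little-endian word and checks whether it falls in the UDF encoding space, which in the A64 instruction set is any word whose upper 16 bits are zero. The helper name isUDF is mine, not something from the Go runtime.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// isUDF reports whether inst is an AArch64 UDF (permanently undefined)
// instruction: the upper 16 bits are zero and the lower 16 bits are an
// arbitrary immediate.
func isUDF(inst uint32) bool {
	return inst>>16 == 0
}

func main() {
	crash := make([]byte, 16) // the zero-filled bytes from the SIGILL report
	inst := binary.LittleEndian.Uint32(crash)
	fmt.Printf("0x%08x UDF=%v\n", inst, isUDF(inst)) // prints "0x00000000 UDF=true"

	ret := uint32(0xd65f03c0)
	fmt.Printf("0x%08x UDF=%v\n", ret, isUDF(ret)) // prints "0xd65f03c0 UDF=false"
}
```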

One thing you may also notice is that the error says unknown pc 0x1051c4000. This is because the Go runtime is not aware of the machine code you generated, and it doesn’t have the debug information for the machine code.
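You can observe the same distinction from ordinary Go code with runtime.FuncForPC, which returns nil for an address that does not belong to any function the runtime knows about. The following is a small sketch: describePC and dataAddr are hypothetical helpers, and the address of a global variable stands in for the address of generated code.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// dataByte lives in the data section, so its address is not inside any Go function.
var dataByte byte

// dataAddr returns an address that points at data rather than compiled Go code.
func dataAddr() uintptr {
	return uintptr(unsafe.Pointer(&dataByte))
}

// describePC returns the name of the Go function containing pc, or
// "unknown pc" if the runtime has no function metadata for that address.
func describePC(pc uintptr) string {
	if f := runtime.FuncForPC(pc); f != nil {
		return f.Name()
	}
	return "unknown pc"
}

func main() {
	pc, _, _, _ := runtime.Caller(0)    // a program counter inside this function
	fmt.Println(describePC(pc))         // the enclosing function's name, main.main here
	fmt.Println(describePC(dataAddr())) // prints "unknown pc"
}
```

Memory obtained from mmap is in the same situation as dataByte: the runtime has no function table entry covering it, so tracebacks cannot name or unwind through it.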

Okay, how can we fix this? One of the tiniest functions is one that just returns, so let’s write the machine code for that:

--- a/codes/runtime_code_generation_in_go/main.go
+++ b/codes/runtime_code_generation_in_go/main.go
@@ -1,6 +1,7 @@
 package main
 
 import (
+       "encoding/binary"
        "fmt"
        "syscall"
        "unsafe"
@@ -14,7 +15,8 @@ func main() {
        // 1. Allocate memory for machine code via mmap. At this point, the memory is not executable, but read-writable.
        machineCodeBuf := mustAllocateByMMap()
 
-       // 2. TODO: Write machine code to machineCodeBuf.
+       // 2. Write machine code to machineCodeBuf that just returns.
+       binary.LittleEndian.PutUint32(machineCodeBuf, 0xd6_5f_03_c0)
 
        // 3. Mark the memory region as executable. This marks the memory region as read-executable.
        mustMarkAsExecutable(machineCodeBuf)

This patch writes the AArch64 machine code for the RET instruction to the allocated memory region. The machine code 0xd6_5f_03_c0 is its encoding as described in the AArch64 manual. Note that AArch64 instructions are stored in little-endian byte order, and each instruction is encoded as a 32-bit word.
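The constant is not magic. In the A64 encoding, RET is a fixed bit pattern with the number of the register holding the return address placed in bits 5 to 9, and a plain RET uses x30, the link register. Here is a minimal sketch of deriving it; encodeRET is a hypothetical helper, not an API of any library.

```go
package main

import "fmt"

// encodeRET builds the encoding of "RET Xn": the fixed bits 0xd65f0000 with
// the 5-bit register number rn placed in bits 5..9.
func encodeRET(rn uint32) uint32 {
	return 0xd65f0000 | (rn&0x1f)<<5
}

func main() {
	// x30 is the link register, which a plain RET uses by default.
	fmt.Printf("%#x\n", encodeRET(30)) // prints "0xd65f03c0"
}
```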

Let’s run the program again:

$ go run .
entrypoint: 0x1050dc000
ok

Cool innit? The program successfully executed the machine code that just returns, and the program prints ok as expected from the main function.

You can browse the whole source code here.

Conclusion

In this post, I introduced the concept of runtime code generation and execution in Go, and showed a tiny demo of it. The demo is really simple, so I hope readers can understand the basic idea of runtime code generation and execution in Go, but at the same time I guess you still have no idea how to write the machine code for a real function or program. In other words, I didn’t explain how to perform function calls just like any normal Go program does, or how to return results from the JITed code to the caller in the Go world. That’s what I am going to cover in the next series of posts.

If you have any questions, feedback, or requests, please let me know on X/Twitter. I am happy to answer any questions you have. Also, I am always looking for an exciting project or problem to work on, so if you have anything in mind and think I can help with it, please let me know as well. I would love to chat.

See you in the next post!

¹ I am one of the authors of Istio’s Wasm plugin system: https://istio.io/latest/blog/2021/wasm-api-alpha/
² I served as a committer of the Envoy/Proxy-Wasm project before.
³ I contributed some patches to the Zig compiler.
⁴ GopherCon 2022: Takeshi Yoneda - CGO-less Foreign Function Interface with WebAssembly
⁵ https://wazero.io/docs/how_the_optimizing_compiler_works/
⁶ In wazero, we switched to avoid the explicit use of AOT or JIT in the codebase and API. wazero#560
⁷ It is clear that a browser-based WebAssembly runtime like V8 is JIT in the typical sense.
⁸ It is possible to convert the machine code to a Go function, but it gets hairy for various reasons.
⁹ The Go internal ABI specification details how the Go runtime uses the registers in its implementation.
¹⁰ On AArch64, the OS typically forbids read-write-executable memory regions in user land, so you need to mark the memory region as read-executable after writing the machine code. That is controlled by the SCTLR register, which is only accessible in privileged mode. For more details, see the “Preventing execution from writable locations” section in the AArch64 manual.
¹¹ For the syntax of the assembly, please refer to A Quick Guide to Go’s Assembler.