VibeCrypto: crypto watch

Helius · Sept. 4, 17:00

How to write Solana programs in sBPF assembly

On Solana, a handful of developers are bypassing the compiler to write bytecode by hand and squeeze out every available compute unit.

The Solana ecosystem is the scene of a race toward extreme optimization. While libraries like Pinocchio drastically improve the efficiency of Rust programs, some developers go further and write directly in sBPF assembly, the native language of the Solana virtual machine, derived from eBPF.

This level of control can produce far more efficient bytecode than the compiler generates, with significant savings in compute units and binary size. The downside: reduced ergonomics, tedious manual verification, and higher audit costs, which limits the approach to operations where performance is critical.

Solana


Details

Source: Helius
Published: Sept. 4 at 17:23
Direct link: https://www.helius.dev/blog/sbpf-assembly

Source content (raw)

There is currently a parallel race to the bottom for program optimization playing out in the Solana ecosystem. At a high level, libraries like <a href="https://github.com/anza-xyz/pinocchio">Pinocchio</a> are revolutionizing Rust development, achieving orders-of-magnitude improvements in compute efficiency. Meanwhile, at the absolute lowest level, a group of dedicated developers, united by their mutual disrespect of the compiler, are taking it one step further. Instead of writing Solana programs in compiled languages like Rust or C, they focus on meticulously hand-rolling bytecode to squeeze the maximum performance out of every last instruction.

These low-level gains are only possible when we directly instruct the VM in its native language: sBPF Assembly, Solana's own variant of the extended Berkeley Packet Filter (eBPF), the bytecode executed within every single on-chain program.

Writing sBPF assembly gives developers direct access to the lowest-level interface of the Solana Virtual Machine. The Rust compiler and LLVM attempt optimizations, but because the language syntax is not expressive enough, or because the compiler lacks the context to make better choices, they often generate suboptimal bytecode compared with what a skilled developer can produce with the complete instruction-level control that assembly enables. While this added level of control comes at the cost of ergonomics, the savings in compute unit usage and binary size (and thus rent) are significant. These savings become especially important in highly contended, competitive, performance-critical operations.

Simultaneously, there is also an argument to be made that not all programs should be written in assembly. Although things have improved drastically, the tooling has historically been limited and, more importantly, performance gains often come with the significant tradeoff of manual verification of correctness and increased audit costs. 
This is due to a lack of automated tooling and the syntax being more cumbersome to read, write, and understand. Conversely, it may also be argued that compiled languages are a black box, obscuring the choices made by the compiler. Thus, the added transparency and control of assembly can often reveal things that are not easily visible when working with compiled languages. In fact, the vast majority of recent performance breakthroughs in our Rust SDKs were discovered by realizing we could hand-roll more efficient bytecode than the compiler.

In this article, you'll learn:
<ul class="list-bullet">
<li value=1>What sBPF Assembly is and how it provides direct control over the virtual machine</li>
<li value=2>The evolution from Berkeley Packet Filter to eBPF and why Solana adopted it</li>
<li value=3>sBPF's virtual machine architecture, instruction set, and memory model</li>
<li value=4>How to set up your development environment and build sBPF programs</li>
<li value=5>Step-by-step assembly programming through a practical memo example</li>
<li value=6>Essential security considerations when writing low-level code</li>
</ul>

<h2>What is Assembly?</h2>

Assembly is a human-readable variant of machine code: the lowest-level programming language that directly corresponds to the instruction set of a CPU or VM.

Instead of variables and functions, assembly operates on registers (fast, temporary storage locations in the CPU), memory addresses (physical locations in RAM or on disk), and fundamental operations such as load (read from memory), store (persist to memory), arithmetic, and jumps (control flow).

Each instruction in assembly maps one-to-one to an equivalent instruction in machine code. This one-to-one mapping means programmers control precisely what the processor executes, including which registers hold data, how memory is accessed, and the exact sequence of operations. 
Unlike high-level languages, where a single function might generate dozens of instructions, assembly offers complete transparency and control over the machine's behavior without opaque abstractions.

<h2>What is Berkeley Packet Filter (BPF) and eBPF?</h2>

Berkeley Packet Filter (BPF) originated in 1992 as a virtual machine for efficiently filtering network packets in Unix kernels. The original BPF used a simple instruction set and register-based architecture that could run sandboxed code safely within the kernel.

Extended Berkeley Packet Filter (eBPF) modernized this concept, expanding from a packet filter into a general-purpose virtual machine. eBPF introduced a 64-bit architecture, more registers, and richer instruction sets, enabling complex programs to run securely in kernel space for networking, security, and system monitoring.

Solana adopted eBPF because it provided a proven, secure execution environment with built-in sandboxing. The sandboxing prevents programs from accessing system resources, crashing nodes, or interfering with other programs, while deterministic execution ensures all validators produce identical results. Additionally, the register-based architecture and mature toolchain made it ideal for high-performance on-chain execution, while the existing LLVM backend allowed developers to compile from high-level languages like Rust.

<h3>sBPF Virtual Machine Architecture</h3>

When a Solana program executes, the runtime loads the sBPF bytecode into memory, performs static verification to ensure safety (checking for infinite loops, invalid memory access, and proper instruction usage), and then executes it within the virtual machine. 
The VM provides a controlled 64-bit execution environment where programs run in complete isolation from the host system and other programs, with all resource access mediated through the runtime.

<h3>sBPF Instruction Set Architecture</h3>

sBPF operates with eleven 64-bit registers (r0-r10), with r10 serving as a read-only frame pointer and r0 serving as the return register. Instructions follow a consistent format, with opcodes specifying operations (arithmetic, logic, memory access, jumps) and operands indicating source/destination registers, offsets, and/or immediate values. Key instruction categories include ALU operations (add, subtract, bitwise), memory operations (load/store), and control flow (conditional/unconditional jumps).

<h3>sBPF Memory Model</h3>

sBPF programs operate within a structured memory layout: a 4KB stack for local variables and function calls, a heap for dynamic allocations, read-only program data containing the bytecode and constants, and account data regions that map to Solana accounts, which the program can access during execution.

All memory access is bounds-checked, and programs cannot access memory outside their designated regions.

<h3>Solana Syscalls in sBPF</h3>

sBPF programs cannot directly access system resources or perform I/O operations. Instead, they request services through syscalls, which are special instructions that transfer control to the Solana runtime. In sBPF assembly, syscalls are invoked using the call instruction and a call symbol that is modified to a call target by the compiler at the time of assembly. Currently, syscalls are invoked via text-based dynamic relocations: a complicated string lookup table system that maps symbols to a 32-bit Murmur3 hash at JIT compilation. However, there is an <a href="https://github.com/solana-foundation/solana-improvement-documents/pull/178/files">active proposal</a> to replace this with static syscalls, drastically simplifying calling conventions. 
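As a rough illustration of the symbol-to-hash mapping described above, the sketch below hashes a syscall symbol with 32-bit Murmur3 and places the result in the immediate field of an 8-byte call instruction. Several things here are assumptions rather than details from the article: the seed of 0, the example symbol name `sol_log_`, the helper names `murmur3_32` and `encode_insn`, and the register nibbles being left at zero. The instruction layout follows the standard eBPF encoding (8-bit opcode, two 4-bit register fields, 16-bit offset, 32-bit immediate), which sBPF is assumed to share.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def _mix_block(k: int) -> int:
    """Murmur3 per-block scramble, shared by the body and the tail."""
    k = (k * 0xCC9E2D51) & MASK32
    k = _rotl32(k, 15)
    return (k * 0x1B873593) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 of data. Seed 0 is an assumption for syscall names."""
    h = seed & MASK32
    n = len(data)
    body_len = n - n % 4

    # Body: consume the input in 4-byte little-endian blocks.
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        h ^= _mix_block(k)
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32

    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= _mix_block(int.from_bytes(tail, "little"))

    # Finalization: fold in the length and avalanche the bits.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def encode_insn(opcode: int, dst: int, src: int, offset: int, imm: int) -> bytes:
    """Pack one instruction using the standard little-endian eBPF layout."""
    regs = ((src & 0xF) << 4) | (dst & 0xF)
    return struct.pack("<BBhI", opcode, regs, offset, imm & MASK32)


BPF_CALL = 0x85            # BPF_JMP | BPF_CALL in the eBPF opcode space
SYSCALL_NAME = b"sol_log_"  # example symbol, chosen for illustration

key = murmur3_32(SYSCALL_NAME)
# Register fields are zeroed purely for illustration; real toolchain output may differ.
insn = encode_insn(BPF_CALL, dst=0, src=0, offset=0, imm=key)
print(f"{SYSCALL_NAME.decode()} -> {key:#010x}, call bytes: {insn.hex()}")
```

The point of the sketch is only that a textual symbol collapses to a 32-bit key that fits in the instruction's immediate field, which is the lookup the runtime performs under the current relocation scheme.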
When a syscall is invoked, arguments are passed through registers 1 through 5, w