Skip to main content

System architecture

The RISC-V ISA, on which Cartesi Machines are based, consists of a minimal 32-bit integer instruction set to which several extensions can be added. The standard defines a privileged architecture with features commonly used by modern operating systems, such as multiple privilege levels, paged-based virtual-memory, timers, interrupts, exceptions and traps, etc. Implementations are free to select the combination of extensions that better suit their needs.

The Cartesi Machine architecture can be separated into a processor and a board. The processor performs the computations, executing the traditional fetch-execute loop while maintaining a variety of registers. The board defines the surrounding environment with an assortment of memories (ROM, RAM, flash drives, rollup memory ranges) and a number of devices. To make verification possible, a Cartesi Machine maps its entire state to the physical address space in a well-defined way. This includes the internal states of the processor, the board, and of all attached devices. The contents of the address space therefore completely define the Cartesi Machine. Fortunately, this modification does not limit the operating system or the applications it hosts in any significant way.

Both the processor and board are implemented in the emulator. A full description of the RISC-V ISA is out of the scope of this documentation (See the volumes 1 and 2 of the ISA specification for details.) This section describes Cartesi's RISC-V architecture, the modifications made to support verification, the devices supported by the emulator, and the process the machine follows to boot the Linux kernel.

The processor

Following RISC-V terminology, Cartesi Machines implement the RV64IMAZicsr_Zifencei ISA. The letters after RV specify the extension set. This selection corresponds to a 64-bit machine, integer arithmetic with multiplication and division, atomic operations, as well as the optional supervisor and user privilege levels. In addition, Cartesi Machines support the Sv39 mode of address translation and memory protection. Sv39 provides a 39-bit protected virtual address space, divided into 4KiB pages, organized by a three-level page table. This set of features creates a balanced compromise between the simplicity demanded by a blockchain implementation and the flexibility expected from off-chain computations.

There are a total of 98 instructions, out of which 28 simply narrow or widen, respectively, their 64-bit or 32-bit counterparts. This being a RISC ISA, most instructions are very simple and can be emulated in a few lines of high-level code. In contrast, the x86 ISA defines at least 2000 (potentially complex) instructions. In fact, the only complex operation in RISC-V is the virtual-to-physical address translation. Instruction decoding is particularly simple due to the reduced number of formats (only 4, all taking 32-bits).

The entire processor state fits within 512 bytes, which are divided into 64 registers, each one holding 64-bits. It consists of 32 general-purpose integer registers and 26 control and status registers (CSRs). Most of these registers are defined by the RISC-V ISA; the remaining are Cartesi-specific. The processor makes its entire state available, externally-only and read-only, by mapping individual registers to the lowest 512 bytes in the physical address space (in the processor shadow). The adjacent 1.5KiB are reserved for future use. The entire mapping is given in the following table:

Offset RegisterOffset RegisterOffset RegisterOffset Register
0x000 x0 0x120 mcycle0x160 misa0x1a0 sepc
0x008 x1 0x128 minstret0x168 mie0x1a8 scause
0x130 mstatus0x170 mip0x1b0 stval
0x0f8 x310x138 mtvec0x178 medeleg0x1b8 satp
0x100 pc0x140 mscratch0x180 mideleg0x1c0 scounteren
0x108 mvendorid0x148 mepc0x188 mcounteren0x1c8 ilrsc
0x110 marchid0x150 mcause0x190 stvec0x1d0 iflags
0x118 mimpid0x158 mtval0x198 sscratch

The only generally relevant standard register is mcycle. Since its value is advanced at every CPU cycle, it can be used to identify a particular step in the computation being performed by a Cartesi Machine. This is a key component of the verification process, and can also be used to bound the amount of computation.

The registers whose names start with i are Cartesi additions, and have the following semantics:

  • The layout for register iflags can be seen below:

    Bits 63–54–3210
    Field ReservedPRVXYH

Bit PRV gives the current privilege level (0 for User, 1 for Supervisor, and 3 for Machine), bit X is set to 1 when the processor has yielded automatic, bit Y is set to 1 when the processor has yielded manual, bit H is set to 1 to signal the processor has been permanently halted.

  • Register ilrsc holds the reservation address for the LR/SC atomic memory operations;

The board

The interaction between board and processor happens through interrupts and the memory bus. Devices are mapped to the processor's physical address space. The mapping can be seen in the following table:

Physical address Mapping
0x00000000–0x000003ff Processor shadow
0x00000800–0x00000bff Board shadow
0x00001000–0x000ffff ROM (Bootstrap & Devicetree)
0x02000000–0x020bffff Core Local Interruptor
0x40008000–0x40008fff Host-Target Interface
0x60000000–0x600fffff (configurable) Rollup RX buffer
0x60200000–0x602FFFFF (configurable) Rollup TX buffer
0x60400000–0x60400FFF (configurable) Rollup Input Metadata
0x60600000–0x606FFFFF (configurable) Rollup Voucher Hashes
0x60800000–0x608FFFFF (configurable) Rollup Notice Hashes
0x80000000–configurable RAM
configurable Flash drive 0
configurable Flash drive 7

There are 60KiB of ROM starting at address 0x1000, where execution starts by default. The amount of RAM is user-configurable, but always starts at address 0x80000000. Finally, a number of additional physical memory ranges can be set aside for flash-memory devices. These will typically be preloaded with file-system images, but can also hold raw data.

The board maps two non-memory devices to the physical address space: CLINT and HTIF.

CLINT

The Core Local Interruptor (or CLINT) controls the timer interrupt. The active addresses are 0x0200bff8 (mtime) and 0x02004000 (mtimecmp). The CLINT issues a hardware interrupt whenever mtime equals mtimecmp. Since Cartesi Machines must ensure reproducibility, the processor's clock and the timer are locked by a constant frequency divisor of 100. In other words, mtime is incremented once for every 100 increments of mcycle. There is no notion of wall-clock time.

HTIF

The Host-Target Interface (HTIF) mediates communication with the external world. It is mapped to a physical memory starting at 0x40008000, where registers can be accessed at the following offsets:

Offset Register
0x000 tohost
0x008 fromhost
0x010 ihalt
0x018 iconsole
0x020 iyield
0x028 Reserved
0x218 Reserved

The format of CSRs tohost and fromhost are as follows:

Bits 63–5655–4847–0
Field DEVCMDDATA

Interactions with Cartesi's HTIF device follow the following protocol:

  1. start by writing 0 to fromhost;
  2. write the request to tohost;
  3. read the response from fromhost.

Cartesi's HTIF supports 3 subdevices: Halt, Console, and Yield. These are identified by the following values for the field DEV.

`DEV`
Name Value
HTIF_DEVICE_HALT0
HTIF_DEVICE_CONSOLE1
HTIF_DEVICE_YIELD2

Registers ihalt, iconsole, and iyield are bit masks specifying the commands that are available for the respective devices. Unavailable commands are silently ignored by the machine.

Halt
`CMD`
Name Value
HTIF_HALT_HALT0

The Halt device (DEV=HTIF_DEVICE_HALT) is used to halt the machine. This will permanently set bit H in iflags and return control back to the host.

Send request CMD=HTIF_HALT_HALT and DATA containing bit 0 set to 1. Bits 471 can be set to an arbitrary exit code.

Console
`CMD`
Name Value
HTIF_CONSOLE_GETCHAR0
HTIF_CONSOLE_PUTCHAR1

The Console device (DEV=HTIF_DEVICE_CONSOLE) can be used to input/output characters.

To input a character from console (in interactive sessions), request CMD=HTIF_CONSOLE_GETCHAR, DATA=0, then read response CMD=HTIF_CONSOLE_GETCHAR, DATA=<ch>+1. (DATA=0 means no character was available);

To output a character <ch> to console, request CMD=HTIF_CONSOLE_PUTCHAR, with DATA=<ch>.

Yield

The Yield device can be used to return control to the host. It uses a slight refinement to the format of CSRs tohost and fromhost, by splitting out a REASON field from DATA:

Bits 63–5655–4847–3231–0
Field DEVCMDREASONDATA

There are two types of yield: automatic and manual.

`CMD`
Name Value
HTIF_YIELD_AUTOMATIC0
HTIF_YIELD_MANUAL1

To issue an automatic yield, request CMD=HTIF_YIELD_AUTOMATIC. An automatic yield sets the bit X in iflags and returns control back to the host. There are currently 4 supported reasons for automatic yields:

`REASON`
Name Value
HTIF_YIELD_REASON_PROGRESS0
HTIF_YIELD_REASON_TX_VOUCHER3
HTIF_YIELD_REASON_TX_NOTICE4
HTIF_YIELD_REASON_TX_REPORT5

To report progress, set REASON=HTIF_YIELD_REASON_PROGRESS, and DATA=<permil>, where <permil> gives the progress in parts per thousand. The other reasons for automatic yield signal the production of Cartesi Rollups responses. REASON=HTIF_YIELD_REASON_TX_VOUCHER, REASON=HTIF_YIELD_REASON_TX_NOTICE, and REASON=HTIF_YIELD_REASON_TX_REPORT denote, respectively, transfers of a voucher, a notice, and a report from target to host. The DATA field in tohost is ignored in these cases.

To issue a manual yield, request CMD=HTIF_YIELD_MANUAL. A manual yield sets the bit Y in iflags and returns control back to the host. There are currently 3 supported reasons for manual yields, all used with Cartesi Rollups:

`REASON`
Name Value
HTIF_YIELD_REASON_RX_ACCEPTED1
HTIF_YIELD_REASON_RX_REJECTED2
HTIF_YIELD_REASON_TX_EXCEPTION6

To accept or reject the previous request, set REASON=HTIF_YIELD_REASON_RX_ACCEPTED or REASON=HTIF_YIELD_REASON_RX_REJECTED, respectively. The DATA field in tohost is ignored in these cases. Upon return, the DATA field in fromhost will contain the type of the next request:

`DATA` in response
Name Value
HTIF_YIELD_ADVANCE_STATE0
HTIF_YIELD_INSPECT_STATE1

The signal the throwing of a rollup exception, set REASON=HTIF_YIELD_REASON_TX_EXCEPTION. The DATA field in tohost is ignored in this case.

Before resuming the emulator after a manual yield, the host must manually reset the Y bit in iflags. Otherwise, the emulator will immediately return with no changes to its state.

Rollup

In order to interact with Cartesi Rollups, the host application controlling the emulator and the target application running inside the emulator must follow an agreed-upon protocol, mediated by the HTIF Yield device.

The low-level view of what happens inside the machine is as follows:

Initialize
Repeat
`voucher_index` = 0
`notice_index` = 0
`reason` = HTIF_YIELD_REASON_RX_ACCEPTED
Yield manual with `reason` as `REASON` in `tohost`
If `DATA` in `fromhost` is `HTIF_YIELD_ADVANCE_STATE`
Read input metadata from Rollup Input Metadata
Read input data from Rollup RX Buffer
Process advance-state request
For each voucher to emit
Write voucher data to Rollup TX Buffer
Write voucher hash to slot `voucher_index` in Rollup Voucher Hashes
`voucher_index` = `voucher_index` + 1
Yield automatic with HTIF_YIELD_REASON_TX_VOUCHER as `REASON` in `tohost`
End
For each notice to emit
Write notice data to Rollup TX Buffer
Write notice hash to slot `notice_index` in Rollup Notice Hashes
`notice_index` = `notice_index` + 1
Yield automatic with HTIF_YIELD_REASON_TX_NOTICE as `REASON` in `tohost`
End
For each report to emit
Write report data to Rollup TX Buffer
Yield automatic with HTIF_YIELD_REASON_TX_REPORT as `REASON` in `tohost`
End
If exception to emit
Write exception data to Rollup TX Buffer
Yield automatic with HTIF_YIELD_REASON_TX_EXCEPTION as `REASON` in `tohost`
ElseIf input rejected
`reason` = HTIF_YIELD_REASON_RX_REJECTED
End
End
If `DATA` in `fromhost` is `HTIF_YIELD_INSPECT_STATE`
Read query data from Rollup RX Buffer
Process inspect-state request
For each report
Write report data to Rollup TX Buffer
Yield automatic with HTIF_YIELD_REASON_TX_REPORT as `REASON` in `tohost`
End
If exception
Write exception data to Rollup TX Buffer
Yield automatic with HTIF_YIELD_REASON_TX_EXCEPTION as `REASON` in `tohost`
ElseIf input rejected
`reason` = HTIF_YIELD_REASON_RX_REJECTED
End
End
End

At a higher level, the target application running inside the emulator is supported by the /dev/rollup Linux device driver via its ioctl interface, or by even higher-level interfaces based on it, such as /opt/cartesi/bin/rollup command-line utility or the HTTP API exposed by the /opt/cartesi/bin/rollup-http-server command-line utility. It is the /dev/rollup device that copies data to and from all rollup memory ranges, and that uses the /dev/yield device to perform the required yields.

There are two types of request: advance-state requests and inspect-state requests. The loop processes one request per iteration. To transition between requests, the application accepts or rejects the previous request by issuing a command to the HTIF yield device, or throws an exception. The return from the yield defines the type of the next request.

When the application identifies an advance-state request, it obtains input medatada from the Rollup Input Metadata memory range, and input from the Rollup RX Buffer memory range. While processing advance-state requests, the application can emit vouchers, notices, reports, or exceptions. It writes the data for all these to the Rollup TX buffer memory range. Moreover, when emitting the ith voucher (respectively, notice) in response to a given input, it writes its hash to the ith 32-byte slot in the Rollup Voucher Hashes (respectively, Rollup Notice Hashes) memory range. It then issues the appropriate command to the HTIF yield device.

When an application identifies an inspect-state request, it obtains the query from the Rollup RX buffer memory range. While processing inspect-state requests, the application can emit vouchers, or exceptions. It writes data for all these to the Rollup TX buffer memory range and sends the appropriate command to the HTIF yield device.

The format for all these request and response data are as follows:

Format for input metadata
Offset (bytes) Field
0–31 message sender (address hash)
32–63 block number (number)
64–95 time stamp (number)
96–127 epoch index (number)
128–159 input index (number)
Format for voucher
Offset (bytes) Field
0–31 address (address hash)
32–63 offset (number, always 64)
64–95 length (number)
96–96+length-1 payload (raw data)
Format for input, notice, report, and exception
Offset (bytes) Field
0–31 offset (number, always 32)
32–63 length (number)
64–64+length-1 payload (raw data)
All numbers are encoded as 256-bit big-endian integers. Address hashes are encoded in the least significant 160-bits of a zero-padded 256-bit big-endian integer.

In the host, the loop is as follows:

While bit `H` in `iflags` is not set (machine has not halted)
Snapshot machine state
Resume machine
If bit `Y` in `iflags` is set (i.e. manual yield)
If `REASON` in `tohost` is HTIF_YIELD_REASON_RX_REJECTED
Discard vouchers and notices emitted from previous request, if any
Rollback machine state
End
If `REASON` in `tohost` is HTIF_YIELD_REASON_TX_EXCEPTION
Read exception data from Rollup TX Buffer
Discard vouchers and notices emitted from previous request, if any
Rollback machine state
End
If `REASON` in `tohost` is HTIF_YIELD_REASON_RX_ACCEPTED
If previous request was advance-state
Read Rollup Voucher Hashes for previous request
Read Rollup Notice Hashes for previous request
End
Obtain the next request from an external source
If advance-state request
Clear Rollup Voucher Hashes
Clear Rollup Notice Hashes
Write input data to Rollup RX Buffer
Write input metadata to Rollup Input Metadata
Write HTIF_YIELD_ADVANCE_STATE to `DATA` in `fromhost`
End
If inspect-state request
Write query data to Rollup RX Buffer
Write HTIF_YIELD_INSPECT_STATE to `DATA` in `fromhost`
End
Clear `Y` bit in `iflags`
End
End
If bit `X` in `iflags` is set (i.e. automatic yield)
If `REASON` in `tohost` is HTIF_YIELD_REASON_TX_VOUCHER
Read voucher data from rollup memory ranges
End
If `REASON` in `tohost` is HTIF_YIELD_REASON_TX_NOTICE
Read notice data from rollup memory ranges
End
If `REASON` in `tohost` is HTIF_YIELD_REASON_TX_REPORT
Read report data from Rollup TX Buffer
End
End
End

In production, the host application controlling the emulator is the Server Manager While prototyping, it can be the cartesi-machine command-line utility, or even a custom script using the Lua API.

PMAs

The physical memory mapping is described by Physical Memory Attribute records (PMAs) that start at address 0x00000800 (the board shadow) . Each PMA consists of 2 64-bit words. The first word gives the start of a range and the second word its length. These words are readable both internally and externally. Since the ranges must be aligned to 4KiB page boundaries, the lowest 12-bits of each word are available for attributes. The meaning of each attribute field is as follows:

Bits 63–1211–876543210
Field startDIDIWIRXWREIOM
Bits 63–1211–0
Field lengthReserved (=0)

The M, IO, and E bits are mutually exclusive, and respectively mark the range as memory, I/O mapped, or excluded. Bits R, W, and X mark read, write, and execute permissions, respectively. The IR and IW bits mark the range as idempotent for reads and writes, respectively. Finally, the DID gives the device id, which can have the following values:

Name Value
PMA_MEMORY_DID0
PMA_SHADOW_DID1
PMA_FLASH_DRIVE_DID2
PMA_CLINT_DID3
PMA_HTIF_DID4
PMA_DHD_DID5
PMA_ROLLUP_RX_BUFFER_DID6
PMA_ROLLUP_TX_BUFFER_DID7
PMA_ROLLUP_INPUT_METADATA_DID8
PMA_ROLLUP_VOUCHER_HASHES_DID9
PMA_ROLLUP_NOTICE_HASHES_DID10

The list of PMA records ends with an invalid PMA entry for which length=0.

Linux setup

By default, pc starts at 0x1000, pointing to the start of the ROM region. Before control reaches the RAM image (and ultimately the Linux kernel), a small program residing in ROM builds a devicetree describing the hardware. Cartesi's ROM image rom.bin containing this program can be generated from the rom/ directory of the Cartesi Machine SDK. To do so, it goes over the PMA entries identifying the devices and their locations in the physical address space. It also looks for a null-terminated string, starting at the last 4k of the ROM region, that will be used as the command-line for the Linux kernel. Once the devicetree is ready, the ROM program sets register x10 to 0 (the value of mhartid), x11 to point to the devicetree (which it places at the end of the RAM region), and then jumps to RAM-base at  address 0x80000000. This is where the entry point of the RAM image is expected to reside.

The dtc command-line utility can be used to inspect the devicetree:

cartesi-machine \
--append-rom-bootargs="single=yes" \
--rollup \
-- "dtc -I dtb -O dts /sys/firmware/fdt"

The result is


.
/ \
/ \
\---/---\ /----\
\ X \
\----/ \---/---\
\ / CARTESI
\ / MACHINE
'

<stdout>: Warning (interrupt_provider): /cpus/cpu@0/interrupt-controller: Missing #address-cells in interrupt provider
/dts-v1/;

/ {
#address-cells = <0x02>;
#size-cells = <0x02>;
compatible = "ucbbar,riscvemu-bar_dev";
model = "ucbbar,riscvemu-bare";

cpus {
#address-cells = <0x01>;
#size-cells = <0x00>;
timebase-frequency = <0xf4240>;

cpu@0 {
device_type = "cpu";
reg = <0x00>;
status = "okay";
compatible = "riscv";
riscv,isa = "rv64aimsu";
mmu-type = "riscv,sv39";
clock-frequency = <0x5f5e100>;

interrupt-controller {
#interrupt-cells = <0x01>;
interrupt-controller;
compatible = "riscv,cpu-intc";
phandle = <0x01>;
};
};
};

memory@80000000 {
device_type = "memory";
reg = <0x00 0x80000000 0x00 0x3ff0000>;
};

flash@8000000000000000 {
#address-cells = <0x02>;
#size-cells = <0x02>;
compatible = "mtd-ram";
bank-width = <0x04>;
reg = <0x80000000 0x00 0x00 0x5000000>;
linux,mtd-name = "flash.0";
};

rollup {
#address-cells = <0x02>;
#size-cells = <0x02>;
compatible = "ctsi-rollup";

rx_buffer@60000000 {
reg = <0x00 0x60000000 0x00 0x200000>;
};

tx_buffer@60200000 {
reg = <0x00 0x60200000 0x00 0x200000>;
};

input_metadata@60400000 {
reg = <0x00 0x60400000 0x00 0x1000>;
};

voucher_hashes@60600000 {
reg = <0x00 0x60600000 0x00 0x200000>;
};

notice_hashes@60800000 {
reg = <0x00 0x60800000 0x00 0x200000>;
};
};

yield {
compatible = "ctsi-yield";
automatic;
manual;
};

soc {
#address-cells = <0x02>;
#size-cells = <0x02>;
compatible = "ucbbar,riscvemu-bar-soc\0simple-bus";
ranges;

htif@40008000 {
compatible = "ucb,htif0";
reg = <0x00 0x40008000 0x00 0x1000>;
interrupts-extended = <0x01 0x0d>;
};
};

chosen {
bootargs = "console=hvc0 rootfstype=ext2 root=/dev/mtdblock0 rw quiet swiotlb=noforce mtdparts=flash.0:-(root) single=yes -- dtc -I dtb -O dts /sys/firmware/fdt";
};
};

Halted
Cycles: 71150848

The memory@80000000 section describes 64MiB of RAM starting at address 0x80000000. The flash@8000000000000000 describes flash drive 0: a memory region of 60MiB, starting at address 0x8000000000000000, under the control of the mtd-ram driver, with name flash.0. This will eventually become available as block device /dev/mtdblock0. The rollup section specifies the starts and lengths of all rollup memory ranges. The yield section specifies that the machine will process automatic and manual yields. Finally, section chosen includes the bootargs string that will be used as the kernel command-line parameters. Notice the specification of the root file-system pointing to /dev/mtdblock0, i.e., flash.0, and the mtdparts giving it the label root. Also notice the command dtc -I dtb -O dts /sys/firmware/fdt coming directly from the cartesi-machine command line.

Linux support for RISC-V is upstream in the Linux kernel archives. The kernel runs in supervisor mode, on top of a Supervisor Binary Interface (SBI) provided by a machine-mode shim: the Berkeley Boot Loader (BBL). The BBL is linked against the Linux kernel and this resulting RAM image is preloaded into RAM. Cartesi's RAM image linux.bin can be generated from the kernel/ directory of the Cartesi Machine SDK. The SBI provides a simple interface through which the kernel interacts with CLINT and HTIF. Besides implementing the SBI, the BBL also installs a trap that catches invalid instruction exceptions. This mechanism can be used, for example, to emulate floating-point instructions, although it is more efficient to setup the target toolchain to replace floating point instructions with calls to a soft-float implementation instead. After installing the trap, BBL switches to supervisor mode and cedes control to the kernel entry point.

After completing its own initialization, the kernel mounts the root file-system and eventually cedes control to /sbin/init. Cartesi's root file-system rootfs.ext2 can be generated from the fs/ directory in the Cartesi Machine SDK. The Cartesi-provided /sbin/init script scans all flash devices /dev/mtdblock1/dev/mtdblock7 for valid file-systems. When a file-system is found, the script obtains the corresponding <label> (set in the mtdparts kernel command-line parameter) by inspecting /sys/block/mtdblock*/device/name and mounts the filesystem at /mnt/<label>. The kernel passes to /sbin/init as command-line parameters all arguments after the separator -- in the bootargs string it found in the devicetree. The Cartesi-provided /sbin/init script concatenates all arguments into a string and executes the command in this string in a shell. When the shell returns, /sbin/init unmount all file-systems and gracefully halts the machine.

© 2024 Cartesi Foundation Ltd. All rights reserved.

The Cartesi Project is commissioned by the Cartesi Foundation.

We use cookies to ensure that we give you the best experience on our website. By using the website, you agree to the use of cookies.