rakko #1: Building a function-to-JSON Rust proc macro

July 20, 2024

This post is about one part of a much larger embedded project that I'll talk more about soon. One aspect of this system is that I hope to program it entirely in Rust, with a basic operating system and portable code modules. The chip the OS runs on is RISC-V, so the OS (running in machine mode) provides system calls that the code modules (running in user mode) can execute.

Executing system calls on RISC-V is easy: in user mode, the ecall instruction causes an environment call to the next lower level of execution---in this case machine mode from user mode. However, you (the programmer) have to create all the code that communicates functions, arguments, and return values. There are a lot of ways to do this, but I'm looking at a simple (and currently unfinished) approach that uses a 32-bit system call function ID and the RISC-V calling convention to pass arguments and return values.

The ultimate goal is to make code modules easy to compile with a generic RISC-V Rust target. This requires that code modules have access to all the function call prototypes and IDs: something easy to do with a proc macro!

We'll move through the proc macro code step by step. The big picture is simple: it declares a function attribute (rakko_syscall) that takes one argument, id, then processes the signature of the function and generates simple JSON output containing the same info. This JSON can then be converted into basic Rust system call wrappers (functions that take arguments, place them in the correct registers, then execute ecall). The primary goal here is to take proc-macro-tagged system call functions compiled into the OS, convert them to JSON, then use that JSON to create wrappers for code modules.

With this code, we can turn this:

#[rakko_syscall(id=1)]
fn modtest1(c: u8, d: Box) -> Box {

into this:

{
    "name": "modtest1",
    "id": 1,
    "arguments": [
        {
            "name": "c",
            "ty": "u8"
        },
        {
            "name": "d",
            "ty": "Box"
        }
    ],
    "return_type": "Box"
}

One thing to note: I'm no proc macro expert! There is likely an easier way to do everything that this proc macro does.

Each code block below comes together to make a small lib.rs for a proc macro crate.

Crates used

extern crate proc_macro;

use proc_macro::TokenStream;
use syn::{parse_macro_input, FnArg, ItemFn, LitInt, Pat, PathArguments, ReturnType, Type, GenericArgument, Expr, Lit};
use syn::spanned::Spanned;

use std::io::Write;
use std::fs::OpenOptions;

use json::{self, JsonValue, object};

Pretty normal proc macro crates, plus the json crate to generate JSON output.

Parser types

#[derive(Default)]
struct RakkoSyscallArgument {
    name: String,
    ty: String,
}

impl Into for RakkoSyscallArgument {
    fn into(self) -> JsonValue {
        let m = object! {
            name: self.name,
            ty: self.ty,
        };
        return m;
    }
}


#[derive(Default)]
struct RakkoSyscall {
    name: String,
    id: u32,
    arguments: Vec<RakkoSyscallArgument>,
    return_type: String
}

Each system call is represented by a RakkoSyscall struct. The name and return type are simply a String, the syscall ID is a u32, and each argument is represented by a RakkoSyscallArgument, another simple struct with argument name and type as strings. RakkoSyscallArgument also has a basic Into that creates a JSON object using the object! macro.

Parser code

#[proc_macro_attribute]
pub fn rakko_syscall(attr: TokenStream, item: TokenStream) -> TokenStream {
    let mut this_syscall = RakkoSyscall::default();
	
    // process the attribute tag to get the id
    let id_parser = syn::meta::parser(|meta| {
        if meta.path.is_ident("id") {
            let w: LitInt = meta.value()?.parse()?;
            this_syscall.id = w.base10_parse()?;
            Ok(())
        } else {
            Err(meta.error("unsupported rakko_syscall property"))
        }
    });

    parse_macro_input!(attr with id_parser);

We start by creating a blank RakkoSyscall, then parse the "id" field. This is the only field that can be present and it can only be an integer literal. If it isn't, the error is simple:

error: expected integer literal
 --> src/main.rs:7:20
  |
7 | #[rakko_syscall(id="a")]
  |                    ^^^

Then we process the function spec. The name is easy, just one field. The arguments are more complicated, because we have to be able to process multiple types of argument:

This is the largest block of code and it's honestly ugly, but it does work. It starts by parsing the function name, then iterating over arguments and identifying the type of each one. See the comments for more specifics.

    let m = item.clone();
    let a = parse_macro_input!(m as ItemFn);
    // set the function name in the struct
    this_syscall.name = a.sig.ident.to_string();
    let mut arguments = Vec::new();

    for (count, input) in a.sig.inputs.into_iter().enumerate() {
        let mut arg_struct = RakkoSyscallArgument::default();
    
        if let FnArg::Typed(arg) = input {
            if let Pat::Ident(pat_ident) = *arg.pat {
                // set the name of the argument
                arg_struct.name = pat_ident.ident.to_string();
            }
            if let Type::Path(type_path) = *arg.ty {
                let segments = type_path.path.segments;
                arg_struct.ty = segments[0].ident.to_string().clone();
        
                if let PathArguments::AngleBracketed(angle_bracketed) = segments[0].clone().arguments {
                    // assemble a generic type
                    if let GenericArgument::Type(generic_arg_1) = &angle_bracketed.args[0] {
                        if let Type::Path(type_path_inner_1) = generic_arg_1 {
                            // traverse inward to the first set of angle brackets
                            let inner_segment_2 = type_path_inner_1.path.segments[0].clone();
                            arg_struct.ty = format!("{}<{}", arg_struct.ty, inner_segment_2.ident.to_string());
                            if let PathArguments::AngleBracketed(angle_bracketed_inner) = inner_segment_2.arguments {
                                // handle double-generic types if there's an inner generic
                                if let GenericArgument::Type(generic_arg_2) = &angle_bracketed_inner.args[0] {
                                    if let Type::Path(type_path_inner_2) = generic_arg_2 {
                                        // expand into single-generic, note that we use format! to expand the existing string
                                        arg_struct.ty = format!("{}<{}>>", arg_struct.ty, type_path_inner_2.path.segments[0].ident.to_string());
                                    }
                                }
                            } else {
                                // close the angle bracket for single-generic types
                                arg_struct.ty = format!("{}>", arg_struct.ty);
                            }
                        } else if let Type::Array(type_array) = generic_arg_1 {
                            // handle single-generic types of arrays
                            if let Type::Path(elem) = *type_array.elem.clone() {
                                let array_type = elem.path.segments[0].ident.to_string();

                                // get the LitInt that represents the array length
                                if let Expr::Lit(array_length) = &type_array.len {
                                    if let Lit::Int(array_length) = &array_length.lit {
                                        let array_length_int: std::num::NonZero = array_length.base10_parse().unwrap();
                                        arg_struct.ty = format!("{}<[{array_type}; {array_length_int}]>", arg_struct.ty);
                                    }
                                }
                            }
                        } 
                    } 
                }
            }
            arguments.push(arg_struct);
        }
    }

Finally, we parse the return value. It's possible that we'll want more potential return types, but for now we only have support for non-generic types and single-generic types.

    if let ReturnType::Type(_, ret_type) = a.sig.output { // _ ignores the RArrow
        if let Type::Path(type_path) = *ret_type {
            let segments = type_path.path.segments;
            // grab the basic return type
            this_syscall.return_type = segments[0].ident.to_string();
            // check if we have a generic argument and process it plus the thing inside
            if let PathArguments::AngleBracketed(angle_bracketed) = segments[0].clone().arguments {
                if let GenericArgument::Type(generic_arg_1) = &angle_bracketed.args[0] {
                    if let Type::Path(type_path_inner_1) = generic_arg_1 {
                        let inner_segment_2 = type_path_inner_1.path.segments[0].clone();
	                // expand the type into generic
                        this_syscall.return_type = format!("{}<{}>", this_syscall.return_type, inner_segment_2.ident.to_string());
                    }
                }
            }
        // anything else is not permitted
        } else {
            // .into() converts the proc_macro2.TokenStream generated
            // by .into_compile_error() into a proc_macro.TokenStream
            return syn::Error::new(ret_type.span(), "syscall functions may only return single values!")
                .into_compile_error()
                .into();
        }
    }

Generating JSON

This is the fun part. The json crate has a handy macro object! (as shown above) that can take any types and turn them into nicely formatted JSON.

    let data = object! {
        name: this_syscall.name,
        id: this_syscall.id,
        arguments: arguments,
        return_type: this_syscall.return_type,
    };

Then we save it to a file in append mode and return the original item being processed:

    let mut file = OpenOptions::new()
        .append(true)
        .create(true)
        .open("functions.txt")
        .expect("can't open file!");
    
    write!(file, "{data}").unwrap();

    item

This saves functions.txt in the main directory of the crate using this proc macro (which makes sense, because the point of a proc macro is that it is code that's executed wherever).

End

That's all! Come back next time (I promise it'll be there!) for more cool stuff.