Published on 2024-09-19

A small trick for simple Rust/C++ interop

🏷️ Rust, C++

Discussions: Reddit, HN.

I am rewriting a gnarly C++ codebase in Rust at work.

Due to the heavy use of callbacks (sigh), Rust sometimes calls C++ and C++ sometimes calls Rust. This is done by having both sides expose a C API for the functions they want the other side to be able to call.

This is for functions; but what about C++ methods? Here is a trick to rewrite one C++ method at a time, without headaches. And by the way, this works whatever language you are rewriting the project in; it does not have to be Rust!

The trick

  1. Make the C++ class a standard layout class. This is defined by the C++ standard. In layman's terms, this makes the C++ class similar to a plain C struct, with a few allowances: for example, the C++ class can still use inheritance and a few other things. Most notably, virtual methods are forbidden. I don't care about this limitation because I never use virtual methods myself and this is my least favorite feature in any programming language.
  2. Create a Rust struct with the exact same layout as the C++ class.
  3. Create a Rust function with a C calling convention, whose first argument is this Rust struct. You can now access every C++ member of the class!

Note: Depending on the C++ codebase you find yourself in, the first point could be either trivial or not feasible at all. It depends on the amount of virtual methods used, etc.

In my case, there were a handful of virtual methods, which could all be advantageously made non-virtual, so I first did this.

This is all very abstract? Let's proceed with an example!

Example

Here is our fancy C++ class, User. It stores a name, a uuid, and a comment count. A user can write a comment, which is just a string that we print.

// Path: user.cpp

#include <cstdint>
#include <cstdio>
#include <cstdlib> // arc4random_buf (BSD, macOS, glibc >= 2.36)
#include <cstring>
#include <string>

class User {
  std::string name;
  uint64_t comments_count;
  uint8_t uuid[16];

public:
  User(std::string name_) : name{name_}, comments_count{0} {
    arc4random_buf(uuid, sizeof(uuid));
  }

  void write_comment(const char *comment, size_t comment_len) {
    printf("%s (", name.c_str());
    for (size_t i = 0; i < sizeof(uuid); i += 1) {
      printf("%x", uuid[i]);
    }
    printf(") says: %.*s\n", (int)comment_len, comment);
    comments_count += 1;
  }

  uint64_t get_comment_count() { return comments_count; }
};

int main() {
  User alice{"alice"};
  const char msg[] = "hello, world!";
  alice.write_comment(msg, sizeof(msg) - 1);

  printf("Comment count: %lu\n", alice.get_comment_count());

  // This prints:
  // alice (fe61252cf5b88432a7e8c8674d58d615) says: hello, world!
  // Comment count: 1
}

So let's first ensure it is a standard layout class. We add this compile-time assertion in the constructor (could be placed anywhere, but the constructor is as good a place as any):

// Path: user.cpp

    // Needs `#include <type_traits>` at the top of the file.
    static_assert(std::is_standard_layout_v<User>);

And... it builds!

Now onto the second step: let's define the equivalent class on the Rust side.

We create a new Rust library project:

$ cargo new --lib user-rs-lib

And place our Rust struct in src/lib.rs.

We just need to be careful about alignment (padding between fields) and the order of the fields, so we mark the struct repr(C) to make the Rust compiler use the same layout as C does:

// Path: ./user-rs-lib/src/lib.rs

#[repr(C)]
pub struct UserC {
    pub name: [u8; 32],
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

Note that the fields can be named differently from the C++ fields if you so choose.

Also note that std::string is represented here by an opaque array of 32 bytes. That's because on my machine, with the standard library I have, sizeof(std::string) is 32. That is not guaranteed by the standard, so this makes it very much not portable. We'll go over some options to work around this at the end. I wanted to include a standard library type to show that it does not prevent the class from being a 'standard layout class', but that it also creates challenges.
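To make the padding point concrete, here is a minimal sketch with a made-up struct, Padded (it is not part of the project). It assumes Rust 1.77 or later for std::mem::offset_of!. With repr(C), the compiler keeps the declared field order and inserts padding so that each field is properly aligned, which is what a C compiler would do for the equivalent struct:

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical struct, for illustration only.
#[repr(C)]
pub struct Padded {
    pub flag: u8,   // offset 0
    pub count: u64, // placed at the next multiple of align_of::<u64>()
}

/// Returns (offset of `count`, total size) as the compiler laid them out.
pub fn padded_layout() -> (usize, usize) {
    (offset_of!(Padded, count), size_of::<Padded>())
}

/// What the C layout rules predict for `Padded`.
pub fn expected_padded_layout() -> (usize, usize) {
    let align = align_of::<u64>();
    // `count` starts at the first aligned offset after the 1-byte `flag`,
    // and the struct ends right after it (already a multiple of `align`).
    (align, align + size_of::<u64>())
}
```

On x86-64, where u64 is 8-byte aligned, both functions return (8, 16): seven bytes of padding sit between flag and count. Without repr(C), Rust is free to reorder the fields, and the offsets would no longer be something C++ can rely on.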

For now, let's forget about this hurdle.

We can also write a stub for the Rust function equivalent to the C++ method:

// Path: ./user-rs-lib/src/lib.rs

#[no_mangle]
pub extern "C" fn RUST_write_comment(user: &mut UserC, comment: *const u8, comment_len: usize) {
    todo!()
}

Now, let's use the tool cbindgen to generate the C header corresponding to this Rust code:

$ cargo install cbindgen
$ cbindgen -v src/lib.rs --lang=c++ -o ../user-rs-lib.h

And we get this C header:

// Path: user-rs-lib.h

#include <cstdarg>
#include <cstdint>
#include <cstdlib>
#include <ostream>
#include <new>

struct UserC {
  uint8_t name[32];
  uint64_t comments_count;
  uint8_t uuid[16];
};

extern "C" {

void RUST_write_comment(UserC *user, const uint8_t *comment, uintptr_t comment_len);

} // extern "C"

Now, let's go back to C++, include this C header, and add lots of compile-time assertions to ensure that the layouts are indeed the same. Again, I place these asserts in the constructor:

#include "user-rs-lib.h"
#include <cstddef>     // offsetof
#include <type_traits> // std::is_standard_layout_v

class User {
 // [..]

  User(std::string name_) : name{name_}, comments_count{0} {
    arc4random_buf(uuid, sizeof(uuid));

    static_assert(std::is_standard_layout_v<User>);
    static_assert(sizeof(std::string) == 32);
    static_assert(sizeof(User) == sizeof(UserC));
    static_assert(offsetof(User, name) == offsetof(UserC, name));
    static_assert(offsetof(User, comments_count) ==
                  offsetof(UserC, comments_count));
    static_assert(offsetof(User, uuid) == offsetof(UserC, uuid));
  }

  // [..]
};

With that, we are certain that the in-memory layouts of the C++ class and the Rust struct are the same. We could probably generate all of these asserts, with a macro or with a code generator, but for this article, it's fine to write them manually.
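The same kind of check can be mirrored on the Rust side. Here is a rough sketch of what a macro for it could look like. The assert_layout! macro is my own invention for illustration, the numbers assume the 32-byte std::string from my machine, and std::mem::offset_of! needs Rust 1.77 or later. It only pins down the Rust layout, so it complements the C++ static_asserts rather than replacing them:

```rust
use std::mem::{offset_of, size_of};

#[repr(C)]
pub struct UserC {
    pub name: [u8; 32], // assumes sizeof(std::string) == 32, as in the article
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

// Hypothetical macro: checks the total size and each field offset at
// compile time. A mismatch is a build error, not a runtime failure.
macro_rules! assert_layout {
    ($ty:ty, size = $size:expr, $($field:ident @ $offset:expr),+ $(,)?) => {
        const _: () = {
            assert!(size_of::<$ty>() == $size);
            $(assert!(offset_of!($ty, $field) == $offset);)+
        };
    };
}

assert_layout!(UserC, size = 56, name @ 0, comments_count @ 32, uuid @ 40);
```

If someone later adds a field to UserC without updating the expected numbers, the crate stops compiling, which is exactly the kind of tripwire we want during an incremental rewrite.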

So let's rewrite the C++ method in Rust. We will for now leave out the name field since it is a bit problematic. Later we will see how we can still use it from Rust:

// Path: ./user-rs-lib/src/lib.rs

#[no_mangle]
pub extern "C" fn RUST_write_comment(user: &mut UserC, comment: *const u8, comment_len: usize) {
    let comment = unsafe { std::slice::from_raw_parts(comment, comment_len) };
    let comment_str = unsafe { std::str::from_utf8_unchecked(comment) };
    println!("({:x?}) says: {}", user.uuid.as_slice(), comment_str);

    user.comments_count += 1;
}
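A word of caution on from_utf8_unchecked: it is undefined behavior if the bytes coming from C++ are not valid UTF-8. If that cannot be guaranteed by the callers, a more defensive conversion is possible. Here is a minimal sketch; the helper name bytes_to_text is hypothetical and not part of the code above:

```rust
use std::borrow::Cow;

/// Hypothetical helper: turns a (pointer, length) pair coming from C++
/// into text without assuming the bytes are valid UTF-8.
///
/// # Safety
/// If `ptr` is non-null, it must point to `len` readable bytes that stay
/// valid and unmodified for the lifetime `'a`.
pub unsafe fn bytes_to_text<'a>(ptr: *const u8, len: usize) -> Cow<'a, str> {
    if ptr.is_null() || len == 0 {
        return Cow::Borrowed("");
    }
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Invalid sequences are replaced by U+FFFD instead of triggering UB.
    // Valid input is borrowed as-is, without copying.
    String::from_utf8_lossy(bytes)
}
```

The trade-off is a validation pass over the bytes on every call, which is usually negligible next to the cost of printing.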

We want to build a static library so we instruct cargo to do so by sticking these lines in Cargo.toml:

[lib]
crate-type = ["staticlib"]

We now build:

$ cargo build
# This is our artifact:
$ ls target/debug/libuser_rs_lib.a

We can use our Rust function from C++ in main, with some cumbersome casts:

// Path: user.cpp

int main() {
  User alice{"alice"};
  const char msg[] = "hello, world!";
  alice.write_comment(msg, sizeof(msg) - 1);

  printf("Comment count: %lu\n", alice.get_comment_count());

  RUST_write_comment(reinterpret_cast<UserC *>(&alice),
                     reinterpret_cast<const uint8_t *>(msg), sizeof(msg) - 1);
  printf("Comment count: %lu\n", alice.get_comment_count());
}

And link (manually) our brand new Rust library to our C++ program:

$ clang++ user.cpp ./user-rs-lib/target/debug/libuser_rs_lib.a
$ ./a.out
alice (336ff4cec0a2ccbfc0c4e4cb9ba7c152) says: hello, world!
Comment count: 1
([33, 6f, f4, ce, c0, a2, cc, bf, c0, c4, e4, cb, 9b, a7, c1, 52]) says: hello, world!
Comment count: 2

The output is slightly different for the uuid, because the Rust implementation prints the slice with the Debug trait (in hex, via {:x?}), but the content is the same.
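If the output needs to be byte-for-byte identical to the C++ one, the uuid can be formatted by hand. A small sketch, with a hypothetical helper uuid_to_hex that mimics the C++ loop (each byte printed with the equivalent of %x, so without zero padding):

```rust
/// Formats bytes the way the C++ loop does: each byte as lowercase hex
/// without zero padding (like `printf("%x", ...)`), with no separators.
pub fn uuid_to_hex(uuid: &[u8]) -> String {
    uuid.iter().map(|byte| format!("{byte:x}")).collect()
}
```

Using {byte:02x} instead would zero-pad each byte to two digits, which is arguably what a uuid printer should do, but then the C++ side would need %02x as well to stay in sync.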

A couple of thoughts:

Accessing std::string from Rust

std::string should be an opaque type from the perspective of Rust, because it is not the same across platforms or even compiler versions, so we cannot exactly describe its layout.

But we only want to access the underlying bytes of the string. We thus need a helper on the C++ side, that will extract these bytes for us.

First, the Rust side. We define a helper type ByteSliceView which is a pointer and a length (the equivalent of std::string_view in C++17 and later, and of &[u8] in Rust), and our Rust function now takes an additional parameter, the name:

#[repr(C)]
// Akin to `&[u8]`, for C.
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}


#[no_mangle]
pub extern "C" fn RUST_write_comment(
    user: &mut UserC,
    comment: *const u8,
    comment_len: usize,
    name: ByteSliceView, // <-- Additional parameter
) {
    let comment = unsafe { std::slice::from_raw_parts(comment, comment_len) };
    let comment_str = unsafe { std::str::from_utf8_unchecked(comment) };

    let name_slice = unsafe { std::slice::from_raw_parts(name.ptr, name.len) };
    let name_str = unsafe { std::str::from_utf8_unchecked(name_slice) };

    println!(
        "{} ({:x?}) says: {}",
        name_str,
        user.uuid.as_slice(),
        comment_str
    );

    user.comments_count += 1;
}
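Since the pointer-and-length dance tends to be repeated in every function, it can be tucked away in methods on the view type. This is only a sketch: from_slice and as_slice are names I made up, and the Copy derive is an addition that is not in the version above:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}

impl ByteSliceView {
    /// Builds a view over a Rust slice (handy for tests on the Rust side).
    pub fn from_slice(bytes: &[u8]) -> Self {
        ByteSliceView { ptr: bytes.as_ptr(), len: bytes.len() }
    }

    /// Reinterprets the view as a Rust slice.
    ///
    /// # Safety
    /// `ptr` must be null or point to `len` readable bytes that outlive `'a`.
    pub unsafe fn as_slice<'a>(&self) -> &'a [u8] {
        if self.ptr.is_null() {
            // `from_raw_parts` requires a non-null pointer, even for len == 0.
            &[]
        } else {
            unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
        }
    }
}
```

With that, the body of the exported function only needs one unsafe call per view, and the null case is handled in a single place.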

We re-run cbindgen, and now C++ has access to the ByteSliceView type. We thus write a helper to convert a std::string to this type, and pass the additional parameter to the Rust function (we also define a trivial get_name() getter for User since name is still private):

// Path: user.cpp

ByteSliceView get_std_string_pointer_and_length(const std::string &str) {
  return {
      .ptr = reinterpret_cast<const uint8_t *>(str.data()),
      .len = str.size(),
  };
}

// In main:
int main() {
    // [..]
  RUST_write_comment(reinterpret_cast<UserC *>(&alice),
                     reinterpret_cast<const uint8_t *>(msg), sizeof(msg) - 1,
                     get_std_string_pointer_and_length(alice.get_name()));
}

We re-build, re-run, and lo and behold, the Rust implementation now prints the name:

alice (69b7c41491ccfbd28c269ea4091652d) says: hello, world!
Comment count: 1
alice ([69, b7, c4, 14, 9, 1c, cf, bd, 28, c2, 69, ea, 40, 91, 65, 2d]) says: hello, world!
Comment count: 2

Alternatively, if we cannot or do not want to change the Rust signature, we can make the C++ helper get_std_string_pointer_and_length use the C calling convention and take a void pointer, so that Rust calls the helper itself, at the cost of numerous casts in and out of void*.
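Here is a rough sketch of what the Rust side of that alternative might look like. The helper name get_name_view is hypothetical. In the real project it would only be declared in Rust and implemented in C++; here it is defined in Rust as a stand-in, so that the sketch builds on its own:

```rust
use std::ffi::c_void;

#[repr(C)]
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}

// In the real project this would be declared, not defined, on the Rust side:
//
//     extern "C" {
//         fn get_name_view(user: *const c_void) -> ByteSliceView;
//     }
//
// and implemented in C++. The definition below is only a stand-in: it
// treats the opaque pointer as a Rust `String` playing the role of the
// C++ object.
pub unsafe extern "C" fn get_name_view(user: *const c_void) -> ByteSliceView {
    let name = unsafe { &*(user as *const String) };
    ByteSliceView { ptr: name.as_ptr(), len: name.len() }
}

/// Reads the name through the helper, so the Rust function exposed to C++
/// does not need an extra parameter.
///
/// # Safety
/// `user` must be a pointer the helper knows how to interpret.
pub unsafe fn name_of(user: *const c_void) -> String {
    let view = unsafe { get_name_view(user) };
    let bytes = unsafe { std::slice::from_raw_parts(view.ptr, view.len) };
    String::from_utf8_lossy(bytes).into_owned()
}
```

The void pointer keeps the Rust side ignorant of the std::string layout, at the price of losing all type checking across the boundary: passing the wrong pointer compiles fine and fails at runtime.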

Improving the std::string situation

Conclusion

We now have successfully re-written a C++ class method. This technique is great because the C++ class could have hundreds of methods, in a real codebase, and we can still rewrite them one at a time, without breaking or touching the others.

The big caveat is that the more C++-specific features and standard types the class is using, the more difficult this technique is to apply, necessitating helpers to make conversions from one type to another, and/or numerous tedious casts. If the C++ class is basically a C struct only using C types, it will be very easy.

Still, I have employed this technique at work a lot and I really enjoy its relative simplicity and incremental nature.

In theory, it can also be automated, say with tree-sitter or libclang to operate on the C++ AST:

  1. Add a compile-time assert in the C++ class constructor to ensure it is a 'standard layout class' e.g. static_assert(std::is_standard_layout_v<User>);. If this fails, skip this class, it requires manual intervention.
  2. Generate the equivalent Rust struct e.g. the struct UserC.
  3. For each field of the C++ class/Rust struct, add a compile-time assert to make sure the layout is the same e.g. static_assert(sizeof(User) == sizeof(UserC)); static_assert(offsetof(User, name) == offsetof(UserC, name));. If this fails, bail.
  4. For each C++ method, generate an (empty) equivalent Rust function. E.g. RUST_write_comment.
  5. A developer implements the Rust function. Or AI. Or something.
  6. For each call site in C++, replace the C++ method call with a call to the Rust function. E.g. alice.write_comment(..); becomes RUST_write_comment(alice, ..);.
  7. Delete the C++ methods that have been rewritten.

And boom, project rewritten.
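To give an idea of how small the individual pieces are, steps 1 and 3 of the list above could be sketched as a plain text generator. The function generate_layout_asserts is hypothetical, and the class and field names are passed in by hand here, whereas a real tool would extract them from the C++ AST:

```rust
/// Hypothetical generator: emits the C++ `static_assert` lines comparing
/// a C++ class with its Rust-generated mirror struct.
pub fn generate_layout_asserts(cpp_class: &str, mirror: &str, fields: &[&str]) -> String {
    // Step 1: the class must be a standard layout class.
    let mut out = format!("static_assert(std::is_standard_layout_v<{cpp_class}>);\n");
    // Step 3: same total size, then same offset for every field.
    out.push_str(&format!(
        "static_assert(sizeof({cpp_class}) == sizeof({mirror}));\n"
    ));
    for field in fields {
        out.push_str(&format!(
            "static_assert(offsetof({cpp_class}, {field}) == offsetof({mirror}, {field}));\n"
        ));
    }
    out
}
```

Calling it with "User", "UserC" and the three field names yields the same asserts we wrote by hand in the constructor (minus the sizeof(std::string) one), ready to be pasted or injected into the class.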

Addendum: the full code

// Path: user.cpp

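#include "user-rs-lib.h"
#include <cstddef> // offsetof
#include <cstdint>
#include <cstdio>
#include <cstdlib> // arc4random_buf (BSD, macOS, glibc >= 2.36)
#include <cstring>
#include <string>
#include <type_traits> // std::is_standard_layout_v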

extern "C" ByteSliceView
get_std_string_pointer_and_length(const std::string &str) {
  return {
      .ptr = reinterpret_cast<const uint8_t *>(str.data()),
      .len = str.size(),
  };
}

class User {
  std::string name;
  uint64_t comments_count;
  uint8_t uuid[16];

public:
  User(std::string name_) : name{name_}, comments_count{0} {
    arc4random_buf(uuid, sizeof(uuid));

    static_assert(std::is_standard_layout_v<User>);
    static_assert(sizeof(std::string) == 32);
    static_assert(sizeof(User) == sizeof(UserC));
    static_assert(offsetof(User, name) == offsetof(UserC, name));
    static_assert(offsetof(User, comments_count) ==
                  offsetof(UserC, comments_count));
    static_assert(offsetof(User, uuid) == offsetof(UserC, uuid));
  }

  void write_comment(const char *comment, size_t comment_len) {
    printf("%s (", name.c_str());
    for (size_t i = 0; i < sizeof(uuid); i += 1) {
      printf("%x", uuid[i]);
    }
    printf(") says: %.*s\n", (int)comment_len, comment);
    comments_count += 1;
  }

  uint64_t get_comment_count() { return comments_count; }

  const std::string &get_name() { return name; }
};

int main() {
  User alice{"alice"};
  const char msg[] = "hello, world!";
  alice.write_comment(msg, sizeof(msg) - 1);

  printf("Comment count: %lu\n", alice.get_comment_count());

  RUST_write_comment(reinterpret_cast<UserC *>(&alice),
                     reinterpret_cast<const uint8_t *>(msg), sizeof(msg) - 1,
                     get_std_string_pointer_and_length(alice.get_name()));
  printf("Comment count: %lu\n", alice.get_comment_count());
}

// Path: user-rs-lib.h

#include <cstdarg>
#include <cstdint>
#include <cstdlib>
#include <ostream>
#include <new>

struct UserC {
  uint8_t name[32];
  uint64_t comments_count;
  uint8_t uuid[16];
};

struct ByteSliceView {
  const uint8_t *ptr;
  uintptr_t len;
};

extern "C" {

void RUST_write_comment(UserC *user,
                        const uint8_t *comment,
                        uintptr_t comment_len,
                        ByteSliceView name);

} // extern "C"

// Path: user-rs-lib/src/lib.rs

#[repr(C)]
pub struct UserC {
    pub name: [u8; 32],
    pub comments_count: u64,
    pub uuid: [u8; 16],
}

#[repr(C)]
// Akin to `&[u8]`, for C.
pub struct ByteSliceView {
    pub ptr: *const u8,
    pub len: usize,
}

#[no_mangle]
pub extern "C" fn RUST_write_comment(
    user: &mut UserC,
    comment: *const u8,
    comment_len: usize,
    name: ByteSliceView,
) {
    let comment = unsafe { std::slice::from_raw_parts(comment, comment_len) };
    let comment_str = unsafe { std::str::from_utf8_unchecked(comment) };

    let name_slice = unsafe { std::slice::from_raw_parts(name.ptr, name.len) };
    let name_str = unsafe { std::str::from_utf8_unchecked(name_slice) };

    println!(
        "{} ({:x?}) says: {}",
        name_str,
        user.uuid.as_slice(),
        comment_str
    );

    user.comments_count += 1;
}

This blog is open-source! If you find a problem, please open a GitHub issue. The content of this blog as well as the code snippets are under the BSD-3 License which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.