Trying to create self-referential structs in Rust

July 23, 2022 • 6 minute read • programming, rust

Suppose in Rust you wanted to both read and write simultaneously from the same TcpStream. This is possible because, while io::Read and io::Write are implemented for TcpStream, they are also implemented for &TcpStream. This means we can do something like this:

use std::io::{Read, Write};
use std::net::TcpStream;
use std::sync::Arc;
use std::thread;

fn do_the_read<R: Read>(_r: R) {
    unimplemented!()
}

fn do_the_write<W: Write>(_w: W) {
    unimplemented!()
}

fn read_and_write(stream: TcpStream) {
    let arc1 = Arc::new(stream);
    let arc2 = arc1.clone();
    thread::spawn(move || do_the_read(&*arc1));
    thread::spawn(move || do_the_write(&*arc2));
}

If io::Read were implemented only for TcpStream and not for &TcpStream, we would need an Arc<Mutex<TcpStream>> to give each thread access to a &mut TcpStream. But concurrent reads and writes on a socket are already synchronized for us (by the operating system, not by a lock inside TcpStream), so the standard library can implement Read and Write for a plain shared reference, saving us a Mutex.
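For comparison, here is a minimal sketch of the Mutex route. It is generic over any `S: Read + Write` so it can be tried without a socket. The function name `round_trip` is made up for illustration, and the usage below assumes `VecDeque<u8>` as an in-memory stand-in (it implements `Read` and `Write` as of Rust 1.63).

```rust
use std::io::{self, Read, Write};
use std::sync::{Arc, Mutex};
use std::thread;

/// Writes `msg` from a second thread, then reads it back on this one.
/// Both sides must take the lock to obtain the `&mut S` that
/// `Read::read` and `Write::write` require.
fn round_trip<S>(stream: S, msg: Vec<u8>) -> io::Result<Vec<u8>>
where
    S: Read + Write + Send + 'static,
{
    let shared = Arc::new(Mutex::new(stream));
    let for_writer = Arc::clone(&shared);

    let writer = thread::spawn(move || -> io::Result<()> {
        // The guard dereferences to `&mut S`.
        let mut guard = for_writer.lock().unwrap();
        guard.write_all(&msg)
    });
    writer.join().expect("writer thread panicked")?;

    // The reader has to go through the same lock.
    let mut chunk = [0u8; 64];
    let mut guard = shared.lock().unwrap();
    let n = guard.read(&mut chunk)?;
    Ok(chunk[..n].to_vec())
}
```

Calling `round_trip(VecDeque::<u8>::new(), b"ping".to_vec())` hands back the bytes that were written. With a real blocking socket, though, a reader that holds the lock while waiting for data would stall every writer, which is exactly what the `&TcpStream` impls let us avoid.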

Cool pattern, right? Let’s try using it.

A simple server

Let’s write a simple server. It could be a web server, or a websocket server, or a game server; it doesn’t matter. We’ll have one thread per connection. We start with our User struct, which holds logic for one connection.

use std::net::{TcpStream, TcpListener};
use std::sync::Arc;
use std::io;
use std::thread;

struct User {
    stream: TcpStream
}

impl User {
    fn run(&self) {
        todo!() // read and write from the stream
    }
}

The main function accepts new clients in a loop, creates a User for each one, adds it to a list of connected users, and then runs User::run in a separate thread.

fn main() -> io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:80")?;
    let mut users: Vec<Arc<User>> = Vec::new();

    for stream in listener.incoming() {
        let stream = stream?;
        let user = Arc::new(User {
            stream
        });

        users.push(user.clone());

        thread::spawn(move || user.run());
    }

    Ok(())
}

But reading or writing a lot of small chunks of data from a TcpStream is slow, so we can use buffered I/O instead:

struct User {
    stream: TcpStream,
+   reader: BufReader<TcpStream>,
+   writer: BufWriter<TcpStream>,
}

...

let user = Arc::new(User {
    stream,
+   reader: BufReader::new(stream),
+   writer: BufWriter::new(stream),
});

And here we see our problem:

error[E0382]: use of moved value: `stream`
  --> <source>:26:36
   |
22 |         let stream = stream?;
   |             ------ move occurs because `stream` has type `TcpStream`, which does not implement the `Copy` trait
...
25 |             reader: BufReader::new(stream),
   |                                    ------ value moved here
26 |             writer: BufWriter::new(stream),
   |                                    ^^^^^^ value used here after move

Of course: BufReader and BufWriter each take ownership of the reader or writer they wrap, and since TcpStream isn’t Copy, we can’t give it to both. However, io::Read and io::Write are implemented for &TcpStream, so we can do this instead:

struct User<'a> {
    stream: TcpStream,
    reader: BufReader<&'a TcpStream>,
    writer: BufWriter<&'a TcpStream>,
}

// ...

let user = Arc::new(User {
    stream,
    reader: BufReader::new(&stream), // "borrowed value does not live long enough"
    writer: BufWriter::new(&stream), // "borrowed value does not live long enough"
});

And now we run into a lot of problems. First of all, User should not be parameterized by a lifetime: it doesn’t borrow any data from outside itself, so giving it a lifetime doesn’t make sense. What we really want is to tell the compiler that the references inside reader and writer point to the User’s own stream, a.k.a. a self-referential struct¹. Let’s go through some ways to solve this problem:

Using `Arc<T>`

The simplest way to solve this with purely safe Rust is to use an Arc:

struct User {
    stream: Arc<TcpStream>,
    reader: BufReader<Arc<TcpStream>>,
    writer: BufWriter<Arc<TcpStream>>,
}

// ...

let stream = Arc::new(stream);
let user = Arc::new(User {
    stream: stream.clone(),
    reader: BufReader::new(stream.clone()),
    writer: BufWriter::new(stream.clone()),
});

Of course, this gives us another error, since Arc<TcpStream> doesn’t implement Read or Write; only &TcpStream does.

error[E0277]: the trait bound `Arc<TcpStream>: std::io::Write` is not satisfied
  --> src/main.rs:8:13
   |
8  |     writer: BufWriter<Arc<TcpStream>>,
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `std::io::Write` is not implemented for `Arc<TcpStream>`
   |

Luckily, this is a problem people have run into before, and io-arc is a crate that fixes exactly this: it’s an Arc<T> that implements Read and Write whenever the corresponding &T implements them. Its implementation is as simple as you’d expect:

#[derive(Debug)]
pub struct IoArc<T>(Arc<T>);

impl<T> IoArc<T> {
    /// Create a new instance of IoArc.
    pub fn new(data: T) -> Self {
        Self(Arc::new(data))
    }
}

impl<T> Read for IoArc<T>
where
    for<'a> &'a T: Read,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        (&mut &*self.0).read(buf)
    }
}

impl<T> Write for IoArc<T>
where
    for<'a> &'a T: Write,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        (&mut &*self.0).write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        (&mut &*self.0).flush()
    }
}
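To see the wrapper doing its job, here is a sketch that plugs it into the User struct. It uses a local copy of IoArc rather than the crate, and adds a `Clone` impl because each buffer needs its own handle. `SharedPipe` and `echo_line` are hypothetical names: the pipe is an in-memory stand-in for TcpStream so the example runs without a network, and it relies on `VecDeque<u8>` implementing `Read` and `Write` (Rust 1.63 or later).

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};
use std::sync::{Arc, Mutex};

// Local copy of the wrapper quoted above; `Clone` is added for this sketch.
struct IoArc<T>(Arc<T>);

impl<T> Clone for IoArc<T> {
    fn clone(&self) -> Self {
        IoArc(Arc::clone(&self.0))
    }
}

impl<T> Read for IoArc<T>
where
    for<'a> &'a T: Read,
{
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        (&mut &*self.0).read(buf)
    }
}

impl<T> Write for IoArc<T>
where
    for<'a> &'a T: Write,
{
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        (&mut &*self.0).write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        (&mut &*self.0).flush()
    }
}

// Hypothetical stand-in for `TcpStream`: all access goes through a lock,
// so `Read` and `Write` can be implemented for `&SharedPipe`.
struct SharedPipe(Mutex<VecDeque<u8>>);

impl Read for &SharedPipe {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.0.lock().unwrap().read(buf)
    }
}

impl Write for &SharedPipe {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.lock().unwrap().write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

struct User {
    #[allow(dead_code)]
    stream: IoArc<SharedPipe>,
    reader: BufReader<IoArc<SharedPipe>>,
    writer: BufWriter<IoArc<SharedPipe>>,
}

/// Sends `msg` through the buffered writer and reads one line back
/// through the buffered reader; both share the same pipe.
fn echo_line(msg: &str) -> io::Result<String> {
    let stream = IoArc(Arc::new(SharedPipe(Mutex::new(VecDeque::new()))));
    let mut user = User {
        reader: BufReader::new(stream.clone()),
        writer: BufWriter::new(stream.clone()),
        stream,
    };

    user.writer.write_all(msg.as_bytes())?;
    user.writer.flush()?; // push the buffered bytes into the pipe
    let mut line = String::new();
    user.reader.read_line(&mut line)?;
    Ok(line)
}
```

No lifetime parameter on User, no unsafe, and the struct owns everything it needs.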

Disadvantages

In practice this works. But in principle it’s duct tape over a larger problem. For one, Read and Write are just two of the traits that can be implemented on &T; what if you have more? For example, there’s AsyncRead from futures_io. The Rust team is aware of this issue because Read is implemented for Arc<File>, but not for anything else. And what if we want an Rc<File>? And what if our type doesn’t do its own locking? Can we wrap an Arc<Mutex<T: Read>> in a buffer? Not really.

The problem comes from the fact that references are kind of special to the standard library. A shared reference is like the canonical way to share something, so if you have other plans, well that’s up to you to implement it.

Oh what’s that? No foreign traits for foreign types? Oh. That’s too bad.
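So "up to you" means a local newtype. Here is a rough sketch of one for the Mutex case; `Locked`, `guard` and `locked_echo` are names invented for this example, and `VecDeque<u8>` again plays the part of the stream (assuming Rust 1.63 or later for its `Read` and `Write` impls).

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead, BufReader, BufWriter, Read, Write};
use std::sync::{Arc, Mutex, MutexGuard};

/// Hypothetical local newtype. Because `Locked` is defined in our own
/// crate, we are allowed to implement `Read` and `Write` for it.
struct Locked<T>(Arc<Mutex<T>>);

impl<T> Locked<T> {
    fn new(inner: T) -> Self {
        Locked(Arc::new(Mutex::new(inner)))
    }

    /// Takes the lock, turning a poisoned mutex into an `io::Error`.
    fn guard(&self) -> io::Result<MutexGuard<'_, T>> {
        self.0
            .lock()
            .map_err(|_| io::Error::new(io::ErrorKind::Other, "lock poisoned"))
    }
}

impl<T> Clone for Locked<T> {
    fn clone(&self) -> Self {
        Locked(Arc::clone(&self.0))
    }
}

impl<T: Read> Read for Locked<T> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // The lock is held for the whole duration of the read.
        self.guard()?.read(buf)
    }
}

impl<T: Write> Write for Locked<T> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.guard()?.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.guard()?.flush()
    }
}

/// Writes a line through one buffered handle and reads it back
/// through another handle to the same locked value.
fn locked_echo(msg: &str) -> io::Result<String> {
    let shared = Locked::new(VecDeque::<u8>::new());
    let mut writer = BufWriter::new(shared.clone());
    let mut reader = BufReader::new(shared);

    writer.write_all(msg.as_bytes())?;
    writer.flush()?;
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line)
}
```

It compiles and it buffers, but since the lock is held for the whole of every read, a blocking read on a real socket would shut out the writer until data arrives. That is the "not really" from above.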

Using static references, unsafely

Ok, so if &TcpStream is Read, and nothing else is, why don’t we brute-force our way to using it? Let’s go back to our bad lifetime version.

struct User<'a> {
    stream: TcpStream,
    reader: BufReader<&'a TcpStream>,
    writer: BufWriter<&'a TcpStream>,
}

let user = Arc::new(User {
    stream,
    reader: BufReader::new(&stream),
    writer: BufWriter::new(&stream),
});

We can make this work! Let’s think about this. What’s 'a? Well we know:

User<'a> should really just be User. Lifetime parameters are for when your struct points to data it doesn’t own, but in this case the TcpStream is managed entirely internally.
For an &'a TcpStream, 'a has to last for the entire lifetime of the struct.

Hey, that’s just 'static!

'static doesn’t need to be a parameter on the struct.
From BufReader’s point of view, &'static TcpStream lives for the entire lifetime of “the program”² because it’s valid until the User is dropped.

We’ll have to be careful about drop order now too: Rust drops struct fields in declaration order, so the stream has to be declared last in order to be dropped last.

struct User {
    reader: BufReader<&'static TcpStream>,
    writer: BufWriter<&'static TcpStream>,
    stream: TcpStream, // drop order matters now
}

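unsafe {
    // Caveat: `&stream` is taken from the local variable, which is then
    // moved into the struct (and into the Arc's allocation). The references
    // are only valid if the stream's address never changes afterwards.
    let user = Arc::new(User {
        // extend lifetime via transmute
        reader: BufReader::new(mem::transmute(&stream)),
        writer: BufWriter::new(mem::transmute(&stream)),
        stream,
    });
}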

This was the most elegant way I found to do this, though it veers toward unsoundness.
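The drop-order requirement is easy to check in isolation. The sketch below swaps the real fields for a hypothetical `Noisy` type that records its name when dropped; the `User` here only mirrors the field order of the struct above and has nothing to do with sockets.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical stand-in for a field; records its name when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Same field order as the `User` struct above.
#[allow(dead_code)]
struct User {
    reader: Noisy,
    writer: Noisy,
    stream: Noisy,
}

/// Builds a `User`, drops it, and returns the order the fields were dropped in.
fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let field = |name: &'static str| Noisy { name, log: Rc::clone(&log) };

    let user = User {
        // Initialization order does not matter; declaration order does.
        stream: field("stream"),
        writer: field("writer"),
        reader: field("reader"),
    };
    drop(user);

    // Bind first so the `Ref` guard is released before `log` is dropped.
    let order = log.borrow().clone();
    order
}
```

`drop_order()` returns `["reader", "writer", "stream"]`. This matters for the real struct because BufWriter tries to flush its buffer when it is dropped, so the stream it points at has to still be alive at that moment.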

¹ Ah! Did I bait-and-switch you? You thought this was going to be a blog post about Pin? Think again! There’s more to self-reference after all.
² Assuming BufReader doesn’t store the reference somewhere that outlives it.
