June 18, 2020

Rust Pin (English)¶

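I find the documentation for Rust's Pin difficult to read, so I will try to explain it here. (Strictly speaking, Pin is a wrapper struct, not a trait. The related traits are Deref, DerefMut, and Unpin.)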

Let me get straight to the bottom line. The conclusion is as follows.

Moving an object is usually safe, but not always. For example, it is not safe when the object holds a reference to itself.
Rust has no way to declare that a type must not be moved.
Instead, Rust has 2 construction patterns that rule out the move syntax. In both, the programmer accesses the object through a pointer-like variable.
Some functions have the same effect as a move even though they don't use the move syntax, e.g. std::mem::swap().
Wrapping the pointer in Pin lets the compiler detect bugs caused by such functions as well.

Self reference example¶

First, let me explain why a self-referential type should not be moved.

struct SelfRef {
    x: i32,
    ptr_x: *const i32,  // reference to x
}

impl SelfRef {
    /// Constructor, but has a problem.
    pub fn new(x: i32) -> Self {
        let mut this = Self {
            x,
            ptr_x: std::ptr::null(),
         };
         this.ptr_x = &this.x;

         // This check is OK.
         this.check();

         // Moving.
         // It causes the problem.
         this
    }

    /// Make sure that ptr_x points to x.
    pub fn check(&self) {
        assert!(&self.x as *const i32 == self.ptr_x);
    }
}

fn main() {
    let obj = SelfRef::new(5);

    // Can fail the assertion (typically in a debug build;
    // an optimized build may construct the value in place),
    // because obj.ptr_x still points to where
    // obj.x was before the move.
    obj.check();
}

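SelfRef::ptr_x should point to x, i.e. SelfRef holds a reference to itself. This is what causes the problem.

The last obj.check() can fail the assertion because obj was moved after it was constructed, while obj.ptr_x still points to where obj.x was before the move. Whether the failure actually shows up depends on whether the compiler copies the return value or builds it in place.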
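The stale pointer can be made visible without depending on how the compiler handles the return value. The following is a minimal sketch, not part of the original example: the helper name `address_changes_on_move` is made up for illustration, and it moves the object from the stack into a `Box` so that the address is certain to change.

```rust
struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
}

/// Hypothetical helper: returns true if moving the object changed the
/// address of `x` while `ptr_x` kept the old address.
fn address_changes_on_move() -> bool {
    let mut on_stack = SelfRef {
        x: 5,
        ptr_x: std::ptr::null(),
    };
    on_stack.ptr_x = &on_stack.x;
    let old_addr = on_stack.ptr_x;

    // Moving the value into a Box copies its bytes to the heap.
    let on_heap = Box::new(on_stack);
    let new_addr: *const i32 = &on_heap.x;

    // ptr_x was copied bit for bit, so it still holds the old stack address.
    on_heap.ptr_x == old_addr && new_addr != old_addr
}
```

Calling `address_changes_on_move()` from main() and asserting on the result should show that the pointer was left behind by the move.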

How to prevent moving?¶

SelfRef must not be moved. We have to accept that fact.

Then, how do we avoid the bug?

There are two ways.

Build the object on the heap and never move it.
Build the object on the stack and let the compiler detect the bug.

Building the object on the heap¶

The first way is easy. Change the constructor and everything will be fine.

... Same as the previous example ...

impl SelfRef {
    /// Constructor, but return Box<Self> instead of Self.
    pub fn new(x: i32) -> Box<Self> {
        let mut this = Box::new(SelfRef {
            x,
            ptr_x: std::ptr::null(),
        });
        this.ptr_x = &this.x;

        this.check();

        // Moving only the reference to the object.
        // The object will stay.
        this
    }

 ... Same as the previous example ...

The constructor allocates heap memory, builds the object there, and returns the pointer to the object. Only the pointer is moved. The object itself stays where it is.

This works well, but it costs a heap allocation for every object, and allocating memory is not cheap.
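To illustrate that only the pointer moves, here is a small sketch based on the constructor above. The names `is_consistent` and `survives_box_moves` are assumptions added for this illustration. `is_consistent` returns a bool instead of asserting, so the result can be inspected.

```rust
struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
}

impl SelfRef {
    /// Builds the object on the heap, then sets the self pointer.
    fn new(x: i32) -> Box<Self> {
        let mut this = Box::new(SelfRef {
            x,
            ptr_x: std::ptr::null(),
        });
        this.ptr_x = &this.x;
        this
    }

    /// Returns true if ptr_x points to x.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(&self.x, self.ptr_x)
    }
}

/// Hypothetical helper: moves the Box around and checks the object afterwards.
fn survives_box_moves() -> bool {
    let obj = SelfRef::new(5);

    // Moving the Box moves only the pointer, not the heap object.
    let moved = obj;

    // Pushing into a Vec moves the Box again.
    let mut list = Vec::new();
    list.push(moved);

    list[0].is_consistent()
}
```

The heap object never changes its address here, so the check is expected to succeed however many times the Box itself is moved.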

Enable the compiler to detect the bug¶

The second way is to shadow the variable as soon as the object has been built on the stack.

Because the original variable is hidden, we cannot write code that moves the object.

The following example causes a compile error.

... Same as the previous example ...

impl SelfRef {
    /// Compile Error!
    pub fn new(x: i32) -> Self {
        let mut this = Self {
            x,
            ptr_x: std::ptr::null(),
         };

         // Shadow the variable "this". The constructed
         // object is still on the stack.
         // The new "this" is a reference to the object.
         // We can access the object through it.
         let this = &mut this;

         this.ptr_x = &this.x;

         // This check has no problem.
         this.check();

         // The function should return SelfRef,
         // but "this" has type &mut SelfRef.
         // The compiler finds the bug.
         this
    }

 ... Same as the previous example ...

The function should return SelfRef, but we no longer have a variable of that type ("this" is a reference, not SelfRef itself).

Of course, it is possible to return SelfRef if it implements the Copy trait and the constructor returns *this instead of this. However, that is not natural. Generally speaking, a type that should not be moved should not be copied either.
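For comparison, a version of this pattern that does compile has to do the shadowing in the caller, because the object must already be at its final address when the pointer is set. The sketch below assumes a two-step construction. The names `new_unlinked`, `link`, `is_consistent`, and `build_on_stack` are hypothetical and do not appear in the original example.

```rust
struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
}

impl SelfRef {
    /// Hypothetical first step: ptr_x is not set yet, so moving is still harmless.
    fn new_unlinked(x: i32) -> Self {
        Self {
            x,
            ptr_x: std::ptr::null(),
        }
    }

    /// Hypothetical second step: call this once the object is at its final address.
    fn link(&mut self) {
        self.ptr_x = &self.x;
    }

    /// Returns true if ptr_x points to x.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(&self.x, self.ptr_x)
    }
}

/// Hypothetical caller that builds the object on the stack.
fn build_on_stack() -> bool {
    let mut obj = SelfRef::new_unlinked(5);

    // Shadow the owner with a reference. From here on the original
    // variable cannot be named, so the move syntax cannot be applied to it.
    let obj = &mut obj;

    obj.link();
    obj.is_consistent()
}
```

The object cannot be returned from `build_on_stack`, which is exactly the restriction the compile error above enforces.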

Pin¶

I explained 2 ways, but they only deal with the move syntax. std::mem::swap(), for example, has the same effect as a move, so it can still cause the problem.

The following example is the same as the heap version above (the constructor returns Box<SelfRef>), except that main() calls std::mem::swap().

struct SelfRef {
    x: i32,
    ptr_x: *const i32,  // reference to x
}

impl SelfRef {
    /// Constructor
    pub fn new(x: i32) -> Box<Self> {
        let mut this = Box::new(Self {
            x,
            ptr_x: std::ptr::null(),
         });
         this.ptr_x = &this.x;

         this.check();

         this
    }

    /// Make sure that ptr_x points to x.
    pub fn check(&self) {
        assert!(&self.x as *const i32 == self.ptr_x);
    }
}

fn main() {
    // Building obj1, it is OK.
    let mut obj1 = SelfRef::new(1);
    obj1.check();

    // Building obj2, it is OK, too.
    let mut obj2 = SelfRef::new(2);
    obj2.check();

    // Swapping obj1 and obj2.
    // This has the same effect as moving obj1 and obj2.
    std::mem::swap(&mut *obj1, &mut *obj2);

    // This line will cause an assertion error.
    obj1.check();
}

The last obj1.check() fails because std::mem::swap() behaves like a move: it exchanges the contents of the two heap objects, so each ptr_x now points into the other object.
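What the swap leaves behind can be checked directly. This is a small sketch under the same definitions as above. The helper name `swap_crosses_pointers` is made up for illustration.

```rust
struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
}

impl SelfRef {
    fn new(x: i32) -> Box<Self> {
        let mut this = Box::new(SelfRef {
            x,
            ptr_x: std::ptr::null(),
        });
        this.ptr_x = &this.x;
        this
    }
}

/// Hypothetical helper: returns true if, after the swap, each object's
/// ptr_x points to the x of the other object.
fn swap_crosses_pointers() -> bool {
    let mut obj1 = SelfRef::new(1);
    let mut obj2 = SelfRef::new(2);

    // Exchanges the contents of the two heap objects.
    std::mem::swap(&mut *obj1, &mut *obj2);

    // The values were exchanged together with the (now wrong) pointers.
    obj1.x == 2
        && std::ptr::eq(obj1.ptr_x, &obj2.x)
        && std::ptr::eq(obj2.ptr_x, &obj1.x)
}
```

The two Boxes still own the same heap allocations as before. Only the bytes inside them were exchanged, which is why no move syntax is involved.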

Pin helps in such a case.

Let's change main() as follows.

... Same as the previous example ...

// This does not make any difference so far.
fn main() {
    // std::pin::Pin::new_unchecked() is an unsafe function,
    // so the following code needs an unsafe block.
    unsafe {
        // Building obj1 and obj2
        let obj1 = SelfRef::new(1);
        obj1.check();

        let obj2 = SelfRef::new(2);
        obj2.check();

        // Wrapping obj1 and obj2.
        let mut obj1 = std::pin::Pin::new_unchecked(obj1);
        let mut obj2 = std::pin::Pin::new_unchecked(obj2);

        // Swapping obj1 and obj2.
        std::mem::swap(&mut *obj1, &mut *obj2);

        // Assertion error.
        obj1.check();
    }
}

Unfortunately, the last obj1.check() still fails. That's because Pin restricts access only when the content does NOT implement the Unpin trait.

Does SelfRef implement Unpin?

Yes, it does!

The Rust compiler implements Unpin automatically when every field of a type is Unpin. This is natural, because types that must not be moved are rare. In fact, almost all primitive types, including raw pointers such as *const i32, are Unpin, and so are most structs.
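The automatic implementation can be confirmed at compile time. In the sketch below, `assert_unpin` and `swap_through_pin` are hypothetical helpers written for this illustration. Because SelfRef is Unpin, even the safe `Pin::new` accepts it, and the pinned pointer still hands out `&mut SelfRef`.

```rust
use std::pin::Pin;

struct SelfRef {
    x: i32,
    ptr_x: *const i32, // raw pointers are Unpin, so SelfRef is Unpin too
}

/// Hypothetical helper: compiles only if T implements Unpin.
fn assert_unpin<T: Unpin>() {}

/// Hypothetical helper: swaps two pinned objects and returns their x values.
fn swap_through_pin() -> (i32, i32) {
    // This line would not compile if SelfRef were not Unpin.
    assert_unpin::<SelfRef>();

    // Pin::new is safe, but only available when the target is Unpin.
    let mut a = Pin::new(Box::new(SelfRef { x: 1, ptr_x: std::ptr::null() }));
    let mut b = Pin::new(Box::new(SelfRef { x: 2, ptr_x: std::ptr::null() }));

    // Pin implements DerefMut for Unpin targets, so the swap is still allowed.
    std::mem::swap(&mut *a, &mut *b);

    (a.x, b.x)
}
```

In other words, for an Unpin type Pin is only a thin wrapper and gives no protection against the swap.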

We must opt out of the automatic implementation in this case.

std::marker::PhantomPinned is a marker type that does not implement Unpin. (It is zero-sized, so it adds nothing to the size of the struct.) A struct that contains it does not implement Unpin either.

Let's add the field. (main() is not changed.)

struct SelfRef {
    x: i32,
    ptr_x: *const i32,  // reference to x
    _phantom: std::marker::PhantomPinned,
}

impl SelfRef {
    pub fn new(x: i32) -> Box<Self> {
        let mut this = Box::new(Self {
            x,
            ptr_x: std::ptr::null(),
            _phantom: std::marker::PhantomPinned,
         });
         this.ptr_x = &this.x;

         this.check();

         this
    }

    /// Make sure that ptr_x points to x.
    pub fn check(&self) {
        assert!(&self.x as *const i32 == self.ptr_x);
    }
}

fn main() {
    unsafe {
        // Building obj1 and obj2
        let obj1 = SelfRef::new(1);
        obj1.check();

        let obj2 = SelfRef::new(2);
        obj2.check();

        // Wrapping obj1 and obj2.
        // This makes it impossible to take a mutable reference
        // to the object in safe code.
        let mut obj1 = std::pin::Pin::new_unchecked(obj1);
        let mut obj2 = std::pin::Pin::new_unchecked(obj2);

        // Compile error.
        // Because `&mut *obj1` is not allowed.
        std::mem::swap(&mut *obj1, &mut *obj2);

        obj1.check();
    }
}

Now the compiler rejects the swap, so the bug is caught at build time.
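The same result can be reached with less unsafe code by letting the constructor return the pinned pointer itself. The following is a sketch of that variant using `Box::pin`, not the design from the example above. The method name `is_consistent` and the helper `pinned_box_stays_consistent` are assumptions made for this illustration.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
    _pinned: PhantomPinned,
}

impl SelfRef {
    /// Returns the object already pinned on the heap.
    fn new(x: i32) -> Pin<Box<Self>> {
        let mut this = Box::pin(SelfRef {
            x,
            ptr_x: std::ptr::null(),
            _pinned: PhantomPinned,
        });
        let ptr: *const i32 = &this.x;
        // SAFETY: only a field is overwritten; the object is not moved
        // out of the mutable reference.
        unsafe {
            this.as_mut().get_unchecked_mut().ptr_x = ptr;
        }
        this
    }

    /// Returns true if ptr_x points to x.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(&self.x, self.ptr_x)
    }
}

/// Hypothetical helper: moving the Pin<Box<_>> moves only the pointer.
fn pinned_box_stays_consistent() -> bool {
    let obj = SelfRef::new(5);
    let moved = obj;
    moved.is_consistent()
}
```

With this signature the caller never holds an unpinned Box<SelfRef>, so the only unsafe block is the one inside the constructor.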

Conclusion¶

Rust cannot declare that a type must not be moved, unlike C++.

Rust has 2 construction patterns instead.

Allocate heap memory and build the object there. The object will not be moved.
Build the object on the stack and shadow the variable with a reference to it. The compiler rejects code that tries to move such a value.

Either way, we access the object through a pointer-like variable. If that pointer is wrapped in Pin and the type does not implement Unpin, the compiler also detects bugs caused by functions like std::mem::swap().
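For the stack pattern, recent versions of the standard library provide the `std::pin::pin!` macro (stable since Rust 1.68), which builds the value in place and hands back only a `Pin<&mut T>`. The sketch below assumes the two-step construction with the hypothetical names `new_unlinked`, `link`, `is_consistent`, and `build_pinned_on_stack`.

```rust
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};

struct SelfRef {
    x: i32,
    ptr_x: *const i32, // intended to point to x
    _pinned: PhantomPinned,
}

impl SelfRef {
    /// Hypothetical first step: ptr_x is not set yet.
    fn new_unlinked(x: i32) -> Self {
        Self {
            x,
            ptr_x: std::ptr::null(),
            _pinned: PhantomPinned,
        }
    }

    /// Hypothetical second step: can only be called on a pinned object.
    fn link(self: Pin<&mut Self>) {
        let ptr: *const i32 = &self.x;
        // SAFETY: only a field is overwritten; the object is not moved.
        unsafe {
            self.get_unchecked_mut().ptr_x = ptr;
        }
    }

    /// Returns true if ptr_x points to x.
    fn is_consistent(&self) -> bool {
        std::ptr::eq(&self.x, self.ptr_x)
    }
}

/// Hypothetical caller that pins the object on the stack.
fn build_pinned_on_stack() -> bool {
    // The value itself has no name here; only the pinned reference does.
    let mut obj = pin!(SelfRef::new_unlinked(5));
    obj.as_mut().link();
    obj.is_consistent()
}
```

This combines both ideas from the list above: the owner is hidden as in the shadowing pattern, and Pin blocks `&mut` access as in the heap example.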

Honestly speaking, I don't understand why Rust takes this approach. There are probably reasons, but it is too difficult. For example, is Unpin really necessary?

What is worse, it is still possible to write code that builds normally and contains this bug.

I don't like this system.