Fn/Log - Rust Bit - A Use For Deref Trait

>> Problem

I have a toy database module that lazily loads a file-based toy index with read, add, and write operations, as shown in this snippet:

use std::{fs, path::{Path, PathBuf}};

#[derive(Debug)]
struct DB;

#[derive(Debug)]
struct Index(PathBuf, Vec<u8>);

impl DB {
    pub fn open() -> Self {
        Self
    }

    pub fn read_index() -> Index {
        let path = Path::new("index.db").to_path_buf();
        let index_data = fs::read(&path).unwrap();

        Index(path, index_data)
    }
}

impl Index {
    pub fn read(&self) -> &[u8] {
        self.1.as_slice()
    }

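    pub fn add(&mut self, mut entry: Vec<u8>) {
        // `append` drains `entry`, so the binding must be mutable
        self.1.append(&mut entry);
    }

    pub fn write(&mut self) {
        // Borrow the path and data instead of moving them out of `self`
        fs::write(&self.0, &self.1).unwrap();
    }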
}

The challenge is to create an interface where the index can be read by many readers or written by exactly one writer, just like a reference. How about returning a reference from DB::read_index instead of an owned Index? The following snippet tries that with load_index and lock_index:

/// Can we return a reference to an owned object?
impl DB {
    pub fn load_index(&self) -> &Index {
        &Self::read_index()     // error[E0515]: cannot return reference to temporary value
    }

    pub fn lock_index(&self) -> &mut Index {
        &mut Self::read_index() // error[E0515]: cannot return reference to temporary value
    }
}

However, this fails because the owned Index is dropped at the end of the function, leaving nothing for the reference to point to. What if we create read-only and read/write wrapper structs, IndexRef and IndexRefMut respectively? The read-only wrapper should expose only the immutable methods, such as Index::read, while the other exposes everything, like so:

/// What if we create read-only and read/write interfaces?
#[derive(Debug)]
struct IndexRef(Index);

#[derive(Debug)]
struct IndexRefMut(Index);

impl DB {
    pub fn load_index(&self) -> IndexRef {
        IndexRef(Self::read_index())
    }

    pub fn lock_index(&mut self) -> IndexRefMut {
        IndexRefMut(Self::read_index())
    }
}

/// Immutable Wrapper
impl IndexRef {
    pub fn read(&self) -> &[u8] {
        self.0.read()
    }
}

/// Mutable Wrapper
impl IndexRefMut {
    pub fn read(&self) -> &[u8] {
        self.0.read()
    }

    pub fn add(&mut self, entry: Vec<u8>) {
        self.0.add(entry)
    }

    pub fn write(&mut self) {
        self.0.write()
    }
}

While it does the job, every new method added to Index must also be added to IndexRef and IndexRefMut, which is a maintenance burden.

>> Solution

Traits already exist for this case: std::ops::Deref and std::ops::DerefMut. When a method does not exist on the type of an expression, the compiler repeatedly dereferences it until it finds the method. (See Method Call Expression.) If we let IndexRef and IndexRefMut dereference to Index, we solve the maintenance issue. The snippet below illustrates the lookup:

let value: &str = "42";

value.len();
// Same as below since `&str` automatically dereferences to `str`
(*value).len();

// Assuming `IndexRef` and `IndexRefMut` implement no methods of their own
impl IndexRef {}
impl IndexRefMut {}

let mut write_index: IndexRefMut = db.lock_index();

write_index.write();
// Since `IndexRefMut::write` does not exist,
// the compiler dereferences `write_index` in the hope of finding a `.write` method.
// If `IndexRefMut` dereferences to `Index`, it will.
(*write_index).write();
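The same lookup can be observed with standard library types alone. The following is a minimal sketch, not part of the original module: the function name `lengths` is made up for illustration, and `Rc<String>` merely stands in for any type that dereferences to another.

```rust
use std::rc::Rc;

/// Returns the same length computed with implicit and explicit dereferencing.
/// (`lengths` is a hypothetical helper used only for this illustration.)
fn lengths() -> (usize, usize, usize) {
    let shared: Rc<String> = Rc::new(String::from("TOTORO"));

    // `Rc<String>` has no `len` method, so the compiler derefs to `String`.
    let implicit = shared.len();
    // The same call with the dereference written out.
    let explicit = (*shared).len();
    // `chars` lives on `str`, two derefs away: Rc<String> -> String -> str.
    let via_str = shared.chars().count();

    (implicit, explicit, via_str)
}
```

All three values agree because each call ends up on the same underlying string.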

Reading the docs, implementing both traits is straightforward:

use std::ops::{Deref, DerefMut};

impl Deref for IndexRef {
    type Target = Index;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Deref for IndexRefMut {
    type Target = Index;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for IndexRefMut {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

To demonstrate it works:

fs::write("index.db", "").unwrap();

let mut db = DB::open();

{
    let read_index = db.load_index();
    println!("Should be empty data: {:?}", read_index.read());
}

{
    let mut write_index = db.lock_index();
    write_index.add(b"TOTORO".to_vec());
    write_index.write();
}

{
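    let updated_index = db.lock_index();
    println!("Should have data: {:?}", updated_index.read());

    // Intended to be rejected while `db` is mutably borrowed. Note that this only
    // fails to compile if `IndexRefMut` carries a lifetime tied to `&mut db`; as
    // defined above, the wrapper owns its `Index`, so the borrow of `db` ends as
    // soon as `lock_index` returns.
    let read_index = db.load_index();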
}
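One caveat worth checking: a wrapper struct without a lifetime parameter does not keep `db` borrowed after the method returns, so the compiler accepts a `load_index` call while an `IndexRefMut` is still alive. A minimal sketch of one way to tie the wrappers to the borrow follows. It assumes hypothetical names (`Db`, `ReadGuard`, `WriteGuard`, `demo`) and an in-memory `Index` so that it compiles on its own; it is not the original module.

```rust
use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};

/// Hypothetical in-memory stand-ins for `DB` and `Index`.
struct Db;

struct Index {
    data: Vec<u8>,
}

/// `'a` keeps the shared borrow of `Db` alive for as long as the guard lives.
struct ReadGuard<'a> {
    index: Index,
    _db: PhantomData<&'a Db>,
}

/// `'a` keeps the exclusive borrow of `Db` alive, so no other guard can coexist.
struct WriteGuard<'a> {
    index: Index,
    _db: PhantomData<&'a mut Db>,
}

impl Db {
    fn load_index(&self) -> ReadGuard<'_> {
        ReadGuard { index: Index { data: Vec::new() }, _db: PhantomData }
    }

    fn lock_index(&mut self) -> WriteGuard<'_> {
        WriteGuard { index: Index { data: Vec::new() }, _db: PhantomData }
    }
}

impl Deref for ReadGuard<'_> {
    type Target = Index;
    fn deref(&self) -> &Index { &self.index }
}

impl Deref for WriteGuard<'_> {
    type Target = Index;
    fn deref(&self) -> &Index { &self.index }
}

impl DerefMut for WriteGuard<'_> {
    fn deref_mut(&mut self) -> &mut Index { &mut self.index }
}

fn demo() -> usize {
    let mut db = Db;
    {
        // Any number of read guards may coexist.
        let first = db.load_index();
        let second = db.load_index();
        assert!(first.data.is_empty() && second.data.is_empty());
    }
    let mut writer = db.lock_index();
    // let reader = db.load_index(); // rejected (E0502): `db` is still mutably borrowed
    writer.data.push(1);
    writer.data.len()
}
```

`PhantomData` adds no runtime cost here; it only tells the borrow checker that each guard behaves as if it held a reference to `Db`.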

Another advantage is that we can implement std::ops::Drop on IndexRefMut so that Index::write is called whenever it is dropped, automatically syncing the data. This is also a good separation of concerns: Index stays pure while the wrappers implement convenience or ergonomics, as shown below:

impl Drop for IndexRefMut {
    fn drop(&mut self) {
        self.0.write();
    }
}

{
    let mut write_index = db.lock_index();
    write_index.add(b"TOTORO".to_vec());
    // write_index.write(); // Should automatically write when dropped
}

Although this is a neat trick, implementing Deref is generally discouraged for types that are not smart pointers, because it changes well-known behavior: method lookup silently reaches into the wrapped type. In our case it is intentional, but do check out the Rust API Guidelines for other things to be aware of.

Tidbit Code

>> Notes

>>> File Locking

While we preserve the borrow rules, they do not help when multiple processes or threads access the file. We can implement a simple file locking mechanism where we atomically try to create a lock file and remove it when dropping. If another process sees the lock file, then the lock operation should fail. Instead of checking Path::exists and then calling fs::write, we can use fs::OpenOptions with create_new to do both in one atomic step and avoid a filesystem race condition. A simple lock can be added to IndexRefMut like so:

use std::fs::{self, OpenOptions};

impl DB {
    pub fn lock_index(&mut self) -> IndexRefMut {
        OpenOptions::new()
            .write(true)
            .create_new(true)
            .open("index.db.lck")
            .unwrap();

        IndexRefMut(Self::read_index())
    }
}


impl Drop for IndexRefMut {
    fn drop(&mut self) {
        self.0.write();
        fs::remove_file("index.db.lck").ok();
    }
}
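The snippet above panics when the lock file already exists. As a hedged alternative, the sketch below returns an `io::Result` so the caller can decide what to do when the lock is taken. The `LockFile` type and the `demo` function are hypothetical names introduced for illustration, and the lock path is passed in rather than hard-coded.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical guard that owns a lock file and removes it when dropped.
struct LockFile(PathBuf);

impl LockFile {
    /// Tries to take the lock. Fails with `ErrorKind::AlreadyExists`
    /// when the lock file is already present.
    fn acquire(path: &Path) -> io::Result<Self> {
        OpenOptions::new()
            .write(true)
            .create_new(true) // create the file, or fail if it exists, in one step
            .open(path)?;
        Ok(LockFile(path.to_path_buf()))
    }
}

impl Drop for LockFile {
    fn drop(&mut self) {
        // Best effort: a failure to remove the file is ignored here.
        fs::remove_file(&self.0).ok();
    }
}

/// Returns (second attempt was refused, lock could be retaken after release).
fn demo(path: &Path) -> (bool, bool) {
    let first = LockFile::acquire(path).expect("lock should be free");
    let contended = matches!(
        LockFile::acquire(path),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists
    );
    drop(first); // releases the lock by removing the file
    let reacquired = LockFile::acquire(path).is_ok();
    (contended, reacquired)
}
```

Because a failed `acquire` never constructs a `LockFile`, it also never removes a lock file that belongs to someone else. A process that crashes while holding the lock still leaves a stale file behind, which this sketch does not address.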

>>> is_changed Flag

While syncing the data on drop is nice, it should only sync when the data has changed. We could add an is_changed flag to IndexRefMut that is set to true whenever Index::add is called and reset to false by Index::write. It is easily done with:

struct IndexRefMut(Index, bool);

impl DB {
    pub fn lock_index(&mut self) -> IndexRefMut {
        IndexRefMut(Self::read_index(), false)
    }
}

impl IndexRefMut {
    pub fn add(&mut self, entry: Vec<u8>) {
        self.0.add(entry);
        self.1 = true;
    }

    pub fn write(&mut self) {
        self.0.write();
        self.1 = false;
    }
}

impl Drop for IndexRefMut {
    fn drop(&mut self) {
        if self.1 {
            self.0.write();
        }
    }
}

We could also implement it on Index itself but it is nice that we can keep Index pure.
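To see the flag at work without touching the disk, here is a minimal sketch in which a simplified `Index` counts its writes instead of performing them. The `writes` counter, the named fields, and the `count_writes` helper are assumptions made for this illustration only.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Simplified stand-in for `Index` that counts writes instead of touching disk.
struct Index {
    data: Vec<u8>,
    writes: Rc<Cell<u32>>,
}

impl Index {
    fn add(&mut self, entry: &[u8]) {
        self.data.extend_from_slice(entry);
    }

    fn write(&mut self) {
        self.writes.set(self.writes.get() + 1);
    }
}

/// Wrapper that remembers whether anything changed since the last write.
struct IndexRefMut {
    index: Index,
    is_changed: bool,
}

impl IndexRefMut {
    fn add(&mut self, entry: &[u8]) {
        self.index.add(entry);
        self.is_changed = true;
    }

    fn write(&mut self) {
        self.index.write();
        self.is_changed = false;
    }
}

impl Drop for IndexRefMut {
    fn drop(&mut self) {
        // Only sync when there are unwritten changes.
        if self.is_changed {
            self.index.write();
        }
    }
}

/// Adds the entries, optionally writes explicitly, drops the wrapper,
/// and reports how many writes happened in total.
fn count_writes(entries: &[&str], explicit_write: bool) -> u32 {
    let writes = Rc::new(Cell::new(0));
    {
        let index = Index { data: Vec::new(), writes: Rc::clone(&writes) };
        let mut guard = IndexRefMut { index, is_changed: false };
        for entry in entries {
            guard.add(entry.as_bytes());
        }
        if explicit_write {
            guard.write();
        }
    } // `guard` is dropped here
    writes.get()
}
```

An untouched wrapper triggers no write, and an explicit write followed by a drop still results in a single write.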

>>> RefCell

It is valid to use a RefCell to store/cache the data on load and then use RefCell::borrow and RefCell::borrow_mut for load_index and lock_index respectively.

use std::cell::{OnceCell, Ref, RefCell, RefMut};

#[derive(Debug)]
struct DB(OnceCell<RefCell<Index>>);

impl DB {
    pub fn open() -> Self {
        Self(OnceCell::new())
    }

    // Loads the index on first access and caches it.
    // `OnceCell` lets the cache be filled lazily through `&self`.
    fn index_cell(&self) -> &RefCell<Index> {
        self.0.get_or_init(|| RefCell::new(Self::read_index()))
    }

    pub fn load_index(&self) -> Ref<'_, Index> {
        self.index_cell().borrow()
    }

    pub fn lock_index(&mut self) -> RefMut<'_, Index> {
        self.index_cell().borrow_mut()
    }
}

Although this works, we unnecessarily trade compile-time borrow checking for runtime checks compared to our solution.
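To make the trade-off concrete, the sketch below shows a conflicting borrow that compiles and is only detected when the program runs. It uses a plain `RefCell<Vec<u8>>` rather than the DB type, and `borrow_checks` is a hypothetical helper name.

```rust
use std::cell::RefCell;

/// Reports whether a mutable borrow succeeds while a shared borrow is live,
/// and again after that shared borrow has ended.
fn borrow_checks() -> (bool, bool) {
    let cell = RefCell::new(vec![1u8, 2, 3]);

    let reader = cell.borrow();
    // This compiles; the conflict is only detected at run time.
    let while_reading = cell.try_borrow_mut().is_ok();
    drop(reader);

    let after_reading = cell.try_borrow_mut().is_ok();
    (while_reading, after_reading)
}
```

With `borrow_mut` instead of `try_borrow_mut`, the first attempt would panic rather than return an error.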