>> Problem
I have a toy database module that lazily loads a file-based toy index with read, add, and write (commit) operations, as shown in this snippet:
use std::{fs, path::{Path, PathBuf}};
#[derive(Debug)]
struct DB;
#[derive(Debug)]
struct Index(PathBuf, Vec<u8>);
impl DB {
    pub fn open() -> Self {
        Self
    }
    // Reads the whole index file into memory; panics if it cannot be read.
    pub fn read_index() -> Index {
        let path = Path::new("index.db").to_path_buf();
        let index_data = fs::read(&path).unwrap();
        Index(path, index_data)
    }
}
impl Index {
    pub fn read(&self) -> &[u8] {
        self.1.as_slice()
    }
    // Appends `entry` in memory only; nothing reaches the disk until `write`.
    pub fn add(&mut self, mut entry: Vec<u8>) {
        self.1.append(&mut entry);
    }
    // Persists the in-memory data back to the file (borrows, does not move).
    pub fn write(&mut self) {
        fs::write(&self.0, &self.1).unwrap();
    }
}
The challenge is to create an interface where the index can be read by many or written by one exclusively, just like a shared (&) or mutable (&mut) reference. How about we return a reference from DB::read_index instead of an owned Index? Representing that as load_index and lock_index in the following snippet:
/// Can we return a reference to an owned object?
impl DB {
    pub fn load_index(&self) -> &Index {
        &Self::read_index() // error[E0515]: cannot return reference to temporary value
    }
    pub fn lock_index(&mut self) -> &mut Index {
        &mut Self::read_index() // error[E0515]: cannot return reference to temporary value
    }
}
However, this fails: the owned Index is a temporary that is dropped at the end of the function, so the returned reference would dangle. What if we create read-only and read-write wrapper structs, IndexRef and IndexRefMut respectively? The read-only wrapper should only expose immutable methods such as Index::read, while the other exposes everything, like so:
/// What if we create read-only and read-write interfaces?
#[derive(Debug)]
struct IndexRef(Index);
#[derive(Debug)]
struct IndexRefMut(Index);
impl DB {
    pub fn load_index(&self) -> IndexRef {
        IndexRef(Self::read_index())
    }
    pub fn lock_index(&mut self) -> IndexRefMut {
        IndexRefMut(Self::read_index())
    }
}
/// Immutable wrapper: only read access is forwarded.
impl IndexRef {
    pub fn read(&self) -> &[u8] {
        self.0.read()
    }
}
/// Mutable wrapper: every method has to be forwarded by hand.
impl IndexRefMut {
    pub fn read(&self) -> &[u8] {
        self.0.read()
    }
    pub fn add(&mut self, entry: Vec<u8>) {
        self.0.add(entry)
    }
    pub fn write(&mut self) {
        self.0.write()
    }
}
While it does the job, every new method on Index has to be forwarded by hand in IndexRef and/or IndexRefMut, which is a maintenance issue.
>> Solution
Two traits already exist for this case: std::ops::Deref and std::ops::DerefMut. When a method does not exist on the receiver's type, the compiler automatically dereferences the receiver, step by step, until it finds a type that has the method. (See Method Call Expression.) If we allow IndexRef and IndexRefMut to dereference to Index, we solve the maintenance issue, as conveyed in the snippet below:
let value: &str = "42";
value.len();
// Equivalent to the line below, since `&str` dereferences to `str`
(*value).len();
// Assuming `IndexRef` and `IndexRefMut` implement nothing themselves
impl IndexRef {}
impl IndexRefMut {}
let mut write_index = db.lock_index();
write_index.write();
// Since `IndexRefMut::write` does not exist, the compiler dereferences
// `write_index` and looks for a `.write` method on the target type.
// If `IndexRefMut` dereferences to `Index`, it finds one, so the call becomes:
(*write_index).write();
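As a standalone illustration of the auto-deref rule, here is a minimal sketch using a hypothetical Shout wrapper around String (not part of the toy database). Shout defines no methods of its own, yet len and to_uppercase still resolve, because method lookup follows Deref from Shout to String and then from String to str.

```rust
use std::ops::Deref;

// Hypothetical wrapper used only for illustration: it owns a `String`
// and adds no methods of its own.
struct Shout(String);

impl Deref for Shout {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

// `Shout` has no `len`, so lookup derefs `Shout` -> `String` and finds
// `String::len` (explicitly: `(**s).len()`).
fn shout_len(s: &Shout) -> usize {
    s.len()
}

// `to_uppercase` lives on `str`, two derefs away: `Shout` -> `String` -> `str`.
fn shout_upper(s: &Shout) -> String {
    s.to_uppercase()
}

fn main() {
    let s = Shout(String::from("totoro"));
    println!("{} {}", shout_len(&s), shout_upper(&s));
}
```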
Implementing both traits is straightforward, as the docs show:
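use std::ops::{Deref, DerefMut};
// Both wrappers give shared access to the inner `Index`...
impl Deref for IndexRef {
    type Target = Index;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
impl Deref for IndexRefMut {
    type Target = Index;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
// ...but only the mutable wrapper hands out `&mut Index`.
impl DerefMut for IndexRefMut {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}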
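For a version that runs without index.db, here is a minimal in-memory sketch of the same two wrappers. It assumes a simplified Index that holds only a Vec<u8> (no PathBuf and no write); the point is just that IndexRefMut reaches Index::add through DerefMut, while IndexRef can only hand out &Index.

```rust
use std::ops::{Deref, DerefMut};

// Simplified, in-memory stand-in for the article's `Index` (assumption:
// the file path and disk I/O are dropped so the sketch runs anywhere).
#[derive(Debug, Default)]
struct Index(Vec<u8>);

impl Index {
    fn read(&self) -> &[u8] {
        &self.0
    }
    fn add(&mut self, mut entry: Vec<u8>) {
        self.0.append(&mut entry);
    }
}

// Read-only wrapper: only `Deref`, so callers get `&Index`.
struct IndexRef(Index);
// Read-write wrapper: `Deref` + `DerefMut`, so callers can get `&mut Index`.
struct IndexRefMut(Index);

impl Deref for IndexRef {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}

impl Deref for IndexRefMut {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}

impl DerefMut for IndexRefMut {
    fn deref_mut(&mut self) -> &mut Index {
        &mut self.0
    }
}

fn demo() -> (Vec<u8>, Vec<u8>) {
    let mut writer = IndexRefMut(Index::default());
    writer.add(b"TOTORO".to_vec()); // resolved via deref_mut to Index::add
    let after_write = writer.read().to_vec(); // resolved via deref to Index::read

    let reader = IndexRef(Index(after_write.clone()));
    // reader.add(b"X".to_vec()); // would not compile: IndexRef only derefs to &Index
    let after_read = reader.read().to_vec();
    (after_write, after_read)
}

fn main() {
    let (written, read) = demo();
    println!("{:?} {:?}", written, read);
}
```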
To demonstrate it works:
fs::write("index.db", "").unwrap();
let mut db = DB::open();
{
    let read_index = db.load_index();
    println!("Should be empty data: {:?}", read_index.read());
}
{
    let mut write_index = db.lock_index();
    write_index.add(b"TOTORO".to_vec());
    write_index.write();
}
{
    let updated_index = db.lock_index();
    // Intended to be rejected because `db` is already mutably borrowed.
    // Caveat: the compiler only enforces this if `IndexRefMut` keeps `db`
    // borrowed (e.g. through a lifetime parameter); as defined above it owns
    // its `Index` and borrows nothing, so this line actually compiles.
    let _read_index = db.load_index();
    println!("Should have data: {:?}", updated_index.read());
}
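One caveat with the demo above: IndexRef and IndexRefMut own their Index and carry no lifetime, so the values returned by load_index and lock_index do not keep db borrowed, and the borrow checker has nothing to reject. A minimal sketch of one way to tie the wrappers to db is a PhantomData field. It uses a hypothetical in-memory DB (the data field and the cloning are assumptions made so it runs without a file).

```rust
use std::marker::PhantomData;
use std::ops::{Deref, DerefMut};

// In-memory stand-in for the article's `Index` (no file I/O).
#[derive(Debug)]
struct Index(Vec<u8>);

impl Index {
    fn read(&self) -> &[u8] {
        &self.0
    }
}

// Hypothetical `DB` that holds the data directly, so no file is needed.
struct DB {
    data: Vec<u8>,
}

// `PhantomData` stores no reference but tells the borrow checker that each
// wrapper borrows `DB` for `'a`: shared for readers, exclusive for the writer.
struct IndexRef<'a>(Index, PhantomData<&'a DB>);
struct IndexRefMut<'a>(Index, PhantomData<&'a mut DB>);

impl DB {
    fn load_index(&self) -> IndexRef<'_> {
        IndexRef(Index(self.data.clone()), PhantomData)
    }
    fn lock_index(&mut self) -> IndexRefMut<'_> {
        IndexRefMut(Index(self.data.clone()), PhantomData)
    }
}

// Impl headers now spell out the lifetime as `<'_>`.
impl Deref for IndexRef<'_> {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}
impl Deref for IndexRefMut<'_> {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}
impl DerefMut for IndexRefMut<'_> {
    fn deref_mut(&mut self) -> &mut Index {
        &mut self.0
    }
}

fn demo() -> (usize, usize) {
    let mut db = DB { data: b"TOTORO".to_vec() };
    let readers = {
        let a = db.load_index();
        let b = db.load_index(); // many readers: shared borrows can coexist
        let total = a.read().len() + b.read().len();
        total
    };
    let writer = {
        let w = db.lock_index();
        // let r = db.load_index(); // error[E0502]: `db` is still mutably borrowed by `w`
        let len = w.read().len();
        len
    };
    (readers, writer)
}

fn main() {
    println!("{:?}", demo());
}
```

Storing a real &'a DB or &'a mut DB in the wrappers would work too; PhantomData just avoids holding a reference the wrapper never uses. Without a Drop impl, non-lexical lifetimes end the borrow at the wrapper's last use, which is why the rejected line sits before a later use of w.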
Another advantage is that we can implement std::ops::Drop on IndexRefMut so that Index::write is called whenever it is dropped, automatically syncing the data. This is also a good separation of concerns: Index stays pure while the wrappers add convenience and ergonomics, as shown below:
impl Drop for IndexRefMut {
    fn drop(&mut self) {
        // Sync to disk whenever the write handle goes out of scope.
        self.0.write();
    }
}
{
    let mut write_index = db.lock_index();
    write_index.add(b"TOTORO".to_vec());
    // write_index.write(); // No longer needed: written automatically when dropped
}
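Here is a self-contained sketch of write-on-drop that can be run in isolation. It uses a hypothetical scratch file under std::env::temp_dir() rather than index.db. Unlike the article's Index::write, its write returns io::Result: drop cannot propagate errors, and a panic inside drop while already unwinding aborts the process, so the sketch only reports a failed write.

```rust
use std::fs;
use std::io;
use std::ops::{Deref, DerefMut};
use std::path::PathBuf;

#[derive(Debug)]
struct Index(PathBuf, Vec<u8>);

impl Index {
    fn add(&mut self, mut entry: Vec<u8>) {
        self.1.append(&mut entry);
    }
    // Returns the error instead of unwrapping (assumption for this sketch).
    fn write(&self) -> io::Result<()> {
        fs::write(&self.0, &self.1)
    }
}

struct IndexRefMut(Index);

impl Deref for IndexRefMut {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}

impl DerefMut for IndexRefMut {
    fn deref_mut(&mut self) -> &mut Index {
        &mut self.0
    }
}

impl Drop for IndexRefMut {
    fn drop(&mut self) {
        // `drop` cannot return an error, so a failed sync is only reported.
        if let Err(err) = self.0.write() {
            eprintln!("failed to sync index: {err}");
        }
    }
}

fn demo() -> io::Result<Vec<u8>> {
    // Hypothetical scratch path so the sketch never touches `index.db`.
    let path = std::env::temp_dir().join(format!("toy-index-drop-{}.db", std::process::id()));
    fs::write(&path, b"")?;
    {
        let mut write_index = IndexRefMut(Index(path.clone(), fs::read(&path)?));
        write_index.add(b"TOTORO".to_vec());
        // No explicit write: `Drop` runs at the end of this block.
    }
    let synced = fs::read(&path)?;
    fs::remove_file(&path)?;
    Ok(synced)
}

fn main() {
    println!("{:?}", demo());
}
```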
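Although the Deref trick is neat, implementing Deref is generally discouraged for types that are not smart pointers, since it makes method resolution implicit and can surprise readers. In our case it is intentional, as the wrappers act as guards much like std::cell::Ref or std::sync::MutexGuard, but do check out the Rust API Guidelines (C-DEREF) for other things to be aware of.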
>> Notes
>>> File Locking
While we preserve the borrow rules within a single program, this does not help when multiple processes or threads access the file. We can implement a simple file-locking mechanism: atomically try to create a lock file, and remove it when the guard is dropped. If another process sees the lock file, its lock operation should fail. Instead of calling Path::exists and then fs::write, which leaves a window for a race between the check and the write, we can use fs::OpenOptions with create_new(true) to do both in one atomic step. A simple lock can be added to IndexRefMut like so:
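use std::fs::{self, OpenOptions};
impl DB {
    pub fn lock_index(&mut self) -> IndexRefMut {
        // `create_new` fails if the lock file already exists; the check and the
        // creation happen atomically. Panics if the lock cannot be acquired.
        OpenOptions::new()
            .write(true)
            .create_new(true)
            .open("index.db.lck")
            .expect("failed to create index.db.lck (is the index already locked?)");
        IndexRefMut(Self::read_index())
    }
}
impl Drop for IndexRefMut {
    fn drop(&mut self) {
        self.0.write();
        // Release the lock; ignore errors (e.g. the file is already gone).
        fs::remove_file("index.db.lck").ok();
    }
}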
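To see create_new acting as a lock, here is a minimal sketch with a hypothetical LockGuard type (not from the article) that owns a lock file under std::env::temp_dir(). A second acquire fails with ErrorKind::AlreadyExists while the first guard is alive and succeeds once that guard is dropped.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind};
use std::path::{Path, PathBuf};

// Hypothetical lock guard: the lock file exists exactly as long as the guard.
struct LockGuard(PathBuf);

impl LockGuard {
    fn acquire(path: &Path) -> io::Result<LockGuard> {
        // `create_new(true)` fails with `AlreadyExists` if the file is there;
        // the existence check and the creation are one atomic step.
        OpenOptions::new().write(true).create_new(true).open(path)?;
        Ok(LockGuard(path.to_path_buf()))
    }
}

impl Drop for LockGuard {
    fn drop(&mut self) {
        // Best effort, mirroring the article's `.ok()`.
        let _ = fs::remove_file(&self.0);
    }
}

fn demo() -> (bool, bool, bool) {
    let path = std::env::temp_dir().join(format!("toy-index-{}.db.lck", std::process::id()));
    let _ = fs::remove_file(&path); // start from a clean state
    let first = LockGuard::acquire(&path);
    let first_ok = first.is_ok();
    // A second attempt while `first` is alive must fail.
    let second = LockGuard::acquire(&path);
    let second_blocked = matches!(&second, Err(e) if e.kind() == ErrorKind::AlreadyExists);
    drop(first); // removes the lock file
    // The temporary guard is dropped at the end of this statement.
    let third_ok = LockGuard::acquire(&path).is_ok();
    (first_ok, second_blocked, third_ok)
}

fn main() {
    println!("{:?}", demo());
}
```

Since Drop does not run if the process is killed or aborts, a stale lock file can be left behind and must then be removed by hand.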
>>> is_changed Flag
While syncing the data on drop is nice, it should only sync when the data has changed. We could add an is_changed flag to IndexRefMut that is set to true whenever add is called and reset to false whenever write is called. It is easily done with:
// The second field is the `is_changed` flag.
struct IndexRefMut(Index, bool);
impl DB {
    pub fn lock_index(&mut self) -> IndexRefMut {
        IndexRefMut(Self::read_index(), false)
    }
}
impl IndexRefMut {
    // Inherent methods take priority over the ones reached through `Deref`.
    pub fn add(&mut self, entry: Vec<u8>) {
        self.0.add(entry);
        self.1 = true;
    }
    pub fn write(&mut self) {
        self.0.write();
        self.1 = false;
    }
}
impl Drop for IndexRefMut {
    fn drop(&mut self) {
        // Only sync if something changed since the last write.
        if self.1 {
            self.0.write();
        }
    }
}
We could also implement the flag on Index itself, but it is nice that we can keep Index pure.
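A minimal sketch of the is_changed flag that can be run in isolation. It assumes an in-memory Index whose write only bumps a hypothetical WRITES counter instead of calling fs::write, so the number of syncs can be observed. Because IndexRefMut::add and IndexRefMut::write are inherent methods, method lookup finds them before auto-deref reaches Index::add and Index::write.

```rust
use std::ops::{Deref, DerefMut};
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter of simulated disk writes.
static WRITES: AtomicUsize = AtomicUsize::new(0);

#[derive(Debug, Default)]
struct Index(Vec<u8>);

impl Index {
    fn add(&mut self, mut entry: Vec<u8>) {
        self.0.append(&mut entry);
    }
    // Stand-in for `fs::write`: only records that a write happened.
    fn write(&mut self) {
        WRITES.fetch_add(1, Ordering::SeqCst);
    }
}

// The second field is the `is_changed` flag.
struct IndexRefMut(Index, bool);

impl IndexRefMut {
    fn new(index: Index) -> Self {
        IndexRefMut(index, false)
    }
    fn add(&mut self, entry: Vec<u8>) {
        self.0.add(entry);
        self.1 = true;
    }
    fn write(&mut self) {
        self.0.write();
        self.1 = false;
    }
}

impl Deref for IndexRefMut {
    type Target = Index;
    fn deref(&self) -> &Index {
        &self.0
    }
}

impl DerefMut for IndexRefMut {
    fn deref_mut(&mut self) -> &mut Index {
        &mut self.0
    }
}

impl Drop for IndexRefMut {
    fn drop(&mut self) {
        if self.1 {
            self.0.write();
        }
    }
}

// Returns the number of writes observed after each scenario.
fn demo() -> (usize, usize, usize) {
    let writes = || WRITES.load(Ordering::SeqCst);
    let start = writes();
    {
        let _untouched = IndexRefMut::new(Index::default());
    } // unchanged: drop skips the write
    let after_untouched = writes() - start;
    {
        let mut dirty = IndexRefMut::new(Index::default());
        dirty.add(b"TOTORO".to_vec());
    } // changed: drop writes once
    let after_dirty = writes() - start;
    {
        let mut synced = IndexRefMut::new(Index::default());
        synced.add(b"TOTORO".to_vec());
        synced.write(); // explicit write clears the flag
    } // flag is false again: drop skips the write
    let after_synced = writes() - start;
    (after_untouched, after_dirty, after_synced)
}

fn main() {
    println!("{:?}", demo());
}
```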
>>> RefCell
It is valid to use a RefCell to cache the index on first load and then use RefCell::borrow and RefCell::borrow_mut for load_index and lock_index respectively.
use std::cell::{OnceCell, Ref, RefCell, RefMut};
// `OnceCell` (Rust 1.70+) replaces `Option` so the cache can be filled
// through `&self`; assigning to an `Option` field would require `&mut self`.
#[derive(Debug)]
struct DB(OnceCell<RefCell<Index>>);
impl DB {
    pub fn open() -> Self {
        DB(OnceCell::new())
    }
    // Reads the index on first access, then reuses the cached copy.
    fn index_cell(&self) -> &RefCell<Index> {
        self.0.get_or_init(|| RefCell::new(Self::read_index()))
    }
    // Shared borrow: many `Ref`s can be alive at once.
    pub fn load_index(&self) -> Ref<'_, Index> {
        self.index_cell().borrow()
    }
    // Exclusive borrow: panics at runtime if any `Ref` or `RefMut` is alive.
    pub fn lock_index(&self) -> RefMut<'_, Index> {
        self.index_cell().borrow_mut()
    }
}
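Although this works, we are unnecessarily giving away compile-time safety compared to our solution: RefCell enforces the borrow rules at runtime, so calling lock_index while a Ref from load_index is still alive compiles fine and then panics.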
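To make the trade-off concrete, here is a minimal sketch of the RefCell variant with an in-memory stand-in for read_index (a hypothetical hard-coded TOTORO payload). It probes with RefCell::try_borrow_mut, which returns an error instead of panicking, to show that the conflicting borrow compiles and is only caught at runtime. It assumes Rust 1.70+ for std::cell::OnceCell.

```rust
use std::cell::{OnceCell, Ref, RefCell, RefMut};

#[derive(Debug)]
struct Index(Vec<u8>);

// Hypothetical `DB` that caches the lazily loaded index behind a `RefCell`.
struct DB(OnceCell<RefCell<Index>>);

impl DB {
    fn open() -> Self {
        DB(OnceCell::new())
    }
    fn cell(&self) -> &RefCell<Index> {
        // Stand-in for reading `index.db` on first access.
        self.0.get_or_init(|| RefCell::new(Index(b"TOTORO".to_vec())))
    }
    fn load_index(&self) -> Ref<'_, Index> {
        self.cell().borrow()
    }
    fn lock_index(&self) -> RefMut<'_, Index> {
        self.cell().borrow_mut()
    }
}

fn demo() -> (bool, usize) {
    let db = DB::open();
    let reader = db.load_index();
    // Compiles even though `reader` is alive; only the runtime check sees the
    // conflict. `lock_index` would panic here, so probe with `try_borrow_mut`.
    let conflict_detected = db.cell().try_borrow_mut().is_err();
    drop(reader);
    // Once the `Ref` is gone, the exclusive borrow succeeds.
    let mut writer = db.lock_index();
    writer.0.push(b'!');
    (conflict_detected, writer.0.len())
}

fn main() {
    println!("{:?}", demo());
}
```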