>> Chapter 12: Diff Command

This chapter is about integrating the myers::diff by making the diff command. Writing the new command is straightforward but hunking the diffs and integrating a pager was challenging specially to optimize.

Source:

>>> Abstracting Commands

Since the diff and status command render the same data differently, the book suggests to refactor them into an abstract base class. While Rust does not have abstract classes, I used cargo workspaces to share the core library crate and reuse them in the CLI binary crate which is the same effect. Even in Elixir, I usually structure my applications like so:

[workspace]
members = [
  "core",
  "cli"
]

This separation allows me to test easier with modular interfaces or layers. Also if more interfaces are needed such as a GUI or C library, it is easy to plug it in. It also helps with compilation time since modified crates should only be compiled. Not necessarily an abstract classes equivalent but I appreciate how workspaces made this natural.

Personally, I prefer traits or object composition over abstract classes since they are safer and easier to maintain. One difficulty with abstract classes is that when it is changed, its concrete classes also change which may lead to incorrect or undefined behavior. Another is allowing object hierarchy that increase learning complexity than multiple smaller traits. The closest thing in Rust is that trait methods can have default implementation which I did for Blob, Tree and Commit:

/// Trait to serialize/deserialize to the correct git storage formats
pub trait ObjectData {
    fn kind() -> &'static str;

    fn content(&self) -> Vec<u8>;

    fn object_content(&self) -> Vec<u8> {
        let content = self.content();

        let mut buf = Vec::with_capacity(100);

        buf.put(format!("{} {}\0", Self::kind(), content.len()).as_bytes());
        buf.put(content.as_slice());

        buf
    }
}

impl ObjectData for Blob {
    fn kind() -> &'static str {
        "blob"
    }

    fn content(&self) -> Vec<u8> {
        self.bytes.clone()
    }
}

/// Import the trait to use its methods
use core::ObjectData;

let blob = Blob::new(Vec::new());
blob.object_content();

Implementing the required trait methods and having the optional methods be interface methods feel better. Rust also has supertraits which is the closest to inheritance but not needed here. Perhaps abstract classes are in conflict with Rust's zero-cost abstraction? Either way, it is a minor thing but I do appreciate Rust more on this front.

>>> Hunks

The hardest task is to filter and group a list of diffs to highlight the changes. Ideally, it looks like this:

#[derive(Debug)]
struct Hunk<'a, T>(Vec<Diff<'a, T>>);

impl<'a, T> Hunk {
    fn diff_hunks(diffs: Vec<Diff<'a, T>>) -> Vec<Hunk<'a, T>> {
        unimplemented!();
    }
}

let left_lines = left_text.lines().collect::<Vec<_>>();
let right_lines = right_text.lines().collect::<Vec<_>>();

let diffs = diff::myers(left_lines.as_slice(), right_lines.as_slice());
for hunk in Hunk::diff_hunks(&diffs) {
    for diff in hunk.0 {
        info!("{}{}", diff.symbol(), diff.item());
    }
}
@@ -9,7 +9,7 @@
 const PAGER_CMD: &'static str = "less";

 pub struct Pager {
-    cmd: String,
+    cmd: Option<String>,
 }

 impl Pager {

The challenge comes from grouping the diffs into hunks. Initially, I thought to group by the diffs into a change and non-change group, filter out the non-change groups and store everything into a vector:

use itertools::Itertools;

let hunks = diffs.into_iter()
    .group_by(|diff| diff.is_same())
    .filter(|(is_same, _group)| !is_same)
    .map(|(_, group)| group.collect::<Vec<_>>())
    .collect::<Vec<_>>();

The tricky thing is that hunks have leading and trailing padding diffs (3 Diff::Same above and below) to further show where the hunks are. Furthermore, when two hunks are separated by less than the padding amount, they are merged together. The following snippet is my version of the chunking algorithm:

fn group_hunks<'b, T>(xs: &'b Vec<T>, is_change: impl Fn(&T) -> bool, padding: usize) -> Vec<&'b [T]> {
    let mut hunks = Vec::new();

    let mut offset = 0;

    loop {
        /// First Phase: Find the first/next element that is a change
        while let Some(x) = xs.get(offset) {
            if is_change(x) {
                break;
            }

            offset += 1;
        }

        if offset >= xs.len() {
            break;
        }

        /// Second Phase: Find the end of the hunk with padding

        /// The start of the hunk is the current minus the padding
        let hunk_start = offset.checked_sub(padding).unwrap_or_default();

        /// This variables will end at the end of the hunk
        let mut counter = padding;
        offset += 1;

        while counter > 0 {
            if offset >= xs.len() {
                break;
            }

            let x = xs.get(offset).unwrap();

            /// If the current diff is a change, refresh the counter.
            /// Otherwise, decrement it.
            if is_change(x) {
                counter = padding;
            } else {
                counter -= 1;
            }

            offset += 1;
        }

        /// With start and end index, the hunk can be added
        hunks.push(&xs[hunk_start..offset]);

        offset += 1;
    }

    hunks
}

for diff_slice in group_hunks(&diffs, |diff| !diff.is_same(), 3) {
    let hunk = Hunk(diff_slice);

    for diff in hunk.0 {
        info!("{}{}", diff.symbol(), diff.item());
    }
}

I am proud of this since it only needs to allocate for the new list compared to the book which makes me appreciate how Rust encourages optimization because of its type system.

>>> Pager

Since the output of this command is quite long, the book suggests integrating a terminal pager which allows the output to be scrolled via less or some other pager command. The main idea is to spawn a process via std::process::Command::spawn and pipe the output of the command to the input of this child process instead:

use std::process::{Command, Stdio};

let mut pager_child = Command::new("less")
    .env("LESS", "FRV")             /// Add default LESS options
    .stdin(Stdio::piped())          /// Enable child stdin
    .stdout(Stdio::inherit())       /// Share parent stdout to child
    .spawn()
    .expect("Pager spawn failed");

let output = "HELLO WORLD";

let stdin = pager_child.stdin.as_mut().unwrap();
stdin.write_all(output.as_bytes()).expect("Failed to write");
stdin.flush().expect("Failed to flush");

/// Instead of just println!(output) or info!(output);

pager_child.wait().expect("Failed to close");  /// Wait for the subprocess to close

Although Rust has the Drop trait, it is not implemented for Child so Child::wait is required at the ends of its lifetime to avoid zombie processes. Since I use log (and loggerv), the challenge comes from redirecting the output to this child's stdin. Instead of replacing every instance of info! in this command, I can create a custom Log to redirect the output instead:

use log::{Log, LevelFilter, Metadata, Record};
use std::{io::Write, process::Child, sync::{Arc, Mutex}};

/// Arc + Mutex since static lifetime and interior mutability is needed
pub fn setup_piped_logger(child_rc: Arc<Mutex<Child>>) {
    struct ChildLog(Arc<Mutex<Child>>);

    /// Custom logger redirecting to child process
    impl Log for ChildLog {
        fn enabled(&self, _metadata: &Metadata) -> bool {
            true
        }

        fn log(&self, record: &Record) {
            let mut child = self.0.lock().expect("Process failed");
            let stdin = (*child).stdin.as_mut().expect("Process failed");

            stdin
                .write_fmt(*record.args())
                .expect("Failed to write to stdin");
            stdin
                .write("\n".as_bytes())
                .expect("Failed to write to stdin");
        }

        fn flush(&self) {
            let mut child = self.0.lock().expect("Process failed");
            let stdin = (*child).stdin.as_mut().expect("Process failed");

            stdin.flush().expect("Failed to write to stdin");
        }
    }

    /// Set custom logger
    log::set_max_level(LevelFilter::Info);
    log::set_boxed_logger(Box::new(ChildLog(child_rc))).ok();
}

let pager_rc = Arc::new(Mutex::new(pager_child));
setup_piped_logger(pager_rc.clone());

/// Goes through the pager instead of the console directly
info!("LET THERE BE LIGHT");

pager_rc.lock().unwrap().wait().expect("Failed to close");

My only peeve is using Arc and Mutex to get around the static lifetime of the boxed logger. I could use mutable static but it requires unsafe which is not allowed or needed. I am impressed that writing a custom logger was straightforward for a pager view. Tutorials for integrating a more thorough pager was sparse and perhaps an area to write about.