>> Chapter 12: Diff Command
This chapter is about integrating the myers::diff
by making the diff
command. Writing the new command is straightforward but hunking the
diffs and integrating a pager was challenging specially to optimize.
Source:
>>> Abstracting Commands
Since the diff
and status
command render the same data
differently, the book suggests to refactor them into an abstract base
class. While Rust does not have abstract classes, I used cargo
workspaces to share the core library crate and reuse them in the CLI
binary crate which is the same effect. Even in Elixir, I usually
structure my applications like so:
[workspace]
members = [
"core",
"cli"
]
This separation allows me to test easier with modular interfaces or layers. Also if more interfaces are needed such as a GUI or C library, it is easy to plug it in. It also helps with compilation time since modified crates should only be compiled. Not necessarily an abstract classes equivalent but I appreciate how workspaces made this natural.
Personally, I prefer traits or object composition over abstract
classes since they are safer and easier to maintain. One difficulty
with abstract classes is that when it is changed, its concrete
classes also change which may lead to incorrect or undefined
behavior. Another is allowing object hierarchy that increase learning
complexity than multiple smaller traits. The closest thing in Rust
is that trait methods can have default implementation which I did for
Blob
, Tree
and Commit
:
/// Trait to serialize/deserialize to the correct git storage formats
pub trait ObjectData {
fn kind() -> &'static str;
fn content(&self) -> Vec<u8>;
fn object_content(&self) -> Vec<u8> {
let content = self.content();
let mut buf = Vec::with_capacity(100);
buf.put(format!("{} {}\0", Self::kind(), content.len()).as_bytes());
buf.put(content.as_slice());
buf
}
}
impl ObjectData for Blob {
fn kind() -> &'static str {
"blob"
}
fn content(&self) -> Vec<u8> {
self.bytes.clone()
}
}
/// Import the trait to use its methods
use core::ObjectData;
let blob = Blob::new(Vec::new());
blob.object_content();
Implementing the required trait methods and having the optional methods be interface methods feel better. Rust also has supertraits which is the closest to inheritance but not needed here. Perhaps abstract classes are in conflict with Rust's zero-cost abstraction? Either way, it is a minor thing but I do appreciate Rust more on this front.
>>> Hunks
The hardest task is to filter and group a list of diffs to highlight the changes. Ideally, it looks like this:
#[derive(Debug)]
struct Hunk<'a, T>(Vec<Diff<'a, T>>);
impl<'a, T> Hunk {
fn diff_hunks(diffs: Vec<Diff<'a, T>>) -> Vec<Hunk<'a, T>> {
unimplemented!();
}
}
let left_lines = left_text.lines().collect::<Vec<_>>();
let right_lines = right_text.lines().collect::<Vec<_>>();
let diffs = diff::myers(left_lines.as_slice(), right_lines.as_slice());
for hunk in Hunk::diff_hunks(&diffs) {
for diff in hunk.0 {
info!("{}{}", diff.symbol(), diff.item());
}
}
@@ -9,7 +9,7 @@
const PAGER_CMD: &'static str = "less";
pub struct Pager {
- cmd: String,
+ cmd: Option<String>,
}
impl Pager {
The challenge comes from grouping the diffs into hunks. Initially, I thought to group by the diffs into a change and non-change group, filter out the non-change groups and store everything into a vector:
use itertools::Itertools;
let hunks = diffs.into_iter()
.group_by(|diff| diff.is_same())
.filter(|(is_same, _group)| !is_same)
.map(|(_, group)| group.collect::<Vec<_>>())
.collect::<Vec<_>>();
The tricky thing is that hunks have leading and trailing padding
diffs (3 Diff::Same
above and below) to further show where the
hunks are. Furthermore, when two hunks are separated by less than the
padding amount, they are merged together. The following snippet is my
version of the chunking algorithm:
fn group_hunks<'b, T>(xs: &'b Vec<T>, is_change: impl Fn(&T) -> bool, padding: usize) -> Vec<&'b [T]> {
let mut hunks = Vec::new();
let mut offset = 0;
loop {
/// First Phase: Find the first/next element that is a change
while let Some(x) = xs.get(offset) {
if is_change(x) {
break;
}
offset += 1;
}
if offset >= xs.len() {
break;
}
/// Second Phase: Find the end of the hunk with padding
/// The start of the hunk is the current minus the padding
let hunk_start = offset.checked_sub(padding).unwrap_or_default();
/// This variables will end at the end of the hunk
let mut counter = padding;
offset += 1;
while counter > 0 {
if offset >= xs.len() {
break;
}
let x = xs.get(offset).unwrap();
/// If the current diff is a change, refresh the counter.
/// Otherwise, decrement it.
if is_change(x) {
counter = padding;
} else {
counter -= 1;
}
offset += 1;
}
/// With start and end index, the hunk can be added
hunks.push(&xs[hunk_start..offset]);
offset += 1;
}
hunks
}
for diff_slice in group_hunks(&diffs, |diff| !diff.is_same(), 3) {
let hunk = Hunk(diff_slice);
for diff in hunk.0 {
info!("{}{}", diff.symbol(), diff.item());
}
}
I am proud of this since it only needs to allocate for the new list compared to the book which makes me appreciate how Rust encourages optimization because of its type system.
>>> Pager
Since the output of this command is quite long, the book suggests integrating a terminal pager which allows the output to be scrolled via less or some other pager command. The main idea is to spawn a process via std::process::Command::spawn and pipe the output of the command to the input of this child process instead:
use std::process::{Command, Stdio};
let mut pager_child = Command::new("less")
.env("LESS", "FRV") /// Add default LESS options
.stdin(Stdio::piped()) /// Enable child stdin
.stdout(Stdio::inherit()) /// Share parent stdout to child
.spawn()
.expect("Pager spawn failed");
let output = "HELLO WORLD";
let stdin = pager_child.stdin.as_mut().unwrap();
stdin.write_all(output.as_bytes()).expect("Failed to write");
stdin.flush().expect("Failed to flush");
/// Instead of just println!(output) or info!(output);
pager_child.wait().expect("Failed to close"); /// Wait for the subprocess to close
Although Rust has the Drop
trait, it is not implemented for Child
so Child::wait is required at the ends of its lifetime to avoid
zombie processes. Since I use log (and loggerv), the challenge comes
from redirecting the output to this child's stdin
. Instead of
replacing every instance of info!
in this command, I can create a
custom Log to redirect the output instead:
use log::{Log, LevelFilter, Metadata, Record};
use std::{io::Write, process::Child, sync::{Arc, Mutex}};
/// Arc + Mutex since static lifetime and interior mutability is needed
pub fn setup_piped_logger(child_rc: Arc<Mutex<Child>>) {
struct ChildLog(Arc<Mutex<Child>>);
/// Custom logger redirecting to child process
impl Log for ChildLog {
fn enabled(&self, _metadata: &Metadata) -> bool {
true
}
fn log(&self, record: &Record) {
let mut child = self.0.lock().expect("Process failed");
let stdin = (*child).stdin.as_mut().expect("Process failed");
stdin
.write_fmt(*record.args())
.expect("Failed to write to stdin");
stdin
.write("\n".as_bytes())
.expect("Failed to write to stdin");
}
fn flush(&self) {
let mut child = self.0.lock().expect("Process failed");
let stdin = (*child).stdin.as_mut().expect("Process failed");
stdin.flush().expect("Failed to write to stdin");
}
}
/// Set custom logger
log::set_max_level(LevelFilter::Info);
log::set_boxed_logger(Box::new(ChildLog(child_rc))).ok();
}
let pager_rc = Arc::new(Mutex::new(pager_child));
setup_piped_logger(pager_rc.clone());
/// Goes through the pager instead of the console directly
info!("LET THERE BE LIGHT");
pager_rc.lock().unwrap().wait().expect("Failed to close");
My only peeve is using Arc and Mutex to get around the static lifetime of the boxed logger. I could use mutable static but it requires unsafe which is not allowed or needed. I am impressed that writing a custom logger was straightforward for a pager view. Tutorials for integrating a more thorough pager was sparse and perhaps an area to write about.