>> Problem
I've been using Emacs for quite some time now and have used both smex and helm as my commander. (Pun somewhat intended.) But when I started, it was just ido and its phenomenal flex matching feature, which makes command searching faster and easier. I encourage you to read about or use flex matching before continuing, but the easiest explanation I can give is this: *given a list of function names and a search text, return all the function names that contain all the letters of the search text, in order*.
Not the best definition perhaps, but I do want to stress the word sequential, because this is not the typical substring search. An example or two might help.
Let's say we have a function emacs-lisp. We can match it with the substrings emacs, lisp or cs-li, but not with el or ep. Flex matching, however, will match el because e and l appear in that order in the name, and likewise ep; but it will not match me because, although both letters appear in the name, they are not in order. I love to call this kind of matching forward character matching, but that's just me.
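To make the idea concrete, here is a minimal sketch of forward character matching using only built-in string functions; my/flex-match-p is a hypothetical helper name, not part of any library mentioned in this article.
(defun my/flex-match-p (search name)
  "Return t if every character of SEARCH appears in NAME, in order."
  (let ((pos 0)
        (matched t))
    (dolist (c (string-to-list search) matched)
      (let ((found (and matched
                        (string-match (regexp-quote (string c)) name pos))))
        (if found
            (setq pos (1+ found))
          (setq matched nil))))))
(my/flex-match-p "el" "emacs-lisp") ;;> t
(my/flex-match-p "me" "emacs-lisp") ;;> nil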
After getting used to this kind of matching, with a thousand functions to scan, one naturally asks: what is the smallest search text that will match a target? My sample target is emacs-lisp-mode, and what I hope to find out is whether elmo is one of the flex matching candidates for the job.
To keep the problem from exploding, the question is roughly: using smex, what are the smallest matching search texts for a command? So if you don't use Emacs or smex as your flex matcher, this article might have little mileage for you; but if you want to hear the journey, let's ride.
>> Before Coding
For those who want the code instead of a boring explanation, here it is.
Here is an outline of how things will go. Given a function name, do the following:
- Check if the function exists; bail if it does not
- Generate all possible flex matching "substrings" from that name
- For each substring, get the possible matches and rate them by order
- Keep the matches with the best rating
Looks simple, but there is one critical difficulty. The word generate whispers combinatorial explosion, which screams garbage collection, which pretty much means performance and memory problems. Doing this in the best OS, the computation will block the UI, which is as bad as it is in JavaScript. There are probably many ways to address this, but the way I go about it is with lazy streams and transducers, which try to emulate async/await. Or I could just let the UI block for minutes while I twiddle my thumbs and be done with it, but that wouldn't be fun to explore, would it?
Truly, this article is more of an exploration of doing asynchronous computation in Emacs, inspired by the problem of finding the best flex matching candidates. As another shameless plug, I made the aptly named libraries stream.el, transducer.el and promise.el to support this endeavor. We will explore the role of each one and how it helps achieve our goal.
>> Streams
Lists. The basic form of collection. As the collection grows, mapping over it takes longer, which is natural. When dealing with permutations of a long text, the number of candidates can grow into the millions, and if the mapping function takes time, that cost is compounded. What we want is to map over a huge list without blocking; what we are really asking for is to defer the computation, or to be lazy. How about a snippet?
;; Use your imagination
(setq xs (list "a" "list" "of" "..." "a " "million " "items") ; Millions of items
      mapper (lambda (x) (list "a" "long" "operation" "on" x))) ; A long operation
(princ (mapcar mapper xs)) ; Get a cup of coffee
With stream.el, we can do this in a deferred fashion.
(require 'stream)
(setq xs-stream (stream-from-list (list 1 2 3 4 5)))
;; Start the stream
(princ (funcall xs-stream)) ;;> stream-start
;; Streams are just function invocations
(princ (funcall xs-stream)) ;;> 1
;; Invoking the stream function yields the next value
(princ (funcall xs-stream)) ;;> 2
(princ (funcall xs-stream)) ;;> 3
(princ (funcall xs-stream)) ;;> 4
(princ (funcall xs-stream)) ;;> 5
;; End of the stream
(princ (funcall xs-stream)) ;;> stream-stop
;; No more values
(princ (funcall xs-stream)) ;;> stream-stop
It is a little more contrived but we can say when we want the next value with a function call. With that in mind, let's map over the stream.
(setq xs (list 1 2 3 4 5)
mapper #'1+
xs-stream (stream-from-list xs)
ys-stream
(stream
(lambda (&rest _)
(let ((x (funcall xs-stream)))
;; Common stream idiom
;; Ignore start-value
(when (stream-start-value-p x)
(setq x (funcall xs-stream)))
;; Handle stop-value
(if (stream-stop-value-p x)
stream-stop
;; Actually map over it
(funcall mapper x))))))
(funcall ys-stream) ;;> stream-start
(funcall ys-stream) ;;> 2
(funcall ys-stream) ;;> 3
(funcall ys-stream) ;;> 4
(funcall ys-stream) ;;> 5
(funcall ys-stream) ;;> 6
(funcall ys-stream) ;;> stream-stop
Whether xs is a small or a huge list, we can control when each value is computed and thus avoid blocking, although the code is more verbose.
Let's apply this to our problem of generating the flex matching candidates. Finding all flex match substrings is roughly equivalent to printing out all binary strings of a certain length: each character of the name is either kept or dropped. With the help of inductive reasoning and recursion, we have the following implementation.
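Before the recursive version, the binary string analogy can be made literal: each candidate corresponds to a bitmask over the characters, where bit i set means character i is kept, so a name of length n has 2^n candidates. A small illustration (my/subsequence is a hypothetical helper, not used by the rest of the code):
(defun my/subsequence (text mask)
  "Return the characters of TEXT selected by the set bits of MASK."
  (let ((result ""))
    (dotimes (i (length text) result)
      (when (/= 0 (logand mask (ash 1 i)))
        (setq result (concat result (substring text i (1+ i))))))))
(my/subsequence "abc" #b101) ;;> "ac"
(expt 2 (length "emacs-lisp-mode")) ;;> 32768 candidates for our target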
(require 'dash)
(require 'transducer)
(defun fbo/text-forward-permutations (text)
(letrec ((recurser
(lambda (sub-text)
(cond
((string-empty-p sub-text) ;; Base cases
(list ""))
((= (length sub-text) 1)
(list sub-text ""))
((= (length sub-text) 2) ;; Optimized non-trivial case
(list
""
(substring-no-properties sub-text 0 1)
(substring-no-properties sub-text 1 2)
sub-text))
(t ;; Inductive case
(lexical-let* ((first-text
(substring-no-properties sub-text 0 1))
(rest-text
(substring-no-properties sub-text 1)))
(lexical-let* ((rest-permutations (funcall recurser rest-text)))
(append
(transducer-transduce
(transducer-map (-partial #'concat first-text))
(transducer-list-reducer)
rest-permutations)
rest-permutations))))))))
(funcall recurser text)))
This is the original implementation with transducers, which in this case act as an equivalent of mapcar. Take your time to grok it, or just use it. Now, try running (fbo/text-forward-permutations "emacs-lisp-mode") and tell me how long it took to complete. If it ran without blocking a bit, then I envy your hardware specs. Now let us convert the blocking lists to lazy streams.
(require 'dash)
(require 'stream)
(require 'transducer)
(defun fb/text-forward-permutations (text)
(letrec ((recurser
(lambda (sub-text)
(cond
((string-empty-p sub-text)
(stream-from-list (list "")))
((= (length sub-text) 1)
(stream-from-list (list sub-text "")))
((= (length sub-text) 2)
(stream-from-list
(list
""
(substring-no-properties sub-text 0 1)
(substring-no-properties sub-text 1 2)
sub-text)))
(t
(lexical-let* ((first-text
(substring-no-properties sub-text 0 1))
(rest-text
(substring-no-properties sub-text 1)))
(lexical-let* ((rest-permutations (funcall recurser rest-text))
(repeat-permutations (stream-copy 'empty rest-permutations))
(base-stream (car repeat-permutations))
(next-stream (cdr repeat-permutations)))
(stream-append
(transducer-transduce-stream
(transducer-map (-partial #'concat first-text))
base-stream)
next-stream))))))))
(funcall recurser text)))
Did you notice the difference? On the surface, it just wraps the list returns as streams. On a deeper look, there is an additional stream-copy call and its destructuring into base-stream and next-stream. This is a fundamental difference between lists and streams: lists can be consumed repeatedly while streams cannot. In the stream example above, once you reach stream-stop you cannot go back to the beginning, which implies that a stream can only be consumed once. So if you want to reuse a stream and its values, you have to copy it with stream-copy. The other difference is how they are consumed: lists are data structures while streams are function closures. Despite these two fundamental differences, it was easy to switch from the list to the stream implementation thanks to the abstraction of transducers.
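A quick sketch of this single-use behavior, assuming stream-to-list drains a stream as in the later examples and stream-copy behaves as it does in the generator above:
(require 'stream)
(setq s (stream-from-list (list 1 2 3)))
(stream-to-list s) ;; Drains the stream
(stream-to-list s) ;; Nothing left; the stream is already spent
;; To consume the same values twice, copy before consuming
(setq copies (stream-copy 'empty (stream-from-list (list 1 2 3))))
(stream-to-list (car copies)) ;; First traversal
(stream-to-list (cdr copies)) ;; Second traversal over the same values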
So we now have the first part of the algorithm; next, let's figure out how to get the candidates, or matches, for a given substring.
>> Transducers
So we have lazy streams, but that doesn't stop them from consuming a list of a million items and taking their sweet time mapping over it. Let's check out how to get the candidates of a flex match with smex, while suspending your disbelief about transducers for now.
(require 's)
(require 'dash)
(require 'stream)
(require 'transducer)
(defun fb/function-symbol-stream ()
(transducer-transduce-stream
(transducer-map #'car)
(stream-from-list smex-cache)))
(defun fb/flex-match-text (text)
(concat
(regexp-quote (string (aref text 0)))
(mapconcat
(lambda (c)
(concat "[^" (string c) "]*"
(regexp-quote (string c))))
(substring text 1) "")))
The main idea is to convert the smex-cache into a stream of function names with fb/function-symbol-stream, which is our source of truth. Of course, it is a stream that emits symbols, so we massage, or map, the smex-cache cons list a bit.
Now how do we filter for our candidates? We convert the search text into a regex with fb/flex-match-text, which I plundered from ido, and filter the source stream with it. To get a grasp of this, let's use elmo as a search text. (fb/flex-match-text "elmo") outputs e[^l]*l[^m]*m[^o]*o, which looks weird but does its job. Testing it out, (s-match "e[^l]*l[^m]*m[^o]*o" "emacs-lisp-mode") outputs ("emacs-lisp-mo"), so it works, hopefully.
So we now have a filter; all we need is a stream-filter. It would take just a few lines of code to implement. But if in the future we want to switch from a stream back to a list, because streams are a bit more complicated, we would have no reuse and would have to replace every instance of stream-filter and stream-map with -filter and -map. So how do we abstract the operations over the collection?
Transducers. If you have been using map, filter, or reduce in your code, then transducers allow this trio of operations on most collection data types, such as vectors, sequences and more. This is admittedly another abstraction over the collection that might not seem warranted given the verbosity of the code, but I assure you there is another benefit we can reap from it.
Well, I am not the authority on the topic; I just took it from Rich Hickey's explanation of Clojure transducers, and I might as well have used a Clojure-to-Emacs port instead of creating my own. As a budding lisper, I found the challenge of implementing both stream.el and transducer.el a way to improve my elisp coding and to understand transducers at the implementation level. Transducers are great, a nice thing to study and use, as well as an inspiration. Rolling with transducer.el, let's see how it is used.
I urge you to read about transducers in Clojure before continuing, since I might not do them justice. What I can show you is a comparison of the standard and the transducing way.
(require 'cl-lib)
(require 'dash)
(require 'transducer)
;; Mapping
(setq xs (list 1 2 3 4 5)
mapper #'number-to-string)
;; Standard mapping
(princ (-map mapper xs)) ;;> 1 2 3 4 5
;; Transducer mapping
(princ
(transducer-transduce
(transducer-map mapper)
(transducer-list-reducer)
xs))
;; Filtering
(setq ys (list 1 2 3 4 5)
filterer #'cl-oddp)
;; Standard filtering
(princ (-filter filterer ys)) ;;> 1 3 5
;; Transducer filtering
(princ
(transducer-transduce
(transducer-filter filterer)
(transducer-list-reducer)
ys))
;; Composition
(setq zs (list 1 2 3 4 5)
mapper #'1+
filterer #'cl-evenp)
;; Standard composition
;; We can't use `-compose'
(princ
(-filter
filterer
(-map
mapper
zs))) ;;> 2 4 6
;; Transducer composition
(princ
(transducer-transduce
(transducer-composes
(transducer-map mapper)
(transducer-filter filterer))
(transducer-list-reducer)
zs))
So with just plain mapping and filtering, the standard way seems better. Where transducers shine is in the composition of operations: the composed form is much more readable. This works for streams too, as promised, with a slight variation.
(setq xs (list 1 2 3 4 5)
xs-stream (stream-from-list xs)
mapper #'1+
filterer #'cl-evenp)
;; Transducer composition with a stream
(princ
(stream-to-list
(transducer-transduce-stream
(transducer-composes
(transducer-map mapper)
(transducer-filter filterer))
xs-stream)))
As you can see, it looks similar with streams, so everything carries over. Cool. But the most striking benefit of transducers is that they are lazy, or rather, more composable. As a lead, how about filtering a collection of numbers with two predicates?
(setq less-than-ten-p (lambda (n) (< n 10))
more-than-five-p (lambda (n) (> n 5))
xs (number-sequence 0 10))
(princ ;; Remember me?
(-filter
less-than-ten-p
(-filter
more-than-five-p
xs))) ;;> 6 7 8 9
What this does is filter the whole collection and then filter the smaller collection again. Of course, you could optimize this by composing the predicates so that it only filters once. With the composition of transducers and the laziness of streams, however, this optimization is already inherent. There is no need to tune how the operations interact; it is optimized most of the time. This performance benefit alone is why I am advocating transducers: the majority of the symbol stream can be skipped, avoiding heavy computation of candidates where possible. I think RxJS has a better explanation of how its API optimizes composed mappings and predicates.
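For comparison, here is the hand-composed version that walks the collection only once, which is effectively what transducer composition buys us without having to write the combined lambda ourselves:
(require 'dash)
(princ
 (-filter
  (lambda (n) (and (> n 5) (< n 10))) ;; One pass, one combined predicate
  (number-sequence 0 10))) ;;> 6 7 8 9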
So with that, here is the long awaited candidate rater.
(defun fb/rate-flex-match (search target)
(transducer-transduce-stream
(transducer-composes
(transducer-map-indexed #'cons)
(transducer-first
(lambda (pair)
(string-equal (cdr pair) target))))
(fb/flex-match-symbol-stream
search
(fb/function-symbol-stream))))
What this returns is a single-valued stream of a cons pair where the car is the index of the target function in the sorted candidate list generated by smex. The index will be our rating: if the target appears first in the list, the rating is 0. What we want is a rating of 0, but you can be a bit looser if you don't mind scrolling a bit to select the command. Since the return value is a stream, you have to unwrap it, preferably with stream-to-list, and take the first value; just take note.
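As a usage sketch, unwrapping the rater would look like this; the exact pair depends on your own smex-cache, so no output is shown:
;; The car of the resulting pair is the rating; 0 means the target tops the list
(car (stream-to-list (fb/rate-flex-match "elmo" "emacs-lisp-mode")))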
So let's put all of this to the test.
>> Checkpoint
With the generator and the rater, we can now put together working code.
;; Just a simple check if the target function exists.
(defun fb/function-symbol-p (name)
(not
(null ;; Since the result is a list, we have to unwrap it
(stream-to-list
(transducer-transduce-stream
(transducer-first (-partial #'string-equal name))
(fb/function-symbol-stream))))))
(defun fb/rate-flex-matcher (search-size target)
;; If the function does not exist, return a default stream value
(if (not (fb/function-symbol-p target))
(stream-stopped)
(lexical-let*
((rater (-rpartial #'fb/rate-flex-match target))
(rate-stream
(transducer-transduce-stream
(transducer-composes
(transducer-filter
(lambda (search)
(and
(not (string-empty-p search))
(<= (length search) search-size))))
;; My own filters
;; I prefer the first letter of the candidates be the same as the target
(transducer-filter
(lambda (search)
(string-equal
(substring-no-properties search 0 1)
(substring-no-properties target 0 1))))
;; No separators please
(transducer-filter
(lambda (search)
(not
(s-contains-p
"-"
search))))
;; End of my own filters
(transducer-map
(lambda (search)
(cons
search
(stream-to-list (funcall rater search)))))
(transducer-filter
(lambda (pair)
(and
(not (null (cdr pair))) ;; If a candidate was found
(= 0 (car (car (cdr pair))))))) ;; If the rating is 0
(transducer-map
(lambda (pair)
(car pair))))
(fb/text-forward-permutations target))))
rate-stream)))
A little mouthful here, but the outline is still the same: given a target function name, generate all possible substrings, rate them, filter by the best rating and report. The other thing here is search-size, which limits the length of the substrings to process. Ideally, you want to type as little as possible, so you should set it to around half the length of the target text, but it is open for configuration.
There are other optimizations on my part. One, I would like the candidate and target to have the same first letter as a hint. For example, with emacs-lisp-mode, I want my candidates to begin with e so I can easily remember them as natural. Two, I would like no command separator in the candidates because I don't type them. You can remove both filters if you like.
Regardless of the logic, let's try it with the long-awaited emacs-lisp-mode.
(require 'cl-lib)
(cl-prettyprint (stream-to-list (fb/rate-flex-matcher 3 "emacs-lisp-mode")))
("ema" ; Take your pick for =emacs-lisp-mode=
"emc"
"emp"
"eac"
"eas"
"eas"
"eam"
"ecs"
"ecs"
"ecm"
"ecd"
"esi"
"esp"
"esp"
"epm"
"epd")
;; Can we go lower?
(cl-prettyprint (stream-to-list (fb/rate-flex-matcher 2 "emacs-lisp-mode")))
nil ; No dice
Cool, it works. At this point, you can try this out and walk away. If you did try it, you might have noticed several garbage collections; set garbage-collection-messages to t to see how many. I don't know about you, but mine happened a lot. It didn't take too much time, but it is indicative of memory issues. Not a biggie, except that when it garbage collects, it blocks the UI, which was the whole point in the first place.
However, the one thing we want is for it not to block the UI, to be asynchronous... sort of. Time for the bonus round.
>> Promises
This is the last piece for building delayed and buffered streams, and this might get hairy as I am running out of documentation. The only assumption I make is that you know how to use promises. Read up on them if you don't, or we can just jump in.
Again, if we have a list of a million items, it would still take time to process all of it. What if we buffer it by time? Given a time period and a stream, accumulate values into a list until the period is over. This can be implemented like so.
(defun fb/stream-time-buffered (buffer-time stream)
(stream
(lambda (&rest args)
(lexical-let ((initial-value (apply stream args))
(done nil)
(now (current-time)))
(when (stream-start-value-p initial-value)
(setq initial-value (apply stream args)))
(if (stream-stop-value-p initial-value)
stream-stop
(promise
(lambda (res rej)
(lexical-let ((buffered-values (list initial-value))
(elapsed-time (float-time (time-subtract (current-time) now)))
(current-value initial-value))
(while (and (not done)
(< elapsed-time buffer-time))
;; Fetch the next value first so the initial value is not buffered twice
(setq current-value (apply stream args))
(if (stream-stop-value-p current-value)
(setq done t)
(push current-value buffered-values)
(setq elapsed-time (float-time (time-subtract (current-time) now)))))
(funcall res (reverse buffered-values))))))))))
Much like the implementation of a stream, but the key ideas are in buffered-values, elapsed-time and, of course, the promise. Let's see this with a huge list.
(setq range (number-sequence 1 100000)
stream (stream-from-list range)
period (/ 30.0) ; 30fps
buffered-stream (fb/stream-time-buffered period stream))
;; Your results might vary
(princ (funcall buffered-stream)) ;;> stream-start
(princ (funcall buffered-stream)) ;;> 1 ... 4928
(princ (funcall buffered-stream)) ;;> 4929 ... 9753
Ideally, the combination of this and run-with-idle-timer prevents blocking the UI. And here comes promise.el, which could be replaced with the more mature deferred elisp library; the idea is that the buffered stream returns a promise-like value that, when fulfilled, yields the list of accumulated values. Generalizing this:
(defun fb/async-write-stream (file stream)
(with-temp-file file
(erase-buffer))
(letrec ((async-stream
(fb/stream-time-buffered
fb/frame-rate
stream))
(recurser
(lambda ()
(let ((value (funcall async-stream)))
(when (stream-start-value-p value)
(setq value (funcall async-stream)))
(if (stream-stop-value-p value)
(message "Done")
(promise-then
value
(lambda (values)
(when values
(append-to-file
(concat
(string-join
values
"\n")
"\n")
nil
file))
(run-with-idle-timer
fb/frame-rate
nil
recurser))))))))
(funcall recurser)))
What this does is write the buffered values of a stream to a file. It is a bit rudimentary, since you can't stop it once started, but this is the main attempt to buffer and asynchronously write values... sort of. You don't need to understand it all, but the idea is: with the promise stream, you get one value, wait for it to fulfill, write the fulfilled values, wait for some delay to avoid blocking the UI, get another promise value and repeat until the stream is consumed. But let me try it for you.
(fb/async-write-stream
"~/result"
(fb/rate-flex-matcher 4 "emacs-lisp-mode"))
;; After you see the "Done" message
(insert-file-contents "~/result")
;; Output here
;; emac
;; emas
;; emas
;; emai
;; emai
;; emas
;; emap
;; emam
;; emao
;; emad
;; emad
;; emcs
;; emcs
;; emci
;; emci
;; emcs
;; emcp
;; emco
;; emco
;; emc
;; emcd
;; emsl
;; emsl
;; emsi
;; emsp
;; emsp
;; emlp
;; emlp
;; emis
;; emio
;; emio
;; emid
;; empm
;; empm
;; empo
;; emp
;; empd
;; eacs
;; eacs
;; eaci
;; eaci
;; eacs
;; eacp
;; eaco
;; eaco
;; eac
;; eacd
;; easl
;; easl
;; easi
;; easp
;; easp
;; easm
;; easo
;; eas
;; easd
;; eals
;; eals
;; ealm
;; eais
;; eais
;; eaim
;; easp
;; easp
;; easm
;; easo
;; eas
;; easd
;; ease
;; eapd
;; eapd
;; eamo
;; eam
;; eamd
;; eame
;; eaod
;; ecsl
;; ecsl
;; ecss
;; ecss
;; ecsp
;; ecsm
;; ecso
;; ecsd
;; ecsd
;; ecls
;; ecls
;; eclm
;; ecis
;; ecis
;; ecim
;; ecsp
;; ecsp
;; ecsm
;; ecso
;; ecs
;; ecse
;; ecse
;; ecpm
;; ecmo
;; ecmo
;; ecm
;; ecmd
;; ecme
;; ecod
;; ecd
;; esli
;; esli
;; esis
;; esis
;; esip
;; esim
;; esio
;; esi
;; esie
;; esie
;; espm
;; espm
;; espo
;; esp
;; espd
;; elis
;; elis
;; elid
;; elpm
;; elpm
;; eisp
;; eisp
;; eipm
;; espm
;; espm
;; espo
;; esp
;; espd
;; epmo
;; epmo
;; epm
;; epmd
;; epme
;; epod
;; epd
It sort of works if you try it. The only thing I can add is to raise the promise delay with a higher idle time, as well as account for garbage collection time. The idea of a non-blocking Emacs might not be far off, but it does take some effort. I do want to stress, for this final leg, that promises or deferred objects are used in conjunction with streams so that the computation time is not noticeable enough to appear blocking. Again, garbage collection looms if you have it turned on, and responsiveness takes a dip from time to time.
>> Notes
So we achieved our goal of finding the best candidates for flex matched commands, and along the way toyed with streams, transducers and a little bit of promises, with inspiration from Clojure and JavaScript. I feel the latter was the more fun part to experiment with further, but I have written too much now for such a little efficiency problem. Much ado about nothing.