Project Script Loader

>> Problem

One of my early Emacs hacks and studies was a project script loader: when I open a project, say a Java or NodeJS one, I want to load a custom script specific to that project. Since I am using projectile, it was a great learning experience in understanding symbols and lambdas.

The Emacs way of loading project-specific configuration is through .dir-locals.el, which has an unfortunately elaborate structure to fill out. Emacs is a Lisp interpreter, so this mechanism guards against evaluating malicious code: in a multi-user setup, someone could tamper with any automatic eval mechanism to cause harm. Remember the saying, eval is evil.

A stronger and more direct reason to use such a mechanism is to avoid tangling your global configuration with project-specific configuration. Keep the business and pure logic separate, goes the mantra. My use is with this blog: when I want to write, I want to load the org-jekyll-blogger.el setup, but not tangle the environment when I'm not.

Although it is the builtin and preferred mechanism, it is slightly frustrating to debug whether your script is being run or whether the variables are set. Not to mention it is harder to edit. Sometimes you just want to write the code and not worry about security; trust is overrated.
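For reference, here is a minimal sketch of the structure in question. The modes, variables and values are illustrative assumptions, not taken from any real project:

```elisp
;; .dir-locals.el -- illustrative contents only
;; An alist of (MODE . ((VARIABLE . VALUE) ...)); a MODE of nil applies to every buffer.
((nil . ((indent-tabs-mode . nil)
         (fill-column . 80)))
 (java-mode . ((c-basic-offset . 4)))
 ;; `eval' entries run arbitrary code, so Emacs asks for confirmation first
 (emacs-lisp-mode . ((eval . (message "Project locals applied")))))
```

Note that the file holds data rather than a program: the only way to run code from it is through the eval entries.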

Since the project library projectile does not offer this simple yet security riddled functionality, it leaves the ecosystem to fill in that blank. Packages offering this mechanism already exist such as defproject but it is simple to write the code without relying on a third-party package.

For the impatient, here is the complete snippet; for the curious, let's discuss aspects of it.

>> Project Library

I am using projectile as my project management library. For this task, we need two functions from it, plus an optional third:

projectile-project-p : This tells us if the current buffer is in a project.

projectile-project-root : This gives us the current project root the current buffer is in.

projectile-project-name : The optional third function, this prettifies the file path into a more debuggable name.

If you aren't using projectile, Emacs has a builtin library, vc, that is tied closely to a VCS. Here is an example of finding the project root of the current buffer with it:

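(require 'cl) ;; For `lexical-let' (obsolete since Emacs 27)
(require 'vc-git)
(require 'vc-svn)
(require 'vc-hg)

(defun vcs-project-root ()
  (lexical-let ((current-file (buffer-file-name)))
    (when current-file ;; Buffers not visiting a file have no root
      (or (vc-git-root current-file)
          (vc-svn-root current-file)
          (vc-hg-root current-file)))))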

This is rather primitive, but it works if the project is under version control with git, svn or hg. Thankfully, projectile does not make this assumption; it finds the root by looking for key files that signify a project, such as .git, pom.xml or others. The builtin vc does not cut it; a more useful builtin function for finding the project root is locate-dominating-file. This function takes a file path and a file name and, starting at the file path, walks up the parent directories until it finds one containing the file name. If we assume a project is a directory that contains a .project.el file, here is the snippet for this:

(defun locate-project-root ()
  (locate-dominating-file (buffer-file-name) ".project.el"))

projectile also has its own copy of this function to avoid depending on files.el, where the original comes from. If you have multiple key files aside from .project.el, it is better to write a more performant version of this, since you do not want to traverse the disk several times.
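One way to get that single traversal, as a sketch rather than what projectile actually does: locate-dominating-file also accepts a predicate in place of the file name, so every key file can be checked in the same walk up the tree. The my/ names and the list of key files are assumptions for illustration:

```elisp
(require 'seq)

;; Hypothetical list of key files; adjust to taste.
(defvar my/project-marker-files '(".project.el" ".git" "pom.xml" "package.json")
  "File names whose presence marks a directory as a project root.")

(defun my/locate-project-root (&optional start)
  "Return the nearest ancestor of START holding a key file, or nil.
START defaults to `default-directory'."
  (locate-dominating-file
   (or start default-directory)
   ;; Given a function instead of a file name, `locate-dominating-file'
   ;; calls it on each candidate during a single walk upward.
   (lambda (dir)
     (and (file-directory-p dir)
          (seq-some (lambda (marker)
                      (file-exists-p (expand-file-name marker dir)))
                    my/project-marker-files)))))
```

Calling (my/locate-project-root) from a buffer inside a project returns the first directory upward that holds any of the key files.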

The lesson in this is that this job is better left to a library.

>> Core

With this, we implement it quite easily:

(defun fn/load-project-file ()
  "Loads the `fn/project-file' for a project.
This is run once after the project is loaded signifying project setup."
  (interactive)
  (when (projectile-project-p) ;; Check if buffer is in a project
    (lexical-let* ((current-project-root (projectile-project-root))
        (project-init-file (expand-file-name ".project.el" current-project-root)))
      (when (file-exists-p project-init-file) ;; Check if project script exists
        (message "Loading project init file for %s" (projectile-project-name)) ;; Some extra logging
        (condition-case ex ;; Load it
            (load project-init-file t)
          (error ;; Report the error; the condition name is not quoted
           (message "There was an error loading %s: %s" project-init-file (error-message-string ex))))))))

(add-hook 'find-file-hook #'fn/load-project-file)

During my early writing, there was a bug where loading the project file would trigger the find-file-hook endlessly. Thankfully, that subtle issue no longer exists, but either way it is simple to write.
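For a sense of what such a project script might hold, here is a hypothetical .project.el. The my-project/ prefix and the npm test command are assumptions for illustration:

```elisp
;; .project.el -- hypothetical contents, placed at the project root
;; Everything here is evaluated by `load', so ordinary top-level forms work.

(defun my-project/run-tests ()
  "Run this project's test suite from the project root."
  (interactive)
  (let ((default-directory (projectile-project-root)))
    (compile "npm test"))) ;; Illustrative command

(message "my-project helpers are ready")
```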

>> Symbols Or Lambdas

This quick implementation triggers the project configuration each time a file in the project is opened. What we want is for each main project script to be loaded once, not every time. Running on every file is right for the project locals but not for the main project script.

Thinking functionally, this is memoization of the main loader. We shiv a quick memoization function:

(defun fn/memoize (fn)
  (lexical-let ((fn fn)
      (cache-table (make-hash-table :test 'equal))
      (missing (make-symbol "missing"))) ;; Unique marker so a cached nil still counts as a hit
    (lambda (&rest args) ;; Assuming `args' can be used with the hash function `equal'
      (lexical-let ((cached-value
           (gethash args cache-table missing)))
        (if (not (eq cached-value missing))
            cached-value
          (lexical-let ((computed-value (apply fn args)))
            (puthash args computed-value cache-table)
            computed-value))))))

(defun fib (n)
  (pcase n
    ((or 1 2) 1)
    (_ (+ (fib (- n 1))
          (fib (- n 2))))))

(lexical-let ((fib-memoized (fn/memoize #'fib)))
  (mapcar
   (lambda (n)
     (funcall fib-memoized n))
   (list 3 5 15 30 3 30)))

Not the best implementation, but notice we had to use let, funcall and apply to use it instead of just the usual function invocation. This syntactic mismatch, or hoop, is the difference between symbols and lambdas. The memoization function returns a lambda; if we want to use it as a function, we need to use defun or fset.

(fset 'what-symbol-name (fn/memoize #'fib))

This raises the question: what symbol name should you use? Generated or clobbered? Managing symbols is another task, but if we just ignore this issue and plug in a lambda for a hook, we get this:

(add-hook
 'find-file-hook
 (lambda () ;; Written hastily, not representative
   (lexical-let ((wrapped-func ;; We are wrapping `fn/load-project-file' since it needs to take an argument
        (fn/memoize
         (lambda (project-file)
           (fn/load-project-file)))))
     (funcall wrapped-func (buffer-file-name)))))

Looks ugly, doesn't it? What I find uglier is what ends up in find-file-hook:

;; (cl-prettyprint find-file-hook)
((lambda nil
   (progn
     (defvar --cl-wrapped-func--)
     (let ((--cl-wrapped-func-- (fn/memoize (function
                                             (lambda (project-file)
                                               (fn/load-project-file))))))
       (funcall (symbol-value '--cl-wrapped-func--)))))
 recentf-track-opened-file
 auto-insert
 whitespace-turn-on-if-enabled
 global-command-log-mode-check-buffers
 projectile-find-file-hook-function
 #[0 "\302\301!\210\303\304!8\211\207" [buffer-file-name auto-revert-tail-pos make-local-variable 7 file-attributes] 3]
 global-visual-line-mode-check-buffers
 auto-compile-on-save-mode-check-buffers
 url-handlers-set-buffer-mode
 global-font-lock-mode-check-buffers
 epa-file-find-file-hook
 vc-refresh-state
 fn/load-project-file
 fn/load-project-local-file
 which-func-ff-hook
 org-jekyll-blogger--find-file-hook)

Do you see the lambda standing out from the rest of the symbols? This happens because anonymous functions are represented as a closure object. Here lies the crossroad of being functional in Emacs: symbols over lambdas.

To express this notion, let me craft a different form for memoization.

>>> Wrapped Symbol

As contrary as this is, it is better to write the wrapped function as another separate function using defun.

(defvar fn/loaded-projects (list))

(defun fn/wrapped-load-project-file ()
  ;; There is a bug here, can you figure it out?
  (lexical-let* ((project-root (projectile-project-root))
      (loaded-project (member project-root fn/loaded-projects)))
    (if loaded-project
        nil
      (fn/load-project-file)
      (add-to-list 'fn/loaded-projects project-root))))

(add-hook 'find-file-hook #'fn/wrapped-load-project-file)

So this version looks a little cleaner, but it exposes an extra internal variable, fn/loaded-projects, and an excess wrapper for fn/load-project-file. This is contrary to hiding state in the functional style.

Strangely, this is easier to debug and test. If you wanted to test the wrapped function, you set fn/loaded-projects to nil or a value and repeat the test; this is harder to do with a closure. If a bug is in fn/wrapped-load-project-file, you simply reevaluate the function without having to clean or replace the hook value; with a closure, you have a bugged and patched hook coexisting.

I intentionally left a bug in fn/wrapped-load-project-file to demonstrate this. Patch then eval the new version. I don't have to think about the hook management.

(defun fn/wrapped-load-project-file ()
  (when (projectile-project-p) ;; `projectile-project-root' needs a project first
    (lexical-let* ((project-root (projectile-project-root))
        (loaded-project (member project-root fn/loaded-projects)))
      (if loaded-project
          nil
        (fn/load-project-file)
        (add-to-list 'fn/loaded-projects project-root)))))

(setq fn/loaded-projects (list)) ;; If you want to reset its state

This correctly loads fn/load-project-file once. What does this tell us anyway?

>>> Symbols Over Function

Am I saying that when I want to memoize a function, I need to create an extra variable and wrapper and expose state? Not really. You can create a memoizing macro that uses defun or fset to alleviate this, as with emacs-memoize. In being simple, revealing state and using symbols seems to be the way to go.

I struggled with this at first, preferring closures, but it does feel cleaner and simpler, especially in the context of Emacs. Since everything is extensible (exposing and manipulating state, declaring and advising private functions), hiding things in Emacs seems to counter the notion of customization.

The notion of encapsulation is not disregarded but rather not preferred. Since (almost) everything is found via describe-function or describe-variable, being open is really the way to go. If you find pain in writing more code for a repeating concept, ask whether abstracting the state and logic is worth it in simplicity and extensibility.

I can't speak for Scheme or Clojure; for me, Emacs has changed my hand and mind in writing lisp.

>> Security

We now shiv a final feature that asks permission or trust in loading the project files. Let us create a secure wrapper for fn/load-project-file:

(defvar fn/loaded-projects (list)
  "Projects that have been loaded by `fn/load-project-file'.")

(defun fn/safe-load-project-file ()
  ;; Similar to `fn/wrapped-load-project-file' but ...
  (when (projectile-project-p)
    (lexical-let ((project-root (projectile-project-root))
        (project-name (projectile-project-name)))
      (when (not (member project-root fn/loaded-projects))
        (if (not (fn/safe-project-p project-root)) ;; ... asks permission first
            (message "Project script for %s is not trusted." project-name)
          (fn/load-project-file)
          (add-to-list 'fn/loaded-projects project-root))))))

What we are left with is implementing fn/safe-project-p. The question is: what scheme? A simple scheme is just to use yes-or-no-p:

(defun fn/safe-project-p (project-root)
  (yes-or-no-p
   (format "Do you trust the project at %s?" project-root)))

It is as simple as that. Or you can be more complex and check the last modified time, an expiration period, user ownership and whatnot. For me, I simply ask permission as well, but allow for a deeper setup if needed, which I am not going to show. I do want to show how to get the last modified time:

(file-attribute-modification-time ;; file-attribute-* and its company
 (file-attributes
  user-emacs-directory))

Do explore the other functions such as file-attribute-user-id for getting file attributes.
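As a taste of where these accessors could lead, here is a rough sketch under assumptions, not the deeper setup mentioned above. It trusts a project script only if the current user owns it and it has not changed since a given time. The name my/project-file-trusted-p and its trusted-since argument are made up for illustration:

```elisp
(defun my/project-file-trusted-p (file trusted-since)
  "Return non-nil if the current user owns FILE and it is unchanged.
Unchanged means not modified after TRUSTED-SINCE, a Lisp timestamp
such as one returned by `current-time'."
  (let ((attributes (file-attributes file)))
    (and attributes ;; nil when FILE does not exist
         ;; Owner must be the user running Emacs
         (equal (file-attribute-user-id attributes) (user-uid))
         ;; Modification time must not be later than TRUSTED-SINCE
         (not (time-less-p trusted-since
                           (file-attribute-modification-time attributes))))))
```

A stricter variant could also compare file-attribute-modes, but ownership plus modification time already catches a script that someone else edited after you approved it.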

>> Persistence

If you use this snippet, you might get annoyed when Emacs restarts and asks permission again for projects you allowed previously. What I am talking about is persistence, which is a tricky subject in itself. Several tricks exist for this:

Customize Mechanics : But what if you don't use a custom-file?

p-cache : Too rich for my blood

Write the lisp object to file : A bit low level
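For comparison, the low-level option is roughly the following sketch. The my/ function names are hypothetical, and it assumes the object has a printed form that read can parse back, which holds for lists, strings, symbols and numbers:

```elisp
(defun my/save-lisp-object (object file)
  "Write OBJECT to FILE in a form that `read' can parse back."
  (with-temp-file file
    (let ((print-length nil) ;; Never abbreviate long lists with "..."
          (print-level nil))
      (prin1 object (current-buffer)))))

(defun my/load-lisp-object (file)
  "Return the Lisp object stored in FILE, or nil if FILE does not exist."
  (when (file-exists-p file)
    (with-temp-buffer
      (insert-file-contents file)
      (read (current-buffer)))))
```

You would still have to decide when to call both functions, which is the part that makes it low level.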

A simpler builtin mechanism exists: savehist. It primarily works for lispy data, and it saves that data to the file named by savehist-file. I love the ease of use: just add the variable symbol to savehist-additional-variables. To demonstrate, here is a modified fn/safe-project-p:

(defvar fn/checked-projects (list))

(defun fn/safe-project-p (project-root)
  (lexical-let ((checked-project
       (or
        (cdr (assoc project-root fn/checked-projects)) ;; Maybe 'trusted or 'untrusted
        'unchecked)))
    (pcase checked-project
      ('trusted t)
      ('untrusted nil)
      ('unchecked ;; If the project hasn't been trusted yet
       (lexical-let ((trusted
            (yes-or-no-p
             (format "Do you trust the project at %s?" project-root))))
         (add-to-list
          'fn/checked-projects
          (cons project-root (if trusted 'trusted 'untrusted)))
         trusted))))) ;; Return the answer, not the updated list

(with-eval-after-load 'savehist
  (add-to-list 'savehist-additional-variables 'fn/checked-projects))

The variable fn/checked-projects is an association list mapping the project root string to the symbol 'trusted or 'untrusted, and it is what we want to persist. As I mentioned, all you have to do is add it to savehist-additional-variables and our preference is persisted without any fuss. Nothing much more to say, but if you want more details about this simple persistence, see SaveHist.

>> Conclusion

After all that, the code is still simple to hack without needing to rely on other packages. Getting work done is more important than worrying about setup and security, though those are still valid concerns.