cabal-dev is probably one of the most useful tools for serious Haskell work. It lets you choose where your packages get installed. By default, they go to ~/.cabal (you are cabal installing as a user, right?), which means that if package A and package B have incompatible dependencies, you’re straight out of luck. With cabal-dev, on the other hand, A and B each get their own cabal-dev directory, and the two never have to interact. Some packages end up compiled more than once, but the projects can never conflict again.

So far, so good. But what if I’m working on several packages with the same general set of dependencies? For example, while writing Milagos I wanted to update yesod-paginator, but I didn’t want to install yesod and all its dependencies again. Not only would that take a while, but I might not get the same versions that Milagos uses.

Fortunately, cabal-dev has a -s option that sets the location of the sandbox, which is where it stores its packages. So I made a ~/.sandboxes/yesod sandbox and an alias that runs cabal-dev -s ~/.sandboxes/yesod. The problem, of course, is that anything that calls cabal-dev on my behalf doesn’t go through the alias, so it gets the wrong sandbox.

This is why I wrote sandboxer, a small shell script that you source from your shell’s init file. It provides commands for initializing, activating, and deactivating sandboxes. Full instructions are in the repository’s README, but the gist is this: sandboxer init creates a sandbox, sandboxer activate switches the active sandbox, and sandboxer deactivate returns you to normal. Note that, by default, an explicit -s is overridden while a sandbox is active. I made this decision because haskell-mode explicitly sets the sandbox so that it works even from subdirectories.
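The override rule amounts to argument rewriting. The sketch below is a hypothetical Haskell model of that rule, not sandboxer’s actual code (sandboxer is a shell script). Given the active sandbox, if there is one, it drops any sandbox flag the caller passed and puts the active sandbox first. It assumes cabal-dev spells the flag either as -s DIR or as --sandbox=DIR and accepts it before the subcommand. The name effectiveArgs is made up for illustration.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical model of the "active sandbox wins" rule (not sandboxer's code).
-- Given the active sandbox, if any, and the arguments a caller passed to
-- cabal-dev, return the arguments that would actually be used.
effectiveArgs :: Maybe FilePath -> [String] -> [String]
effectiveArgs Nothing args = args -- no active sandbox: pass everything through
effectiveArgs (Just sandbox) args = "-s" : sandbox : stripSandbox args
  where
    -- Drop "-s DIR" and "--sandbox=DIR" (assumed spellings); keep the rest in order.
    stripSandbox ("-s" : _dir : rest) = stripSandbox rest
    stripSandbox (a : rest)
      | "--sandbox=" `isPrefixOf` a = stripSandbox rest
      | otherwise = a : stripSandbox rest
    stripSandbox [] = []
```

For example, effectiveArgs (Just "/sb") ["install", "-s", "x"] gives ["-s", "/sb", "install"]. The sandbox haskell-mode asked for is silently replaced by the active one.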

inotify is a great tool for monitoring changes to a directory. You specify the types of events you want to watch and then read from a file descriptor, which blocks until something interesting happens. The Haskell bindings, hinotify, replace this low-level approach with one where you register a callback for the event types you care about. In Haskell terms, hinotify provides initINotify :: IO INotify, which gives you a handle to a specific notifier. It also provides addWatch :: INotify -> [EventVariety] -> FilePath -> (Event -> IO ()) -> IO WatchDescriptor, which takes the types of events you want, the path to watch them on, and what to do when an event arrives.
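To make the callback style concrete, here is a minimal sketch that watches a single directory, runs an action, and returns the first event inotify reports. It assumes a newer hinotify release, where paths are RawFilePath (a strict ByteString) instead of the FilePath shown in the signature above. That is why the path goes through Data.ByteString.Char8.pack. The name firstEventDuring is a hypothetical helper, not part of hinotify.

```haskell
import Control.Concurrent.MVar (newEmptyMVar, takeMVar, tryPutMVar)
import Control.Monad (void)
import qualified Data.ByteString.Char8 as B
import System.INotify
import System.Timeout (timeout)

-- | Watch 'dir' for creations, modifications and deletions, run 'action',
-- and return the first event reported, or Nothing if none arrives within 1s.
-- Assumes a hinotify version whose paths are RawFilePath (ByteString).
firstEventDuring :: FilePath -> IO () -> IO (Maybe Event)
firstEventDuring dir action = withINotify $ \inotify -> do
  box <- newEmptyMVar
  -- hinotify runs this callback on its own thread; keep only the first event.
  _ <- addWatch inotify [Create, Modify, Delete] (B.pack dir) $ \ev ->
         void (tryPutMVar box ev)
  action
  timeout 1000000 (takeMVar box)
```

Because the callback runs on another thread, the MVar is a convenient way to hand the event back to the caller. Using tryPutMVar keeps later callbacks from blocking once the first event has been stored.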

It works great, but there’s one problem: it doesn’t nest. If I have foo/bar/baz and put a watch on foo, it doesn’t trigger when I touch baz. I assume this is a performance vs. power tradeoff. I’m not a Linux kernel developer by any stretch, so I won’t say whether it’s the right decision. But it does mean that I have to do something more interesting if I want Milagos to be notified when the user edits a post, so that it can reload it. In particular, I can’t just watch each existing subdirectory of posts/, because that won’t handle new ones. So here’s what I wound up doing. First, let’s build the low-level watcher that reloads the post database when a directory changes:

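changeEvents :: [EventVariety]
changeEvents = [Create, Delete, DeleteSelf, Modify, MoveIn, MoveOut, MoveSelf]
-- Watch posts/<dir> and reload the post database on any change inside it.
watchPost :: INotify -> FilePath -> IO ()
watchPost i dir =
  void $ addWatch i changeEvents ("posts" </> dir) (reload dir)
-- dbconf and pool come from the enclosing Yesod setup (see watchPosts below).
reload :: FilePath -> Event -> IO ()
reload dir ev = do
  putStrLn $ "Caught an event " ++ show ev ++ ", in " ++ dir ++ ", reloading"
  runPool dbconf reloadDB pool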

So we have a list of the events we care about, and a watchPost that takes a post’s directory name and registers a watcher on it. When the watcher fires, it prints the event and reloads the database. (The dbconf and pool stuff is specific to Yesod and not important.) The void :: (Functor f) => f a -> f () at the start of watchPost is needed because we’ll call watchPost from inside another watcher’s callback. That callback must have type Event -> IO (), but a fully applied addWatch gives back an IO WatchDescriptor.

Now that we’ve gotten that out of the way, we can set up the watchers themselves. First, grab an INotify:

  inotify <- initINotify

Next, watch posts/ itself, so that new post directories get watchers of their own:

  void . addWatch inotify [Create] "posts" $
    \ev -> when (isDirectory ev) $ watchPost inotify (filePath ev) >> runPool dbconf reloadDB pool

Here we only watch creation events, because inotify automatically removes the watch on a directory when that directory is deleted. If the created object is a directory, we watch it (EDIT: and reload the database in case files were created in it before the watch was in place, thanks lpsmith!). Otherwise we do nothing.
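One caveat with the handler above: isDirectory and filePath are record selectors that exist only on some Event constructors. The kernel can also deliver a few events regardless of the mask you asked for, such as Ignored when a watch is removed or QOverflow when the event queue overflows. Applying those selectors to such an event throws an exception. A total alternative pattern-matches on Created directly. This is a sketch. The name newSubdirectory is hypothetical, and the code assumes a newer hinotify release in which filePath is a RawFilePath (a ByteString).

```haskell
import qualified Data.ByteString.Char8 as B
import System.INotify

-- | The name of a newly created subdirectory, or Nothing for any other event,
-- including Ignored and QOverflow, which the kernel may send regardless of mask.
-- Assumes a hinotify version where filePath is a RawFilePath (ByteString).
newSubdirectory :: Event -> Maybe FilePath
newSubdirectory (Created True name) = Just (B.unpack name)
newSubdirectory _ = Nothing

-- The callback in watchPosts could then read:
--   \ev -> case newSubdirectory ev of
--            Just dir -> watchPost inotify dir >> runPool dbconf reloadDB pool
--            Nothing  -> return ()
```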

Finally, we need to set up a watch on every post directory that already exists:

  postDirectories <- liftIO . getDirectoryContents $ "posts"
  mapM_ (watchPost inotify) $ filter (`notElem` [".", ".."]) postDirectories

We get the list of post directories, filter out . and .., and set up watches for each of them.
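If posts/ might also contain stray files, a variant that keeps only directories may be safer. This sketch uses listDirectory from the directory package (available since directory 1.2.5), which already leaves out . and .., together with doesDirectoryExist. The name postDirectoriesIn is a hypothetical helper.

```haskell
import Control.Monad (filterM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- | Names of the subdirectories of 'root'. listDirectory already omits
-- "." and "..", and the filter drops plain files.
postDirectoriesIn :: FilePath -> IO [FilePath]
postDirectoriesIn root = do
  entries <- listDirectory root
  filterM (doesDirectoryExist . (root </>)) entries
```

With this helper, the last two lines of watchPosts could become postDirectoriesIn "posts" >>= mapM_ (watchPost inotify).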

All together, this is the code for watchPosts:

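import Control.Monad (void, when)
import Control.Monad.IO.Class (liftIO)
import System.Directory (getDirectoryContents)
import System.FilePath ((</>))
import System.INotify
-- runPool, reloadDB, dbconf and pool come from the Yesod/Persistent side of Milagos.
watchPosts dbconf pool = do
  inotify <- initINotify
  -- New post directories get a watcher of their own; then reload the database.
  void . addWatch inotify [Create] "posts" $
    \ev -> when (isDirectory ev) $ watchPost inotify (filePath ev) >> runPool dbconf reloadDB pool
  -- Watch every post directory that already exists.
  postDirectories <- liftIO . getDirectoryContents $ "posts"
  mapM_ (watchPost inotify) $ filter (`notElem` [".", ".."]) postDirectories
    where
      changeEvents = [Create, Delete, DeleteSelf, Modify, MoveIn, MoveOut, MoveSelf]
      watchPost :: INotify -> FilePath -> IO ()
      watchPost i dir =
        void $ addWatch i changeEvents ("posts" </> dir) (reload dir)
      reload :: FilePath -> Event -> IO ()
      reload dir ev = do
        putStrLn $ "Caught an event " ++ show ev ++ ", in " ++ dir ++ ", reloading"
        runPool dbconf reloadDB pool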

And people say Haskell can’t do systems programming!

Of course, this doesn’t work for files nested arbitrarily deep. You can extend the same technique, using watchers to create other watchers and relying on inotify to remove the watch on anything that gets deleted, to catch updates anywhere in a directory tree. It’s a fairly small modification: make your version of watchPost also watch its subdirectories recursively, and have it watch new subdirectories as they’re created.
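Here is one way the recursive version might look. It is a sketch rather than Milagos code. The names directoriesUnder and watchTree are hypothetical, and the code assumes a newer hinotify release with RawFilePath paths. It also skips error handling, protection against symlink loops, and directories that are moved in rather than created.

```haskell
import Control.Monad (filterM, void)
import qualified Data.ByteString.Char8 as B
import System.Directory
import System.FilePath ((</>))
import System.INotify

-- | 'root' and every directory beneath it. Follows symlinks, so a symlink
-- loop would recurse forever; this sketch does not guard against that.
directoriesUnder :: FilePath -> IO [FilePath]
directoriesUnder root = do
  children <- filterM doesDirectoryExist . map (root </>) =<< listDirectory root
  nested <- mapM directoriesUnder children
  return (root : concat nested)

-- | Watch 'dir' and every directory beneath it. 'onChange' runs on each event,
-- and directories created later are watched as they appear.
-- Assumes a hinotify version whose paths are RawFilePath (ByteString).
watchTree :: INotify -> IO () -> FilePath -> IO ()
watchTree inotify onChange dir = directoriesUnder dir >>= mapM_ watchOne
  where
    events = [Create, Delete, DeleteSelf, Modify, MoveIn, MoveOut, MoveSelf]
    watchOne d = void $ addWatch inotify events (B.pack d) (handler d)
    -- A new subdirectory: watch it (and anything already inside it), then react.
    handler d (Created True name) = do
      watchTree inotify onChange (d </> B.unpack name)
      onChange
    handler _ _ = onChange
```

Deleted directories need no special handling here, because inotify drops their watches on its own. That is the same property the posts/ watcher relies on.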

Also, if you trigger something like recompilation on changes to your source tree, make sure you don’t trigger on writes to auxiliary files, such as object files, that get created in the same directory you’re watching! You could instead modify the recursive watch technique to kill the watcher, reload your thing from disk, and then restart the watcher, but you’ll lose any notifications that arrive while it isn’t running. The failsafe approach is to ignore events that happen while your process is running, but then you have to track whether it’s running. There’s no good approach for this case, which is why I suspect it’s one you should configure your tool to avoid.
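For the recompilation case, the simplest guard is to classify each changed path before reacting to it. This is a sketch. The extension list is an assumption about a typical GHC build (object files, interface files, and their dynamic variants), so adjust it to whatever your build actually writes. The names isAuxiliary and onSourceChange are hypothetical.

```haskell
import System.FilePath (takeExtension)

-- | Assumed build artifacts that should never trigger a rebuild.
auxiliaryExtensions :: [String]
auxiliaryExtensions = [".o", ".hi", ".dyn_o", ".dyn_hi"]

-- | True for files a build writes next to the sources. A watcher that
-- recompiles on change should ignore these, or it will retrigger itself.
isAuxiliary :: FilePath -> Bool
isAuxiliary path = takeExtension path `elem` auxiliaryExtensions

-- | Run 'rebuild' only when the changed file is not a build artifact.
onSourceChange :: IO () -> FilePath -> IO ()
onSourceChange rebuild path
  | isAuxiliary path = return ()
  | otherwise = rebuild
```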

One of my goals with Milagos has always been to get me back in the habit of regularly posting to this blog. That’s why I made the new-post page as minimal as possible: just a title input, a tag input, and an HTML editor. But then I realized: I really hate typing sustained posts in browsers. I don’t know why, but I always feel more comfortable laying my thoughts out in emacs.

One of the other design principles I used for Milagos is the idea that, if there’s only one user, you shouldn’t need any more credentials than the ones you use to connect to the server it runs on. That’s why there’s only a password field in the config file and no username; there’s only one user, and it’s the person who has write access to the Milagos directory.

So then I figured, why not use that same idea for posts? Hakyll has you write posts in a subdirectory and then run the hakyll executable to generate static HTML, which you publish to your remote server. So I decided to take that idea, writing posts in a text editor and generating everything from a directory, and implement it! After about an hour of coding, Milagos now reads posts from the posts/ directory on startup, using a format similar to Octopress’s. Each post lives in a directory named like posts/2012-05-02-i-wrote-this-in-emacs/, which contains a meta.yml that looks like

title: I Wrote This In Emacs
slug: i-wrote-this-in-emacs
posted: 2012-05-02 18:00:00 EDT
tags:
 - milagos

and a post.markdown that contains the Markdown content of this post. Once I’m done with a post, I just restart the milagos daemon (EDIT May 7: I have an inotify-based autoloader so I don’t even have to do that) and it just shows up!
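The directory naming scheme is regular enough to take apart with a small pure function. This hypothetical splitPostDir is only an illustration of the convention (meta.yml carries the slug anyway). It assumes the date always occupies the first ten characters, in YYYY-MM-DD form.

```haskell
import Data.Char (isDigit)

-- | Split "YYYY-MM-DD-slug" into (date, slug). Only the shape of the date is
-- checked (digits and dashes), not whether it is a real calendar date.
splitPostDir :: String -> Maybe (String, String)
splitPostDir name =
  case splitAt 10 name of
    (date, '-' : slug) | validDate date && not (null slug) -> Just (date, slug)
    _ -> Nothing
  where
    validDate d = length d == 10 && and (zipWith matches "dddd-dd-dd" d)
    matches 'd' c = isDigit c
    matches sep c = sep == c
```

For example, splitPostDir "2012-05-02-i-wrote-this-in-emacs" gives Just ("2012-05-02", "i-wrote-this-in-emacs").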

Now, there’s another, very obvious solution here: why didn’t I just use Hakyll in the first place? Well, first, because that wouldn’t be fun. Second, because I wanted to kick the tires of Yesod, the web framework I’m building Milagos on. And third, because Hakyll can’t handle dynamic content. If I want to embed, say, my Twitter feed or a list of GitHub commits in the sidebar, I can’t do that. Milagos is a daemon, so it can do whatever it wants.

I would be lying if I said this didn’t have its downsides. Posts no longer have stable ID numbers, which basically forces me to get around to implementing proper slug URLs. I also had to learn things like the Haskell FFI so I could write bindings to discount (which I released on Hackage), so that I didn’t have to link against the GPL-licensed pandoc. There are also a bunch of edge cases I haven’t considered, like two posts with the same date and the same slug. Right now, that isn’t caught until someone tries to view one of those posts individually. I’d like to catch it at load time, preferably by making it violate a database constraint. But overall, I like this solution more than my old WordPress one: it’s easier to write posts, I tend to write more, and I actually understand how everything works.
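Catching the duplicate date-and-slug case at load time amounts to grouping posts by that pair and reporting any group with more than one member. Here is a sketch using Data.Map from the containers package. The Post type and its fields are hypothetical stand-ins for whatever Milagos actually loads. A database uniqueness constraint, as suggested above, would still be the more robust place to enforce this.

```haskell
import qualified Data.Map.Strict as Map

-- | Hypothetical stand-in for a loaded post.
data Post = Post
  { postDate :: String   -- e.g. "2012-05-02"
  , postSlug :: String   -- e.g. "i-wrote-this-in-emacs"
  , postDir  :: FilePath -- directory the post was loaded from
  } deriving (Show, Eq)

-- | Every (date, slug) pair claimed by more than one post, with the
-- directories that claim it, in load order. Empty means no collisions.
duplicatePosts :: [Post] -> [((String, String), [FilePath])]
duplicatePosts posts =
  [ (key, dirs) | (key, dirs) <- Map.toList grouped, length dirs > 1 ]
  where
    -- flip (++) appends later directories after earlier ones.
    grouped = Map.fromListWith (flip (++))
      [ ((postDate p, postSlug p), [postDir p]) | p <- posts ]
```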