Introduction

The maildir MH project aims to build a command-line email interface, similar to MH-mail, which uses qmail maildirs as folders.

Why would anyone do that? Simple! Maildirs are absolutely reliable as email folders: messages can't get lost or corrupted when properly stored in maildirs. Most folder formats, including mbox and MH, can become corrupted, and messages can be lost. Switching to maildir eliminates those problems.

MH

MH is a powerful mail handling package originally created by the RAND Corporation. Currently, nmh (New MH) is maintained as an open source project.

The MH philosophy is to break mail processing down into individual, simple programs: one to list messages, one to show a message, one to compose a reply, etc. Since the programs are called from the user's shell, MH is really a powerful message-processing language. You can use loops, other programs, etc., to process mail messages in sophisticated ways.

For more information, you can read the classic "How to Process 200 Messages a Day and Still Get Some Real Work Done", available from this site.

Also, anybody who has a nifty MH trick can send it to me, and I'll post it here.

Maildir

Maildir is Dan Bernstein's super-reliable folder format. It is completely collision free, and messages cannot be lost or clobbered. Plus, there are no locks: so no deadlock and no annoying delays.

Maildirs offer what MH promises, but doesn't quite deliver: concurrent folder access. You can run two mailers at once, or a mailer and 10,000 little mail processing programs, without clobbering messages. Under MH, concurrent delivery can sometimes make your mailer forget which messages you were reading, and (worse) can sometimes lead to clobbered messages.

You can read the basic maildir spec at Dan's website. You might also want to check out safecat, which performs maildir delivery. Using safecat you can receive your mail into maildirs, no matter which mail transfer software is used at your host.
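
To give a feel for why delivery needs no locks, here is a minimal Python sketch of a maildir delivery, following the usual recipe from the maildir spec: write the message under tmp, then link it into new. It is not safecat's code; the function name deliver and the exact unique-name layout are assumptions made for illustration.

```python
import itertools
import os
import socket
import time

# Per-process counter, so two deliveries in the same second get distinct names.
_counter = itertools.count()


def deliver(maildir, message_bytes):
    """Deliver one message into `maildir` and return its path under new/."""
    # The maildir spec asks for "/" and ":" in the host name to be escaped.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    # Hypothetical unique-name layout: time, pid plus counter, host.
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes the create fail instead of overwriting an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(message_bytes)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the bytes reach the disk first

    # link() refuses to replace an existing name, so nothing gets clobbered.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```

A reader never looks in tmp, so it only ever sees complete messages appearing in new.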

MDMH

Consider the possibilities: maildirs offer a fully concurrent folder structure. That means you can be reading your email while another program is delivering messages into the same folder! With most mail readers, that's a terrible problem: you must decide whether to apply changes such as message deletions from the current session, which blows away any new messages, or to accept the new messages, which blows away the deletions. In other words, you have no choice but to let your mail reader handle your folder completely.

With maildir, you can automatically sort incoming mail into folders; it doesn't matter if you're reading that folder at the moment. You can leave a mail reader running at work, go home, and dial in to finish reading your email. The next day you will not be faced with lost or corrupted messages because you had two sessions open.

But I said maildirs are fully concurrent. That means that any number of other processes can also access your mail folder, without causing collisions. You can have an archiver which indexes email from the boss. You can have a cleaner which archives and deletes old email. You can have any number of automatic email processing programs, plus any number of open mailreader sessions, all running at once, and all unable to step on each other's toes. I mean, think about it!

Okay, so why a command-line interface, of all things? Well, two reasons. First, a command-line interface is actually quite handy if you simply want to check a message or dash off a quick reply, especially over a slow dialup link. Second, a good command-line interface should be readily embedded in other applications. For example, the first application of mdmh which I envision is an emacs mailreader which uses maildirs as folders.

Mailing List

Anybody interested in developing mdmh should join the mdmh mailing list. Just send an empty email to mdmh-subscribe@mail.core-tech-systems.com, and follow the directions in the reply.

Download the Prototype

Okay, "prototype" is a real misnomer. However, just to get the ball rolling, I've produced some perl scripts. Anyone who wants to add to them, send a patch, or write some C code...hint, hint...

The latest version of mdmh is 0.7, created on Saturday, April 21.

mdscan

Lists the messages in a maildir. For each message, it prints a message number, the sender, and the subject line.
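
For the curious, here is roughly what such a listing involves, as a Python sketch rather than the actual perl script. It assumes the numbering comes from sorting cur ahead of new in dictionary order, and the function name scan is made up for this example.

```python
import os
from email.parser import BytesHeaderParser


def scan(maildir):
    """Return (number, sender, subject) for each message, numbered from 1."""
    paths = []
    # Sorting full paths in dictionary order puts everything in cur before new.
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        paths.extend(os.path.join(folder, name) for name in sorted(os.listdir(folder)))

    parser = BytesHeaderParser()  # reads only the header block, not the body
    rows = []
    for number, path in enumerate(paths, start=1):
        with open(path, "rb") as handle:
            headers = parser.parse(handle)
        rows.append((number, headers.get("From", ""), headers.get("Subject", "")))
    return rows
```

Something like `for row in scan("Maildir"): print("%4d  %-30s %s" % row)` would then print one line per message.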

New in version 0.7: mdscan now prints an 'N' flag next to files less than 10 minutes old, and an 'n' next to files less than three hours old. That way the cautious can avoid radical operations on new files. Also added a manpage, which among other things explains why that's a good thing.
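
The flag rule is simple enough to write down. This Python sketch assumes the age is measured from the file's modification time, which the description above does not actually specify; age_flag is a hypothetical name.

```python
import time

NEW_SECONDS = 10 * 60          # under 10 minutes old: 'N'
RECENT_SECONDS = 3 * 60 * 60   # under three hours old: 'n'


def age_flag(mtime, now=None):
    """Return 'N', 'n' or ' ' for a file whose modification time is `mtime`."""
    if now is None:
        now = time.time()
    age = now - mtime
    if age < NEW_SECONDS:
        return "N"
    if age < RECENT_SECONDS:
        return "n"
    return " "
```

A caller would pass in `os.stat(path).st_mtime` for each message file.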

mdshow

Shows a message from a maildir. It uses the same message numbers generated by mdscan to select messages. After showing a message, it marks the message as "seen".
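
How "seen" gets recorded is not spelled out above. The maildir convention is to rename the file so that the info part after ":2," contains an S, and the Python sketch below assumes mdshow does something along those lines. The helper name add_flag is made up.

```python
import os


def add_flag(path, flag):
    """Rename a message so its info part includes `flag`; return the new path."""
    folder, name = os.path.split(path)
    maildir = os.path.dirname(folder)
    # Files in new/ normally have no ":2," part yet; partition copes with both.
    unique, _, info = name.partition(":2,")
    # The maildir convention keeps flag letters in ASCII order.
    flags = "".join(sorted(set(info) | {flag}))
    # Flagged messages belong in cur/, so a message in new/ moves as well.
    new_path = os.path.join(maildir, "cur", unique + ":2," + flags)
    if new_path != path:
        os.rename(path, new_path)
    return new_path
```

Because the change is a single rename, another process scanning the folder sees either the old name or the new one, never a half-updated message.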

mdrmm

Marks a message for deletion. This command only marks the message; you should then run mdscan to see that the correct message was marked for deletion.

mdundel

Unmarks messages marked with mdrmm. In the event you marked the wrong messages, use mdundel to "undelete" them before calling mdpurge.

mdpurge

Cleans out deleted messages. By default, mdpurge uses my fwipe program to do a secure deletion; this protects against the use of undelete utilities.
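
Here is how the mark, unmark and purge cycle could look in Python. It assumes the deletion mark is the maildir T ("trashed") flag and that marked messages live in cur, neither of which is stated above. It also uses a plain os.remove where mdpurge would call fwipe. Both function names are hypothetical.

```python
import os


def set_trashed(path, trashed):
    """Add or remove the T flag on a message in cur/; return the new path."""
    folder, name = os.path.split(path)
    unique, _, info = name.partition(":2,")
    flags = set(info) - {"T"}
    if trashed:
        flags.add("T")
    new_path = os.path.join(folder, unique + ":2," + "".join(sorted(flags)))
    if new_path != path:
        os.rename(path, new_path)
    return new_path


def purge(maildir):
    """Remove every message in cur/ that carries the T flag; return their names."""
    cur = os.path.join(maildir, "cur")
    removed = []
    for name in sorted(os.listdir(cur)):
        _, sep, info = name.partition(":2,")
        if sep and "T" in info:
            # Plain removal; a secure wipe would go here instead.
            os.remove(os.path.join(cur, name))
            removed.append(name)
    return removed
```

Marking and unmarking are both just renames, so nothing is lost until purge runs.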

mdchk

Lists new messages in a maildir. As it lists messages, it moves them from "new" to "cur". Output looks just like mdscan. Because it moves files, it will only show messages you haven't been notified about in some way.
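
The move from "new" to "cur" is the interesting part, so here is a Python sketch of just that step. It is an illustration under my own assumptions, not the perl script itself, and collect_new is a made-up name.

```python
import os


def collect_new(maildir):
    """Move every message from new/ to cur/; return the paths that were moved."""
    new_dir = os.path.join(maildir, "new")
    cur_dir = os.path.join(maildir, "cur")
    moved = []
    for name in sorted(os.listdir(new_dir)):
        # Names in cur/ carry an info part, empty at first: "unique:2,".
        target_name = name if ":2," in name else name + ":2,"
        target = os.path.join(cur_dir, target_name)
        try:
            os.rename(os.path.join(new_dir, name), target)
        except FileNotFoundError:
            # Another process moved this message first, so it is theirs to report.
            continue
        moved.append(target)
    return moved
```

Only the process whose rename succeeds reports a given message, which is why each message is announced once.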

mdbiff

Similar to mdchk, but meant to run non-interactively. For example, mdbiff might play a sound when new mail is detected. It also moves messages from "new" to "cur", so mdbiff will only alert you to messages that mdchk hasn't already listed, and mdchk will only show you messages that mdbiff didn't alert you to.

They work a little like the corresponding MH programs, but with no features. You can download the scripts (about 16K) and see what I mean. You also need to download and install Dan Bernstein's mess822 package for the scripts to work.

mdscan and mdshow work the same way. They list the contents of MAILDIR/new and MAILDIR/cur, and sort the files into dictionary order. Then mdscan prints a summary of each message, while mdshow picks out the message specified by number and pipes it through less. There are two main reasons that this is not satisfactory:

  1. Every message in MAILDIR/cur precedes every message in MAILDIR/new. Normally that's just right, but suppose that messages are being moved from new to cur in random order. Two successive runs of mdscan will reshuffle the messages that were in new. Now imagine you then called mdrmm with a number from the first run! You might delete the wrong message.

  2. Directory scans are slow. Suppose that your maildir contains thousands of messages; mdscan would start with a long delay while it sorted the directory contents, and then would experience per-message delays while it peeked inside each one for date, sender and subject.

As I see it, the first question is: how do we assign an ID to a message? It should be short enough for humans to use, yet should point to a certain message unambiguously. Fiddling with messages should not change the meaning of the ID; ideally it should at most cause a "no such message" error.
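
One possible answer, offered purely as a sketch and not as anything mdmh does today: derive a short ID from a hash of the unique part of the filename, which stays the same when a message moves from new to cur or gains flags. The names message_id and resolve, and the six-character length, are assumptions.

```python
import hashlib
import os


def message_id(filename, length=6):
    """Short ID taken from the part of the name that never changes."""
    unique = os.path.basename(filename).split(":", 1)[0]
    return hashlib.sha1(unique.encode("utf-8")).hexdigest()[:length]


def resolve(maildir, wanted):
    """Return the path of the message whose ID is `wanted`, or raise LookupError."""
    matches = []
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        for name in os.listdir(folder):
            if message_id(name, len(wanted)) == wanted:
                matches.append(os.path.join(folder, name))
    if not matches:
        raise LookupError("no such message")
    if len(matches) > 1:
        # Short prefixes can collide; the user would need to give a longer ID.
        raise LookupError("ambiguous ID")
    return matches[0]
```

With this scheme, fiddling with a message can at worst produce a "no such message" error; it never makes the ID point at a different message. The price is that lookup still scans the directory.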

I think the second question is: how do we index message metadata? We can store sender, subject, and possibly ID in some sort of file; is that the best way? Ideally, the worst that should happen when two processes access a maildir is that metadata might be out of date, requiring a reindexing operation. Even better would be if we can prevent that problem entirely.
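
To make the question concrete, here is a Python sketch of one indexing approach: keep a dictionary keyed by the unique part of each filename, and only open messages that are not in it yet. How the dictionary is stored on disk is left out, and refresh_index is a hypothetical name.

```python
import os
from email.parser import BytesHeaderParser


def refresh_index(maildir, index):
    """Return an updated copy of `index`, keyed by each message's unique name."""
    parser = BytesHeaderParser()
    fresh = {}
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        for name in os.listdir(folder):
            unique = name.split(":", 1)[0]
            entry = index.get(unique)
            if entry is None:
                # Not indexed yet: read just the headers of this one message.
                try:
                    with open(os.path.join(folder, name), "rb") as handle:
                        headers = parser.parse(handle)
                except FileNotFoundError:
                    continue  # renamed or purged by another process meanwhile
                entry = {
                    "from": headers.get("From", ""),
                    "subject": headers.get("Subject", ""),
                }
            fresh[unique] = entry
    # Entries for messages that no longer exist are simply not carried over.
    return fresh
```

Since sender and subject never change once a message is delivered, a stale index can only be missing entries or holding extra ones, and one more call repairs both.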

 



Len Budney
lbudney@pobox.com
Copyright © 1998 - 2004
Page generated: 20:02:02 21-Dec-2004