MBOX Corpus Training
Some corpora are stored as maildir, others as mbox. If yours is an mbox file, you can train dspam with it like this:
cat ham|formail -s dspam --source=corpus --class=innocent
formail breaks the input stream into individual messages and runs dspam once for each message. Learning is much faster if you use the daemon/client configuration.
I would like to see dspam handle an mbox stream natively. mbox is not a complicated file format, and native handling would run a lot faster than starting dspam once per message. This is an area where SpamAssassin beats dspam.
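The formail pipeline above can be wrapped in a small function so that ham and spam corpora are trained the same way. This is only a sketch: the function name train_mbox is made up for illustration, and it assumes dspam runs as the user whose data should be trained.

```shell
# Sketch: feed an mbox corpus to dspam one message at a time.
# Usage: train_mbox CLASS MBOX_FILE   (CLASS is "innocent" or "spam")
# train_mbox is a hypothetical helper name, not part of dspam.
train_mbox() {
    class=$1
    mbox=$2
    if [ ! -r "$mbox" ]; then
        echo "cannot read mbox: $mbox" >&2
        return 1
    fi
    # formail -s splits the stream and runs the command once per message.
    formail -s dspam --source=corpus --class="$class" < "$mbox"
}
```

For example, `train_mbox innocent ham` repeats the command shown earlier, and `train_mbox spam spam` does the same for a spam corpus (assuming the spam mbox file is named spam).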
Converting MBOX to maildir
Note: An easier way to convert mbox to maildir is the mb2md package (available at http://batleth.sapienti-sat.org/projects/mb2md/ and included with many distributions).
--FrankLuithle
I used mutt to convert the mboxes to maildir format. First, add
set mbox_type=maildir
to your .muttrc or .mutt/muttrc. Then open your mbox file:
mutt -f ./my_mbox_ham
Tag all messages by pressing T, entering a single dot (.) as the pattern, and pressing <Enter>. Then copy the tagged messages to a new folder with ;C ./ham_maildir.
Afterwards, ./ham_maildir/new/ contains the messages as individual files.
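Before training, it can be worth checking that the conversion copied every message. The sketch below compares the number of "From " separator lines in the mbox with the number of files in the maildir. The function names are made up for illustration, and the mbox count assumes that body lines starting with "From " are escaped (as ">From "), which is usual but not guaranteed.

```shell
# Count messages in an mbox by counting "From " separator lines.
# Assumes "From " lines inside message bodies are escaped.
count_mbox() {
    grep -c '^From ' "$1" || true
}

# Count message files in a maildir (both new/ and cur/).
count_maildir() {
    find "$1/new" "$1/cur" -type f 2>/dev/null | wc -l | tr -d ' '
}

# Print both counts and succeed only if they agree.
check_conversion() {
    mbox_count=$(count_mbox "$1")
    maildir_count=$(count_maildir "$2")
    echo "mbox: $mbox_count, maildir: $maildir_count"
    [ "$mbox_count" -eq "$maildir_count" ]
}
```

For example, `check_conversion ./my_mbox_ham ./ham_maildir` returns a non-zero status if the two counts differ.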
After populating both the ham and spam maildirs, run dspam_train, replacing user with the dspam username to train:
dspam_train user spam_maildir/new ham_maildir/new
Enjoy!
