SpamAssassin

From The Network People, Inc. - Wiki

SpamAssassin web site.

SpamAssassin is installed by running:

toaster_setup.pl -s filter

If you opted not to install it, you can just update your toaster-watcher.conf file and install it at any later time by running:

toaster_setup.pl -s spamassassin


SpamAssassin SQL integration

To allow virtual users (as in Mail Toaster) to have custom SpamAssassin preferences, you must set up SQL integration and store their user preferences in MySQL. See the SA per-user preferences page for details.

As of v3.02, you can also configure SpamAssassin to store its auto-whitelist and Bayes databases in SQL. Set the appropriate options in toaster-watcher.conf (install_spamassassin_sql, install_spamassassin_dbuser, and install_spamassassin_dbpass) and run the SpamAssassin setup command again. It will generate a custom sql.cf that points SpamAssassin at the database, as described in the SpamAssassin docs.
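For reference, a sql.cf covering all three stores (per-user preferences, auto-whitelist, and Bayes) might look roughly like the output of the sketch below. This is an assumption about its general shape, not the file toaster_setup.pl actually writes: the option names are SpamAssassin 3.x settings, while the DSN (a MySQL database called spamassassin on localhost), the user name, the password, and the helper name sa_sql_cf are placeholders. Option and module names vary between SpamAssassin releases, so check the documentation for the version you run.

```shell
#!/bin/sh
# Hypothetical helper: print a sql.cf sketch for SpamAssassin 3.x.
# $1 = SQL user, $2 = SQL password, $3 = DBI DSN (placeholder default).
sa_sql_cf() {
  db_user="$1"
  db_pass="$2"
  dsn="${3:-DBI:mysql:spamassassin:localhost}"
  cat <<EOF
# Per-user preferences (for virtual users)
user_scores_dsn                 $dsn
user_scores_sql_username        $db_user
user_scores_sql_password        $db_pass

# Auto-whitelist stored in SQL
auto_whitelist_factory          Mail::SpamAssassin::SQLBasedAddrList
user_awl_dsn                    $dsn
user_awl_sql_username           $db_user
user_awl_sql_password           $db_pass

# Bayes database stored in SQL
bayes_store_module              Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn                   $dsn
bayes_sql_username              $db_user
bayes_sql_password              $db_pass
EOF
}
```

For example, sa_sql_cf sauser secret > /tmp/sql.cf.sketch writes a copy you can compare against the generated file (sauser and secret are placeholders).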

If you are converting to SQL-based Bayes storage, you probably want to preserve your existing Bayes database. To do so, export it to a backup file and then import that file into your SQL tables. First, sync and export the current database:

sa-learn --sync
sa-learn --backup > backupfile

Then update your SpamAssassin config file to use the SQL tables and run this:

sa-learn --restore backupfile
sa-learn --sync
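The two halves of the migration can be wrapped in small functions that stop if the backup looks wrong, so you do not switch to SQL with nothing to restore. This is a sketch: the function names and the SA_LEARN override (handy for a dry run) are hypothetical, and you still edit the SpamAssassin config between the two calls.

```shell
#!/bin/sh
# Hypothetical wrappers around the sa-learn steps above.
# SA_LEARN can point at another binary, e.g. for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"

# Step 1: sync and export the file-based Bayes DB to the file named in $1.
bayes_export() {
  backup="$1"
  "$SA_LEARN" --sync || return 1
  "$SA_LEARN" --backup > "$backup" || return 1
  # An empty backup means nothing was exported; don't switch to SQL yet.
  if [ ! -s "$backup" ]; then
    echo "backup $backup is empty" >&2
    return 1
  fi
}

# Step 2 (after the config points at SQL): import $1 and sync.
bayes_import() {
  backup="$1"
  if [ ! -s "$backup" ]; then
    echo "backup $backup is missing or empty" >&2
    return 1
  fi
  "$SA_LEARN" --restore "$backup" || return 1
  "$SA_LEARN" --sync
}
```

Usage: run bayes_export /root/bayes.backup, update the SpamAssassin config, then run bayes_import /root/bayes.backup (the backup path is only an example).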

How do I keep SpamAssassin up to date?

On FreeBSD, you update your ports tree and then use portupgrade to upgrade SpamAssassin and all the ports it depends upon:

toaster_setup.pl -s ports
/usr/local/etc/rc.d/sa-spamd.sh stop
portupgrade -frR p5-Mail-SpamAssassin-X.XX
qmail cdb
/usr/local/etc/rc.d/sa-spamd.sh start

Replace X.XX with the version of SpamAssassin you have installed. Run "pkg_info | grep spam" to find it.
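If you would rather not look up X.XX by hand, the upgrade steps can be scripted. This is a sketch that assumes the older pkg_info package tools and the same rc.d script path used in the commands above; the function names are hypothetical. It keeps going after a failed portupgrade so that spamd is started again either way.

```shell
#!/bin/sh
# Print the first p5-Mail-SpamAssassin-<version> package name read on stdin
# (the version must start with a digit, so other p5-Mail-SpamAssassin-*
# packages are skipped).
sa_pkg_name() {
  awk '$1 ~ /^p5-Mail-SpamAssassin-[0-9]/ { print $1; exit }'
}

# Hypothetical wrapper for the upgrade sequence shown above.
sa_upgrade() {
  pkg=$(pkg_info | sa_pkg_name)
  [ -n "$pkg" ] || { echo "SpamAssassin package not found" >&2; return 1; }
  toaster_setup.pl -s ports
  /usr/local/etc/rc.d/sa-spamd.sh stop
  status=0
  portupgrade -frR "$pkg" || status=$?
  # Run the remaining steps even if the upgrade failed,
  # so spamd is not left stopped.
  qmail cdb
  /usr/local/etc/rc.d/sa-spamd.sh start
  return "$status"
}
```

Note that the package name is read before the upgrade, so it names the version currently installed, which is what the manual portupgrade command expects.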


Why is SpamAssassin so slow?

First, you need to understand what a "normal" processing time is. On my server, I do DCC, Pyzor, Razor2, and the rest of the "normal" checks. With RBL checking on, processing a message takes a couple of seconds. With RBL checks disabled, message processing drops to less than a second. This is on a pretty old server (dual PIII 650) with 1GB of RAM.
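To get comparable numbers for your own server, you can time a single message through the running spamd. This sketch uses spamc, the spamd client, so Perl startup time is not counted. The helper name and the SPAMC override are hypothetical, sample.eml stands for any saved message, and the timing only has one-second resolution.

```shell
#!/bin/sh
# Hypothetical helper: wall-clock seconds spamd takes to check one message.
SPAMC="${SPAMC:-spamc}"

sa_time_msg() {
  start=$(date +%s)
  # -c prints only score/threshold; spamc exits 1 for spam, so ignore status
  "$SPAMC" -c < "$1" > /dev/null || true
  end=$(date +%s)
  echo "$((end - start))"
}
```

Usage: sa_time_msg sample.eml. Run it a few times with different messages, since a single result can be skewed by caching.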

If your processing time is far from the figures above, the cause is probably one of the following:

DNS timeouts

By default, SpamAssassin does RBL lookups against several blacklists, so make sure your DNS resolver is fast and reliable. If you are already doing RBL checks at the SMTP level, you may want to disable the RBL lookups in SpamAssassin. You can do that by adding "skip_rbl_checks 1" to /usr/local/etc/mail/spamassassin/local.cf.
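A sketch of making that change from a script, so that rerunning it does not add duplicate lines. The function name is hypothetical. It rewrites the file through a temporary copy because BSD and GNU sed disagree on the -i option, and it writes back with cat so local.cf keeps its ownership and mode. Afterwards, spamassassin --lint will catch typos, and spamd must be restarted to pick up the change.

```shell
#!/bin/sh
# Hypothetical helper: set "skip_rbl_checks 1" exactly once in local.cf.
# Pass another path as $1 to try it on a copy first.
sa_skip_rbl() {
  cf="${1:-/usr/local/etc/mail/spamassassin/local.cf}"
  tmp="$cf.tmp.$$"
  # Copy every line except an existing skip_rbl_checks setting
  if [ -f "$cf" ]; then
    grep -v '^[[:space:]]*skip_rbl_checks' "$cf" > "$tmp" || true
  else
    : > "$tmp"
  fi
  echo 'skip_rbl_checks 1' >> "$tmp"
  # Write back in place so local.cf keeps its owner and permissions
  cat "$tmp" > "$cf" && rm -f "$tmp"
}
```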

If an RBL that SA is trying to contact is down, its lookups may also hit an extended timeout. Disabling the RBL checks is a good way to diagnose this. I run all the RBL checks at the SMTP level so I can deny the messages and avoid having to process them. However, I find it useful to have SpamAssassin do the lookups as well: I want the RBLs it uses to count toward scoring, and since I've already done a lookup at the SMTP level, the result will already be cached locally.
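To check whether a particular list is slow or down, you can time a lookup by hand. DNSBLs are queried by reversing the octets of the IPv4 address and appending the list's zone, and 127.0.0.2 is the conventional test address that most lists report as listed. The function names and the bl.example.org zone are placeholders; substitute a list you actually use.

```shell
#!/bin/sh
# Build a DNSBL query name:
#   192.0.2.1 + bl.example.org -> 1.2.0.192.bl.example.org
rbl_name() {
  echo "$1" | awk -F. -v zone="$2" '{ print $4 "." $3 "." $2 "." $1 "." zone }'
}

# Time one lookup of the 127.0.0.2 test entry in zone $1 (whole seconds).
rbl_time() {
  name=$(rbl_name 127.0.0.2 "$1")
  start=$(date +%s)
  # host may exit non-zero (e.g. if the name does not exist);
  # only the timing matters here
  host "$name" > /dev/null 2>&1 || true
  end=$(date +%s)
  echo "$name: $((end - start))s"
}
```

Usage: rbl_time bl.example.org. A lookup that takes several seconds, or times out, points at that list or at your resolver.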

Firewall blocking

SpamAssassin can use DCC, Razor2, and Pyzor. Those modules must contact network servers. If your firewall blocks those connections, each blocked check hangs SA until it times out, about 5 seconds per check. So if processing a message takes 11 seconds when it should take about 1, you are likely blocking a couple of checks. I added the following rules to my IPFW firewall:

# Allow DCC (6277) & Pyzor (24441)
${fwcmd} add allow udp from ${oip} to any 6277,24441
${fwcmd} add allow udp from any 6277,24441 to ${oip} 1024-65535
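The arithmetic in the 11-second example generalizes: subtract the time you expect from the time you observe and divide by the roughly 5-second timeout per blocked check. A small helper for that, with a hypothetical name and the 5-second figure as an overridable default:

```shell
#!/bin/sh
# Hypothetical helper: estimate how many network checks are timing out.
# $1 = observed seconds, $2 = expected seconds,
# $3 = timeout per blocked check (default 5)
blocked_checks() {
  extra=$(( $1 - $2 ))
  # Faster than expected means nothing is being blocked
  [ "$extra" -gt 0 ] || extra=0
  echo $(( extra / ${3:-5} ))
}
```

For example, blocked_checks 11 1 prints 2, matching the estimate above.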

Inadequate CPU

If I were going to recommend hardware, I'd steer anyone toward dual-processor systems. I can't tell you why, but a dual-processor box scales much more smoothly than a single-processor one. I've got lots of experience with single- and dual-processor systems ranging from 300MHz to 3.5GHz, and the dual-processor systems always handle heavy loads more gracefully. Seriously. I would choose one dual-processor system over two comparable single-processor ones. Check out this article on the SpamAssassin site.

Slow Disks

The mail scanning process generates a lot of disk I/O. Simscan reads in the message, writes it to disk, breaks it into pieces (for attachment scanning), virus scans it all, and passes it through SpamAssassin; once SpamAssassin returns it, simscan hands it off to qmail-queue. It does all of that during the SMTP conversation, before returning a success or failure indication. That's why you'll see the folks at SpamAssassin recommending a dedicated disk.