Spam Learning Issues

Started by dcc24, February 20, 2008, 10:22:19 AM

dcc24

Hello Everyone (especially Matt).  Thanks again for your awesome toaster.  I've never seen anything that comes close to it!!! I have had my toaster up and running on FreeBSD 6.3 for a few weeks now and noticed a couple of things in particular about the Spam Learning that I was hoping you could comment on. 

First - I've got toaster-watcher.conf set to learn SPAM and HAM every day.  When it executes daily, the output below is typical of the cron job.

Quote
found 357 mailboxes.
Learned tokens from 0 message(s) (1 message(s) examined)
Learned tokens from 129 message(s) (841 message(s) examined)

The first problem is I don't have 357 mailboxes.  I turned debug on in the toaster and found this in my cron output:

Quote
logfile_append: opened /var/log/mail/learn.log for writing..........ok
    wrote 1 lines...................................................ok
learn_mailboxes: checks passed, getting ready to clean
Parsing through the file /var/qmail/users/assign...done.

get_maildir_paths: found 4 domains.
get_maildir_paths: processing domain1.com mailboxes.
get_maildir_paths: processing domain2.com mailboxes.
get_maildir_paths: processing domain2.com mailboxes.
get_maildir_paths: processing domain2.com mailboxes.
found 357 mailboxes.

The issue there is that get_maildir_paths processes domain2.com three times, so SPAM/HAM learning runs three times on that domain.  It does that because I have two other "alias domains" (created with ~vpopmail/bin/vaddaliasdomain) pointed to domain2.com.

    my @all_domains = $qmail->get_domains_from_assign(
        assign => "$qmaildir/users/assign",
        debug  => $debug,
        fatal  => $fatal,
    );

The call to get domains from qmail/users/assign returns all the domains, including the alias domains.  Matt, for your next update, could you add another function that returns domains without aliases (domains pointing to the same home directory) and use it in get_maildir_paths?  That would be awesome.
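
Something along these lines might do it. This is only a rough sketch: domains_without_aliases is a made-up name, not an existing Mail::Toaster function, and the sample assign lines are invented for illustration. It assumes each assign line has the usual colon-separated fields (address, user, uid, gid, home directory, ...) and that vpopmail puts the real domain name in the user field of an alias entry.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Toaster): given the lines of
# qmail's users/assign, return one domain per distinct home directory.
sub domains_without_aliases {
    my @lines = @_;
    my ( %seen_dir, @domains );
    for my $line (@lines) {
        chomp $line;
        next if $line eq '.';            # a lone dot ends the assign file
        next unless $line =~ /^[+=]/;    # only assignment lines
        my ( $addr, $user, $uid, $gid, $dir ) = split /:/, $line;
        next unless defined $dir;
        next if $seen_dir{$dir}++;       # home directory already counted
        # Assumption: the user field holds the real (non-alias) domain name.
        push @domains, $user;
    }
    return @domains;
}

# Invented sample data: two alias domains sharing domain2.com's directory.
my @assign = (
    '+domain1.com-:domain1.com:89:89:/usr/local/vpopmail/domains/domain1.com:-::',
    '+domain2.com-:domain2.com:89:89:/usr/local/vpopmail/domains/domain2.com:-::',
    '+alias-a.example-:domain2.com:89:89:/usr/local/vpopmail/domains/domain2.com:-::',
    '+alias-b.example-:domain2.com:89:89:/usr/local/vpopmail/domains/domain2.com:-::',
    '.',
);

print join( "\n", domains_without_aliases(@assign) ), "\n";
```

With the sample lines above, this prints domain1.com and domain2.com once each, so each Maildir tree would only be walked once.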

The next issue is that my toaster-ham-learn-me file is always empty.  This is always written out:

Quote
file_write: wrote 0 lines to /tmp/toaster-ham-learn-me..............ok

...
...
syscmd: running /usr/bin/nice /usr/local/bin/sa-learn --ham  -f /tmp/toaster-ham-learn-me
Learned tokens from 0 message(s) (1 message(s) examined)
   child exited with value 0

Here are my settings:

#######################################
#    SpamAssassin Message Learning    #
#######################################

maildir_learn_interval             = 1       # how many days between spam learning runs
maildir_learn_Spam                 = 1       # feed spam through sa-learn (SpamAssassin)
maildir_learn_Read                 = 1       # feed ham through sa-learn (SpamAssassin)
maildir_learn_Read_days            = 3       # only learn from messages older than x days


Here is the code in Toaster.pm at line 581:

    573     my $interval = $conf->{'maildir_learn_interval'} || 7;
    574     $interval = $interval + 2;
    575
    576     my $days = $conf->{'maildir_learn_Read_days'};
    577     if ($days) {
    578         print "learn_ham: learning read messages older than $days days.\n"
    579           if $debug;
    580         @files =
    581           `$find $path/Maildir/cur -type f -mtime +$days -mtime -$interval;`;
    582         chomp @files;
    583         $utility->file_write( append=>1, file => $list, lines => \@files, debug=>$debug );


When I run the following command manually on any maildir, based on my settings above, it always returns nothing:

find ~vpopmail/domains/domain2.com/user1/Maildir/cur -type f -mtime +3 -mtime -3

In English, this translates to "find all files more than 3 days old but less than 3 days old", which as expected returns nothing.  If I increase $interval, the command works:

find cur -type f -mtime +3 -mtime -5

This returns files that are more than 3 days old but less than 5 days old.  Anyway, I'm not entirely sure I've hit the mark, but instead of adding 2 to $interval in the code, could you not just add $days to $interval for the second argument?
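
To sanity-check the arithmetic, here is a small Perl sketch of the day windows.  It assumes find reduces each file's age to a whole number of 24-hour periods and compares strictly (+N means more than N, -N means less than N); exactly how the fraction is rounded differs between find implementations.  matching_ages is a made-up helper for illustration, not anything in Toaster.pm.

```perl
use strict;
use warnings;

# Model of "find -mtime +$min -mtime -$max": a file matches when its age
# in whole days is strictly greater than $min and strictly less than $max.
sub matching_ages {
    my ( $min, $max ) = @_;
    return grep { $_ > $min && $_ < $max } 0 .. 30;
}

my $days     = 3;    # maildir_learn_Read_days
my $interval = 1;    # maildir_learn_interval

my @windows = (
    [ 'current ($interval + 2)',     $days, $interval + 2 ],
    [ 'proposed ($days + $interval)', $days, $days + $interval ],
    [ 'manual test (+3 -5)',         3,     5 ],
);

for my $w (@windows) {
    my ( $label, $min, $max ) = @$w;
    my @ages = matching_ages( $min, $max );
    printf "%-30s -mtime +%d -mtime -%d => %s\n",
        $label, $min, $max, @ages ? join( ',', @ages ) : 'none';
}
```

If this model is right, the current code gives +3 -3 with my settings, which can never match.  With an interval of 1, $days + $interval gives +3 -4, which still matches nothing because both comparisons are strict, so the upper bound probably needs to be at least $days + 2 (the +3 -5 case, which matches files whose whole-day age is 4).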

Thanks again,