-------- Original Message -------- Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440 vs 1651114 From: RW <rwmailli...@googlemail.com> To: users@spamassassin.apache.org Date: Tue Nov 07 2017 03:44:50 GMT+0300 (AST)
> You can do that using multiple spamc processes with spamd, but it > doesn't use any less resources, and the default back-end has a > reader-writer lock. Training with sa-learn is usually not an issue compared > with > high-volume autotraining (which is naturally parallel). > > I'm using SA from the FreeBSD port on a low-end i5 from a few years > back with the default Berkeley db on a fairly slow disk and I get > 20 emails/s through sa-learn (without learning to journal). However, I > don't use TxRep. > > Have you tried commenting-out the TxRep loadplugin line, to see what > happens? > > In your original post it wasn't simply slow, if it had continued at > ~1600/hour, it would have completed in a few hours, but it ground to a > halt and apparently the Bayes database got corrupted. If TxRep is the > source of the problem then disabling it on your server while you debug > it separately would be sensible IMO. The bayes database issue isn't really the hangup, though I understand it is a little unexpected. They key hold up seems to be in Locker.pm - now Locker.pm - but with TxRep.pm 17200440 and not with TxRep.pm 1651114. You might suggest using 1651114 that is included in the FreeBSD distro of SA, but the problem is that it generates these weird error messages - lots and lots of them Use of uninitialized value $msgscore in addition (+) at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415. Use of uninitialized value $msgscore in subtraction (-) at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415. as reported in bug 7164 https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164 Now with TxRep.pm 1720440, Locker.pm gets called 181799 times and consumes 94,525 seconds http://www.blackrosetech.com/insecure/nytprof-txrep-1720440/ With TxRep.pm 1651114 Locker.pm gets called 33 times and consumes >6 seconds http://www.blackrosetech.com/insecure/nytprof-txrep-1651114/ What's locker doing? It appears to my inexpert eye, that it is being called because a lock file can't be created in Mail-SpamAssassin-Locker-UnixSFSSafe.pm, lines 87 for (my $retries = 0; $retries < $max_retries; $retries++) { if ($retries > 0) { $self->jittery_one_second_sleep(); } called 90,888 times (at roughly a second per call - 94,524 seconds total. sub Mail::SpamAssassin::Locker::UnixNFSSafe::CORE:lstat which got called 188,049 times Ok, so much for amateur sleuthing through the timing stats... why would TxRep.pm hold up the lock state? Seems to be in line 1219 # spent 96340s (471ms+96340) within Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 468 times, avg 206s/call: # 234 times (212ms+96340s) by Mail::SpamAssassin::Plugin::TxRep::learn_message at line 1846, avg 412s/call # 234 times (259ms+-259ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message at line 1861, avg 0s/call sub check_senders_reputation { and 1388 # spent 96330s (197ms+96330) within Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 3216 times, avg 30.0s/call: # 468 times (39.4ms+14778s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1376, avg 31.6s/call # 468 times (24.2ms+14662s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1284, avg 31.3s/call # 468 times (31.3ms+13799s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1360, avg 29.5s/call # 468 times (22.3ms+13734s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1358, avg 29.3s/call # 468 times (22.3ms+13711s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1351, avg 29.3s/call # 468 times (29.4ms+13623s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 29.1s/call # 408 times (27.8ms+12024s) by Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1354, avg 29.5s/call sub check_reputations { Which seems to get hung at 3216 if ($self->open_storages()) { # spent 94571s making 3216 calls to Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 29.4s/call which seems to get hung at 1654 $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main}); # spent 94562s making 3216 calls to Mail::SpamAssassin::DBBasedAddrList::new_checker, avg 29.4s/call Note this line is identical in 1651114 but was only called once: 1590 $self->{checker} = $self->{default_storage} = $factory->new_checker($self->{main}); # spent 4.18ms making 1 call to Mail::SpamAssassin::DBBasedAddrList::new_checker The diff is https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?r1=1651114&r2=1720440 Here we can see the fix for the undefined messagescore +# { +# #Bug 7164, trying to find out reason for these: _WARN: Use of uninitialized value $msgscore in addition (+) at /usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415. +# no warnings; +# +# unless (defined $msgscore) { +# #Output some params and the calling function so we can identify more about this bug +# dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: $ip, signedby: $signedby, msgscore: $msgscore"); +# dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight"); +# +# my ($package, $filename, $line) = caller(); +# +# chomp($package); +# chomp($filename); +# chomp($line); +# +# dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: $package - Filename: $filename - Line: $line"); +# +# #Define $msgscore as a triage to hide warnings while we find the root cause +# #$msgscore = 0; +# } +# } and a little later + $pms->set_tag('TXREP'.$tag_id, sprintf("%2.1f", $delta)); } elsif (defined $self->total()) { - $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore; + #Bug 7164 - $msgscore undefined + if (defined $msgscore) { + $delta = ($self->total() + $msgscore) / (1 + $self->count()) - $msgscore; + } else { + $delta = ($self->total()) / (1 + $self->count()); + } - $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f",$delta)); + $pms->set_tag('TXREP_'.$tag_id, sprintf("%2.1f", $delta)); most of the rest are formatting mods or additional comments except.... this per bug 7191 - return 1 unless (!defined $self->{default_storage}); + # disabled per bug 7191 + #return 1 unless (!defined $self->{default_storage}); and this typo correction - if (ref($factory) =~ /SQLasedAddrList/) { + if (ref($factory) =~ /SQLBasedAddrList/) { Tried reintroducing the typo above, no change (seemed like a likely culprit with file locking and all) Tried uncommenting the line commented out for 7191 (line 1633) and got full performance back. And no 1415 errors, though bug 7191 seems like a real problem: https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7191 results with that line uncommented: # perl -T -d:NYTProf /usr/local/bin/sa-learn --spam --no-sync /mail/blackrosetech.com/gessel/.ManJunk/{cur,new} Subroutine NetAddr::IP::STORABLE_freeze redefined at /usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365. Subroutine NetAddr::IP::STORABLE_thaw redefined at /usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377. Subroutine NetAddr::IP::STORABLE_freeze redefined at /usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365. Subroutine NetAddr::IP::STORABLE_thaw redefined at /usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377. bayes: bayes db version 0 is not able to be used, aborting! at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/BayesStore/DBM.pm line 208. locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due to expiry at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219. Learned tokens from 235 message(s) (235 message(s) examined) locker: error accessing /var/amavis/.spamassassin/bayes.lock: No such file or directory at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 190. locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due to expiry at /usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219. ooops, a few file lock problems, but no streams of line 1415 errors and performance: 1592 seconds (6.7 sec/message) http://www.blackrosetech.com/insecure/nytprof-txrep-1720440-L1633m/ about half the speed of TxRep.pm 1651114 but it would seem that there are at least two meaningful bug fixes included. I updated https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7191