-------- Original Message --------
Subject: Re: very basic SA-Learn performance question: is 90 seconds or so per 
token really, really slow or roughly normal? NYTProf results TxRep.pm 1720440 
vs 1651114
From: RW <rwmailli...@googlemail.com>
To: users@spamassassin.apache.org
Date: Tue Nov 07 2017 03:44:50 GMT+0300 (AST)

> You can do that using multiple spamc processes with spamd, but it
> doesn't use any less resources, and the default back-end has a
> reader-writer lock. Training with sa-learn is usually not an issue compared 
> with
> high-volume autotraining (which is naturally parallel).
> 
> I'm using SA from the FreeBSD port on a low-end i5 from a few years
> back with the default Berkeley db on a fairly slow disk and I get 
> 20 emails/s through sa-learn (without learning to journal). However, I
> don't use TxRep.
> 
> Have you tried commenting-out the TxRep loadplugin line, to see what
> happens? 
> 
> In your original post it wasn't simply slow, if it had continued at
> ~1600/hour, it would have completed in a few hours, but it ground to a
> halt and apparently the Bayes database got corrupted. If TxRep is the
> source of the problem then disabling it on your server while you debug
> it separately would be sensible IMO.


The bayes database issue isn't really the hangup, though I understand it is a 
little unexpected.   They key hold up seems to be in Locker.pm - now Locker.pm  
- but with TxRep.pm 17200440 and not with TxRep.pm 1651114.  You might suggest 
using 1651114 that is included in the FreeBSD distro of SA, but the problem is 
that it generates these weird error messages - lots and lots of them 

Use of uninitialized value $msgscore in addition (+) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
Use of uninitialized value $msgscore in subtraction (-) at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.

 as reported in bug 7164 https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7164


Now with TxRep.pm 1720440, Locker.pm gets called 181799 times and consumes 
94,525 seconds http://www.blackrosetech.com/insecure/nytprof-txrep-1720440/

With TxRep.pm 1651114 Locker.pm gets called 33 times and consumes >6 seconds 
http://www.blackrosetech.com/insecure/nytprof-txrep-1651114/

What's locker doing?

It appears to my inexpert eye, that it is being called because a lock file 
can't be created in Mail-SpamAssassin-Locker-UnixSFSSafe.pm, lines 87 

 for (my $retries = 0; $retries < $max_retries; $retries++) {
 if ($retries > 0) { $self->jittery_one_second_sleep(); }  called 90,888 times 
(at roughly a second per call - 94,524 seconds total.

sub Mail::SpamAssassin::Locker::UnixNFSSafe::CORE:lstat  which got called 
188,049 times

Ok, so much for amateur sleuthing through the timing stats... why would 
TxRep.pm hold up the lock state?  

Seems to be in

line 1219 

# spent 96340s (471ms+96340) within 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation which was called 
468 times, avg 206s/call:
# 234 times (212ms+96340s) by Mail::SpamAssassin::Plugin::TxRep::learn_message 
at line 1846, avg 412s/call
# 234 times (259ms+-259ms) by Mail::SpamAssassin::Plugin::TxRep::forget_message 
at line 1861, avg 0s/call
sub check_senders_reputation {

and 1388
        
# spent 96330s (197ms+96330) within 
Mail::SpamAssassin::Plugin::TxRep::check_reputations which was called 3216 
times, avg 30.0s/call:
# 468 times (39.4ms+14778s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1376, avg 
31.6s/call
# 468 times (24.2ms+14662s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1284, avg 
31.3s/call
# 468 times (31.3ms+13799s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1360, avg 
29.5s/call
# 468 times (22.3ms+13734s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1358, avg 
29.3s/call
# 468 times (22.3ms+13711s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1351, avg 
29.3s/call
# 468 times (29.4ms+13623s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1348, avg 
29.1s/call
# 408 times (27.8ms+12024s) by 
Mail::SpamAssassin::Plugin::TxRep::check_senders_reputation at line 1354, avg 
29.5s/call
sub check_reputations {

Which seems to get hung at 

3216
  if ($self->open_storages()) {
  # spent 94571s making 3216 calls to 
Mail::SpamAssassin::Plugin::TxRep::open_storages, avg 29.4s/call

which seems to get hung at

1654
 $self->{checker} = $self->{default_storage} = 
$factory->new_checker($self->{main});
    # spent 94562s making 3216 calls to 
Mail::SpamAssassin::DBBasedAddrList::new_checker, avg 29.4s/call

Note this line is identical in 1651114 but was only called once:
1590
 $self->{checker} = $self->{default_storage} = 
$factory->new_checker($self->{main});
    # spent  4.18ms making 1 call to 
Mail::SpamAssassin::DBBasedAddrList::new_checker


The diff is 
https://svn.apache.org/viewvc/spamassassin/trunk/lib/Mail/SpamAssassin/Plugin/TxRep.pm?r1=1651114&r2=1720440

Here we can see the fix for the undefined messagescore

+#  {
+#    #Bug 7164, trying to find out reason for these: _WARN: Use of 
uninitialized value $msgscore in addition (+) at 
/usr/share/perl5/vendor_perl/Mail/SpamAssassin/Plugin/TxRep.pm line 1415.
+#    no warnings;
+#
+#    unless (defined $msgscore) {
+#      #Output some params and the calling function so we can identify more 
about this bug
+#      dbg("TxRep: MsgScore Undefined (bug 7164) - check_reputation 
Parameters: self: $self storage: $storage pms: $pms, key: $key, id: $id, ip: 
$ip, signedby: $signedby, msgscore: $msgscore");
+#      dbg("TxRep: MsgScore Undefined (bug 7164) - weight: $weight");
+#
+#      my ($package, $filename, $line) = caller();
+#
+#      chomp($package);
+#      chomp($filename);
+#      chomp($line);
+#
+#      dbg("TxRep: MsgScore Undefined (bug 7164) - Caller Info: Package: 
$package - Filename: $filename - Line: $line");
+#
+#      #Define $msgscore as a triage to hide warnings while we find the root 
cause
+#      #$msgscore = 0;
+#    }
+#  }

and a little later

+           $pms->set_tag('TXREP'.$tag_id,              sprintf("%2.1f", 
$delta));
         } elsif (defined $self->total()) {
-            $delta = ($self->total() + $msgscore) / (1 + $self->count()) - 
$msgscore;
+            #Bug 7164 - $msgscore undefined
+            if (defined $msgscore) {
+              $delta = ($self->total() + $msgscore) / (1 + $self->count()) - 
$msgscore;
+            } else {
+              $delta = ($self->total()) / (1 + $self->count());
+            }
 
-            $pms->set_tag('TXREP_'.$tag_id,             
sprintf("%2.1f",$delta));
+            $pms->set_tag('TXREP_'.$tag_id,             sprintf("%2.1f", 
$delta));

most of the rest are formatting mods or additional comments except....

this per bug 7191
-  return 1 unless (!defined $self->{default_storage});
+  # disabled per bug 7191
+  #return 1 unless (!defined $self->{default_storage});

and this typo correction

-       if (ref($factory) =~ /SQLasedAddrList/) {
+       if (ref($factory) =~ /SQLBasedAddrList/) {

Tried reintroducing the typo above, no change (seemed like a likely culprit 
with file locking and all)

Tried uncommenting the line commented out for 7191 (line 1633) and got full 
performance back.  And no 1415 errors, though bug 7191 seems like a real 
problem:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7191

results with that line uncommented:
 # perl -T -d:NYTProf /usr/local/bin/sa-learn --spam --no-sync 
/mail/blackrosetech.com/gessel/.ManJunk/{cur,new}
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
Subroutine NetAddr::IP::STORABLE_freeze redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 365.
Subroutine NetAddr::IP::STORABLE_thaw redefined at 
/usr/local/lib/perl5/site_perl/mach/5.24/NetAddr/IP.pm line 377.
bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/BayesStore/DBM.pm line 208.
locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due 
to expiry at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219.
Learned tokens from 235 message(s) (235 message(s) examined)
locker: error accessing /var/amavis/.spamassassin/bayes.lock: No such file or 
directory at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 190.
locker: safe_unlock: lock on /var/amavis/.spamassassin/bayes.lock was lost due 
to expiry at 
/usr/local/lib/perl5/site_perl/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 219.

ooops, a few file lock problems, but no streams of line 1415 errors and 
performance:


1592 seconds (6.7 sec/message)
http://www.blackrosetech.com/insecure/nytprof-txrep-1720440-L1633m/

about half the speed of TxRep.pm 1651114 but it would seem that there are at 
least two meaningful bug fixes included.

I updated https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7191

 

Reply via email to