I recently started testing SA 2.62 with ActiveState Perl 5.8.2-808 on Windows XP. When training Bayes on an mbox file with 323 messages in it, I found it took about 135 seconds longer than the same test with Perl 5.6 on Windows.

Looking further, I discovered that BayesStore::tok_get() calls DB_File::FETCH about 50,000 times during this particular run. Of those calls, 134 took about 1 second each, and they were scattered roughly through the middle third of all calls made. The other ~49,900 calls each took a small fraction of that (<0.001s). Those 134 slow calls account for essentially all of the extra 135 seconds, so this is clearly the problem.
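For anyone who wants to check their own setup, here is a rough sketch of how per-fetch timings like these could be collected. It is not the instrumentation I actually used. The time_fetches() helper, the 0.5s threshold, and passing the database path on the command line are all just assumptions for illustration. It relies on Time::HiRes, which as far as I know ships with Perl 5.8 but may need to be installed separately on 5.6.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: time each lookup in %$hash for the given keys and
# tally the ones that take at least $threshold seconds.
# Returns (slow_count, slow_seconds, total_seconds).
sub time_fetches {
    my ($hash, $keys, $threshold) = @_;
    my ($slow_count, $slow_secs, $total_secs) = (0, 0, 0);
    for my $key (@$keys) {
        my $t0      = [gettimeofday];
        my $value   = $hash->{$key};    # goes through FETCH when the hash is tied
        my $elapsed = tv_interval($t0);
        $total_secs += $elapsed;
        if ($elapsed >= $threshold) {
            $slow_count++;
            $slow_secs += $elapsed;
        }
    }
    return ($slow_count, $slow_secs, $total_secs);
}

# Only runs when a database path is given on the command line.
# Assumption: the file is a DB_File hash database, opened read-only.
if (@ARGV) {
    require DB_File;
    require Fcntl;
    no warnings 'once';
    my $path = $ARGV[0];
    tie my %db, 'DB_File', $path, Fcntl::O_RDONLY(), 0600, $DB_File::DB_HASH
        or die "Cannot tie $path: $!\n";
    my @keys = keys %db;
    my ($n, $slow, $total) = time_fetches(\%db, \@keys, 0.5);
    printf "%d of %d fetches took >= 0.5s (%.1fs of %.1fs total)\n",
        $n, scalar @keys, $slow, $total;
    untie %db;
}
```

Note that this fetches every key once in the order the database returns them, which is not the same access pattern as a real training run, so it can only show whether slow fetches occur at all.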

It appears that Perl 5.8's DB_File uses Berkeley DB "version 8" while Perl 5.6's uses "version 5" (according to Cygwin's "file" command). So I'm betting that something in the underlying BDB implementation has changed. Incidentally, this also seems to mean that Bayes databases have to be discarded when switching from Perl 5.6 to Perl 5.8.

Does anyone know why Perl 5.8's DB_File behaves this way when Perl 5.6's does not?

Thanks,

Barry


