I just reviewed a rebuild log and was shocked to see:

Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from folder messages/notspam to get the wanted corpusnorm (1.000)

That's after the messages/spam folder (15k messages) is processed. I have maxfiles set to 15,000 and maxbytes set to 4,000. Suggestions? I certainly want our users' good mail to be considered! I can't say I've ever seen this before, but I don't review the rebuild log terribly often.

Copy of rebuild log:

File rebuildrun.txt follows:

Dec-17-18 02:15:00 RebuildSpamDB-thread rebuildspamdb-version 7.50 started in ASSP version 2.6.2(18339)
Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB for temporary hashes
Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte
Dec-17-18 02:15:00 RebuildSpamDB will create a Hidden Markov Model
Dec-17-18 02:15:00 RebuildSpamDB will include attachment-database-entries in to spamdb
Dec-17-18 02:15:00 RebuildSpamDB will create unicode enabled databases
Dec-17-18 02:15:00 RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters
Dec-17-18 02:15:00 RebuildSpamDB will normalize unicode characters
Dec-17-18 02:15:00 RebuildSpamDB will use the ASSP_WordStem engine
Dec-17-18 02:15:00 ---ASSP Settings---
Dec-17-18 02:15:00 Do Not Collect Messages with RedListed address: Enabled **Messages with RedListed addresses will be removed from the corpus!**
Dec-17-18 02:15:00 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
Dec-17-18 02:15:00 Use Subject as Maillog Names: True
Dec-17-18 02:15:00 Maxbytes: 4,000
Dec-17-18 02:15:00 Maxfiles: 15,000
Dec-17-18 02:15:00 RebuildFileTimeLimit: 1 5
Dec-17-18 02:15:00 RebuildFileTimeLimit: files will be moved away from the corpus if their processing takes longer than 5 second(s)
Dec-17-18 02:15:00 Trashlist cleaning finished, 2 of 56 files deleted
Dec-17-18 02:15:00 c:/ASSP/messages/errors-spam
Dec-17-18 02:15:00 File Count: 934
Dec-17-18 02:15:00 Processing... messages/errors-spam with 934 files
Dec-17-18 02:15:52 0 attachment/image entries processed
Dec-17-18 02:15:52 Imported Files for HeloBlackList: 933
Dec-17-18 02:15:52 Imported Files for Bayes/HMM: 933
Dec-17-18 02:15:52 Finished in 52 seconds (17.94 files/s - 9.88 MByte)
Dec-17-18 02:15:52 c:/ASSP/messages/errors-notspam
Dec-17-18 02:15:52 File Count: 2,209
Dec-17-18 02:15:52 Processing... messages/errors-notspam with 2,209 files
Dec-17-18 02:18:36 0 attachment/image entries processed
Dec-17-18 02:18:36 Imported Files for HeloBlackList: 2,208
Dec-17-18 02:18:36 Imported Files for Bayes/HMM: 2,208
Dec-17-18 02:18:36 Finished in 164 seconds (13.46 files/s - 34.86 MByte)
Dec-17-18 02:18:36 info: corpusnorm after processing messages/errors-spam and messages/errors-notspam is Spam Weight: 657272 / Not-Spam Weight: 3563832 => norm: 0.184
Dec-17-18 02:18:36 info: require approximately all files (2,061,306 words) from folder messages/spam to get the wanted corpusnorm (1.000)
Dec-17-18 02:18:36 c:/ASSP/messages/spam
Dec-17-18 02:18:36 File Count: 14,937
Dec-17-18 02:18:36 Processing... messages/spam with 14,937 files
Dec-17-18 02:25:25 0 attachment/image entries processed
Dec-17-18 02:25:25 Imported Files for HeloBlackList: 14,937
Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
Dec-17-18 02:25:25 Finished in 409 seconds (36.52 files/s - 69.05 MByte)
Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from folder messages/notspam to get the wanted corpusnorm (1.000)
Dec-17-18 02:25:25 c:/ASSP/messages/notspam
Dec-17-18 02:25:25 File Count: 9,382
Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
Dec-17-18 02:26:42 0 attachment/image entries processed
Dec-17-18 02:26:42 Imported Files for HeloBlackList: 9,382
Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
Dec-17-18 02:26:42 Finished in 77 seconds (121.84 files/s - 81.79 MByte)
Dec-17-18 02:26:42 Generating weighted Bayesian tuplets
Dec-17-18 02:27:04 start populating Spamdb with 465,296 records - Bayesian check is now disabled!
Dec-17-18 02:28:19 Finished populating Spamdb with 465,296 records - Bayesian check is now enabled!
Dec-17-18 02:28:19 done - Generating weighted Bayesian tuplets
Dec-17-18 02:28:19 Bayesian Pairs: 465,296 now in list
Dec-17-18 02:28:19 Generating consolidated Hidden-Markov-Model database from 2,155,159 record model
Dec-17-18 02:30:25 HMM sequences: 1,059,525 now in list
Dec-17-18 02:30:26 generating Spamdb.helo records from 13,393 collected HELO's
Dec-17-18 02:30:28 cleaning old Spamdb.helo records
Dec-17-18 02:30:28 done - cleaning old Spamdb.helo records
Dec-17-18 02:30:28 HELO Blacklist: 25 new, 1,159 now in list
Dec-17-18 02:30:28 Spam Weight : 2,745,357
Dec-17-18 02:30:28 Not-Spam Weight: 3,563,832
Dec-17-18 02:30:28 Corpus norm: 0.7703 - (ok - slighly ham heavy)
Dec-17-18 02:30:28 Corpus confidence: 0.66134618
Dec-17-18 02:30:33 Start populating Hidden Markov Model. HMM-check is disabled for this time!
Dec-17-18 02:30:33 start populating Hidden Markov Model with 1,059,525 records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model with 1,059,525 records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model. HMM-check is now enabled again!
Dec-17-18 02:33:08 Total processing time: 1,088 second(s)
Dec-17-18 02:33:08 Total processing data: 195.58 MByte
Dec-17-18 02:33:08 Rebuild processed 39.12 files per second.
Dec-17-18 02:33:08 After finishing the Rebuild process, the c:/ASSP/tmpDB folder contains 363.74 MByte.
Dec-17-18 02:33:08 After finishing the Rebuild process, the drive that contains the c:/ASSP/tmpDB folder has 12.89 GByte free space from total 25.20 GByte.
Dec-17-18 02:33:08 building new GripList records and bounce report
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/maillog.txt
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/18-12-16.maillog.txt
Dec-17-18 02:33:15 processing Logfile c:/ASSP/logs/18-12-15.maillog.txt
Dec-17-18 02:33:20 processing Logfile c:/ASSP/logs/18-12-14.maillog.txt
Dec-17-18 02:33:28 processing Logfile c:/ASSP/logs/18-12-13.maillog.txt
Dec-17-18 02:33:29 processing Logfile c:/ASSP/logs/18-12-12.maillog.txt
Dec-17-18 02:33:30 bounce report for the last two days: 11 bounces received (possibly delayed) - 1 bounces blocked
Dec-17-18 02:33:30 list of the top ten local addresses with blocked bounces in the last two days: b...@ourcharity.org : 1
Dec-17-18 02:33:30 end of bounce report
Dec-17-18 02:33:31 Uploading Griplist via Direct Connection
Dec-17-18 02:33:32 Submitted 6,144 bytes: 0 IPv6 addresses, 2,654 IPv4 addresses, good IP's 811 , bad IP's 1,137
Dec-17-18 02:33:32 Trashlist was saved to c:/ASSP/trashlist.db

THANKS!!
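
P.S. A quick sanity check on the weights reported in the log. This is only a sketch: the assumption that the norm is simply Spam Weight divided by Not-Spam Weight is inferred from the numbers in the log, not taken from the ASSP source, and the function name `corpus_norm` is my own.

```python
# Weights copied from the rebuild log in this message.
spam_after_errors = 657_272    # after errors-spam and errors-notspam
ham_after_errors = 3_563_832
spam_final = 2_745_357         # end of rebuild
ham_final = 3_563_832


def corpus_norm(spam_weight: int, ham_weight: int) -> float:
    """Assumed definition (inferred from the log): spam weight / not-spam weight."""
    return spam_weight / ham_weight


# Matches "norm: 0.184" reported at 02:18:36.
print(round(corpus_norm(spam_after_errors, ham_after_errors), 3))

# Matches "Corpus norm: 0.7703" reported at 02:30:28.
print(round(corpus_norm(spam_final, ham_final), 4))

# Not-spam weight added by messages/notspam: zero, which agrees with
# "Imported Files for Bayes/HMM: 0" for that folder.
print(ham_final - ham_after_errors)
```

If that reading is right, the Not-Spam Weight did not change at all while messages/notspam was processed, so the whole not-spam side of the corpus came from messages/errors-notspam.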