I just reviewed a rebuild log and was shocked to see:
Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
folder messages/notspam to get the wanted corpusnorm (1.000)

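That line appears right after the messages/spam folder (about 15k messages)
is processed, and the log then reports 0 files imported for Bayes/HMM from
messages/notspam.
I have maxfiles set to 15,000
and maxbytes set to 4,000.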

Suggestions?  I certainly want our users' good mail to be considered!
I can't say I've ever seen this before, but I don't review the rebuild log
terribly often.
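
A quick sanity check of the norm figures in the log, assuming corpusnorm is
simply Spam Weight divided by Not-Spam Weight. That is my reading of the
numbers the log prints, not something taken from the ASSP source, and the
function name corpus_norm is made up for this sketch:

```python
def corpus_norm(spam_weight, not_spam_weight):
    """Assumed definition: ratio of spam weight to not-spam weight."""
    return spam_weight / not_spam_weight

# Weights reported after errors-spam and errors-notspam were processed.
after_errors = corpus_norm(657_272, 3_563_832)

# Final weights reported at the end of the rebuild.
final = corpus_norm(2_745_357, 3_563_832)

print(f"{after_errors:.3f}")  # 0.184, matches "norm: 0.184"
print(f"{final:.4f}")         # 0.7703, matches "Corpus norm: 0.7703"
```

Note that the Not-Spam Weight is the same (3,563,832) in both places, which is
consistent with nothing from messages/notspam being added to it.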

Copy of rebuild log:


File rebuildrun.txt follows:


Dec-17-18 02:15:00 RebuildSpamDB-thread rebuildspamdb-version 7.50 started
in ASSP version 2.6.2(18339)

Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB for temporary hashes

Dec-17-18 02:15:00 RebuildSpamDB uses BerkeleyDB-ENV with 62.50 MByte

Dec-17-18 02:15:00 RebuildSpamDB will create a Hidden Markov Model

Dec-17-18 02:15:00 RebuildSpamDB will include attachment-database-entries
in to spamdb

Dec-17-18 02:15:00 RebuildSpamDB will create unicode enabled databases

Dec-17-18 02:15:00 RebuildSpamDB will process all words as Sequence of UAX
#29 Grapheme Clusters

Dec-17-18 02:15:00 RebuildSpamDB will normalize unicode characters

Dec-17-18 02:15:00 RebuildSpamDB will use the ASSP_WordStem engine

Dec-17-18 02:15:00 ---ASSP Settings---
Dec-17-18 02:15:00 Do Not Collect Messages with RedListed address: Enabled
**Messages with RedListed addresses will be removed from the corpus!**

Dec-17-18 02:15:00 Do Not Collect RedRe Messages: Enabled **Messages
matching the RedRe will be removed from the corpus!**

Dec-17-18 02:15:00 Use Subject as Maillog Names: True
Dec-17-18 02:15:00 Maxbytes: 4,000
Dec-17-18 02:15:00 Maxfiles: 15,000
Dec-17-18 02:15:00 RebuildFileTimeLimit: 1 5
Dec-17-18 02:15:00 RebuildFileTimeLimit: files will be moved away from the
corpus if their processing takes longer than 5 second(s)

Dec-17-18 02:15:00 Trashlist cleaning finished, 2 of 56 files deleted

Dec-17-18 02:15:00 c:/ASSP/messages/errors-spam
Dec-17-18 02:15:00 File Count: 934
Dec-17-18 02:15:00 Processing... messages/errors-spam with 934 files
Dec-17-18 02:15:52 0 attachment/image entries processed
Dec-17-18 02:15:52 Imported Files for HeloBlackList: 933
Dec-17-18 02:15:52 Imported Files for Bayes/HMM: 933
Dec-17-18 02:15:52 Finished in 52 seconds (17.94 files/s - 9.88 MByte)

Dec-17-18 02:15:52 c:/ASSP/messages/errors-notspam
Dec-17-18 02:15:52 File Count: 2,209
Dec-17-18 02:15:52 Processing... messages/errors-notspam with 2,209 files
Dec-17-18 02:18:36 0 attachment/image entries processed
Dec-17-18 02:18:36 Imported Files for HeloBlackList: 2,208
Dec-17-18 02:18:36 Imported Files for Bayes/HMM: 2,208
Dec-17-18 02:18:36 Finished in 164 seconds (13.46 files/s - 34.86 MByte)
Dec-17-18 02:18:36 info: corpusnorm after processing messages/errors-spam
and messages/errors-notspam is Spam Weight: 657272 / Not-Spam Weight:
3563832 => norm: 0.184
Dec-17-18 02:18:36 info: require approximately all files (2,061,306 words)
from folder messages/spam to get the wanted corpusnorm (1.000)

Dec-17-18 02:18:36 c:/ASSP/messages/spam
Dec-17-18 02:18:36 File Count: 14,937
Dec-17-18 02:18:36 Processing... messages/spam with 14,937 files
Dec-17-18 02:25:25 0 attachment/image entries processed
Dec-17-18 02:25:25 Imported Files for HeloBlackList: 14,937
Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
Dec-17-18 02:25:25 Finished in 409 seconds (36.52 files/s - 69.05 MByte)
Dec-17-18 02:25:25 info: require approximately 1 files (2 words) from
folder messages/notspam to get the wanted corpusnorm (1.000)

Dec-17-18 02:25:25 c:/ASSP/messages/notspam
Dec-17-18 02:25:25 File Count: 9,382
Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
Dec-17-18 02:26:42 0 attachment/image entries processed
Dec-17-18 02:26:42 Imported Files for HeloBlackList: 9,382
Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
Dec-17-18 02:26:42 Finished in 77 seconds (121.84 files/s - 81.79 MByte)

Dec-17-18 02:26:42 Generating weighted Bayesian tuplets
Dec-17-18 02:27:04 start populating Spamdb with 465,296 records - Bayesian
check is now disabled!
Dec-17-18 02:28:19 Finished populating Spamdb with 465,296 records -
Bayesian check is now enabled!
Dec-17-18 02:28:19 done - Generating weighted Bayesian tuplets

Dec-17-18 02:28:19 Bayesian Pairs: 465,296 now in list

Dec-17-18 02:28:19 Generating consolidated Hidden-Markov-Model database
from 2,155,159 record model
Dec-17-18 02:30:25 HMM sequences: 1,059,525 now in list

Dec-17-18 02:30:26 generating Spamdb.helo records from 13,393 collected
HELO's
Dec-17-18 02:30:28 cleaning old Spamdb.helo records
Dec-17-18 02:30:28 done - cleaning old Spamdb.helo records

Dec-17-18 02:30:28 HELO Blacklist: 25 new, 1,159 now in list

Dec-17-18 02:30:28 Spam Weight    :   2,745,357
Dec-17-18 02:30:28 Not-Spam Weight:   3,563,832

Dec-17-18 02:30:28 Corpus norm: 0.7703 - (ok - slighly ham heavy)
Dec-17-18 02:30:28 Corpus confidence: 0.66134618

Dec-17-18 02:30:33 Start populating Hidden Markov Model. HMM-check is
disabled for this time!
Dec-17-18 02:30:33 start populating Hidden Markov Model with 1,059,525
records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model with 1,059,525
records!
Dec-17-18 02:33:08 Finished populating Hidden Markov Model. HMM-check is
now enabled again!

Dec-17-18 02:33:08 Total processing time: 1,088 second(s)

Dec-17-18 02:33:08 Total processing data: 195.58 MByte


Dec-17-18 02:33:08 Rebuild processed 39.12 files per second.

Dec-17-18 02:33:08 After finishing the Rebuild process, the c:/ASSP/tmpDB
folder contains 363.74 MByte.

Dec-17-18 02:33:08 After finishing the Rebuild process, the drive that
contains the c:/ASSP/tmpDB folder has 12.89 GByte free space from total
25.20 GByte.

Dec-17-18 02:33:08 building new GripList records and bounce report
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/maillog.txt
Dec-17-18 02:33:08 processing Logfile c:/ASSP/logs/18-12-16.maillog.txt
Dec-17-18 02:33:15 processing Logfile c:/ASSP/logs/18-12-15.maillog.txt
Dec-17-18 02:33:20 processing Logfile c:/ASSP/logs/18-12-14.maillog.txt
Dec-17-18 02:33:28 processing Logfile c:/ASSP/logs/18-12-13.maillog.txt
Dec-17-18 02:33:29 processing Logfile c:/ASSP/logs/18-12-12.maillog.txt

Dec-17-18 02:33:30 bounce report for the last two days: 11 bounces received
(possibly delayed) - 1 bounces blocked

Dec-17-18 02:33:30 list of the top ten local addresses with blocked bounces
in the last two days:

 b...@ourcharity.org : 1

Dec-17-18 02:33:30 end of bounce report

Dec-17-18 02:33:31 Uploading Griplist via Direct Connection
Dec-17-18 02:33:32 Submitted 6,144 bytes: 0 IPv6 addresses, 2,654 IPv4
addresses, good IP's 811 , bad IP's 1,137

Dec-17-18 02:33:32 Trashlist was saved to c:/ASSP/trashlist.db
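
Since I don't read this log often, here is a minimal sketch for spotting
folders where nothing was imported for Bayes/HMM. It only assumes the two
line shapes visible in the log above ("Processing... <folder> with <n> files"
and "Imported Files for Bayes/HMM: <n>"); the function name and the regexes
are mine, not part of ASSP:

```python
import re

# Patterns based only on the line formats seen in rebuildrun.txt above.
FOLDER_RE = re.compile(r"Processing\.\.\. (\S+) with ([\d,]+) files")
BAYES_RE = re.compile(r"Imported Files for Bayes/HMM: ([\d,]+)")

def bayes_imports(log_text):
    """Return {folder: (file_count, bayes_imported)} from rebuild log text."""
    result = {}
    folder = None
    count = 0
    for line in log_text.splitlines():
        m = FOLDER_RE.search(line)
        if m:
            # Remember which folder the following import lines belong to.
            folder = m.group(1)
            count = int(m.group(2).replace(",", ""))
            continue
        m = BAYES_RE.search(line)
        if m and folder is not None:
            result[folder] = (count, int(m.group(1).replace(",", "")))
    return result

sample = """\
Dec-17-18 02:15:52 Processing... messages/spam with 14,937 files
Dec-17-18 02:25:25 Imported Files for Bayes/HMM: 14,937
Dec-17-18 02:25:25 Processing... messages/notspam with 9,382 files
Dec-17-18 02:26:42 Imported Files for Bayes/HMM: 0
"""

for name, (files, imported) in bayes_imports(sample).items():
    if files > 0 and imported == 0:
        print(f"{name}: {files} files present, none imported for Bayes/HMM")
```

Run against the full log text, this would flag messages/notspam, which is
exactly the folder I'm worried about.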


THANKS!!
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test
