> 1 of the changes in 21277 is because of my report. Very slow startup of the rebuild process.

minor fix for too slow SQL servers by preventing one unneeded SQL statement

> 2+ of the changes in 21280 stemmed from my messages. Too many open files in Windows, early bad SSL changes, catching invalid regex instead of ASSP crashing

fixes a Super-DAU setup

> 21287 & 21290: your changes to griplist folder creation, changes/fixes to BerkeleyDB error logging, gui changes, and windows file descriptor changes are because of things I've brought up

fixes a Super-DAU setup

> 21293: The NWLI changes are because of what I asked

has worked before and after code changes nearly the same way (no changes for advanced users) - yes code and doc are somehow better now

> 7 of the 8 changes in 21302 are because of my reports, questions, requests, and suggestions. Related to external file change times not being recorded in ASSP (long time bug), improvement in a single file changing causing all to be reloaded, changes to the analyzer for reports from Outlook, corpus cleanup for DKIM WL/NP matches.

bug in file change time for 'Groups' feature if include files are externally changed, not using any of the recommended and documented ways
anything else is more or less code cosmetic - there is no need to cleanup anything from the corpus (it's nice to have) - the default rebuild engine is doing it well

> 21396 more changes because of discussions about Outlook reporting (FYI forward as attachment from Outlook still doesn't result in correct analyze reports nor does multiple report attachments in a single email from Outlook work at all.)

nothing really changed - one minor change to catch wrong outlook reporting .msg + header corrections for wrong reported mails .eml

> 21317 After my questions about the unusual request for help for a way to match username of the recipient to the sender we discovered the bug about unoptimized weighted bombs with a scoring parameter and the bug with definite statements

?(DEFINE) forced by me - nobody is using it
<<<...>>>=>xx really a bug

So what is left from over 30 of your posts in the last 2 months - hours of reading, rereading and answering, analyzing and fixing things which normally never happen, thinking about touching the assp core functions? - Not much left - one bug.

To come to an end - for example take the subject of this thread

...."Scan entire message for Bombs, regardless of MaxBytes setting?" ....

Everybody who knows the concept of assp will get tears in their eyes when reading this. I don't want to talk with you about the assp concept!

Thank you again Ken for testing assp and reporting bugs.

Join the assp forum, if you want. This mailing list has only ~80 members; the forum has ~1390 members. Possibly you'll get better help there. Every forum has a 'Suggestion and Feedback', a 'How do I' and a 'Troubleshooting' section.

Ken, I don't want to prevent you from posting here using any SF project rule - everyone should be and is free to join or leave this mailing list. But 'think assp' before you post, keep your posts short, be patient and accept something like 'I'll think about it' and in particular 'No'.

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 14.11.2021 17:01
Subject: Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

I cannot decipher what this means:

> most - where? -> forum , bug tracker , self testing, forced by attackers

and it's my lack of clarity on your short replies which leads me to question further.
I need to find a way to still be able to report my findings and ask my questions without being a bother. The last thing I want to be is a burden, but I have no other way to communicate with you, as the sole developer on a project that has minimal user communication other than what you and I discuss.

While I wish it were easier for me to be more concise, my persistence and full description of issues and challenges has resulted in far more than the one change you referenced. I've outlined some of them from the last 7 versions below.

1 of the changes in 21277 is because of my report. Very slow startup of the rebuild process.

2+ of the changes in 21280 stemmed from my messages. Too many open files in Windows, early bad SSL changes, catching invalid regex instead of ASSP crashing

21287 & 21290: your changes to griplist folder creation, changes/fixes to BerkeleyDB error logging, gui changes, and windows file descriptor changes are because of things I've brought up

21293: The NWLI changes are because of what I asked

7 of the 8 changes in 21302 are because of my reports, questions, requests, and suggestions. Related to external file change times not being recorded in ASSP (long time bug), improvement in a single file changing causing all to be reloaded, changes to the analyzer for reports from Outlook, corpus cleanup for DKIM WL/NP matches.

21396 more changes because of discussions about Outlook reporting (FYI forward as attachment from Outlook still doesn't result in correct analyze reports nor does multiple report attachments in a single email from Outlook work at all.)

21317 After my questions about the unusual request for help for a way to match username of the recipient to the sender we discovered the bug about unoptimized weighted bombs with a scoring parameter and the bug with definite statements

And over the years you've added useful features and fixed bugs because of my questions or requests which you originally dismissed as being misguided.

There's a trend here. When I'm active on this forum, I discuss things that lead you to improve ASSP, which benefits everyone. If I had asked my question and then not responded to your short "no" or "have you thought about this" type of replies, would these changes have been made? If I hadn't fully described the issue/question/challenge, how would you have known what I was talking about?

I will now step away from this forum as requested for as long as I am able. I do hope that you are willing to entertain future questions/concerns once I return, if not for me, then for the rest of the quiet spam fighters on this list.

On Sun, Nov 14, 2021 at 5:59 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

>How many of the changes in the last 10 or so versions of ASSP have been from the requests of anyone else on this list?

how many? 1 at 5.11.2021 - weight bug
most - where? -> forum , bug tracker , self testing, forced by attackers

You may use the forum, where everyone is free to skip reading your endless posts and blogs. It simply takes too much time to pick out the 1 to 5% of helpful content and to be forced by you to answer the rest as well.

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 14.11.2021 00:14
Subject: Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

I don't know what I've done to deserve that reply, but regardless, I'm sorry to have upset you. I will take a long break from posting further here, but please do know that I'm appreciative of your continued support of this important program.

Before I go, please entertain these thoughts:

I hope that you're able to re-evaluate your request for me to go away. I've recommended more very good change requests to ASSP than ones that you consider to be bad. I'm not able to implement them myself.
I'm not perfect, but your request for me to sign off of this list, which is a critical resource, is unfair.

How many of the changes in the last 10 or so versions of ASSP have been from the requests of anyone else on this list? How many bugs have been quashed because of things I've discovered? How many improvements did you, and only you, make because of questions I've asked and because of feature requests I've made (recently and over the many years)?

Are you angry because I'm (admittedly) long winded? Please understand that this is not out of disrespect, it's because I want to make sure that I'm being clear. When I get a short answer, I try to continue the conversation. This is a discussion list after all.

Are you angry because I'm persistent? My persistence is also not out of disrespect, it's because I'm inquisitive, am by no means an expert in coding or the inner workings of spam detection, and have a burning desire to continue to see ASSP improve.

Often I ask a detailed question, and only get an answer back from you like "have you considered this?" or "no" without explanation. Is it so bad that I ask why not? I wait patiently for your replies, but do inquire more if my questions haven't been fully answered. If you don't have the time or desire to entertain my questions, so be it, but please remember that most of what I ask has ultimately led to you eventually improving ASSP.

Anyway, I don't expect and certainly don't require a reply here. But please know that my intentions are pure, I'm charitable, patient, and a good person. It hurts deeply that you seem to think otherwise. I don't have the experience nor the ability that you do, not even close, but I like to think that even if I can be frustrating, I ultimately bring some good to the ASSP world by offering suggestions and asking questions.

On Sat, Nov 13, 2021 at 3:56 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

Ken, it would be nice if you would consider signing off this list, or at least no longer posting here.

Thank you.

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 12.11.2021 22:46
Subject: Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

First off, WOW. Our rebuild times are in no way similar. At first I thought it was you with fancy SSDs and lots of horsepower, but I'm seeing now that you have both useDB4Rebuild off and RebuildUseFileModel on. The opposite of my settings. I have useDB4Rebuild on and never enabled RebuildUseFileModel after initial attempts were failing (early on with that feature). useDB4Rebuild is the default, and I was always worried about RAM when I started using ASSP 10+ years ago and never looked back. A long rebuild time doesn't bother me, but seeing how fast you can do one has got me back to needing to test the settings on my end again. Thanks for that encouragement.

I'm worried that going up to 50k maxbytes on my system seemed to cause a lot of false positives. I don't understand how that's possible, but it's what happened. I would have thought it was the other way around, too much spam getting through vs. too much legit being blocked. Plus, I don't think that generally using that much for bayesian is necessary (or maybe it's even detrimental?). Accuracy was very high for me at 6k and 10k, but I was missing the bombs.

The question remains for me about the >CONCEPT< of optionally scanning more of a message at the time of attempted delivery for bombs. ClamAV uses its own maximum size setting. Why not also give us that option for Bombs? For the case I explained where bombs are late in the email body and likely other scenarios, don't you think it would be helpful to have a BombAddlBytes variable in the GUI?
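The idea described above, a bomb-scan window that extends past MaxBytes while Bayesian/HMM scoring keeps using MaxBytes, can be sketched in a few lines. This is a minimal illustration in Python, not ASSP code: `split_scan_windows`, `bomb_hit`, the parameter `bomb_addl_bytes` (standing in for the suggested BombAddlBytes setting), and the sample phone number are all hypothetical.

```python
import re
from typing import Iterable, NamedTuple


class ScanWindows(NamedTuple):
    bayes: bytes  # prefix used for Bayesian/HMM scoring (MaxBytes)
    bomb: bytes   # longer prefix used only for bomb regex checks


def split_scan_windows(body: bytes, max_bytes: int, bomb_addl_bytes: int = 0) -> ScanWindows:
    """Cut two prefixes of the body: max_bytes and max_bytes + bomb_addl_bytes."""
    if max_bytes < 0 or bomb_addl_bytes < 0:
        raise ValueError("byte limits must be non-negative")
    return ScanWindows(body[:max_bytes], body[:max_bytes + bomb_addl_bytes])


def bomb_hit(window: bytes, patterns: Iterable[bytes]) -> bool:
    """True if any bomb pattern matches anywhere in the window."""
    return any(re.search(p, window) for p in patterns)


# Hypothetical scam receipt: ~40 KB of HTML filler, phone number at the very end.
body = b"<p>order receipt</p>" * 2000 + b"Call 1-800-555-0100 for a refund"
bombs = [rb"1-800-555-0100"]  # made-up number for illustration

windows = split_scan_windows(body, max_bytes=20_000, bomb_addl_bytes=30_000)
print(bomb_hit(windows.bayes, bombs))  # checks only the MaxBytes prefix
print(bomb_hit(windows.bomb, bombs))   # checks the extended bomb window
```

In this sketch the Bayesian window stays at 20,000 bytes, so the corpus and rebuild are unaffected, while the bomb patterns see 50,000 bytes. The trade-off is that every bomb regular expression runs over the larger window on each message.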
You know there's no way that I could ever code a plugin and that there's even less of a chance of this charity paying for one to be built! I still have duct tape holding my desk chair together.

Modifying getbody seems pretty straightforward. Add a new variable called $bombdataref that would be used in place of $dataref for all bomb comparisons - similarly to the way that $clamavbytes is for the clamav stuff.

    my $bombdataref = $BombAddlBytes ? $maxbytes + $BombAddlBytes : 0;

then, instead of

    if ( ! BombOK( $fh, $dataref ) ) {

use

    if ( ! BombOK( $fh, $bombdataref ) ) {

and the like everywhere that there's a bomb or script check in getbody.

There would also need to be changes in analyze and anywhere else that the bomb checks are done.

I'm more than willing to try to modify ASSP as described above, give it a go, and report back. It won't be easy for me to make the changes and have it work, but I'm game. Before I do though, I'm concerned that you don't think that scanning more for bombs is a sound concept. Or maybe you just don't think it's necessary? I'm most interested in your opinion on that before I move forward.

On Fri, Nov 12, 2021 at 1:08 PM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

Nov-12-21 04:00:20 RebuildSpamDB-thread rebuildspamdb-version 8.14 started in ASSP version 2.6.6(21314)
Nov-12-21 04:00:20 detection of local disclaimers is enabled
Nov-12-21 04:00:20 info: 'useDB4Rebuild' is NOT set to on - the rebuild spamdb process will possibly require a large amount of memory - but it will run very fast!
Nov-12-21 04:00:20 RebuildSpamDB reloaded and uses the internal FileModel (with 39917 entries) to speedup processing
Nov-12-21 04:00:20 RebuildSpamDB allocated 963.08 MByte of RAM to load the internal FileModel
Nov-12-21 04:00:20 RebuildSpamDB will create a Hidden Markov Model
Nov-12-21 04:00:20 RebuildSpamDB will include attachment-database-entries in to spamdb
Nov-12-21 04:00:20 RebuildSpamDB will create unicode enabled databases
Nov-12-21 04:00:20 RebuildSpamDB will process all words as Sequence of UAX #29 Grapheme Clusters
Nov-12-21 04:00:20 RebuildSpamDB will normalize unicode characters
Nov-12-21 04:00:20 RebuildSpamDB will use the ASSP_WordStem engine
Nov-12-21 04:00:20 ---ASSP Settings---
Nov-12-21 04:00:20 RebuildSpamDB will create private spamdb entries for users email addresses and each local domain.
Nov-12-21 04:00:20 Do Not Collect RedRe Messages: Enabled **Messages matching the RedRe will be removed from the corpus!**
Nov-12-21 04:00:20 Use Subject as Maillog Names: True
Nov-12-21 04:00:20 Maxbytes: 25,000
Nov-12-21 04:00:20 Maxfiles: 31,000
Nov-12-21 04:00:20 RebuildFileTimeLimit: 1 5
Nov-12-21 04:00:20 RebuildFileTimeLimit: files will be moved away from the corpus if their processing takes longer than 5 second(s)

processing ~40,000 corpus files in ~4 minutes
building 15,500 spamdb.helo records in 2 seconds
building 3,200,000 spamdb records in 25 seconds
building 7,200,000 hmmdb records in 1:33 minutes

complete processing time is 6 minutes.
populating the records to the mysql database takes some minutes longer

So - maxBytes:=100,000 seems to be a possible setting (but this will IMHO not improve detection rates)

If you need to process complete mails for bombs - you'll need to write your own level 2 assp-plugin.

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 12.11.2021 16:56
Subject: Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting?
New MaxBytes recommendation?

Absolutely I've thought about this. I consider everything I post prior to posting.

Can you briefly explain why the ability to scan (MaxBytes + some additional amount)kb on incoming mails for bombs but only use MaxBytes for bayesian and the rebuild would be such a bad idea?

Since you questioned if I ever thought about this, here's what the thought process is and the reason for the request. Maybe I didn't explain myself well enough in the previous messages:

The MaxBytes "documentation" says to lower it to 3000 for a mature installation, but 10x larger than that if you can handle it.

> How many bytes of the message body will ASSP look at - the message header is always included in all checks. Mails stored in the collecting folders will be truncated to this size, if StoreCompleteMail is disabled. The average of Ham messages (message body) is 6K, the average of Spam messages is 3K. Usually the spam folder will be filled quicker than the notspam folder, therefore set this value to 4000 to get more wordpairs per Ham Message. When both folders are close to the maxfiles limit, reduce it to 3000. If your system is fast enough and has enough RAM multiply all the above recommendations and the default value by ten.

The gui doesn't say "IF the average is 6k ham, 3k spam," it says that it IS 6k ham / 3k spam. That's not true of my installation. My spam, as I've mentioned before, has a median size of about 20kb because of all of the html in the messages. And not-spam has a median size of 40kb.

Using the logic in your gui, I believe I should set my MaxBytes to 20kb, the median size of my spam corpus.

But, if I set my MaxBytes to 20kb (which it appears to be able to handle okay, rebuilding in an hour and change), then bombs after 20kb aren't detected when a message is attempting delivery.

Why does this matter to me?

We're seeing messages with @gmail.com and @whatever.onmicrosoft.com addresses that are copying legitimate looking order receipts from vendors like Amazon.com, BestBuy (US based big box electronics store), and Norton. Many look identical to a legitimate message. Ultimately, they want you to call them on the phone and give your credit card number, using the guise that they're going to refund it. Classic scam.

These messages will always pass bayesian, they read identically to real messages.

BUT, I can detect some with the phone numbers that they direct people to. The email addresses change frequently, but the scam phone numbers remain pretty constant. I could maintain a list of known bad phone numbers (also available online) to capture these messages before they're delivered. Simple. If the message has one of these phone numbers, score it such that it'll get blocked.

The problem with many of these emails is that the phone number is way past the 3k mark, and past the 20k mark too. The scammers have a bunch of HTML in the "confirmation" email, just like real stores tend to do.

I tried increasing MaxBytes up to 50kb, which easily caught messages with bombs later in the body, but that then seemed to cause a lot of false positives and obviously a much longer rebuild process.

If there could be a "continue scanning for bombs for ___kb after maxbytes" setting, that would let bombs later in the body be detected. I don't know what the downside to having such a feature would be. Based on your reaction to my question, I'm obviously missing something important.

On Thu, Nov 11, 2021 at 1:38 AM Thomas Eckardt <thomas.ecka...@thockar.com> wrote:

>Is there logic to having a separate MaxBytes setting like MaxBytesForBombs that's used only during message delivery? That way, the entire message can be scanned for bombs, but the rebuild could use a lower number to better balance the differential between the average sized spam and average sized not-spam message.
DID YOU EVER think about that ???????????????

Or do you only write something to fill up the community mailing list?

No - no way!

Thomas

From: "K Post" <nntp.p...@gmail.com>
To: "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date: 10.11.2021 20:22
Subject: Re: [Assp-test] Concept Question: Scan entire message for Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?

After about 12 weeks of going from MaxBytes of 4k to MaxBytes of 50k, I've seen:

1) Rebuild go from just over an hour (with 30k MaxFiles) to just over 2 hours. I'm fine with that, there's more to scan
2) Bomb detections improve, as a lot of what's detected is beyond the 20k or 30k mark
3) but, bayesian false positives going way up. Lots of mail that would have (correctly) been delivered is now getting too high of a score and is blocked.

Surely #3 is specific to the types of messages my users are getting and I can tweak settings. BUT, it makes me raise this question again:

Is there logic to having a separate MaxBytes setting like MaxBytesForBombs that's used only during message delivery? That way, the entire message can be scanned for bombs, but the rebuild could use a lower number to better balance the differential between the average sized spam and average sized not-spam message.

On Mon, Nov 1, 2021 at 2:43 PM K Post <nntp.p...@gmail.com> wrote:

When looking at the "Use this HTML Parser" section on the GUI, I found this line:

> it is recommended to set MaxBytes to 50000 (be carefull on heavy load systems - spam bomb regular expressions will take longer using 50000!).

I'm going to change my settings and see how bad the rebuild time is. I've got enough processing power and RAM now, but the disks aren't SSD. Just a 4 disk Raid 1+0 traditional HDD setup. We'll see...

Since HTML email accounts for a big percentage of all mail, might it be a good idea to update/expand the guidance in the MaxBytes section of the GUI?
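One way to ground a MaxBytes choice in the corpus itself, rather than in the averages stated in the GUI, is to measure the median file size of the spam and notspam folders directly. The following is a minimal Python sketch, not part of ASSP; `median_file_size` is a hypothetical helper, and the folder paths are whatever the local installation uses.

```python
import statistics
from pathlib import Path


def median_file_size(folder: str) -> float:
    """Median size in bytes of the regular files directly inside folder."""
    sizes = [p.stat().st_size for p in Path(folder).iterdir() if p.is_file()]
    if not sizes:
        raise ValueError(f"no files found in {folder}")
    return statistics.median(sizes)


# Example call (placeholder paths, adjust to the local corpus folders):
#   median_file_size("/path/to/spam")
#   median_file_size("/path/to/notspam")
```

The median is used instead of the mean because a handful of very large messages would otherwise pull the figure up. Note that if StoreCompleteMail is disabled, stored files are already truncated to MaxBytes, so the measurement only reflects real message sizes when complete mails are kept.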
On Fri, Oct 29, 2021 at 8:40 PM K Post <nntp.p...@gmail.com> wrote:

Summary: Should/could any consideration be given to having ASSP scan the entire message at the time it is received for Bombs (only), while still using MaxBytes for Bayesian/HMM?

We've been having some cleverly crafted messages slipping through all filters that would be easy to catch with Bombs if only the catchable content came before MaxBytes. These messages are 20kb+. They have a scam phone number at the very end of the larger than MaxBytes messages.

I want/need to use bombs to catch the scam phone numbers. With MaxBytes set to 3000, which is useful for faster RebuildSpamDB, these BombDataRE matches just aren't being caught. If I increase MaxBytes, my BombDataRE catches them, but then rebuildspamdb is (probably? see below) longer than it needs to be.

So, is there any value in considering a MaxBytesAdditionalForBombs variable which would be added to MaxBytes and only used when scanning for bombs as messages arrive? Would that kill performance?? Other downsides? We could still only look at MaxBytes for Bayesian/HMM since it's only MaxBytes used when building those databases.

What do you think?

And while we're talking MaxBytes: I've asked this before, is the guidance for 3kb for MaxBytes once there's a mature corpus still a valid recommendation? With unlimited horsepower and ram, sure, why not, do 30kb or 100kb. That's not my reality, so I want to see where to best allocate resources. If 3kb is still the guidance, even though the spam files I'm seeing have a median size around 20kb, so be it. I feel like when that guidance was written, html wasn't used as prolifically in spam.

The median size of notspam in my corpus is about 40kb. That's determined unscientifically by sorting by size and scrolling to approximately half way down.

Thanks. Have a good weekend.
Ken

_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test

DISCLAIMER:
*******************************************************
This email and any files transmitted with it may be confidential, legally privileged and protected in law and are intended solely for the use of the individual to whom it is addressed.
This email was multiple times scanned for viruses. There should be no known virus in this email!
*******************************************************