>Is there logic to having a separate MaxBytes setting like 
>MaxBytesForBombs that's used only during message delivery?  That way, the
>entire message can be scanned for bombs, but the rebuild could use a lower
>number to better balance the differential between the average sized spam
>and average sized not-spam message.

Did you EVER think about that? Or do you only write something to fill up
the community mailing list?

No - no way!

Thomas







From:    "K Post" <nntp.p...@gmail.com>
To:      "ASSP development mailing list" <assp-test@lists.sourceforge.net>
Date:    10 Nov 2021 20:22
Subject: Re: [Assp-test] Concept Question: Scan entire message for
Bombs, regardless of MaxBytes setting? New MaxBytes recommendation?



After about 12 weeks of going from MaxBytes of 4k to MaxBytes of 50k, I've
seen:
1) Rebuild time went from just over an hour (with 30k MaxFiles) to just
over 2 hours.  I'm fine with that; there's more to scan.
2) Bomb detection improved, as a lot of what's detected is beyond the 20k
or 30k mark.
3) But Bayesian false positives went way up.  Lots of mail that would
have (correctly) been delivered now gets too high a score and is
blocked.

Surely #3 is specific to the types of messages my users are getting and I 
can tweak settings.  BUT, it makes me raise this question again:
Is there logic to having a separate MaxBytes setting like MaxBytesForBombs 
that's used only during message delivery?  That way, the entire message 
can be scanned for bombs, but the rebuild could use a lower number to 
better balance the differential between the average sized spam and average 
sized not-spam message.



On Mon, Nov 1, 2021 at 2:43 PM K Post <nntp.p...@gmail.com> wrote:
When looking at the "Use this HTML Parser" section on the GUI, I found 
this line:
it is recommended to set MaxBytes to 50000 (be carefull on heavy load 
systems - spam bomb regular expressions will take longer using 50000!).
I'm going to change my settings and see how bad the rebuild time is.  I've
got enough processing power and RAM now, but the disks aren't SSDs.  Just a
4-disk RAID 1+0 traditional HDD setup.  We'll see...

Since HTML email accounts for a big percentage of all mail, might it be a
good idea to update/expand the guidance in the MaxBytes section of the
GUI?



On Fri, Oct 29, 2021 at 8:40 PM K Post <nntp.p...@gmail.com> wrote:
Summary:
Should/could any consideration be given to having ASSP scan the entire 
message at the time it is received for Bombs (only), while still using 
MaxBytes for Bayesian/HMM?

We've been having some cleverly crafted messages slip through all
filters that would be easy to catch with Bombs if only the catchable
content came before the MaxBytes cutoff.  These messages are 20kb+, and
they have a scam phone number at the very end, well past MaxBytes.  I
want/need to use bombs to catch the scam phone numbers.

With MaxBytes set to 3000, which is useful for a faster RebuildSpamDB,
these BombDataRE matches just aren't being caught.  If I increase MaxBytes,
my BombDataRE catches them, but then RebuildSpamDB takes (probably? see
below) longer than it needs to.

So, is there any value in considering a MaxBytesAdditionalForBombs 
variable which would be added to MaxBytes and only used when scanning for 
bombs as messages arrive?   Would that kill performance??  Other 
downsides?
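
To make the proposal concrete, here is a minimal sketch in Python. It is
not ASSP code (ASSP is written in Perl), and every name in it is a
hypothetical stand-in: MAX_BYTES and MAX_BYTES_ADDITIONAL_FOR_BOMBS model
the two settings, and BOMB_RE is a made-up pattern playing the role of a
BombDataRE entry. It only illustrates the idea of using one window for
Bayesian/HMM and a larger one for bomb scanning at delivery time.

```python
import re

# Hypothetical stand-ins for the settings discussed above; not ASSP code.
MAX_BYTES = 3000
MAX_BYTES_ADDITIONAL_FOR_BOMBS = 47000

# Illustrative bomb pattern: a made-up scam phone number format.
BOMB_RE = re.compile(rb"\+1[-. ]?800[-. ]?555[-. ]?\d{4}")


def bayesian_window(message: bytes) -> bytes:
    """Portion of the message handed to Bayesian/HMM scoring."""
    return message[:MAX_BYTES]


def bomb_window(message: bytes) -> bytes:
    """Larger portion scanned for bomb regexes at delivery time only."""
    return message[:MAX_BYTES + MAX_BYTES_ADDITIONAL_FOR_BOMBS]


def has_bomb(message: bytes) -> bool:
    """True if the bomb pattern occurs anywhere in the bomb window."""
    return BOMB_RE.search(bomb_window(message)) is not None


if __name__ == "__main__":
    # 20 KB of filler followed by the scam number at the very end.
    msg = b"x" * 20000 + b" call +1-800-555-0199 now"
    print(BOMB_RE.search(bayesian_window(msg)) is not None)  # False
    print(has_bomb(msg))  # True
```

The cost question stays open in this sketch: the regex still has to run
over the larger window for every message, which is the performance
concern raised above.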

We could still look at only the first MaxBytes for Bayesian/HMM, since
only MaxBytes is used when building those databases.

What do you think?

And while we're talking MaxBytes:
I've asked this before: is the guidance of 3kb for MaxBytes, once there's
a mature corpus, still a valid recommendation?  With unlimited horsepower
and RAM, sure, why not, do 30kb or 100kb.  That's not my reality, so I
want to see where to best allocate resources.  If 3kb is still the
guidance, even though the spam files I'm seeing have a median size around
20kb, so be it.  I feel like when that guidance was written, HTML wasn't
used as prolifically in spam.  The median size of notspam in my corpus is
about 40kb.  That's determined unscientifically by sorting by size and
scrolling to approximately halfway down.
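
For a less eyeballed number, the median can be computed directly. A
minimal Python sketch, assuming each corpus is a flat folder with one
file per message; the folder names "spam" and "notspam" below are
placeholders for wherever the corpus actually lives.

```python
import statistics
from pathlib import Path


def median_file_size(corpus_dir: str) -> float:
    """Median size in bytes of the regular files directly inside corpus_dir."""
    sizes = [p.stat().st_size for p in Path(corpus_dir).iterdir() if p.is_file()]
    if not sizes:
        raise ValueError(f"no files found in {corpus_dir}")
    return statistics.median(sizes)


if __name__ == "__main__":
    # Hypothetical folder names; substitute the real corpus locations.
    for folder in ("spam", "notspam"):
        try:
            print(folder, median_file_size(folder))
        except (OSError, ValueError) as err:
            print(folder, "skipped:", err)
```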

Thanks.  Have a good weekend.
Ken
_______________________________________________
Assp-test mailing list
Assp-test@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/assp-test






