On Mon, 2011-02-14 at 04:31 +0200, David Juran wrote:
> After trying to run sa-learn on some false positives I had, I discovered
> (by the method of googling, so the topic certainly has been discussed
> before) that sa-learn by default ignores any message larger than 250k.

Oh, yay. You found my recent-ish posts regarding that? (I can't remember
this being discussed in years, other than one thread with my comments.)

> Now this limit is easy enough to disable, but my question is, why is this
> default limit there? Won't it skew learning (i.e. not all messages will
> be fed to the filter)? Or will disabling it somehow skew it? Or are
> there any other implications I should consider?

I believe that was just an oversight, back when the spamc default limit
was raised to 500k years ago. The bad thing about it is that it is a
hardcoded limit. I guess it should just be removed -- well, for the
explicit case, not auto-learning. That makes it more complicated.
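The behavior under discussion, explicit training silently skipping
oversized messages, boils down to a simple size gate before the learner
sees the message. A minimal sketch in Python (sa-learn itself is Perl;
the names and the "0 disables the check" convention here are
illustrative, not sa-learn's actual API):

```python
MAX_SIZE = 256 * 1024  # the hardcoded cap being discussed, in bytes


def should_learn(message_bytes, max_size=MAX_SIZE):
    """Return True if the message is small enough to feed to the learner.

    Passing max_size=0 disables the check entirely, which is what
    removing the hardcoded limit for explicit training would amount to.
    """
    if not max_size:
        return True
    return len(message_bytes) <= max_size


# A 300k casino spam would be silently skipped under the default cap,
# but learned once the limit is lifted:
big_spam = b"x" * (300 * 1024)
assert not should_learn(big_spam)
assert should_learn(big_spam, max_size=0)
```

The wrinkle mentioned above is that the same gate makes more sense for
auto-learning (where a misclassified huge message could poison the Bayes
database unattended) than for explicit training, where the operator has
already vetted the message.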

Generally, no -- it won't skew learning. It will, though, in the rare-ish
cases of spam exceeding 256k that is somewhat resilient to training [1],
even to explicit training. But other than those few buggers, no, there's
hardly any impact to expect.

Thanks for reminding me, though. I meant to fix this eventually.


[1] I have some casino spam on file that falls between the cracks.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
