Re: Text contained in HTML comments causing BAYES_00 to classify as non-spam

2010-08-05 Bowie Bailey
On 8/4/2010 6:07 PM, Happy Chap wrote: > > No, we're not using an SQL backend and every user has their own bayes > database. You mentioned previously that you are using 'sa-learn -u'. I thought that option only worked with SQL databases? In my setup, I have lots of virtual users under the same
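With a per-user Bayes database and no SQL backend, one way to be sure training lands in the right user's `~/.spamassassin` is to run `sa-learn` as that user rather than relying on `-u`. A minimal Python sketch that only builds the command line; the helper name `sa_learn_as_user` is hypothetical, and it assumes `su` is available and the account has a login shell:

```python
import shlex
from typing import List


def sa_learn_as_user(user: str, path: str, kind: str = "spam") -> List[str]:
    """Build an argv that runs sa-learn as `user` via su.

    Running as the user (instead of passing -u) makes sa-learn resolve
    ~/.spamassassin from that user's own $HOME, where a per-user,
    non-SQL Bayes database lives. `path` is a message file or a
    directory of one-message-per-file messages.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # Quote the path because su -c hands the string to the user's shell.
    inner = f"sa-learn --{kind} {shlex.quote(path)}"
    return ["su", "-", user, "-c", inner]


if __name__ == "__main__":
    # Print the command rather than running it; once it looks right,
    # pass the list to subprocess.run(...) as root.
    print(sa_learn_as_user("alice", "/var/spool/missed spam"))
```

The path has to be readable by that user, since `sa-learn` runs with the user's permissions.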

2010-08-05 Matus UHLAR - fantomas
> >It's unlikely that that could push the BAYES RESULT down to BAYES_00 > >unless there is uncorrected mistraining. On 04.08.10 06:07, Happy Chap wrote: > Possibly, but I suspect mistraining isn't a problem because apart from this > specific type of spam, Spamassassin is doing (and has done for so

2010-08-04 Happy Chap
Karsten Bräckelmann-2 wrote: > > > So when you confirmed by running sa-learn --dump magic previously, did > you first su to the user in question? The Bayes database does exist in > the user's $HOME/.spamassassin/, right? > Yes, I had su'ed to that user and yes, they have their own bayes_see
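When checking with `sa-learn --dump magic` as the user, the counters worth comparing are `nspam` and `nham`; SpamAssassin does not use Bayes until they reach `bayes_min_spam_num` and `bayes_min_ham_num` (200 each by default). A small Python sketch to pull those counters out of the output. It assumes the usual layout where the value is the third column of each `non-token data:` line, so check that against your own output first; the function names are made up for illustration:

```python
import re
from typing import Dict

# Expected line shape (made-up numbers):
# "0.000          0        812          0  non-token data: nspam"
MAGIC_LINE = re.compile(
    r"^\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)


def parse_dump_magic(text: str) -> Dict[str, int]:
    """Map each 'non-token data' label (nspam, nham, ntokens, ...) to its value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            values[match.group(2)] = int(match.group(1))
    return values


def bayes_active(values: Dict[str, int], min_spam: int = 200, min_ham: int = 200) -> bool:
    """True once both counters reach the (default) activation thresholds."""
    return values.get("nspam", 0) >= min_spam and values.get("nham", 0) >= min_ham
```

Capturing the output once as the delivering user and once as the training user, then comparing `nspam`/`nham` after a night of training, shows quickly whether both are looking at the same database.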

2010-08-04 Karsten Bräckelmann
On Wed, 2010-08-04 at 14:39 -0700, Happy Chap wrote: > Bowie Bailey wrote: > > Stupid question here, but are you sure you are training the same > > database that SA is using? > > > > This is a fairly frequent problem. Common cases are: > > > > 1) SA being called as 'mailuser' and you are doing

2010-08-04 Happy Chap
Bowie Bailey wrote: > > > Stupid question here, but are you sure you are training the same > database that SA is using? > > This is a fairly frequent problem. Common cases are: > > 1) SA being called as 'mailuser' and you are doing manual training on > root's database. > 2) You are manuall
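For Bowie's case 1 (SA running as one account while training goes to another), a quick check is to compare the Bayes files in each candidate home directory and see which one actually changes after a training run. A Python sketch, assuming the default `bayes_path` of `~/.spamassassin/bayes` with file-based storage, so the files are named `bayes_*`; `bayes_files` is a hypothetical helper:

```python
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict


def bayes_files(home: str) -> Dict[str, datetime]:
    """Return {filename: last-modified time} for <home>/.spamassassin/bayes_*."""
    sa_dir = Path(home) / ".spamassassin"
    if not sa_dir.is_dir():
        return {}
    return {
        p.name: datetime.fromtimestamp(p.stat().st_mtime)
        for p in sorted(sa_dir.glob("bayes_*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory. In practice, call
    # bayes_files() on the mail user's home and on root's home, before
    # and after running sa-learn, and compare the timestamps.
    with tempfile.TemporaryDirectory() as home:
        sa_dir = Path(home) / ".spamassassin"
        sa_dir.mkdir()
        for name in ("bayes_toks", "bayes_seen", "user_prefs"):
            (sa_dir / name).touch()
        for name, mtime in bayes_files(home).items():
            print(name, mtime.isoformat(timespec="seconds"))
```

If only root's timestamps move after training while spamd or procmail runs as another account, the training is going to the wrong database.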

2010-08-04 Bowie Bailey
On 8/4/2010 4:24 PM, Happy Chap wrote: > Bowie Bailey wrote: >> On 8/4/2010 4:23 AM, Happy Chap wrote: >> >> You ARE manually training bayes (sa-learn) on these missed spams, >> right? That is probably the most useful thing you can do if you are >> getting Bayes_00 on them. > Hi Bowie, oh yes, e

2010-08-04 Happy Chap
John Hardin wrote: > > On Wed, 4 Aug 2010, Happy Chap wrote: > > > Apart from BAYES_00 what rules are they hitting? > > Thanks for your reply John. They're all more or less the same triggering: BAYES_00 HTML_MESSAGE MPART_ALT_DIFF RDNS_NONE and occasionally they also pick up one of the
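To compare the rules hit across many of these misses, the rule list can be pulled from each message's `X-Spam-Status` header. A Python sketch, assuming the default header layout (`... tests=RULE1,RULE2 autolearn=... version=...`), possibly folded over several lines; `fired_rules` is a hypothetical name and the sample header is made up from the rules listed above:

```python
import re
from typing import List


def fired_rules(x_spam_status: str) -> List[str]:
    """Extract rule names from the tests= field of an X-Spam-Status value."""
    # Unfold continuation lines, then drop the whitespace that folding
    # leaves after commas inside the tests= list.
    flat = re.sub(r",\s+", ",", " ".join(x_spam_status.split()))
    match = re.search(r"\btests=(\S+)", flat)
    if not match or match.group(1) == "none":
        return []
    return [rule for rule in match.group(1).split(",") if rule]


# Made-up example header, folded the way long headers often are.
status = ("No, score=0.0 required=5.0 tests=BAYES_00,HTML_MESSAGE,\n"
          "\tMPART_ALT_DIFF,RDNS_NONE autolearn=no version=3.3.1")
print(fired_rules(status))
```

Counting the results over a folder of missed spam shows whether the same small rule set really recurs on all of them.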

2010-08-04 Happy Chap
Henrik K wrote: > > On Wed, Aug 04, 2010 at 06:58:52AM -0700, Happy Chap wrote: > > Do the tokens look such that they might be used in legitimate messages? > Usually you just have to sa-learn --spam enough of such spams to get > at least > BAYES_50. > > I have no idea what kind of spams they are
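The BAYES_xx rules are buckets over the Bayes probability. The sketch below maps a probability to a rule name using the ranges as I recall them from the stock `23_bayes.cf` in 3.x (some versions also define `BAYES_999`, omitted here). Check them against your installed rules; the exact boundary handling is an assumption:

```python
from typing import List, Tuple

# (rule, lower, upper) as recalled from the stock 23_bayes.cf; verify locally.
BAYES_BUCKETS: List[Tuple[str, float, float]] = [
    ("BAYES_00", 0.00, 0.01),
    ("BAYES_05", 0.01, 0.05),
    ("BAYES_20", 0.05, 0.20),
    ("BAYES_40", 0.20, 0.40),
    ("BAYES_50", 0.40, 0.60),
    ("BAYES_60", 0.60, 0.80),
    ("BAYES_80", 0.80, 0.95),
    ("BAYES_95", 0.95, 0.99),
    ("BAYES_99", 0.99, 1.00),
]


def bayes_rule(prob: float) -> str:
    """Name of the BAYES_xx bucket for a probability (upper bound inclusive)."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must be within [0, 1]")
    for name, lower, upper in BAYES_BUCKETS:
        # Assumed boundary rule: (lower, upper], with 0.0 itself in BAYES_00.
        if (prob > lower or prob == 0.0) and prob <= upper:
            return name
    raise AssertionError("buckets should cover [0, 1]")
```

Read this way, getting one of these messages from BAYES_00 to at least BAYES_50 means training has to lift its Bayes probability from at most 0.01 to above 0.40.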

2010-08-04 Happy Chap
Bowie Bailey wrote: > > On 8/4/2010 4:23 AM, Happy Chap wrote: > > You ARE manually training bayes (sa-learn) on these missed spams, > right? That is probably the most useful thing you can do if you are > getting Bayes_00 on them. > > Hi Bowie, oh yes, every night.

2010-08-04 John Hardin
On Wed, 4 Aug 2010, Happy Chap wrote: In that case (and I've been barking up the wrong tree) do you have any suggestion as to what my next move should be to try to trap this type of spam? I'm moderately technical, but I think I've probably reached the limit of my current knowledge but am happy

2010-08-04 Bowie Bailey
On 8/4/2010 4:23 AM, Happy Chap wrote: > Hi, > > We started getting (over the last 2 months say) lots of spam, which > Spamassassin isn't picking up as spam. Analysing these, they all seem to be > of the same type where many paragraphs of random text are "hidden" inside an > HTML comment (either c
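To see how much of each such message is hidden, one can measure the share of words in the HTML part that sit inside `<!-- ... -->` comments. A Python sketch using the standard-library `html.parser`; it assumes you already have the decoded HTML part as a string, and it says nothing about how SpamAssassin's own tokenizer treats comments (the `-D bayes` debug output is the authority on that). The names are hypothetical:

```python
from html.parser import HTMLParser


class CommentTextCounter(HTMLParser):
    """Collect ordinary text and text inside HTML comments separately."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.visible = []
        self.hidden = []

    def handle_data(self, data: str) -> None:
        self.visible.append(data)

    def handle_comment(self, data: str) -> None:
        self.hidden.append(data)


def hidden_word_ratio(html: str) -> float:
    """Fraction of whitespace-separated words that appear inside comments."""
    parser = CommentTextCounter()
    parser.feed(html)
    parser.close()
    visible = sum(len(chunk.split()) for chunk in parser.visible)
    hidden = sum(len(chunk.split()) for chunk in parser.hidden)
    total = visible + hidden
    return hidden / total if total else 0.0
```

A ratio close to 1.0 fits the pattern described in this thread: a short visible pitch with many paragraphs of filler inside a comment.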

2010-08-04 Henrik K
On Wed, Aug 04, 2010 at 06:58:52AM -0700, Happy Chap wrote: > > > > Henrik K wrote: > > > > > > Instead of speculating, try: > > > > cat msg | spamassassin -t -D bayes 2>&1 | grep bayes: > > > > It will tell you exactly what tokens are considered. > > > > > > Hi Henrik, > > Thanks for yo
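Henrik's pipeline can also be driven from a script when there are many samples to go through. A Python sketch that runs the same `spamassassin -t -D bayes` command on a message file and keeps the lines containing `bayes:`, as `grep` does. It assumes `spamassassin` is on `PATH` and that the script runs as the user whose database is in question; the function names are hypothetical:

```python
import subprocess
from typing import List


def bayes_debug_lines(output: str) -> List[str]:
    """Keep only the lines mentioning 'bayes:', like `grep bayes:`."""
    return [line for line in output.splitlines() if "bayes:" in line]


def run_bayes_debug(msg_path: str) -> List[str]:
    """Rough equivalent of: cat msg | spamassassin -t -D bayes 2>&1 | grep bayes:"""
    with open(msg_path, "rb") as msg:
        result = subprocess.run(
            ["spamassassin", "-t", "-D", "bayes"],
            stdin=msg,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    # Debug output goes to stderr. Unlike 2>&1 this appends it after stdout
    # instead of interleaving, which does not matter for a line filter.
    combined = (
        result.stdout.decode("utf-8", errors="replace")
        + "\n"
        + result.stderr.decode("utf-8", errors="replace")
    )
    return bayes_debug_lines(combined)
```

For example, `for line in run_bayes_debug("msg.eml"): print(line)` prints the same lines the shell pipeline would, just in a form that is easy to loop over a directory of samples.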

2010-08-04 Happy Chap
Henrik K wrote: > > > Instead of speculating, try: > > cat msg | spamassassin -t -D bayes 2>&1 | grep bayes: > > It will tell you exactly what tokens are considered. > > Hi Henrik, Thanks for your reply. I'm not sure I totally understand all of the output to that, but I think that's tel

2010-08-04 Henrik K
On Wed, Aug 04, 2010 at 01:23:32AM -0700, Happy Chap wrote: > > Hi, > > We started getting (over the last 2 months say) lots of spam, which > Spamassassin isn't picking up as spam. Analysing these, they all seem to be > of the same type where many paragraphs of random text are "hidden" inside an

2010-08-04 Happy Chap
Hi RW, thanks for your reply. >It's unlikely that that could push the BAYES RESULT down to BAYES_00 >unless there is uncorrected mistraining. Possibly, but I suspect mistraining isn't a problem because apart from this specific type of spam, Spamassassin is doing (and has done for some time) a ver

2010-08-04 RW
On Wed, 4 Aug 2010 01:23:32 -0700 (PDT) Happy Chap wrote: > > Hi, > > We started getting (over the last 2 months say) lots of spam, which > Spamassassin isn't picking up as spam. Analysing these, they all seem > to be of the same type where many paragraphs of random text are > "hidden" inside a