maarten van den Berg said:
> Following up to myself, since I want to clarify something here...
>
> Another aspect that is relevant to me (but arguably not to most users
> of SA and I'm aware of that...) is that for me, English is not my
> native language, neither am I a resident of an English-speaking
> country. And because of this, my email is mixed; one part is Dutch,
> one part is all the mailinglists I try to follow (which are in
> English). But not being a resident, the fact is that for all of my
> customers and myself, ANY mail mentioning mortgages, loans,
> ejaculation et al is a surefire sign of spam. If not it would have
> mentioned hypotheken, leningen and klaarkomen, which are the Dutch
> translations. :-)
>
> Now I don't expect SA to know Dutch; that would be unfair. But what I
> would like is some way to score those English terms way higher than an
> American would or could. For an American, mortgage does not spell spam
> per se. But for ME it does, and I can practically guarantee I will not
> ever get an email that mentions "mortgage" together with "you have
> been approved" which won't be spam.

At the risk of being repetitive, this is precisely the sort of thing
Bayes excels at. Give it a shot (hopefully you have some ham'n'spam
saved up already); I think you will be pleased.

> Well, none of this is your concern of course. But I would really really

Perhaps it's true that your success is not directly anyone's concern but
your own. However, the regulars on this list are basically a buncha SA
users who are trying to improve their results and help others do the
same along the way.

> really like if there was a way to have those typical English
> spam-words score way higher than they do now. Could we maybe envision
> two rulesets, one for English-speaking residents and one for
> non-English speaking residents...?
>
> I edited the score file myself but not only is it a hard, long and
> error-prone task, but by editing it I throw away much of the valuable
> knowhow which assembled that score-list in the first place. But I am
> faced with the fact that over 95% of my spam is in English and that I
> cannot sit back while the online pharmacies fly around me, so to
> speak.
>
> Put yourself in my (our, if I'd be speaking for all non-English
> countries) place and ask yourself this question: Would you accept a
> score of only 0.5 for a rule that says "gratis hypotheekadvies" or
> "vijf miljoen emailadressen"?? No, of course you wouldn't, because
> you'd know that a company that pretends to sell you a mortgage from
> 12000 miles away will never ever be a genuine offer...

Knowing that there are regulars on this list whose primary language is
NOT English, anyone care to share how their setup handles English and
non-English spam?

> In other words, a lot of us get bitten by the fact that "mortgage" in
> some countries, in some contexts can be non-spam but for the rest of
> us it is a surefire sign to be spam. And again that is not anyone's
> fault but we should try and make SA flexible enough to accommodate
> this fact by changing the scoring.
> I know you can teach SA to recognize spam in one's own language, but
> what is missing right now is a simple way to make SA much more immune
> to the abundant English spam, which arguably is by FAR the bulk of all
> spam...
>
> Kind regards,
> Maarten
>
>
> On Friday 07 November 2003 22:21, maarten van den Berg wrote:
>> On Friday 07 November 2003 18:43, Matt Kettler wrote:
>> > At 10:29 AM 11/7/2003, Maarten J H van den Berg wrote:
>> > >Sorry if this has been discussed in the past...
>> >
>> > It's been discussed many times.. It's very common for people to
>> > have a very deep misunderstanding of how SA scoring works. Most
>> > people fall into the trap of over-simplifying the problem, and
>> > simply assuming that some rule or another "must" be a good spam
>> > rule, when in fact it's not.
>
> <snip>

-- 
Chris Thielen

Easily generate SpamAssassin rules to catch obfuscated spam phrases:
http://www.sandgnat.com/cmos/

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
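
For anyone wondering why the Bayes suggestion in this reply fits the "mortgage is always spam for me" case, here is a rough sketch in Python. It is NOT how SpamAssassin's Bayes code works internally; it is a toy naive-Bayes filter, and the class name, the smoothing, and the sample messages are all made up for illustration. The point is only that token scores come from the user's own mail, so an English word that never shows up in a Dutch user's ham ends up strongly spammy for that user without anyone hand-editing the score file.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return the set of distinct word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


class TinyBayes:
    """Toy per-user Bayesian filter (illustrative, not SpamAssassin's code)."""

    def __init__(self):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_spamminess(self, token):
        # Add-one smoothing keeps the result strictly between 0 and 1.
        p_spam = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        p_ham = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        return p_spam / (p_spam + p_ham)

    def spam_probability(self, text):
        log_odds = 0.0
        for token in tokenize(text):
            # Tokens never seen in training carry no evidence either way.
            if token not in self.spam_counts and token not in self.ham_counts:
                continue
            p = self.token_spamminess(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical mailbox of a Dutch user: Dutch customer mail plus an
# English mailing list as ham, English mortgage/pharmacy mail as spam.
classifier = TinyBayes()
for message in [
    "Kunt u de offerte voor de hypotheek morgen sturen",
    "Re: spamassassin rule scores for the new release",
    "Vergadering verplaatst naar vrijdag",
]:
    classifier.learn(message, is_spam=False)
for message in [
    "You have been approved for a low rate mortgage",
    "Refinance your mortgage today, you have been approved",
    "Cheap loans and online pharmacy offers",
]:
    classifier.learn(message, is_spam=True)

print(classifier.token_spamminess("mortgage"))
print(classifier.spam_probability("Your mortgage has been approved"))
print(classifier.spam_probability("De hypotheek offerte voor vrijdag"))
```

In SpamAssassin itself the training step is done with sa-learn (--spam and --ham on your saved mail) rather than anything like the learn() method above.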