Eduardo,

It would be very helpful if you could translate as many rules as you
want (and even create new Portuguese ones where appropriate).  The way
to do this is to follow the examples of the
Spanish/German/French/Polish rules and descriptions we already have;
see the files in rules/25* and rules/30* of the distribution.
Essentially, the idea is to prefix lines in your file with "lang pt" and
then create rules/descriptions as normal.  You can also override scores
for that locale only by prefixing the score lines with "lang pt".  If
you put together a 25_xxx_pt.cf and a 30_text_pt.cf, you could then
create a bugzilla ticket at http://bugzilla.spamassassin.org/ and attach
the files to that ticket (a tgz file or just a straight attachment is
fine), and I'll add them to the distribution.
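
As a rough sketch of what such a file might contain, assuming the
"lang pt" prefix works on rule, description and score lines as described
above: the rule name PT_CLIQUE_AQUI, its pattern and its score are made
up purely for illustration and are not taken from the distribution.

```
# Hypothetical Portuguese rule file; rule name, pattern and score
# are illustrative only.

# Body test: match the phrase "clique aqui" ("click here"), ignoring case.
lang pt body     PT_CLIQUE_AQUI  /clique aqui/i

# Description shown in the report for this rule.
lang pt describe PT_CLIQUE_AQUI  Pede para clicar aqui

# Score applied only when the pt locale is in effect.
lang pt score    PT_CLIQUE_AQUI  1.0
```

The existing files under rules/25* and rules/30* remain the thing to
copy from; the above only shows the general shape of the prefixed lines.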

As far as setting scores on locale-specific (i.e. non-English) rules, I
suppose what we'll need to start doing is get a whole set of
spam/nonspam corpora for each locale, and run mass-check separately for
each locale or something.  That starts getting into the domain of "time
to go professional" though.  I'm not sure how easy it would be to find
robust corpora for each locale (we currently use about 200,000 English
spams and 100,000 English nonspams for feeding the GA).

C

On Thu, 2002-04-18 at 11:50, Eduardo Marcel Maçan wrote:
> Hello, I have just joined the userbase of spamassassin and have been
> doing some preliminary tests with my spam collection. I was happy to
> see that 50% of all spam I received is correctly identified as
> SPAM, and most of the rest lies on the border of being marked as
> SPAM (scoring around 4.1 ~ 4.9) by the default settings.
> 
> This rate would be greatly improved if there were rules specific
> to Brazilian spam (which comes in Portuguese most of the time).
> Many of the English rules apply; it would be just a matter of
> adapting the strings that are looked for.
> 
> Since I didn't find the answer anywhere, I ask it here: Would it be
> a good start to replicate the English tests, translate them and
> assign them the same weights as the English equivalents? Since
> there are no specific Portuguese-language tests in the rules,
> I guess that only using the tools provided for feeding the
> genetic algorithm with my false negatives would not contribute
> much to the overall process...
> 
> What should I do? Adapt the current tests in the way I've stated
> above and send the mods back to the development team? And after
> that begin sending stats about my false positives...?
> 
> Regards...
> 
> -- 
> Eduardo Marcel Maçan          Gerente de Redes / Network Manager
> [EMAIL PROTECTED]          Colégio Bandeirantes
> 
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 
> 

