On Thu, 29 Jul 2010 04:52:15 -0700 (PDT) andrij <andriy.stet...@gmail.com> wrote:
> If I run sa-learn, will these e-mails be processed with the
> RelayCountry plugin before being tokenized?

Yes.

> Is it not
> enough just to add something to the country code in RelayCountry.pm
> to make it longer, like "$cc = "Code" . $cc;"?

Could do, but I dislike the way that would tokenize. For example, in the
two cases "TW" and "GB TW", TW would be the same token even though they
correspond to different scenarios.

--------------------------------------------------------------------

On Thu, 29 Jul 2010 09:26:18 +0300
Henrik K <h...@hege.li> wrote:

> Maybe also the originating (first) client should have some prefix.

I thought about that, but I don't like the idea of giving special
significance to anything other than the verified country recorded at
the edge of the trusted network. With ordered pairs the spammer is
likely to make things worse by forging headers.

> Also not tokenized is X-Languages..

The ASN plugin is nominally tokenized, but it never seems to work -
probably a timing problem with the DNS reply.

--------------------------------------------------------------------

On Thu, 29 Jul 2010 12:28:11 +1200
Jason Haar <jason.h...@trimble.co.nz> wrote:

> shouldn't you choose more
> unique strings (since you can)? Otherwise couldn't Bayes misclassify
> when such words show up as part of email messages?

Received headers have prefixes.
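
To make the "TW" versus "GB TW" point concrete, here is a rough sketch in Python. It is only an illustration: the whitespace tokenizer, the helper names (tokenize, prefixed, joined) and the "_" separator are all made up for this example and are not SpamAssassin's Bayes tokenizer or anything in RelayCountry.pm. It assumes the prefix is applied to each country code separately.

```python
def tokenize(header_value):
    """Toy tokenizer: split on whitespace. NOT SpamAssassin's real tokenizer."""
    return set(header_value.split())


def prefixed(countries, prefix="Code"):
    """Hypothetical: lengthen each country code with a fixed prefix."""
    return " ".join(prefix + cc for cc in countries)


def joined(countries, sep="_"):
    """Hypothetical alternative: emit the whole relay path as one string."""
    return sep.join(countries)


direct = ["TW"]          # mail handed over directly from TW
relayed = ["GB", "TW"]   # mail that passed through GB and TW

# With per-code prefixes, both scenarios still share a token.
shared = tokenize(prefixed(direct)) & tokenize(prefixed(relayed))
print(shared)    # {'CodeTW'}

# With the path joined into one string, the scenarios share nothing.
distinct = tokenize(joined(direct)) & tokenize(joined(relayed))
print(distinct)  # set()
```

So a longer per-code string helps with token length, but under these assumptions it does nothing to tell the two scenarios apart.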