On Thu, 31 May 2012 09:05:00 +0200
"Andrzej A. Filip" wrote:
> a) Unicode itself may require canonicalization too.
Perl's Encode module should take care of that.
> b) some spammers do not declare encoding properly so some encoding
> guessing would be handy
Possibly, but probably not. Guessin
On 05/29/2012 09:58 PM, David F. Skoll wrote:
> This idea is growing out of a thread I started in which someone pointed me
> to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
>
> Ignoring the locale under which SA runs and also ignoring the character
> encoding of the message can make
On Wed, 30 May 2012 08:26:44 -0700
jdow wrote:
> I'm idly wondering what effect this would have on the time to scan a
> single email.
Actually converting from the original encoding to UTF-8 is very fast.
Internally, Perl uses pretty fast C code to convert between character
encodings.
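
A minimal sketch of the decode, normalize, re-encode step discussed above. It is written in Python purely for illustration; the thread itself is about Perl's Encode module, and nothing here is SpamAssassin code. The function name to_canonical_utf8, the choice of NFC as the normal form, and the Latin-1 fallback for a bad charset declaration are all assumptions made for this example.

```python
import unicodedata


def to_canonical_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Decode raw bytes using the declared charset, normalize, return UTF-8.

    Hypothetical helper: NFC and the Latin-1 fallback are assumptions,
    not something the thread specifies.
    """
    try:
        text = raw.decode(declared_charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: Latin-1 maps every byte to a
        # code point, so this decode cannot fail (it may still be wrong).
        text = raw.decode("latin-1")
    # Canonical composition, so "e" + combining acute becomes one code point.
    return unicodedata.normalize("NFC", text).encode("utf-8")


# The same word, once as ISO-8859-1 and once as decomposed UTF-8.
latin1_body = "caf\u00e9".encode("iso-8859-1")
decomposed_body = "cafe\u0301".encode("utf-8")

# After canonicalization both bodies are byte-for-byte identical.
assert to_canonical_utf8(latin1_body, "iso-8859-1") == to_canonical_utf8(
    decomposed_body, "utf-8"
)
```

Once every text part is in this form, a single rule pattern can match both inputs, which is the behaviour the proposal is after.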
As for Uni
I'm idly wondering what effect this would have on the time to scan a
single email. I'd suspect the time required would increase significantly
if the user has a "bloody ridiculous (but effective) lot of rules", such
as I use.
I had the same thought but figured that we will have to improve th
On 2012/05/29 13:18, Kevin A. McGrail wrote:
On 5/29/2012 3:58 PM, David F. Skoll wrote:
This idea is growing out of a thread I started in which someone pointed me
to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
Ignoring the locale under which SA runs and also ignoring the charac
On Wed, 30 May 2012 14:43:54 +0100
RW wrote:
> UTF-8 won't work; it will need to be UTF-32 to be compatible with
> sa-compile. From the re2c man page:
Ah. Too bad. :(
(I don't use sa-compile, so this is not a killer problem for me, but
I can see how it could be for some people.)
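
For readers weighing the UTF-8 versus UTF-32 point, the sketch below only shows the difference in code unit width between the two encodings. It is Python for illustration and says nothing about how re2c or sa-compile behave; the function name encoded_widths is hypothetical.

```python
def encoded_widths(text: str) -> list[tuple[str, int, int]]:
    """Return (character, UTF-8 byte count, UTF-32 byte count) per character."""
    return [
        # "utf-32-be" is used so that no byte-order mark is added to the count.
        (ch, len(ch.encode("utf-8")), len(ch.encode("utf-32-be")))
        for ch in text
    ]


# ASCII letter, Latin-1 letter, euro sign, and an emoji outside the BMP.
for ch, utf8_len, utf32_len in encoded_widths("a\u00e9\u20ac\U0001F600"):
    print(f"U+{ord(ch):04X}: {utf8_len} byte(s) in UTF-8, {utf32_len} in UTF-32")
```

UTF-8 uses one to four bytes per character, while UTF-32 always uses four, so a byte-oriented matcher sees a different shape of input in each case.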
On Wed, 30 Ma
On Wed, May 30, 2012 at 02:43:54PM +0100, RW wrote:
> On Tue, 29 May 2012 15:58:21 -0400
> David F. Skoll wrote:
>
>
> > I'm thinking of making something (a plugin, maybe?) that canonicalizes
> > text/* parts to UTF-8 and lets you write rules using Unicode regexes.
> > Something like:
>
> > Acc
On Tue, 29 May 2012 15:58:21 -0400
David F. Skoll wrote:
> I'm thinking of making something (a plugin, maybe?) that canonicalizes
> text/* parts to UTF-8 and lets you write rules using Unicode regexes.
> Something like:
> According to the perlunicode man page:
>
> Regular Expressions
>
On 5/29/2012 3:58 PM, David F. Skoll wrote:
This idea is growing out of a thread I started in which someone pointed me
to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
Ignoring the locale under which SA runs and also ignoring the character
encoding of the message can make body matc
Hi,
This idea is growing out of a thread I started in which someone pointed me
to https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
Ignoring the locale under which SA runs and also ignoring the character
encoding of the message can make body matching rules behave differently
on differen