On 18/05/12 03:18, David F. Skoll wrote:
I looked at the regex and it seems that Perl treats är as having a
word boundary in the \b sense between the "ä" and the "r"
On 18.05.12 07:26, Jason Haar wrote:
A bit OT, but is it because your perl is running under "C" locale
instead of se? i.e. would
On Fri, 18 May 2012 08:37:07 +1200
Jason Haar wrote:
> I'm no linguist but this is probably an extremely hard problem to
> solve. An email can have mixtures of languages, so in a perfect world
> we should be able to change locale per word (or per char? - eeek!).
The only sane solution is to re-e
On 18/05/12 07:54, dar...@chaosreigns.com wrote:
> Locale handling is a known problem in SA:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=3062
bug opened in 2004 :-(
I'm no linguist but this is probably an extremely hard problem to solve.
An email can have mixtures of languages, so i
On Fri, 18 May 2012 07:26:56 +1200
Jason Haar wrote:
> > I looked at the regex and it seems that Perl treats är as having a
> > word boundary in the \b sense between the "ä" and the "r"
> A bit OT, but is it because your perl is running under "C" locale
> instead of se?
Ah... could be. Hmm, ok.
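
For anyone who wants to see the effect in isolation, here is a minimal sketch. It uses Python's `re` module rather than Perl, so it is only an analogy and says nothing definite about what Perl or SpamAssassin does with this message. It shows that where `\b` lands in "är" depends on whether the engine counts "ä" as a word character. The helper name `boundaries` is made up for this example.

```python
import re

def boundaries(pattern, subject, flags=0):
    """Return every offset at which the (zero-width) pattern matches."""
    return [m.start() for m in re.finditer(pattern, subject, flags)]

word = "är"

# Decoded text with Unicode-aware \w: "ä" and "r" are both word
# characters, so the only boundaries are at the start and the end.
unicode_boundaries = boundaries(r"\b", word)

# Same text, but \w restricted to ASCII [A-Za-z0-9_]: "ä" is no longer a
# word character, so a boundary appears between "ä" and "r".
ascii_boundaries = boundaries(r"\b", word, re.ASCII)

# Raw UTF-8 bytes: "ä" is the two bytes 0xC3 0xA4, neither of which is an
# ASCII word character, so again there is a boundary just before "r".
byte_boundaries = boundaries(rb"\b", word.encode("utf-8"))

print(unicode_boundaries, ascii_boundaries, byte_boundaries)
```

If the text reaches the regex engine as undecoded bytes, or with an ASCII-only notion of `\w`, a boundary between "ä" and "r" is the expected result, which would be consistent with the locale question above.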
On 05/18, Jason Haar wrote:
> A bit OT, but is it because your perl is running under "C" locale
> instead of se? i.e. would the word boundary definition change under
> different localization contexts? Doesn't help solve the problem for you,
> but it certainly flags a potential issue with a tonne of