Re: More text/plain questions

2014-08-05 Thread Quanah Gibson-Mount
--On Wednesday, July 23, 2014 9:39 PM +0100 Martin Gregorie wrote: On Wed, 2014-07-23 at 11:45 -0600, Amir 'CG' Caspi wrote: I'm definitely considering writing a rule to catch &#[0-9]{3}; patterns. I'm definitely worried it could cause FPs, but are there common circumstances where legitimate

Re: More text/plain questions

2014-07-25 Thread Kevin A. McGrail
On 7/25/2014 6:19 PM, Amir Caspi wrote: On Jul 25, 2014, at 4:11 PM, Kevin A. McGrail wrote: You should look at the patch on bug 7068 (https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7068) Yes, but this is within the code itself. I was referring to how to do this in a local.cf, for

Re: More text/plain questions

2014-07-25 Thread Amir Caspi
On Jul 25, 2014, at 4:11 PM, Kevin A. McGrail wrote: > You should look at the patch on bug 7068 > (https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7068) Yes, but this is within the code itself. I was referring to how to do this in a local.cf, for example... Amir

Re: More text/plain questions

2014-07-25 Thread Kevin A. McGrail
On 7/25/2014 5:55 PM, Amir Caspi wrote: On Jul 24, 2014, at 4:08 PM, Philip Prindeville wrote: In text/plain with CTE of ‘7bit’ or ‘8bit’ it’s meaningless to use Unicode HTML entity encodings. It’s obviously not HTML. If you want Unicode in text/plain, it should be in base64 or quoted-prin

Re: More text/plain questions

2014-07-25 Thread Kevin A. McGrail
On 7/23/2014 2:27 PM, Paul Stead wrote: KAM's rules are also helping add a few extra points I try. https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7068 and https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7063 I've also implemented several rules to try and catch these types of

Re: More text/plain questions

2014-07-25 Thread Amir Caspi
On Jul 24, 2014, at 4:08 PM, Philip Prindeville wrote: > In text/plain with CTE of ‘7bit’ or ‘8bit’ it’s meaningless to use Unicode > HTML entity encodings. It’s obviously not HTML. > > If you want Unicode in text/plain, it should be in base64 or quoted-printable > CTE. Sure, but these spam

Re: More text/plain questions

2014-07-24 Thread Philip Prindeville
On Jul 24, 2014, at 4:48 PM, Amir 'CG' Caspi wrote: > On 2014-07-24 16:11, Philip Prindeville wrote: > >> You might have a shorter wait if you move to CentOS 6.5 instead. > I would, but the VPS software I'm using does not run on CentOS 6.x, only 5.x. > It's rather old software and I should co

Re: More text/plain questions

2014-07-24 Thread Amir 'CG' Caspi
On 2014-07-24 16:11, Philip Prindeville wrote: > You might have a shorter wait if you move to CentOS 6.5 instead. I would, but the VPS software I'm using does not run on CentOS 6.x, only 5.x. It's rather old software and I should convert to something else, but it's not worth the time I don't

Re: More text/plain questions

2014-07-24 Thread Philip Prindeville
On Jul 23, 2014, at 1:21 PM, Amir 'CG' Caspi wrote: > On 2014-07-23 13:14, Axb wrote: >> doesn't your VPS offer you shell access? >> if yes, uninstall the SA rpm stuff and install SA 3.4 from source/trunk. > > I think I didn't explain properly. I'm running the dedicated server on which > ther

Re: More text/plain questions

2014-07-24 Thread Philip Prindeville
On Jul 23, 2014, at 12:54 PM, Amir 'CG' Caspi wrote: >> >> Hope the patches above get pushed into production > Indeed, though I'm still running SA v3.3.x ... I'm on a CentOS 5.10 platform > and, because it's of the virtual-hosting control panel I use, I need my > software distributed in RPMs.

Re: More text/plain questions

2014-07-24 Thread Philip Prindeville
On Jul 23, 2014, at 11:45 AM, Amir 'CG' Caspi wrote: > On 2014-07-02 15:04, Amir Caspi wrote: >> For what it's worth, I just received a spam that basically is the same >> as what Philip complained about. I've posted a spample here: >> http://pastebin.com/Y2YGwL49 > [...] >> I'm wondering if we

Re: More text/plain questions

2014-07-23 Thread Paul Stead
On 23/07/14 21:24, Axb wrote: look at the HTML source, sharply - there's tons of little traits to dump in a meta rule I have these 'traits' in my custom Clamav rules, but that's another list... :) -- Paul Stead, Zen Internet Systems Engineer

Re: More text/plain questions

2014-07-23 Thread Axb
On 07/23/2014 09:54 PM, Paul Stead wrote: Making use of the meta rules seems to be the best here - this spam is being very tricky to catch - I'll mirror my previous statement that the suggested patches do pick up on this spam too look at the HTML source, sharply - there's tons of little trait

Re: More text/plain questions

2014-07-23 Thread Martin Gregorie
On Wed, 2014-07-23 at 21:49 +0200, Axb wrote: > Centos 5.x is rather dated. > > Not sure there'd be such an old Fedora > equivalent offering SA 3.4 rpms. > I'll say - a quick search shows that Centos 7.x is current. and SA 3.4.0 arrived after Fedora 20 was released. > He'd have to find the equ

Re: More text/plain questions

2014-07-23 Thread Axb
On 07/23/2014 10:06 PM, Amir 'CG' Caspi wrote: On 2014-07-23 13:38, Axb wrote: If you're using spamd, why not run a/multiple dedicated VMs for SA 3.4 and have your other VMs use the spamd on the SA VMs ? There is a dedicated spamd. It's the other tools that need to be distributed, like sa-le

Re: More text/plain questions

2014-07-23 Thread Amir 'CG' Caspi
On 2014-07-23 13:38, Axb wrote: If you're using spamd, why not run a/multiple dedicated VMs for SA 3.4 and have your other VMs use the spamd on the SA VMs ? There is a dedicated spamd. It's the other tools that need to be distributed, like sa-learn. Bayes rules are handled per-user. (No, I

Re: More text/plain questions

2014-07-23 Thread Paul Stead
On 23/07/14 20:44, John Hardin wrote: On Wed, 23 Jul 2014, Paul Stead wrote: body __LOC_COUNT_UNI /&#x[0-9A-F]{4};/ tflags __LOC_COUNT_UNI multiple Recommend maxhits on that. Apologies, I omitted the max hits... If you're only looking for 10+ hits, then maxhits=11 will allow you to det
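The maxhits advice can be illustrated outside SpamAssassin with a rough Python sketch. It is not SpamAssassin code: `capped_hits` and `has_at_least` are hypothetical helper names, and the pattern assumes the entities carry the leading `&#` seen in the `&#x[0-9A-F]{4}` sequence discussed elsewhere in this thread. The point is only that a "10 or more" test never needs to count past 11 matches.

```python
import re

# Assumption: entities look like "&#x0435;" (leading "&#", four hex digits).
ENTITY_RE = re.compile(r"&#x[0-9A-F]{4};")


def capped_hits(text, maxhits):
    """Count entity matches in text, stopping as soon as maxhits is reached."""
    count = 0
    for _ in ENTITY_RE.finditer(text):
        count += 1
        if count >= maxhits:
            break  # further matches cannot change a threshold decision
    return count


def has_at_least(text, threshold):
    """Answer 'threshold or more hits?' while scanning at most threshold + 1."""
    return capped_hits(text, threshold + 1) >= threshold


if __name__ == "__main__":
    spammy = "&#x0435;" * 200          # hypothetical body full of entities
    print(capped_hits(spammy, 11))      # scanning stops at the cap
    print(has_at_least(spammy, 10))
```

Capping at the threshold plus one keeps the work bounded on messages that contain hundreds of entities, while still distinguishing "exactly 10" from "more than 10" if a later meta rule needs that.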

Re: More text/plain questions

2014-07-23 Thread Axb
On 07/23/2014 09:43 PM, Martin Gregorie wrote: On Wed, 2014-07-23 at 13:21 -0600, Amir 'CG' Caspi wrote: I'm hoping someone will take up that task. 3.3.x was packaged as an rpm (on EPEL and other repos), so hopefully 3.4 will be, too. 3.4.0 is the standard SA package for Fedora, so I'd expec

Re: More text/plain questions

2014-07-23 Thread John Hardin
On Wed, 23 Jul 2014, Paul Stead wrote: On 23/07/14 19:54, Amir 'CG' Caspi wrote: Care to share? Counting encoded chars is easy, of course. I use the following to count the encoded chars: body __LOC_COUNT_UNI /&#x[0-9A-F]{4};/ tflags __LOC_COUNT_UNI multiple Recommend maxhits on that.

Re: More text/plain questions

2014-07-23 Thread Martin Gregorie
On Wed, 2014-07-23 at 13:21 -0600, Amir 'CG' Caspi wrote: > I'm hoping someone will take up that task. 3.3.x was packaged as an rpm > (on EPEL and other repos), so hopefully 3.4 will be, too. > 3.4.0 is the standard SA package for Fedora, so I'd expect to find it on RHEL and their various clone

Re: More text/plain questions

2014-07-23 Thread Martin Gregorie
On Wed, 2014-07-23 at 11:45 -0600, Amir 'CG' Caspi wrote: > I'm definitely considering writing a rule to catch &#[0-9]{3}; > patterns. I'm definitely worried it could cause FPs, but are there > common circumstances where legitimate emails would include dozens to > hundreds of these? (The lates

Re: More text/plain questions

2014-07-23 Thread Axb
On 07/23/2014 09:21 PM, Amir 'CG' Caspi wrote: On 2014-07-23 13:14, Axb wrote: doesn't your VPS offer you shell access? if yes, uninstall the SA rpm stuff and install SA 3.4 from source/trunk. I think I didn't explain properly. I'm running the dedicated server on which there is VPS software.

Re: More text/plain questions

2014-07-23 Thread Paul Stead
On 23/07/14 19:54, Amir 'CG' Caspi wrote: Care to share? Counting encoded chars is easy, of course. I use the following to count the encoded chars: body __LOC_COUNT_UNI /&#x[0-9A-F]{4};/ tflags __LOC_COUNT_UNI multiple We can make some vars if we want: meta __LOC_HAS_0_UNI (__PDS_COUNT_U

Re: More text/plain questions

2014-07-23 Thread Amir 'CG' Caspi
On 2014-07-23 13:14, Axb wrote: doesn't your VPS offer you shell access? if yes, uninstall the SA rpm stuff and install SA 3.4 from source/trunk. I think I didn't explain properly. I'm running the dedicated server on which there is VPS software. I need RPMs so that they get distributed to

Re: More text/plain questions

2014-07-23 Thread Axb
On 07/23/2014 08:54 PM, Amir 'CG' Caspi wrote: Indeed, though I'm still running SA v3.3.x ... I'm on a CentOS 5.10 platform and, because it's of the virtual-hosting control panel I use, I need my software distributed in RPMs. Until someone builds a proper 3.4 rpm for CentOS/RHEL 5, I'm stuck. (I

Re: More text/plain questions

2014-07-23 Thread Amir 'CG' Caspi
On 2014-07-23 12:23, Paul Stead wrote: > I've also implemented several rules to try and catch these types of emails. Care to share? Counting encoded chars is easy, of course. One thing to note, webmail and my MUA often will render the encoded characters in their translated format, not liter

Re: More text/plain questions

2014-07-23 Thread Paul Stead
KAM's rules are also helping add a few extra points On 23/07/14 19:23, Paul Stead wrote: On 23/07/14 18:45, Amir 'CG' Caspi wrote: So, to follow up on this... over the past couple of weeks I've been getting a lot more FNs than normal, and almost every single one of these is an "encoded character

Re: More text/plain questions

2014-07-23 Thread Paul Stead
On 23/07/14 18:45, Amir 'CG' Caspi wrote: So, to follow up on this... over the past couple of weeks I've been getting a lot more FNs than normal, and almost every single one of these is an "encoded character" spam like the example above. Bayes training does appear to work, in that many of these

Re: More text/plain questions

2014-07-23 Thread Amir 'CG' Caspi
On 2014-07-02 15:04, Amir Caspi wrote: For what it's worth, I just received a spam that basically is the same as what Philip complained about. I've posted a spample here: http://pastebin.com/Y2YGwL49 [...] I'm wondering if we shouldn't write a rule looking for lots of &#[0-9]{3}; patterns... s

Re: More text/plain questions

2014-07-07 Thread David F. Skoll
On Mon, 07 Jul 2014 19:29:11 -0400 Daniel Staal wrote: > Just to start the discussion: I'd say default to UTF-8 if not > otherwise specified and can't be worked out. (How hard to work on > 'working it out' is a question, of course.) It's the growing > standard, as far as I can tell. +1. UTF-8

Re: More text/plain questions

2014-07-07 Thread Daniel Staal
--As of July 7, 2014 5:20:01 PM -0400, Kevin A. McGrail is alleged to have said: On 7/7/2014 5:09 PM, Philip Prindeville wrote: On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail wrote: On 7/7/2014 2:28 AM, John Wilcock wrote: Le 05/07/2014 19:08, Philip Prindeville a écrit : As for encoding a

Re: More text/plain questions

2014-07-07 Thread Kevin A. McGrail
On 7/7/2014 5:09 PM, Philip Prindeville wrote: On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail wrote: On 7/7/2014 2:28 AM, John Wilcock wrote: Le 05/07/2014 19:08, Philip Prindeville a écrit : As for encoding a cyrillic small a: there are many ways to do this. iso-8859-4, utf-8, jp2212, gb2312,

Re: More text/plain questions

2014-07-07 Thread Philip Prindeville
On Jul 7, 2014, at 7:15 AM, Kevin A. McGrail wrote: > On 7/7/2014 2:28 AM, John Wilcock wrote: >> Le 05/07/2014 19:08, Philip Prindeville a écrit : >>> As for encoding a cyrillic small a: there are many ways to do this. >>> iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this >>>

Re: More text/plain questions

2014-07-07 Thread Kevin A. McGrail
On 7/7/2014 2:28 AM, John Wilcock wrote: Le 05/07/2014 19:08, Philip Prindeville a écrit : As for encoding a cyrillic small a: there are many ways to do this. iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this would be very efficient—there are just too many charsets possible.

Re: More text/plain questions

2014-07-06 Thread John Wilcock
Le 05/07/2014 19:08, Philip Prindeville a écrit : As for encoding a cyrillic small a: there are many ways to do this. iso-8859-4, utf-8, jp2212, gb2312, win1252, etc. I don’t think this would be very efficient—there are just too many charsets possible. Normalising the input message to UTF-8 bef
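As a rough illustration of the normalisation idea (not SpamAssassin's implementation), the Python sketch below decodes the same Cyrillic small a from several charsets into one Unicode string, so a single pattern could match all of them. The helper name `to_unicode`, the particular charsets, and the fall-back to UTF-8 when the charset is missing or unknown are assumptions chosen for the example.

```python
def to_unicode(raw, charset=None):
    """Decode raw bytes using the declared charset.

    Assumption for this sketch: fall back to UTF-8 when the charset is
    missing or not recognised, replacing undecodable bytes.
    """
    try:
        return raw.decode(charset or "utf-8", errors="replace")
    except LookupError:  # unknown charset label
        return raw.decode("utf-8", errors="replace")


CYRILLIC_A = "\u0430"  # CYRILLIC SMALL LETTER A

# The same character as it might arrive in differently encoded messages.
# These four charsets are illustrative picks that can represent the letter.
samples = [
    (CYRILLIC_A.encode(cs), cs)
    for cs in ("utf-8", "koi8-r", "iso-8859-5", "windows-1251")
]

# After normalisation every variant collapses to the same string.
normalised = {to_unicode(raw, cs) for raw, cs in samples}

if __name__ == "__main__":
    for raw, cs in samples:
        print(cs, raw, "->", repr(to_unicode(raw, cs)))
    print(len(normalised))
```

The byte sequences differ per charset, which is why matching on raw bytes would need one rule per encoding; matching after decoding needs only one.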

Re: More text/plain questions

2014-07-05 Thread Philip Prindeville
On Jul 4, 2014, at 12:08 AM, haman...@t-online.de wrote: > > Hi, > > while this is certainly not correct - and likely does not display in every > mail client - it would > probably work in several webmailers. Perhaps this is the configuration the > author of that > crap tested. > Now, I am som

Re: More text/plain questions

2014-07-03 Thread hamann . w
>> >> I got the following MIME body part below, and I’m wondering if it would >> >> make sense to filter on this as well. >> >> Given that it’s text/plain with an implicit charset=“us-ascii” and an >> >> implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} >> >> doesn’t really

Re: More text/plain questions

2014-07-03 Thread Kevin A. McGrail
On 7/2/2014 5:04 PM, Amir Caspi wrote: Is there also a rule for UTF8-encoded Subject line? If so, it didn't pop. Just a quick note about this part of your email. This is extremely common to use UTF-8 and I doubt it would be an indicator of spam vs ham. I wouldn't even bother looking...

Re: More text/plain questions

2014-07-02 Thread Karsten Bräckelmann
On Wed, 2014-07-02 at 19:10 -0600, Philip Prindeville wrote: > On Jul 2, 2014, at 5:16 PM, Karsten Bräckelmann > wrote: > > That RE is a single, straight-forward alternation with two alternatives. > > > > The first one translates to a single char in a given, specific range. > > Basically, anyth

Re: More text/plain questions

2014-07-02 Thread Philip Prindeville
On Jul 2, 2014, at 5:16 PM, Karsten Bräckelmann wrote: > On Wed, 2014-07-02 at 14:44 -0600, Philip Prindeville wrote: >> Okay, was tinkering with the code below but the zero-width lookahead is >> not disqualifying ampersand followed by #x[0-9A-F]{4}; so the output >> is bogus (you can run this a

Re: More text/plain questions

2014-07-02 Thread Karsten Bräckelmann
On Wed, 2014-07-02 at 14:44 -0600, Philip Prindeville wrote: > Okay, was tinkering with the code below but the zero-width lookahead is > not disqualifying ampersand followed by #x[0-9A-F]{4}; so the output > is bogus (you can run this and see what I mean). > > What am I doing wrong? You are using

Re: More text/plain questions

2014-07-02 Thread Amir Caspi
On Jul 2, 2014, at 12:58 PM, David F. Skoll wrote: > I don't think so. Any MUA that tried to convert "&#x0435;" to a > Unicode character in a text/plain part with implicit US-ASCII charset > and 7bit content transfer encoding is broken. An MUA should display > exactly "&#x0435;" in this situation. It's a dif
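The difference between the two behaviours can be shown with a small Python sketch. The sample body is hypothetical, and `html.unescape` stands in for what an HTML renderer (or an over-eager mail client or webmailer) would do; a strict text/plain display would leave the characters untouched.

```python
import html

# Hypothetical body text using hex numeric character references
# for Cyrillic look-alikes of Latin "e" and "a".
body = "Th&#x0435; R&#x0435;&#x0430;l"

# Strict text/plain handling: show the characters exactly as sent.
as_plain = body

# HTML-style handling: numeric references become Unicode characters.
as_html = html.unescape(body)

if __name__ == "__main__":
    print(as_plain)
    print(as_html)
    # The decoded text looks like "The Real" but contains no Latin e or a.
    print(as_html == "The Real")
```

A client that decodes the references shows the reader something that looks like ordinary English, while a content filter looking at the raw body sees only ASCII entity sequences, which is what makes counting those sequences a usable signal.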

Re: More text/plain questions

2014-07-02 Thread John Hardin
On Wed, 2 Jul 2014, Philip Prindeville wrote: On Jul 2, 2014, at 12:37 PM, John Hardin wrote: On Wed, 2 Jul 2014, Philip Prindeville wrote: Given that it’s text/plain with an implicit charset=“us-ascii” and an implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} doesn’t

Re: More text/plain questions

2014-07-02 Thread Philip Prindeville
Okay, was tinkering with the code below but the zero-width lookahead is not disqualifying ampersand followed by #x[0-9A-F]{4}; so the output is bogus (you can run this and see what I mean). What am I doing wrong? #!/usr/bin/perl -w use warnings; use strict; my $data = <<__EOF__; Thе Rеаl Rе

Re: More text/plain questions

2014-07-02 Thread Philip Prindeville
On Jul 2, 2014, at 12:37 PM, John Hardin wrote: > On Wed, 2 Jul 2014, Philip Prindeville wrote: > >> Given that it’s text/plain with an implicit charset=“us-ascii” and an >> implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} >> doesn’t really parse into a 16-bit character

Re: More text/plain questions

2014-07-02 Thread David F. Skoll
On Wed, 2 Jul 2014 11:37:33 -0700 (PDT) John Hardin wrote: > Nope. The content-transfer-encoding is only for the *transfer* part > of the process. Once the content reaches the MUA that content can be > further parsed by the MUA according to other encoding rules, such as > these escape sequences f

Re: More text/plain questions

2014-07-02 Thread John Hardin
On Wed, 2 Jul 2014, John Hardin wrote: On Wed, 2 Jul 2014, Philip Prindeville wrote: Given that it’s text/plain with an implicit charset=“us-ascii” and an implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} doesn’t really parse into a 16-bit character, would it? That wou

Re: More text/plain questions

2014-07-02 Thread John Hardin
On Wed, 2 Jul 2014, Philip Prindeville wrote: Given that it’s text/plain with an implicit charset=“us-ascii” and an implicit content-transfer-encoding of 7bit, the sequence &#x[0-9A-F]{4} doesn’t really parse into a 16-bit character, would it? That would be a broken MUA that made such a leap..