Re: Re-running SA on an mbox

2009-09-20 Thread Theo Van Dinter
You probably want "spamassassin --mbox". :) It won't modify the messages in-place, but you can do something like "spamassassin --mbox infile > outfile". If you're talking about sa-learn, though, it also knows --mbox. On Sun, Sep 20, 2009 at 9:46 PM, MySQL Student wrote: > Yeah, that's kind of w

Re: About reporting

2009-09-13 Thread Theo Van Dinter
On Sun, Sep 13, 2009 at 5:08 PM, João Eiras wrote: > Should the file message.txt in the example contain the full e-mail with > headers, attachments and everything? Yes. It should be the original and complete message. > Does the reporting tool remove all information about the receiver for > priv

Re: Filtering depending mail header

2009-09-08 Thread Theo Van Dinter
There's no way to do that with SpamAssassin itself. Once you send something to SA, it will do the whole process (there's short circuiting, but that's not really what you want here). It sounds like you're trying to not filter internal mail but filter external mail, so I would recommend two things:

Re: How do I make Net::DNS::Resolver take /etc/hosts into account?

2009-07-01 Thread Theo Van Dinter
On Wed, Jul 1, 2009 at 3:23 AM, Per Jessen wrote: > Back to the subject line - how do I make Net::DNS::Resolver > take /etc/hosts into account? a) of course it doesn't, /etc/hosts isn't DNS, so why would Net::DNS look at it? :) b) my guess is that you can't, but it's a question for the Net::DNS fo

Re: How many people are still using perl 5.6.x?

2009-06-25 Thread Theo Van Dinter
Well, the point is that if it works, don't break it. Yes, you can totally avoid upgrades, depending on your environment. Sometimes you have no choice and continue to run old versions of software or firmware or ... Get over it. :) If you want to continue debating system administration issues, there

Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 3:41 PM, Jonas Eckerman wrote: > Matus' example was a Word document that contained a PDF which (might in turn > contain an image). A plugin that knows how to read Word documents could > extract the text of the Word document and then use "set_rendered" to make > that available

Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 1:12 PM, Jonas Eckerman wrote: >> Already exists, check recent list history for "set_rendered". > > I thought that was for text only. It is only for text. > In any case, any plugin looking for images, or a PDF, will most likely look > at MIME type and/or file name, and then

Re: Plugin extracting text from docs

2009-06-25 Thread Theo Van Dinter
On Thu, Jun 25, 2009 at 11:48 AM, Matus UHLAR - fantomas wrote: > I am not sure but I think something alike was done. What I mean is to have > generic chain of format converters, where at the end would be plain image > or even text, that could be processed by classic rules like bayes, > replacetags

Re: BAYES_99 score & lint

2009-06-22 Thread Theo Van Dinter
The debug output is saying that the meta rule, LOCAL_BAYES_RTF, has a dependency, BAYES_99, which has a 0 score. In the score line, there are two zero values. ;) It depends what scoreset you're running in. Also, just because 50_scores.cf has something set doesn't mean something later on doesn't c
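The scoreset point above can be sketched in a few lines of Python. This is only an illustration, assuming the four-value form of a SpamAssassin "score" line (no Bayes/no network, no Bayes/network, Bayes/no network, Bayes/network); the function name pick_score and the example values are hypothetical and are not taken from 50_scores.cf.

```python
def pick_score(score_line: str, use_bayes: bool, use_net: bool) -> float:
    """Return the score that applies for the active scoreset."""
    parts = score_line.split()
    # parts[0] is the keyword "score", parts[1] is the rule name.
    values = [float(v) for v in parts[2:]]
    if len(values) == 1:
        # A single value applies to every scoreset.
        return values[0]
    # Scoreset index: bit 0 = network tests enabled, bit 1 = Bayes enabled.
    scoreset = (2 if use_bayes else 0) + (1 if use_net else 0)
    return values[scoreset]


# Hypothetical score line: zero in the two scoresets where Bayes is off.
EXAMPLE = "score BAYES_99 0 0 3.5 3.5"
```

With Bayes disabled, the sketch returns 0 for the example line, which is the situation where a meta rule depending on BAYES_99 would see a zero-scored dependency.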

Re: Bayes and SQL.

2009-06-22 Thread Theo Van Dinter
On Mon, Jun 22, 2009 at 6:06 AM, Kasper Sacharias Eenberg wrote: > I'm not completely sure that force-expire does anything. I ran it > several times last week, and nothing showed up in the 'last expiry > atime' column. So i figured it wasn't working. Please keep in mind that "--force-expire" mean

Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
09 at 7:00 PM, Rosenbaum, Larry M. wrote: >> From: felic...@kluge.net On Behalf Of Theo Van Dinter >> >> On Fri, Jun 19, 2009 at 3:04 AM, Jason Haar >> wrote: >> > Speaking of image/rtf/word attachment spam; is there any work going >> on >> > to standardize

Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
On Fri, Jun 19, 2009 at 4:42 PM, Charles Gregory wrote: > H. Big question for developers: Does the performance 'burden' of a large > e-mail come from the 'reading' of that mail into spamassassin and initial > processing? Or is the 'cost' of a large message only 'paid' when SA attempts > to run

Re: new spam using large images

2009-06-19 Thread Theo Van Dinter
On Fri, Jun 19, 2009 at 3:04 AM, Jason Haar wrote: > Speaking of image/rtf/word attachment spam; is there any work going on > to standardize this so that the textual output of such attachments could > be fed back into SA? That functionality already exists (has for almost 3 years, actually), but as

Re: Suggested Change For FS_TEEN_BAD

2009-06-18 Thread Theo Van Dinter
On Thu, Jun 18, 2009 at 7:26 AM, Michael Monnerie wrote: > On Mittwoch 17 Juni 2009 Theo Van Dinter wrote: >> Yes, it matters (one path is tried then the other has to be tried, as >> opposed to having a single path) > > So which is better performance wise? I guess [sz]? but I

Re: Suggested Change For FS_TEEN_BAD

2009-06-17 Thread Theo Van Dinter
Yes, it matters (one path is tried then the other has to be tried, as opposed to having a single path), though the overall amount is probably negligible. Perl's RE compiler could well optimize this away anyway. On Wed, Jun 17, 2009 at 7:45 PM, Kelson wrote: > Wouldn't it be more efficient to wri
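As a rough illustration of the "two paths versus one path" remark, the sketch below compares an alternation with a character class. It uses Python's re module rather than Perl's engine, so it says nothing about how Perl optimizes either form; the (?:s|z) spelling and the sample words are assumptions made for the example.

```python
import re

# Two ways of writing "s or z": an alternation (two branches to try)
# and a character class (a single path).
ALTERNATION = re.compile(r"(?:s|z)")
CHAR_CLASS = re.compile(r"[sz]")


def same_matches(text: str) -> bool:
    """True if both patterns match at exactly the same positions."""
    alt_spans = [m.span() for m in ALTERNATION.finditer(text)]
    cls_spans = [m.span() for m in CHAR_CLASS.finditer(text)]
    return alt_spans == cls_spans
```

Both forms accept the same strings, so the choice is about how the engine gets there, not about what is matched.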

Re: Suggested Change For FS_TEEN_BAD

2009-06-15 Thread Theo Van Dinter
On Tue, Jun 16, 2009 at 12:23 AM, Andy Dorman wrote: > However, I was a little surprised that SpamAssassin did not have a test for > a phrase in the subject that seemed to clearly indicate potential child porn > like "girls getting f**ked". SpamAssassin is not a porn filter, whatever the variety.

Re: Capturing and using values....

2009-06-14 Thread Theo Van Dinter
No, SA doesn't do that. The best way to do this is to write a plugin where you can do whatever you want. :) On Sun, Jun 14, 2009 at 3:18 PM, Charles Gregory wrote: > Got a usage question. Is there a simple mechanism, similar to Perl's use > of parentheses and $1 to 'capture' a value in one rule a

Re: Question on add-to-blacklist

2009-06-02 Thread Theo Van Dinter
Well, the first problem is that the AWL has no impact on Bayes. They're totally independent. Perhaps you want "sa-learn" ? On Tue, Jun 2, 2009 at 2:32 PM, Larry Starr wrote: > I have been using the AWL ( --add-addr-to-blacklist ) for some time, to bump > new spam senders above the "Bayes-99" scor

Re: Identifying Source of False Positives

2009-06-01 Thread Theo Van Dinter
fwiw, even if there isn't a blank line, SA will figure it out (though it'll trigger a MISSING_HB_SEP rule hit). As for the debug output ... it depends, how did you run the command (ie: what was the command you tried). My guess is you did something like "spamassassin -D filename", where filename g

Re: sa-learn doesn't remember messages it's already learned from

2009-05-31 Thread Theo Van Dinter
When you say "the database", do you mean "bayes_toks" or "bayes_toks and bayes_seen"? If the former, you need to grant write privs to bayes_seen as well. Also, when in doubt, run w/ -D to see what's going on. On Sun, May 31, 2009 at 1:41 PM, Russell Jones wrote: > I am running a global bayes d

Re: Plugin/TVD.pm

2009-05-31 Thread Theo Van Dinter
That depends, what's TVD.pm? ;) Doing a quick search shows http://mail-archives.apache.org/mod_mbox/spamassassin-users/200603.mbox/%3c20060316233124.gv22...@kluge.net%3e which was a conversation we had way back in 2006 about SA 3.1 and bug 4255. There was a TVD.pm in discussion, so I assume that

Re: Filtering through mailing lists

2009-05-29 Thread Theo Van Dinter
Sure, change your mail system so it doesn't call SA more than once on the same message. :) On Fri, May 29, 2009 at 9:26 AM, Garik wrote: > Is there anything that can be done so there's only one instance of > [**SPAM**] in the subject? Have postfix strip out the spam headers from the > subject, or

Re: Problem with check_invalid_ip()

2009-05-29 Thread Theo Van Dinter
None of the IPs you listed will match. Have you tried simply running a loop in Perl to see what the results are? Also, "negation ~" ? What do you mean? "=~" is not a negation (that would be !~). Also also, the "^" and "$" chars are important. If you remove them, you change the RE. On Fri, May
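The advice to run a loop over the candidate IPs, and the point about "^" and "$", can be sketched as follows. This is a Python analogue, not the Perl code in question: the pattern below is hypothetical and is NOT the expression used by check_invalid_ip(). In Perl, "=~" tests for a match and "!~" for a non-match; here that corresponds to search() and "not search()".

```python
import re

# Hypothetical pattern for illustration only: dotted quads starting with "10.".
ANCHORED = re.compile(r"^10\.\d+\.\d+\.\d+$")
# The same pattern with the anchors removed.
UNANCHORED = re.compile(r"10\.\d+\.\d+\.\d+")

SAMPLES = ["10.1.2.3", "210.1.2.3", "192.168.0.1"]


def classify(ips):
    """Map each IP to (anchored_hit, unanchored_hit)."""
    return {
        ip: (bool(ANCHORED.search(ip)), bool(UNANCHORED.search(ip)))
        for ip in ips
    }
```

Running classify(SAMPLES) shows that "210.1.2.3" is rejected by the anchored pattern but accepted once the anchors are dropped, because the unanchored pattern finds "10.1.2.3" inside it.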

Re: Error when running sa-update

2009-05-20 Thread Theo Van Dinter
What version of IO::Zlib do you have installed? Line 82 of sa-update is where it tries to load IO::Zlib 1.04 or later: use IO::Zlib 1.04; So my guess is that you either have an early version that doesn't export a version number, or a strange/corrupted module. Either way, reinstalling it would be the way to go. On

Re: catch22: MIRRORED.BY wrong, sa-update won't

2009-05-19 Thread Theo Van Dinter
just fyi, I left spamassassin.kluge.net up for over a month after removing it from the MIRRORED.BY file, and forced a new update to deal with https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6083. I figured that anyone using sa-update would run it at least once a month, and then get the new

Re: Boxtrapper and Spamassassin Cpanel 11 strange behaviour.

2009-05-11 Thread Theo Van Dinter
fwiw, I also confirm any CR mails that I get. I just wanted to paste in this quote... :) "challenge response is a great way to tell people they are less important than you" - Dan Quinlan via IRC On Mon, May 11, 2009 at 2:33 PM, Dave Pooser wrote: > Not necessarily true-- anytime I see o

Re: bayes training doesn't seem to have any affect

2009-05-05 Thread Theo Van Dinter
On Tue, May 5, 2009 at 5:40 PM, Micah Anderson wrote: >> Eh?  Last journal sync atime is Jan 1 1970? >> Try running:   sa-learn --sync > > Doesn't seem to change the 'last journal sync atime' from 0. [...] > I'm using a mysql DB and I've got the following set in my local.cf: SQL Bayes DBs don't h

Re: Error: "spamc: connection attempt to spamd aborted after 3 retries"

2009-05-05 Thread Theo Van Dinter
This has been said before, but there seems to still be some confusion. In short -- you seem to think you're using amavis, and have an amavis config file ... But instead you seem to be calling spamc/spamd, which is completely different and unrelated. If you want to use amavis, then stop using spa

Re: Errors during installation spamassasssin

2009-05-05 Thread Theo Van Dinter
Mail::SPF replaced Mail::SPF::Query. You should pick one or the other, though Mail::SPF is preferred. See the INSTALL doc. Also note, the module diag output is not a list of things that you need to install, it's just a list that can help when debugging. On Tue, May 5, 2009 at 4:58 AM, Jack Raa

Re: Spam from windows live

2009-05-04 Thread Theo Van Dinter
2009/5/4 Karsten Bräckelmann : >> via https://issues.apache.org/SpamAssassin/show_bug.cgi?id=2865.  In > > No commit pointer. I'm lazy, Theo, any hints to the actual commit so I > don't have to dig? :) Sure. I found it by a) looking at the code and validating my understanding, and b) looking at s

Re: Error: "spamc: connection attempt to spamd aborted after 3 retries"

2009-05-04 Thread Theo Van Dinter
If you're using amavis, what is calling spamc? It sounds like something changed your config somewhere. Did someone put in a procmailrc entry? On Mon, May 4, 2009 at 2:57 PM, Alejandro Cabrera Obed wrote: > Dear all, I use Postfix (version 2.3.8-2+etch1) + amavisd-new (version > 2.4.2-6.1) + sp

Re: The weirdest problem

2009-05-04 Thread Theo Van Dinter
would probably just cause more confusion. On Mon, May 4, 2009 at 1:27 PM, Adam Katz wrote: > Theo Van Dinter wrote: >> Then there's the AWL, aka the historical score averager, which has >> some commands via "spamassassin" to do simple manipulation, usually to >

Re: Spam from windows live

2009-05-04 Thread Theo Van Dinter
2009/5/4 Karsten Bräckelmann : >> Bear in mind that an email that gets a Bayes score of more than one >> point can't be autolearned as ham. > > Nope, this is wrong. > > The Bayes rules (as well as some other rules) do NOT have any impact on > the auto-learning. In fact, the auto-learner even uses a

Re: [sa] Re: The weirdest problem .....

2009-05-04 Thread Theo Van Dinter
You're wrong (but you're close). :) You can configure your own whitelist_from_* and blacklist_from_* (or the other whitelist/blacklist commands) in your user_prefs/configs. Either you have the config or you don't, and the scores are for the rule not each sender, so in that sense, it's "permanent".

Re: Can't locate File/Scan/ClamAV.pm

2009-05-03 Thread Theo Van Dinter
Apparently the clamav.pm plugin requires other modules which you didn't install. You need to find out what the dependencies are, and make sure they're met before trying to use the plugin. On Sun, May 3, 2009 at 12:05 PM, Chris wrote: > Can't locate File/Scan/ClamAV.pm in @INC (@INC > contains:

Re: Restarting bayes

2009-05-02 Thread Theo Van Dinter
bayes_seen is rather irrelevant. bayes_toks is very binary-oriented, and uses lots of pack() calls. There is no SA-based "validity" check for the DB files/data. If you think the DB file itself is corrupt, you could try the appropriate DBM tools (db_verify, etc.) The dump/restore method really sh

Re: Looks like sa-learn --spam troubles

2009-05-01 Thread Theo Van Dinter
I would say it's less someone poisoning your DB and more your DB becoming corrupt. As it says, a pack format of dec(73) is not a valid value. It's set by the BayesStore module itself, not influenced by the token in question. You can try to do a dump/verify/restore ... ala: sa-learn --sync sa-l

Re: trying to score based on image name and image size

2009-04-30 Thread Theo Van Dinter
There could be various reasons ranging from "plugin isn't loaded" (though you'd get an error w/ the rules then) to "image isn't exactly that size", to "plugin can't determine width+height from image", to ... Assuming the plugin is loaded ("spamassassin -D plugin --lint" would tell you), and you've

Re: 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 7:56 PM, Adam Katz wrote: >> I guess it depends what you mean by "enormous".  A sought rule update is >> 135k. > > And 135k doesn't add up to a lot of bandwidth?  I suppose it depends > on the number of users, and I'm figuring worst-case scenario, e.g. > when/if it ships e

Re: 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 8:06 PM, John Hardin wrote: >> And 135k doesn't add up to a lot of bandwidth? > > ...so don't look for updates more than once every day or two. Yeah, but I think the point was that a frequently changing ruleset would be downloaded frequently. > And if bandwidth at the ser

Re: [SA] 419 emailBL?

2009-04-29 Thread Theo Van Dinter
On Wed, Apr 29, 2009 at 6:24 PM, Adam Katz wrote: > The mechanism for sa-update is brilliant, but > doesn't lend itself to enormous indices of frequently-changing rulesets. I guess it depends what you mean by "enormous". A sought rule update is 135k. The likelihood is, imo, that you would proba

Re: Procmail Setup NOT Working

2009-04-28 Thread Theo Van Dinter
2009/4/28 Robert Ober : > It was global and I want it to stay global.  The old procmailrc is: > > DROPPRIVS=yes > > :0fw > | /usr/bin/spamc That's a global config, but you're running it per-user due to the DROPPRIVS line. fyi. > All I want to do now is have all the identified spam(X-Spam-Status:

Re: Code Rot?

2009-04-27 Thread Theo Van Dinter
fwiw, I was going to say "Yes" to the first question. Not sure about the second question, though I've always wanted to see more sharing/give-back from those folks. While there have been a bunch of mails on the dev list, most of it is incorrectly opened bugs, or other randomness. IMO, there hasn't

Re: Image spam and failing rule

2009-04-26 Thread Theo Van Dinter
It's already been mentioned, but mimeheader is the right way to look at the headers of MIME parts. The rule of thumb is "if you are using 'full' you're probably doing it wrong". :) On Sun, Apr 26, 2009 at 11:57 AM, Charles Gregory wrote: > On Sat, 25 Apr 2009, Gary Forrest wrote: >> >> We are r
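To show why looking at the headers of each MIME part is more reliable than pattern-matching the whole raw message, here is a small Python sketch using the standard email package. It is an analogy to what a mimeheader rule inspects, not SpamAssassin code; the sample message, the boundary string, and the function name part_types are all made up for the example.

```python
from email import message_from_string

# Hypothetical two-part message: a text part and a small GIF attachment.
RAW = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XYZ\n"
    'Content-Type: image/gif; name="pic.gif"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlhAQABAAAAACw=\n"
    "--XYZ--\n"
)


def part_types(raw: str):
    """Return (content type, file name) for every non-container MIME part."""
    msg = message_from_string(raw)
    return [
        (part.get_content_type(), part.get_filename())
        for part in msg.walk()
        if not part.is_multipart()
    ]
```

The parser has already separated each part's headers from its content, so a check on type or file name never has to scan the encoded attachment data.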

Re: DATE_IN_FUTURE

2009-04-24 Thread Theo Van Dinter
You'd really want to post the message headers in pastebot or something so people can look at them. It's not just the Date header, the rule also looks at the Received headers, etc. On Fri, Apr 24, 2009 at 1:44 PM, Rik wrote: > I was stumped on a question today about DATE_IN_FUTURE. My googling >

Re: Bayes filter not always triggered

2009-04-20 Thread Theo Van Dinter
Mon, Apr 20, 2009 at 11:27 AM, m.b wrote: > If user would be missing, it would always cause problems. But it works 75% of > the time. > > Mark > > > Theo Van Dinter-2 wrote: >> >> On Mon, Apr 20, 2009 at 8:47 AM, m.b wrote: >>> scantime=3.2,size=2745,

Re: Bayes filter not always triggered

2009-04-20 Thread Theo Van Dinter
On Mon, Apr 20, 2009 at 8:47 AM, m.b wrote: > scantime=3.2,size=2745,user=(unknown),uid=104,required_score=5.0,rhost=,raddr=..,rport=57786,mid= > > Do you have any suggestions why not every message is passing through BAYES? > I thought it was a locking problem but I'm using

Re: accept only gpg/pgp mail

2009-03-07 Thread Theo Van Dinter
It's already been mentioned, but SpamAssassin doesn't accept, deliver, or route mail. It simply marks up a message, particularly with some added headers, and then you would need something else to filter/route mails as you want. As for looking for encrypted vs unencrypted mails, you'd have to writ

Re: how to make a custom ruleset

2009-03-06 Thread Theo Van Dinter
Just fyi, this particular topic keeps getting raised here. It'd be great if people would search the list archives. :) One of the last times around: http://www.nabble.com/forum/ViewPost.jtp?post=21296293&framed=y In short, if you want to do this, write a plugin. REs are great until you get comp

Re: Something doofuzzled in a * ^To: line.

2009-02-23 Thread Theo Van Dinter
Oh, and having a sample mail via pastebin/etc would be handy if you want more commentary about the mail. :) On Mon, Feb 23, 2009 at 6:52 PM, Theo Van Dinter wrote: > It sounds like an issue w/ kmail/vim and not so much a nefarious > spammer ability. > > And I'm not sure

Re: Something doofuzzled in a * ^To: line.

2009-02-23 Thread Theo Van Dinter
It sounds like an issue w/ kmail/vim and not so much a nefarious spammer ability. And I'm not sure what you mean by "unlisted header". If you mean: [other headers] To: unlisted header Then the answer is "unlisted header" is actually the first line of the body. On Mon, Feb 23, 2009 at 5:55 PM,
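The header/body point can be checked with any RFC 5322 style parser: everything after the first blank line is body, whatever it looks like. The sketch below uses Python's standard email package on a made-up message; the addresses and the function name split_message are assumptions for illustration.

```python
from email import message_from_string

# Hypothetical message: the text after the blank line is body, not a header.
RAW = (
    "Subject: test\n"
    "To: someone@example.com\n"
    "\n"
    "unlisted header\n"
    "rest of body\n"
)


def split_message(raw: str):
    """Return (headers as a dict, body text) for a simple message."""
    msg = message_from_string(raw)
    return dict(msg.items()), msg.get_payload()
```

Here "unlisted header" comes back as the first line of the body, and the header dictionary contains only Subject and To.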

Re: cpan question

2009-02-22 Thread Theo Van Dinter
Since you don't need Net::Ident for SA, I'm going to say no. :) On Sat, Feb 21, 2009 at 10:28 PM, Gene Heskett wrote: > On Saturday 21 February 2009, Bill Landry wrote: >>Gene Heskett wrote: >>> Using cpan, trying to install Net::Ident (the other bits except razor were >>> nominal from the same

Re: Everything gets a score of 0

2009-02-21 Thread Theo Van Dinter
According to the debug output, you just have the openprotect channel and not the SA updates channel. Hence, none of the standard rules exist. Run "sa-update". :) On Sat, Feb 21, 2009 at 8:15 PM, oliver wrote: > This is a clean install on a gentoo hardened box. I'm using SA 3.2.5 and > have lear

Re: NO_RELAYS FP on relayed mail via IPv6

2009-02-21 Thread Theo Van Dinter
On Sat, Feb 21, 2009 at 7:11 PM, Greg Troxel wrote: > This is a funny case, since the message in question is generated by a > machine that I would set as TRUSTED. I am the moderator for > regional-bos...@netbsd.org, and it gets spam, stunningly enough. The > mail is sent to me over IPv6, and SA

Re: misc_10.cf

2009-02-09 Thread Theo Van Dinter
10_misc.cf isn't in 3.2, 3.1 was the last version to have it. In 3.2 it's called 10_default_prefs.cf. You should have it installed in the default rules dir, probably /usr/share/spamassassin. And no, it's not editable. Or more specifically, you shouldn't edit it. On Mon, Feb 09, 2009 at 09:40:4

Re: Calling spamc and looping through files

2009-02-08 Thread Theo Van Dinter
I would use "formail -s" to go through the mbox file, and pipe the mail through procmail w/ an appropriate recipe file to filter the mails as you'd want. SpamAssassin is happy to markup your mails, but has no filtering capabilities since it doesn't deliver mail. On Sun, Feb 08, 2009 at 04:37:30PM
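For readers without formail and procmail to hand, the same split-then-filter idea can be sketched with Python's standard mailbox module. This assumes the messages have already been marked up by SpamAssassin and carry an "X-Spam-Flag: YES" header when tagged as spam; the function names and the sample mbox contents are hypothetical. Like SpamAssassin itself, the sketch only classifies; delivering or moving the mail is left to whatever comes next.

```python
import mailbox


def split_by_flag(mbox_path):
    """Walk an mbox and return (spam subjects, ham subjects)."""
    spam, ham = [], []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Treat a message as spam only if the markup header says YES.
            flag = (msg.get("X-Spam-Flag") or "").strip().upper()
            target = spam if flag == "YES" else ham
            target.append(msg.get("Subject", ""))
    finally:
        box.close()
    return spam, ham


def write_sample(path):
    """Write a tiny two-message mbox for trying the function out."""
    with open(path, "w", newline="\n") as fh:
        fh.write(
            "From a@example.com Sun Feb  8 16:37:30 2009\n"
            "Subject: one\n"
            "X-Spam-Flag: YES\n"
            "\n"
            "body one\n"
            "\n"
            "From b@example.com Sun Feb  8 16:38:00 2009\n"
            "Subject: two\n"
            "\n"
            "body two\n"
        )
```

Calling write_sample("sample.mbox") and then split_by_flag("sample.mbox") separates the flagged message from the unflagged one without modifying the mbox file.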

Re: html experts: empty

2009-01-29 Thread Theo Van Dinter
On Thu, Jan 29, 2009 at 08:50:32PM +0100, Per Jessen wrote: > > you have LEGIT EMAIL with this in it? > > > > I do too. AFAICT, it's Microsoft related. taking a look at my january corpus, there are a relative lot of hits for that, including things like "