Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Loren Wilton
Unfortunately I'm not on the SpamAssassin Bayes modules -- I wrote my own Bayes Engine because I wanted to do that and then thought about including the Rules results from SpamAssassin. I don't know where this might be going, but it seems to be working extremely well for me based on a train

Re: URIBL_BLACK matching on messages with no URLs in them...

2007-06-30 Thread SM
At 12:07 30-06-2007, Jo Rhett wrote: Note: yes, uribl has their own mailing list. That server has been down for quite some time, so I gave up and posted it here in case someone is dual listed and can fix it. There's no URL in this message. What is it mis-matching against? There was a URL in

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 6:29 PM, Loren Wilton wrote: And after typing all this I'm thinking you might be right. But part of this approach is to run all these rules in YES/NO fashion and see if the probability is significant. For example: If I tested for SOME_TEST=NO and found it was sco

Re: URIBL_BLACK matching on messages with no URLs in them...

2007-06-30 Thread Theo Van Dinter
On Sat, Jun 30, 2007 at 12:07:04PM -0700, Jo Rhett wrote: > There's no URL in this message. What is it mis-matching against? When in doubt, run through "spamassassin -D": [9710] dbg: uridnsbl: domains to query: sync.pl svcolo.com SA doesn't just look for full URLs, it looks for things that coul
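Theo's point is that a message with no full URLs can still contain strings that merely look like host names, such as a script named `sync.pl` (`.pl` is a real country TLD). The sketch below is a rough illustration of that idea, not SpamAssassin's actual URI parser. The TLD set and the regular expression are hypothetical simplifications.

```python
import re

# Hypothetical, deliberately tiny TLD list for illustration only.
# SpamAssassin maintains its own, much larger list of valid TLDs.
KNOWN_TLDS = {"com", "net", "org", "pl", "uk", "de"}

# One or more "label." groups followed by an alphabetic final label.
DOMAIN_LIKE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def domain_candidates(text: str) -> list:
    """Return de-duplicated, lowercased strings that merely look like host names."""
    found = []
    for match in DOMAIN_LIKE.finditer(text):
        host, tld = match.group(1).lower(), match.group(2).lower()
        if tld in KNOWN_TLDS and host not in found:
            found.append(host)
    return found


# No URL anywhere, yet a script name and a bare domain both qualify.
sample = "The cron job runs sync.pl nightly; mail the admin at svcolo.com."
print(domain_candidates(sample))  # ['sync.pl', 'svcolo.com']
```

Filtering on a TLD list is what keeps names like `notes.txt` out while letting `sync.pl` through, which is the same reason a file name can end up in a URIBL query.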

Re: plugins

2007-06-30 Thread Theo Van Dinter
On Sat, Jun 30, 2007 at 11:22:36AM -0700, JP Kelly wrote: > What is the best way to check what plugins SA is using? Same as everything else, run "spamassassin -D --lint". :) -- Randomly Selected Tagline: "Internet exceeded user level, please wait until a user logs off before attempting to log b

URIBL_BLACK matching on messages with no URLs in them...

2007-06-30 Thread Jo Rhett
Note: yes, uribl has their own mailing list. That server has been down for quite some time, so I gave up and posted it here in case someone is dual listed and can fix it. There's no URL in this message. What is it mis-matching against? Begin forwarded message: From: *snip* Date: June 29,

Re: Spam PDF

2007-06-30 Thread arni
Mikael Syska wrote: Kind of new to spam ... and especially how people use Bayes. So how many ham mails do you get per day? Wondering if I could do something to my system so Bayes may score higher. I have read somewhere that spam mails in Bayes should be a lot higher than ham mails ... is

Re: Spam PDF

2007-06-30 Thread Mikael Syska
arni wrote: [snip snap] I looked for the lowest-scoring email of the past 2 days (I don't save them longer); this is the one: X-Spam-Status: Yes, score=10.7 required=5.0 tests=BAYES_99,DCC_CHECK, DKIM_POLICY_SIGNSOME,HTML_MESSAGE,LOGINHASH1,LOGINHASH2,MIME_HTML_MOSTLY autolearn=no
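When comparing notes like this, it can help to pull the score, threshold and rule list out of saved headers programmatically. The helper below is a minimal sketch based only on the header layout shown above; the handling of folded lines and of `tests=none` is an assumption about the format, not a specification.

```python
import re
from typing import Dict, List, Union


def parse_spam_status(header: str) -> Dict[str, Union[str, float, List[str]]]:
    """Split an X-Spam-Status header into verdict, score, threshold and rule names."""
    # Unfold continuation lines and collapse runs of whitespace.
    value = " ".join(header.split())
    # Drop the field name if the caller passed the whole header line.
    value = re.sub(r"^X-Spam-Status:\s*", "", value, flags=re.IGNORECASE)
    verdict = value.split(",", 1)[0].strip()
    score = float(re.search(r"\bscore=(-?\d+(?:\.\d+)?)", value).group(1))
    required = float(re.search(r"\brequired=(-?\d+(?:\.\d+)?)", value).group(1))
    # The rule list runs until the next "name=" field (e.g. autolearn=) or the end.
    tests_match = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", value)
    raw_tests = tests_match.group(1) if tests_match else ""
    tests = [t.strip() for t in raw_tests.split(",") if t.strip() and t.strip() != "none"]
    return {"verdict": verdict, "score": score, "required": required, "tests": tests}


# The header from the post, folded the way a mail client might store it.
sample_header = (
    "X-Spam-Status: Yes, score=10.7 required=5.0 tests=BAYES_99,DCC_CHECK,\n"
    "\tDKIM_POLICY_SIGNSOME,HTML_MESSAGE,LOGINHASH1,LOGINHASH2,MIME_HTML_MOSTLY\n"
    "\tautolearn=no"
)
status = parse_spam_status(sample_header)
print(status["verdict"], status["score"], len(status["tests"]))  # Yes 10.7 7
```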

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Loren Wilton
Just a thought - what if we had some central servers for real time reporting where the SA rule hits and scores were reported in real time for some sort of live scoring or analysis or dynamic adjusting? Just thinking out loud here. Something I've wanted to see for about 4 years now; i.e., as long

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Loren Wilton
And after typing all this I'm thinking you might be right. But part of this approach is to run all these rules in YES/NO fashion and see if the probability is significant. For example: If I tested for SOME_TEST=NO and found it was scoring a probability of ~0.500 then it's indisputable tha
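The YES/NO idea can be made concrete: each rule contributes a token such as `SOME_TEST=YES` or `SOME_TEST=NO`, and the Bayes side learns a probability for each token. The sketch below assumes Robinson's smoothed per-token estimate f(w) = (s·x + n·p(w)) / (s + n), a common choice in Bayesian spam filters; the class and helper names are hypothetical, and this is not the engine described in the thread.

```python
from collections import Counter
from typing import Dict, List


def rule_tokens(rule_hits: Dict[str, bool]) -> List[str]:
    """Map {rule: hit?} to tokens like 'SOME_TEST=YES' or 'SOME_TEST=NO'."""
    return [f"{name}={'YES' if hit else 'NO'}" for name, hit in sorted(rule_hits.items())]


class RuleTokenModel:
    """Counts YES/NO rule tokens per class and scores them with Robinson's f(w)."""

    def __init__(self, strength: float = 1.0, assumed: float = 0.5) -> None:
        self.s = strength  # weight given to the prior guess
        self.x = assumed   # prior probability for an unseen token
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, rule_hits: Dict[str, bool], is_spam: bool) -> None:
        tokens = set(rule_tokens(rule_hits))  # count each token once per message
        if is_spam:
            self.nspam += 1
            self.spam.update(tokens)
        else:
            self.nham += 1
            self.ham.update(tokens)

    def token_probability(self, token: str) -> float:
        b = self.spam[token] / self.nspam if self.nspam else 0.0  # share of spam with token
        g = self.ham[token] / self.nham if self.nham else 0.0     # share of ham with token
        n = self.spam[token] + self.ham[token]
        if n == 0 or b + g == 0:
            return self.x
        p = b / (b + g)
        return (self.s * self.x + n * p) / (self.s + n)


model = RuleTokenModel()
for _ in range(10):
    model.learn({"ADVANCE_FEE_1": True, "SOME_TEST": False}, is_spam=True)
    model.learn({"ADVANCE_FEE_1": False, "SOME_TEST": False}, is_spam=False)

for tok in ("SOME_TEST=NO", "ADVANCE_FEE_1=YES", "ADVANCE_FEE_1=NO"):
    print(tok, round(model.token_probability(tok), 3))
```

Here `SOME_TEST=NO` appears equally in spam and ham, so its probability is exactly 0.5. That is the "tells you nothing" case the post describes, while `ADVANCE_FEE_1=YES` drifts toward 1.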

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 2:55 PM, Bart Schaefer wrote: On 6/29/07, Tom Allison <[EMAIL PROTECTED]> wrote: The thought I had, and have been working on for a while, is changing how the scoring is done. Rather than making Bayes a part of the scoring process, make the scoring process a part of the B

DNS list service to detect the registrar barrier

2007-06-30 Thread Marc Perkel
OK - tell me if this is useful. I created a DNS list that you can pass a host name to and get information as to where the registrar barrier is. You can use it as follows: dig <hostname>.rb.junkemailfilter.com Example: dig perkel.com.rb.junkemailfilter.com - returns 127.0.0.1 dig perkel.co.uk.rb.junkemai
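The lookup pattern in the post (append the host name to the list's zone, then ask for an A record) can be scripted. This is a minimal sketch assuming only what the dig examples show. It does not interpret the 127.0.0.x answers, because their meaning is cut off above, and it makes no assumption that the zone still answers today.

```python
import socket
from typing import Optional

# Zone named in the post; whether it still answers today is not assumed.
ZONE = "rb.junkemailfilter.com"


def barrier_query_name(hostname: str, zone: str = ZONE) -> str:
    """Build '<hostname>.<zone>', the name the post's dig examples query."""
    host = hostname.strip().rstrip(".").lower()
    if not host:
        raise ValueError("empty host name")
    return f"{host}.{zone}"


def lookup_barrier(hostname: str) -> Optional[str]:
    """Return an IPv4 answer as a dotted quad, or None if the name does not resolve.

    The meaning of each 127.0.0.x answer is defined by the list operator;
    this sketch does not interpret it.
    """
    try:
        return socket.gethostbyname(barrier_query_name(hostname))
    except (socket.gaierror, socket.herror):
        return None


print(barrier_query_name("perkel.com"))  # perkel.com.rb.junkemailfilter.com
```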

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Bart Schaefer
On 6/29/07, Tom Allison <[EMAIL PROTECTED]> wrote: The thought I had, and have been working on for a while, is changing how the scoring is done. Rather than making Bayes a part of the scoring process, make the scoring process a part of the Bayes statistical Engine. As an example you would simp

Re: user_prefs

2007-06-30 Thread Duane Hill
On Fri, 29 Jun 2007 at 19:43 -0400, [EMAIL PROTECTED] confabulated: OK, thanks. I'm not using spamassassin or spamd. I'm using Mail::SpamAssassin in a perl script. What does '-x' do for Mail::SpamAssassin? Nothing, since you are calling SA directly from perl. You should set dont_copy_prefs to

plugins

2007-06-30 Thread JP Kelly
What is the best way to check what plugins SA is using?

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Marc Perkel
Loren Wilton wrote: You have a bit of a chicken-and-egg problem at the start, until some learning takes place in the system. Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason. 1. Until Bayes has enough info to kick in, classification

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Marc Perkel
Tom Allison wrote: On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote: Tom Allison wrote: For some years now there has been a lot of effective spam filtering using statistical approaches with variations on Bayesian theory, some of these are inverse Chi Square modifications to Naive Bayes or

Re: config clarification

2007-06-30 Thread Lindsay Haisley
On Sat, 2007-06-30 at 07:07 -0400, Tom Allison wrote: > For configuration options listed in perldoc Mail::SpamAssassin can I > put the settings into local.cf? > > Mail::SpamAssassin::Conf says yes, but it doesn't say it applies to > args for Mail::SpamAssassin->new(); According to the perldoc

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 8:07 AM, Loren Wilton wrote: You have a bit of a chicken-and-egg problem at the start, until some learning takes place in the system. Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason. 1. Until Bayes has enough

Re: Confused about which bayes db gets used with spamc?

2007-06-30 Thread Bob McClure Jr
On Sat, Jun 30, 2007 at 05:41:19AM -0700, CptanPanic wrote: > > Hello, > I run spamc from my procmail on incoming messages. Does this mean that all > messages are using root bayes_db? No. > If so why do the clients have stuff > updated in their db in their home directories? Because spamc (actu

Confused about which bayes db gets used with spamc?

2007-06-30 Thread CptanPanic
Hello, I run spamc from my procmail on incoming messages. Does this mean that all messages are using root bayes_db? If so, why do the clients have stuff updated in their db in their home directories? I am trying to figure this out so I can do sa-learn correctly. Thanks, CP

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Loren Wilton
You have a bit of a chicken-and-egg problem at the start, until some learning takes place in the system. Two possibilities. The rules exist and have scores. Assume they are maintained, for whatever reason. 1. Until Bayes has enough info to kick in, classification is done by the scores.
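Possibility 1 amounts to a cold-start fallback: classify by rule scores until the Bayes side has learned enough mail, then hand over. The sketch below borrows its thresholds from SpamAssassin's `bayes_min_spam_num` and `bayes_min_ham_num` settings (both default to 200). The function name, the 0.5 Bayes cutoff and the per-rule scores are hypothetical illustrations.

```python
from typing import Mapping, Optional, Tuple

# Thresholds mirroring SpamAssassin's bayes_min_spam_num / bayes_min_ham_num
# (both default to 200); everything else below is a hypothetical illustration.
MIN_SPAM_LEARNED = 200
MIN_HAM_LEARNED = 200


def classify(
    rule_hits: Mapping[str, bool],
    rule_scores: Mapping[str, float],
    nspam_learned: int,
    nham_learned: int,
    bayes_probability: Optional[float],
    required: float = 5.0,
) -> Tuple[str, str]:
    """Return (verdict, source), using Bayes only once it has seen enough mail."""
    bayes_ready = (
        nspam_learned >= MIN_SPAM_LEARNED
        and nham_learned >= MIN_HAM_LEARNED
        and bayes_probability is not None
    )
    if bayes_ready:
        # Hypothetical cutoff: treat anything above 0.5 as spam.
        return ("spam" if bayes_probability > 0.5 else "ham", "bayes")
    # Cold start: fall back to the maintained rule scores.
    score = sum(rule_scores.get(name, 0.0) for name, hit in rule_hits.items() if hit)
    return ("spam" if score >= required else "ham", "rules")


# Hypothetical per-rule scores, not SpamAssassin's shipped values.
scores = {"ADVANCE_FEE_1": 4.0, "DCC_CHECK": 1.5, "HTML_MESSAGE": 0.001}
hits = {"ADVANCE_FEE_1": True, "DCC_CHECK": True, "HTML_MESSAGE": False}

print(classify(hits, scores, nspam_learned=50, nham_learned=40, bayes_probability=None))
print(classify(hits, scores, nspam_learned=250, nham_learned=300, bayes_probability=0.02))
```

The second call shows the handover: once both counts pass 200, the learned probability overrides the rule total, even though the rule total alone would have flagged the message.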

config clarification

2007-06-30 Thread Tom Allison
For configuration options listed in perldoc Mail::SpamAssassin can I put the settings into local.cf? Mail::SpamAssassin::Conf says yes, but it doesn't say it applies to args for Mail::SpamAssassin->new(); And what does 'save_pattern_hits' get me that I otherwise wouldn't have?

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 4:46 AM, John Andersen wrote: On Friday 29 June 2007, Tom Allison wrote: It would be the Bayes process that determines the effective number of points you assign for each HIT based on what it's learned about it from you. So the tags of: ADVANCE_FEE_1, ADVANCE_FEE_2 would

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread Tom Allison
On Jun 30, 2007, at 1:20 AM, Marc Perkel wrote: Tom Allison wrote: For some years now there has been a lot of effective spam filtering using statistical approaches with variations on Bayesian theory, some of these are inverse Chi Square modifications to Naive Bayes or even CRM114 and o
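The inverse chi-square modification mentioned here is usually Fisher's method as adapted by Gary Robinson and popularized by SpamBayes. It combines many per-token probabilities into one score near 0 (ham), near 1 (spam), or around 0.5 (no clear evidence). The sketch below shows that combining step on its own. Feeding it per-rule-token probabilities, as the thread proposes, is an assumption about how such an engine could work, not code from the thread.

```python
import math
from typing import Sequence


def chi2q(x2: float, dof: int) -> float:
    """Survival function P(X >= x2) of a chi-square distribution with even dof."""
    if dof % 2:
        raise ValueError("dof must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine_chi(probs: Sequence[float], eps: float = 1e-6) -> float:
    """Fisher/Robinson inverse chi-square combining of per-token spam probabilities.

    Returns a value near 1.0 for spammy evidence, near 0.0 for hammy evidence,
    and 0.5 when there is no evidence at all.
    """
    if not probs:
        return 0.5
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]  # avoid log(0)
    n = len(clipped)
    spam_evidence = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in clipped), 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in clipped), 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0


print(round(combine_chi([0.955, 0.9, 0.99]), 3))  # strongly spammy tokens
print(round(combine_chi([0.5, 0.5, 0.5]), 3))     # uninformative tokens
print(round(combine_chi([0.045, 0.1]), 3))        # hammy tokens
```

Because both the spam and ham evidence are computed, a message with strong clues pointing both ways lands near 0.5 instead of being forced to an extreme. That is the main practical advantage of this method over a plain naive-Bayes product.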

Re: A different approach to scoring spamassassin hits

2007-06-30 Thread John Andersen
On Friday 29 June 2007, Tom Allison wrote: > It would be the Bayes process that determines the effective number of > points you assign for each HIT based on what it's learned about it > from you. So the tags of: ADVANCE_FEE_1, ADVANCE_FEE_2 would be > represented as a token of format: > ADVANCE_