On Fri, 2014-08-22 at 17:44 -0700, Ian Zimmerman wrote:
> I know that if you misclassify a mail as spam with
>
> sa-learn --spam /path/to/ham
>
> you can later run
>
> sa-learn --ham /path/to/ham
>
> to correct the mistake, and SA will do the right thing (ie. forget the
> wrong classification
I know that if you misclassify a mail as spam with
sa-learn --spam /path/to/ham
you can later run
sa-learn --ham /path/to/ham
to correct the mistake, and SA will do the right thing (ie. forget the
wrong classification). And conversely, with ham <-> spam.
My question is, what happens if you
On Mon, 26 Nov 2012, John Hardin wrote:
On Mon, 26 Nov 2012, Ed Flecko wrote:
Hi folks,
I'm running SpamAssassin version 3.3.2 (running on Perl version
5.14.2) on FreeBSD 9.0.
I've exported a bunch of spam and ham messages from my Barracuda 400.
What format did the Barracuda
in version 3.3.2 (running on Perl version
> 5.14.2) on FreeBSD 9.0.
>
> I've exported a bunch of spam and ham messages from my Barracuda 400.
>
> I have an Excel .csv file of about 2500 spam messages and 2500 ham
> messages, and I'm wondering if I can supply those as a
On Mon, 26 Nov 2012, Ed Flecko wrote:
Hi folks,
I'm running SpamAssassin version 3.3.2 (running on Perl version
5.14.2) on FreeBSD 9.0.
I've exported a bunch of spam and ham messages from my Barracuda 400.
What format did the Barracuda export the messages in? It might be possible
t
Hi folks,
I'm running SpamAssassin version 3.3.2 (running on Perl version
5.14.2) on FreeBSD 9.0.
I've exported a bunch of spam and ham messages from my Barracuda 400.
I have an Excel .csv file of about 2500 spam messages and 2500 ham
messages, and I'm wondering if I can su
On Mon, 22 Aug 2011 15:46:14 +0200, J4K wrote:
# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 640 0 non-token data: nspam
0.000 0 7001 0 non-token data: nham
0.000 0 36689
Afternoon gentlemen,
Seems the Bayes DB has become lop-sided in favour of ham. SA is
doing its job, as there is little spam coming through recently. I
had hoped we could keep it one third spam and two thirds ham. Does the
slant shown below (nspam versus nham) cause any problems w
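To put a number on the slant being asked about, the counters can be pulled out of the `sa-learn --dump magic` output. The following is only a sketch: it assumes each counter line has the shape "four columns, then `non-token data: <name>`" as in the snippet above, and it assumes the nspam line reads 640 (the archived copy runs the digits together). The function name `parse_magic` is made up for illustration.

```python
import re

# Matches lines such as: "0.000 0 7001 0 non-token data: nham"
# The third column is taken to be the counter value (an assumption
# based on the sample output quoted in the thread).
MAGIC_LINE = re.compile(
    r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$"
)

def parse_magic(dump_text):
    """Return {counter name: value} for the non-token lines of the dump."""
    counts = {}
    for line in dump_text.splitlines():
        match = MAGIC_LINE.match(line)
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts

# Sample values copied from the message above (640 is an assumed reading).
sample = """\
0.000 0 3 0 non-token data: bayes db version
0.000 0 640 0 non-token data: nspam
0.000 0 7001 0 non-token data: nham
"""

counts = parse_magic(sample)
spam_share = counts["nspam"] / (counts["nspam"] + counts["nham"])
print(f"nspam={counts['nspam']} nham={counts['nham']} "
      f"spam share of learned mail={spam_share:.1%}")
```

In practice the text would come from running `sa-learn --dump magic` and capturing its output rather than from a hard-coded string.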
On 01/24/2011 04:42 PM, J4 wrote:
> Dear all,
>
> I am sure this question has come up before on this list, yet after
> spending a little while trawling Google, I did not find any sites :(
> So I ask here!
>
> Are there any recent (<6 months) ham or spam corpora out there that
> I can down
Dear all,
I am sure this question has come up before on this list, yet after
spending a little while trawling Google, I did not find any sites :( So
I ask here!
Are there any recent (<6 months) ham or spam corpora out there that I
can download and feed into sa-learn? I would like to give
On Fri, 30 Apr 2010 11:53:49 +0200
"Giampaolo Tomassoni" wrote:
> > Correct, but if those counts came from autolearning 90% of spam and
> > 30% of ham, then rescaling may be the correct thing to do.
> >
> > It may also be pragmatic, if a high spam/ham ratio is leading to
> > FPs, to keep the le
> On Thu, 29 Apr 2010 18:32:04 +0200
> "Giampaolo Tomassoni" wrote:
>
> > > what you need to do is write a script that divides the metadata
> > > num_spam value and all the token Nspam counts by 3. The updated
> > > database can then be loaded back in with --restore.
> >
> > I don't know if this is
On 4/29/2010 8:25 AM, Frank Bures wrote:
> I've been running spamassassin for years. I am using auto-learn with very
> conservative thresholds. However, after several years of usage my spam
> database is about three times larger than my ham database and I am starting
> to see false positives.
>
>
On Thu, 29 Apr 2010 18:32:04 +0200
"Giampaolo Tomassoni" wrote:
> > what you need to do is write a script that divides the metadata
> > num_spam value and all the token Nspam counts by 3. The updated
> > database can then be loaded back in with --restore.
>
> I don't know if this is going to be eff
> Hi,
>
> > I would instead, in order of effectiveness:
> >
> > a) expire old tokens;
> > b) eliminate tokens with very few ham/spam occurrences.
> > c) eliminate tokens with very close nham to nspam values;
>
> Can you explain how to do this, or point to documentation that w
Hi,
> I would instead, in order of effectiveness:
>
> a) expire old tokens;
> b) eliminate tokens with very few ham/spam occurrences.
> c) eliminate tokens with very close nham to nspam values;
Can you explain how to do this, or point to documentation that would explain?
My
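The three suggestions quoted above (expire old tokens, drop rarely seen tokens, drop tokens whose ham and spam counts are nearly equal) could be tried offline on a text dump of the database. The sketch below assumes the tab-separated format written by `sa-learn --backup`, with token lines of the form `t<TAB>spam_count<TAB>ham_count<TAB>atime<TAB>token`; that layout, the function names and all three cut-off values are assumptions for illustration, not something stated in the thread.

```python
def keep_token(nspam, nham, atime, now,
               max_age_days=180, min_total=2, min_skew=0.1):
    """Decide whether one token survives the three pruning rules."""
    # (a) expire tokens not seen for a long time
    if now - atime > max_age_days * 86400:
        return False
    # (b) drop tokens with very few ham/spam occurrences
    total = nspam + nham
    if total == 0 or total < min_total:
        return False
    # (c) drop tokens whose ham and spam counts are nearly equal
    #     (a crude test on raw counts, not on Bayes probabilities)
    if abs(nspam - nham) / total < min_skew:
        return False
    return True

def prune_backup(lines, now, **limits):
    """Return the backup lines with pruned token lines removed."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "t" and len(fields) == 5:
            nspam, nham, atime = (int(f) for f in fields[1:4])
            if not keep_token(nspam, nham, atime, now, **limits):
                continue
        kept.append(line)  # metadata and "seen" lines pass through untouched
    return kept
```

A pruned dump would then be loaded back with `sa-learn --restore`, ideally after keeping a copy of the original backup file. Rule (a) overlaps with what SpamAssassin's own expiry already does.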
> On Thu, 29 Apr 2010 08:25:29 -0400
> Frank Bures wrote:
> what you need to do is write a script that divides the metadata num_spam
> value and all the token Nspam counts by 3. The updated database can
> then be loaded back in with --restore.
I don't know if this is going to be effective. After all
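For reference, the rescaling script proposed in the quoted text could look roughly like the sketch below. It assumes the tab-separated dump written by `sa-learn --backup`, where metadata lines look like `v<TAB>value<TAB>num_spam` and token lines like `t<TAB>spam_count<TAB>ham_count<TAB>atime<TAB>token`. That layout is an assumption, as is the choice to use integer division and to drop tokens whose counts both end up at zero; the function name `rescale_spam` is hypothetical.

```python
def rescale_spam(lines, divisor=3):
    """Divide num_spam and every token's spam count by `divisor`."""
    out = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "v" and len(fields) == 3 and fields[2] == "num_spam":
            # metadata: total number of spam messages learned
            fields[1] = str(int(fields[1]) // divisor)
        elif fields[0] == "t" and len(fields) == 5:
            # token line: fields[1] is the spam count, fields[2] the ham count
            nspam = int(fields[1]) // divisor
            nham = int(fields[2])
            if nspam == 0 and nham == 0:
                continue  # token no longer carries any evidence (a choice made here)
            fields[1] = str(nspam)
        out.append("\t".join(fields) + "\n")
    return out
```

The intended workflow would be to dump with `sa-learn --backup`, run the lines through `rescale_spam`, and load the result with `sa-learn --restore` as described in the quoted message. Whether this actually reduces false positives is the open question in the thread.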
On Thu, 29 Apr 2010 08:25:29 -0400
Frank Bures wrote:
> I've been running spamassassin for years. I am using auto-learn with
> very conservative thresholds. However, after several years of usage
> my spam database is about three times larger than my ham database and
> I am starting to see false
On 2010/04/29 8:25 AM, Frank Bures wrote:
I've been running spamassassin for years. I am using auto-learn with very
conservative thresholds. However, after several years of usage my spam
database is about three times larger than my ham database and I am starting
to see false positives.
Is there
I've been running spamassassin for years. I am using auto-learn with very
conservative thresholds. However, after several years of usage my spam
database is about three times larger than my ham database and I am starting
to see false positives.
Is there a way to "shrink" the spam database?
T
e false positives.
>
> Is there a way to "shrink" the spam database?
There's no separate spam and ham database - there's just one database of tokens, where
each token has its own value that indicates whether it's a spammy or hammy token.
You can run expire on the database, but I recomme
On Tue, 1 Aug 2006 19:54:43 +0530, sokka <[EMAIL PROTECTED]> opined:
> Dear Group Member,
>
> Can anyone explain to me the clear definition of SPAM and HAM
>
> regards
Please see:
http://tqmcube.com/spamdef.php and;
http://tqmcube.com/contrib.php (which was contributed by
> Dear Group Member,
> Can anyone explain to me the clear definition of SPAM and HAM
Most everyone agrees that spam is unsolicited e-mail sent from entities
with whom you do not have a previously established business or personal
relationship ...OR... where you've opted to
Use google, or wikipedia on spam:
http://en.wikipedia.org/wiki/Spam_%28electronic%29
Ham is everything that is NOT spam.
hope that clears things... :)
Gabor Sipos
>
Dear Group Member,
Can anyone explain to me the clear definition of SPAM and HAM
regards
On Tue, Aug 01, 2006 at 07:54:43PM +0530, sokka wrote:
> Can anyone explain to me the clear definition of SPAM and HAM
The short version is:
spam: unsolicited bulk email (aka: bad mail)
ham: anything that's not spam (aka: good mail)
It really comes down to consent as opposed to what
From: sokka [mailto:[EMAIL PROTECTED]
Sent: Tue 01-Aug-06 16:24
To: SpamAssassin Users List
Subject: SPAM and HAM
Dear Group Member,
Can anyone explain to me the clear definition of SPAM and HAM
regards
On Tuesday 01 August 2006 07:24, sokka wrote:
> Dear Group Member,
>
> Can anyone explain to me the clear definition of SPAM and HAM
Spam is spam. Ham ain't. What's the problem?
--
Gary G. Taylor * Pomona, CA * 34.07°N 117.75°W
[EMAIL PROTECTED] * http://www.donavan.org
Dear Group Member,
Can anyone explain to me the clear definition of SPAM and HAM
Yes. I'm sure quite a few people can.
Loren
Jorgen Lundman a écrit :
I would assume someone has already solved this, but it seems hard to
search for.
I would like to setup SA site wide, so that all the users can use it.
However, users are not very technical, so it would be nice if they
could have an easy method to train their own DB
" <[EMAIL PROTECTED]>
Cc: "ML-spamassassin-talk" ; "William
Stearns" <[EMAIL PROTECTED]>
Sent: Monday, October 03, 2005 10:10 PM
Subject: Re: Forward to learn spam and ham?
Good evening, Jorgen,
On Tue, 4 Oct 2005, Jorgen Lundman wrote:
I would assume someone ha
Sent: Monday, October 03, 2005 10:10 PM
Subject: Re: Forward to learn spam and ham?
Good evening, Jorgen,
On Tue, 4 Oct 2005, Jorgen Lundman wrote:
I would assume someone has already solved this, but it seems hard to
search for.
I would like to setup SA site wide, so that all the users c
Good evening, Jorgen,
On Tue, 4 Oct 2005, Jorgen Lundman wrote:
I would assume someone has already solved this, but it seems hard to search
for.
I would like to setup SA site wide, so that all the users can use it.
However, users are not very technical, so it would be nice if they could have
I would assume someone has already solved this, but it seems hard to search for.
I would like to setup SA site wide, so that all the users can use it. However,
users are not very technical, so it would be nice if they could have an easy
method to train their own DBs.
I envisioned that a "[EM
spam
if a mail scores <= ham threshold, it's ham; >= spam threshold, it's spam;
and > ham threshold and < spam threshold, it's "unsure". this is similar
to the SpamBayes UI.
- --j.
<|---|-->
| |
ham | .unsure.. | spam
if a mail scores <= ham threshold, it's ham; >= spam threshold, it's spam;
and > ham threshold and < spam threshold, it's "unsure".
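The three-way rule described above is small enough to write out directly. This is a minimal sketch; the function name and the example threshold values are hypothetical, and only the comparison rules come from the message.

```python
def classify(score, ham_threshold, spam_threshold):
    """Three-way verdict: 'ham', 'unsure' or 'spam'."""
    if ham_threshold >= spam_threshold:
        raise ValueError("ham threshold must be below spam threshold")
    if score <= ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return "unsure"  # strictly between the two thresholds

# Hypothetical thresholds, chosen only for illustration.
for score in (0.3, 3.7, 8.1):
    print(score, classify(score, ham_threshold=2.0, spam_threshold=5.0))
```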
> It sounds like you have put in a lot of time to become an expert in the
> traditional wisdom of SA and to tune it accordingly.
Not more than others here. Not really too much time.
> And, I assume you
> spend a lot of time keeping it tuned and dealing with SA upgrades.
Not at all. I have
Kai Schaetzl wrote:
Joe Flowers wrote on Mon, 11 Jul 2005 12:09:29 -0400:
That's bad, really bad
detection ...
No. It's good, really good detection.
You should improve that instead of trying to find a
barrier which gives you the best FP:FN ratio.
I'm not trying to find the best F
Joe Flowers wrote:
BTW, if anyone knows a command line program that can easily run through a
bunch of mbox files and tell how many messages are in them, I will
report back how many ham and how many spam messages that I have fed to
bayes. It's far from perfect, but it may offer some interesting info
> BTW, if anyone knows a command line program that can easily run through a
> bunch of mbox files and tell how many messages are in them, I will report
> back how many ham and how many spam messages that I have fed to bayes.
Well, I thought this might give some good stats on the FP:FN ratio, but
I for
Kai Schaetzl wrote on Mon, 11 Jul 2005 22:31:29 +0200:
> With the default of 5 we get almost none, not even one per day.
That was about FPs. Wrong. We don't get *any* FPs. We do not get even one
*FN* per day.
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: htt
jdow wrote:
> A few weeks ago I'd have said "Easy, Ducky!" Then I ran into Dovecot
> that uses an indexed almost "mbox" file. There is no way to do it
> other than "good guess". However, for a traditional UNIX mbox file
> you should be able to nail it perfectly simply looking for the "From"
> featu
Loren Wilton wrote on Mon, 11 Jul 2005 11:30:07 -0700:
> Which of course means that by picking the ratio value you can pick pretty
> much any fp/fn ratio you want.
Only if the distribution was equal.
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.c
which may indeed float
because tomorrow's messages score different than yesterday's. It does not
float at all in the long run. And it exists *only* in the long run. It may
throw off next day's detection quite heavily, since there's no guarantee
spam and ham look the sa
A few weeks ago I'd have said "Easy, Ducky!" Then I ran into Dovecot
that uses an indexed almost "mbox" file. There is no way to do it other
than "good guess". However, for a traditional UNIX mbox file you should
be able to nail it perfectly simply looking for the "From" feature. The
dirt stupid "m
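The "look for the From line" approach for a traditional UNIX mbox can be sketched as follows. It assumes the classic convention that each message starts with a line beginning `From ` that is either the first line of the file or follows a blank line, and that body lines starting with `From ` have been escaped as `>From `. As noted above, indexed or non-standard variants will not count reliably this way. The function name is made up for illustration.

```python
import sys

def count_mbox_messages(path):
    """Count 'From ' separator lines in a traditional mbox file."""
    count = 0
    prev_blank = True  # the very first line may be a separator
    with open(path, "rb") as fh:
        for line in fh:
            if prev_blank and line.startswith(b"From "):
                count += 1
            prev_blank = line in (b"\n", b"\r\n")
    return count

if __name__ == "__main__":
    # Usage: python count_mbox.py ham.mbox spam.mbox ...
    for mbox_path in sys.argv[1:]:
        print(f"{mbox_path}: {count_mbox_messages(mbox_path)} messages")
```

Python's standard `mailbox` module can also open mbox files and report a message count, which may be more robust than hand-counting separators.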
jdow wrote:
> The greater the separation the
> better the results for a decision point between them.
> But anything you can do that widens the
> typical score distribution between ham and spam is a good thing.
Amen
> There's another thing worth noting -- the SpamAssassin score distribution
> for hams and spams isn't even.
I don't necessarily see that those particular curve shapes necessarily in
any way invalidate this method, although they do bias the method somewhat.
The two curves are essentially smooth cu
Matt:
I know you know a lot more about this than I do, but for what it's
worth, your impressions/intuitions are very close to mine.
Originally back in April, I started off using the "average of the
means", but that let through way too much spam.
So, what I have now is it set to 30% above th
> > score of -2.1532284. I have the dividing line "set" at 30% of the
> > distance between the average ham score and average spam score (30% above
> > the average ham score). So, the dividing line is currently floating
> > around 0.55416414.
>
>
> The only problem I see with this approach is that i
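The floating dividing line described in the quoted text is a linear interpolation between the two averages. The sketch below assumes that -2.1532284 is the average ham score (the start of the sentence is cut off in the archive) and back-solves the average spam score from the quoted 0.55416414; that derived value is not stated anywhere in the thread. The function names are hypothetical.

```python
def dividing_line(mean_ham, mean_spam, fraction=0.30):
    """Threshold placed `fraction` of the way from mean_ham to mean_spam."""
    return mean_ham + fraction * (mean_spam - mean_ham)

def implied_mean_spam(mean_ham, threshold, fraction=0.30):
    """Invert dividing_line() to recover the spam average it implies."""
    return mean_ham + (threshold - mean_ham) / fraction

mean_ham = -2.1532284    # from the quoted message (assumed to be the ham average)
threshold = 0.55416414   # from the quoted message

mean_spam = implied_mean_spam(mean_ham, threshold)  # derived, roughly 6.87
print(f"implied average spam score: {mean_spam:.4f}")
print(f"recomputed dividing line:   {dividing_line(mean_ham, mean_spam):.8f}")
```

A fraction below 0.5 pulls the line toward the ham average, which trades fewer false negatives for a higher risk of false positives; that trade-off is the concern raised in the reply.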
From: "Matt Kettler" <[EMAIL PROTECTED]>
> Joe Flowers wrote:
> > I don't know if this will help anyone or not, but I wanted to report
> > back just in case.
> >
> > In early April, I completely unhinged the dividing line between what SA
> > score is used to mark a message as spam or ham (5.00 = d
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
the real-world figures can be seen for various thresholds in
the rules/STATISTICS*.txt files...
- --j.
Matt Kettler writes:
> Joe Flowers wrote:
> > Matt Kettler wrote:
> >
> >> The only problem I see with this approach is that it treats false
> >>
Joe Flowers wrote:
> Matt Kettler wrote:
>
>> The only problem I see with this approach is that it treats false
>> positives and
>> false negatives as being equally bad.
>>
>>
>
> We do get many more false negatives than false positives, even though we
> don't get false positives very often - t
Thanks Jason!
That's good, new info for me. That'll help me *at the very least*
visualize what I am trying to do a little better. I've been very curious
to know what the rough shapes of those graphs look like.
Joe
Justin Mason wrote:
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
There'
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
There's another thing worth noting -- the SpamAssassin score distribution
for hams and spams isn't even.
If you draw a graph of hams and spams, plotting the number of mails in
each category as the vertical axis and the score they get as the
horizonta
Matt Kettler wrote:
The only problem I see with this approach is that it treats false positives and
false negatives as being equally bad.
We do get many more false negatives than false positives, even though we
don't get false positives very often - they are rare.
We certainly don't get 1
Joe Flowers wrote:
> I don't know if this will help anyone or not, but I wanted to report
> back just in case.
>
> In early April, I completely unhinged the dividing line between what SA
> score is used to mark a message as spam or ham (5.00 = default). This
> allows the system and this dividing l
ctron could jump across into the next
energy band without ever being in the middle no-man's land - giving rise
to a way to measure SA's imperfectness? These percentages are stabs in
the dark about what the distributions of spam and ham really look like.
What is intriguing to me is: why c
This is quite interesting, and seems reasonably obvious that with the right
sort of mail (at least, maybe with any mail) this should work better, since
it self tunes to your conditions. It does of course assume a reasonable
fp/fn rate to start, but SA is generally pretty good about that.
How have
I don't know if this will help anyone or not, but I wanted to report
back just in case.
In early April, I completely unhinged the dividing line between what SA
score is used to mark a message as spam or ham (5.00 = default). This
allows the system and this dividing line to drift "freely" to an
60 matches