Re: getting Bayes token data from spamassassin

2007-01-18 Thread Jonas Eckerman
Jonas Eckerman wrote: > I do not consider my plugin "nice" since it uses DBI in such an unoptimized > way. I did optimize it slightly yesterday, so maybe I do consider it almost nice now. :-) > It really should use a prepared statement Now it does this. It probably should use the DELAYED key
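
The prepared-statement point above can be sketched as follows. This is not Jonas's plugin (which is Perl and uses DBI); it is a minimal Python illustration that assumes an in-memory SQLite database as a stand-in for the real SQL backend. The table name `raw_tokens`, its columns, and the function `store_raw_tokens` are hypothetical. The idea shown is one parameterized statement reused for every token, rather than a new SQL string built per row.

```python
import hashlib
import sqlite3


def store_raw_tokens(conn, raw_tokens):
    """Store raw tokens next to their 40-bit hashed form.

    Hypothetical schema: raw_tokens(token_hash, raw_token).
    """
    # One parameterized statement, executed once per row by executemany().
    sql = "INSERT OR IGNORE INTO raw_tokens (token_hash, raw_token) VALUES (?, ?)"
    rows = [
        # Assumption: the stored key is the last 5 bytes of the SHA1 digest.
        (hashlib.sha1(token.encode("utf-8")).digest()[-5:], token)
        for token in raw_tokens
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_tokens (token_hash BLOB PRIMARY KEY, raw_token TEXT NOT NULL)"
)
store_raw_tokens(conn, ["stock", "pump", "stock"])
count = conn.execute("SELECT COUNT(*) FROM raw_tokens").fetchone()[0]
print(count)
```

The duplicate token is skipped by the primary key on `token_hash`, so repeated learning of the same message would not grow the table.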

Re: getting Bayes token data from spamassassin

2007-01-17 Thread Michael Parker
Jonas Eckerman wrote: > Justin Mason wrote: >> http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn > > Thanks! > >> by the way, a nice, working plugin that does this would be quite useful > > Since it was so straight-forward I made a small plugin that col

Re: getting Bayes token data from spamassassin

2007-01-17 Thread Jonas Eckerman
Jonas Eckerman wrote: > Justin Mason wrote: >> by the way, a nice, working plugin that does this would be quite useful > Since it was so straight-forward I made a small plugin that collects the raw > tokens in a SQL table. An extra note: I do not consider my plugin "nice" since it uses DBI in s

Re: getting Bayes token data from spamassassin

2007-01-17 Thread Jonas Eckerman
Justin Mason wrote: > http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin.html#item_bayes_learn Thanks! > by the way, a nice, working plugin that does this would be quite useful Since it was so straight-forward I made a small plugin that collects the raw tokens in a SQL table

Re[2]: getting Bayes token data from spamassassin

2007-01-17 Thread Fred T
Hello Stuart, Monday, January 15, 2007, 4:54:07 AM, you wrote: > I've searched around a bit, both on gmane and Google, but I haven't found > much more information regarding your two points. What IS stored in the > token field of the table bayes_token? And how is the SHA1 hash involved? > Where ca

Re: getting Bayes token data from spamassassin

2007-01-16 Thread Stuart Robinson
Thanks. Once I have this all figured out, I will write up something and put it on my homepage and post a link to it here. > > >> A SHA1 hash is taken of the original token value, and the bottom 40 > > >> bits are used as the token from then-on. There is a plugin call > > >> which can be used to s
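
The hashing step quoted above (SHA1 of the original token, bottom 40 bits kept) can be illustrated with a short Python sketch. It assumes that "bottom 40 bits" means the last 5 bytes of the 20-byte binary digest and that the token is hashed as UTF-8 bytes; the function name `bayes_token_key` is hypothetical and not part of SpamAssassin.

```python
import hashlib


def bayes_token_key(raw_token: str) -> bytes:
    """Return a 40-bit key derived from the SHA1 digest of raw_token."""
    digest = hashlib.sha1(raw_token.encode("utf-8")).digest()  # 20 bytes
    # Assumption: "bottom 40 bits" = the final 5 bytes of the digest.
    return digest[-5:]


key = bayes_token_key("stock")
print(len(key), key.hex())
```

Because the key is a truncated one-way hash, the original word cannot be recovered from the database alone, which is why the dumped tokens look like gibberish and why the raw tokens have to be captured separately at learn time.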

Re: getting Bayes token data from spamassassin

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 02:02:01PM -0800, Stuart Robinson wrote: > Couldn't the raw tokens just be kept in the same database by adding an > additional column to the table bayes_token that isn't indexed? That > wouldn't affect performance too much, would it? Besides requiring a new data layout for

Re: getting Bayes token data from spamassassin

2007-01-16 Thread Stuart Robinson
> On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote: > > by the way, a nice, working plugin that does this would be quite useful on > > the CustomPlugins wiki page, or contributed as an optional plugin... > > The plugin itself is pretty trivial -- the question is: what to do with > the

Re: getting Bayes token data from spamassassin

2007-01-16 Thread Theo Van Dinter
On Tue, Jan 16, 2007 at 10:21:14AM +, Justin Mason wrote: > by the way, a nice, working plugin that does this would be quite useful on > the CustomPlugins wiki page, or contributed as an optional plugin... The plugin itself is pretty trivial -- the question is: what to do with the token inform

Re: getting Bayes token data from spamassassin

2007-01-16 Thread Justin Mason
Michael Parker writes: > Stuart Robinson wrote: > > Hello, all. > > > >> On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote: > >>> I've searched around a bit, both on gmane and Google, but I haven't > >>> found much more information regarding your two points. What IS > >>> stored in

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Theo Van Dinter
On Mon, Jan 15, 2007 at 08:48:33PM -0800, Stuart Robinson wrote: > I'll keep looking around. It might be nice to have a configuration option > that says whether or not to store the raw tokens in the database along > with their associated hash values. We discussed this at length when the change hap

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Michael Parker
Stuart Robinson wrote: > Hello, all. > >> On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote: >>> I've searched around a bit, both on gmane and Google, but I haven't found >>> much more information regarding your two points. What IS stored in the >>> token field of the table bayes_tok

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Stuart Robinson
Hello, all. > On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote: > > I've searched around a bit, both on gmane and Google, but I haven't found > > much more information regarding your two points. What IS stored in the > > token field of the table bayes_token? And how is the SHA1 hash

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Theo Van Dinter
On Mon, Jan 15, 2007 at 01:54:07AM -0800, Stuart Robinson wrote: > I've searched around a bit, both on gmane and Google, but I haven't found > much more information regarding your two points. What IS stored in the > token field of the table bayes_token? And how is the SHA1 hash involved? A SHA1 ha

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Matt Kettler
Stuart Robinson wrote: > I am interested in getting the list of tokens used by spamassassin for > Bayesian classification so that I can investigate misclassifications. A > lot of pump-and-dump emails are getting through, and I'm trying to > understand why. > > My email set-up has spamassassin s

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Stuart Robinson
I've searched around a bit, both on gmane and Google, but I haven't found much more information regarding your two points. What IS stored in the token field of the table bayes_token? And how is the SHA1 hash involved? Where can I find documentation of this? Any suggestions would be greatly apprecia

Re: getting Bayes token data from spamassassin

2007-01-15 Thread Theo Van Dinter
On Sun, Jan 14, 2007 at 11:43:39PM -0800, Stuart Robinson wrote: > However, the tokens produced all seem like gibberish. The same happens > when I call sa-learn with the --dump data flag. The tokens are parts of a SHA1 hash, so they may appear as "gibberish". > Why don't I see any of the words

getting Bayes token data from spamassassin

2007-01-14 Thread Stuart Robinson
I am interested in getting the list of tokens used by spamassassin for Bayesian classification so that I can investigate misclassifications. A lot of pump-and-dump emails are getting through, and I'm trying to understand why. My email set-up has spamassassin storing tokens in a MySQL database.