Re: [SAtalk] Spam Tracking

2002-05-22 Thread Craig R Hughes
Mark Martinec wrote:
MM> Matt,
MM>
MM> | Unfortunately that's not strictly true.
MM> |
MM> | You could very easily poison the database using short english phrases. I
MM> | see an awful lot of emails that just contain single words, such as:
MM> | "Hello???" or "How did it go?" etc. Generating thin

Re: [SAtalk] Spam Tracking

2002-05-20 Thread Mark Martinec
Matt,
| Unfortunately that's not strictly true.
|
| You could very easily poison the database using short english phrases. I
| see an awful lot of emails that just contain single words, such as:
| "Hello???" or "How did it go?" etc. Generating things like that using a
| Markov Chain system wo

Re: [SAtalk] Spam Tracking

2002-05-20 Thread Michael Stenner
On Mon, May 20, 2002 at 09:35:44AM +0100, Matt Sergeant wrote:
> Michael Stenner wrote:
> > On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
> >>I would be extremely surprised if two people report different messages
> >>that result in the same hash. Although completely possible, i

Re: [SAtalk] Spam Tracking

2002-05-20 Thread Matt Sergeant
Michael Stenner wrote:
> On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
>
>>I would be extremely surprised if two people report different messages
>>that result in the same hash. Although completely possible, it's also
>>very very unlikely.
>
> Someone said on this list tha

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Olivier Nicole
> Yes, that is why I'm thinking of creating this database -- we can see what
> tests are consistently bad and modify/eliminate them.
Just one thought: you have to be careful with rules whose contents change over time while keeping the same name.
Olivier

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> > 0.01 * 10^34 = 10^32 times. at 1,000,000,000 tries per second, that
> > will only take you 10^23 seconds = roughly the age of the universe.
>
> Not to mention the challenge of coming up with 10^32 unique intelligible
> ways of talking about penis enlargement, multilevel marketing, and wild
>
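The arithmetic quoted in this post can be checked with a short sketch. The odds (about 10^-34), the 1% target, and the rate of 1,000,000,000 tries per second are the poster's figures; the linear approximation (chance is roughly tries times odds, reasonable when the chance is small) and the 365.25-day year are assumptions added here for the unit conversion.

```python
# Figures quoted in the thread: odds of about 10^-34 per try, a 1% target,
# and 10^9 tries per second. Integer arithmetic keeps the powers of ten exact.
ODDS_DENOMINATOR = 10**34      # one hit per 10^34 tries (from the post)
TARGET_PERCENT = 1             # aim for a 1% chance of a hit (from the post)
TRIES_PER_SECOND = 10**9       # 1,000,000,000 tries per second (from the post)

# Assumption: a 365.25-day year, used only to convert seconds to years.
SECONDS_PER_YEAR = 31_557_600

# Linear approximation: chance ~= tries * odds, fine while the chance is small.
tries_needed = ODDS_DENOMINATOR * TARGET_PERCENT // 100
seconds_needed = tries_needed // TRIES_PER_SECOND
years_needed = seconds_needed // SECONDS_PER_YEAR

print(f"tries:   {tries_needed:.0e}")
print(f"seconds: {seconds_needed:.0e}")
print(f"years:   {years_needed:.1e}")
```

Under these assumptions the result is about 3 x 10^15 years. The commonly cited age of the universe is on the order of 10^10 years, so 10^23 seconds is in fact several orders of magnitude longer than that, which only strengthens the poster's point.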

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Sidney Markowitz
On Fri, 2002-05-17 at 14:32, Michael Stenner wrote:
> Now, with odds of about 10^-34, if you decide you're going to try
> enough hashes to give yourself a 1% CHANCE of finding one, you only
> need to try
>
> 0.01 * 10^34 = 10^32 times. at 1,000,000,000 tries per second, that
> will only take you

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Michael Stenner
On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
> I would be extremely surprised if two people report different messages
> that result in the same hash. Although completely possible, it's also
> very very unlikely.
Someone said on this list that razor uses SHA1 (which I know to

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> Subjects being slightly different shouldn't be a problem because you can do
> soundex or "like" searches when you have the data set.
good point. advanced comparisons like that would help a lot.
> I was debating the reply-to and from but maybe it's best just to use all of
> them for now.
Aw
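The "like" search mentioned here could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely for illustration (the thread itself talks about Postgres), and the table layout, column names, and sample subjects are all made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical one-column-of-interest table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (m_id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO messages (subject) VALUES (?)",
    [
        ("Chris, lose weight fast",),      # same spam, recipient name prepended
        ("Andrew, lose weight fast",),
        ("Meeting notes for Monday",),     # unrelated message
    ],
)

# LIKE with % wildcards matches the shared part of the subject and
# ignores the personalised prefix.
rows = conn.execute(
    "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY m_id",
    ("%lose weight fast%",),
).fetchall()
matches = [subject for (subject,) in rows]
print(matches)
```

A pattern match like this only helps when the varying part is a prefix or suffix; subjects that differ in spelling would need something closer to the soundex idea from the same post.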

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Andrew Kohlsmith
> "last" received? or "first"? (meaning to say, the oldest). anyway,
> yeah, that's probably accurate enough. Subject should also be a good one,
> except for the few spams that put your name (or what they think your name
> is) into the subject. You could also check reply-to or mailer-agent (o

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Sidney Markowitz
On Fri, 2002-05-17 at 12:03, Chris Petersen wrote:
> > Offhand, how does Razor get false positives? I thought that it was MD5-based
> > and the email had to be exact?
>
> it does. but md5 doesn't generate a unique id... there's no way that a
> smallish number can be used to identify an infi

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> It's just an old habit. When I learned SQL I was taught (mostly from
> the big SQL books) and of course the little black book of normalization,
> _Handbook of Relational Database Design_ that table columns should try
> to be unique yet understandable.
ahh. I started db stuff with filemaker (a

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Theo Van Dinter
On Fri, May 17, 2002 at 01:11:26PM -0700, Chris Petersen wrote:
> dunno. when I was exploring razor as a solution, I read a relatively
> large number of complaints in their mailing list archive about false
> positives (though "it's slow" seemed to be more of a concern to most
> people)
false

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> 1) Razor uses SHA1, not MD5.
ah, noted.
> 2) Either way, while you're correct (you _can_ have multiple inputs
>    with the same resulting hash), it's very unlikely to find two sets of
>    different data with the same hash output. So in reality, MD5/SHA1/etc
>    aren't unique, but they're u
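Both points in this exchange (a hash match needs exactly the same input, and a fixed-size digest cannot be unique over unlimited inputs) can be illustrated with Python's hashlib. This is only a sketch of plain MD5/SHA1 hashing, not Razor's actual signature procedure; the `fingerprint` helper and the sample message bodies are hypothetical.

```python
import hashlib


def fingerprint(body: str, algorithm: str = "sha1") -> str:
    """Return the hex digest of a message body (hypothetical helper)."""
    return hashlib.new(algorithm, body.encode("utf-8")).hexdigest()


original = "Make money fast! Reply now.\n"
variant = "Make money fast! Reply now!\n"   # a single character changed

# A one-character change produces a different digest, so an exact-match
# lookup keyed on the digest would not treat these as the same message.
print(fingerprint(original))
print(fingerprint(variant))
print(fingerprint(original) == fingerprint(variant))

# The digest size bounds how many distinct fingerprints can exist at all,
# which is why collisions are possible in principle.
for name in ("md5", "sha1"):
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits} bits, about {float(2**bits):.1e} possible digests")
```

The number of possible digests is finite but very large, which matches the thread's conclusion: not unique, but different inputs are very unlikely to share a hash by accident.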

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Theo Van Dinter
On Fri, May 17, 2002 at 12:03:24PM -0700, Chris Petersen wrote:
> it does. but md5 doesn't generate a unique id... there's no way that a
> smallish number can be used to identify an infinite number of possible
> email combinations.. so while md5 can be used to check integrity of data
> (si

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Andrew Kohlsmith
> heh, it all looks good to me. I think I'm just not quite sure what you're
> up to (that, and underscores in field names confuse me for some reason ;).
It's just an old habit. When I learned SQL I was taught (mostly from the big SQL books) and of course the little black book of normalization,

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> Now I really want to do this. I'll see what I'm up to this weekend. :-)
heh, it all looks good to me. I think I'm just not quite sure what you're up to (that, and underscores in field names confuse me for some reason ;).
> What really can you track with this besides scoring and the correla

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Andrew Kohlsmith
> wouldn't it be easier to integrate this into spamd? You'd already have
> your db client set up that way.
You're absolutely correct. duh on my part. :-)
> Sounds like you've got it right.. You'd need two tables, something like:
>
> Create Table messages (
>     m_id bigint primary key
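The quoted schema is cut off after its first column, so the sketch below is not a reconstruction of it. It only illustrates the general two-table idea and the per-test report discussed in the thread, using Python's sqlite3 module instead of Postgres. Apart from `messages` and `m_id`, every table name, column, test name, and row is an assumption made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (
        m_id  INTEGER PRIMARY KEY,      -- name taken from the thread
        score REAL NOT NULL             -- hypothetical: total message score
    );
    CREATE TABLE test_hits (            -- hypothetical second table
        m_id      INTEGER NOT NULL REFERENCES messages (m_id),
        test_name TEXT NOT NULL,        -- which test fired on the message
        PRIMARY KEY (m_id, test_name)
    );
    """
)

# Made-up sample data: three messages and the tests each one triggered.
conn.executemany(
    "INSERT INTO messages (m_id, score) VALUES (?, ?)",
    [(1, 7.2), (2, 1.1), (3, 9.4)],
)
conn.executemany(
    "INSERT INTO test_hits (m_id, test_name) VALUES (?, ?)",
    [
        (1, "TEST_A"), (1, "TEST_B"),
        (2, "TEST_B"),
        (3, "TEST_A"), (3, "TEST_B"), (3, "TEST_C"),
    ],
)

# Report how many messages each test fired on, most frequent first.
report = conn.execute(
    """
    SELECT test_name, COUNT(*) AS hits
    FROM test_hits
    GROUP BY test_name
    ORDER BY hits DESC, test_name
    """
).fetchall()
for test_name, hits in report:
    print(f"{test_name:10s} {hits}")
```

Keeping one row per (message, test) pair makes the report a plain GROUP BY, and the same table could later be joined against the message score to see which tests tend to fire on low-scoring mail.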

Re: [SAtalk] Spam Tracking

2002-05-17 Thread Chris Petersen
> One thing I want to do is write a little C program that connects to Postgres
> (or Perl but with a C client just like spamc/d) and reports on the tests that
> *all* messages score on.
wouldn't it be easier to integrate this into spamd? You'd already have your db client set up that way.
> F