Mark Martinec wrote:
MM> Matt,
MM>
MM> | Unfortunately that's not strictly true.
MM> |
MM> | You could very easily poison the database using short English phrases. I
MM> | see an awful lot of emails that just contain single words, such as:
MM> | "Hello???" or "How did it go?" etc. Generating thin
Matt,
| Unfortunately that's not strictly true.
|
| You could very easily poison the database using short English phrases. I
| see an awful lot of emails that just contain single words, such as:
| "Hello???" or "How did it go?" etc. Generating things like that using a
| Markov Chain system wo
On Mon, May 20, 2002 at 09:35:44AM +0100, Matt Sergeant wrote:
> Michael Stenner wrote:
> > On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
> >>I would be extremely surprised if two people report different messages
> >>that result in the same hash. Although completely possible, i
Michael Stenner wrote:
> On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
>
>>I would be extremely surprised if two people report different messages
>>that result in the same hash. Although completely possible, it's also
>>very very unlikely.
>
>
> Someone said on this list tha
> Yes, that is why I'm thinking of creating this database -- we can see what
> tests are consistently bad and modify/eliminate them.
Just one thought: you have to be careful with rules whose contents
change over time but that keep the same name.
Olivier
__
> > 0.01 * 10^34 = 10^32 times. at 1,000,000,000 tries per second, that
> > will only take you 10^23 seconds = roughly the age of the universe.
>
> Not to mention the challenge of coming up with 10^32 unique intelligible
> ways of talking about penis enlargement, multilevel marketing, and wild
>
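The arithmetic quoted above is easy to check. A minimal Python sketch, taking the 10^-34 odds and the 10^9 tries per second from the message as given; the seconds-per-year constant and the approximate age of the universe (about 1.38 x 10^10 years) are assumptions added here for comparison:

```python
# Figures taken from the quoted message; not independently derived.
ODDS_DENOMINATOR = 10**34      # odds of a match quoted as about 10^-34
TRIES_PER_SECOND = 10**9       # 1,000,000,000 tries per second

# Tries needed for a 1% chance: 0.01 * 10^34, kept in integers.
tries = ODDS_DENOMINATOR // 100
seconds = tries // TRIES_PER_SECOND

# Assumption: a Julian year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = seconds / SECONDS_PER_YEAR

# Assumption: age of the universe taken as roughly 1.38e10 years.
AGE_OF_UNIVERSE_YEARS = 1.38e10
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"tries: {tries:.0e}, seconds: {seconds:.0e}, years: {years:.2e}")
print(f"that is about {ratio:.1e} times the assumed age of the universe")
```

On these figures, 10^23 seconds comes to a few times 10^15 years, so "roughly the age of the universe" understates the wait by several orders of magnitude.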
On Fri, 2002-05-17 at 14:32, Michael Stenner wrote:
> Now, with odds of about 10^-34, if you decide you're going to try
> enough hashes to give yourself a 1% CHANCE of finding one, you only
> need to try
>
> 0.01 * 10^34 = 10^32 times. at 1,000,000,000 tries per second, that
> will only take you
On Fri, May 17, 2002 at 04:15:34PM -0400, Theo Van Dinter wrote:
> I would be extremely surprised if two people report different messages
> that result in the same hash. Although completely possible, it's also
> very very unlikely.
Someone said on this list that razor uses SHA1 (which I know to
> Subjects being slightly different shouldn't be a problem because you can do
> soundex or "like" searches when you have the data set.
good point. advanced comparisons like that would help a lot.
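A "like" search of the kind mentioned above might look something like the following. This is only a sketch: it uses Python's built-in sqlite3 module rather than the Postgres setup discussed in the thread, and the subject column and the sample subjects are hypothetical.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
# m_id appears in the thread; the subject column is a hypothetical addition.
conn.execute("CREATE TABLE messages (m_id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany(
    "INSERT INTO messages (subject) VALUES (?)",
    [
        ("Make money fast, Chris!",),   # same spam, personalised subject
        ("Make money fast, Pat!",),
        ("Lunch on Friday?",),
    ],
)


def similar_subjects(conn, fragment):
    """Return subjects containing the fragment, using a SQL LIKE pattern."""
    pattern = f"%{fragment}%"
    rows = conn.execute(
        "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY m_id",
        (pattern,),
    )
    return [row[0] for row in rows]


matches = similar_subjects(conn, "make money fast")
print(matches)
```

SQLite's LIKE ignores ASCII case by default. In Postgres, LIKE is case-sensitive and ILIKE is the case-insensitive form, so the query would need adjusting there.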
> I was debating the reply-to and from but maybe it's best just to use all of
> them for now. Aw
> "last" received? or "first"? (meaning to say, the oldest). anyway,
> yeah, that's probably accurate enough. Subject should also be a good one,
> except for the few spams that put your name (or what they think your name
> is) into the subject. You could also check reply-to or mailer-agent (o
On Fri, 2002-05-17 at 12:03, Chris Petersen wrote:
> > Offhand, how does Razor get false positives? I thought that it was MD5-based
> > and the email had to be exact?
>
> it does. but md5 doesn't generate a unique id... there's no way that a
> smallish number can be used to identify an infi
> It's just an old habit. When I learned SQL I was taught (mostly from
> the big SQL books) and of course the little black book of normalization,
> _Handbook of Relational Database Design_ that table columns should try
> to be unique yet understandable.
ahh. I started db stuff with filemaker (a
On Fri, May 17, 2002 at 01:11:26PM -0700, Chris Petersen wrote:
> dunno. when I was exploring razor as a solution, I read a relatively
> large number of complaints in their mailing list archive about false
> positives (though "it's slow" seemed to be more of a concern to most
> people)
false
> 1) Razor uses SHA1, not MD5.
ah, noted.
> 2) Either way, while you're correct (you _can_ have multiple inputs
>with the same resulting hash), it's very unlikely to find two sets of
>different data with the same hash output. So in reality, MD5/SHA1/etc
>aren't unique, but they're u
On Fri, May 17, 2002 at 12:03:24PM -0700, Chris Petersen wrote:
> it does. but md5 doesn't generate a unique id... there's no way that a
> smallish number can be used to identify an infinite number of possible
> email combinations.. so while md5 can be used to check integrity of data
> (si
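The point that a fixed-size hash cannot be unique over an unbounded set of inputs can be illustrated by shrinking the hash until collisions become easy to find. The sketch below is an illustration only: it truncates SHA1 to 16 bits, which is an assumption made for the demo and is not how Razor computes its signatures.

```python
import hashlib
from itertools import count


def truncated_sha1(text, bits):
    """Return the top `bits` bits of the 160-bit SHA1 digest as an integer."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    return value >> (160 - bits)


def find_collision(bits):
    """Hash distinct messages until two share the same truncated hash."""
    seen = {}
    for i in count():
        message = f"message {i}"      # hypothetical distinct inputs
        key = truncated_sha1(message, bits)
        if key in seen:
            return seen[key], message
        seen[key] = message


# With only 2**16 possible values, a collision is guaranteed within
# 2**16 + 1 distinct inputs (pigeonhole), and usually turns up far sooner.
first, second = find_collision(16)
print(first, second, truncated_sha1(first, 16) == truncated_sha1(second, 16))
```

The same argument applies to the full 160-bit digest; collisions must exist, but the space is so large that stumbling on one by accident is very unlikely.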
> heh, it all looks good to me. I think I'm just not quite sure what you're
> up to (that, and underscores in field names confuse me for some reason ;).
It's just an old habit. When I learned SQL I was taught (mostly from the big
SQL books) and of course the little black book of normalization,
> Now I really want to do this. I'll see what I'm up to this weekend. :-)
heh, it all looks good to me. I think I'm just not quite sure what you're
up to (that, and underscores in field names confuse me for some reason ;).
> What really can you track with this besides scoring and the correla
> wouldn't it be easier to integrate this into spamd? You'd already have
> your db client set up that way.
You're absolutely correct. duh on my part. :-)
> Sounds like you've got it right.. You'd need two tables, something like:
>
> Create Table messages (
> m_id bigint primary key
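The table definition quoted above is cut off in the archive, so the rest of it is unknown. Purely as an illustration of a two-table layout of this kind, here is a sketch using Python's sqlite3 module in place of Postgres. Only the messages table and its m_id column come from the quoted fragment; the test_hits table, every other column, and the test names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        m_id    INTEGER PRIMARY KEY,   -- name taken from the quoted fragment
        subject TEXT                   -- hypothetical column
    );
    CREATE TABLE test_hits (           -- hypothetical second table
        m_id      INTEGER NOT NULL REFERENCES messages (m_id),
        test_name TEXT    NOT NULL,    -- which test fired on the message
        score     REAL    NOT NULL     -- score that test contributed
    );
""")

# Made-up sample rows; TEST_A and TEST_B are placeholder test names.
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(1, "first"), (2, "second"), (3, "third")])
conn.executemany("INSERT INTO test_hits VALUES (?, ?, ?)", [
    (1, "TEST_A", 2.0), (2, "TEST_A", 2.0), (3, "TEST_A", 2.0),
    (1, "TEST_B", 1.5),
])


def hits_per_test(conn):
    """Count how many messages each test fired on, most frequent first."""
    rows = conn.execute("""
        SELECT test_name, COUNT(DISTINCT m_id) AS hits
        FROM test_hits
        GROUP BY test_name
        ORDER BY hits DESC, test_name
    """)
    return list(rows)


report = hits_per_test(conn)
print(report)
```

Keeping one row per (message, test) pair makes per-test reports a plain GROUP BY, which is the sort of query the thread has in mind for spotting tests that misbehave.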
> One thing I want to do is write a little C program that connects to Postgres
> (or Perl but with a C client just like spamc/d) and reports on the tests that
> *all* messages score on.
wouldn't it be easier to integrate this into spamd? You'd already have
your db client set up that way.
> F