On Fri, 2009-05-08 at 19:09 -0400, Adam Katz wrote:
> Finally, IIRC, some of the fuzzy checksum mechanisms go by patterns
> that take a keen interest in paragraph structure like that (or at
> least one was mentioned as well-loved at the last MIT Spam
> Conference), so make sure you're using Razor2, Pyzor, iXhash, and if
> permissible, DCC (though I'm not sure which of those use this method
> ... iXhash certainly does not).

Sure does.

Actually, IIRC, that's its main algo. Strip everything but whitespace,
condense repeating whitespace into a single occurrence, and hash that.
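
For illustration, here is a rough Python sketch of those three steps as
described above. It is not iXhash's actual code: the function names are
made up, MD5 is an arbitrary choice of digest, and "condense" is read here
as "collapse a run of the same whitespace character into one", which is an
assumption.

```python
import hashlib
import re


def whitespace_skeleton(body: str) -> str:
    """Reduce a message body to its whitespace layout (hypothetical helper)."""
    # Step 1: strip everything but whitespace.
    only_ws = re.sub(r"\S+", "", body)
    # Step 2: condense each run of the same whitespace character
    # into a single occurrence (assumed reading of "condense").
    return re.sub(r"(\s)\1+", r"\1", only_ws)


def skeleton_hash(body: str) -> str:
    """Hash the whitespace layout; MD5 is an assumption, any digest would do."""
    # Step 3: hash what is left.
    return hashlib.md5(whitespace_skeleton(body).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "Hello  world\n\nfoo bar\n"
    b = "Buy   pills\n\n\nnow please\n"
    # Different words, same paragraph layout, so the two hashes match.
    print(skeleton_hash(a) == skeleton_hash(b))
```

Two bodies with entirely different wording but the same layout of spaces
and line breaks end up with the same hash, which is why a check like this
cares about paragraph structure rather than content.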


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
