Re: [WL] Re: [SAtalk] More obfuscation

2004-01-21, David B Funk
On Tue, 20 Jan 2004, Charles Gregory wrote:
> Right now, there would be no statistics, because the text obfu has
> just started. But as a side note, we don't have the disk space to run
> Bayes for all our users though I'm getting awfully tempted to talk the
> boss into an extra disk or two.

So

Re: [WL] Re: [SAtalk] More obfuscation

2004-01-20, Lucas Albers
Detecting obfuscation:
- HTML garbage tags: done.
- Normal-language letter frequency: easy to do, but also easy to get around by modifying the random keywords so they have the same letter frequency as English words. This would still catch the stupider spammers doing Bayes poisoning.
- Detect the poisoning attempt, and reject an addition to
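The letter-frequency idea can be sketched as a chi-squared comparison between the letters in a message and the letters expected in ordinary English. This is a minimal illustration, not anything SpamAssassin ships: the function name `letter_frequency_score` is hypothetical, the frequency table holds commonly cited approximate values for English (an assumption), and any threshold for "too unlike English" would have to be tuned on real mail.

```python
import string

# Approximate relative frequencies (%) of letters in English text.
# Commonly cited textbook values, used here as an assumption.
ENGLISH_FREQ = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2,
    "g": 2.0, "h": 6.1, "i": 7.0, "j": 0.15, "k": 0.77, "l": 4.0,
    "m": 2.4, "n": 6.7, "o": 7.5, "p": 1.9, "q": 0.095, "r": 6.0,
    "s": 6.3, "t": 9.1, "u": 2.8, "v": 0.98, "w": 2.4, "x": 0.15,
    "y": 2.0, "z": 0.074,
}


def letter_frequency_score(text: str) -> float:
    """Chi-squared distance between the text's letter counts and the counts
    expected for English text with the same number of letters.
    Higher means 'less like English'. Returns 0.0 if there are no letters."""
    counts = {c: 0 for c in string.ascii_lowercase}
    for ch in text.lower():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Normalise, since the rounded table does not sum to exactly 100.
    table_sum = sum(ENGLISH_FREQ.values())
    score = 0.0
    for letter, observed in counts.items():
        expected = total * ENGLISH_FREQ[letter] / table_sum
        score += (observed - expected) ** 2 / expected
    return score
```

Random keyword salad heavy in rare letters such as q, x and z scores far higher than normal prose. As noted above, a spammer who generates filler words with English-like letter frequencies defeats this check, so it only catches the careless ones.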

Re: [WL] Re: [SAtalk] More obfuscation

2004-01-20, Charles Gregory
On Wed, 21 Jan 2004, Sidney Markowitz wrote:
> Does anyone who is concerned about the obfuscation have any statistics
> to show that it really is a problem for the current rules plus network
> tests plus a well-trained Bayes?

Right now, there would be no statistics, because the text obfu has jus

Re: [WL] Re: [SAtalk] More obfuscation

2004-01-20, Sidney Markowitz
Charles Gregory wrote:
> So I guess the question is, how 'expensive' would it be in terms of
> processing power

There's also the question of how much benefit it would have. I recall someone trying out searching for close matches to spam words in a corpus and not getting very good results at picking u
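Searching a message for close matches to known spam words might look roughly like the sketch below. It is a hypothetical illustration, not the experiment recalled above: `SPAM_WORDS`, `fuzzy_spam_hits` and the 0.8 cutoff are assumptions, and it uses the Python standard library's `difflib.get_close_matches`, which scores similarity with `SequenceMatcher.ratio()` rather than with Levenshtein distance.

```python
import difflib
import re

# Hypothetical keyword list; a real filter would maintain its own.
SPAM_WORDS = ["viagra", "mortgage", "refinance", "casino"]


def fuzzy_spam_hits(body: str, cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (token, keyword) pairs for tokens that are near misses of a
    spam keyword, e.g. 'v1agra' -> 'viagra'. Exact matches are skipped,
    since ordinary rules already catch those."""
    hits = []
    for token in re.findall(r"[a-z0-9@$]+", body.lower()):
        if len(token) < 4:  # short tokens produce too many accidental matches
            continue
        match = difflib.get_close_matches(token, SPAM_WORDS, n=1, cutoff=cutoff)
        if match and match[0] != token:
            hits.append((token, match[0]))
    return hits
```

The cost question shows up directly here: every token is compared against every keyword, so work grows with message length times list size. `get_close_matches` does discard hopeless candidates with cheap upper-bound checks before computing the full ratio, but a large keyword list would still be slow, and the false-positive rate on ordinary words is exactly the benefit question raised above.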

Re: [WL] Re: [SAtalk] More obfuscation

2004-01-20, Charles Gregory
On Tue, 20 Jan 2004, Marcus Frischherz wrote:
> But there is: there exists (at least in PHP) a function called
> levenshtein, which calculates the similarity between two words. Surely
> there must exist a perl equivalent to it. see:
> http://at.php.net/manual/en/function.levenshtein.php

So I g
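For reference, the Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another, so a lower value means the words are more similar. Below is a minimal Python sketch of the standard dynamic-programming algorithm with unit costs (PHP's `levenshtein()` uses unit costs by default); it is an illustration, not a port of PHP's or any Perl module's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (all costs equal to 1)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]
```

A typical obfuscation such as `v1agra` is distance 1 from `viagra`. Each comparison takes time proportional to the product of the two word lengths, which is cheap for a single pair but adds up when every word of every message is checked against a keyword list.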