On Tue, 20 Jan 2004, Charles Gregory wrote:
> Right now, there would be no statistics, because the text obfu has just
> started. But as a side note, we don't have the disk space to run Bayes for
> all our users though I'm getting awfully tempted to talk the boss into
> an extra disk or two. So
Detecting obfuscation:
HTML garbage tags: done
Normal-language letter frequency: easy to do, but easy to get around by
just modifying the random keywords to produce the same letter frequencies
as English words. This would still catch the stupider spammers doing
Bayes poisoning.
Detect poisoning attempt, and reject an addition to
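
The letter-frequency check from the list above could be sketched roughly
as follows. This is only an illustration in Python, not anything that
exists in the current rules: the function name, the rounded English
frequency table, and the choice of total variation distance as the score
are all assumptions made for the example.

```python
from collections import Counter

# Approximate letter frequencies (percent) for English text.
# Rounded, illustrative values; an assumption, not a tuned table.
ENGLISH_FREQ = {
    'a': 8.2, 'b': 1.5, 'c': 2.8, 'd': 4.3, 'e': 12.7, 'f': 2.2, 'g': 2.0,
    'h': 6.1, 'i': 7.0, 'j': 0.15, 'k': 0.77, 'l': 4.0, 'm': 2.4, 'n': 6.7,
    'o': 7.5, 'p': 1.9, 'q': 0.095, 'r': 6.0, 's': 6.3, 't': 9.1, 'u': 2.8,
    'v': 0.98, 'w': 2.4, 'x': 0.15, 'y': 2.0, 'z': 0.074,
}


def letter_distance(text):
    """Return a score in [0, 1]: 0 means the letter mix matches the
    reference table exactly, 1 means it is completely different.
    Returns None when the text contains no ASCII letters."""
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    if not letters:
        return None
    counts = Counter(letters)
    total = len(letters)
    reference_total = sum(ENGLISH_FREQ.values())
    # Total variation distance between observed and reference distributions.
    return 0.5 * sum(
        abs(counts.get(ch, 0) / total - ENGLISH_FREQ[ch] / reference_total)
        for ch in ENGLISH_FREQ
    )
```

A block of random filler words would be expected to score noticeably
higher than ordinary prose, but any cutoff would have to be chosen against
real mail, and filler generated to mimic English letter frequencies would
slip through, as noted above.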
On Wed, 21 Jan 2004, Sidney Markowitz wrote:
> Does anyone who is concerned about the obfuscation have any statistics
> to show that it really is a problem for the current rules plus network
> tests plus a well-trained Bayes?
Right now, there would be no statistics, because the text obfu has jus
Charles Gregory wrote:
> So I guess the question is, how 'expensive'
> would it be in terms of processing power
There's also the question of how much benefit it would have.
I recall someone trying out searching for close matches to spam words in
a corpus and not getting very good results at picking u
On Tue, 20 Jan 2004, Marcus Frischherz wrote:
> But there is: there exists (at least in PHP) a function called
> levenshtein, which calculates the similarity between two words. Surely
> there must exist a perl equivalent to it. see:
> http://at.php.net/manual/en/function.levenshtein.php
So I g
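
For reference, the edit-distance idea quoted above can be sketched in a few
lines. This is a Python illustration of the standard Levenshtein
dynamic-programming algorithm, not the PHP function itself and not an
existing SpamAssassin feature; the helper near_matches and the keyword
list in the test are hypothetical names made up for the example.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def near_matches(token, keywords, max_distance=1):
    """Hypothetical helper: keywords that are close to token but not
    identical to it (exact hits are left to ordinary rules)."""
    token = token.lower()
    return [k for k in keywords
            if 0 < levenshtein(token, k) <= max_distance]
```

On the cost question: each comparison takes time proportional to the
product of the two word lengths, so checking every token of a message
against a keyword list grows with tokens times keywords. Whether the hit
rate justifies that is the open question raised earlier in the thread.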