Good evening, all,

On Wed, 8 Oct 2003, Daniel Quinlan wrote:

> Scott A Crosby <[EMAIL PROTECTED]> writes:
> 
> > The thing is that a gibberish token (not-with the statistics of $LANG,
> > not-dictionary) should, as a new token, be given a different bayes
> > category than one that is in a dictionary, etc.
> 
> Perhaps.  It would probably be somewhat expensive to test every word for
> gibberish.

        I'm almost _certain_ I'm about to look incredibly stupid here, but 
might I suggest:
        Could we simply test for letter frequency?  For a given language, 
it would seem that the frequency would stay predictable; random strings of 
characters would show up with different histograms.
        Note that I handwave over the fact that we probably don't know the 
intended language beforehand.  :-(
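        To make the histogram idea concrete, here is a rough Python sketch, 
not anything SpamAssassin does today.  It assumes English, uses ballpark 
letter frequencies as the reference table, and scores a string by cosine 
similarity against that table.  The names ENGLISH_FREQ, letter_histogram 
and english_similarity are made up for illustration, and cosine similarity 
is just one plausible way to compare histograms.

```python
from collections import Counter
import math

# Approximate relative letter frequencies for English text, in percent.
# Ballpark figures only; a real test would need a table per language.
ENGLISH_FREQ = {
    'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
    's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
    'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
    'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
    'q': 0.10, 'z': 0.07,
}


def letter_histogram(text):
    """Return the relative frequency of each a-z letter in text.

    Characters outside a-z are ignored.  Returns an empty dict when the
    text contains no letters at all.
    """
    letters = [c for c in text.lower() if c in ENGLISH_FREQ]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {c: counts.get(c, 0) / total for c in ENGLISH_FREQ}


def english_similarity(text):
    """Cosine similarity between text's letter histogram and ENGLISH_FREQ.

    The result lies between 0.0 and 1.0; higher means the letter mix
    looks more like the reference table.  Returns 0.0 for letterless text.
    """
    hist = letter_histogram(text)
    if not hist:
        return 0.0
    dot = sum(hist[c] * ENGLISH_FREQ[c] for c in ENGLISH_FREQ)
    norm_hist = math.sqrt(sum(v * v for v in hist.values()))
    norm_ref = math.sqrt(sum(v * v for v in ENGLISH_FREQ.values()))
    return dot / (norm_hist * norm_ref)


if __name__ == "__main__":
    samples = [
        "it would seem that the frequency would stay predictable",
        "xqzjkvxqzjkwvq",
    ]
    for sample in samples:
        print(f"{english_similarity(sample):.3f}  {sample}")
```

        Single short tokens give very noisy histograms, so this probably 
only means something over a longer run of text, and any cutoff between 
"looks like language" and "looks like gibberish" would have to be tuned 
against real mail.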
        As I said, my apologies for a one-half^Wone-quarter^Wone-eighth 
baked idea.
        Cheers,
        - Bill

---------------------------------------------------------------------------
        "``Threads are like salt.  You like salt, I like salt, but we eat a
lot more pasta than salt.''  The thread guys are trying to tell you that
diet of salt is a good idea.  They are wrong, don't listen, eat more 
pasta and be happy."
        -- Larry McVoy <[EMAIL PROTECTED]>
--------------------------------------------------------------------------
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:   http://www.stearns.org
Linux articles at:                         http://www.opensourcedigest.com
--------------------------------------------------------------------------



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
