On Thu, 2005-01-27 at 13:05 -0600, Damian Menscher wrote:
> Oh, ok. Apparently we have a different definition of plaintext. I
> generally take anything using only the lower 7 bits (ASCII table) to
> mean plaintext, and things that use the 8th bit to mean binary.
> Regardless of your definition of "plaintext", it would seem that my
> conclusion that phishing signatures that rely exclusively on 7-bit ascii
> are more likely to have a false positive than binary signatures that use
> the full 8 bits is correct.
Even with your definition of plaintext you are still wrong :-)
Why? Because the structure of language in plaintext files is much richer than that used in the binaries of computer programs.
I don't believe you, but at least now we're down to something that can be tested. I've heard, for example, that English has about 3 bits of entropy per word. So, assuming a word is 5 characters (the typical assumption from speed-typing tests), a 5-byte signature would provide 3 bits of entropy if it were matching something designed for humans to read. Anyone care to guess how many bits of entropy are in 5 bytes of machine code? I'm guessing it's larger, but I suppose I could be wrong.
The simple test is to assume that bzip2 is an ideal compression program. As such, it will compress data down to a size roughly equal to its level of entropy. So, compress 10K of human-readable text (be it HTML, or whatever) and 10K of a machine-readable binary (say, from a virus). Which compresses down to something smaller? I'll leave this as an exercise to the reader... I'm fairly confident that I already know the answer.
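For anyone who wants to run the comparison, here is a minimal sketch in Python using the standard bz2 module. The function names and the 10K sample size are my own choices for illustration, and the result is only as good as the assumption that bzip2 approximates an ideal compressor; on samples this small, bzip2's fixed header overhead also inflates the numbers a bit.

```python
import bz2
import sys

SAMPLE_SIZE = 10 * 1024  # 10K, as suggested above


def compressed_size(data: bytes, sample_size: int = SAMPLE_SIZE) -> int:
    """Return the bzip2-compressed size of the first sample_size bytes."""
    sample = data[:sample_size]
    return len(bz2.compress(sample, compresslevel=9))


def bits_per_byte(data: bytes, sample_size: int = SAMPLE_SIZE) -> float:
    """Rough entropy estimate: compressed bits per input byte."""
    sample = data[:sample_size]
    if not sample:
        raise ValueError("empty sample")
    return 8 * len(bz2.compress(sample, compresslevel=9)) / len(sample)


if __name__ == "__main__":
    # Pass any files to compare, e.g. an HTML page and a binary.
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            data = fh.read(SAMPLE_SIZE)
        print(f"{path}: {compressed_size(data)} bytes compressed, "
              f"{bits_per_byte(data):.2f} bits/byte")
```

Run it with the two files as arguments; whichever shows the lower bits/byte figure has less entropy per byte by this measure.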
Damian Menscher
-- 
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <[EMAIL PROTECTED]> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-
_______________________________________________
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users