On Thu, 27 Jan 2005, Trog wrote:
On Thu, 2005-01-27 at 13:05 -0600, Damian Menscher wrote:

> Oh, ok. Apparently we have a different definition of plaintext. I
> generally take anything using only the lower 7 bits (ASCII table) to
> mean plaintext, and things that use the 8th bit to mean binary.
> Regardless of your definition of "plaintext", it would seem that my
> conclusion that phishing signatures that rely exclusively on 7-bit ascii
> are more likely to have a false positive than binary signatures that use
> the full 8 bits is correct.

Even with your definition of plaintext you are still wrong :-)

Why? Because the structure of language in plaintext files is much richer
than that used in the binaries of computer programs.

I don't believe you, but at least now we're down to something that can be tested. I've heard, for example, that English has about 3 bits of entropy per word. So, assuming a word is 5 characters (the typical assumption in speed-typing tests), a 5-byte signature would provide only about 3 bits of entropy if it were matching something designed for humans to read. Anyone care to guess how many bits of entropy are in 5 bytes of machine code? I'm guessing it's larger, but I suppose I could be wrong.
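To spell out that arithmetic, here is a rough sketch in Python. The 3 bits per word and 5 characters per word are just the hearsay figures quoted above, not measured values, and the function name is made up for illustration. The 40-bit figure is only the upper bound for 5 arbitrary bytes; real machine code will sit somewhere below it.

```python
def signature_entropy_bits(signature_bytes, bits_per_word=3.0, chars_per_word=5):
    """Estimate entropy of a text signature from a per-word entropy figure.

    bits_per_word and chars_per_word are assumed inputs (the hearsay
    figures from the paragraph above), not measurements.
    """
    bits_per_char = bits_per_word / chars_per_word
    return bits_per_char * signature_bytes


# A 5-byte signature matching human-readable text, under those assumptions.
text_bits = signature_entropy_bits(5)

# Upper bound for any 5 bytes: 8 bits per byte if every value is equally likely.
max_bits = 8 * 5

print(f"text estimate: {text_bits:.1f} bits, upper bound: {max_bits} bits")
```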


The simple test is to assume that bzip2 is an ideal compression program. As such, it will compress data down to a size roughly equal to its level of entropy. So, compress 10K of human-readable text (be it HTML, or whatever) and 10K of a machine-readable binary (say, from a virus). Which compresses down to something smaller? I'll leave this as an exercise to the reader... I'm fairly confident that I already know the answer.
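For anyone who wants to try it, here is a minimal sketch of that test using Python's standard bz2 module. The file names in the usage comment are placeholders, and the 10K sample size is the one suggested above. bzip2 is not actually an ideal compressor and adds some fixed overhead, so on samples this small the ratios are only a rough guide.

```python
import bz2
import sys

SAMPLE_SIZE = 10 * 1024  # the 10K sample size suggested above


def read_sample(path, size=SAMPLE_SIZE):
    """Read at most `size` bytes from the start of the file at `path`."""
    with open(path, "rb") as handle:
        return handle.read(size)


def compressed_ratio(data):
    """Return the bzip2-compressed size divided by the original size."""
    if not data:
        raise ValueError("sample is empty")
    # Second argument is the compression level; 9 is the maximum.
    return len(bz2.compress(data, 9)) / len(data)


if __name__ == "__main__":
    # Usage (placeholder file names):
    #   python entropy_test.py page.html sample.bin
    for path in sys.argv[1:]:
        sample = read_sample(path)
        ratio = compressed_ratio(sample)
        print(f"{path}: {len(sample)} bytes, compressed ratio {ratio:.3f}")
```

A lower ratio means the sample compressed further, which under the assumption above means less entropy per byte.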

Damian Menscher
--
-=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=-
-=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=-
-=#| 4602 Beckman, VMIL/MS, Imaging Technology Group:(217)244-3074 |#=-
-=#| <[EMAIL PROTECTED]> www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-
_______________________________________________
http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users
