Paul Kosinski wrote:
> Hi,
>
> I noticed the following anomaly when scanning a tar.gz file compared
> to scanning the result of untarring it. Scanning the tar.gz file
> results in less "data read" than scanning the files which it expands
> to (as one would expect), but the "data scanned" amount is *much* more
> for the tar.gz file than for the resultant files in the directory
> tree.
>
> Does this indicate some problem with the way clamav handles
> compressed files, or is it some peculiarity of this tar.gz file?
>

I don't think this is a problem with clamav's handling of compressed
files. I think it is a feature.

The following is just a bunch of assumptions, though:

When clamav scans a tar.gz, it first scans the raw tar.gz data and
tries to match that against virus patterns. Then it scans the
gunzipped tar and tries to match some hashes of that data against
virus defs. And then it scans the individual files in the tar,
possibly expanding and scanning any other archives it finds inside.
Because the same content is examined once per layer, more data is
scanned than is read.

This is a feature for two reasons:

1) A signature that matches the part of a tar archive that represents
   a file catches a virus earlier than having clamav expand the viral
   file and then scan it. This improves clamav's efficiency as, IIRC,
   clamav stops scanning once it encounters a virus match.

2) The gzip or tar stream itself may be specially crafted to exploit
   bugs in buggy versions of gzip, GNU tar, or proprietary
   implementations of these programs. Clamav should detect this too,
   not just viruses stored in tars or in gzip-compressed files.

> 09:51:08 u...@host:~/src/openssl> clamscan -ri openssl-0.9.8k/
>

Is your username really ``user'' and hostname really ``host''?

-- 
binki
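
To make the byte accounting described above concrete, here is a toy
Python sketch. It is not how clamav is implemented. It only assumes,
as the explanation above does, that each layer (the raw gzip stream,
the decompressed tar, and each member file) gets scanned once. The
function names build_sample_targz and layered_byte_counts are made up
for illustration.

```python
import gzip
import io
import tarfile


def build_sample_targz():
    """Build a small in-memory tar.gz with two highly compressible members."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w:gz") as tar:
        for name, payload in (("a.txt", b"A" * 50_000), ("b.txt", b"B" * 50_000)):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return raw.getvalue()


def layered_byte_counts(targz_bytes):
    """Return (bytes_read, bytes_scanned) for a toy layer-by-layer scan.

    Assumption: every layer is scanned exactly once. A real scanner may
    do more or less work per layer.
    """
    bytes_read = len(targz_bytes)        # what actually comes off the disk
    scanned = len(targz_bytes)           # pass 1: the raw gzip stream
    tar_bytes = gzip.decompress(targz_bytes)
    scanned += len(tar_bytes)            # pass 2: the decompressed tar
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                # pass 3: each member file extracted from the tar
                scanned += len(tar.extractfile(member).read())
    return bytes_read, scanned


if __name__ == "__main__":
    read, scanned = layered_byte_counts(build_sample_targz())
    print(f"data read: {read} bytes, data scanned: {scanned} bytes")
```

Under these assumptions the "scanned" total is at least twice the
uncompressed payload while the "read" total stays at the compressed
size, which is the same shape of discrepancy as in the original
report.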