Tomasz Kojm wrote:
> On Wed, 28 Feb 2007 16:03:08 +0100
> Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:
>
>> Tomasz Kojm wrote:
>>> On Wed, 28 Feb 2007 15:21:52 +0100
>>> Gianluigi Tiesi <[EMAIL PROTECTED]> wrote:
>>>
>>>>>> I've noticed it too; in my port I have changed it to:
>>>>>>
>>>>>> if(!(iscntrl(buf[i]) || isprint(buf[i])) || !internat[buf[i] & 0xff])
>>>>> This one is much worse because it will lead to many false negatives with
>>>>> HTML and mail files.
>>>>>
>>>> Yes, that's why I've never posted it as an official patch.
>>>> BTW, I do the check on the whole magic buffer (150?) to be more reliable.
>>>> I've also noticed that the internat table is quite different from the one
>>>> in the file (magic) utility.
>>> In your case checking more data will only increase the chance of a false
>>> negative. After your change the first condition (i.e. !(iscntrl(buf[i]) ||
>>> isprint(buf[i]))) will disqualify LOTS (more than 100 for sure) of
>>> characters which can be valid international chars.
>>>
>> So what can we use as a better (or at least optimal) way to guess the
>> kind of data (rather than having an always true/false check)? isprint
>
> First of all, you should drop your change, which is erroneous, and for now I'd
> strongly suggest classifying all unknown data as CL_TYPE_UNKNOWN_TEXT.
>
> We will address this issue in the near future and, depending on the results
> of regression testing, decide which way to go.
>
There is a reason we (ClamWin) changed this: we still prefer to skip unknown files, and we don't need to care much about HTML and mail files, so I've made some tweaks (not only this one) to save CPU cycles by not scanning unneeded files. I'm aware that this is not the correct approach for a mail server scanner; my post was only meant as a comment and was never intended to go into the ClamAV tree.
Scanning a real PC hard disk can take ages: an unmodified clamscan scans large AVI files in raw mode (there is only a specific check for animated RIFF files), and other media files and ISO images are also scanned in raw mode. 10-20 GB of media/ISO files is not uncommon on a user's PC, while such files are very unlikely to show up in a mail. Perhaps Linux itself (and the other Unixes too) doesn't need a scanner for executable files.
Regards,

-- 
Gianluigi Tiesi <[EMAIL PROTECTED]>
EDP Project Leader
Netfarm S.r.l. - http://www.netfarm.it/
Free Software: http://oss.netfarm.it/