On 2017-06-06 01:34:37 +0200, Andries E. Brouwer wrote:
> On Mon, Jun 05, 2017 at 03:35:28PM -0700, Kevin J. McCarthy wrote:
> > This patch considers 8-bit characters as binary for the calculation. In
> > general, it's probably better to guess wrong on the conservative side
> > than possibly corrupt attachments.
> >
> > - if (info->lobin == 0 || (info->lobin + info->hibin + info->ascii) / info->lobin >= 10)
> > + if ((info->lobin == 0 && info->hibin == 0) ||
> > +     (info->lobin + info->hibin + info->ascii) / (info->lobin + info->hibin) >= 10)
>
> Yes, this fixes my problem.
> On the other hand, some simple UTF-8 text files are now also treated
> as binary.
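
To make the regression concrete, here is a minimal sketch of the two conditions from the quoted diff. The struct `content_counts` and both function names are hypothetical; only the field names `lobin`, `hibin` and `ascii` come from the patch. It assumes that `hibin` counts bytes with the high bit set and that a true condition means "guess text".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the counters read by the quoted condition. */
struct content_counts {
    long lobin;  /* assumed: unprintable 7-bit bytes */
    long hibin;  /* assumed: bytes with the high bit set */
    long ascii;  /* assumed: remaining 7-bit bytes */
};

/* Condition on the "-" line of the diff: only lobin counts as binary. */
static bool text_before_patch(const struct content_counts *info)
{
    return info->lobin == 0 ||
           (info->lobin + info->hibin + info->ascii) / info->lobin >= 10;
}

/* Condition on the "+" lines: lobin and hibin both count as binary. */
static bool text_after_patch(const struct content_counts *info)
{
    long bin = info->lobin + info->hibin;

    return bin == 0 ||
           (info->lobin + info->hibin + info->ascii) / bin >= 10;
}
```

With made-up counts of 900 high-bit bytes and 100 ASCII bytes (roughly what a short text in a non-Latin script might give), the old condition says text because `lobin` is 0, while the patched one computes 1000 / 900 = 1 and says binary. A mostly-ASCII text with 50 high-bit bytes out of 1000 still passes, since 1000 / 50 = 20.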

This is what I feared just after seeing this patch. This is really bad,
in particular because text/plain files often don't have an extension,
contrary to binary files (well, executables don't have an extension,
but one normally doesn't send executables by e-mail as attachments).

> It is also very easy to check for well-formed UTF-8. Well-formed UTF-8
> with short lines should perhaps be classified as "text".

Yes, and even with long lines. IMHO, this would be safe in practice.

--
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
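
For reference, the well-formedness check suggested above could look roughly like the sketch below. This is not code from the patch: the function name `is_well_formed_utf8` is hypothetical, and the byte ranges follow RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if buf[0..len) is well-formed UTF-8 in the sense of RFC 3629. */
static bool is_well_formed_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                         /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* range of the first one */

        if (c <= 0x7F) {                     /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0)
                lo = 0xA0;                   /* rejects overlong forms */
            else if (c == 0xED)
                hi = 0x9F;                   /* rejects surrogates */
        } else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0)
                lo = 0x90;                   /* rejects overlong forms */
            else if (c == 0xF4)
                hi = 0x8F;                   /* rejects values above U+10FFFF */
        } else {
            return false;                    /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }

        if (len - i <= need)
            return false;                    /* sequence truncated at end of buffer */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;

        i += need + 1;
    }
    return true;
}
```

A file that passes this check and contains at least one non-ASCII byte is very unlikely to be arbitrary binary data, which is why classifying it as text seems safe regardless of line length. Text in a legacy 8-bit charset, such as "Lef\xE8vre" in ISO-8859-1, fails the check and would still need the ratio heuristic.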