On 2017-06-06 01:34:37 +0200, Andries E. Brouwer wrote:
> On Mon, Jun 05, 2017 at 03:35:28PM -0700, Kevin J. McCarthy wrote:
> > This patch considers 8-bit characters as binary for the calculation.  In
> > general, it's probably better to guess wrong on the conservative side
> > than possibly corrupt attachments.
> 
> > -    if (info->lobin == 0 || (info->lobin + info->hibin + info->ascii)/ info->lobin >= 10)
> > +    if ((info->lobin == 0 && info->hibin == 0) ||
> > +        (info->lobin + info->hibin + info->ascii) / (info->lobin + info->hibin) >= 10)
> 
> Yes, this fixes my problem.
> On the other hand, some simple UTF-8 text files are now also treated
> as binary.

This is what I feared just after seeing this patch. This is really
bad, in particular because text/plain files often don't have an
extension, contrary to binary files (well, executables don't have
an extension, but one normally doesn't send executables by e-mail as
attachments).
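
To make the regression concrete, here is a minimal sketch of the two
conditions from the quoted diff, side by side. The struct and function
names are hypothetical; only the fields lobin, hibin and ascii come from
the patch. I am assuming that a true condition means "treat as text" and
that hibin counts bytes >= 0x80.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters referenced in the quoted diff. */
struct byte_counts {
    size_t lobin;  /* low control bytes (assumed meaning) */
    size_t hibin;  /* bytes >= 0x80 (assumed meaning) */
    size_t ascii;  /* remaining 7-bit bytes */
};

/* Condition before the patch: only lobin counts as binary. */
bool looks_text_old(const struct byte_counts *c)
{
    return c->lobin == 0 ||
           (c->lobin + c->hibin + c->ascii) / c->lobin >= 10;
}

/* Condition after the patch: lobin and hibin both count as binary. */
bool looks_text_patched(const struct byte_counts *c)
{
    size_t bin = c->lobin + c->hibin;

    return bin == 0 ||
           (c->lobin + c->hibin + c->ascii) / bin >= 10;
}
```

With a 100-byte UTF-8 file containing 12 non-ASCII bytes and no control
bytes, the old condition holds (lobin is 0), but the patched one computes
100 / 12 = 8 in integer arithmetic and fails, so the file is no longer
treated as text.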

> It is also very easy to check for well-formed UTF-8. Well-formed UTF-8
> with short lines should perhaps be classified as "text".

Yes, and even with long lines. IMHO, this would be safe in practice.
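
For illustration, such a check could look like the sketch below. This is
not code from Mutt; the function name is hypothetical, and the byte
ranges follow the well-formed UTF-8 table of the Unicode Standard
(no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf[0..len) is well-formed UTF-8. */
bool is_well_formed_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                 /* continuation bytes expected */
        unsigned char lo = 0x80;     /* allowed range for the first */
        unsigned char hi = 0xBF;     /* continuation byte           */

        if (c <= 0x7F) {             /* plain ASCII */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            need = 1;
        } else if (c == 0xE0) {
            need = 2; lo = 0xA0;     /* rejects overlong 3-byte forms */
        } else if (c == 0xED) {
            need = 2; hi = 0x9F;     /* rejects UTF-16 surrogates */
        } else if (c >= 0xE1 && c <= 0xEF) {
            need = 2;
        } else if (c == 0xF0) {
            need = 3; lo = 0x90;     /* rejects overlong 4-byte forms */
        } else if (c >= 0xF1 && c <= 0xF3) {
            need = 3;
        } else if (c == 0xF4) {
            need = 3; hi = 0x8F;     /* rejects code points > U+10FFFF */
        } else {
            return false;            /* 0x80..0xC1 and 0xF5..0xFF */
        }

        if (len - i - 1 < need)
            return false;            /* sequence truncated at end of buffer */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++)
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
        i += need + 1;
    }
    return true;
}
```

Note that this accepts control bytes such as NUL, since they are valid
UTF-8, so it would complement the lobin count rather than replace it.
A Latin-1 sequence such as "\xE9t\xE9" is rejected, while
"Lef\xC3\xA8vre" is accepted.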

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
