bug#20526: grep BUG: text file is detected as binary

Norihiro Tanaka Sat, 02 Jan 2016 17:33:25 -0800

On Fri, 1 Jan 2016 21:22:54 -0800
Paul Eggert <egg...@cs.ucla.edu> wrote:


> Ouch, good point. I missed the possibility of a unibyte encoding where
> not all bytes are valid unibyte characters. I installed the attached
> additional patch to fix this, and to test for the bug I recently
> introduced here.

Thanks, I see that it is a good idea, but I propose a minor change to your
fix.  Perhaps it is what you want.
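The first hunk of the patch guards a word-at-a-time scan. The byte loop shown in that hunk stops only at a byte B with (B & unibyte_mask) != 0, so a zero mask gives it nothing to stop on. The sketch below shows the scanning technique on its own. It assumes a mask that repeats the same byte pattern in every byte of a word. word_t, replicate_byte_mask and skip_easy_bytes_sketch are hypothetical names, not grep's code. Unlike the patched function, this sketch is bounded by a length instead of relying on a sentinel byte.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for grep's uword: one machine word of bytes.  */
typedef size_t word_t;

/* Copy an 8-bit mask into every byte of a word, e.g. 0x80 becomes
   0x8080...80.  (word_t) -1 / UCHAR_MAX is 0x0101...01.  */
static word_t
replicate_byte_mask (unsigned char m)
{
  return (word_t) -1 / UCHAR_MAX * m;
}

/* Return the index of the first byte B in BUF[0..LEN) with
   (B & MASK) != 0, or LEN if every byte is "easy".  MASK must hold the
   same value in every byte.  Because the scan is bounded by LEN, a zero
   MASK just scans to the end instead of running past the buffer.  */
static size_t
skip_easy_bytes_sketch (unsigned char const *buf, size_t len, word_t mask)
{
  assert (buf != NULL || len == 0);
  unsigned char byte_mask = (unsigned char) mask;
  size_t i = 0;

  /* Byte at a time until BUF + I is word-aligned.  memcpy below does not
     need alignment; this only mirrors the structure of grep's loop.  */
  for (; i < len && (uintptr_t) (buf + i) % sizeof (word_t) != 0; i++)
    if (buf[i] & byte_mask)
      return i;

  /* Word at a time: a word made only of easy bytes has (W & MASK) == 0.  */
  for (; sizeof (word_t) <= len - i; i += sizeof (word_t))
    {
      word_t w;
      memcpy (&w, buf + i, sizeof w);   /* avoids aliasing problems */
      if (w & mask)
        break;
    }

  /* Byte at a time to pinpoint the exact byte, or to finish the tail.  */
  for (; i < len; i++)
    if (buf[i] & byte_mask)
      return i;
  return len;
}
```

For example, with replicate_byte_mask (0x80) the scan skips ASCII bytes and stops at the first byte with its high bit set. That UTF-8-style mask is only an illustration, not necessarily the value grep computes for any given locale.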

From d36cf4208363c0f56ff32d38a9fea422342036fe Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <nori...@kcn.ne.jp>
Date: Sat, 2 Jan 2016 00:20:43 +0900
Subject: [PATCH] grep: minor improvements to previous change

* src/grep.c (skip_easy_bytes): Do nothing if the locale does not have
any skippable characters.
(buf_has_encoding_errors): Do nothing if every byte is a single-byte
character in the locale.
---
 src/grep.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index a5f1fa2..d5a8183 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -535,6 +535,8 @@ skip_easy_bytes (char const *buf)
      the buffer end, but that's benign.  */
   char const *p;
   uword const *s;
+  if (! unibyte_mask)
+    return buf;
   for (p = buf; (uintptr_t) p % sizeof (uword) != 0; p++)
     if (to_uchar (*p) & unibyte_mask)
       return p;
@@ -551,7 +553,7 @@ skip_easy_bytes (char const *buf)
 static bool
 buf_has_encoding_errors (char *buf, size_t size)
 {
-  if (! unibyte_mask)
+  if (unibyte_mask == (uword) -1)
     return false;
 
   mbstate_t mbs = { 0 };
-- 
2.6.4
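The second hunk only changes the condition under which buf_has_encoding_errors returns false early. The `mbstate_t mbs` in its context suggests that the rest of the function walks the buffer with a multibyte conversion function. Below is a simplified sketch of such a check, using only standard mbrlen semantics. It has no skip_easy_bytes fast path and no sentinel byte, and has_encoding_errors_sketch is a hypothetical name.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return true if BUF[0..SIZE) contains a byte sequence that is invalid,
   or incomplete at the end of the buffer, in the current LC_CTYPE
   locale.  Every byte is examined with mbrlen; nothing is skipped.  */
static bool
has_encoding_errors_sketch (char const *buf, size_t size)
{
  assert (buf != NULL || size == 0);
  mbstate_t mbs;
  memset (&mbs, 0, sizeof mbs);   /* start in the initial shift state */

  size_t i = 0;
  while (i < size)
    {
      size_t clen = mbrlen (buf + i, size - i, &mbs);
      /* (size_t) -1 means an invalid sequence; (size_t) -2 means an
         incomplete character at the end of the buffer.  */
      if ((size_t) -2 <= clen)
        return true;
      /* mbrlen returns 0 for a null byte, which still occupies one byte.  */
      i += clen ? clen : 1;
    }
  return false;
}
```

Call setlocale (LC_ALL, "") or a specific locale first. The result depends entirely on the LC_CTYPE locale in effect, which is exactly why grep needs to know which bytes are single-byte characters in that locale.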
