Re: Grepping Unicode files?

Vince Rice Thu, 14 May 2015 10:15:13 -0700

> On May 14, 2015, at 11:43 AM, Eric Blake <ebl...@redhat.com> wrote:
> 
> On 05/14/2015 10:32 AM, Vince Rice wrote:
> 
> …
>> 
>> Now, pardon my continued ignorance, but which of those variables needs to be 
>> set to UTF16 in order for grep to work? And I assume it (they?) should be 
>> set to en_US.UTF-16?
> 
> None.  UTF16 is not a valid locale.  It is a valid encoding (wide
> character), but locales must operate on multi-byte sequences, not wide
> characters.  So you HAVE to convert from wide character to multi-byte
> before you can do anything that requires a locale to work correctly.


Oh my, the rabbit-hole gets deeper. I don’t know the difference between wide 
character and multi-byte. A little searching appears to indicate that Unicode 
is a type of wide-character, while multi-byte is … well, I still don’t know 
what multi-byte is. :) But, we’re definitely out in the weeds of non-cygwinness 
here, and my file is UTF16, so I can learn what multi-byte is and the 
difference later.

Bottom-line…

>> 
>> Thanks to everyone for your help. I think you’ve all confirmed this isn’t 
>> cygwin-specific, but I couldn’t find anything even searching generically 
>> (“grep unicode” and now “grep utf16”). I did finally find an external 
>> reference to iconv, but if grep is supposed to handle this natively, I 
>> haven’t been able to find much on how to do it.
> 
> grep cannot handle UTF16 natively.  iconv exists to do encoding
> transformations, so that the rest of the system can live in multi-byte
> world instead of worrying about wide-character encodings.

… grep can’t handle UTF-16 files natively. Good to know. iconv it is.
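
For the archives, here is roughly what the iconv route looks like. This is only a sketch, assuming the file is UTF-16 with a byte-order mark and that iconv and grep are both on the PATH. The sample text and the grep_utf16 helper name are made up for illustration; they are not from the thread.

```shell
#!/bin/sh
# Hypothetical helper: convert a UTF-16 file to UTF-8 on the fly,
# then let grep search the resulting multi-byte stream.
# Usage: grep_utf16 PATTERN FILE
grep_utf16() {
    iconv -f UTF-16 -t UTF-8 "$2" | grep "$1"
}

# Build a small UTF-16 sample file for illustration (contents are invented).
sample=$(mktemp)
printf 'alpha\nneedle here\nomega\n' | iconv -f UTF-8 -t UTF-16 > "$sample"

# Search the sample through the conversion pipeline.
grep_utf16 'needle' "$sample"
```

If the file has no byte-order mark, the source encoding probably needs to be spelled out as UTF-16LE or UTF-16BE instead of plain UTF-16.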

Thanks again!
--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
