> On May 14, 2015, at 11:43 AM, Eric Blake <ebl...@redhat.com> wrote:
>
> On 05/14/2015 10:32 AM, Vince Rice wrote:
>
> …
>>
>> Now, pardon my continued ignorance, but which of those variables needs to be
>> set to UTF16 in order for grep to work? And I assume it (they?) should be
>> set to en_US.UTF-16?
>
> None. UTF16 is not a valid locale. It is a valid encoding (wide
> character), but locales must operate on multi-byte sequences, not wide
> characters. So you HAVE to convert from wide character to multi-byte
> before you can do anything that requires a locale to work correctly.

Oh my, the rabbit-hole gets deeper. I don’t know the difference between wide character and multi-byte. A little searching appears to indicate that Unicode is a type of wide-character, while multi-byte is … well, I still don’t know what multi-byte is. :) But we’re definitely out in the weeds of non-cygwinness here, and my file is UTF16, so I can learn what multi-byte is and the difference later. Bottom-line…

>>
>> Thanks to everyone for your help. I think you’ve all confirmed this isn’t
>> cygwin-specific, but I couldn’t find anything even searching generically
>> (“grep unicode” and now “grep utf16”). I did finally find an external
>> reference to iconv, but if grep is supposed to handle this natively, I
>> haven’t been able to find much on how to do it.
>
> grep cannot handle UTF16 natively. iconv exists to do encoding
> transformations, so that the rest of the system can live in multi-byte
> world instead of worrying about wide-character encodings.

… grep can’t handle UTF-16 files. Good to know. iconv it is.

Thanks again!
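
For anyone who finds this thread later, here is a minimal sketch of the iconv-then-grep approach discussed above. The file name `sample-utf16.txt` and the search word `needle` are made up for illustration, and the sketch assumes the file is little-endian UTF-16 without a byte-order mark.

```shell
# Create a small UTF-16LE sample file (hypothetical name) so the example
# is self-contained.
printf 'hello world\nneedle here\n' | iconv -f UTF-8 -t UTF-16LE > sample-utf16.txt

# Convert the file to UTF-8 on the fly and let grep search the converted stream.
iconv -f UTF-16LE -t UTF-8 sample-utf16.txt | grep 'needle'
```

If the file starts with a byte-order mark, as many Windows-generated files do, `-f UTF-16` should let iconv detect the byte order from the mark instead of having it spelled out.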