>> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
>> point in Unicode. i.e. U+00A0).
>> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>
> `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
> ([ and ] were not necessary.)
>
> When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in
> charset.sgml, but I think it is better to specify both LC_ALL=C and
> "\xC2\xA0" for making sure detecting nbsp.
>
> One problem is that -P option can be used in only GNU grep, and grep in mac
> doesn't support it.
>
> On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can assume
> the shell is bash.
>
> Maybe, better way is use perl itself rather than grep as following.
>
> `perl -ne '/\xC2\xA0/ and print' `
>
> I attached a patch fixed in this way.
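
As a possible alternative that needs neither -P nor bash's $'...' quoting, the two bytes could be built with printf octal escapes, which POSIX printf specifies (unlike \x escapes). This is only a sketch under that assumption: find_nbsp is a name made up for illustration, and behavior on non-GNU grep is assumed rather than confirmed.

```shell
#!/bin/sh
# Build the UTF-8 encoding of U+00A0 (bytes 0xC2 0xA0) using octal
# escapes, since \x escapes are not guaranteed by POSIX printf.
nbsp=$(printf '\302\240')

# find_nbsp (hypothetical helper): print each line containing nbsp,
# prefixed with its line number. LC_ALL=C makes grep match raw bytes.
find_nbsp() {
    LC_ALL=C grep -n "$nbsp" "$@"
}
```

For example, `find_nbsp charset.sgml` would list the offending lines, and the exit status is non-zero when none are found.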

GNU sed can also be used without setting LC_ALL:

sed -n /"\xC2\xA0"/p

However, I am not sure whether non-GNU sed can do this too...

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp