> On Tue, 1 Oct 2024 22:20:55 +0900
> Yugo Nagata <nag...@sraoss.co.jp> wrote:
>
>> On Tue, 1 Oct 2024 15:16:52 +0900
>> Yugo NAGATA <nag...@sraoss.co.jp> wrote:
>>
>> > On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
>> > Tatsuo Ishii <is...@postgresql.org> wrote:
>> >
>> > > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
>> > > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
>> > > >> point in Unicode. i.e. U+00A0).
>> > > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
>> > > >
>> > > > `LC_ALL=C grep -P "\xC2\xA0"` works for my environment.
>> > > > ([ and ] were not necessary.)
>> > > >
>> > > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters
>> > > > in charset.sgml, but I think it is better to specify both LC_ALL=C
>> > > > and "\xC2\xA0" for making sure detecting nbsp.
>> > > >
>> > > > One problem is that -P option can be used in only GNU grep, and grep
>> > > > in mac doesn't support it.
>> > > >
>> > > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
>> > > > assume the shell is bash.
>> > > >
>> > > > Maybe, better way is use perl itself rather than grep as following.
>> > > >
>> > > > `perl -ne '/\xC2\xA0/ and print' `
>> > > >
>> > > > I attached a patch fixed in this way.
>> > >
>> > > GNU sed can also be used without setting LC_ALL:
>> > >
>> > > sed -n /"\xC2\xA0"/p
>> > >
>> > > However I am not sure if non-GNU sed can do this too...
>> >
>> > Although I've not check it myself, BSD sed doesn't support \x escape
>> > according to [1].
>> >
>> > [1]
>> > https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>> >
>> > By the way, I've attached a patch a bit modified to use the plural form
>> > statement as same as check-tabs.
>> >
>> > Non-breaking **spaces** appear in SGML/XML files
>>
>> The previous patch was broken because the perl command failed to return the
>> correct result.
>> I've attached an updated patch to fix the return value. In passing, I added
>> line breaks for long lines.
>
> I've attached a updated patch.
> I added the comment to explain why Perl is used instead of grep or sed.

Looks good to me. If there's no objection, I will commit this to the master branch.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp