On Tue, 1 Oct 2024 15:16:52 +0900 Yugo NAGATA <nag...@sraoss.co.jp> wrote:
> On Tue, 01 Oct 2024 10:33:50 +0900 (JST)
> Tatsuo Ishii <is...@postgresql.org> wrote:
>
> > >> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> > >> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes). (0xa0 is nbsp's code
> > >> point in Unicode, i.e. U+00A0.)
> > >> So grep -P "[\xC2\xA0]" should work to detect nbsp.
> > >
> > > `LC_ALL=C grep -P "\xC2\xA0"` works in my environment.
> > > ([ and ] were not necessary.)
> > >
> > > When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in
> > > charset.sgml, but I think it is better to specify both LC_ALL=C and
> > > "\xC2\xA0" to make sure nbsp is detected.
> > >
> > > One problem is that the -P option is available only in GNU grep, and
> > > grep on Mac doesn't support it.
> > >
> > > On bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
> > > assume the shell is bash.
> > >
> > > Maybe a better way is to use perl itself rather than grep, as follows.
> > >
> > > `perl -ne '/\xC2\xA0/ and print' `
> > >
> > > I attached a patch fixed in this way.
> >
> > GNU sed can also be used without setting LC_ALL:
> >
> > sed -n /"\xC2\xA0"/p
> >
> > However I am not sure if non-GNU sed can do this too...
>
> Although I have not checked it myself, BSD sed doesn't support the \x escape
> according to [1].
>
> [1] https://stackoverflow.com/questions/24275070/sed-not-giving-me-correct-substitute-operation-for-newline-with-mac-difference
>
> By the way, I've attached a slightly modified patch that uses the plural
> form in the message, the same as check-tabs.
>
> Non-breaking **spaces** appear in SGML/XML files

The previous patch was broken because the perl command did not return the
correct exit status. I've attached an updated patch that fixes the return
value. In passing, I added line breaks to the long lines.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..e5607585af 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
 	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -255,9 +255,15 @@ clean-man:
 
 endif # sqlmansectnum != 7
 
-# tabs are harmless, but it is best to avoid them in SGML files
+# tabs and non-breaking spaces are harmless, but it is best to avoid them in SGML files
 check-tabs:
-	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+	(echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
+
+check-nbsp:
+	@ ( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"),$$n++; END {exit($$n>0)}' \
+	  $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || \
+	(echo "Non-breaking spaces appear in SGML/XML files" 1>&2; exit 1)
 
 ##
 ## Clean