On Mon, 30 Sep 2024 20:07:31 +0900 (JST) Tatsuo Ishii <is...@postgresql.org> wrote:
> >> I wonder if it would be worth to add a check for this like we have to tabs?
>
> +1.
>
> >> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> >> (doing so made me realize we don't have an equivalent meson target).
> >
> > Your patch couldn't detect 0xA0 in config.sgml in my machine, but it works
> > when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
> >
> > However, it also detects the following line in charset.sgml.
> > (https://www.postgresql.org/docs/current/collation.html)
> >
> > For example, locale und-u-kb sorts 'àe' before 'aé'.
> >
> > This is not non-breaking space, so should not be detected as an error.
>
> That's because non-breaking space (nbsp) is not encoded as 0xa0 in
> UTF-8. nbsp in UTF-8 is "0xc2 0xa0" (2 bytes) (A 0xa0 is a nbsp's code
> point in Unicode. i.e. U+00A0).
> So grep -P "[\xC2\xA0]" should work to detect nbsp.

`LC_ALL=C grep -P "\xC2\xA0"` works in my environment.
([ and ] were not necessary.)

When LC_ALL is null, `grep -P "\xA0"` could not detect any characters in
charset.sgml, but I think it is better to specify both LC_ALL=C and
"\xC2\xA0" to make sure nbsp is detected.

One problem is that the -P option is available only in GNU grep, and the
grep on macOS doesn't support it.

In bash, we can also use `grep $'\xc2\xa0'`, but I am not sure we can
assume the shell is bash.

Maybe a better way is to use perl itself rather than grep, as follows.

 `perl -ne '/\xC2\xA0/ and print' `

I attached a patch fixed this way.

Regards,
Yugo Nagata

>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp

-- 
Yugo NAGATA <nag...@sraoss.co.jp>
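
The byte-level point discussed above can be checked with a short sketch. Python is used here only for illustration and is not part of the patch. The variable names are made up for this example, and the sample line is the one quoted from charset.sgml. It shows that nbsp (U+00A0) is the two bytes C2 A0 in UTF-8, while 'à' (U+00E0) is C3 A0, so a search for the lone byte 0xA0 can plausibly hit 'à' even though no nbsp is present.

```python
# Illustration only: byte-level view of the characters discussed in this thread.
NBSP = "\u00a0"     # non-breaking space, code point U+00A0
A_GRAVE = "\u00e0"  # 'a' with grave accent, code point U+00E0

nbsp_bytes = NBSP.encode("utf-8")
a_grave_bytes = A_GRAVE.encode("utf-8")

print(nbsp_bytes.hex(" "))     # c2 a0
print(a_grave_bytes.hex(" "))  # c3 a0

# The line from charset.sgml, written with escapes so the bytes are unambiguous.
line = "For example, locale und-u-kb sorts '\u00e0e' before 'a\u00e9'.".encode("utf-8")

# Searching for the lone byte 0xA0 hits the second byte of U+00E0 (false positive).
lone_byte_match = b"\xa0" in line
# Searching for the full two-byte sequence C2 A0 does not match this line.
full_sequence_match = b"\xc2\xa0" in line

print(lone_byte_match, full_sequence_match)  # True False
```

In valid UTF-8, C2 only ever appears as a lead byte, so the pair C2 A0 cannot straddle two other characters. This is why matching both bytes avoids the false positive.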
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..2081ba1ffc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
 	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
 check-tabs:
 	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2; exit 1)
 
+check-nbsp:
+	@( ! $(PERL) -ne '/\xC2\xA0/ and print "$$ARGV $$_"' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking space appear in SGML/XML files" 1>&2; exit 1)
+
 ##
 ## Clean
 ##
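
For comparison, the logic such a check needs can be sketched as follows. It is in Python purely for illustration; the patch itself uses perl, and the names `nbsp_lines` and `check_files` are hypothetical. One thing worth verifying in the rule above: as far as I can tell, `perl -ne '... and print'` exits with status 0 whether or not any line matched, whereas `grep` exits non-zero on no match. So the `!` idiom borrowed from check-tabs may not behave the same way, and the check would need to produce its own exit status. The sketch returns that status explicitly.

```python
import sys

NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0 (non-breaking space)


def nbsp_lines(data):
    """Return the 1-based numbers of the lines in `data` (bytes) that contain a UTF-8 nbsp."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if NBSP_UTF8 in line
    ]


def check_files(paths):
    """Print path:line for each hit; return 1 if any nbsp was found, else 0."""
    found = False
    for path in paths:
        # Read as bytes so the result does not depend on the locale.
        with open(path, "rb") as f:
            data = f.read()
        for number in nbsp_lines(data):
            print("%s:%d" % (path, number))
            found = True
    if found:
        print("Non-breaking spaces appear in SGML/XML files", file=sys.stderr)
    return 1 if found else 0
```

A caller would pass the file list and use the return value as the process exit status, for example `sys.exit(check_files(sys.argv[1:]))`.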