On Mon, 30 Sep 2024 20:07:31 +0900 (JST)
Tatsuo Ishii <is...@postgresql.org> wrote:

> >> I wonder if it would be worth adding a check for this, like we have for tabs?
> 
> +1.
> 
> >> The attached adds a rule to "make -C doc/src/sgml check" for trapping nbsp
> >> (doing so made me realize we don't have an equivalent meson target).
> > 
> > Your patch couldn't detect 0xA0 in config.sgml on my machine, but it works
> > when I use `grep -P "[\xA0]"` instead of `grep -e "\xA0"`.
> > 
> > However, it also detects the following line in charset.sgml.
> > (https://www.postgresql.org/docs/current/collation.html)
> > 
> >  For example, locale und-u-kb sorts 'àe' before 'aé'.
> > 
> > These are not non-breaking spaces, so they should not be detected as errors.
> 
> That's because a non-breaking space (nbsp) is not encoded as 0xa0 in
> UTF-8. In UTF-8, nbsp is "0xc2 0xa0" (2 bytes); 0xa0 is nbsp's code
> point in Unicode, i.e. U+00A0.
> So grep -P "[\xC2\xA0]" should work to detect nbsp.
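
To see why a single-byte 0xA0 pattern also catches the charset.sgml line, it helps to look at the raw bytes. This is a minimal sketch in plain POSIX sh (octal escapes are used because POSIX printf does not define \x escapes; `od -An -tx1` is assumed to be available). It prints the UTF-8 encodings of U+00A0 and of 'à' (U+00E0): both end in 0xA0, and only the lead byte differs.

```shell
#!/bin/sh
# Raw UTF-8 bytes of U+00A0 (no-break space) and U+00E0 ('à').
# Octal escapes are used because POSIX printf has no \x escapes.
nbsp_bytes=$(printf '\302\240' | od -An -tx1 | tr -d ' \n')
agrave_bytes=$(printf '\303\240' | od -An -tx1 | tr -d ' \n')

echo "U+00A0 -> $nbsp_bytes"    # U+00A0 -> c2a0
echo "U+00E0 -> $agrave_bytes"  # U+00E0 -> c3a0
```

So a pattern that matches the lone byte 0xA0 also matches the second byte of 'à'.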

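`LC_ALL=C grep -P "\xC2\xA0"` works in my environment.
(The [ and ] are not needed, and with LC_ALL=C they would actually hurt: a
bracket expression matches either byte on its own, so the 0xA0 in 'à' would
still match.)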

When LC_ALL is unset, `grep -P "\xA0"` did not detect any characters in
charset.sgml, but I think it is better to specify both LC_ALL=C and
"\xC2\xA0" to make sure nbsp is detected.

One problem is that the -P option is only available in GNU grep; the grep
on macOS doesn't support it.

In bash we could also use `grep $'\xc2\xa0'`, but I am not sure we can
assume the shell is bash.

Maybe a better way is to use Perl itself rather than grep, as follows:

 `perl -ne '/\xC2\xA0/ and print'`

I attached a patch that does it this way.

Regards,
Yugo Nagata

> 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese: http://www.sraoss.co.jp


-- 
Yugo NAGATA <nag...@sraoss.co.jp>
diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 9c9bbfe375..2081ba1ffc 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -194,7 +194,7 @@ MAKEINFO = makeinfo
 ##
 
 # Quick syntax check without style processing
-check: postgres.sgml $(ALLSGML) check-tabs
+check: postgres.sgml $(ALLSGML) check-tabs check-nbsp
 	$(XMLLINT) $(XMLINCLUDE) --noout --valid $<
 
 
@@ -259,6 +259,9 @@ endif # sqlmansectnum != 7
 check-tabs:
 	@( ! grep '	' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Tabs appear in SGML/XML files" 1>&2;  exit 1)
 
+check-nbsp:
+	@( $(PERL) -ne '/\xC2\xA0/ and print("$$ARGV:$$_"), $$n++; END { exit($$n > 0) }' $(wildcard $(srcdir)/*.sgml $(srcdir)/ref/*.sgml $(srcdir)/*.xsl) ) || (echo "Non-breaking spaces appear in SGML/XML files" 1>&2;  exit 1)
+
 ##
 ## Clean
 ##
