On Fri, 11 Oct 2024 12:16:50 +0900 (JST) Tatsuo Ishii <is...@postgresql.org> wrote:
> > We can check non-ASCII letters SGML/XML files by preparing "allowlist"
> > that contains lines which are allowed to have non-ascii characters,
> > although this list will need to be maintained when lines in it are modified.
> > I've attached a patch to add a simple Perl script to do this.
>
> I doubt it really works. For example, nbsp can be used for formatting
> (that's the purpose of the character in the first place). Whenever a
> developer decides to or not to use nbsp, "allowlist" needs to be
> maintained. It's too annoying.

My assumption is that non-ASCII characters, including nbsp, are
disallowed by default, so the allowlist would grow only when there is
a special reason.

That said, maintaining the list does carry some cost, so if people
don't think the check is worth adding, I will withdraw this proposal.

> I think it's better to add the non-ASCII character checking to the
> committing check list and let committers check non-ASCII character in
> the patch. Non-ASCII characters are rarely used and it would not become a
> burden.
> https://wiki.postgresql.org/wiki/Committing_checklist
>
> Maybe we can add to the wiki page something like this?
>
> git diff origin/master | grep -P '[^\x00-\x7f]'
>
> > During testing this script, I found "stylesheet-man.xsl" also has non-ascii
> > characters. I don't know these characters are really necessary though, since
> > I don't understand this file well.
>
> They are U+201C (double turned comma quotation mark) and U+201D
> (double comma quotation mark).
>
> <l:template name="sect3" text="Section %n, “%t”, in the
> documentation"/>
>
> I would like to know why they are necessary too.

+1

Regards,
Yugo Nagata

-- 
Yugo NAGATA <nag...@sraoss.co.jp>
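
For readers following the thread, the allowlist idea discussed above can be
sketched roughly as follows. This is only an illustration in Python, not the
attached Perl script; the function name find_non_ascii_lines and the
exact-line matching rule are assumptions made for the example. It uses the
same character class as the grep command quoted above.

```python
import re

# Same character class as the grep -P pattern quoted in the thread.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def find_non_ascii_lines(text, allowlist=frozenset()):
    """Return (line_number, line) pairs for lines containing non-ASCII characters.

    Hypothetical matching rule: a line is exempt when its exact content
    (trailing whitespace stripped) appears in `allowlist`.
    """
    hits = []
    # Split on "\n" only, so line numbers match what an editor shows.
    for lineno, line in enumerate(text.split("\n"), start=1):
        if NON_ASCII.search(line) and line.rstrip() not in allowlist:
            hits.append((lineno, line))
    return hits


# Illustrative input: one allowed line with curly quotes, one stray nbsp.
sample = (
    "plain ascii line\n"
    "Section %n, \u201c%t\u201d, in the documentation\n"
    "nbsp\u00a0here\n"
)
allowed = {"Section %n, \u201c%t\u201d, in the documentation"}

violations = find_non_ascii_lines(sample, allowed)
for lineno, line in violations:
    print(f"{lineno}: {line!r}")
```

With exact-line matching, any edit to an allowed line requires updating the
list, which is the maintenance cost raised in the discussion.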