> We can check non-ASCII letters SGML/XML files by preparing "allowlist"
> that contains lines which are allowed to have non-ascii characters,
> although this list will need to be maintained when lines in it are modified.
> I've attached a patch to add a simple Perl script to do this.

I doubt it really works. For example, nbsp can be used for formatting
(that's the purpose of the character in the first place). Whenever a
developer decides to use or not to use nbsp, the "allowlist" needs to be
updated. That's too annoying.

I think it's better to add non-ASCII character checking to the
committing checklist and let committers check for non-ASCII characters
in the patch. Non-ASCII characters are rarely used, so it would not
become a burden.

https://wiki.postgresql.org/wiki/Committing_checklist

Maybe we can add something like this to the wiki page?

git diff origin/master | grep -P '[^\x00-\x7f]'

> During testing this script, I found "stylesheet-man.xsl" also has non-ascii
> characters. I don't know these characters are really necessary though, since
> I don't understand this file well.

They are U+201C (double turned comma quotation mark) and U+201D
(double comma quotation mark).

<l:template name="sect3" text="Section %n, “%t”, in the documentation"/>

I would like to know why they are necessary too.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
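
For reference, the grep one-liner above also matches removed and context lines in the diff, not only the lines a patch adds. Below is a minimal sketch of a narrower check that reports only added lines, together with the file and new line number. It assumes `git diff` style unified output; the names `added_non_ascii_lines` and `diff_against` are hypothetical helpers made up for this illustration and are not part of any existing PostgreSQL tooling.

```python
import re
import subprocess

# Same character class as the grep one-liner: anything outside 7-bit ASCII.
NON_ASCII = re.compile(r"[^\x00-\x7f]")
# Unified-diff hunk header; group 1 is the first line number in the new file.
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_non_ascii_lines(diff_text):
    """Return (path, new_line_number, text) for each added non-ASCII line.

    Assumes diff_text is in the format produced by `git diff`.
    """
    hits = []
    path = None
    new_lineno = 0
    in_hunk = False
    for line in diff_text.split("\n"):
        if line.startswith("diff --git "):
            in_hunk = False          # a new file section starts
            continue
        if not in_hunk and line.startswith("+++ "):
            path = line[4:]
            if path.startswith("b/"):
                path = path[2:]      # drop git's "b/" prefix
            continue
        match = HUNK.match(line)
        if match:
            new_lineno = int(match.group(1))
            in_hunk = True
            continue
        if not in_hunk:
            continue
        if line.startswith("+"):
            if NON_ASCII.search(line[1:]):
                hits.append((path, new_lineno, line[1:]))
            new_lineno += 1
        elif line.startswith(" "):
            new_lineno += 1          # context line exists in the new file
        # removed lines ("-") do not advance the new-file line number
    return hits


def diff_against(ref="origin/master"):
    """Hypothetical helper: return `git diff <ref>` output as text."""
    result = subprocess.run(
        ["git", "diff", ref],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout
```

A committer could call `added_non_ascii_lines(diff_against("origin/master"))` from the top of the working tree and review whatever it returns. An empty list means the patch adds no non-ASCII characters.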