RE: [PGdocs] fix description for handling pf non-ASCII characters

Hayato Kuroda (Fujitsu) Tue, 26 Sep 2023 06:40:54 -0700

Dear Karl,

Thank you for reviewing! PSA new version.


> I see a few problems with the English and style of the patches
> and am commenting below and have signed up as a reviewer.

Your effort is quite helpful for me.

> At
> commitfest.postgresql.org I have marked the thread
> as needing author attention.  Hayato, you will need
> to mark the thread as needing review when you have
> replied to this message.

Sure. I will change the status after posting the patch.

Before replying your comments, I thought I should show the difference between
versions. Regarding old versions (here PG15 was used), non-ASCIIs (like 
Japanese) are
replaced with '?'.

```
psql (15.4)
Type "help" for help.

postgres=# SET application_name TO 'あああ';
SET
postgres=# SHOW application_name ;
 application_name 
------------------
 ?????????
(1 row)
```

As for the HEAD, as my patch said, non-ASCIIs are replaced
with hexadecimal representations. (Were my terminologies correct?).

```
psql (17devel)
Type "help" for help.

postgres=# SET application_name TO 'あああ';
SET
postgres=# SHOW application_name ;
           application_name           
--------------------------------------
 \xe3\x81\x82\xe3\x81\x82\xe3\x81\x82
(1 row)
```

Note that non-printable ASCIIs are also replaced with the same rule.

```
psql (15.4)
Type "help" for help.

postgres=# SET application_name TO E'\x03';
SET
postgres=# SHOW application_name ;
 application_name 
------------------
 ?
(1 row)

psql (17devel)
Type "help" for help.

postgres=# SET application_name TO E'\x03';
SET
postgres=# SHOW application_name ;
 application_name 
------------------
 \x03
(1 row)
```

> Now, to reviewing the patch:
> 
> First, it is now best practice in the PG docs to
> put a line break at the end of each sentence.
> At least for the sentences on the lines you change.
> (No need to update the whole document!)  Please
> do this in the next version of your patch.  I don't
> know if this is a requirement for acceptance by
> a committer, but it won't hurt.

I didn't know that. Could you please tell me if you have a source? Anyway,
I put a line break for each sentences for now.

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index e700782d3c..a4ce99ba4d 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -7040,9 +7040,8 @@ local0.*    /var/log/postgresql
>          The name will be displayed in the
> <structname>pg_stat_activity</structname> view and included in CSV log
> entries.  It can also be included in regular log entries via the <xref
> linkend="guc-log-line-prefix"/> parameter.
> -        Only printable ASCII characters may be used in the
> -        <varname>application_name</varname> value. Other characters
> will be
> -        replaced with question marks (<literal>?</literal>).
> +        Non-ASCII characters used in the
> <varname>application_name</varname>
> +        will be replaced with hexadecimal strings.
>         </para>
>        </listitem>
>       </varlistentry>
> 
> Don't use the future tense to describe the system's behavior.  Instead
> of "will be" write "are".  (Yes, this change would be an improvement
> on the original text.  We should fix it while we're working on it
> and our attention is focused.)
> 
> It is more accurate, if I understand the issue, to say that characters
> are replaced with hexadecimal _representations_ of the input bytes.
> Finally, it would be good to know what representation we're talking
> about.  Perhaps just give the \xhh example and say: the Postgres
> C-style escaped hexadecimal byte value.  And hyperlink to
> https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-ST
> RINGS-ESCAPE
> 
> (The docbook would be, depending on text you want to link:
> 
> <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal
> byte value</link>.
> 
> I think.  You link to id="someidvalue" attribute values.)

IIUC the word " Postgres" cannot be used in the doc.
Based on your all comments, I changed as below. How do you think?

```
        Characters that are not printable ASCII, like <literal>\x03</literal>,
        are replaced with the <productname>PostgreSQL</productname>
        <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal 
byte value</link>.
```


> @@ -8037,10 +8036,9 @@ COPY postgres_log FROM
> '/full/path/to/logfile.csv' WITH csv; <para>
>          The name can be any string of less
>          than <symbol>NAMEDATALEN</symbol> characters (64 characters
> in
> a standard
> -        build). Only printable ASCII characters may be used in the
> -        <varname>cluster_name</varname> value. Other characters will be
> -        replaced with question marks (<literal>?</literal>).  No name
> is shown
> -        if this parameter is set to the empty string
> <literal>''</literal> (which is
> +        build). Non-ASCII characters used in the
> <varname>cluster_name</varname>
> +        will be replaced with hexadecimal strings. No name is shown if
> this
> +        parameter is set to the empty string <literal>''</literal>
> (which is the default). This parameter can only be set at server start.
>         </para>
>        </listitem>
> 
> Same review as for the first patch hunk.

Fixed like above. You can refer the patch.

> diff --git a/doc/src/sgml/postgres-fdw.sgml
> b/doc/src/sgml/postgres-fdw.sgml index 5062d712e7..98785e87ea 100644
> --- a/doc/src/sgml/postgres-fdw.sgml
> +++ b/doc/src/sgml/postgres-fdw.sgml
> @@ -1067,9 +1067,8 @@ postgres=# SELECT postgres_fdw_disconnect_all();
>        of any length and contain even non-ASCII characters.  However
> when it's passed to and used as <varname>application_name</varname>
>        in a foreign server, note that it will be truncated to less than
> -      <symbol>NAMEDATALEN</symbol> characters and anything other than
> -      printable ASCII characters will be replaced with question
> -      marks (<literal>?</literal>).
> +      <symbol>NAMEDATALEN</symbol> characters and non-ASCII
> characters
> will be
> +      replaced with hexadecimal strings.
>        See <xref linkend="guc-application-name"/> for details.
>       </para>
> 
> Same review as for the first patch hunk.

Fixed like above.

> Since the both of you have looked and confirmed that the
> actual behavior matches the new documentation I have not
> done this.

I showed the result again, please see.

> But, have either of you checked that we're really talking about
> replacing everything outside the 7-bit ASCII encodings?
> My reading of the commit referenced in the first email of this
> thread says that it's everything outside of the _printable_
> ASCII encodings, ASCII values outside of the range 32 to 127,
> inclusive.
> 
> Please check.  The docs should probably say "printable ASCII",
> or "non-printable ASCII", depending.  I think the meaning
> of "printable ASCII" is widely enough known to be 32-127.
> So "printable" is good enough, right?

For me, "non-printable ASCII" sounds like control characters. So that I used the
sentence "Characters that are not printable ASCII ... are replaced with...".
Please tell me if you have better explanation?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

v3_doc_fix.patch
Description: v3_doc_fix.patch

RE: [PGdocs] fix description for handling pf non-ASCII characters

Reply via email to