On Thu, Oct 03, 2024 at 06:55:47AM -0700, Masahiko Sawada wrote:
> I've attached PoC patches for the idea Noah proposed. Newly created
> clusters unconditionally have default_char_signedness=true, and the
> only source of signedness=false is pg_upgrade. To update the
> signedness in the controlfile, pg_resetwal now has a new option
> --char-signedness, which is used by pg_upgrade internally. Feedback is
> very welcome.

Upthread, we discussed testability.  Does pg_resetwal facilitate all
appropriate testing, or do testing difficulties remain?

I reviewed these patches, finding only one non-cosmetic review comment.  Given
the PoC status, some of the observations below are likely ones you already
know or would have found before exiting PoC.

> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -27858,6 +27858,11 @@ acl      | 
> {postgres=arwdDxtm/postgres,foo=r/postgres}
>         <entry><type>integer</type></entry>
>        </row>
>  
> +      <row>
> +       <entry><structfield>signed_char</structfield></entry>
> +       <entry><type>bool</type></entry>

s/signed_char/default_char_signedness/ to align with the naming below.

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -4256,6 +4256,33 @@ WriteControlFile(void)

> +      * Newly created database clusters unconditionally set the default char
> +      * signedness to true. pg_upgrade changes this flag for clusters that 
> were
> +      * initialized on signedness=false platforms. As a result,
> +      * signedness=false setting will become rare over time. will get rarer
> +      * over time.

There's a redundant fragment of sentence.

> --- a/doc/src/sgml/ref/pg_resetwal.sgml
> +++ b/doc/src/sgml/ref/pg_resetwal.sgml
> @@ -171,6 +171,16 @@ PostgreSQL documentation
>    </para>
>  
>    <variablelist>
> +   <varlistentry>
> +    <term><option>--char-signedness=<replaceable 
> class="parameter">option</replaceable></option></term>
> +    <listitem>
> +     <para>
> +      Manually set the default char signedness. Possible values are
> +      <literal>signed</literal> and <literal>unsigned</literal>.
> +     </para>
> +    </listitem>
> +   </varlistentry>

This needs more docs, like its <varlistentry> neighbors have.  First, see the
point below about the pg_upgrade docs.

> --- a/src/bin/pg_resetwal/pg_resetwal.c
> +++ b/src/bin/pg_resetwal/pg_resetwal.c
> @@ -1189,6 +1213,7 @@ usage(void)
>       printf(_("  -u, --oldest-transaction-id=XID  set oldest transaction 
> ID\n"));
>       printf(_("  -x, --next-transaction-id=XID    set next transaction 
> ID\n"));
>       printf(_("      --wal-segsize=SIZE           size of WAL segments, in 
> megabytes\n"));
> +     printf(_("      --char-signedness=OPTION     set char signedness to 
> \"signed\"  or \"unsigned\"\n"));

This help output alphabetizes by short option, with the one non-short option
at the end.  So I'd also alphabetize among the non-short options, specifically
putting the new line before --wal-segsize.

> source cluster was initialized on the same platform where pg_upgrae is

Typo "pg_upgrade"

> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -62,6 +62,7 @@ get_control_data(ClusterInfo *cluster)
>       bool            got_date_is_int = false;
>       bool            got_data_checksum_version = false;
>       bool            got_cluster_state = false;
> +     bool            got_default_char_signedness = false;
>       char       *lc_collate = NULL;
>       char       *lc_ctype = NULL;
>       char       *lc_monetary = NULL;
> @@ -206,6 +207,13 @@ get_control_data(ClusterInfo *cluster)
>               got_data_checksum_version = true;
>       }
>  
> +     /* Only in <= 17 */
> +     if (GET_MAJOR_VERSION(cluster->major_version) <= 1700)
> +     {
> +             cluster->controldata.default_char_signedness = true;

If anything reads the "true" stored here, that would be a bug.  Is there a
reasonable way do one or two of the following?

1. not initialize it at all
2. store something like -1 instead
3. add a comment that it's never read

> +             got_default_char_signedness = true;
> +     }

I wouldn't say we "got" this.  I'd handle this more like got_oldestmulti,
where we know !got_oldestmulti is okay until MULTIXACT_FORMATCHANGE_CAT_VER.
Maybe also replace the "<= 1700" checks with catversion checks.

> +
>       /* we have the result of cmd in "output". so parse it line by line now 
> */
>       while (fgets(bufin, sizeof(bufin), output))
>       {
> @@ -501,6 +509,17 @@ get_control_data(ClusterInfo *cluster)
>                       cluster->controldata.data_checksum_version = 
> str2uint(p);
>                       got_data_checksum_version = true;
>               }
> +             else if ((p = strstr(bufin, "char data signedness:")) != NULL)
> +             {
> +                     p = strchr(p, ':');
> +
> +                     if (p == NULL || strlen(p) <= 1)
> +                             pg_fatal("%d: controldata retrieval problem", 
> __LINE__);
> +
> +                     p++;                            /* remove ':' char */
> +                     cluster->controldata.default_char_signedness = 
> strstr(p, "signed") != NULL;

Won't this match "unsigned", too?  (This is my only non-cosmetic review
comment.)  I'd strcmp for the two expected values.  If neither matches, report
the pg_fatal "retrieval problem".

> --- a/doc/src/sgml/ref/pgupgrade.sgml
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -276,6 +276,24 @@ PostgreSQL documentation
>        </listitem>
>       </varlistentry>
>  
> +     <varlistentry>
> +      
> <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term>
> +      <listitem>
> +       <para>
> +        Manually set the default char signedness of new clusters. Possible 
> values
> +        are <literal>signed</literal> and <literal>unsigned</literal>.
> +       </para>
> +       <para>
> +        The signedness of the 'char' type in C is implementation-dependent. 
> For instance,
> +        'signed char' is used by default on x86 CPUs, while 'unsigned char' 
> is used on aarch
> +        CPUs. <application>pg_upgrade</application> automatically inherits 
> the char
> +        signedness from the old cluster. This option is useful for upgrading 
> the cluster
> +        that user knew they copied it to a platform having different char 
> signedness
> +        (e.g. from x86 to aarch).

That last sentence would need some grammar help.  Instead, let's rewrite this
paragraph to assume the reader is not a C programmer.  Focus on what such a
reader should do to choose what argument, if any, to pass.

Our docs haven't used the term "aarch".  This applies to 32-bit ARM in
addition to 64-bit ARM.  Hence, I'd write the term "ARM" instead.

> --- a/src/bin/pg_upgrade/option.c
> +++ b/src/bin/pg_upgrade/option.c
> @@ -307,6 +316,7 @@ usage(void)
>       printf(_("  --copy                        copy files to new cluster 
> (default)\n"));
>       printf(_("  --copy-file-range             copy files to new cluster 
> with copy_file_range\n"));
>       printf(_("  --sync-method=METHOD          set method for syncing files 
> to disk\n"));
> +     printf(_("  --set-char-signedness=OPTION  set new cluster char 
> signedness to \"signed\" or \"unsigned\""));

Move this up one line, to preserve alphabetization.

> --- a/src/backend/storage/lmgr/predicate.c
> +++ b/src/backend/storage/lmgr/predicate.c
> @@ -2588,6 +2588,11 @@ PredicateLockPage(Relation relation, BlockNumber 
> blkno, Snapshot snapshot)
>       if (!SerializationNeededForRead(relation, snapshot))
>               return;
>  
> +     ereport(LOG, (errmsg("predicate %s blk %u",
> +                                              
> RelationGetRelationName(relation),
> +                                              blkno),
> +                               errbacktrace()));

Revert this file.

Thanks,
nm


Reply via email to