On Mon, Feb 17, 2025 at 2:57 PM Noah Misch <n...@leadboat.com> wrote: > > On Fri, Jan 17, 2025 at 05:11:41PM -0800, Masahiko Sawada wrote: > > Thank you for reviewing the patches. I agree with all the comments you > > made. I've addressed them and I've attached new version patches that > > now have some regression tests.
Thank you for reviewing the patches. > > Subject: [PATCH v3 4/5] pg_upgrade: Add --set-char-signedness to set the > > default char signedness of new cluster. > > > > This change adds a new option --set-char-signedness to pg_upgrade. It > > enables user to set arbitrary signedness during pg_upgrade. This helps > > cases where user who knew they copied the v17 source cluster from > > x86 (signedness=true) to ARM (signedness=falese) can pg_upgrade > > s/falese/false/ Fixed. > > > --- a/doc/src/sgml/ref/pgupgrade.sgml > > +++ b/doc/src/sgml/ref/pgupgrade.sgml > > @@ -276,6 +276,54 @@ PostgreSQL documentation > > </listitem> > > </varlistentry> > > > > + <varlistentry> > > + > > <term><option>--set-char-signedness=</option><replaceable>option</replaceable></term> > > If upgrading from v18+, valid use cases for this option are rare. I think > using the option should raise an error for v18+ source. A user who knows what > they're doing can use pg_resetwal manually, before the upgrade. This page > shouldn't refer to the pg_resetwal flag specifically, because that need is so > rare. What do you think? Makes sense. I've added that check and updated the regression tests accordingly. > > > + <listitem> > > + <para> > > + Manually set the default char signedness of new clusters. Possible > > values > > + are <literal>signed</literal> and <literal>unsigned</literal>. > > + </para> > > + <para> > > + In the C language, the default signedness of the <type>char</type> > > type > > + (when not explicitly specified) varies across platforms. For > > example, > > + <type>char</type> defaults to <type>signed char</type> on x86 CPUs > > but > > + to <type>unsigned char</type> on ARM CPUs. When data stored using > > the > > + <type>char</type> type is persisted to disk, such as in GIN > > indexes, > > + this platform-dependent behavior results in incorrect data > > comparisons > > + in two scenarios: > > + </para> > > + <itemizedlist> > > + <listitem> > > + <para> > > + When data is moved between platforms with different char > > signedness. > > + </para> > > + </listitem> > > + <listitem> > > + <para> > > + When data is replicated using streaming replication across > > different architectures. > > + </para> > > + </listitem> > > + </itemizedlist> > > After your patch series, core PostgreSQL persistent data doesn't depend on > "char" signedness. Extensions should also start calling > GetDefaultCharSignedness() to become independent of "char" signedness. Hence, > let's remove the part starting at "When data stored using the char type" and > ending here. Future users don't need to understand why pre-v18 had a problem. > They mainly need to know how to set this flag. Removed. > If we wanted to say something about the breakage before v18, it might be, "If > a pre-v18 cluster has trgm indexes and ran on different platforms at different > times in the history of those indexes, REINDEX those indexes before or after > running pg_upgrade." I'd put that in the release notes, not here in the > pg_upgrade doc page. Agreed. > > > + <para> > > + Starting from <productname>PostgreSQL</productname> 18, database > > clusters > > + maintain their own default char signedness setting, which can be > > used as > > + a hint to ensure consistent behavior across platforms with > > different > > Setting this wrong makes your trgm indexes return wrong answers, so it's not a > "hint". (I think "hint" implies minor consequences for getting it wrong.) Fixed. > > + default char signedness. By default, > > <application>pg_upgrade</application> > > + preserves the char signedness setting when upgrading from an > > existing cluster. > > + However, when upgrading from <productname>PostgreSQL</productname> > > 17 or > > + earlier, <application>pg_upgrade</application> adopts the char > > signedness > > + of the platform on which it was built. > > + </para> > > + <para> > > + This option allows you to explicitly set the default char > > signedness for > > + the new cluster, overriding any inherited values. This is > > particularly useful > > + when you plan to migrate the upgraded cluster to a platform with > > different > > + char signedness (for example, when moving from an x86-based system > > to an > > + ARM-based system). > > Let's refine the terminology around "plan to migrate". Here's an informal > description (not documentation-quality): > > 1. If you "plan to migrate" but have not yet done so, don't use this flag. > The default behavior is right. Upgrade on the old platform without using > this flag, and then migrate. This is the most-foolproof approach. > > 2. If you already migrated the cluster data files to a different platform just > before running pg_upgrade, use this flag. It's essential not to modify any > trgm indexes between migrating the data files and running pg_upgrade. Best > to make pg_upgrade the first action that starts the cluster on the new > platform, not starting the cluster manually. This is trickier than (1). Updated this part. > > > --- a/src/bin/pg_upgrade/pg_upgrade.c > > +++ b/src/bin/pg_upgrade/pg_upgrade.c > > @@ -399,8 +399,14 @@ set_new_cluster_char_signedness(void) > > { > > bool new_char_signedness; > > > > - /* Inherit the source database's signedness */ > > - new_char_signedness = old_cluster.controldata.default_char_signedness; > > + /* > > + * Use the specified char signedness if specifies. Otherwise we > > inherit > > s/if specifies/if specified/ Fixed. > > > --- a/contrib/pg_trgm/trgm.h > > +++ b/contrib/pg_trgm/trgm.h > > @@ -41,14 +41,12 @@ > > typedef char trgm[3]; > > > > #define CMPCHAR(a,b) ( ((a)==(b)) ? 0 : ( ((a)<(b)) ? -1 : 1 ) ) > > -#define CMPPCHAR(a,b,i) CMPCHAR( *(((const char*)(a))+i), *(((const > > char*)(b))+i) ) > > -#define CMPTRGM(a,b) ( CMPPCHAR(a,b,0) ? CMPPCHAR(a,b,0) : ( > > CMPPCHAR(a,b,1) ? CMPPCHAR(a,b,1) : CMPPCHAR(a,b,2) ) ) > > I recommend moving CMPCHAR into the C file along with the two related macros > that you're moving. Moved. > > > --- a/contrib/pg_trgm/trgm_op.c > > +++ b/contrib/pg_trgm/trgm_op.c > > @@ -42,6 +42,9 @@ PG_FUNCTION_INFO_V1(strict_word_similarity_commutator_op); > > PG_FUNCTION_INFO_V1(strict_word_similarity_dist_op); > > PG_FUNCTION_INFO_V1(strict_word_similarity_dist_commutator_op); > > > > +static inline int CMPTRGM_CHOOSE(const void *a, const void *b); > > All instances of the "inline" keyword in this patch series are on functions > called only via function pointers. Hence, those functions are never inlined. > I'd remove the keyword. Removed. > > > --- a/src/bin/pg_upgrade/pg_upgrade.h > > +++ b/src/bin/pg_upgrade/pg_upgrade.h > > @@ -125,6 +125,12 @@ extern char *output_files[]; > > */ > > #define JSONB_FORMAT_CHANGE_CAT_VER 201409291 > > > > +/* > > + * The control file was changed to have the default char signedness, > > + * commit XXXXX. > > + */ > > +#define DEFAULT_CHAR_SIGNEDNESS_VAT_VER 202501161 > > s/VAT/CAT/ Fixed. > > > +command_like( > > + [ 'pg_controldata', $old->data_dir ], > > + qr/Default char data signedness:\s+signed/, > > + 'default char signedness of old cluster is signed in control file'); > > +command_like( > > + [ 'pg_controldata', $new->data_dir ], > > + qr/Default char data signedness:\s+signed/, > > + 'default char signedness is new cluster signed in control file'); > > s/is new cluster signed/of new cluster is signed/ for consistency with the > previous test name. Fixed. > > > + > > +# Set the old cluster's default char signedness to unsigned for test. > > +command_ok( > > + [ > > + 'pg_resetwal', '--char-signedness', 'unsigned', > > + '-f', $old->data_dir > > + ], > > + "set old cluster's default char signeness to unsigned"); > > s/signeness/signedness/ Fixed. > > > Subject: [PATCH v3 2/5] pg_resetwal: Add --char-signedness option to change > > the default char signedness. > > > > With the newly added option --char-signedness, pg_resetwal updates the > > default char signeness flag in the controlfile. > > s/signeness/signedness/ Fixed. > > > --- a/doc/src/sgml/ref/pg_resetwal.sgml > > +++ b/doc/src/sgml/ref/pg_resetwal.sgml > > @@ -171,6 +171,20 @@ PostgreSQL documentation > > </para> > > > > <variablelist> > > + <varlistentry> > > + <term><option>--char-signedness=<replaceable > > class="parameter">option</replaceable></option></term> > > + <listitem> > > + <para> > > + Manually set the default char signedness. Possible values are > > + <literal>signed</literal> and <literal>unsigned</literal>. > > + </para> > > + <para> > > + This option can also be used to change the default char signedness > > + of an existing database cluster. > > + </para> > > + </listitem> > > + </varlistentry> > > Each other list entry discusses how to choose a safe value, so this should say > something like that. For users in doubt, nothing other than pg_upgrade should > use this flag. Theoretically, if you just ran pg_upgrade with the wrong value > and had not yet modified any trgm indexes, you could run this to fix the > problem. That's too rare of a situation and too dangerous to document, so the > documentation for this flag should discourage using the flag. Updated the documentation. > > > --- a/src/bin/pg_resetwal/pg_resetwal.c > > +++ b/src/bin/pg_resetwal/pg_resetwal.c > > @@ -105,6 +106,7 @@ main(int argc, char *argv[]) > > {"multixact-offset", required_argument, NULL, 'O'}, > > {"oldest-transaction-id", required_argument, NULL, 'u'}, > > {"next-transaction-id", required_argument, NULL, 'x'}, > > + {"char-signedness", required_argument, NULL, 2}, > > {"wal-segsize", required_argument, NULL, 1}, > > Put the ", 2}" entry after the ", 1}" entry. (We alphabetize in usage(), but > we append to the end in these C arrays.) Fixed. > > > @@ -1189,6 +1213,7 @@ usage(void) > > printf(_(" -u, --oldest-transaction-id=XID set oldest transaction > > ID\n")); > > printf(_(" -x, --next-transaction-id=XID set next transaction > > ID\n")); > > printf(_(" --wal-segsize=SIZE size of WAL segments, in > > megabytes\n")); > > + printf(_(" --char-signedness=OPTION set char signedness to > > \"signed\" or \"unsigned\"\n")); > > This --help output alphabetizes by short option, with the one non-short option > at the end. So I'd also alphabetize among the non-short options, specifically > putting the new line before --wal-segsize. Fixed. I've attached the updated patches. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
v4-0003-pg_upgrade-Preserve-default-char-signedness-value.patch
Description: Binary data
v4-0004-pg_upgrade-Add-set-char-signedness-to-set-the-def.patch
Description: Binary data
v4-0002-pg_resetwal-Add-char-signedness-option-to-change-.patch
Description: Binary data
v4-0005-Fix-an-issue-with-index-scan-using-pg_trgm-due-to.patch
Description: Binary data
v4-0001-Add-default_char_signedness-field-to-ControlFileD.patch
Description: Binary data