Re: Add support for specifying tables in pg_createsubscriber.

Peter Smith Fri, 22 Aug 2025 00:07:29 -0700

Hi Shubham.

Some review comments for patch v3-0001.


======
Commit message

1.
Allows optional column lists and row filters.

~

Is this right? Aren't the column lists and row filters now separated
in patch 0002?

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2. synopsis

pg_createsubscriber [option...] { -d | --database }dbname { -D |
--pgdata }datadir { -P | --publisher-server }connstr { --table
}table-name

2a.
The --table should follow the --database instead of being last.

~

2b.
The brackets {--table} don't seem useful

~

2c.
Since there can be multiple tables per database, I feel an ellipsis
(...) is needed too

~

3.
+    <varlistentry>
+     <term><option>--table=<replaceable
class="parameter">table</replaceable></option></term>
+     <listitem>

This belongs more naturally immediately following --database.

~~~

4.
+      <para>
+       The argument must be a fully qualified table name in one of the
+       following forms:
+        
<itemizedlist><listitem><para><literal>schema.table</literal></para></listitem>
+        
<listitem><para><literal>db.schema.table</literal></para></listitem></itemizedlist>
+       If the database name is provided, it must match the most recent
+       <option>--database</option> argument.
+      </para>

4a.
It would be nicer if the SGML were less compacted.
e.g.
<itemizedlist>
  <listitem><para><literal>schema.table</literal></para></listitem>
  <listitem><para><literal>db.schema.table</literal></para></listitem>
</itemizedlist>

~

4b.
Why do we insist that the user must explicitly give a schema name?
Can't a missing schema just imply "public" so user doesn't have to
type so much?

~

4c.
FUNDAMENTAL SPEC QUESTION....

I am confused why this patch allows specifying the 'db' name part of
--table when you also insist it must be identical to the most recent
--database name. With this rule, it seems unnecessary and merely means
more validation code is required.

I can understand having the 'db' name part might be useful if you
would also permit the --table to be specified anywhere on the command,
but insisting it must match the most recent --database name seems to
cancel out any reason for allowing it in the first place.

~~~

5.
+      <para>
+       A table specification may also include an optional column list and/or
+       row filter:
+       <itemizedlist>
+       <listitem>
+         <para>
+          <literal>schema.table(col1, col2, ...)</literal> &mdash; publishes
+          only the specified columns.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>schema.table WHERE (predicate)</literal> &mdash; publishes
+          only rows that satisfy the given condition.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          Both forms can be combined, e.g.
+          <literal>schema.table(col1, col2) WHERE (id &gt; 100)</literal>.
+         </para>
+        </listitem>
+       </itemizedlist>
+      </para>
+

AFAIK, support for this was separated into patch 0002. So these docs
should be in patch 0002, not here.

======
src/bin/pg_basebackup/pg_createsubscriber.c

6.
+typedef struct TableSpec
+{
+ char    *spec;
+ char    *dbname;
+ char    *pattern_regex;
+ char    *pattern_part1_regex;
+ char    *pattern_part2_regex;
+ char    *pattern_part3_regex;
+ struct TableSpec *next;
+} TableSpec;
+

6a.
Add a typedefs.list entry for this new typedef, and run pg_indent.

~

6b.
pattern_regex is unused?

~

6c.
pattern_part1_regex seems assigned, but it is otherwise unused (??). Needed?

~

6d.
I had previously given a review comment suggesting names like
part1/2/3 because previously, you were using members with different
meanings depending on the tablespec. But, now it seems all the dot
parse logic is re-written, so AFAICT you are always using part1 means
dbregex; part2 means schemaregex; part3 means tableregex ... So, the
"part" names are strange in the current impl - if you really do know
what they represent, then call them by their proper names.

~~~

7.
+static TableSpec * table_list_head = NULL;
+static TableSpec * table_list_tail = NULL;
+static char *current_dbname = NULL;

7a.
Why isn't 'current_dbname' just a local variable of main()?

~

7b.
I doubt the 'tail' var is needed. See a later review comment for details.

~~~

8.
-
 /*
  * Cleanup objects that were created by pg_createsubscriber if there is an
  * error.

Whitespace change unrelated to this patch?

~~~

usage:

9.
  printf(_("      --subscription=NAME         subscription name\n"));
+ printf(_("      --table                     table to subscribe to;
can be specified multiple times\n"));
  printf(_("  -V, --version                   output version
information, then exit\n"));

Or, should this say "table to publish".  Maybe it amounts to the same
thing, but this patch modifies CREATE PUBLICATION, not CREATE
SUBSCRIPTION.

~~~
store_pub_sub_info:

10.
+ for (i = 0; i < num_dbs; i++)
+ {
+ TableSpec  *prev = NULL;
+ TableSpec  *cur = table_list_head;
+ TableSpec  *filtered_head = NULL;
+ TableSpec  *filtered_tail = NULL;
+
+ while (cur != NULL)
+ {
+ TableSpec  *next = cur->next;
+
+ if (strcmp(cur->dbname, dbinfo[i].dbname) == 0)
+ {
+ if (prev)
+ prev->next = next;
+ else
+ table_list_head = next;
+
+ cur->next = NULL;
+ if (!filtered_head)
+ filtered_head = filtered_tail = cur;
+ else
+ {
+ filtered_tail->next = cur;
+ filtered_tail = cur;
+ }
+ }
+ else
+ prev = cur;
+ cur = next;
+ }
+ dbinfo[i].tables = filtered_head;
+ }
+

10a.
Is there a bug? I have not debugged this, but when you are assigning
filtered_tail = cur, who is ensuring that "filtered" list is
NULL-terminated? e.g. I suspect the cur->next might still point to
something inappropriate.

~

10b.
I doubt all that 'filered_head/tail' logic is needed at all. Can't you
just assign directly to the head of dbinfo[i].tables list as you
encounter appropriate tables for that db? The order may become
reversed, but does that matter?

e.g. something like this
cur->next = dbinfo[i].tables ? dbinfo[i].tables.next : NULL;
dbinfo[i].tables = cur;

~~~

create_publication:

11.
  PGresult   *res;
  char    *ipubname_esc;
  char    *spubname_esc;
+ bool first_table = true;

This 'first_table' variable is redundant. Just check i == 0 in the loop.

~~~

12.
+ appendPQExpBuffer(str, "%s%s.%s",
+   first_table ? "" : ", ",
+   escaped_schema, escaped_table);

SUGGESTION
if (i > 0)
  appendPQExpBuffer(str, ", ");

appendPQExpBuffer(str, "%s.%s", escaped_schema, escaped_table);

~~~

13.
- if (!simple_string_list_member(&opt.database_names, optarg))
  {
- simple_string_list_append(&opt.database_names, optarg);
- num_dbs++;
+ if (current_dbname)
+ pg_free(current_dbname);
+ current_dbname = pg_strdup(optarg);
+
+ if (!simple_string_list_member(&opt.database_names, optarg))
+ {
+ simple_string_list_append(&opt.database_names, optarg);
+ num_dbs++;
+ }
+ else
+ pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
+ break;
  }
- else
- pg_fatal("database \"%s\" specified more than once for
-d/--database", optarg);
- break;

13a.
The scope {} is not needed. You removed the previously declared variable.

~

13b.
The pfree might be overkill. I think a few little database name
strdups leaks are hardly going to be a problem. It's up to you.

~~~

14.
+ char    *copy_arg;
+ char    *first_dot;
+ char    *second_dot;
+ char    *dbname_arg = NULL;
+ char    *schema_table_part;
+ TableSpec  *ts;

Does /dbname_arg/dbname_part/make more sense here?

~~~

15.
+ if (first_dot != NULL)
+ second_dot = strchr(first_dot + 1, '.');
+ else
+ second_dot = NULL;
+
+

The 'else' is not needed if you had assigned NULL at the declaration.

~~~

16.
The logic seems quite brittle. e.g. Maybe you should only parse
looking for dots to the end of the copy_arg or the first space.
Otherwise, other "dots" in the table spec will mess up your logic.

e.g.
"db.sch.tbl" -> 2 dots
"sch.tbl WHERE (x > 1.0)" -> also 2 dots

Similarly, what you called 'schema_table_part' cannot be allowed to
extend beyond any space; otherwise, it can easily return an unexpected
dotcnt.

"db.sch.tbl WHERE (x > 1.0)" -> 3 dots

~~~

17.
+ patternToSQLRegex(encoding, dbbuf, schemabuf, namebuf,
+   schema_table_part, false, false, &dotcnt);
+
+ if (dbname_arg != NULL)
+ dotcnt++;
+
+ if (dotcnt == 2)
+ {
+ ts->pattern_part1_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part2_regex = pg_strdup(schemabuf->data);
+ ts->pattern_part3_regex = namebuf->len > 0 ? pg_strdup(namebuf->data) : NULL;
+ }
+ else if (dotcnt == 1)
+ {
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;
+ }
+ else
+ pg_fatal("invalid table specification \"%s\"", optarg);

Something seems very strange here...

I haven't debugged this, but I imagine if "--table sch.tab" then
dotcnt will be 1.

But then
+ ts->pattern_part1_regex = NULL;
+ ts->pattern_part2_regex = pg_strdup(dbbuf->data);
+ ts->pattern_part3_regex = NULL;

The schema regex is assigned the dbbuf->data (???) I don't trust this
logic. I saw that there are no test cases where dbpart is omitted, so
maybe this is a lurking bug?

~~~

18.
+
+ if (!table_list_head)
+ table_list_head = table_list_tail = ts;
+ else
+ table_list_tail->next = ts;
+
+ table_list_tail = ts;
+

All this 'tail' stuff looks potentially unnecessary. e.g. I think you
could simplify all this just by adding to the front of the list.

SUGGESTION
ts->next = table_list_head;
table_list_head = ts;

======
.../t/040_pg_createsubscriber.pl

19.
+# Declare database names
+my $db3 = 'db3';
+my $db4 = 'db4';
+

What do these variables achieve? e.g. Why not just use hardcoded
strings 'db1' and 'db2'.

~~~

20.
+# Test: Table-level publication creation
+$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE myschema.t4 (id int, val text)");

You could combine all these.

~~~

21.
+$node_p->safe_psql($db4,
+ "CREATE TABLE public.t3 (id int, val text, extra int)");
+$node_p->safe_psql($db4,
+ "CREATE TABLE otherschema.t5 (id serial primary key, info text)");

You could combine these.

~~~

22.
+# Create explicit publications
+my $pubname1 = 'pub1';
+my $pubname2 = 'pub2';
+

What does creating these variables achieve? e.g. Why not just use
hardcoded strings 'pub1' and 'pub2'.

~~~

23.
+# Run pg_createsubscriber with table-level options
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s2->data_dir,
+ '--publisher-server' => $node_p->connstr($db3),
+ '--socketdir' => $node_s2->host,
+ '--subscriber-port' => $node_s2->port,
+ '--publication' => $pubname1,
+ '--publication' => $pubname2,
+ '--database' => $db3,
+ '--table' => "$db3.public.t1",
+ '--table' => "$db3.public.t2",
+ '--table' => "$db3.myschema.t4",
+ '--database' => $db4,
+ '--table' => "$db4.public.t3",
+ '--table' => "$db4.otherschema.t5",
+ ],
+ 'pg_createsubscriber runs with table-level publication (existing nodes)');
+

There is no test for a --table spec where the dbpart is omitted.

~~~

24.
+# Check publication tables for db3 with public schema first

What is the significance "with public schema first" in this comment?

~~

25.
+# Check publication tables for db4, with public schema first
+my $actual2 = $node_p->safe_psql(
+ $db4, qq(
+        SELECT pubname || '|' || schemaname || '|' || tablename
+        FROM pg_publication_tables
+        WHERE pubname = '$pubname2'
+        ORDER BY schemaname, tablename
+    )
+);
+is( $actual2,
+ "$pubname2|otherschema|t5\n$pubname2|public|t3",
+ 'publication includes tables in public and otherschema schemas on db4');
+

Ditto. What is the significance "with public schema first" in this comment?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Add support for specifying tables in pg_createsubscriber.

Reply via email to