Re: speed up a logical replica setup

2024-01-15 Thread Shubham Khanna
 to being ready for commit.

I have done performance testing and attached the results comparing the
execution time of 'logical replication' and 'pg_subscriber' for 100MB,
1GB and 5GB of data:
                   | 100MB  | 1GB     | 5GB
-------------------|--------|---------|--------
Logical rep (2 w)  | 1.815s | 14.895s | 75.541s
Logical rep (4 w)  | 1.194s | 9.484s  | 46.938s
Logical rep (8 w)  | 0.828s | 6.422s  | 31.704s
Logical rep (10 w) | 0.646s | 3.843s  | 18.425s
pg_subscriber      | 3.977s | 9.988s  | 12.665s

Here, 'w' stands for workers. I ran the tests with different values of
'max_sync_workers_per_subscription', ranging from 2 to 10, and with
different row counts: 300,000 rows for 100MB, 3,000,000 rows for 1GB,
and 15,000,000 rows for 5GB. The results show that 'pg_subscriber'
performs better as the table size grows.
Next I plan to run these tests for 10GB and 20GB to see whether this
trend continues.
I am attaching the script files used for the tests; the Excel file has
the test run details. The 'pg_subscriber.pl' file tests with
'pg_subscriber' and the 'logical_rep.pl' file tests with logical
replication.
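For reference, a minimal sketch of the kind of timing wrapper such a comparison can use. This is not the attached Perl script — the harness, labels, and commands are illustrative stand-ins, and it assumes GNU date (for `%N` nanosecond resolution) on Linux.

```shell
#!/bin/sh
# Run a labeled command once and print its elapsed wall-clock time in ms.
run_timed() {
    label=$1; shift
    start=$(date +%s%N)               # nanoseconds since epoch (GNU date)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    echo "$label: $(( (end - start) / 1000000 )) ms"
}

# In the real tests the timed commands would be the pg_subscriber run and
# the logical-replication setup, e.g. (hypothetical invocation):
#   run_timed "pg_subscriber" pg_subscriber -D standby -P "..." -d postgres
# A trivial stand-in keeps this sketch self-contained:
run_timed "demo" sleep 0
```

Repeating `run_timed` for each worker setting and data size yields rows directly comparable to the table above.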

Thanks and Regards,
Shubham Khanna.


pg_subscriber.pl
Description: Binary data


logical_replication.pl
Description: Binary data


time_stamps(comparison).xlsx
Description: MS-Excel 2007 spreadsheet


Re: [PATCH] Compression dictionaries for JSONB

2024-01-17 Thread Shubham Khanna
On Wed, Jan 17, 2024 at 4:16 PM Aleksander Alekseev
 wrote:
>
> Hi hackers,
>
> > 8272749e added a few more arguments to CastCreate(). Here is the rebased 
> > patch.
>
> After merging afbfc029 [1] the patch needed a rebase. PFA v10.
>
> The patch is still in a PoC state and this is exactly why comments and
> suggestions from the community are most welcome! Particularly I would
> like to know:
>
> 1. Would you call it a wanted feature considering the existence of
> Pluggable TOASTer patchset which (besides other things) tries to
> introduce type-aware TOASTers for EXTERNAL attributes? I know what
> Simon's [2] and Nikita's latest answers were, and I know my personal
> opinion on this [3][4], but I would like to hear from the rest of the
> community.
>
> 2. How should we make sure a dictionary will not consume all the
> available memory? Limiting the amount of dictionary entries to pow(2,
> 16) and having dictionary versions seems to work OK for ZSON. However
> it was pointed out that this may be an unwanted limitation for the
> in-core implementation.
>
> [1]: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c727f511;hp=afbfc02983f86c4d71825efa6befd547fe81a926
> [2]: 
> https://www.postgresql.org/message-id/CANbhV-HpCF852WcZuU0wyh1jMU4p6XLbV6rCRkZpnpeKQ9OenQ%40mail.gmail.com
> [3]: 
> https://www.postgresql.org/message-id/CAJ7c6TN-N3%3DPSykmOjmW1EAf9YyyHFDHEznX-5VORsWUvVN-5w%40mail.gmail.com
> [4]: 
> https://www.postgresql.org/message-id/CAJ7c6TO2XTTk3cu5w6ePHfhYQkoNpw7u1jeqHf%3DGwn%2BoWci8eA%40mail.gmail.com

I tried to apply the patch, but it fails on HEAD with the following
error:
patching file src/test/regress/expected/dict.out
patching file src/test/regress/expected/oidjoins.out
patching file src/test/regress/expected/opr_sanity.out
patching file src/test/regress/parallel_schedule
Hunk #1 FAILED at 111.
1 out of 1 hunk FAILED -- saving rejects to file
src/test/regress/parallel_schedule.rej
patching file src/test/regress/sql/dict.sql
Please send a rebased version of the patch.

Thanks and Regards,
Shubham Khanna.




Re: Fix search_path for all maintenance commands

2024-01-17 Thread Shubham Khanna
On Thu, Jan 18, 2024 at 9:19 AM Jeff Davis  wrote:
>
> On Mon, 2023-07-17 at 12:16 -0700, Jeff Davis wrote:
> > Based on feedback, I plan to commit soon.
>
> Attached is a new version.
>
> Changes:
>
> * Also switch the search_path during CREATE MATERIALIZED VIEW, so that
> it's consistent with REFRESH. As a part of this change, I slightly
> reordered things in ExecCreateTableAs() so that the skipData path
> returns early without entering the SECURITY_RESTRICTED_OPERATION. I
> don't think that's a problem because (a) that is one place where
> SECURITY_RESTRICTED_OPERATION is not used for security, but rather for
> consistency; and (b) that path doesn't go through rewriter, planner, or
> executor anyway so I don't see why it would matter.
>
> * Use GUC_ACTION_SAVE rather than GUC_ACTION_SET. That was a problem
> with the previous patch for index functions executed in parallel
> workers, which can happen calling SQL functions from pg_amcheck.
>
> * I used a wrapper function RestrictSearchPath() rather than calling
> set_config_option() directly. That provides a nice place in case we
> need to add a compatibility GUC to disable it.
>
> Question:
>
> Why do we switch to the table owner and use
> SECURITY_RESTRICTED_OPERATION in DefineIndex(), when we will switch in
> index_build (etc.) anyway? Similarly, why do we switch in vacuum_rel(),
> when it doesn't matter for lazy vacuum and we will switch in
> cluster_rel() and do_analyze_rel() anyway?

I tried to apply the patch, but it fails on HEAD with the following
error:
Hunk #7 succeeded at 3772 (offset -12 lines).
patching file src/backend/commands/matview.c
patching file src/backend/commands/vacuum.c
Hunk #2 succeeded at 2169 (offset -19 lines).
patching file src/backend/utils/init/usercontext.c
patching file src/bin/scripts/t/100_vacuumdb.pl
Hunk #1 FAILED at 109.
1 out of 1 hunk FAILED -- saving rejects to file
src/bin/scripts/t/100_vacuumdb.pl.rej
patching file src/include/utils/usercontext.h
patching file src/test/modules/test_oat_hooks/expected/test_oat_hooks.out
patching file src/test/regress/expected/matview.out
patching file src/test/regress/expected/privileges.out
patching file src/test/regress/expected/vacuum.out
patching file src/test/regress/sql/matview.sql
patching file src/test/regress/sql/privileges.sql
patching file src/test/regress/sql/vacuum.sql

Please send a rebased version of the patch.

Thanks and Regards,
Shubham Khanna.




Re: Add 64-bit XIDs into PostgreSQL 15

2024-01-18 Thread Shubham Khanna
On Wed, Dec 13, 2023 at 5:56 PM Maxim Orlov  wrote:
>
> Hi!
>
> Just to keep this thread up to date, here's a new version after recent 
> changes in SLRU.
> I'm also change order of the patches in the set, to make adding initdb MOX 
> options after the
> "core 64 xid" patch, since MOX patch is unlikely to be committed and now for 
> test purpose only.

I tried to apply the patch, but it fails on HEAD with the following
error:
Hunk #1 succeeded at 601 (offset 5 lines).
patching file src/backend/replication/slot.c
patching file src/backend/replication/walreceiver.c
patching file src/backend/replication/walsender.c
Hunk #1 succeeded at 2434 (offset 160 lines).
patching file src/backend/storage/ipc/procarray.c
Hunk #1 succeeded at 1115 with fuzz 2.
Hunk #3 succeeded at 1286 with fuzz 2.
Hunk #7 FAILED at 4341.
Hunk #8 FAILED at 4899.
Hunk #9 FAILED at 4959.
3 out of 10 hunks FAILED -- saving rejects to file
src/backend/storage/ipc/procarray.c.rej
patching file src/backend/storage/ipc/standby.c
Hunk #1 FAILED at 1043.
Hunk #2 FAILED at 1370.
2 out of 2 hunks FAILED -- saving rejects to file
src/backend/storage/ipc/standby.c.rej
Please send a rebased version of the patch.

Thanks and Regards,
Shubham Khanna.




Re: speed up a logical replica setup

2024-01-19 Thread Shubham Khanna
On Tue, Jan 16, 2024 at 11:58 AM Shubham Khanna
 wrote:
>
> On Thu, Dec 21, 2023 at 11:47 AM Amit Kapila  wrote:
> >
> > On Wed, Dec 6, 2023 at 12:53 PM Euler Taveira  wrote:
> > >
> > > On Thu, Nov 9, 2023, at 8:12 PM, Michael Paquier wrote:
> > >
> > > On Thu, Nov 09, 2023 at 03:41:53PM +0100, Peter Eisentraut wrote:
> > > > On 08.11.23 00:12, Michael Paquier wrote:
> > > >> - Should the subdirectory pg_basebackup be renamed into something more
> > > >> generic at this point?  All these things are frontend tools that deal
> > > >> in some way with the replication protocol to do their work.  Say
> > > >> a replication_tools?
> > > >
> > > > Seems like unnecessary churn.  Nobody has complained about any of the 
> > > > other
> > > > tools in there.
> > >
> > > Not sure.  We rename things across releases in the tree from time to
> > > time, and here that's straight-forward.
> > >
> > >
> > > Based on this discussion it seems we have a consensus that this tool 
> > > should be
> > > in the pg_basebackup directory. (If/when we agree with the directory 
> > > renaming,
> > > it could be done in a separate patch.) Besides this move, the v3 provides 
> > > a dry
> > > run mode. It basically executes every routine but skip when should do
> > > modifications. It is an useful option to check if you will be able to run 
> > > it
> > > without having issues with connectivity, permission, and existing objects
> > > (replication slots, publications, subscriptions). Tests were slightly 
> > > improved.
> > > Messages were changed to *not* provide INFO messages by default and 
> > > --verbose
> > > provides INFO messages and --verbose --verbose also provides DEBUG 
> > > messages. I
> > > also refactored the connect_database() function into which the connection 
> > > will
> > > always use the logical replication mode. A bug was fixed in the transient
> > > replication slot name. Ashutosh review [1] was included. The code was 
> > > also indented.
> > >
> > > There are a few suggestions from Ashutosh [2] that I will reply in another
> > > email.
> > >
> > > I'm still planning to work on the following points:
> > >
> > > 1. improve the cleanup routine to point out leftover objects if there is 
> > > any
> > >connection issue.
> > >
> >
> > I think this is an important part. Shall we try to write to some file
> > the pending objects to be cleaned up? We do something like that during
> > the upgrade.
> >
> > > 2. remove the physical replication slot if the standby is using one
> > >(primary_slot_name).
> > > 3. provide instructions to promote the logical replica into primary, I 
> > > mean,
> > >stop the replication between the nodes and remove the replication setup
> > >(publications, subscriptions, replication slots). Or even include 
> > > another
> > >action to do it. We could add both too.
> > >
> > > Point 1 should be done. Points 2 and 3 aren't essential but will provide 
> > > a nice
> > > UI for users that would like to use it.
> > >
> >
> > Isn't point 2 also essential because how would otherwise such a slot
> > be advanced or removed?
> >
> > A few other points:
> > ==
> > 1. Previously, I asked whether we need an additional replication slot
> > patch created to get consistent LSN and I see the following comment in
> > the patch:
> >
> > + *
> > + * XXX we should probably use the last created replication slot to get a
> > + * consistent LSN but it should be changed after adding pg_basebackup
> > + * support.
> >
> > Yeah, sure, we may want to do that after backup support and we can
> > keep a comment for the same but I feel as the patch stands today,
> > there is no good reason to keep it. Also, is there a reason that we
> > can't create the slots after backup is complete and before we write
> > recovery parameters
> >
> > 2.
> > + appendPQExpBuffer(str,
> > +   "CREATE SUBSCRIPTION %s CONNECTION '%s' PUBLICATION %s "
> > +   "WITH (create_slot = false, copy_data = false, enabled = false)",
> > +   dbinfo->subname, dbinfo->pubconninfo, dbinfo->pubname);
> >
> > Shouldn't we enable two_phase by default for 

Re: speed up a logical replica setup

2024-01-23 Thread Shubham Khanna
On Tue, Jan 23, 2024 at 7:41 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> > We fixed some of the comments posted in the thread. We have created it
> > as top-up patch 0002 and 0003.
>
> I found that the CFbot raised an ERROR. Also, it may not work well in case
> of production build.
> PSA the fixed patch set.

A segmentation fault occurs when running the following command (note
the extra '/' between '../new_standby2' and '-P'):

$ gdb --args ./pg_subscriber -D ../new_standby2 / -P "host=localhost
port=5432 dbname=postgres" -d postgres

Executing the above command first reports:
pg_subscriber: error: too many command-line arguments (first is "/")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.

Program received signal SIGSEGV, Segmentation fault.
0x7e5b in cleanup_objects_atexit () at pg_subscriber.c:173
173    if (perdb->made_subscription)
(gdb) p perdb
$1 = (LogicalRepPerdbInfo *) 0x0

Thanks and Regards,
Shubham Khanna.




Re: Improve eviction algorithm in ReorderBuffer

2024-02-01 Thread Shubham Khanna
the arbitrary entry in O(log n) and also update the
> arbitrary entry's key in O(log n). This is known as the indexed
> priority queue. I've attached the patch for that (0001 and 0002).
>
> That way, in terms of reorderbuffer, we can update and remove the
> transaction's memory usage in O(log n) (in worst case and O(1) in
> average) and then pick the largest transaction in O(1). Since we might
> need to call ReorderBufferSerializeTXN() even in non-streaming case,
> we need to maintain the binaryheap anyway. I've attached the patch for
> that (0003).
>
> Here are test script for many sub-transactions case:
>
> create table test (c int);
> create or replace function testfn (cnt int) returns void as $$
> begin
>   for i in 1..cnt loop
> begin
>   insert into test values (i);
> exception when division_by_zero then
>   raise notice 'caught error';
>   return;
> end;
>   end loop;
> end;
> $$
> language plpgsql;
> select pg_create_logical_replication_slot('s', 'test_decoding');
> select testfn(5);
> set logical_decoding_work_mem to '4MB';
> select count(*) from pg_logical_slot_peek_changes('s', null, null);
>
> and here are results:
>
> * HEAD: 16877.281 ms
> * HEAD w/ patches (0001 and 0002): 655.154 ms
>
> There is huge improvement in a many-subtransactions case.

I have run the same test and found around a 12.53x improvement (the
median of five executions):

HEAD    | HEAD + v2-0001 + v2-0002 + v2-0003
29197ms | 2329ms

I also ran the regression test that you had shared at [1]; there was a
very slight dip in this case, taking about 0.31% more time:

HEAD   | HEAD + v2-0001 + v2-0002 + v2-0003
4459ms | 4473ms
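A sketch of how a median-of-five figure like the ones above can be computed from raw timings; the sample values below are illustrative, not the measured ones, and in practice they would come from `\timing` output of five runs of the pg_logical_slot_peek_changes() query.

```shell
#!/bin/sh
# Five illustrative run times in milliseconds (placeholders):
times_ms="29310 29197 29455 29102 29204"

# Sort numerically and pick the 3rd value, i.e. the median of five.
median=$(printf '%s\n' $times_ms | sort -n | sed -n '3p')
echo "median: $median ms"
```

Using the median rather than the mean keeps a single outlier run from skewing the comparison.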

The machine has 755.536 GB of total memory, 120 CPUs, and runs RHEL 7.
Detailed information about the performance machine is attached.

[1] - 
https://www.postgresql.org/message-id/CAD21AoB-7mPpKnLmBNfzfavG8AiTwEgAdVMuv%3DjzmAp9ex7eyQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.
MemTotal: 755.536 GB
MemFree: 748.281 GB
MemAvailable: 748.581 GB
Buffers: 0.002 GB
Cached: 1.366 GB
SwapCached: 0.000 GB
Active: 0.834 GB
Inactive: 0.745 GB
Active(anon): 0.221 GB
Inactive(anon): 0.028 GB
Active(file): 0.614 GB
Inactive(file): 0.717 GB
Unevictable: 0.000 GB
Mlocked: 0.000 GB
SwapTotal: 4.000 GB
SwapFree: 4.000 GB
Dirty: 0.000 GB
Writeback: 0.000 GB
AnonPages: 0.212 GB
Mapped: 0.142 GB
Shmem: 0.038 GB
Slab: 0.468 GB
SReclaimable: 0.139 GB
SUnreclaim: 0.329 GB
KernelStack: 0.020 GB
PageTables: 0.023 GB
NFS_Unstable: 0.000 GB
Bounce: 0.000 GB
WritebackTmp: 0.000 GB
CommitLimit: 381.768 GB
Committed_AS: 0.681 GB
VmallocTotal: 32768.000 GB
VmallocUsed: 1.852 GB
VmallocChunk: 32189.748 GB
Percpu: 0.035 GB
HardwareCorrupted: 0.000 GB
AnonHugePages: 0.025 GB
CmaTotal: 0.000 GB
CmaFree: 0.000 GB
HugePages_Total: 0.000 GB
HugePages_Free: 0.000 GB
HugePages_Rsvd: 0.000 GB
HugePages_Surp: 0.000 GB
Hugepagesize: 0.002 GB
DirectMap4k: 0.314 GB
DirectMap2M: 6.523 GB
DirectMap1G: 763.000 GB
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 62
model name  : Intel(R) Xeon(R) CPU E7-4890 v2 @ 2.80GHz
stepping: 7
microcode   : 0x715
cpu MHz : 1762.646
cache size  : 38400 KB
physical id : 0
siblings: 30
core id : 0
cpu cores   : 15
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb 
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology 
nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est 
tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt 
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb intel_ppin ssbd ibrs 
ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt 
dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
bogomips: 5586.07
clflush size: 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:NAME="Red Hat Enterprise Linux Server"
VERSION="7.8 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.8"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.8 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.8:GA:server"
HOME_URL="https://www.redhat.com/";
BUG_REPORT_URL="https://bugzilla.redhat.com/";

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.8
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.8"


Re: speed up a logical replica setup

2024-02-05 Thread Shubham Khanna
 used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
> 08.
> The terminology is still not consistent. Some functions call the target as 
> standby,
> but others call it as subscriber.
>
> 09.
> v14 does not work if the standby server has already been set recovery_target*
> options. PSA the reproducer. I considered two approaches:
>
>  a) raise an ERROR when these parameter were set. check_subscriber() can do it
>  b) overwrite these GUCs as empty strings.
>
> 10.
> The execution always fails if users execute --dry-run just before. Because
> pg_createsubscriber stops the standby anyway. Doing dry run first is quite 
> normal
> use-case, so current implementation seems not user-friendly. How should we 
> fix?
> Below bullets are my idea:
>
>  a) avoid stopping the standby in case of dry_run: seems possible.
>  b) accept even if the standby is stopped: seems possible.
>  c) start the standby at the end of run: how arguments like pg_ctl -l should 
> be specified?
>
> My top-up patches fixes some issues.
>
> v15-0001: same as v14-0001
> === experimental patches ===
> v15-0002: Use replication connections when we connects to the primary.
>   Connections to standby is not changed because the standby/subscriber
>   does not require such type of connection, in principle.
>   If we can accept connecting to subscriber with replication mode,
>   this can be simplified.
> v15-0003: Remove -P and use primary_conninfo instead. Same as v13-0004
> v15-0004: Check whether the target is really standby. This is done by 
> pg_is_in_recovery()
> v15-0005: Avoid stopping/starting standby server in dry_run mode.
>   I.e., approach a). in #10 is used.
> v15-0006: Overwrite recovery parameters. I.e., aproach b). in #9 is used.
>
> [1]: 
> https://www.postgresql.org/message-id/b315c7da-7ab1-4014-a2a9-8ab6ae26017c%40app.fastmail.com


While reviewing the v15 patches, I noticed that the subscription
connection string now includes many options that are not required:
v15-0001
postgres=# select subname, subconninfo from pg_subscription;
subname|   subconninfo
---+--
pg_createsubscriber_5_1867633 | host=localhost port=5432 dbname=postgres
(1 row)



v15-0001+0002+0003
postgres=# select subname, subconninfo from pg_subscription;
            subname            |                 subconninfo
-------------------------------+---------------------------------------------
 pg_createsubscriber_5_1895366 | user=shubham passfile='/home/shubham/.pgpass' channel_binding=prefer host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable dbname=postgres
(1 row)


Here we can see that channel_binding, sslmode, sslcertmode, sslsni,
gssencmode, krbsrvname, etc. are included. This does not look
intentional; we should keep the subscription connection string the same
as in v15-0001.

Thanks and Regards,
Shubham Khanna.






Re: speed up a logical replica setup

2024-02-09 Thread Shubham Khanna
fied version of v16-0006.
>
> Functions arguments should not be struct because they are passing by value.
> They should be a pointer. Or, for modify_subscriber_sysid and 
> wait_for_end_recovery,
> we can pass a value which would be really used.
>
>
> Done.
>
> 07.
> ```
> static char *get_base_conninfo(char *conninfo, char *dbname,
>const char *noderole);
> ```
>
> Not sure noderole should be passed here. It is used only for the logging.
> Can we output string before calling the function?
> (The parameter is not needed anymore if -P is removed)
>
>
> Done.
>
> 08.
> The terminology is still not consistent. Some functions call the target as 
> standby,
> but others call it as subscriber.
>
>
> The terminology should reflect the actual server role. I'm calling it 
> "standby"
> if it is a physical replica and "subscriber" if it is a logical replica. Maybe
> "standby" isn't clear enough.
>
> 09.
> v14 does not work if the standby server has already been set recovery_target*
> options. PSA the reproducer. I considered two approaches:
>
> a) raise an ERROR when these parameter were set. check_subscriber() can do it
> b) overwrite these GUCs as empty strings.
>
>
> I prefer (b) that's exactly what you provided in v16-0006.
>
> 10.
> The execution always fails if users execute --dry-run just before. Because
> pg_createsubscriber stops the standby anyway. Doing dry run first is quite 
> normal
> use-case, so current implementation seems not user-friendly. How should we 
> fix?
> Below bullets are my idea:
>
> a) avoid stopping the standby in case of dry_run: seems possible.
> b) accept even if the standby is stopped: seems possible.
> c) start the standby at the end of run: how arguments like pg_ctl -l should 
> be specified?
>
>
> I prefer (a). I applied a slightly modified version of v16-0005.
>
> This new patch contains the following changes:
>
> * check whether the target is really a standby server (0004)
> * refactor: pg_create_logical_replication_slot function
> * use a single variable for pg_ctl and pg_resetwal directory
> * avoid recovery errors applying default settings for some GUCs (0006)
> * don't stop/start the standby in dry run mode (0005)
> * miscellaneous fixes
>
> I don't understand why v16-0002 is required. In a previous version, this patch
> was using connections in logical replication mode. After some discussion we
> decided to change it to regular connections and use SQL functions (instead of
> replication commands). Is it a requirement for v16-0003?
>
> I started reviewing v16-0007 but didn't finish yet. The general idea is ok.
> However, I'm still worried about preventing some use cases if it provides only
> the local connection option. What if you want to keep monitoring this instance
> while the transformation is happening? Let's say it has a backlog that will
> take some time to apply. Unless, you have a local agent, you have no data 
> about
> this server until pg_createsubscriber terminates. Even the local agent might
> not connect to the server unless you use the current port.

I tried verifying a few scenarios using 5 databases and came across
the following errors:

./pg_createsubscriber -D ../new_standby -P "host=localhost port=5432
dbname=postgres" -S "host=localhost port=9000 dbname=postgres"  -d db1
-d db2 -d db3 -d db4 -d db5

pg_createsubscriber: error: publisher requires 6 wal sender
processes, but only 5 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 7.

The command succeeds only with 7 wal senders, so the error and hint
messages should be made consistent with that.


pg_createsubscriber: error: publisher requires 6 replication slots,
but only 5 remain
pg_createsubscriber: hint: Consider increasing max_replication_slots
to at least 7.

The command succeeds only with 7 replication slots, so the error and
hint messages should be made consistent with that.
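The arithmetic implied by the runs above can be sketched as follows. This is a hypothetical reading, not the documented rule: with N databases the error reported needing N + 1 slots/wal senders, yet the run only succeeded once the GUCs reached N + 2 (7 for N = 5) — which is exactly the inconsistency being reported.

```shell
#!/bin/sh
ndb=5                      # number of -d databases passed on the command line
reported=$((ndb + 1))      # "publisher requires 6 ..." in the error text
working=$((ndb + 2))       # the value that actually made the run succeed

# Settings a user would apply on the publisher to get past the error:
echo "ALTER SYSTEM SET max_wal_senders = $working;"
echo "ALTER SYSTEM SET max_replication_slots = $working;"
```

Whichever count is correct, the "requires" figure and the "increase to at least" hint should agree with the value that actually works.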

Thanks and Regards,
Shubham Khanna.




Re: speed up a logical replica setup

2024-03-22 Thread Shubham Khanna
On Fri, Mar 22, 2024 at 9:02 AM Euler Taveira  wrote:
>
> On Thu, Mar 21, 2024, at 6:49 AM, Shlok Kyal wrote:
>
> There is a compilation error while building postgres with the patch
> due to a recent commit. I have attached a top-up patch v32-0003 to
> resolve this compilation error.
> I have not updated the version of the patch as I have not made any
> change in v32-0001 and v32-0002 patch.
>
>
> I'm attaching a new version (v33) to incorporate this fix (v32-0003) into the
> main patch (v32-0001). This version also includes 2 new tests:
>
> - refuse to run if the standby server is running
> - refuse to run if the standby was promoted e.g. it is not in recovery
>
> The first one exercises a recent change (standby should be stopped) and the
> second one covers an important requirement.
>
> Based on the discussion [1] about the check functions, Vignesh suggested that 
> it
> should check both server before exiting. v33-0003 implements it. I don't have 
> a
> strong preference; feel free to apply it.
>
>
> [1] 
> https://www.postgresql.org/message-id/CALDaNm1Dg5tDRmaabk%2BZND4WF17NrNq52WZxCE%2B90-PGz5trQQ%40mail.gmail.com

I ran pg_createsubscriber under valgrind to check for issues. Valgrind
reported the following:
==651272== LEAK SUMMARY:
==651272==definitely lost: 1,319 bytes in 18 blocks
==651272==indirectly lost: 1,280 bytes in 2 blocks
==651272==  possibly lost: 44 bytes in 3 blocks
==651272==still reachable: 3,066 bytes in 22 blocks
==651272== suppressed: 0 bytes in 0 blocks
==651272==
==651272== For lists of detected and suppressed errors, rerun with: -s
==651272== ERROR SUMMARY: 17 errors from 17 contexts (suppressed: 0 from 0)
The attached report has the full details.
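For anyone reproducing this, a sketch of a typical memcheck invocation. The flags are common leak-hunting options, not necessarily the exact ones used for the report above, and the binary path, data directory, and connection string are placeholders:

```shell
#!/bin/sh
# --leak-check=full expands the LEAK SUMMARY into per-allocation stacks,
# which makes the "definitely lost" blocks actionable.
VG_OPTS="--leak-check=full --show-leak-kinds=definite,indirect --log-file=valgrind.log"

if command -v valgrind >/dev/null 2>&1; then
    # Hypothetical invocation; adjust paths/connection info to your setup.
    valgrind $VG_OPTS ./pg_createsubscriber -D standby_dir \
        -P "host=localhost port=5432 dbname=postgres" -d postgres \
        >/dev/null 2>&1 || true
else
    echo "valgrind not installed; skipping"
fi
```

With `--log-file`, the per-context stack traces end up in valgrind.log, matching the style of the attached report.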

Thanks and Regards,
Shubham Khanna.


valgrind.log
Description: Binary data


Re: Improve eviction algorithm in ReorderBuffer

2024-03-27 Thread Shubham Khanna
On Tue, Mar 5, 2024 at 8:50 AM vignesh C  wrote:
>
> On Wed, 28 Feb 2024 at 11:40, Amit Kapila  wrote:
> >
> > On Mon, Feb 26, 2024 at 7:54 PM Masahiko Sawada  
> > wrote:
> > >
> >
> > A few comments on 0003:
> > ===
> > 1.
> > +/*
> > + * Threshold of the total number of top-level and sub transactions
> > that controls
> > + * whether we switch the memory track state. While the MAINTAIN_HEAP state 
> > is
> > + * effective when there are many transactions being decoded, in many 
> > systems
> > + * there is generally no need to use it as long as all transactions
> > being decoded
> > + * are top-level transactions. Therefore, we use MaxConnections as
> > the threshold
> > + * so we can prevent switch to the state unless we use subtransactions.
> > + */
> > +#define REORDER_BUFFER_MEM_TRACK_THRESHOLD MaxConnections
> >
> > The comment seems to imply that MAINTAIN_HEAP is useful for large
> > number of transactions but ReorderBufferLargestTXN() switches to this
> > state even when there is one transaction. So, basically we use the
> > binary_heap technique to get the largest even when we have one
> > transaction but we don't maintain that heap unless we have
> > REORDER_BUFFER_MEM_TRACK_THRESHOLD number of transactions are
> > in-progress. This means there is some additional work when (build and
> > reset heap each time when we pick largest xact) we have fewer
> > transactions in the system but that may not be impacting us because of
> > other costs involved like serializing all the changes. I think once we
> > can try to stress test this by setting
> > debug_logical_replication_streaming to 'immediate' to see if the new
> > mechanism has any overhead.
>
> I ran the test with a transaction having many inserts:
>
>       | 5000  | 10000 | 20000 | 100000 | 1000000 | 10000000
> ------|-------|-------|-------|--------|---------|----------
> Head  | 26.31 | 48.84 | 93.65 | 480.05 | 4808.29 | 47020.16
> Patch | 26.35 | 50.8  | 97.99 | 484.8  | 4856.95 | 48108.89
>
> The same test with debug_logical_replication_streaming= 'immediate'
>
>       | 5000  | 10000  | 20000  | 100000  | 1000000  | 10000000
> ------|-------|--------|--------|---------|----------|-----------
> Head  | 59.29 | 115.84 | 227.21 | 1156.08 | 11367.42 | 113986.14
> Patch | 62.45 | 120.48 | 240.56 | 1185.12 | 11855.37 | 119921.81
>
> The execution time is in milliseconds. The column header indicates the
> number of inserts in the transaction.
> In this case I noticed that the test execution with patch was taking
> slightly more time.

I have run the tests for which Vignesh had reported an issue; the test
results with the latest patch are given below:
Without debug_logical_replication_streaming = 'immediate':
Records | 10000000  | 1000000  | 100000  | 20000 | 10000  | 5000
--------|-----------|----------|---------|-------|--------|--------
Head    | 47563.759 | 4917.057 | 478.923 | 97.28 | 50.368 | 25.917
Patch   | 47445.733 | 4722.874 | 472.817 | 95.15 | 48.801 | 26.168
%imp    | 0.248     | 3.949    | 1.274   | 2.189 | 3.111  | -0.968

With debug_logical_replication_streaming = 'immediate':
Records | 10000000   | 1000000   | 100000   | 20000   | 10000   | 5000
--------|------------|-----------|----------|---------|---------|--------
Head    | 106281.236 | 10669.992 | 1073.815 | 214.287 | 107.62  | 54.947
Patch   | 103108.673 | 10603.139 | 1064.98  | 210.229 | 106.321 | 54.218
%imp    | 2.985      | 0.626     | 0.822    | 1.893   | 1.207   | 1.326

The execution time is in milliseconds; the column header indicates the
number of inserts in the transaction. The test results show that the
issue has been resolved with the new patch.

Thanks and Regards,
Shubham Khanna.




Improve documentation of upgrade for streaming replication section.

2024-02-09 Thread Shubham Khanna
Hi,

Currently, the streaming replication section of the upgrade
documentation says that logical replication slots will be copied
irrespective of the version, but the logical replication slots are
copied only if the old primary is version 17.0 or later. The attached
patch fixes this.
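The versioned behaviour described above can be checked mechanically on the old primary before an upgrade. The sketch below is a hypothetical guard, not documented pg_upgrade tooling; the version value is illustrative and would normally come from `SHOW server_version_num`.

```shell
#!/bin/sh
# Slots are migrated only when the old primary is 17.0 or newer,
# i.e. server_version_num >= 170000.
old_version=170002   # e.g. from: psql -At -c "SHOW server_version_num"

if [ "$old_version" -ge 170000 ]; then
    slot_note="logical replication slots will be migrated by pg_upgrade"
else
    slot_note="logical replication slots must be recreated manually"
fi
echo "$slot_note"
```

This is the distinction the documentation change above is meant to make explicit.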

Thanks and Regards,
Shubham Khanna.


v1-0001-Improve-documentation-of-upgrade-for-streaming-re (2).patch
Description: Binary data


Re: speed up a logical replica setup

2024-02-14 Thread Shubham Khanna
On Thu, Feb 15, 2024 at 10:37 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Euler,
>
> Here are my minor comments for 17.
>
> 01.
> ```
> /* Options */
> static const char *progname;
>
> static char *primary_slot_name = NULL;
> static bool dry_run = false;
>
> static bool success = false;
>
> static LogicalRepInfo *dbinfo;
> static int  num_dbs = 0;
> ```
>
> The comment seems out-of-date. There is only one option.
>
> 02. check_subscriber and check_publisher
>
> Missing pg_catalog prefix in some lines.
>
> 03. get_base_conninfo
>
> I think dbname would not be set. IIUC, dbname should be a pointer of the 
> pointer.
>
>
> 04.
>
> I check the coverage and found two functions have been never called:
>  - drop_subscription
>  - drop_replication_slot
>
> Also, some cases were not tested. Below bullet showed notable ones for me.
> (Some of them would not be needed based on discussions)
>
> * -r is specified
> * -t is specified
> * -P option contains dbname
> * -d is not specified
> * GUC settings are wrong
> * primary_slot_name is specified on the standby
> * standby server is not working
>
> In feature level, we may able to check the server log is surely removed in 
> case
> of success.
>
> So, which tests should be added? drop_subscription() is called only when the
> cleanup phase, so it may be difficult to test. According to others, it seems 
> that
> -r and -t are not tested. GUC-settings have many test cases so not sure they
> should be. Based on this, others can be tested.
>
> PSA my top-up patch set.
>
> V18-0001: same as your patch, v17-0001.
> V18-0002: modify the alignment of codes.
> V18-0003: change an argument of get_base_conninfo. Per comment 3.
> === experimental patches ===
> V18-0004: Add testcases per comment 4.
> V18-0005: Remove -P option. I'm not sure it should be needed, but I made just 
> in case.

I created a cascaded physical replication setup, node1->node2->node3,
and ran pg_createsubscriber for node2. After running the script, I
started node2 again and found that the pg_createsubscriber command had
succeeded, after which the physical replication between node2 and node3
was broken. I feel pg_createsubscriber should detect this scenario and
throw an error to avoid breaking a cascaded replication setup. I have
attached the script that was used to verify this.
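As a sketch of what such a check could look like (hypothetical, not part of
the posted patch): before converting the standby, pg_createsubscriber could
query pg_stat_replication on the target node, since a cascaded standby such
as node3 shows up there as a streaming client of node2:

```shell
# Hypothetical pre-flight check, not part of the posted patch: refuse to
# convert a standby (node2, port 9000 in the attached script) that still
# has downstream physical standbys streaming from it.
downstream=$(psql -p 9000 -d postgres -At \
    -c "SELECT count(*) FROM pg_catalog.pg_stat_replication")
if [ "$downstream" -gt 0 ]; then
    echo "error: target standby has $downstream downstream standby(s)" >&2
    exit 1
fi
```

This only illustrates the idea; a real check inside pg_createsubscriber
would run the same query over its existing connection.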

Thanks and Regards,
Shubham Khanna.
#!/bin/bash

#
# This script tests a case of cascading replication.
#
# Creating system:
#   node1-(physical replication)->node2-(physical replication)->node3
#

# Stop instances
pg_ctl stop -D node1
pg_ctl stop -D node2
pg_ctl stop -D node3

# Remove old files
rm -rf node1
rm -rf node2
rm -rf node3
rm log_node2 log_node1 log_node3

# Initialize primary
initdb -D node1

echo "wal_level = logical" >> node1/postgresql.conf
echo "max_replication_slots=3" >> node1/postgresql.conf

pg_ctl -D node1 -l log_node1 start

psql -d postgres -c "CREATE DATABASE db1";
psql -d db1 -c "CHECKPOINT";

sleep 3

# Initialize standby
pg_basebackup -v -R -D node2
echo "Port=9000" >> node2/postgresql.conf

pg_ctl -D node2 -l log_node2 start

# Initialize another standby
pg_basebackup -p 9000 -v -R -D node3
echo "Port=9001" >> node3/postgresql.conf
pg_ctl -D node3 -l log_node3 start

# Insert a tuple to primary
psql -d db1 -c "CREATE TABLE c1(a int)";
psql -d db1 -c "Insert into c1 Values(2)";
sleep 3

# And verify it can be seen on another standby
psql -d db1 -p 9001 -c "Select * from c1";

pg_createsubscriber -D node2/ -S "host=localhost port=9000 dbname=postgres" -d postgres -d db1 -r -v
pg_ctl -D node2 -l log_node2 start


Re: speed up a logical replica setup

2024-02-16 Thread Shubham Khanna
ary as node1 and
standby as node3, this scenario runs successfully. I was not sure
whether this should be supported or not.
Scenario 2) Create cascade replication like node1->node2->node3 using
replication slots (attached cascade_3node_setup_with_slots has the
script for this):
Here, slot name was used as slot1 for node1 to node2 and slot2 for
node2 to node3. Then I ran pg_createsubscriber by specifying primary
as node1 and standby as node3. In this case pg_createsubscriber fails
with the following error:
pg_createsubscriber: error: could not obtain replication slot
information: got 0 rows, expected 1 row
[Inferior 1 (process 2623483) exited with code 01]

This fails because the slot name slot2 is used between node2->node3,
but pg_createsubscriber checked for slot1, the slot used for
replication between node1->node2.
Thoughts?
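For what it's worth, the mismatch is visible directly from SQL (a sketch
using the ports and slot names from the attached
cascade_3node_setup_with_slots script): the slot a standby consumes is
whatever its recovery settings name in primary_slot_name, and it exists on
the direct upstream rather than on the original primary:

```shell
# Sketch, using the ports/slot names from the attached script.
psql -p 9001 -d postgres -At -c "SHOW primary_slot_name"
#   -> slot2 (node3 streams from node2 using slot2)
psql -p 9000 -d postgres -At -c \
    "SELECT slot_name FROM pg_catalog.pg_replication_slots
     WHERE slot_name = 'slot2'"
#   -> 1 row on node2; running the same lookup against node1 (port 5432)
#      finds no such slot, consistent with the reported
#      "got 0 rows, expected 1 row" error.
```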

Thanks and Regards,
Shubham Khanna.
#!/bin/bash

#
# This script tests a case of cascading replication.
#
# Creating system:
#   node1-(physical replication)->node2-(physical replication)->node3
#

# Stop instances
sudo pkill -9 postgres
pg_ctl stop -D node1
pg_ctl stop -D node2
pg_ctl stop -D node3

# Remove old files
rm -rf node1
rm -rf node2
rm -rf node3
rm log_node2 log_node1 log_node3

# Initialize primary
initdb -D node1

echo "wal_level = logical" >> node1/postgresql.conf
echo "max_replication_slots=10" >> node1/postgresql.conf

pg_ctl -D node1 -l log_node1 start

psql -d postgres -c "CREATE DATABASE db1";
psql -d db1 -c "CHECKPOINT";

sleep 3

# Initialize standby
pg_basebackup -h 127.0.0.1 -X stream -v -R -D node2
echo "Port=9000" >> node2/postgresql.conf

pg_ctl -D node2 -l log_node2 start

# Initialize another standby
pg_basebackup -h 127.0.0.1 -X stream -p 9000 -v -R -D node3
echo "Port=9001" >> node3/postgresql.conf
pg_ctl -D node3 -l log_node3 start

# Insert a tuple to primary
psql -d db1 -c "CREATE TABLE c1(a int)";
psql -d db1 -c "Insert into c1 Values(2)";
sleep 3

# And verify it can be seen on another standby
psql -d db1 -p 9001 -c "Select * from c1";


pg_createsubscriber -D node3/ -P "host=localhost port=5432 dbname=postgres" -d postgres -d db1 -p 9001 -r -v
pg_ctl -D node3 -l log_node3 start
psql -d db1 -c "INSERT INTO c1 VALUES(3)";
psql -d db1 -p 9001 -c "SELECT * FROM c1";



#!/bin/bash

#
# This script tests a case of cascading replication.
#
# Creating system:
#   node1-(physical replication)->node2-(physical replication)->node3
#

# Stop instances
sudo pkill -9 postgres
pg_ctl stop -D node1
pg_ctl stop -D node2
pg_ctl stop -D node3

# Remove old files
rm -rf node1
rm -rf node2
rm -rf node3
rm log_node2 log_node1 log_node3

# Initialize primary
initdb -D node1

echo "wal_level = logical" >> node1/postgresql.conf
echo "max_replication_slots=10" >> node1/postgresql.conf

pg_ctl -D node1 -l log_node1 start

psql -d postgres -c "CREATE DATABASE db1";
psql -d db1 -c "CHECKPOINT";

sleep 3

# Initialize standby
pg_basebackup -h 127.0.0.1 -X stream -S slot1 -C -v -R -D node2
echo "Port=9000" >> node2/postgresql.conf

pg_ctl -D node2 -l log_node2 start

# Initialize another standby
pg_basebackup -h 127.0.0.1 -X stream -p 9000 -S slot2 -C -v -R -D node3
echo "Port=9001" >> node3/postgresql.conf
pg_ctl -D node3 -l log_node3 start

# Insert a tuple to primary
psql -d db1 -c "CREATE TABLE c1(a int)";
psql -d db1 -c "Insert into c1 Values(2)";
sleep 3

# And verify it can be seen on another standby
psql -d db1 -p 9001 -c "Select * from c1";


pg_createsubscriber -D node3/ -P "host=localhost port=5432 dbname=postgres" -d postgres -d db1 -p 9001 -r -v
pg_ctl -D node3 -l log_node3 start
psql -d db1 -c "INSERT INTO c1 VALUES(3)";
psql -d db1 -p 9001 -c "SELECT * FROM c1";





Re: speed up a logical replica setup

2024-03-04 Thread Shubham Khanna
 about to remove 
> it.
> If the logging is enabled, the information during the pg_createsubscriber will
> be available. The client log can be redirected to a file for future 
> inspection.
>
> Comments?

I applied the v25 patch and ran the 'pg_createsubscriber' command.
I was unable to execute it and got the following error:

$ ./pg_createsubscriber -D node2/ -P "host=localhost port=5432
dbname=postgres"  -d postgres -d db1 -p 9000 -r
./pg_createsubscriber: invalid option -- 'p'
pg_createsubscriber: hint: Try "pg_createsubscriber --help" for more
information.

Here, the -p option is not accepting the desired port number. Thoughts?

Thanks and Regards,
Shubham Khanna.




Re: Statistics Import and Export

2023-11-06 Thread Shubham Khanna
On Mon, Nov 6, 2023 at 4:16 PM Corey Huinker  wrote:
>>
>>
>> Yeah, that use makes sense as well, and if so then postgres_fdw would likely 
>> need to be aware of the appropriate query for several versions back - they 
>> change, not by much, but they do change. So now we'd have each query text in 
>> three places: a system view, postgres_fdw, and the bin/scripts pre-upgrade 
>> program. So I probably should consider the best way to share those in the 
>> codebase.
>>
>
> Attached is v2 of this patch.

While applying the patch, I noticed a few whitespace issues:
1) D:\Project\Postgres>git am v2-0003-Add-pg_import_rel_stats.patch
.git/rebase-apply/patch:1265: space before tab in indent.
errmsg("invalid statistics
format, stxndeprs must be array or null");
.git/rebase-apply/patch:1424: trailing whitespace.
   errmsg("invalid statistics format,
stxndistinct attnums elements must be strings, but one is %s",
.git/rebase-apply/patch:1315: new blank line at EOF.
+
warning: 3 lines add whitespace errors.
Applying: Add pg_import_rel_stats().

2) D:\Project\Postgres>git am v2-0004-Add-pg_export_stats-pg_import_stats.patch
.git/rebase-apply/patch:282: trailing whitespace.
const char *export_query_v14 =
.git/rebase-apply/patch:489: trailing whitespace.
const char *export_query_v12 =
.git/rebase-apply/patch:648: trailing whitespace.
const char *export_query_v10 =
.git/rebase-apply/patch:826: trailing whitespace.

.git/rebase-apply/patch:1142: trailing whitespace.
result = PQexec(conn,
warning: squelched 4 whitespace errors
warning: 9 lines add whitespace errors.
Applying: Add pg_export_stats, pg_import_stats.
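These are only warnings from git am, but they can be caught (or
auto-corrected) before applying. A small self-contained demonstration of the
relevant git options (the repo and file here are made up for illustration):

```shell
set -e
# Throwaway repo with a patch that introduces trailing whitespace, the
# same class of problem "git am" warned about above.
tmp=$(mktemp -d); cd "$tmp"
git init -q repo; cd repo
git config user.email demo@example.com
git config user.name demo
printf 'line\n' > f.c
git add f.c; git commit -qm init
printf 'line\nnew line with trailing space \n' > f.c
git diff > bad.patch
git checkout -q -- f.c
# --whitespace=error turns the whitespace warning into a hard failure:
if git apply --check --whitespace=error bad.patch; then
    echo "unexpectedly clean"
else
    echo "whitespace error detected"
fi
# "git am --whitespace=fix" (or git apply --whitespace=fix) would instead
# strip the trailing whitespace while applying the patch.
```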

Thanks and Regards,
Shubham Khanna.




Re: Fix output of zero privileges in psql

2023-11-07 Thread Shubham Khanna
On Wed, Nov 8, 2023 at 10:46 AM Laurenz Albe  wrote:
>
> On Sat, 2023-10-21 at 04:29 +0200, Erik Wienhold wrote:
> > The attached v3 of my initial patch
> > does that.  It also includes Laurenz' fix to no longer ignore \pset null
> > (minus the doc changes that suggest using \pset null to distinguish
> > between default and empty privileges because that's no longer needed).
>
> Thanks!
>
> I went over the patch, fixed some problems and added some more stuff from
> my patch.
>
> In particular:
>
>   --- a/doc/src/sgml/ddl.sgml
>   +++ b/doc/src/sgml/ddl.sgml
>   @@ -2353,7 +2353,9 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO 
> miriam_rw;
>  
>   If the Access privileges column is empty for a given
>   object, it means the object has default privileges (that is, its
>   -   privileges entry in the relevant system catalog is null).  Default
>   +   privileges entry in the relevant system catalog is null).  The column 
> shows
>   +   (none) for empty privileges (that is, no privileges 
> at
>   +   all, even for the object owner — a rare occurrence).  Default
>   privileges always include all privileges for the owner, and can include
>   some privileges for PUBLIC depending on the object
>   type, as explained above.  The first GRANT
>
> This description of empty privileges is smack in the middle of describing
> default privileges.  I thought that was confusing and moved it to its
> own paragraph.
>
>   --- a/src/bin/psql/describe.c
>   +++ b/src/bin/psql/describe.c
>   @@ -6718,7 +6680,13 @@ static void
>printACLColumn(PQExpBuffer buf, const char *colname)
>{
>   appendPQExpBuffer(buf,
>   - "pg_catalog.array_to_string(%s, E'\\n') AS \"%s\"",
>   + "CASE\n"
>   + "  WHEN %s IS NULL THEN ''\n"
>   + "  WHEN pg_catalog.cardinality(%s) = 0 THEN '%s'\n"
>   + "  ELSE pg_catalog.array_to_string(%s, E'\\n')\n"
>   + "END AS \"%s\"",
>   + colname,
>   + colname, gettext_noop("(none)"),
> colname, gettext_noop("Access privileges"));
>}
>
> This erroneously displays NULL as empty string and subverts my changes.
> I have removed the first branch of the CASE expression.
>
>   --- a/src/test/regress/expected/psql.out
>   +++ b/src/test/regress/expected/psql.out
>   @@ -6663,3 +6663,97 @@ DROP ROLE regress_du_role0;
>DROP ROLE regress_du_role1;
>DROP ROLE regress_du_role2;
>DROP ROLE regress_du_admin;
>   +-- Test empty privileges.
>   +BEGIN;
>   +WARNING:  there is already a transaction in progress
>
> This warning is caused by a pre-existing error in the regression test, which
> forgot to close the transaction.  I have added a COMMIT at the appropriate 
> place.
>
>   +ALTER TABLESPACE regress_tblspace OWNER TO CURRENT_USER;
>   +REVOKE ALL ON TABLESPACE regress_tblspace FROM CURRENT_USER;
>   +\db+ regress_tblspace
>   +List of tablespaces
>   +   Name   | Owner  |Location | Access 
> privileges | Options |  Size   | Description
>   
> +--++-+---+-+-+-
>   + regress_tblspace | regress_zeropriv_owner | pg_tblspc/16385 | (none)  
>   | | 0 bytes |
>   +(1 row)
>
> This test is not stable, since it contains the OID of the tablespace, which
> is different every time.
>
>   +ALTER DATABASE :"DBNAME" OWNER TO CURRENT_USER;
>   +REVOKE ALL ON DATABASE :"DBNAME" FROM CURRENT_USER, PUBLIC;
>   +\l :"DBNAME"
>   +List of databases
>   +Name| Owner  | Encoding  | Locale Provider | 
> Collate | Ctype | ICU Locale | ICU Rules | Access privileges
>   
> +++---+-+-+---++---+---
>   + regression | regress_zeropriv_owner | SQL_ASCII | libc| C 
>   | C ||   | (none)
>   +(1 row)
>
> This test is also not stable, since it depends on the locale definition
> of the regression test database.  If you use "make installcheck", that could
> be a different locale.
>
> I think that these tests are not absolutely necessary, and the other tests
> are sufficient.  Consequently, I took the simple road of removing them.
>
> I also tried to improve the commit message.
>
> Patch attached.

I tested the patch with the modified changes and it is working fine.
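For reference, the catalog distinction that the patched printACLColumn
renders can be reproduced in a scratch database (a sketch, not taken from
the patch's regression tests):

```shell
# Sketch: NULL acl = default privileges (shown as an empty column),
# empty acl array = no privileges at all (now shown as "(none)").
psql -d postgres <<'SQL'
CREATE TABLE zeropriv(a int);
-- freshly created: relacl is NULL (default privileges)
SELECT relacl IS NULL AS default_privs FROM pg_class WHERE relname = 'zeropriv';
REVOKE ALL ON zeropriv FROM CURRENT_USER, PUBLIC;
-- now relacl is an empty array: zero entries, not NULL
SELECT pg_catalog.cardinality(relacl) AS n_aces FROM pg_class WHERE relname = 'zeropriv';
DROP TABLE zeropriv;
SQL
```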

Thanks and regards,
Shubham Khanna.




Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-11-24 Thread Shubham Khanna
On Fri, Nov 24, 2023 at 6:33 PM vignesh C  wrote:
>
> Hi,
>
> Improved tab completion for "ALTER DEFAULT PRIVILEGE" and "ALTER TABLE":
> 1) GRANT, REVOKE and FOR USER keyword was not displayed in tab
> completion of alter default privileges like the below statement:
> ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
> ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
> ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM dba1;
>
> 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
> public FOR " like in below statement:
> ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
> ON TABLES TO PUBLIC;
>
> 3) "FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES
> REVOKE " like in below statement:
> alter default privileges revoke grant option for select ON tables FROM dba1;
>
> 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
> column-name SET" like in:
> ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
>
> Attached patch has the changes for the same.

+COMPLETE_WITH("ROLE", "USER");
+  /* ALTER DEFAULT PRIVILEGES REVOKE */
+  else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "REVOKE"))
+COMPLETE_WITH("SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE",
+    "REFERENCES", "TRIGGER", "CREATE", "EXECUTE", "USAGE",
+    "MAINTAIN", "ALL", "GRANT OPTION FOR");

I could not find "alter default privileges revoke maintain", should
this be removed?

Regards,
Shubham Khanna




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-11-27 Thread Shubham Khanna
On Thu, Nov 23, 2023 at 4:37 PM Dean Rasheed  wrote:
>
> On Mon, 14 Aug 2023 at 18:34, David Zhang  wrote:
> >
> > it would be great to switch the order of the 3rd and the 4th line to make a
> > better match for "CREATE" and "CREATE OR REPLACE" .
> >
>
> I took a look at this, and I think it's probably neater to keep the
> "AS SELECT" completion for CREATE [OR REPLACE] VIEW xxx WITH (*)
> separate from the already existing support for "AS SELECT" without
> WITH.
>
> A couple of other points:
>
> 1. It looks slightly neater, and works better, to complete one word at
> a time -- e.g., "WITH" then "(", instead of "WITH (", since the latter
> doesn't work if the user has already typed "WITH".
>
> 2. It should also complete with "=" after the option, where appropriate.
>
> 3. CREATE VIEW should offer "local" and "cascaded" after
> "check_option" (though there's no point in doing likewise for the
> boolean options, since they default to true, if present, and false
> otherwise).
>
> Attached is an updated patch, incorporating those comments.
>
> Barring any further comments, I think this is ready for commit.

I reviewed the given Patch and it is working fine.

Thanks and Regards,
Shubham Khanna.




Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-11-27 Thread Shubham Khanna
On Mon, Nov 27, 2023 at 9:58 PM vignesh C  wrote:
>
> On Fri, 24 Nov 2023 at 18:37, Shubham Khanna
>  wrote:
> >
> > On Fri, Nov 24, 2023 at 6:33 PM vignesh C  wrote:
> > >
> > > Hi,
> > >
> > > Improved tab completion for "ALTER DEFAULT PRIVILEGE" and "ALTER TABLE":
> > > 1) GRANT, REVOKE and FOR USER keyword was not displayed in tab
> > > completion of alter default privileges like the below statement:
> > > ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
> > > ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
> > > ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM 
> > > dba1;
> > >
> > > 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
> > > public FOR " like in below statement:
> > > ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
> > > ON TABLES TO PUBLIC;
> > >
> > > 3) "FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES
> > > REVOKE " like in below statement:
> > > alter default privileges revoke grant option for select ON tables FROM 
> > > dba1;
> > >
> > > 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
> > > column-name SET" like in:
> > > ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
> > >
> > > Attached patch has the changes for the same.
> >
> > +COMPLETE_WITH("ROLE", "USER");
> > +  /* ALTER DEFAULT PRIVILEGES REVOKE */
> > +  else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "REVOKE"))
> > +COMPLETE_WITH("SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE",
> > +    "REFERENCES", "TRIGGER", "CREATE", "EXECUTE", "USAGE",
> > +    "MAINTAIN", "ALL", "GRANT OPTION FOR");
> >
> > I could not find "alter default privileges revoke maintain", should
> > this be removed?
>
> This was reverted as part of:
> 151c22deee66a3390ca9a1c3675e29de54ae73fc.
> Revert MAINTAIN privilege and pg_maintain predefined role.
>
> This reverts the following commits: 4dbdb82513, c2122aae63,
> 5b1a879943, 9e1e9d6560, ff9618e82a, 60684dd834, 4441fc704d,
> and b5d6382496.  A role with the MAINTAIN privilege may be able to
> use search_path tricks to escalate privileges to the table owner.
> Unfortunately, it is too late in the v16 development cycle to apply
> the proposed fix, i.e., restricting search_path when running
> maintenance commands.
>
I have tested the given changes and they are working fine.

Thanks and Regards,
Shubham Khanna.




Re: EXCLUDE COLLATE in CREATE/ALTER TABLE document

2023-11-28 Thread Shubham Khanna
On Fri, Nov 17, 2023 at 4:55 AM shihao zhong  wrote:
>
> Hi Jian,
>
> Thanks for your comments, a new version is attached.
>
> Thanks,
> Shihao
>
> On Fri, Nov 10, 2023 at 9:59 AM jian he  wrote:
>>
>> On Wed, Nov 1, 2023 at 10:30 AM shihao zhong  wrote:
>> >
>> > Thank you for your feedback on my previous patch. I have fixed the issue 
>> > and attached a new patch for your review. Could you please take a look for 
>> > it if you have a sec? Thanks
>> >
>>
>> Your patch works fine. you can see it here:
>> https://cirrus-ci.com/task/6481922939944960
>> in an ideal world, since the doc is already built, we can probably
>> view it as a plain html file just click the ci test result.
>>
>> in src/sgml/ref/create_table.sgml:
>> "Each exclude_element can optionally specify an operator class and/or
>> ordering options; these are described fully under CREATE INDEX."
>>
>> You may need to update this sentence to reflect that exclude_element
>> can also optionally specify collation.

I have reviewed the changes and they look fine.

Thanks and Regards,
Shubham Khanna.




Re: Postgres Partitions Limitations (5.11.2.3)

2023-11-29 Thread Shubham Khanna
On Thu, Nov 9, 2023 at 10:00 PM shihao zhong  wrote:
>
> That looks good to me!
>
> The new status of this patch is: Ready for Committer


I have reviewed the patch and it is working fine.

Thanks and Regards,
Shubham Khanna.




Re: Various small doc improvements; plpgsql, schemas, permissions, oidvector

2023-11-30 Thread Shubham Khanna
ine) and within that new sub-session sends the SQL 
> contained within the routine to the server for execution just like any other 
> client, and therefore any object references present in that SQL need to be 
> resolved to a schema as previously discussed.  By default, upon connecting, 
> the newly created session is updated so that its settings take on the current 
> values in the parent session.  When authoring a routine this is often 
> undesirable as the behavior of the routine now depends upon an environment 
> that is not definitively known to the routine author.  Schema-qualifying 
> object references within the routine body is one tool to remove such 
> uncertainty.  Another is by using the SET clause of the relevant CREATE SQL 
> Command to specify what the value of important settings are to be.
>
> The key takeaway from the preceding two paragraphs is that because routines 
> are stored as text and their settings resolved at execution time, and 
> indirect server actions can invoke those routines with a pg_catalog only 
> search_path, any routine that potentially can be invoked in that manner and 
> makes use of search_path should either be modified to eliminate such use or 
> define the required search_path via the SET option during its creation or 
> replacement.
>
> 0012 - (this has changed recently too, I'm not sure how this fits within the 
> rest.  I still feel like something is missing even in my revision but not 
> sure what or if it is covered sufficiently nearby)
>
> All roles are ultimately owned and managed by the bootstrap superuser, who 
> can establish trees of groups and users upon which the object permission 
> granting system works.  By enabling the CREATEROLE attribute on a user a 
> superuser can delegate role creation to other people (it is inadvisable to 
> enable CREATEROLE on a group) who can then construct their own trees of 
> groups and users.
>
> (not sure how true this is still but something to consider in terms of big 
> picture role setups)
> It is likewise inadvisable to create multiple superusers since in practice 
> their actions in many cases can be made to look attributable to the bootstrap 
> superuser.  It is necessary to enlist services outside of PostgreSQL to 
> adequately establish auditing in a multi-superuser setup.
>
> Note my intentional use of users and groups here.  We got rid of the 
> distinction with CREATE ROLE but in terms of system administration they still 
> have, IMO, significant utility.
>
> 0013 - +1
> 0014 - +1
>
> 0015 - I'd almost rather only note in CREATE FUNCTION that PARALLEL does not 
> matter for a trigger returning function as triggers only execute in cases of 
> data writing which precludes using parallelism.  Which is indeed what the 
> documentation explicitly calls out in "When Can Parallel Query Be Used?" so 
> it isn't inference from omission.
>
> I don't have a problem saying in the trigger documentation, maybe at the very 
> end:
>
> The functions that triggers execute are more appropriately considered 
> procedures but since the later feature did not exist when triggers were 
> implemented precedent compels the dba to write their routines as functions.  
> As a consequence, function attributes such as PARALLEL, and WINDOW, are 
> possible to define on a function that is to be used as a trigger but will 
> have no effect. (though I would think at least some of these get rejected 
> outright)
>
> 0016 - not within my knowledge base
>
I reviewed the patch and found a few issues. Please have a look at them:
-v7-0002-Change-section-heading-to-better-describe-referen.patch

"Re-Using the Type of Columns and Variables" seems adequate.  Getting
something in there about declartions seems too wordy.  I thought
perhaps "Referencing" instead of "Re-Using", but "referencing" isn't
perfect and "re-using" is generic enough, shorter, and simpler to

Here 'declartions' should be replaced with 'declarations'.


-v7-0012-Explain-role-management.patch

+   The managment of most database objects is by way of granting some role

Here 'managment' should be replaced with 'management'.

-v7-0013-Hyperlink-from-CREATE-FUNCTION-reference-page-to-.patch

Is is nice to have a link in the reference material to a full discussion.

Here 'is' should be removed.

-v7-0015-Trigger-authors-need-not-worry-about-parallelism.patch

Plus, this patch adds an index entry so the new verbage is easy to find
for those who do investigate.

Here 'verbage' should be replaced with 'verbiage'.

-v7-0016-Predicate-locks-are-held-per-cluster-not-per-data.patch

This is a significant corner case and so should be documented.  It is
also somewhat suprising since the databases within a cluster are
otherwise isolated, at least from the user's perspective.

Here 'suprising' should be replaced with 'surprising'.

Predicate locks are held per-cluster, not per database.
+ This means that serializeable transactions in one database can have
+ effects in another.
+ Long running serializeable transactions, as might occur accidentally
+ when
+ pagination
+ halts psql output, can have
+ significant inter-database effects.
+ These include exhausting available predicate locks and
+ cluster-wide WAL checkpoint delay.
+ When making use of serializeable transactions consider having
+ separate clusters for production and non-production use.

Here 'serializeable' should be replaced with 'serializable'.

Thanks and Regards,
Shubham Khanna.




Re: SET ROLE documentation improvement

2023-12-04 Thread Shubham Khanna
On Fri, Nov 10, 2023 at 11:11 PM Nathan Bossart
 wrote:
>
> On Tue, Sep 26, 2023 at 08:33:25AM -0700, Yurii Rashkovskii wrote:
> > This is a good start, indeed. I've amended my patch to include it.
>
> Thanks for the new patch.
>
> Looking again, I'm kind of hesitant to add too much qualification to this
> note about losing superuser privileges.  If we changed it to
>
> Note that when a superuser chooses to SET ROLE to a non-superuser 
> role,
> they lose their superuser privileges, except for the privilege to
> change to another role again using SET ROLE or RESET ROLE.
>
> it almost seems to imply that a non-superuser role could obtain the ability
> to switch to any role if they first SET ROLE to a superuser.  In practice,
> that's true because they could just give the session role SUPERUSER, but I
> don't think that's the intent of this section.
>
> I thought about changing it to something like
>
> Note that when a superuser chooses to SET ROLE to a non-superuser 
> role,
> they lose their superuser privileges.  However, if the current session
> user is a superuser, they retain the ability to set the current user
> identifier to any role via SET ROLE and RESET ROLE.
>
> but it seemed weird to me to single out superusers here when it's always
> true that the current session user retains the ability to SET ROLE to any
> role they have the SET option on.  That is already covered above in the
> "Description" section, so I don't really see the need to belabor the point
> by adding qualifications to the "Notes" section.  ISTM the point of these
> couple of paragraphs in the "Notes" section is to explain the effects on
> privileges for schemas, tables, etc.
>
> I still think we should update the existing note about privileges for
> SET/RESET ROLE to something like the following:
>
> diff --git a/doc/src/sgml/ref/set_role.sgml b/doc/src/sgml/ref/set_role.sgml
> index 13bad1bf66..c91a95f5af 100644
> --- a/doc/src/sgml/ref/set_role.sgml
> +++ b/doc/src/sgml/ref/set_role.sgml
> @@ -41,8 +41,10 @@ RESET ROLE
>
>
>
> -   The specified role_name
> -   must be a role that the current session user is a member of.
> +   The current session user must have the SET for the
> +   specified role_name, either
> +   directly or indirectly via a chain of memberships with the
> +   SET option.
> (If the session user is a superuser, any role can be selected.)
>
>
> --
I have reviewed the patch. It applies neatly without any issues. The
documentation build was successful and there was no spell-check issue
either. I did not find any issues. The patch looks good to me.

Thanks and Regards,
Shubham Khanna.




Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2023-12-04 Thread Shubham Khanna
On Tue, Dec 5, 2023 at 10:51 AM James Coleman  wrote:
>
> Hello,
>
> While working on my talk for PGConf.NYC next week I came across this
> bullet in the docs on heap only tuples:
>
> > Old versions of updated rows can be completely removed during normal
> > operation, including SELECTs, instead of requiring periodic vacuum
> > operations. (This is possible because indexes do not reference their page
> > item identifiers.)
>
> But when a HOT update happens the entry in an (logically unchanged)
> index still points to the original heap tid, and that line item is
> updated with a pointer to the new line pointer in the same page.
>
> Assuming I'm understanding this correctly, attached is a patch
> correcting the description.
>
I have reviewed the patch. It applies neatly without any issues. The
documentation build was successful and there was no spell-check issue
either. I did not find any issues. The patch looks good to me.

Thanks and Regards,
Shubham Khanna.




Re: Remove MSVC scripts from the tree

2023-12-05 Thread Shubham Khanna
ff should be discussed in a separate patch.
>
> > Except for openssl, where the link is somewhat valuable, the rest don't 
> > really
> > seem to be specific to windows.
>
> Yeah, these are historic.  Still they can be useful for the Visual
> builds in some cases, I guess?  I am not sure if it's worth pushing
> these dependencies to the main meson page, somewhat polluting it for
> references that most people don't really care about.  Anyway, I'm
> tempted to be less ambitious in a first step and just move that in the
> compatibility section.
>
> >> +   
> >> +Special Considerations for 64-Bit Windows
> >> +
> >> + PostgreSQL will only build for the x64 architecture on 64-bit 
> >> Windows.
> >> +
> >> +
> >> + Mixing 32- and 64-bit versions in the same build tree is not 
> >> supported.
> >> + The build system will automatically detect if it's running in a 32- 
> >> or
> >> + 64-bit environment, and build PostgreSQL accordingly. For this 
> >> reason, it
> >> + is important to start the correct command prompt before building.
> >> +
> >
> >> Isn't this directly contradicting the earlier
> >> +The native Windows port requires a 32 or 64-bit version of Windows
> >> +2000 or later. Earlier operating systems do
> > ?
>
> How it that?  Mixing 32b and 64b libraries is not related to the
> minimal runtime version.  This is just telling to not mix both.
> --
>
The patch is not applying. Please share a rebased version. Please find
the error below:

D:\Project\Postgres>git am D:\Project\Patch\v5-0001-Remove-MSVC-scripts.patch
error: patch failed: doc/src/sgml/filelist.sgml:38
error: doc/src/sgml/filelist.sgml: patch does not apply
error: patch failed: src/tools/msvc/Mkvcbuild.pm:1
error: src/tools/msvc/Mkvcbuild.pm: patch does not apply
error: patch failed: src/tools/msvc/Solution.pm:1
error: src/tools/msvc/Solution.pm: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: Remove MSVC scripts
Patch failed at 0001 Remove MSVC scripts
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Thanks and Regards,
Shubham Khanna.




Re: Adding a pg_get_owned_sequence function?

2023-12-08 Thread Shubham Khanna
On Fri, Dec 8, 2023 at 3:43 PM Dagfinn Ilmari Mannsåker
 wrote:
>
> Hi hackers,
>
> I've always been annoyed by the fact that pg_get_serial_sequence takes
> the table and returns the sequence as strings rather than regclass. And
> since identity columns were added, the name is misleading as well (which
> is even acknowledged in the docs, together with a suggestion for a
> better name).
>
> So, instead of making excuses in the documentation, I thought why not
> add a new function which addresses all of these issues, and document the
> old one as a backward-compatibility wrapper?
>
> Please see the attached patch for my stab at this.
>

I reviewed the patch and the compilation looks fine. I tested various
scenarios and did not find any issues. I also ran 'make check' and
'make check-world', and all the test cases passed successfully. I
found a small typo; please have a look at it:

This is the name the docs say `pg_get_serial_sequence` sholud have
had, and gives us the opportunity to change the return and table
argument types to `regclass` and the column argument to `name`,
instead of using `text` everywhere.  This matches what's in catalogs,
and requires less explaining than the rules for
`pg_get_serial_sequence`.

Here 'sholud' should be 'should'.
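For context, a rough sketch of the difference the patch aims at (the
proposed function name and signature are taken from this thread and may
differ in the final committed version):

```sql
-- Existing function: takes and returns text, and despite its name also
-- covers identity columns.
SELECT pg_get_serial_sequence('public.tbl', 'id');            -- returns text

-- Proposed replacement per this thread: regclass for the table and return
-- value, name for the column, matching what is stored in the catalogs.
SELECT pg_get_owned_sequence('public.tbl'::regclass, 'id');   -- returns regclass
```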

Thanks and Regards,
Shubham Khanna.




Re: [Proposal] Add foreign-server health checks infrastructure

2023-12-11 Thread Shubham Khanna
On Mon, Dec 11, 2023 at 2:08 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Fujii-san, Tom,
>
> Thank you for giving a suggestion! PSA new version.
>
> > Regarding 0001 patch, on second thought, to me, it seems odd to expose
> > a function that doesn't have anything to directly do with PostgreSQL
> > as a libpq function. The function simply calls poll() on the socket
> > with POLLRDHUP if it is supported. While it is certainly convenient to
> > have this function, I'm not sure that it fits within the scope of libpq.
> > Thought?
>
> Current style is motivated by Onder [1] and later discussions. I thought it 
> might
> be useful for other developers, but OK, I can remove changes on libpq modules.
> Horiguchi-san has suggested [2] that it might be overkill to use 
> WaitEventSet()
> mechanism, so I kept using poll().
> I reused the same naming as previous version because they actually do 
> something
> Like libpq, but better naming is very welcome.
>
> > Regarding 0002 patch, the behavior of 
> > postgres_fdw_verify_connection_states()
> > in regards to multiple connections using different user mappings seems
> > not well documented. The function seems to return false if either of
> > those connections has been closed.
>
> I did not considered the situation because I have not came up with the 
> situation
> that only one of connections to the same foreign server is broken.
>
> > This behavior means that it's difficult to identify which connection
> > has been closed when there are multiple ones to the given server.
> > To make it easier to identify that, it could be helpful to extend
> > the postgres_fdw_verify_connection_states() function so that it accepts
> > a unique connection as an input instead of just the server name.
> > One suggestion is to extend the function so that it accepts
> > both the server name and the user name used for the connection,
> > and checks the specified connection. If only the server name is specified,
> > the function should check all connections to the server and return false
> > if any of them are closed. This would be helpful since there is typically
> > only one connection to the server in most cases.
>
> Just to confirm, your point "user name" means a local user, right?
> I made a patch for addressing them.
>
> > Additionally, it would be helpful to extend the 
> > postgres_fdw_get_connections()
> > function to also return the user name used for each connection,
> > as currently there is no straightforward way to identify it.
>
> Added, See 0003. IIUC there is no good way to extract user mapping from its 
> OID, so I have
> added an function to do that and used it.
>
> > The function name "postgres_fdw_verify_connection_states" may seem
> > unnecessarily long to me. A simpler name like
> > "postgres_fdw_verify_connection" may be enough?
>
> Renamed.
>
> > The patch may not be ready for commit due to the review comments,
> > and with the feature freeze approaching in a few days,
> > it may not be possible to include this feature in v16.
>
> It is sad for me, but it is more important for PostgreSQL to add nicer codes.
> I changed status to "Needs review" again.

I am unable to apply the latest patch,
"v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch",
for review. Here is the error I am facing:
D:\Project\Postgres>git am
D:\Project\Patch\v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
error: patch failed: contrib/postgres_fdw/connection.c:117
error: contrib/postgres_fdw/connection.c: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: postgres_fdw: add postgres_fdw_verify_connection variants
Patch failed at 0001 postgres_fdw: add postgres_fdw_verify_connection variants
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Please rebase and post an updated version of the patch.
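For anyone following the thread, the interface being reviewed looks
roughly like this (argument shapes are inferred from the quoted review
comments above and are subject to change before commit):

```sql
-- Check every connection to the named foreign server; per the proposal,
-- this returns false if any of those connections has been closed.
SELECT postgres_fdw_verify_connection('loopback');

-- Check only the connection established for a specific local user mapping:
SELECT postgres_fdw_verify_connection('loopback', 'local_user');
```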

Thanks and Regards,
Shubham Khanna.




Re: Some useless includes in llvmjit_inline.cpp

2023-12-13 Thread Shubham Khanna
On Mon, Dec 11, 2023 at 6:28 AM Thomas Munro  wrote:
>
> Hi,
>
> This is not exhaustive, I just noticed in passing that we don't need these.

I was able to compile the changes with "--with-llvm" successfully, and
the changes look good to me.

Thanks and Regards,
Shubham Khanna.




Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work

2023-12-26 Thread Shubham Khanna
On Tue, Dec 26, 2023 at 3:02 PM Japin Li  wrote:
>
>
> Hi hacker,
>
> As $subject detailed, the tab-complete cannot work such as:
>
>CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=6543' \t
>
> It seems that the get_previous_words() could not parse the single quote.
>
> OTOH, it works for CREATE SUBSCRIPTION sub CONNECTION xx \t, should we fix it?
>
I reviewed the Patch and it looks fine to me.

Thanks and Regards,
Shubham Khanna.




Re: warn if GUC set to an invalid shared library

2023-12-27 Thread Shubham Khanna
On Thu, Dec 28, 2023 at 10:54 AM Justin Pryzby  wrote:
>
> On Sat, Oct 29, 2022 at 10:40 AM Justin Pryzby  wrote:
> > > > > On Fri, Sep 02, 2022 at 05:24:58PM -0500, Justin Pryzby wrote:
> > > > > > It caused no issue when I changed:
> > > > > >
> > > > > > /* Check that it's acceptable for the 
> > > > > > indicated parameter */
> > > > > > if (!parse_and_validate_value(record, name, 
> > > > > > value,
> > > > > > - PGC_S_FILE, 
> > > > > > ERROR,
> > > > > > + PGC_S_TEST, 
> > > > > > ERROR,
> > > > > >   &newval, 
> > > > > > &newextra))
> > > > > >
> > > > > > I'm not sure where to go from here.
> > > > >
> > > > > I'm hoping for some guidance ; this simple change may be naive, but 
> > > > > I'm not
> > > > > sure what a wider change would look like.
>
> I'm still hoping.
>
> > > PGC_S_TEST is a better fit, so my question is whether it's really that
> > > simple ?
> >
> > I've added the trivial change as 0001 and re-opened the patch (which ended
> > up in January's CF)
> >
> > If for some reason it's not really as simple as that, then 001 will
> > serve as a "straw-man patch" hoping to elicit discussion on that point.
>
> > From defdb57fe0ec373c1eea8df42f0e1831b3f9c3cc Mon Sep 17 00:00:00 2001
> > From: Justin Pryzby 
> > Date: Fri, 22 Jul 2022 15:52:11 -0500
> > Subject: [PATCH v6 1/4] WIP: test GUCs from ALTER SYSTEM as PGC_S_TEST not
> >  FILE
> >
> > WIP: ALTER SYSTEM should use PGC_S_TEST rather than PGC_S_FILE
> >
> > Since the value didn't come from a file.  Or maybe we should have
> > another PGC_S_ value for this, or a flag for 'is a test'.
> > ---
> >  src/backend/utils/misc/guc.c | 2 +-
> >  src/include/utils/guc.h  | 1 +
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> > index 6f21752b844..ae8810591d6 100644
> > --- a/src/backend/utils/misc/guc.c
> > +++ b/src/backend/utils/misc/guc.c
> > @@ -4435,7 +4435,7 @@ AlterSystemSetConfigFile(AlterSystemStmt 
> > *altersysstmt)
> >
> >   /* Check that it's acceptable for the indicated 
> > parameter */
> >   if (!parse_and_validate_value(record, name, value,
> > -   
> > PGC_S_FILE, ERROR,
> > +   
> > PGC_S_TEST, ERROR,
> > 
> > &newval, &newextra))
> >   ereport(ERROR,
> >   
> > (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>
> This is rebased over my own patch to enable checks for
> REGRESSION_TEST_NAME_RESTRICTIONS.
>
While reviewing the patch, I found a minor issue: the patch does not
apply on the current HEAD. Please provide an updated version of the
patch.

Thanks and Regards,
Shubham Khanna.




Re: Some revises in adding sorting path

2023-12-28 Thread Shubham Khanna
On Thu, Dec 28, 2023 at 4:00 PM Richard Guo  wrote:
>
>
> On Wed, Mar 29, 2023 at 4:00 AM David Rowley  wrote:
>>
>> If you write some tests for this code, it will be useful to prove that
>> it actually does something, and also that it does not break again in
>> the future. I don't really want to just blindly copy the pattern used
>> in 3c6fc5820 for creating incremental sort paths if it's not useful
>> here. It would be good to see tests that make an Incremental Sort path
>> using the code you're changing.
>
>
> Thanks for the suggestion.  I've managed to come up with a query that
> gets the codes being changed here to perform an incremental sort.
>
> set min_parallel_index_scan_size to 0;
> set enable_seqscan to off;
>
> Without making those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, 
> parallel_safe_volatile(thousand);
>   QUERY PLAN
> --
>  Incremental Sort
>Sort Key: hundred, (parallel_safe_volatile(thousand))
>Presorted Key: hundred
>->  Gather Merge
>  Workers Planned: 3
>  ->  Parallel Index Scan using tenk1_hundred on tenk1
>Filter: (four = 2)
> (7 rows)
>
> and with those parallel paths:
>
> explain (costs off)
> select * from tenk1 where four = 2 order by four, hundred, 
> parallel_safe_volatile(thousand);
>   QUERY PLAN
> ---
>  Gather Merge
>Workers Planned: 3
>->  Incremental Sort
>  Sort Key: hundred, (parallel_safe_volatile(thousand))
>  Presorted Key: hundred
>  ->  Parallel Index Scan using tenk1_hundred on tenk1
>Filter: (four = 2)
> (7 rows)
>
> I've added two tests for the code changes in create_ordered_paths in the
> new patch.
>
>>
>> Same for the 0003 patch.
>
>
> For the code changes in gather_grouping_paths, I've managed to come up
> with a query that makes an explicit Sort atop cheapest partial path.
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>  QUERY PLAN
> 
>  Finalize GroupAggregate
>Group Key: twenty, (parallel_safe_volatile(two))
>->  Gather Merge
>  Workers Planned: 4
>  ->  Sort
>Sort Key: twenty, (parallel_safe_volatile(two))
>->  Partial HashAggregate
>  Group Key: twenty, parallel_safe_volatile(two)
>  ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> Without this logic the plan would look like:
>
> explain (costs off)
> select count(*) from tenk1 group by twenty, parallel_safe_volatile(two);
>  QUERY PLAN
> 
>  Finalize GroupAggregate
>Group Key: twenty, (parallel_safe_volatile(two))
>->  Sort
>  Sort Key: twenty, (parallel_safe_volatile(two))
>  ->  Gather
>Workers Planned: 4
>->  Partial HashAggregate
>  Group Key: twenty, parallel_safe_volatile(two)
>  ->  Parallel Seq Scan on tenk1
> (9 rows)
>
> This test is also added in the new patch.
>
> But I did not find a query that makes an incremental sort in this case.
> After trying for a while it seems to me that we do not need to consider
> incremental sort in this case, because for a partial path of a grouped
> or partially grouped relation, it is either unordered (HashAggregate or
> Append), or it has been ordered by the group_pathkeys (GroupAggregate).
> It seems there is no case that we'd have a partial path that is
> partially sorted.
>
I reviewed the Patch and it looks fine to me.

Thanks and Regards,
Shubham Khanna.




Re: Some revises in adding sorting path

2023-12-28 Thread Shubham Khanna
On Thu, Dec 28, 2023 at 4:01 PM Shubham Khanna
 wrote:
>
> On Thu, Dec 28, 2023 at 4:00 PM Richard Guo  wrote:
> > [ full quote of Richard Guo's message trimmed; it appears verbatim upthread ]
>
Just for clarity: I am not deeply familiar with this code. For the
review, I ran 'make check' and 'make check-world', and all the test
cases passed successfully.

Thanks and Regards,
Shubham Khanna.




Re: Commitfest manager January 2024

2023-12-31 Thread Shubham Khanna
On Sat, Dec 23, 2023 at 8:53 AM vignesh C  wrote:
>
> Hi,
>
> I didn't see anyone volunteering for the January Commitfest, so I'll
> volunteer to be CF manager for January 2024 Commitfest.

I can assist with the January 2024 Commitfest.

Thanks and Regards,
Shubham Khanna.




Re: Assorted typo fixes

2024-01-01 Thread Shubham Khanna
On Thu, Dec 28, 2023 at 3:21 AM Dagfinn Ilmari Mannsåker
 wrote:
>
> Hi folks,
>
> I was playing around with the `typos` tool
> (https://github.com/crate-ci/typos), and thought I'd run it on the
> posgres repo for fun.  After a bit of tweaking to get rid of most false
> positives (see separately attached .typos.toml file), it came up with a
> useful set of suggestions, some of which I applied verbatim, others
> which needed a bit more rewording.
>
> Attached is a series of patches.  The first one are what I consider
> obvious, unambiguous fixes to code comments.  The subsequent ones are
> fixes for actual code (variable, function, type names) and docs, one
> patch per class of typo.  As far as I can tell, none of the code changes
> (except the ECPG one, see below) affect anything exported, so this
> should not cause any compatibility issues for extensions.
>
> The ECPG change affects the generated C code, but from my reading of the
> callers in descriptor.c and ecpg.trailer, any code that would have
> caused it to encounter the affected enum value would fail to compile, so
> either the case is not possible, or nobody actually uses whatever syntax
> is affected (I don't know enough about ECPG to tell without spending far
> too much time digging in the code).

While reviewing the patch, I found a minor issue: the patch does not
apply on the current HEAD. Please provide an updated version of the
patch. I also found one typo:
0008-ecpg-fix-typo-in-get_dtype-return-value-for-ECPGd_co.patch
All the other enum values return a string mathing the enum label, but
this has had a trailing r since the function was added in commit
339a5bbfb17ecd171ebe076c5bf016c4e66e2c0a

Here 'mathing' should be 'matching'.

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-14 Thread Shubham Khanna
> Thanks for the updated patch, few comments:
> 1) The option name seems wrong here:
> In one place include_generated_column is specified and other place
> include_generated_columns is specified:
>
> +   else if (IsSet(supported_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> +strcmp(defel->defname,
> "include_generated_column") == 0)
> +   {
> +   if (IsSet(opts->specified_opts,
> SUBOPT_INCLUDE_GENERATED_COLUMN))
> +   errorConflictingDefElem(defel, pstate);
> +
> +   opts->specified_opts |= 
> SUBOPT_INCLUDE_GENERATED_COLUMN;
> +   opts->include_generated_column = defGetBoolean(defel);
> +   }

Fixed.

> diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
> index d453e224d9..e8ff752fd9 100644
> --- a/src/bin/psql/tab-complete.c
> +++ b/src/bin/psql/tab-complete.c
> @@ -3365,7 +3365,7 @@ psql_completion(const char *text, int start, int end)
> COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
>   "disable_on_error",
> "enabled", "failover", "origin",
>   "password_required",
> "run_as_owner", "slot_name",
> - "streaming",
> "synchronous_commit", "two_phase");
> + "streaming",
> "synchronous_commit", "two_phase","include_generated_columns");
>
> 2) This small data table need not have a primary key column as it will
> create an index and insertion will happen in the index too.
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS
> AS (a * 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');

Fixed.

> 3) Please add a test case for this:
> +  set to false. If the subscriber-side
> column is also a
> +  generated column then this option has no effect; the
> replicated data will
> +  be ignored and the subscriber column will be filled as
> normal with the
> +  subscriber-side computed or default data.

Added the required test case.

> 4) You can use a new style of ereport to remove the brackets around errcode
> 4.a)
> +   else if (!parse_bool(strVal(elem->arg),
> &data->include_generated_columns))
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
>
> 4.b) similarly here too:
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +   /*- translator: both %s are strings of the form
> "option = value" */
> +   errmsg("%s and %s are mutually
> exclusive options",
> +   "copy_data = true",
> "include_generated_column = true")));
>
> 4.c) similarly here too:
> +   if (include_generated_columns_option_given)
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("conflicting
> or redundant options")));

Fixed.

> 5) These variable names can be changed to keep it smaller, something
> like gencol or generatedcol or gencolumn, etc
> +++ b/src/include/catalog/pg_subscription.h
> @@ -98,6 +98,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>   * slots) in the upstream database are enabled
>   * to be synchronized to the standbys. */
>
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>  #ifdef CATALOG_VARLEN /* variable-length fields start here */
>   /* Connection string to the publisher */
>   text subconninfo BKI_FORCE_NOT_NULL;
> @@ -157,6 +159,7 @@ typedef struct Subscription
>   List*publications; /* List of publication names to subscribe to */
>   char*origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;

Fixed.

The attached Patch contains the suggested changes.
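To illustrate the intended usage, here is a minimal sketch (option names
follow this version of the patch and may still change; the connection
string is illustrative):

```sql
-- Publisher side: a table with a stored generated column.
CREATE TABLE gencoltable (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE PUBLICATION pub2 FOR TABLE gencoltable;

-- Subscriber side: copy_data must be false, since initial COPY of
-- generated columns is not supported by this patch.
CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=postgres host=publisher'
    PUBLICATION pub2
    WITH (include_generated_columns = true, copy_data = false);
```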

Thanks and Regards,
Shubham Khanna.


v6-0001-Enable-support-for-include_generated_columns-opti.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-14 Thread Shubham Khanna
these columns or not. The default is false.
> +   
> +  
> +
>
> 7a.
> It doesn't render properly. e.g. Should not be bold italic (probably
> the class is wrong?), because none of the nearby parameters look this
> way.
>
> ~
>
> 7b.
> The name here should NOT have hyphens. It needs underscores same as
> all other nearby protocol parameters.
>
> ~
>
> 7c.
> The description seems overly verbose.
>
> SUGGESTION
> Boolean option to enable generated columns. This option controls
> whether generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> 8.
> +
> +id="sql-createsubscription-params-with-include-generated-column">
> +include_generated_column
> (boolean)
> +
> + 
> +  Specifies whether the generated columns present in the tables
> +  associated with the subscription should be replicated. The default 
> is
> +  false.
> + 
>
> The parameter name should be plural (include_generated_columns).

Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
>
> 9.
>  #define SUBOPT_ORIGIN 0x8000
> +#define SUBOPT_INCLUDE_GENERATED_COLUMN 0x0001
>
> Should be plural COLUMNS
>
Fixed.

> 10.
> + else if (IsSet(supported_opts, SUBOPT_INCLUDE_GENERATED_COLUMN) &&
> + strcmp(defel->defname, "include_generated_column") == 0)
>
> The new subscription parameter should be plural ("include_generated_columns").

Fixed.

> 11.
> +
> + /*
> + * Do additional checking for disallowed combination when copy_data and
> + * include_generated_column are true. COPY of generated columns is
> not supported
> + * yet.
> + */
> + if (opts->copy_data && opts->include_generated_column)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + /*- translator: both %s are strings of the form "option = value" */
> + errmsg("%s and %s are mutually exclusive options",
> + "copy_data = true", "include_generated_column = true")));
> + }
>
> /combination/combinations/
>
> The parameter name should be plural in the comment and also in the
> error message.

Fixed.

> ==
> src/bin/psql/tab-complete.c
>
> 12.
>   COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> "disable_on_error", "enabled", "failover", "origin",
> "password_required", "run_as_owner", "slot_name",
> -   "streaming", "synchronous_commit", "two_phase");
> +   "streaming", "synchronous_commit", 
> "two_phase","include_generated_columns");
>
> The new param should be added in alphabetical order same as all the others.

Fixed.

> ==
> src/include/catalog/pg_subscription.h
>
> 13.
> + bool subincludegeneratedcolumn; /* True if generated columns must be
> published */
> +
>
> The field name should be plural.

Fixed.

>
> 14.
> + bool includegeneratedcolumn; /* publish generated column data */
>  } Subscription;
>
> The field name should be plural.

Fixed.

> ==
> src/include/replication/walreceiver.h
>
> 15.
>   * prepare time */
>   char*origin; /* Only publish data originating from the
>   * specified origin */
> + bool include_generated_column; /* publish generated columns */
>   } logical;
>   } proto;
>  } WalRcvStreamOptions;
>
> ~
>
> This new field name should be plural.

Fixed.

> ==
> src/test/subscription/t/011_generated.pl
>
> 16.
> +my ($cmdret, $stdout, $stderr) = $node_subscriber->psql('postgres', qq(
> + CREATE SUBSCRIPTION sub2 CONNECTION '$publisher_connstr' PUBLICATION
> pub2 WITH (include_generated_column = true)
> +));
> +ok( $stderr =~
> +   qr/copy_data = true and include_generated_column = true are
> mutually exclusive options/,
> + 'cannot use both include_generated_column and copy_data as true');
>
> Isn't this mutual exclusiveness of options something that could have
> been tested in the regress test suite instead of TAP tests? e.g. AFAIK
> you won't require a connection to test this case.


> 17. Missing test?
>
> IIUC there is a missing test scenario. You can add another subscriber
> table TAB3 which *already* has generated cols (e.g. generating
> different values to the publisher) so then you can verify they are NOT
> overwritten, even when the 'include_generated_cols' is true.
>
> ==

I have moved this test case to the regression test suite.

Patch v6-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJn6EiyAitJbbvkvVV2d45fV3Wjr2VmWFugm3RsbaU%2BRg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
> Hi Shubham, thanks for providing a patch.
> I have some comments for v6-0001.
>
> 1. create_subscription.sgml
> There is repetition of the same line.
>
> + 
> +  Specifies whether the generated columns present in the tables
> +  associated with the subscription should be replicated. If the
> +  subscriber-side column is also a generated column then this option
> +  has no effect; the replicated data will be ignored and the 
> subscriber
> +  column will be filled as normal with the subscriber-side computed 
> or
> +  default data.
> +  false.
> + 
> +
> + 
> +  This parameter can only be set true if
> copy_data is
> +  set to false. If the subscriber-side
> column is also a
> +  generated column then this option has no effect; the
> replicated data will
> +  be ignored and the subscriber column will be filled as
> normal with the
> +  subscriber-side computed or default data.
> + 
>
> ==
> 2. subscriptioncmds.c
>
> 2a. The macro name should be in uppercase. We can use a short name
> like 'SUBOPT_INCLUDE_GEN_COL'. Thought?
> +#define SUBOPT_include_generated_columns 0x0001
>
> 2b.Update macro name accordingly
> + if (IsSet(supported_opts, SUBOPT_include_generated_columns))
> + opts->include_generated_columns = false;
>
> 2c. Update macro name accordingly
> + else if (IsSet(supported_opts, SUBOPT_include_generated_columns) &&
> + strcmp(defel->defname, "include_generated_columns") == 0)
> + {
> + if (IsSet(opts->specified_opts, SUBOPT_include_generated_columns))
> + errorConflictingDefElem(defel, pstate);
> +
> + opts->specified_opts |= SUBOPT_include_generated_columns;
> + opts->include_generated_columns = defGetBoolean(defel);
> + }
>
> 2d. Update macro name accordingly
> +   SUBOPT_RUN_AS_OWNER | SUBOPT_FAILOVER | SUBOPT_ORIGIN |
> +   SUBOPT_include_generated_columns);
>
>
> ==
>
> 3. decoding_into_rel.out
>
> 3a. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will be replicated"
> +-- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
>
> 3b. In comment, I think it should be "When 'include-generated-columns'
> = '1' the generated column 'b' values will not be replicated"
> +-- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> +  data
> +
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:4
> + table public.gencoltable: INSERT: a[integer]:5
> + table public.gencoltable: INSERT: a[integer]:6
> + COMMIT
> +(5 rows)
>
> =
>
> 4. Here names for both the tests are the same. I think we should use
> different names.
>
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');

All the comments have been addressed.

The attached Patch contains all the suggested changes.
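As a quick sanity check of the mutual-exclusion rule between copy_data
and include_generated_columns discussed in this thread (a sketch; the
exact error wording comes from the patch and may change):

```sql
-- Expected to fail under this patch:
CREATE SUBSCRIPTION sub2 CONNECTION 'dbname=postgres' PUBLICATION pub2
    WITH (copy_data = true, include_generated_columns = true);
-- ERROR: copy_data = true and include_generated_columns = true
--        are mutually exclusive options
```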

Thanks and Regards,
Shubham Khanna.


v8-0001-Currently-generated-column-values-are-not-replica.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
es('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> INSERT INTO gencoltable (a) VALUES (4), (5), (6);
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
> to:
> -- When 'include-generated-columns' is not set
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> -- When 'include-generated-columns' = '1' the generated column 'b'
> values will not be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '1');
> -- When 'include-generated-columns' = '0' the generated column 'b'
> values will be replicated
> SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1',
> 'include-generated-columns', '0');
>
> 9) In commit message  the  option used is wrong
> include_generated_columns should actually be
> include-generated-columns:
> Usage from test_decoding plugin:
> SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1',
>   'include_generated_columns','1');

All the comments have been addressed.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com
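For reference, the corrected test_decoding invocation uses the
hyphenated option name, per the review comment above (a sketch assuming
a slot named 'regression_slot' already exists):

```sql
SELECT data FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL,
    'include-xids', '0', 'skip-empty-xacts', '1',
    'include-generated-columns', '1');
```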

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-19 Thread Shubham Khanna
> Not fixed as claimed. This field name ought to be plural.
>
> /subincludegencol/subincludegencols/
>
> ~~~
>
> 9.
>   char*origin; /* Only publish data originating from the
>   * specified origin */
> + bool includegencol; /* publish generated column data */
>  } Subscription;
>
> Not fixed as claimed. This field name ought to be plural.
>
> /includegencol/includegencols/
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 10.
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED)"
> +);
> +
> +$node_publisher->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 10) STORED)"
> +);
> +
>  $node_subscriber->safe_psql('postgres',
>   "CREATE TABLE tab1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 22) STORED, c int)"
>  );
>
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab2 (a int PRIMARY KEY, b int)"
> +);
> +
> +$node_subscriber->safe_psql('postgres',
> + "CREATE TABLE tab3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> + 20) STORED)"
> +);
>
> IMO the test needs lots more comments to describe what it is doing:
>
> For example, the setup deliberately has made:
> * publisher-side tab2 has generated col 'b' but subscriber-side tab2
> has a NON-generated col 'b'.
> * publisher-side tab3 has generated col 'b' but subscriber-side tab3
> has a DIFFERENT COMPUTATION generated col 'b'.
>
> So it will be better to have comments to explain all this instead of
> having to figure it out.
>
> ~~~
>
> 11.
>  # data for initial sync
>
>  $node_publisher->safe_psql('postgres',
>   "INSERT INTO tab1 (a) VALUES (1), (2), (3)");
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO tab2 (a) VALUES (1), (2), (3)");
>
>  $node_publisher->safe_psql('postgres',
> - "CREATE PUBLICATION pub1 FOR ALL TABLES");
> + "CREATE PUBLICATION pub1 FOR TABLE tab1");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub2 FOR TABLE tab2");
> +$node_publisher->safe_psql('postgres',
> + "CREATE PUBLICATION pub3 FOR TABLE tab3");
> +
>
> # Wait for initial sync of all subscriptions
> $node_subscriber->wait_for_subscription_sync;
>
> my $result = $node_subscriber->safe_psql('postgres', "SELECT a, b FROM tab1");
> is( $result, qq(1|22
> 2|44
> 3|66), 'generated columns initial sync');
>
> ~
>
> IMO (and for completeness) it would be better to INSERT data for all
> the tables and alsot to validate that tables tab2 and tab3 has zero
> rows replicated. Yes, I know there is 'copy_data=false', but it is
> just easier to see all the tables instead of guessing why some are
> omitted, and anyway this test case will be needed after the next patch
> implements the COPY support for gen-cols.
>
> ~~~
>
> 12.
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab2 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub2');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'generated columns replicated to non-generated column on subscriber');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'generated columns replicated to non-generated column on subscriber');
> +
>
> Here also I think there should be explicit comments about what these
> cases are testing, what results you are expecting, and why. The
> comments will look something like the message parameter of those
> safe_psql(...)
>
> e.g.
> # confirm generated columns ARE replicated when the subscriber-side
> column is not generated
>
> e.g.
> # confirm generated columns are NOT replicated when the
> subscriber-side column is also generated
>
> ==
>
> 99.
> Please also see my nitpicks attachment patch for various other
> cosmetic and docs problems, and apply these if you agree:
> - documentation wording/rendering
> - wrong comments
> - spacing
> - etc.

All the comments are handled.

Patch v8-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2BAi0CgtXiAga82bWpWB8fVcOWycNyJ_jqXm788v3R8rQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-22 Thread Shubham Khanna
On Thu, Jun 20, 2024 at 9:03 AM Peter Smith  wrote:
>
> Hi, here are my review comments for v8-0001.
>
> ==
> Commit message.
>
> 1.
> It seems like the patch name was accidentally omitted, so it became a
> mess when it defaulted to the 1st paragraph of the commit message.
>
> ==
> contrib/test_decoding/test_decoding.c
>
> 2.
> + data->include_generated_columns = true;
>
> I previously posted a comment [1, #4] that this should default to
> false; IMO it is unintuitive for the test_decoding to have an
> *opposite* default behaviour compared to CREATE SUBSCRIPTION.
>
> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> NITPICK - remove the inconsistent blank line in SGML
>
> ==
> src/backend/commands/subscriptioncmds.c
>
> 3.
> +#define SUBOPT_include_generated_columns 0x0001
>
> I previously posted a comment [1, #6] that this should be UPPERCASE,
> but it is not yet fixed.
>
> ==
> src/bin/psql/describe.c
>
> NITPICK - move and reword the bogus comment
>
> ~
>
> 4.
> + if (pset.sversion >= 17)
> + appendPQExpBuffer(&buf,
> + ", subincludegencols AS \"%s\"\n",
> + gettext_noop("include_generated_columns"));
>
> 4a.
> For consistency with every other parameter, that column title should
> be written in words "Include generated columns" (not
> "include_generated_columns").
>
> ~
>
> 4b.
> IMO this new column belongs with the other subscription parameter
> columns (e.g. put it ahead of the "Conninfo" column).
>
> ==
> src/test/subscription/t/011_generated.pl
>
> NITPICK - fixed a comment
>
> 5.
> IMO, it would be better for readability if all the matching CREATE
> TABLE for publisher and subscriber are kept together, instead of the
> current code which is creating all publisher tables and then creating
> all subscriber tables.
>
> ~~~
>
> 6.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> ...
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> 6a.
> These SELECT all need ORDER BY to protect against the SELECT *
> returning rows in some unexpected order.
>
> ~
>
> 6b.
> IMO there should be more comments here to explain how you can tell the
> column was NOT replicated. E.g. it is because the result value of 'b'
> is the subscriber-side computed value (which you made deliberately
> different to the publisher-side computed value).
>
> ==
>
> 99.
> Please also refer to the attached nitpicks top-up patch for minor
> cosmetic stuff.

All the comments are handled.

The attached Patch contains all the suggested changes.

Thanks and Regards,
Shubham Khanna.


v9-0001-Currently-generated-column-values-are-not-replica.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-22 Thread Shubham Khanna
On Fri, Jun 21, 2024 at 8:23 AM Peter Smith  wrote:
>
> Hi Shubham, here are some more patch v8-0001 comments that I missed yesterday.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> 1.
> Are the PRIMARY KEY qualifiers needed for the new tab2, tab3 tables? I
> don't think so.
>
> ~~~
>
> 2.
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab2");
> +is( $result, qq(4|8
> +5|10), 'confirm generated columns ARE replicated when the
> subscriber-side column is not generated');
> +
> +$node_publisher->safe_psql('postgres', "INSERT INTO tab3 VALUES (4), (5)");
> +
> +$node_publisher->wait_for_catchup('sub3');
> +
> +$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM tab3");
> +is( $result, qq(4|24
> +5|25), 'confirm generated columns are NOT replicated when the
> subscriber-side column is also generated');
> +
>
> It would be prudent to do explicit "SELECT a,b" instead of "SELECT *",
> for readability and to avoid any surprises.

Both the comments are handled.

Patch v9-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2B6kwOGmn5MsRaTmciJDxtvNsyszMoPXV62OGPGzkxrCg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-26 Thread Shubham Khanna
On Mon, Jun 24, 2024 at 8:21 AM Peter Smith  wrote:
>
> Hi, here are some patch v9-0001 comments.
>
> I saw Kuroda-san has already posted comments for this patch so there
> may be some duplication here.
>
> ==
> GENERAL
>
> 1.
> The later patches 0002 etc are checking to support only STORED
> gencols. But, doesn't that restriction belong in this patch 0001 so
> VIRTUAL columns are not decoded etc in the first place... (??)
>
> ~~~
>
> 2.
> The "Generated Columns" docs mentioned in my previous review comment
> [2] should be modified by this 0001 patch.
>
> ~~~
>
> 3.
> I think the "Message Format" page mentioned in my previous review
> comment [3] should be modified by this 0001 patch.
>
> ==
> Commit message
>
> 4.
> The patch name is still broken as previously mentioned [1, #1]
>
> ==
> doc/src/sgml/protocol.sgml
>
> 5.
> Should this docs be referring to STORED generated columns, instead of
> just generated columns?
>
> ==
> doc/src/sgml/ref/create_subscription.sgml
>
> 6.
> Should this be docs referring to STORED generated columns, instead of
> just generated columns?
>
> ==
> src/bin/pg_dump/pg_dump.c
>
> getSubscriptions:
> NITPICK - tabs
> NITPICK - patch removed a blank line it should not be touching
> NITPICK - patch altered indents it should not be touching
> NITPICK - a missing blank line that was previously present
>
> 7.
> + else
> + appendPQExpBufferStr(query,
> + " false AS subincludegencols,\n");
>
> There is an unwanted comma here.
>
> ~
>
> dumpSubscription
> NITPICK - patch altered indents it should not be touching
>
> ==
> src/bin/pg_dump/pg_dump.h
>
> NITPICK - unnecessary blank line
>
> ==
> src/bin/psql/describe.c
>
> describeSubscriptions
> NITPICK - bad indentation
>
> 8.
> In my previous review [1, #4b] I suggested this new column should be
> in a different order (e.g. adjacent to the other ones ahead of
> 'Conninfo'), but this is not yet addressed.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> NITPICK - missing space in comment
> NITPICK - misleading "because" wording in the comment
>
> ==
>
> 99.
> See also my attached nitpicks diff, for cosmetic issues. Please apply
> whatever you agree with.
>
> ==
> [1] My v8-0001 review -
> https://www.postgresql.org/message-id/CAHut%2BPujrRQ63ju8P41tBkdjkQb4X9uEdLK_Wkauxum1MVUdfA%40mail.gmail.com
> [2] 
> https://www.postgresql.org/message-id/CAHut%2BPvsRWq9t2tEErt5ZWZCVpNFVZjfZ_owqfdjOhh4yXb_3Q%40mail.gmail.com
> [3] 
> https://www.postgresql.org/message-id/CAHut%2BPsHsT3V1wQ5uoH9ynbmWn4ZQqOe34X%2Bg37LSi7sgE_i2g%40mail.gmail.com

All the comments are handled.

I have attached the updated patch v11 here in [1]. See [1] for the
changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJpS_XDkR6OrsmMZtCBZNxeYoCdENhC0%3Dbe0rLmNvhiQw%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-07-05 Thread Shubham Khanna
On Tue, Jul 2, 2024 at 10:59 AM Peter Smith  wrote:
>
> Hi Shubham,
>
> As you can see, most of my recent review comments for patch 0001 are
> only cosmetic nitpicks. But, there is still one long-unanswered design
> question from a month ago [1, #G.2]
>
> A lot of the patch code of pgoutput.c and proto.c and logicalproto.h
> is related to the introduction and passing everywhere of new
> 'include_generated_columns' function parameters. These same functions
> are also always passing "BitMapSet *columns" representing the
> publication column list.
>
> My question was about whether we can't make use of the existing BMS
> parameter instead of introducing all the new API parameters.
>
> The idea might go something like this:
>
> * If 'include_generated_columns' option is specified true and if no
> column list was already specified then perhaps the relentry->columns
> can be used for a "dummy" column list that has everything including
> all the generated columns.
>
> * By doing this:
>  -- you may be able to avoid passing the extra
> 'include_gernated_columns' everywhere
>  -- you may be able to avoid checking for generated columns deeper in
> the code (since it is already checked up-front when building the
> column list BMS)
>
> ~~
>
> I'm not saying this design idea is guaranteed to work, but it might be
> worth considering, because if it does work then there is potential to
> make the current 0001 patch significantly shorter.
>
> ==
> [1] 
> https://www.postgresql.org/message-id/CAHut%2BPsuJfcaeg6zst%3D6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng%40mail.gmail.com

I have fixed this issue in the latest Patches.

Please refer to the updated v15 Patches here in [1]. See [1] for the
changes added.
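For reference, the BMS idea — folding generated columns into an "effective" column set up front so the per-row output code needs no extra include_generated_columns parameter — can be modeled roughly like this (a toy sketch with invented names; the real code would use PostgreSQL's Bitmapset API and pg_attribute):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy one-word stand-in for PostgreSQL's Bitmapset, used only to model
 * the design idea above; attribute numbers are 1-based as in pg. */
typedef uint32_t ColSet;

#define COLBIT(attnum) ((ColSet) 1 << (attnum))

typedef struct
{
    bool        attisdropped;
    char        attgenerated;   /* 's' = stored generated column, 0 = plain */
} AttModel;

/*
 * Build the effective column set once, up front.  When no explicit
 * column list exists and include_generated_columns is set, generated
 * columns are folded into the set here, so the per-row code only has
 * to test set membership and needs no extra boolean parameter.
 */
ColSet
build_effective_columns(const AttModel *atts, int natts,
                        bool include_generated_columns)
{
    ColSet      set = 0;

    for (int attnum = 1; attnum <= natts; attnum++)
    {
        const AttModel *att = &atts[attnum - 1];

        if (att->attisdropped)
            continue;
        if (att->attgenerated && !include_generated_columns)
            continue;
        set |= COLBIT(attnum);
    }
    return set;
}
```

With such a set built once per relation cache entry, functions like logicalrep_write_tuple() would only test membership in relentry->columns.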

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2B%3Dhn--ALJQvzzu7meX3LuO3tJKppDS7eO1BGvNFYBAbg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-07-10 Thread Shubham Khanna
On Wed, Jul 10, 2024 at 4:22 AM Peter Smith  wrote:
>
> Hi Shubham/Shlok, I was thinking some more about the suggested new
> BitMapSet (BMS) idea of patch 0001 that changes the 'columns' meaning
> to include generated cols also where necessary.
>
> I feel it is a bit risky to change lots of code without being 100%
> confident it will still be in the final push. It's also going to make
> the reviewing job harder if stuff gets added and then later removed.
>
> IMO it might be better to revert all the patches (mostly 0001, but
> also parts of subsequent patches) to their pre-BMS-change ~v14* state.
> Then all the BMS "improvement" can be kept isolated in a new patch
> 0004.
>
> Some more reasons to split this off into a separate patch are:
>
> * The BMS change is essentially a redesign/cleanup of the code but is
> nothing to do with the actual *functionality* of the new "generated
> columns" feature.
>
> * Apart from the BMS change I think the rest of the patches are nearly
> stable now. So it might be good to get it all finished so the BMS
> change can be tackled separately.
>
> * By isolating the BMS change, then we will be able to see exactly
> what is the code cost/benefit (e.g. removal of redundant code versus
> adding new logic) which is part of the judgement to decide whether to
> do it this way or not.
>
> * By isolating the BMS change, then it makes it convenient for testing
> before/after in case there are any performance concerns
>
> * By isolating the BMS change, if some unexpected obstacle is
> encountered that makes it unfeasible then we can just throw away patch
> 0004 and everything else (patches 0001,0002,0003) will still be good
> to go.

As suggested, I have created a separate patch for the Bitmapset (BMS)
idea of patch 0001 that changes the 'columns' meaning to also include
generated columns where necessary.
Please refer to the updated v17 Patches here in [1]. See [1] for the
changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJ0gAUd62PvBRXCPYy2oTNZWEY-Qe8cBNzQaJPVMZCeGA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-05-08 Thread Shubham Khanna
On Wed, May 8, 2024 at 11:39 AM Rajendra Kumar Dangwal
 wrote:
>
> Hi PG Hackers.
>
> We are interested in enhancing the functionality of the pgoutput plugin by 
> adding support for generated columns.
> Could you please guide us on the necessary steps to achieve this? 
> Additionally, do you have a platform for tracking such feature requests? Any 
> insights or assistance you can provide on this matter would be greatly 
> appreciated.

The attached patch has the changes to support capturing generated
column data using the ‘pgoutput’ and ‘test_decoding’ plugins. Now if
the ‘include_generated_columns’ option is specified, the generated
column information and data will also be sent.

Usage from pgoutput plugin:
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
CREATE PUBLICATION pub1 FOR ALL TABLES;
SELECT 'init' FROM pg_create_logical_replication_slot('slot1', 'pgoutput');
SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL,
'proto_version', '1', 'publication_names', 'pub1',
'include_generated_columns', 'true');

Usage from test_decoding plugin:
SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'test_decoding');
CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS
(a * 2) STORED);
INSERT INTO gencoltable (a) VALUES (1), (2), (3);
SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL,
'include-xids', '0', 'skip-empty-xacts', '1',
'include_generated_columns', '1');

Currently it is not supported as a subscription option because table
sync for generated columns is not possible: the COPY command does not
support fetching generated column data. If this feature is required, we
can remove that limitation from COPY and then add it as a subscription
option later.
Thoughts?
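To illustrate the COPY limitation mentioned above (a sketch; the exact error wording may vary across server versions):

```sql
CREATE TABLE gencoltable (a int PRIMARY KEY,
                          b int GENERATED ALWAYS AS (a * 2) STORED);

-- Explicitly requesting the generated column fails, which is why initial
-- table sync currently has no way to copy its values:
COPY gencoltable (a, b) TO STDOUT;
-- ERROR:  column "b" is a generated column
```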

Thanks and Regards,
Shubham Khanna.


v1-0001-Support-capturing-generated-column-data-using-pgo.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Shubham Khanna
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the 
> API.

I have added the feature in the document.

> 4.
> Regarding the test_decoding plugin, it has already been able to decode the
> generated columns. So... as the first place, is the proposed option really 
> needed
> for the plugin? Why do you include it?
> If you anyway want to add the option, the default value should be on - which 
> keeps
> current behavior.

I have made the generated column option default to true for the
test_decoding plugin, so by default we will send generated column data.

> 5.
> Assuming that the feature becomes usable for logical replication. Not sure,
> should we change the protocol version at that time? Nodes prior to PG17 may
> not want to receive values for generated columns. Can we control this only
> by the option?

I verified the backward compatibility test by using the generated
column option and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer data->publish_generated_column many times. Can we store
> the value to a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =
>
> a. pg_decode_startup()
>
> ```
> +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not have underscore. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referred four times in the function.
> Can you store the value to a variable?
>
>
> c. pg_decode_change()
>
> ```
> -true);
> +        true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.


v3-0001-Support-generated-column-capturing-generated-colu.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
> 'include_generated_columns', '1'

Added the other option values to demonstrate the difference in behaviour:

> 2b.
> I think you maybe need to include more some test combinations where
> there is and isn't a COLUMN LIST, because I am not 100% sure I
> understand the current logic/expectations for all combinations.
>
> e.g. When the generated column is in a column list but
> 'publish_generated_columns' is false then what should happen? etc.
> Also if there are any special rules then those should be mentioned in
> the commit message.

Test case is added and the same is mentioned in the documentation.

> ==
> src/backend/replication/logical/proto.c
>
> 3.
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 4. logical_rep_write_tuple:
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
> That code seems confusing. Shouldn't the logic be exactly as also in
> logicalrep_write_attrs()?
>
> e.g. Shouldn't they both look like this:
>
> if (att->attisdropped)
>   continue;
>
> if (att->attgenerated && !publish_generated_column)
>   continue;
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;

Fixed.
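For clarity, the common per-attribute skip logic being suggested can be written as a standalone predicate (a simplified model with invented field names, not the actual proto.c code):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for pg_attribute plus the column-list membership
 * test; only a model of the check order discussed above. */
typedef struct
{
    bool        attisdropped;   /* column has been dropped */
    char        attgenerated;   /* non-zero when the column is generated */
    bool        in_column_list; /* member of the publication column list */
} AttModel;

/*
 * Returns true when the column should be written.  The cheap flag
 * checks (dropped, generated) come before the column-list lookup,
 * which in the real code is the more expensive test.
 */
bool
column_is_published(const AttModel *att, bool include_generated_columns)
{
    if (att->attisdropped)
        return false;

    if (att->attgenerated && !include_generated_columns)
        return false;

    if (!att->in_column_list)
        return false;

    return true;
}
```

With this ordering, logicalrep_write_tuple() and logicalrep_write_attrs() can share identical skip logic.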

> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 5.
>  static void send_relation_and_attrs(Relation relation, TransactionId xid,
>   LogicalDecodingContext *ctx,
> - Bitmapset *columns);
> + Bitmapset *columns,
> + bool publish_generated_column);
>
> Use plural. /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> 6. parse_output_parameters
>
>   bool origin_option_given = false;
> + bool generate_column_option_given = false;
>
>   data->binary = false;
>   data->streaming = LOGICALREP_STREAM_OFF;
>   data->messages = false;
>   data->two_phase = false;
> + data->publish_generated_column = false;
>
> I think the 1st var should be 'include_generated_columns_option_given'
> for consistency with the name of the actual option that was given.

Updated the name to 'include_generated_columns_option_given'

> ==
> src/include/replication/logicalproto.h
>
> 7.
> (Same as a previous review comment)
>
> For all the API changes the new parameter name should be plural.
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

> ==
> src/include/replication/pgoutput.h
>
> 8.
>   bool publish_no_origin;
> + bool publish_generated_column;
>  } PGOutputData;
>
> /publish_generated_column/publish_generated_columns/

Updated the name to 'include_generated_columns'

The attached Patch contains the suggested changes.

Thanks and Regards,
Shubham Khanna.


v4-0001-Enable-support-for-include_generated_columns-opti.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
say "ALTER
> is not allowed" (not "toggling is not allowed") and also the option
> name should be quoted as per the new guidelines for error messages.
>
> ==
> src/backend/replication/logical/proto.c

Fixed.

> 6. logicalrep_write_tuple
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> +
>
> Calling column_in_column_list() might be a more expensive operation
> than checking just generated columns flag so maybe reverse the order
> and check the generated columns first for a tiny performance gain.

Fixed.

> 7.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6

Fixed.

> 8. logicalrep_write_attrs
>
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> ditto #6

Fixed.

> 9.
> - if (att->attisdropped || att->attgenerated)
> + if (att->attisdropped)
>   continue;
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
>
> ditto #6
>
> ==
> src/include/catalog/pg_subscription.h

Fixed.

> 10. CATALOG
>
> + bool subgeneratedcolumn; /* True if generated colums must be published */
>
> /colums/columns/
>
> ==
> src/test/regress/sql/publication.sql

Fixed.

> 11.
> --- error: generated column "d" can't be in list
> +-- ok
>
>
> Maybe change "ok" to say like "ok: generated cols can be in the list too"

Fixed.

> 12.
> GENERAL - Missing CREATE SUBSCRIPTION test?
> GENERAL - Missing ALTER SUBSCRIPTION test?
>
> How come this patch adds a new CREATE SUBSCRIPTION option but does not
> seem to include any test case for that option in either the CREATE
> SUBSCRIPTION or ALTER SUBSCRIPTION regression tests?

Added the test cases for the same.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Thu, May 23, 2024 at 10:56 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I checked your patches briefly. Here are my 
> comments.
>
> 01. API
>
> Since the option for test_decoding is enabled by default, I think it should 
> be renamed.
> E.g., "skip-generated-columns" or something.

Let's keep the same name 'include_generated_columns' for both cases.

> 02. ddl.sql
>
> ```
> +-- check include-generated-columns option with generated column
> +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 
> 2) STORED);
> +INSERT INTO gencoltable (a) VALUES (1), (2), (3);
> +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 
> 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', 
> '1');
> +data
> +-
> + BEGIN
> + table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
> + table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
> + table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
> + COMMIT
> +(5 rows)
> ```
>
> We should test the non-default case, in which the generated columns are
> not included.

Added the non-default case, in which the generated columns are not included.

> 03. ddl.sql
>
> Not sure the new tests are in the correct place. Do we have to add a new
> file and move the tests to it?
> Thoughts?

Added the new tests in the 'decoding_into_rel.out' file.

> 04. protocol.sgml
>
> Please keep the format of the sgml file.

Fixed.

> 05. protocol.sgml
>
> The option is implemented as the streaming option of pgoutput plugin, so they 
> should be
> located under "Logical Streaming Replication Parameters" section.

Fixed.

> 05. AlterSubscription
>
> ```
> +   if (IsSet(opts.specified_opts, 
> SUBOPT_GENERATED_COLUMN))
> +   {
> +   ereport(ERROR,
> +   
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +errmsg("toggling 
> generated_column option is not allowed.")));
> +   }
> ```
>
> If you don't want to support the option, you can remove the
> SUBOPT_GENERATED_COLUMN
> macro from the function. But can you clarify the reason why you do not
> want to?

Fixed.

> 07. logicalrep_write_tuple
>
> ```
> -   if (!column_in_column_list(att->attnum, columns))
> +   if (!column_in_column_list(att->attnum, columns) && 
> !att->attgenerated)
> +   continue;
> +
> +   if (att->attgenerated && !publish_generated_column)
> continue;
> ```
>
> I think the changes in v2 were reverted or wrongly merged.

Fixed.

> 08. test code
>
> Can you add tests that generated columns are replicated by logical
> replication?

Added the test cases.

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Thu, May 23, 2024 at 5:56 PM vignesh C  wrote:
>
> On Thu, 23 May 2024 at 09:19, Shubham Khanna
>  wrote:
> >
> > > Dear Shubham,
> > >
> > > Thanks for creating a patch! Here are high-level comments.
> >
> > > 1.
> > > Please document the feature. If it is hard to describe, we should change 
> > > the API.
> >
> > I have added the feature in the document.
> >
> > > 4.
> > > Regarding the test_decoding plugin, it has already been able to decode the
> > > generated columns. So... as the first place, is the proposed option 
> > > really needed
> > > for the plugin? Why do you include it?
> > > If you anyway want to add the option, the default value should be on - 
> > > which keeps
> > > current behavior.
> >
> > I have made the generated column option default to true for the
> > test_decoding plugin, so by default we will send generated column data.
> >
> > > 5.
> > > Assuming that the feature becomes usable for logical replication. Not
> > > sure, should we change the protocol version at that time? Nodes prior
> > > to PG17 may not want to receive values for generated columns. Can we
> > > control this only by the option?
> >
> > I verified the backward compatibility test by using the generated
> > column option and it worked fine. I think there is no need to make any
> > further changes.
> >
> > > 7.
> > >
> > > Some functions refer data->publish_generated_column many times. Can we 
> > > store
> > > the value to a variable?
> > >
> > > Below comments are for test_decoding part, but they may be not needed.
> > >
> > > =
> > >
> > > a. pg_decode_startup()
> > >
> > > ```
> > > +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> > > ```
> > >
> > > Other options for test_decoding do not have underscore. It should be
> > > "include-generated-columns".
> > >
> > > b. pg_decode_change()
> > >
> > > data->include_generated_columns is referred four times in the function.
> > > Can you store the value to a variable?
> > >
> > >
> > > c. pg_decode_change()
> > >
> > > ```
> > > -true);
> > > +true, 
> > > data->include_generated_columns );
> > > ```
> > >
> > > Please remove the blank.
> >
> > Fixed.
> > The attached v3 Patch has the changes for the same.
>
> Few comments:
> 1) Since this is removed, tupdesc variable is not required anymore:
> +++ b/src/backend/catalog/pg_publication.c
> @@ -534,12 +534,6 @@ publication_translate_columns(Relation targetrel,
> List *columns,
> errmsg("cannot use system
> column \"%s\" in publication column list",
>colname));
>
> -   if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -   ereport(ERROR,
> -
> errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -   errmsg("cannot use generated
> column \"%s\" in publication column list",
> -  colname));

Fixed.

> 2) In test_decoding include_generated_columns option is used:
> +   else if (strcmp(elem->defname,
> "include_generated_columns") == 0)
> +   {
> +   if (elem->arg == NULL)
> +   continue;
> +   else if (!parse_bool(strVal(elem->arg),
> &data->include_generated_columns))
> +   ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("could not
> parse value \"%s\" for parameter \"%s\"",
> +
> strVal(elem->arg), elem->defname)));
> +   }
>
> In the subscription we have used generated_column; we can try to use the
> same option name in both places:
> +   else if (IsSet(supported_opts, SUBOPT_GENERATED_COLUMN) &&
> +strcmp(defel->defname,
> "generated_column") == 0)
> +   {
> +   if (IsSet(opts->specified_opts,
> SUBOPT_GENERATED_COLUMN))
> +   errorConflictingDefElem(defel, pstate);
> +
> +   opts->specified_opts |= SUBOPT_GENERATED_COLUMN;
> +   opts->generated_column = defGetBoolean(defel);
> +   }

Will update the name to 'include_generated_columns' in the next
version of the Patch.

> 3) Tab completion can be added for create subscription to include
> generated_column option

Fixed.

> 4) There are few whitespace issues while applying the patch, check for
> git diff --check

Fixed.

> 5) Add few tests for the new option added

Added new test cases.
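
For context, a test of this kind could look roughly like the following sketch
(the table name is an assumption, and the exact option spelling follows the
hyphenated form discussed above for test_decoding):

```sql
-- Sketch only: decoding a generated column through test_decoding.
CREATE TABLE gen_tab (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot', 'test_decoding');
INSERT INTO gen_tab VALUES (1);
-- Ask the plugin to include generated columns in the decoded output.
SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL,
       'include-generated-columns', '1');
SELECT pg_drop_replication_slot('regression_slot');
```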

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-06-03 Thread Shubham Khanna
On Fri, May 24, 2024 at 8:26 AM Peter Smith  wrote:
>
> Hi,
>
> Here are some review comments for the patch v3-0001.
>
> I don't think v3 addressed any of my previous review comments for
> patches v1 and v2. [1][2]
>
> So the comments below are limited only to the new code (i.e. the v3
> versus v2 differences). Meanwhile, all my previous review comments may
> still apply.

Patch v4-0001 addresses the previous review comments for patches v1
and v2. [1][2]

> ==
> GENERAL
>
> The patch applied gives whitespace warnings:
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:150:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:202:
> trailing whitespace.
>
> ../patches_misc/v3-0001-Support-generated-column-capturing-generated-colu.patch:730:
> trailing whitespace.
> warning: 3 lines add whitespace errors.

Fixed.

> ==
> contrib/test_decoding/test_decoding.c
>
> 1. pg_decode_change
>
>   MemoryContext old;
> + bool include_generated_columns;
> +
>
> I'm not really convinced this variable saves any code.

Fixed.

> ==
> doc/src/sgml/protocol.sgml
>
> 2.
> +
> +  class="parameter">include-generated-columns
> + 
> +
> +The include-generated-columns option controls whether
> generated columns should be included in the string representation of
> tuples during logical decoding in PostgreSQL. This allows users to
> customize the output format based on whether they want to include
> these columns or not.
> + 
> + 
> + 
>
> 2a.
> Something is not correct when this name has hyphens and all the nearby
> parameter names do not. Shouldn't it be all uppercase like the other
> boolean parameter?
>
> ~
>
> 2b.
> Text in the SGML file should be wrapped properly.
>
> ~
>
> 2c.
> IMO the comment can be more terse and it also needs to specify that it
> is a boolean type, and what is the default value if not passed.
>
> SUGGESTION
>
> INCLUDE_GENERATED_COLUMNS [ boolean ]
>
> If true, then generated columns should be included in the string
> representation of tuples during logical decoding in PostgreSQL. The
> default is false.

Fixed.

> ==
> src/backend/replication/logical/proto.c
>
> 3. logicalrep_write_tuple
>
> - if (!column_in_column_list(att->attnum, columns))
> + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
> + continue;
> +
> + if (att->attgenerated && !publish_generated_column)
>   continue;
>
> 3a.
> This code seems overcomplicated checking the same flag multiple times.
>
> SUGGESTION
> if (att->attgenerated)
> {
>   if (!publish_generated_column)
> continue;
> }
> else
> {
>   if (!column_in_column_list(att->attnum, columns))
> continue;
> }
>
> ~
>
> 3b.
> The same logic occurs several times in logicalrep_write_tuple

Fixed.

> 4. logicalrep_write_attrs
>
>   if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> + if (att->attgenerated && !publish_generated_column)
> + continue;
> +
>
> Shouldn't these code fragments (2x in this function) look the same as
> in logicalrep_write_tuple? See the above review comments.

Fixed.

> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 5. maybe_send_schema
>
>   TransactionId topxid = InvalidTransactionId;
> + bool publish_generated_column = data->publish_generated_column;
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->streaming).

Fixed.

> 6. pgoutput_change
> -
> + bool publish_generated_column = data->publish_generated_column;
> +
>
> I'm not convinced this saves any code, and anyway, it is not
> consistent with other fields in this function that are not extracted
> to another variable (e.g. data->binary).

Fixed.

> ==
> [1] My v1 review -
> https://www.postgresql.org/message-id/CAHut+PsuJfcaeg6zst=6pe5uyjv_uxvrhu3ck7w2ahb1uqy...@mail.gmail.com
> [2] My v2 review -
> https://www.postgresql.org/message-id/CAHut%2BPv4RpOsUgkEaXDX%3DW2rhHAsJLiMWdUrUGZOcoRHuWj5%2BQ%40mail.gmail.com

Patch v4-0001 contains all the changes required. See [1] for the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJcOsk%3Dy%2BvJ3y%2BvXhzR9ZUzUEURvS_90hQW3MNfJ5di7A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-08-21 Thread Shubham Khanna
On Fri, Aug 16, 2024 at 2:47 PM vignesh C  wrote:
>
> On Fri, 16 Aug 2024 at 10:04, Shubham Khanna
>  wrote:
> >
> > On Thu, Aug 8, 2024 at 12:43 PM Peter Smith  wrote:
> > >
> > > Hi Shubham,
> > >
> > > I think the v25-0001 patch only half-fixes the problems reported in my
> > > v24-0001 review.
> > >
> > > ~
> > >
> > > Background (from the commit message):
> > > This commit enables support for the 'include_generated_columns' option
> > > in logical replication, allowing the transmission of generated column
> > > information and data alongside regular table changes.
> > >
> > > ~
> > >
> > > The broken TAP test scenario in question is replicating from a
> > > "not-generated" column to a "generated" column. As the generated
> > > column is not on the publishing side, IMO the
> > > 'include_generated_columns' option should have zero effect here.
> > >
> > > In other words, I expect this TAP test for 'include_generated_columns
> > > = true' case should also be failing, as I wrote already yesterday:
> > >
> > > +# FIXME
> > > +# Since there is no generated column on the publishing side this should 
> > > give
> > > +# the same result as the previous test. -- e.g. something like:
> > > +# ERROR:  logical replication target relation
> > > "public.tab_nogen_to_gen" is missing
> > > +# replicated column: "b"
> >
> > I have fixed the given comments. The attached v26-0001 Patch contains
> > the required changes.
>
> Few comments:
> 1) There's no need to pass include_generated_columns in this case; we
> can retrieve it from ctx->data instead:
> @@ -749,7 +764,7 @@ maybe_send_schema(LogicalDecodingContext *ctx,
>  static void
>  send_relation_and_attrs(Relation relation, TransactionId xid,
> LogicalDecodingContext *ctx,
> -   Bitmapset *columns)
> +   Bitmapset *columns,
> bool include_generated_columns)
>  {
> TupleDesc   desc = RelationGetDescr(relation);
> int i;
> @@ -766,7 +781,10 @@ send_relation_and_attrs(Relation relation,
> TransactionId xid,
>
> 2) Commit message:
> If the subscriber-side column is also a generated column then this option
> has no effect; the replicated data will be ignored and the subscriber
> column will be filled as normal with the subscriber-side computed or
> default data.
>
> An error will occur in this case, so the message should be updated 
> accordingly.
>
> 3) The current test is structured as follows: a) Create all required
> tables b) Insert data into tables c) Create publications d) Create
> subscriptions e) Perform inserts and verify
> This approach can make reviewing and maintenance somewhat challenging.
>
> Instead, could you modify it to: a) Create the required table for a
> single test b) Insert data for this test c) Create the publication for
> this test d) Create the subscriptions for this test e) Perform inserts
> and verify f) Clean up
>
> 4) We can maintain the test as a separate 0002 patch, as it may need a
> few rounds of review and final adjustments. Once it's fully completed,
> we can merge it back in.
>
> 5) Once we create and drop publication/subscriptions for individual
> tests, we won't need such extensive configuration; we should be able
> to run them with default values:
> +$node_publisher->append_conf(
> +   'postgresql.conf',
> +   "max_wal_senders = 20
> +max_replication_slots = 20");

Fixed all the given comments. The attached patches contain the
suggested changes.

Thanks and Regards,
Shubham Khanna.


v28-0002-Tap-tests-for-include-generated-columns.patch
Description: Binary data


v28-0001-Enable-support-for-include_generated_columns-opt.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-08-21 Thread Shubham Khanna
On Mon, Aug 19, 2024 at 11:01 AM Peter Smith  wrote:
>
> Hi, Here are my review comments for v27-0001.
>
> ==
> contrib/test_decoding/expected/generated_columns.out
> contrib/test_decoding/sql/generated_columns.sql
>
> +-- By default, 'include-generated-columns' is enabled, so the values
> for the generated column 'b' will be replicated even if it is not
> explicitly specified.
>
> nit - The "default" is only like this for "test_decoding" (e.g., the
> CREATE SUBSCRIPTION option is the opposite), so let's make the comment
> clearer about that.
> nit - Use sentence case in the comments.

I have addressed all the comments in the v28-0001 patch. Please refer
to the updated v28-0001 patch in [1].

[1] 
https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-08-21 Thread Shubham Khanna
On Mon, Aug 19, 2024 at 12:40 PM Peter Smith  wrote:
>
> Hi Shubham, here are my review comments for the TAP tests patch v27-0002
>
> ==
> Commit message
>
> Tap tests for 'include-generated-columns'
>
> ~
>
> But, it's more than that-- these are the TAP tests for all
> combinations of replication related to generated columns. i.e. both
> with and without 'include_generated_columns' option enabled.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> I was mistaken, thinking that the v27-0002 had already been refactored
> according to Vignesh's last review but it is not done yet, so I am not
> going to post detailed review comments until the restructuring is
> completed.
>
> ~
>
> OTOH, there are some problems I felt have crept into v26-0001 (TAP
> test is same as v27-0002), so maybe try to also take care of them (see
> below) in v28-0002.
>
> In no particular order:
>
> * I felt it is almost useless now to have the "combo" (
> "regress_pub_combo")  publication. It used to have many tables when
> you first created it but with every version posted it is publishing
> less and less so now there are only 2 tables in it. Better to have a
> specific publication for each table now and forget about "combos"
>
> * The "TEST tab_gen_to_gen initial sync" seems to be not even checking
> the table data. Why not? e.g. Even if you expect no data, you should
> test for it.
>
> * The "TEST tab_gen_to_gen replication" seems to be not even checking
> the table data. Why not?
>
> * Multiple XXX comments like "... it needs more study to determine if
> the above result was actually correct, or a PG17 bug..." should be
> removed. AFAIK we should well understand the expected results for all
> combinations by now.
>
> * The "TEST tab_order replication" is now getting an error. Now, that
> may be the correct error for this situation, but in that case, I think
> the test is no longer testing what it was intended to test (i.e. that
> column order does not matter). Probably the table definition needs
> adjusting to make sure we are testing what we want to test, and not
> just making some random scenario "PASS".
>
> * The test "# TEST tab_alter" expected empty result also seems
> unhelpful. It might be related to the previous bullet.

I have addressed all the comments in the v28-0002 patch. Please refer
to the updated v28-0002 patch in [1].

[1] 
https://www.postgresql.org/message-id/CAHv8RjL7rkxk6qSroRPg5ZARWMdK2Nd4-QyYNeoc2vhBm3cdDg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-13 Thread Shubham Khanna
On Tue, Sep 10, 2024 at 2:51 AM Masahiko Sawada  wrote:
>
> On Mon, Sep 9, 2024 at 2:38 AM Shubham Khanna
>  wrote:
> >
> > On Thu, Aug 29, 2024 at 11:46 AM Amit Kapila  
> > wrote:
> > >
> > > On Thu, Aug 29, 2024 at 8:44 AM Masahiko Sawada  
> > > wrote:
> > > >
> > > > On Wed, Aug 28, 2024 at 1:06 AM Amit Kapila  
> > > > wrote:
> > > > >
> > > > > On Mon, May 20, 2024 at 1:49 PM Masahiko Sawada 
> > > > >  wrote:
> > > > > >
> > > > > > As Euler mentioned earlier, I think it's a decision not to replicate
> > > > > > generated columns because we don't know the target table on the
> > > > > > subscriber has the same expression and there could be locale issues
> > > > > > even if it looks the same. I can see that a benefit of this proposal
> > > > > > would be to save cost to compute generated column values if the user
> > > > > > wants the target table on the subscriber to have exactly the same 
> > > > > > data
> > > > > > as the publisher's one. Are there other benefits or use cases?
> > > > > >
> > > > >
> > > > > The cost is one but the other is the user may not want the data to be
> > > > > different based on volatile functions like timeofday()
> > > >
> > > > Shouldn't the generation expression be immutable?
> > > >
> > >
> > > Yes, I missed that point.
> > >
> > > > > or the table on
> > > > > subscriber won't have the column marked as generated.
> > > >
> > > > Yeah, it would be another use case.
> > > >
> > >
> > > Right, apart from that I am not aware of other use cases. If they
> > > have, I would request Euler or Rajendra to share any other use case.
> > >
> > > > >  Now, considering
> > > > > such use cases, is providing a subscription-level option a good idea
> > > > > as the patch is doing? I understand that this can serve the purpose
> > > > > but it could also lead to having the same behavior for all the tables
> > > > > in all the publications for a subscription which may or may not be
> > > > > what the user expects. This could lead to some performance overhead
> > > > > (due to always sending generated columns for all the tables) for cases
> > > > > where the user needs it only for a subset of tables.
> > > >
> > > > Yeah, it's a downside and I think it's less flexible. For example, if
> > > > users want to send both tables with generated columns and tables
> > > > without generated columns, they would have to create at least two
> > > > subscriptions.
> > > >
> > >
> > > Agreed and that would consume more resources.
> > >
> > > > Also, they would have to include a different set of
> > > > tables to two publications.
> > > >
> > > > >
> > > > > I think we should consider it as a table-level option while defining
> > > > > publication in some way. A few ideas could be: (a) We ask users to
> > > > > explicitly mention the generated column in the columns list while
> > > > > defining publication. This has a drawback such that users need to
> > > > > specify the column list even when all columns need to be replicated.
> > > > > (b) We can have some new syntax to indicate the same like: CREATE
> > > > > PUBLICATION pub1 FOR TABLE t1 INCLUDE GENERATED COLS, t2, t3, t4
> > > > > INCLUDE ..., t5;. I haven't analyzed the feasibility of this, so there
> > > > > could be some challenges but we can at least investigate it.
> > > >
> > > > I think we can create a publication for a single table, so what we can
> > > > do with this feature can be done also by the idea you described below.
> > > >
> > > > > Yet another idea is to keep this as a publication option
> > > > > (include_generated_columns or publish_generated_columns) similar to
> > > > > "publish_via_partition_root". Normally, "publish_via_partition_root"
> > > > > is used when tables on either side have different partitions
> > > > > hierarchies which is somewhat the case here.
> > > >
> > > > It sounds more useful to

Re: Pgoutput not capturing the generated columns

2024-07-16 Thread Shubham Khanna
On Mon, Jul 15, 2024 at 11:09 AM Peter Smith  wrote:
>
> Hi, I had a quick look at the patch v17-0004 which is the split-off
> new BMS logic.
>
> IIUC this 0004 is currently undergoing some refactoring and
> cleaning-up, so I won't comment much about it except to give the
> following observation below.
>
> ==
> src/backend/replication/logical/proto.c.
>
> I did not expect to see any code fragments that are still checking
> generated columns like below:
>
> logicalrep_write_tuple:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~~~
>
> logicalrep_write_attrs:
>
>   if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
>
> ~
> if (att->attgenerated)
>   {
> - if (!include_generated_columns)
> - continue;
>
>   if (att->attgenerated != ATTRIBUTE_GENERATED_STORED)
>   continue;
> ~~~
>
>
> AFAIK, now checking support of generated columns will be done when the
> BMS 'columns' is assigned, so the continuation code will be handled
> like this:
>
> if (!column_in_column_list(att->attnum, columns))
>   continue;
>
> ==
>
> BTW there is a subtle but significant difference in this 0004 patch.
> IOW, we are introducing a difference between the list of published
> columns VERSUS a publication column list. So please make sure that all
> code comments are adjusted appropriately so they are not misleading by
> calling these "column lists" still.
>
> BEFORE: BMS 'columns'  means "columns of the column list" or NULL if
> there was no publication column list
> AFTER: BMS 'columns' means "columns to be replicated" or NULL if all
> columns are to be replicated

I have addressed all the comments in v19-0004 patch.
Please refer to the updated v19-0004 Patch here in [1]. See [1] for
the changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2BR0cj%3Dz1bTMAgQKQWx1EKvkMEnV9QsHGvOqTdnLUQi1A%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-08-06 Thread Shubham Khanna
xit code 1
> 2024-08-05 13:25:19.933 AEST [11095] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:19.966 AEST [11095] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:19.966 AEST [11095] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:19.968 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11095) exited with exit code 1
> 2024-08-05 13:25:24.917 AEST [11225] LOG:  logical replication apply
> worker for subscription "sub1_nocopy" has started
> 2024-08-05 13:25:24.926 AEST [11225] ERROR:  logical replication
> target relation "public.tab_nogen_to_gen" is missing replicated
> column: "b"
> 2024-08-05 13:25:24.926 AEST [11225] CONTEXT:  processing remote data
> for replication origin "pg_16390" during message type "INSERT" in
> transaction 742, finished at 0/1967BB0
> 2024-08-05 13:25:24.927 AEST [20039] LOG:  background worker "logical
> replication apply worker" (PID 11225) exited with exit code 1
>
This is expected behaviour. The error message here is improved in the
0002 patch, which handles this error consistently.
Below are the logs for the same:
2024-08-07 10:47:45.977 IST [29756] LOG:  logical replication table
synchronization worker for subscription "sub1", table
"tab_nogen_to_gen" has started
2024-08-07 10:47:46.116 IST [29756] ERROR:  logical replication target
relation "public.tab_nogen_to_gen" has a generated column "b" but
corresponding column on source relation is not a generated column
The 0002 patch needs to be applied to get rid of this error.
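
For reference, the scenario producing this error can be sketched as follows
(table and connection details are assumptions, following the test discussed
above):

```sql
-- On the publisher: column "b" is a plain (not generated) column.
CREATE TABLE tab_nogen_to_gen (a int, b int);
CREATE PUBLICATION pub1 FOR TABLE tab_nogen_to_gen;

-- On the subscriber: column "b" is generated.
CREATE TABLE tab_nogen_to_gen (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
CREATE SUBSCRIPTION sub1
    CONNECTION 'dbname=postgres host=localhost port=5432'
    PUBLICATION pub1;

-- Without the 0002 patch the apply worker fails with the generic
-- "missing replicated column" error; with it, the clearer
-- generated-column error shown in the logs above is raised instead.
```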

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-18 Thread Shubham Khanna
On Thu, Oct 17, 2024 at 3:59 PM vignesh C  wrote:
>
> On Wed, 16 Oct 2024 at 23:25, Shubham Khanna
>  wrote:
> >
> > On Wed, Oct 9, 2024 at 9:08 AM vignesh C  wrote:
> > >
> > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna  
> > > wrote:
> > > >
> > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith  
> > > > wrote:
> > > > >
> > > > > Hi Shubham, here are my review comments for v36-0001.
> > > > >
> > > > > ==
> > > > > 1. General  - merge patches
> > > > >
> > > > > It is long past due when patches 0001 and 0002 should've been merged.
> > > > > AFAIK the split was only because historically these parts had
> > > > > different authors. But, keeping them separated is not helpful anymore.
> > > > >
> > > > > ==
> > > > > src/backend/catalog/pg_publication.c
> > > > >
> > > > > 2.
> > > > >  Bitmapset *
> > > > > -pub_collist_validate(Relation targetrel, List *columns)
> > > > > +pub_collist_validate(Relation targetrel, List *columns, bool 
> > > > > pubgencols)
> > > > >
> > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > > > so it should also be removed.
> > > > >
> > > > > ==
> > > > > src/backend/replication/pgoutput/pgoutput.c
> > > > >
> > > > > 3.
> > > > >   /*
> > > > > - * If the publication is FOR ALL TABLES then it is treated the same 
> > > > > as
> > > > > - * if there are no column lists (even if other publications have a
> > > > > - * list).
> > > > > + * To handle cases where the publish_generated_columns option isn't
> > > > > + * specified for all tables in a publication, we must create a column
> > > > > + * list that excludes generated columns. So, the publisher will not
> > > > > + * replicate the generated columns.
> > > > >   */
> > > > > - if (!pub->alltables)
> > > > > + if (!(pub->alltables && pub->pubgencols))
> > > > >
> > > > > I still found that comment hard to understand. Does this mean to say
> > > > > something like:
> > > > >
> > > > > --
> > > > > Process potential column lists for the following cases:
> > > > >
> > > > > a. Any publication that is not FOR ALL TABLES.
> > > > >
> > > > > b. When the publication is FOR ALL TABLES and
> > > > > 'publish_generated_columns' is false.
> > > > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > > > so all columns will be replicated by default. However, if
> > > > > 'publish_generated_columns' is set to false, column lists must still
> > > > > be created to exclude any generated columns from being published
> > > > > --
> > > > >
> > > > > ==
> > > > > src/test/regress/sql/publication.sql
> > > > >
> > > > > 4.
> > > > > +SET client_min_messages = 'WARNING';
> > > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) 
> > > > > STORED);
> > > > >
> > > > > AFAIK you don't need to keep changing 'client_min_messages',
> > > > > particularly now that you've removed the WARNING message that was
> > > > > previously emitted.
> > > > >
> > > > > ~
> > > > >
> > > > > 5.
> > > > > nit - minor comment changes.
> > > > >
> > > > > ==
> > > > > Please refer to the attachment which implements any nits from above.
> > > > >
> > > >
> > > > I have fixed all the given comments. Also, I have created a new 0003
> > > > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > > > planning to merge 0001 and 0003 patches once they will get fixed.
> > > > The attached patches contain the required changes.
> > >
> > > Few comments:
> > > 1) Since we are no longer throwing an error for generated columns, the
> > > function header comments also need to be up

Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:00 AM vignesh C  wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna  
> wrote:
> >
> > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith  wrote:
> > >
> > > Hi Shubham, here are my review comments for v36-0001.
> > >
> > > ==
> > > 1. General  - merge patches
> > >
> > > It is long past due when patches 0001 and 0002 should've been merged.
> > > AFAIK the split was only because historically these parts had
> > > different authors. But, keeping them separated is not helpful anymore.
> > >
> > > ==
> > > src/backend/catalog/pg_publication.c
> > >
> > > 2.
> > >  Bitmapset *
> > > -pub_collist_validate(Relation targetrel, List *columns)
> > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > >
> > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > so it should also be removed.
> > >
> > > ==
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 3.
> > >   /*
> > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > - * if there are no column lists (even if other publications have a
> > > - * list).
> > > + * To handle cases where the publish_generated_columns option isn't
> > > + * specified for all tables in a publication, we must create a column
> > > + * list that excludes generated columns. So, the publisher will not
> > > + * replicate the generated columns.
> > >   */
> > > - if (!pub->alltables)
> > > + if (!(pub->alltables && pub->pubgencols))
> > >
> > > I still found that comment hard to understand. Does this mean to say
> > > something like:
> > >
> > > --
> > > Process potential column lists for the following cases:
> > >
> > > a. Any publication that is not FOR ALL TABLES.
> > >
> > > b. When the publication is FOR ALL TABLES and
> > > 'publish_generated_columns' is false.
> > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > so all columns will be replicated by default. However, if
> > > 'publish_generated_columns' is set to false, column lists must still
> > > be created to exclude any generated columns from being published
> > > --
> > >
> > > ==
> > > src/test/regress/sql/publication.sql
> > >
> > > 4.
> > > +SET client_min_messages = 'WARNING';
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) 
> > > STORED);
> > >
> > > AFAIK you don't need to keep changing 'client_min_messages',
> > > particularly now that you've removed the WARNING message that was
> > > previously emitted.
> > >
> > > ~
> > >
> > > 5.
> > > nit - minor comment changes.
> > >
> > > ==
> > > Please refer to the attachment which implements any nits from above.
> > >
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP-Tests related to the '011_generated.pl' file. I am
> > planning to merge 0001 and 0003 patches once they will get fixed.
> > The attached patches contain the required changes.
>
> There is inconsistency in replication when a generated column is
> specified in the column list. The generated column data is not
> replicated during initial sync whereas it is getting replicated during
> incremental sync:
> -- publisher
> CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED)
> INSERT INTO t1 VALUES (1);
> CREATE PUBLICATION pub1 for table t1(c1, c2);
>
> --subscriber
> CREATE TABLE t1(c1 int, c2 int)
> CREATE SUBSCRIPTION sub1 connection 'dbname=postgres host=localhost
> port=5432' PUBLICATION pub1;
>
> -- Generate column data is not synced during initial sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
> (1 row)
>
> -- publisher
> INSERT INTO t1 VALUES (2);
>
> -- Whereas generated column data is synced during incremental sync
> postgres=# select * from t1;
>  c1 | c2
> ----+----
>   1 |
>   2 |  4
> (2 rows)
>

There was an issue for this scenario:
CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED)
create publication pub1 for table t1(c1, c2)

In this case included_cols was getting set to NULL.
Changed it to keep included_cols as it is, instead of replacing it with
NULL, and changed the condition to:
if (server_version >= 18)
{
  remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
  /*
   * If the column is generated and neither the generated column
   * option is specified nor it appears in the column list, we will
   * skip it.
   */
  if (remotegenlist[natt] && !has_pub_with_pubgencols && !included_cols)
  {
ExecClearTuple(slot);
continue;
  }
}

I will further think if there is a better solution for this.
Please refer to the updated v39 Patches here in [1]. See [1] for the
changes added.

[1] 
https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:13 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v37-0001.
>
> ==
> Commit message
>
> 1.
> Example usage of subscription option:
> CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns
> = true);
>
> ~
>
> This is wrong -- it's not a "subscription option". Better to just say
> "Example usage:"
>
> ~~~
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true...
>
> ~
>
> By only mentioning the "when ... is true" case this description does
> not cover the scenario when 'publish_generated_columns' is false when
> the publication column list has a generated column.
>
> ~~~
>
> 3.
> typo - /replication of generated column/replication of generated columns/
> typo - /filed/filled/
> typo - 'pg_publicataion' catalog
>
> ==
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
> 4.
> nit - missing word in a comment
>
> ~~~
>
> fetch_remote_table_info:
> 5.
> + appendStringInfo(&cmd,
>   "  FROM pg_catalog.pg_attribute a"
>   "  LEFT JOIN pg_catalog.pg_index i"
>   "   ON (i.indexrelid = pg_get_replica_identity_index(%u))"
>   " WHERE a.attnum > 0::pg_catalog.int2"
> - "   AND NOT a.attisdropped %s"
> + "   AND NOT a.attisdropped", lrel->remoteid);
> +
> + appendStringInfo(&cmd,
>   "   AND a.attrelid = %u"
>   " ORDER BY a.attnum",
> - lrel->remoteid,
> - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ?
> -   "AND a.attgenerated = ''" : ""),
>   lrel->remoteid);
>
> Version v37-0001 has removed a condition previously between these two
> appendStringInfo's. But, that now means there is no reason to keep
> these statements separated. These should be combined now to use one
> appendStringInfo.
>
> ~
>
> 6.
> + if (server_version >= 12)
> + remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull));
> +
>
> Are you sure the version check for 12 is correct? IIUC, this 5
> matches the 'attgenerated' column, but the SQL for that was
> constructed using a different condition:
> if (server_version >= 18)
>   appendStringInfo(&cmd, ", a.attgenerated != ''");
>
> It is this 12 versus 18 difference that makes me suspicious of
> a potential mistake.
>
> ~~~
>
> 7.
> + /*
> + * If the column is generated and neither the generated column option
> + * is specified nor it appears in the column list, we will skip it.
> + */
> + if (remotegenlist[natt] && !has_pub_with_pubgencols &&
> + !bms_is_member(attnum, included_cols))
> + {
> + ExecClearTuple(slot);
> + continue;
> + }
>
> 7b.
> I am also suspicious about how this condition interacts with the other
> condition (shown below) that came earlier:
> /* If the column is not in the column list, skip it. */
> if (included_cols != NULL && !bms_is_member(attnum, included_cols))
>
> Something doesn't seem right. e.g. If we can only get here by passing
> the earlier condition, then it means we already know the generated
> condition was *not* a member of a column list in which case that
> should affect this new condition and the new comment too.
>
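To reason about how these two conditions should compose, the intended skip logic can be written as a single predicate. This is only a sketch of my reading of the intended semantics (column lists take precedence), not the patch code:

```python
def publish_column(is_generated: bool, has_column_list: bool,
                   in_column_list: bool,
                   any_pub_publishes_gencols: bool) -> bool:
    """Return True if the column should be replicated."""
    # A column list, when present, is authoritative: columns outside it
    # are skipped, and columns inside it (even generated ones) are kept.
    if has_column_list:
        return in_column_list
    # With no column list, a generated column is replicated only when
    # some subscribed publication sets publish_generated_columns = true.
    if is_generated:
        return any_pub_publishes_gencols
    return True
```

Writing it out this way shows the interaction: once a column list exists, the generated-column check is irrelevant, so a combined condition that tests both at once is likely redundant or wrong.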
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> pgoutput_column_list_init:
>
> 8.
>   /*
> - * If the publication is FOR ALL TABLES then it is treated the same as
> - * if there are no column lists (even if other publications have a
> - * list).
> + * Process potential column lists for the following cases: a. Any
> + * publication that is not FOR ALL TABLES. b. When the publication is
> + * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL
> + * TABLES publication doesn't have user-defined column lists, so all
> + * columns will be replicated by default. However, if
> + * 'publish_generated_columns' is set to false, column lists must
> + * still be created to exclude any generated columns from being
> + * published.
>   */
>
> nit - please reformat this comment so the bullets are readable
>

I have fixed all the comments and posted the updated v39 patches; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
On Wed, Oct 9, 2024 at 11:52 AM vignesh C  wrote:
>
> On Tue, 8 Oct 2024 at 11:37, Shubham Khanna  
> wrote:
> >
> > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith  wrote:
> > >
> > > Hi Shubham, here are my review comments for v36-0001.
> > >
> > > ==
> > > 1. General  - merge patches
> > >
> > > It is long past due when patches 0001 and 0002 should've been merged.
> > > AFAIK the split was only because historically these parts had
> > > different authors. But, keeping them separated is not helpful anymore.
> > >
> > > ==
> > > src/backend/catalog/pg_publication.c
> > >
> > > 2.
> > >  Bitmapset *
> > > -pub_collist_validate(Relation targetrel, List *columns)
> > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols)
> > >
> > > Since you removed the WARNING, this parameter 'pubgencols' is unused
> > > so it should also be removed.
> > >
> > > ==
> > > src/backend/replication/pgoutput/pgoutput.c
> > >
> > > 3.
> > >   /*
> > > - * If the publication is FOR ALL TABLES then it is treated the same as
> > > - * if there are no column lists (even if other publications have a
> > > - * list).
> > > + * To handle cases where the publish_generated_columns option isn't
> > > + * specified for all tables in a publication, we must create a column
> > > + * list that excludes generated columns. So, the publisher will not
> > > + * replicate the generated columns.
> > >   */
> > > - if (!pub->alltables)
> > > + if (!(pub->alltables && pub->pubgencols))
> > >
> > > I still found that comment hard to understand. Does this mean to say
> > > something like:
> > >
> > > --
> > > Process potential column lists for the following cases:
> > >
> > > a. Any publication that is not FOR ALL TABLES.
> > >
> > > b. When the publication is FOR ALL TABLES and
> > > 'publish_generated_columns' is false.
> > > A FOR ALL TABLES publication doesn't have user-defined column lists,
> > > so all columns will be replicated by default. However, if
> > > 'publish_generated_columns' is set to false, column lists must still
> > > be created to exclude any generated columns from being published
> > > --
> > >
> > > ==
> > > src/test/regress/sql/publication.sql
> > >
> > > 4.
> > > +SET client_min_messages = 'WARNING';
> > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) 
> > > STORED);
> > >
> > > AFAIK you don't need to keep changing 'client_min_messages',
> > > particularly now that you've removed the WARNING message that was
> > > previously emitted.
> > >
> > > ~
> > >
> > > 5.
> > > nit - minor comment changes.
> > >
> > > ==
> > > Please refer to the attachment which implements any nits from above.
> > >
> >
> > I have fixed all the given comments. Also, I have created a new 0003
> > patch for the TAP tests related to the '011_generated.pl' file. I am
> > planning to merge the 0001 and 0003 patches once they are fixed.
> > The attached patches contain the required changes.
>
> Few comments:
> 1) I felt this change need not be part of this patch; if required, it
> can be proposed as a separate patch:
> +   if (server_version >= 15)
> {
> WalRcvExecResult *pubres;
> TupleTableSlot *tslot;
> Oid attrsRow[] = {INT2VECTOROID};
> -   StringInfoData pub_names;
> -
> -   initStringInfo(&pub_names);
> -   foreach(lc, MySubscription->publications)
> -   {
> -   if (foreach_current_index(lc) > 0)
> -   appendStringInfoString(&pub_names, ", ");
> -   appendStringInfoString(&pub_names,
> quote_literal_cstr(strVal(lfirst(lc;
> -   }
> +   StringInfo  pub_names = makeStringInfo();
>
> 2) These two statements can be combined into a single appendStringInfo:
> +   appendStringInfo(&cmd,
>  "  FROM pg_catalog.pg_attribute a"
>  "  LEFT JOIN pg_catalog.pg_index

Re: Pgoutput not capturing the generated columns

2024-10-16 Thread Shubham Khanna
, qq(1|1||2),
> + 'gen_to_nogen initial sync, when publish_generated_columns=false');
> +
> +$result = $node_subscriber->safe_psql('postgres',
> + "SELECT * FROM gen_to_nogen2 ORDER BY c");
> +is($result, qq(1|1||),
> + 'gen_to_nogen2 initial sync, when publish_generated_columns=false');
>
> IMO all the "result" queries like these ones ought to have to have a
> comment which explains the reason for the expected results. This
> review comment applies to multiple places. Please add comments to all
> of them.
>
> ~~~
>
> 7.
> +# --
> +# Testcase: Publisher replicates the column list data excluding generated
> +# columns even though publish_generated_columns option is false.
> +# --
> +
>
> 7a.
> This is the 2nd test case, but AFAICT it would be far easier to test
> this scenario just by making another table (with an appropriate column
> list) for the 1st test case.
>
> ~
>
> 7b.
> BTW, I don't understand this test at all. I thought according to the
> comment that it intended to use a publication column list with only
> normal columns in it. But that is not what the publication looks like
> here:
> + CREATE PUBLICATION pub1 FOR table nogen_to_gen, nogen_to_gen2(gen1)
> WITH (publish_generated_columns=false);
>
> Indeed, the way it is currently written I didn't see what this test is
> doing that is any different from the prior test (???)
>
> ~~~
>
> 8.
> +# --
> +# Testcase: Although publish_generated_columns is true, publisher publishes
> +# only the data of the columns specified in column list, skipping other
> +# generated/non-generated columns.
> +# --
>
> versus
>
> +# --
> +# Testcase: Publisher publishes only the data of the columns specified in
> +# column list skipping other generated/non-generated columns.
> +# --
>
> Again, I did not understand how these test cases differ from each
> other. Surely, those can be combined easily enough just by adding
> another table with a different kind of column list.
>
> ~~~
>
> 9.
> +# ------
> +# Testcase: Publisher replicates all columns if publish_generated_columns is
> +# enabled and there is no column list
> +# --
> +
>
> Here is yet another test case that AFAICT can just be combined with
> other test cases that were using publish_generated_columns=true. It
> seems all you need is one extra table with no column list. You don't
> need all the extra create/drop pub/sub overheads to test this.
>
> ==

I have fixed all the comments and posted the updated v39 patches; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-07 Thread Shubham Khanna
On Fri, Oct 4, 2024 at 9:43 AM Peter Smith  wrote:
>
> Hi Shubham, I don't have any new comments for the patch v36-0002.
>
> But, according to my records, there are multiple old comments not yet
> addressed for this patch. I am giving reminders for those below so
> they don't get accidentally overlooked. Please re-confirm and at the
> next posted version please respond individually to each of these to
> say if they are addressed or not.
>
> ==
>
> 1. General
> From review v31 [1] comment #1. Patches 0001 and 0002 should be merged.
>
> ==
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 2.
> From review v31 [1] comment #4. Make the detailed useful error message
> common if possible.
>
> ~~~

This comment is still open; I will fix it in the next version of the patches.

>
> fetch_remote_table_info:
>
> 3.
> From review v31 [1] comment #5. I was not sure if this logic is
> sophisticated enough to handle the case when the same table has
> gencols but there are multiple subscribed publications and the
> 'publish_generated_columns' parameter differs. Is this scenario
> tested?
>
> ~
>
> 4.
> + * Get column lists for each relation, and check if any of the
> + * publications have the 'publish_generated_columns' parameter enabled.
>
> From review v32 [2] comment #1. This needs some careful testing. I was
> not sure if sufficient to just check the 'publish_generated_columns'
> flag. Now that "column lists take precedence" it is quite possible for
> all publications to say 'publish_generated_columns=false', but the
> publication can still publish gencols *anyway* if they are specified
> in a column list.
>
> ==
> [1] review v31 18/9 -
> https://www.postgresql.org/message-id/flat/CAHv8Rj%2BKOoh58Uf5k2MN-%3DA3VdV60kCVKCh5ftqYxgkdxFSkqg%40mail.gmail.com#f2f3b48080f96ea45e1410f5b1cd9735
> [2] review v32 24/9 -
> https://www.postgresql.org/message-id/CAHut%2BPu7EcK_JTgWS7GzeStHk6Asb1dmEzCJU2TJf%2BW1Zy30LQ%40mail.gmail.com
>

I have fixed the comments and posted the updated v37 patches; see [1]
for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2BRnw%2B_SfSyyrvWL49AfJzx4O8YVvdU9gB%2BSQdt3%3DqF%2BA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-29 Thread Shubham Khanna
On Tue, Oct 29, 2024 at 3:18 PM vignesh C  wrote:
>
> On Tue, 29 Oct 2024 at 11:30, Peter Smith  wrote:
> >
> > Here are my review comments for patch v44-0002.
> >
> > ==
> > Commit message.
> >
> > 1.
> > The commit message is missing.
>
> This patch is now merged, so no change required.
>
> > ==
> > src/backend/replication/logical/tablesync.c
> >
> > fetch_remote_table_info:
> >
> > 2.
> > +fetch_remote_table_info(char *nspname, char *relname, LogicalRepRelation 
> > *lrel,
> > + List **qual, bool *remotegencolpresent)
> >
> > The name 'remotegencolpresent' sounds like it means a generated col is
> > present in the remote table, but don't we only care when it is being
> > published? So, would a better parameter name be more like
> > 'remote_gencol_published'?
>
> I have changed it to gencol_published based on Amit's suggestion at [1].
>
> > ~~~
> >
> > 3.
> > Would it be better to introduce a new human-readable variable like:
> > bool check_for_published_gencols = (server_version >= 18);
> >
> > because then you could use that instead of having the 18 check in
> > multiple places.
>
> I felt this is not required, so I am not making any change for this.
>
> > ~~~
> >
> > 4.
> > -   lengthof(attrRow), attrRow);
> > +   server_version >= 18 ? lengthof(attrRow) : lengthof(attrRow) -
> > 1, attrRow);
> >
> > If you wish, that length calculation could be written more concisely like:
> > lengthof(attrow) - (server_version >= 18 ? 0 : 1)
>
> I felt the current one is better, also Amit feels the same way as in
> [1]. Not making any change for this.
>
> > ~~~
> >
> > 5.
> > + if (server_version >= 18)
> > + *remotegencolpresent |= DatumGetBool(slot_getattr(slot, 5, &isnull));
> > +
> >
> > Should this also say Assert(!isnull)?
>
> Added an assert
>
> > ==
> > src/test/subscription/t/031_column_list.pl
> >
> > 6.
> > + qq(0|1),
> > + 'replication with generated columns in column list');
> >
> > Perhaps this message should be worded slightly differently, to
> > distinguish it from the "normal" replication message.
> >
> > /replication with generated columns in column list/initial replication
> > with generated columns in column list/
>
> Modified
>
> The v45 version patch attached at [2] has the changes for the same.
>
> [1] - 
> https://www.postgresql.org/message-id/CAA4eK1Lpzy3eqd2AOM%2BTXp80SFL1cCfX3cf9thjL-hJxn%2BAYGA%40mail.gmail.com
> [2] - 
> https://www.postgresql.org/message-id/CALDaNm1oc-%2Buav380Z1k6gCZY5GJn5ZYKRexwM%2BqqGiRinUS-Q%40mail.gmail.com
>

While performing the backward-compatibility test, I found that
'tablesync' is not working for the older versions, i.e., from
version 12 through version 15.
I created two nodes: the PUBLISHER on the old version and the
SUBSCRIBER on HEAD + the v45 patch for testing.
The following was done on the PUBLISHER node:
CREATE TABLE t1 (c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
INSERT INTO t1 (c1) VALUES (1), (2);
CREATE PUBLICATION pub1 for table t1;

The following was done on the SUBSCRIBER node:
CREATE TABLE t1 (c1 int, c2 int);
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname=postgres' PUBLICATION pub1;

The following error occurs repeatedly in the subscriber log files:
ERROR:  could not start initial contents copy for table "public.t1":
ERROR:  column "c2" is a generated column
DETAIL:  Generated columns cannot be used in COPY.
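For reference, the workaround under discussion is to fall back to the COPY (SELECT ...) form whenever generated columns have to be copied, since a plain column-list COPY rejects them. A hypothetical sketch of composing the command (the helper name and quoting are illustrative, not the patch code):

```python
def build_copy_command(qualified_table: str, columns: list[str],
                       copy_generated: bool) -> str:
    """Compose the COPY ... TO STDOUT command used for initial table sync."""
    collist = ", ".join(columns)
    if copy_generated:
        # 'COPY table (gencol, ...)' fails with "Generated columns cannot
        # be used in COPY", so select the columns explicitly instead.
        return f"COPY (SELECT {collist} FROM {qualified_table}) TO STDOUT"
    return f"COPY {qualified_table} ({collist}) TO STDOUT"
```

In the failing scenario above, the subscriber would need the SELECT form (or must exclude "c2" from the column list) when talking to a version-12 through version-15 publisher.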

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
On Mon, Oct 28, 2024 at 8:47 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch! I resumed reviewing the patch set.
> Here are only cosmetic comments as my rehabilitation.
>
> 01. getPublications()
>
> I feel we could follow the notation of getSubscriptions(), because the
> number of parameters has become larger. What do you think of the attached?
>

I will handle this comment in a later set of patches.

> 02. fetch_remote_table_info()
>
> ```
>   "SELECT DISTINCT"
> - "  (CASE WHEN (array_length(gpt.attrs, 1) = 
> c.relnatts)"
> - "   THEN NULL ELSE gpt.attrs END)"
> + "  (gpt.attrs)"
> ```
>
> I think there is no need to separate the lines and add brackets. How about the below?
>
> ```
>  "SELECT DISTINCT gpt.attrs"
> ```
>
Fixed this.

The v44 patches attached at [1] contain these changes.
[1] - 
https://www.postgresql.org/message-id/CAHv8RjLvr8ZxX-1TcaxrZns1nwgrVUTO_2jhDdOPys0WgrDyKQ%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-28 Thread Shubham Khanna
on
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> ~
>
>  -- error: duplicates not allowed in column list
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, a);
> -ERROR:  duplicate column "a" in publication column list
> +ERROR:  relation "testpub_tbl5" is already member of publication
> "testpub_fortable"
>
> Hmm - looks like a wrong expected result to me.
>
> probably more like this...
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 6.
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE TABLE test_gen (a int PRIMARY KEY, b int);
> +));
> +
> +$node_subscriber->safe_psql(
> + 'postgres', qq(
> + CREATE SUBSCRIPTION sub_gen CONNECTION '$publisher_connstr'
> PUBLICATION pub_gen;
> +));
>
> Should combine these.
>
> ~~~
>
> 7.
> +$node_publisher->wait_for_catchup('sub_gen');
> +
> +is( $node_subscriber->safe_psql(
> + 'postgres', "SELECT * FROM test_gen ORDER BY a"),
> + qq(1|2),
> + 'replication with generated columns in column list');
> +
>
> But, this is only testing normal replication. You should also include
> some initial table data so you can test that the initial table
> synchronization works too. Otherwise, I think this patch currently has
> no proof that the initial 'copy_data' even works at all.
>

All the agreed comments have been addressed. Please find attached the
v44 patches with the required changes.

Thanks and Regards,
Shubham Khanna.


v44-0001-Support-logical-replication-of-generated-columns.patch
Description: Binary data


v44-0002-Support-copy-generated-column-during-table-sync.patch
Description: Binary data


Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
It wasn't clear to me how these tests are related to the patch.
> Shouldn't there instead be some ALTER tests for trying to modify the
> 'publish_generate_columns' parameter?
>
> ==
> src/test/regress/expected/publication.out
> src/test/regress/sql/publication.sql
>
> 14.
> --- error: generated column "d" can't be in list
> +-- ok: generated columns can be in the list too
>  ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d);
> -ERROR:  cannot use generated column "d" in publication column list
> +WARNING:  specified generated column "d" in publication column list
> for publication with publish_generated_columns as false
>
> I think these tests for the WARNING scenario need to be a bit more
> deliberate. This seems to have happened as a side-effect. For example,
> I was expecting more testing like:
>
> Comments about various combinations to say what you are doing and what
> you are expecting:
> - gencols in column list with publish_generated_columns=false, expecting 
> WARNING
> - gencols in column list with publish_generated_columns=true, NOT
> expecting WARNING
> - gencols in column list with publish_generated_columns=true, then
> ALTER PUBLICATION setting publication_generate_columns=false,
> expecting WARNING
> - NO gencols in column list with publish_generated_columns=false, then
> ALTER PUBLICATION to add gencols to column list, expecting WARNING
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 15.
> -# TEST: Generated and dropped columns are not considered for the column list.
> +# TEST: Dropped columns are not considered for the column list.
>  # So, the publication having a column list except for those columns and a
> -# publication without any column (aka all columns as part of the columns
> +# publication without any column list (aka all columns as part of the column
>  # list) are considered to have the same column list.
>  $node_publisher->safe_psql(
>   'postgres', qq(
>   CREATE TABLE test_mix_4 (a int PRIMARY KEY, b int, c int, d int
> GENERATED ALWAYS AS (a + 1) STORED);
>   ALTER TABLE test_mix_4 DROP COLUMN c;
>
> - CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 (a, b);
> - CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4;
> + CREATE PUBLICATION pub_mix_7 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = true);
> + CREATE PUBLICATION pub_mix_8 FOR TABLE test_mix_4 WITH
> (publish_generated_columns = false);
>
> I felt the comment for this test ought to be saying something more
> about what you are doing with the 'publish_generated_columns'
> parameters and what behaviour it was expecting.
>
> ==
> Please refer to the attachment which addresses some of the nit
> comments mentioned above.
>
> ==
> [1] my review of v31-0001:
> https://www.postgresql.org/message-id/CAHut%2BPsv-neEP_ftvBUBahh%2BKCWw%2BqQMF9N3sGU3YHWPEzFH-Q%40mail.gmail.com
> [2] my review of v30-0001:
> https://www.postgresql.org/message-id/CAHut%2BPuaitgE4tu3nfaR%3DPCQEKjB%3DmpDtZ1aWkbwb%3DJZE8YvqQ%40mail.gmail.com
> [3] 
> https://www.postgresql.org/message-id/CALDaNm1c7xPBodHw6LKp9e8hvGVJHcKH%3DDHK0iXmZuXKPnxZ3Q%40mail.gmail.com
> [4] 
> https://www.postgresql.org/message-id/CAHut%2BPv45gB4cV%2BSSs6730Kb8urQyqjdZ9PBVgmpwqCycr1Ybg%40mail.gmail.com
>

I have addressed all the comments in the updated v34-0001 patch; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
On Tue, Sep 24, 2024 at 7:08 AM Peter Smith  wrote:
>
> Hi. Here are my v32-0002 review comments:
>
> ==
> src/backend/replication/logical/tablesync.c
>
> 1. fetch_remote_table_info
>
>   /*
> - * Get column lists for each relation.
> + * Get column lists for each relation, and check if any of the
> + * publications have the 'publish_generated_columns' parameter enabled.
>
> I am not 100% sure about this logic anymore. Maybe it is OK, but it
> requires careful testing because with Amit's "column lists take
> precedence" it is now possible for the publication to say
> 'publish_generated_columns=false', but the publication can still
> publish gencols *anyway* if they were specified in a column list.
>
> ~~~
>

This comment is still open; I will fix it in the next version of the patches.

> 2.
>   /*
>   * Fetch info about column lists for the relation (from all the
>   * publications).
>   */
> + StringInfo pub_names = makeStringInfo();
> +
> + get_publications_str(MySubscription->publications, pub_names, true);
>   resetStringInfo(&cmd);
>   appendStringInfo(&cmd,
> ~
>
> nit - The comment here seems misplaced.
>
> ~~~
>
> 3.
> + if (server_version >= 12)
> + {
> + has_pub_with_pubgencols = server_version >= 18 && 
> has_pub_with_pubgencols;
> +
> + if (!has_pub_with_pubgencols)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
> + }
>
> My previous review comment about this [1 #10] was:
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> nit - So the current v32 code is not what I was expecting. What I
> meant was 'has_pub_with_pubgencols' can only be true if server_version
> >= 18, so I thought there was no reason to check it again. For
> reference, I've changed it to like I meant in the nitpicks attachment.
> Please see if that works the same.
>
> ==
> [1] my review of v31-0002.
> https://www.postgresql.org/message-id/CAHut%2BPusbhvPrL1uN1TKY%3DFd4zu3h63eDebZvsF%3Duy%2BLBKTwgA%40mail.gmail.com
>

I have addressed all the comments in the updated v34-0002 patch; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-27 Thread Shubham Khanna
On Mon, Sep 23, 2024 at 6:19 PM vignesh C  wrote:
>
> On Fri, 20 Sept 2024 at 17:15, Shubham Khanna
>  wrote:
> >
> > On Wed, Sep 11, 2024 at 8:55 AM Peter Smith  wrote:
> >
> > I have fixed all the comments. The attached patches contain the desired 
> > changes.
> > Also the merging of 0001 and 0002 can be done once there are no
> > comments on the patch to help in reviewing.
>
> Few comments:
> 1) This commit message seems wrong; currently, irrespective of
> publish_generated_columns, the columns specified in the column list take
> precedence:
> When 'publish_generated_columns' is false, generated columns are not
> replicated, even when present in a PUBLICATION col-list.
>
> 2) Since we have added pubgencols to pg_publication.h, we can specify
> "Bump catversion" in the commit message.
>
> 3) In the CREATE PUBLICATION column list / publish_generated_columns
> documentation, we should mention that generated columns specified in
> the column list will be replicated irrespective of the
> publish_generated_columns option.
>
> 4) This warning should be mentioned only if publish_generated_columns is 
> false:
> if (TupleDescAttr(tupdesc, attnum - 1)->attgenerated)
> -   ereport(ERROR,
> +   ereport(WARNING,
>
> errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> -   errmsg("cannot use generated
> column \"%s\" in publication column list",
> +   errmsg("specified generated
> column \"%s\" in publication column list for publication with
> publish_generated_columns as false",
>colname));
>
> 5) These tests are not required for this feature:
> +   'ALTER PUBLICATION pub5 ADD TABLE test_table WHERE (col1 > 0);' => {
> +   create_order => 51,
> +   create_sql =>
> + 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_table WHERE (col1 > 0);',
> +   regexp => qr/^
> +   \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_table WHERE ((col1 > 0));\E
> +   /xm,
> +   like => { %full_runs, section_post_data => 1, },
> +   unlike => {
> +   exclude_dump_test_schema => 1,
> +   exclude_test_table => 1,
> +   },
> +   },
> +
> +   'ALTER PUBLICATION pub5 ADD TABLE test_second_table WHERE
> (col2 = \'test\');'
> + => {
> +   create_order => 52,
> +   create_sql =>
> + 'ALTER PUBLICATION pub5 ADD TABLE
> dump_test.test_second_table WHERE (col2 = \'test\');',
> +   regexp => qr/^
> +   \QALTER PUBLICATION pub5 ADD TABLE ONLY
> dump_test.test_second_table WHERE ((col2 = 'test'::text));\E
> +   /xm,
> +   like => { %full_runs, section_post_data => 1, },
> +   unlike => { exclude_dump_test_schema => 1, },
> + },
>

I have addressed all the comments in the updated v34-0001 patch; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjJkUdYCdK_bL3yvEV%3DzKrA2dsnZYa1VMT2H5v0%2BqbaGbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Tue, Sep 17, 2024 at 1:14 PM Peter Smith  wrote:
>
> Here are some review comments for v31-0001 (for the docs only)
>
> There may be some overlap here with some comments already made for
> v30-0001 which are not yet addressed in v31-0001.
>
> ==
> Commit message
>
> 1.
> When introducing the 'publish_generated_columns' parameter, you must
> also say this is a PUBLICATION parameter.
>
> ~~~
>
> 2.
> With this enhancement, users can now include the 'include_generated_columns'
> option when querying logical replication slots using either the pgoutput
> plugin or the test_decoding plugin. This option, when set to 'true' or '1',
> instructs the replication system to include generated column information
> and data in the replication stream.
>
> ~
>
> The above is stale information because it still refers to the old name
> 'include_generated_columns', and to test_decoding which was already
> removed in this patch.
>
> ==
> doc/src/sgml/ddl.sgml
>
> 3.
> +  Generated columns may be skipped during logical replication
> according to the
> +  CREATE PUBLICATION option
> +   linkend="sql-createpublication-params-with-include-generated-columns">
> +  publish_generated_columns.
>
> 3a.
> nit - The linkend is based on the old name instead of the new name.
>
> 3b.
> nit - Better to call this a parameter instead of an option because
> that is what the CREATE PUBLICATION docs call it.
>
> ==
> doc/src/sgml/protocol.sgml
>
> 4.
> +
> + publish_generated_columns
> +  
> +   
> +Boolean option to enable generated columns. This option controls
> +whether generated columns should be included in the string
> +representation of tuples during logical decoding in PostgreSQL.
> +   
> +  
> +
> +
>
> Is this even needed anymore? Now that the implementation is using a
> PUBLICATION parameter, isn't everything determined just by that
> parameter? I don't see the reason why a protocol change is needed
> anymore. And, if there is no protocol change needed, then this
> documentation change is also not needed.
>
> 
>
> 5.
>   
> -  Next, the following message part appears for each column included in
> -  the publication (except generated columns):
> +  Next, the following message parts appear for each column included in
> +  the publication (generated columns are excluded unless the parameter
> +  
> +  publish_generated_columns specifies 
> otherwise):
>   
>
> Like the previous comment above, I think everything is now determined
> by the PUBLICATION parameter. So maybe this should just be referring
> to that instead.
>
> ==
> doc/src/sgml/ref/create_publication.sgml
>
> 6.
> +id="sql-createpublication-params-with-include-generated-columns">
> +publish_generated_columns
> (boolean)
> +
>
> nit - the ID is based on the old parameter name.
>
> ~
>
> 7.
> + 
> +  This option is only available for replicating generated
> column data from the publisher
> +  to a regular, non-generated column in the subscriber.
> + 
>
> IMO remove this paragraph. I really don't think you should be
> mentioning the subscriber here at all. AFAIK this parameter is only
> for determining if the generated column will be published or not. What
> happens at the other end (e.g. logic whether it gets ignored or not by
> the subscriber) is more like a matrix of behaviours that could be
> documented in the "Logical Replication" section. But not here.
>
> (I removed this in my nitpicks attachment)
>
> ~~~
>
> 8.
> + 
> + This parameter can only be set true if
> copy_data is
> + set to false.
> + 
>
> IMO remove this paragraph too. The user can create a PUBLICATION
> before a SUBSCRIPTION even exists so to say it "can only be set..." is
> not correct. Sure, your patch 0001 does not support the COPY of
> generated columns but if you want to document that then it should be
> documented in the CREATE SUBSCRIBER docs. But not here.
>
> (I removed this in my nitpicks attachment)
>
> TBH, it would be better if patches 0001 and 0002 were merged then you
> can avoid all this. IIUC they were only separate in the first place
> because 2 different people wrote them. It is not making reviews easier
> with them split.
>
> ==
>
> Please see the attachment which implements some of the nits above.
>

I have addressed all the comments in the updated v32-0001 patch; see
[1] for the changes.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Tue, Sep 17, 2024 at 3:12 PM Peter Smith  wrote:
>
> Review comments for v31-0001.
>
> (I tried to give only new comments, but there might be some overlap
> with comments I previously made for v30-0001)
>
> ==
> src/backend/catalog/pg_publication.c
>
> 1.
> +
> + if (publish_generated_columns_given)
> + {
> + values[Anum_pg_publication_pubgencolumns - 1] =
> BoolGetDatum(publish_generated_columns);
> + replaces[Anum_pg_publication_pubgencolumns - 1] = true;
> + }
>
> nit - unnecessary whitespace above here.
>
> ==
> src/backend/replication/pgoutput/pgoutput.c
>
> 2. prepare_all_columns_bms
>
> + /* Iterate the cols until generated columns are found. */
> + cols = bms_add_member(cols, i + 1);
>
> How does the comment relate to the statement that follows it?
>
> ~~~
>
> 3.
> + * Skip generated column if pubgencolumns option was not
> + * specified.
>
> nit - /pubgencolumns option/publish_generated_columns parameter/
>
> ==
> src/bin/pg_dump/pg_dump.c
>
> 4.
> getPublications:
>
> nit - /i_pub_gencolumns/i_pubgencols/ (it's the same information but simpler)
>
> ==
> src/bin/pg_dump/pg_dump.h
>
> 5.
> + bool pubgencolumns;
>  } PublicationInfo;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ==
src/bin/psql/describe.c
>
> 6.
>   bool has_pubviaroot;
> + bool has_pubgencol;
>
> nit - /has_pubgencol/has_pubgencols/ (plural consistency)
>
> ==
> src/include/catalog/pg_publication.h
>
> 7.
> + /* true if generated columns data should be published */
> + bool pubgencolumns;
>  } FormData_pg_publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ~~~
>
> 8.
> + bool pubgencolumns;
>   PublicationActions pubactions;
>  } Publication;
>
> nit - /pubgencolumns/pubgencols/ (it's the same information but simpler)
>
> ==
> src/test/regress/sql/publication.sql
>
> 9.
> +-- Test the publication with or without 'PUBLISH_GENERATED_COLUMNS' parameter
> +SET client_min_messages = 'ERROR';
> +CREATE PUBLICATION pub1 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=1);
> +\dRp+ pub1
> +
> +CREATE PUBLICATION pub2 FOR ALL TABLES WITH (PUBLISH_GENERATED_COLUMNS=0);
> +\dRp+ pub2
>
> 9a.
> nit - Use lowercase for the parameters.
>
> ~
>
> 9b.
> nit - Fix the comment to say what the test is actually doing:
> "Test the publication 'publish_generated_columns' parameter enabled or 
> disabled"
>
> ==
> src/test/subscription/t/031_column_list.pl
>
> 10.
> Later I think you should add another test here to cover the scenario
> that I was discussing with Sawada-San -- e.g. when there are 2
> publications for the same table subscribed by just 1 subscription but
> having different values of the 'publish_generated_columns' for the
> publications.
>

I have addressed all the comments in the v32-0001 patch. Please refer
to [1] for the updated patch and the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-09-20 Thread Shubham Khanna
On Wed, Sep 18, 2024 at 8:58 AM Peter Smith  wrote:
>
> Hi, here are my review comments for patch v31-0002.
>
> ==
>
> 1. General.
>
> IMO patches 0001 and 0002 should be merged when next posted. IIUC the
> reason for the split was only because there were 2 different authors
> but that seems to be not relevant anymore.
>
> ==
> Commit message
>
> 2.
> When 'copy_data' is true, during the initial sync, the data is replicated from
> the publisher to the subscriber using the COPY command. The normal COPY
> command does not copy generated columns, so when 'publish_generated_columns'
> is true, we need to copy using the syntax:
> 'COPY (SELECT column_name FROM table_name) TO STDOUT'.
>
> ~
>
> 2a.
> Should clarify that 'copy_data' is a SUBSCRIPTION parameter.
>
> 2b.
> Should clarify that 'publish_generated_columns' is a PUBLICATION parameter.
>
> ==
> src/backend/replication/logical/tablesync.c
>
> make_copy_attnamelist:
>
> 3.
> - for (i = 0; i < rel->remoterel.natts; i++)
> + desc = RelationGetDescr(rel->localrel);
> + localgenlist = palloc0(rel->remoterel.natts * sizeof(bool));
>
> Each time I review this code I am tricked into thinking it is wrong to
> use rel->remoterel.natts here for the localgenlist. AFAICT the code is
> actually fine because you do not store *all* the subscriber gencols in
> 'localgenlist' -- you only store those with matching names on the
> publisher table. It might be good if you could add an explanatory
> comment about that to prevent any future doubts.
>
> ~~~
>
> 4.
> + if (!remotegenlist[remote_attnum])
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("logical replication target relation \"%s.%s\" has a
> generated column \"%s\" "
> + "but corresponding column on source relation is not a generated column",
> + rel->remoterel.nspname, rel->remoterel.relname, NameStr(attr->attname))));
>
> This error message has lots of good information. OTOH, I think when
> copy_data=false the error would report the subscriber column just as
> "missing", which is maybe less helpful. Perhaps that other
> copy_data=false "missing" case can be improved to share the same error
> message that you have here.
>

This comment is still open. Will fix this in the next set of patches.

> ~~~
>
> fetch_remote_table_info:
>
> 5.
> IIUC, this logic needs to be more sophisticated to handle the case
> that was being discussed earlier with Sawada-san [1]. e.g. when the
> same table has gencols but there are multiple subscribed publications
> where the 'publish_generated_columns' parameter differs.
>
> Also, you'll need test cases for this scenario, because it is too
> difficult to judge correctness just by visual inspection of the code.
>
> 
>
> 6.
> nit - Change 'hasgencolpub' to 'has_pub_with_pubgencols' for
> readability, and initialize it to 'false' to make it easy to use
> later.
>
> ~~~
>
> 7.
> - * Get column lists for each relation.
> + * Get column lists for each relation and check if any of the publication
> + * has generated column option.
>
> and
>
> + /* Check if any of the publication has generated column option */
> + if (server_version >= 18)
>
> nit - tweak the comments to name the publication parameter properly.
>
> ~~~
>
> 8.
> foreach(lc, MySubscription->publications)
> {
> if (foreach_current_index(lc) > 0)
> appendStringInfoString(&pub_names, ", ");
> appendStringInfoString(&pub_names, quote_literal_cstr(strVal(lfirst(lc))));
> }
>
> I know this is existing code, but shouldn't all this be done by using
> the purpose-built function 'get_publications_str'
>
> ~~~
>
> 9.
> + ereport(ERROR,
> + errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not fetch gencolumns information from publication list: %s",
> +pub_names.data));
>
> and
>
> + errcode(ERRCODE_UNDEFINED_OBJECT),
> + errmsg("failed to fetch tuple for gencols from publication list: %s",
> +pub_names.data));
>
> nit - /gencolumns information/generated column publication
> information/ to make the errmsg more human-readable
>
> ~~~
>
> 10.
> + bool gencols_allowed = server_version >= 18 && hasgencolpub;
> +
> + if (!gencols_allowed)
> + appendStringInfo(&cmd, " AND a.attgenerated = ''");
>
> Can the 'gencols_allowed' var be removed, and the condition just be
> replaced with if (!has_pub_with_pubgencols)? It seems equivalent
> unless I am mistaken.
>
> ==
>
> Please refer to the attachment which implements some of the nits
> mentioned above.
>
> ==
> [1] 
> https://www.postgresql.org/message-id/CAD21AoBun9crSWaxteMqyu8A_zme2ppa2uJvLJSJC2E3DJxQVA%40mail.gmail.com
>

I have addressed the comments in the v32-0002 patch. Please refer to
[1] for the updated patch and the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8RjKkoaS1oMsFvPRFB9nPSVC5p_D4Kgq5XB9Y2B2xU7smbA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-03 Thread Shubham Khanna
On Mon, Sep 30, 2024 at 12:56 PM Peter Smith  wrote:
>
> Hi Shubham. Here are my review comment for patch v34-0002.
>
> ==
> doc/src/sgml/ref/create_publication.sgml
>
> 1.
> + 
> + This parameter can only be set true if
> copy_data is
> + set to false.
> + 
>
> Huh? AFAIK the patch implements COPY for generated columns, so why are
> you saying this limitation?
>
> ==

I have fixed this in the v36-0002 patch.

> src/backend/replication/logical/tablesync.c
>
> 2. reminder
>
> Previously (18/9) [1 #4] I wrote maybe that other copy_data=false
> "missing" case error can be improved to share the same error message
> that you have in make_copy_attnamelist. And you replied [2] it would
> be addressed in the next patchset, but that was at least 2 versions
> back and I don't see any change yet.
>
This comment is still open. I will fix it and post it in the next
version of the patches.

Please refer to the updated v36-0002 patch in [1] for the changes made.

[1] 
https://www.postgresql.org/message-id/CAHv8Rj%2B1RDd7AnJNzOJXk--zcbTtU3nys%3DZgU3ktB4e3DWbJgg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Pgoutput not capturing the generated columns

2024-10-25 Thread Shubham Khanna
On Fri, Oct 25, 2024 at 3:54 PM Amit Kapila  wrote:
>
> On Fri, Oct 25, 2024 at 12:07 PM Amit Kapila  wrote:
> >
> > On Thu, Oct 24, 2024 at 8:50 PM vignesh C  wrote:
> > >
> > > The v42 version patch attached at [1] has the changes for the same.
> > >
> >
> > Some more comments:
> >
>
> 1.
> +pgoutput_pubgencol_init(PGOutputData *data, List *publications,
> + RelationSyncEntry *entry)
>
> Can we name it as check_and_init_gencol? I don't know if it is a good
> idea to append a prefix pgoutput for local functions. It is primarily
> used for exposed functions from pgoutput.c. I see that in a few cases
> we do that for local functions as well but that is not a norm.
>
> A related point:
> + /* Initialize publish generated columns value */
> + pgoutput_pubgencol_init(data, rel_publications, entry);
>
> Accordingly change this comment to something like: "Check whether to
> publish to generated columns.".
>

Fixed.

> 2.
> +/*
> + * Returns true if the relation has column list associated with the
> + * publication, false if the relation has no column list associated with the
> + * publication.
> + */
> +bool
> +is_column_list_publication(Publication *pub, Oid relid)
> ...
> ...
>
> How about naming the above function as has_column_list_defined()?
> Also, you can write the above comment as: "Returns true if the
> relation has column list associated with the publication, false
> otherwise."
>

Fixed.

> 3.
> + /*
> + * The column list takes precedence over pubgencols, so skip checking
> + * column list publications.
> + */
> + if (is_column_list_publication(pub, entry->publish_as_relid))
>
> Let's change this comment to: "The column list takes precedence over
> publish_generated_columns option. Those will be checked later, see
> pgoutput_column_list_init."
>

Fixed.

The v43 patch attached at [1] includes these changes.
[1] - 
https://www.postgresql.org/message-id/CAHv8RjJJJRzy83tG0nB90ivYcp7sFKTU%3D_BcQ-nUZ7VbHFwceA%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Improve the error message for logical replication of regular column to generated column.

2024-11-25 Thread Shubham Khanna
On Mon, Nov 25, 2024 at 8:50 AM Peter Smith  wrote:
>
> Hi Shubham,
>
> here are my review comments for patch v4-0001.
>
> ==
> src/backend/replication/logical/relation.c
>
> logicalrep_report_missing_and_gen_attrs:
>
> 1.
>  static void
> -logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
> - Bitmapset *missingatts)
> +logicalrep_report_missing_and_gen_attrs(LogicalRepRelation *remoterel,
> + Bitmapset *atts,
> + bool ismissing)
>
>
> Maybe the function should be called
> 'logicalrep_report_missing_or_gen_attrs' (not 'and')
>
> ~
>
> 2.
> - if (!bms_is_empty(missingatts))
> + if (!bms_is_empty(atts))
>
> I felt this should be an Assert because the code becomes easier to
> read if you check this before making the call in the first place. See
> my NITPICKS patch.
>
> ~
>
> 3.
> + if (attcnt == 1)
> + appendStringInfo(&attsbuf, _("\"%s\""),
>   remoterel->attnames[i]);
>   else
> - appendStringInfo(&missingattsbuf, _(", \"%s\""),
> + appendStringInfo(&attsbuf, _(", \"%s\""),
>   remoterel->attnames[i]);
>   }
>
> This code can be simplified (e.g. remove the 'else' etc if you just
> check > 1 instead). See my NITPICKS patch.
>
> SUGGESTION
> if (attcnt > 1)
>   appendStringInfo(&attsbuf, _(", "));
>
> appendStringInfo(&attsbuf, _("\"%s\""), remoterel->attnames[i]);
>
> ~~~
>
> logicalrep_rel_open:
>
> 4.
> + /*
> + * Include it in generatedattrs if publishing to a generated
> + * column.
> + */
> + if (attr->attgenerated)
> + generatedattrs = bms_add_member(generatedattrs, attnum);
>
> That comment can be simpler if indeed it is needed at all.
>
> SUGGESTION:
> /* Remember which subscriber columns are generated. */
>
> ~
>
> 5.
> As I reported above (#2), I think it is better to check for empty BMS
> in the caller because then the code is easier to read. Also, you need
> to comment on which of these 2 errors will take precedence because if
> there are simultaneous problems you are still only reporting one kind
> of error at a time.
>
> SUGGESTION:
> /*
>  * Report any missing or generated columns. Note, if there are both
>  * kinds then the 'missing' error takes precedence.
>  */
> if (!bms_is_empty(missingatts))
>   logicalrep_report_missing_and_gen_attrs(remoterel, missingatts,
>   true);
> if (!bms_is_empty(generatedattrs))
>   logicalrep_report_missing_and_gen_attrs(remoterel, generatedattrs,
>   false);
>
> ==
> src/test/subscription/t/011_generated.pl
>
> 6.
> +# =============================================================================
> +# The following test cases exercise logical replication for the combinations
> +# where there is a generated column on one or both sides of pub/sub:
> +# - regular -> generated and generated -> generated
> +# - regular -> missing
> +# =============================================================================
>
>
> 6a.
> This comment is not quite right. You can't say "where there is a
> generated column on one or both sides of pub/sub" because that is not
> true for the "regular -> missing" case. See NITPICKS for a suggested
> comment.
>
> ~
>
> 6b.
> IMO you should also be testing the "generated -> missing" combination.
> You don't need more tests -- just more columns.
>
> ~
>
> 6c
> You also need to include a test where there are BOTH generated and
> missing to show the 'missing' error takes precedence. Again, you don't
> need more separate test cases to achieve this -- just need more
> columns in the tables.
>
> ~~~
>
> 7.
> +# --
> +# A "regular -> generated" and "generated -> generated" replication fails,
> +# reporting an error that the generated column on the subscriber side
> +# cannot be replicated.
>
> /and/or/
>
> ~~~
>
> 8.
> +# --
> +# A "regular -> missing" replication fails, reporting an error
> +# that the subscriber side is missing replicated columns.
> +#
> +# Testcase: regular -> missing
> +# Publisher table has regular columns 'c2' and 'c3'.
> +# Subscriber table is missing columns 'c2' and 'c3'.
> +# --
>
> I've also added the "generated -> missing" combination and addressed
> the review comment about including a test where there are BOTH
> missing and generated columns, so you can see which error takes
> precedence. Please see the NITPICKS diff.
>

I have fixed the given comments. The attached Patch contains the
required changes.

Thanks and regards,
Shubham Khanna.


v5-0001-Error-message-improvement.patch
Description: Binary data


Re: Improve the error message for logical replication of regular column to generated column.

2024-11-18 Thread Shubham Khanna
On Sat, Nov 16, 2024 at 5:43 PM Shlok Kyal  wrote:
>
> On Fri, 15 Nov 2024 at 15:57, Shubham Khanna
>  wrote:
> >
> > On Thu, Nov 14, 2024 at 2:09 PM Peter Smith  wrote:
> > >
> > > Hi Shubham,
> > >
> > > +1 for the patch idea.
> > >
> > > Improving this error message for subscriber-side generated columns
> > > will help to remove some confusion.
> > >
> > > Here are my review comments for patch v1-0001.
> > >
> > > ==
> > > Commit message.
> > >
> > > 1.
> > > The error message was misleading, as it failed to clarify that the 
> > > replication
> > > of regular column on the publisher to the corresponding generated column 
> > > on
> > > the subscriber is not supported.
> > >
> > > This patch improves the error handling and reporting mechanism to make it 
> > > clear
> > > that the replication of regular column on the subscriber is not supported,
> > > resolving the misleading "missing column" error.
> > >
> > > ~
> > >
> > > It makes no difference whether the publishing table column is regular
> > > or generated, so you should not be implying that this has anything to
> > > do with the replication of just regular columns. AFAIK, the *only*
> > > thing that matters is that you cannot replicate into a subscriber-side
> > > generated column or a subscriber-side missing column.
> > >
> > > The current master reports replication into either a generated or a
> > > missing column as the same "missing replication column" error. IIUC,
> > > the errors were "correct", although clearly, for the generated column
> > > case the error was quite misleading.
> > >
> > > So, this patch is really *only* to improve the error wording when
> > > attempting to replicate into a subscriber-side generated column.
> > > That's what the commit message should be conveying.
> > >
> > > ==
> > > src/backend/replication/logical/relation.c
> > >
> > > logicalrep_rel_open:
> > >
> > > 2.
> > >   Bitmapset  *missingatts;
> > > + StringInfoData gencolsattsbuf;
> > > + int generatedatts = 0;
> > > +
> > > + initStringInfo(&gencolsattsbuf);
> > >
> > > The existing "missing columns" error is implemented by building a BMS
> > > and then passing it to the function 'logicalrep_report_missing_attrs'
> > > to report the error.
> > >
> > > IMO the generated column error is essentially the same, so should be
> > > implemented with almost identical logic -- i.e. you should build a
> > > 'generatedattrs' BMS of generated cols with matching names and (if
> > > that BMS is not empty) then pass that to a new function
> > > 'logicalrep_report_generated_attrs' (a sibling function to the
> > > existing one).
> > >
> > > ~~~
> > >
> > > 3.
> > > + /*
> > > + * Check if the subscription table generated column has
> > > + * same name as a non-generated column in the
> > > + * corresponding publication table.
> > > + */
> > >
> > > This (misplaced) comment talks about checking if the names are the
> > > same. But I don't see any name-checking logic here (???). Where is it?
> > >
> > > ~~~
> > >
> > > 4.
> > > + ereport(ERROR,
> > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > > + errmsg_plural("replicating to a target relation's generated column
> > > \"%s\" for \"%s.%s\" is not supported",
> > > +"replicating to a target relation's generated column \"%s\" for
> > > \"%s.%s\" is not supported",
> > > +generatedatts, gencolsattsbuf.data, remoterel->nspname,
> > > remoterel->relname)));
> > >
> > > There are no plural differences here. This smells like a cut/paste
> > > mistake from logicalrep_report_generated_attrs'.
> > >
> > > IMO this error should close match the existing "missing replication
> > > columns" error, and use the errmsg_plural correctly. In other words,
> > > it should look something more like this:
> > >
> > > ereport(ERROR,
> > >   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > >   errmsg_plural("cannot replicate to target relation

Re: Improve the error message for logical replication of regular column to generated column.

2024-11-15 Thread Shubham Khanna
On Fri, Nov 15, 2024 at 8:19 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for creating a patch! I checked yours and I have comments.
>
> 01.
> ```
> +   StringInfoData gencolsattsbuf;
> +   int generatedatts = 0;
> +
> +   initStringInfo(&gencolsattsbuf);
> ```
>
> gencolsattsbuf is initialized at the beginning but won't be free'd.
>
> But I prefer the Peter's suggestion - you can combine the reporting stuff to
> logicalrep_report_missing_attrs and rename the function. This is clearer than
> directly adding declarations and ereport() in logicalrep_rel_open().
>
> 02.
>
> ```
> +   /*
> +    * Check if the subscription table generated column has
> +    * same name as a non-generated column in the
> +    * corresponding publication table.
> +    */
> ```
>
> I don't think this comment is correct. The error can be reported even when
> both publisher and subscriber has the generated column, right?
> Also, I feel comments can be located atop "if".
>
> 03.
> I feel if you combine the reporting stuff with 
> logicalrep_report_missing_attrs, some
> of changes are not needed anymore. You can just add comment in 
> logicalrep_rel_open
> and modify the message in logicalrep_report_missing_attrs.
>
>
> [1]: 
> https://www.postgresql.org/message-id/CAHut%2BPumbPEqk6v2XVjT7vKWKzQNBjMHXByWJ5%3DFmjEfk1v_pQ%40mail.gmail.com
>

I have fixed the given comments. The v2 patch attached at [1] includes
these changes.
[1] - 
https://www.postgresql.org/message-id/CAHv8RjJfuLO7HK1P%3DhaY2stdGxYRAqrOwe6Ov4rzsprU63NQkg%40mail.gmail.com

Thanks and Regards,
Shubham Khanna.




Re: Improve the error message for logical replication of regular column to generated column.

2024-11-15 Thread Shubham Khanna
mments) can be much more sophisticated. AFAICT by cleverly arranging
> different publication table column types and different subscriber-side
> table column ordering I think you should be able to test multiple
> things at once.
>
> Such as
> - regular -> generated is detected
> - generated -> generated is detected
> - that the error only reports the generated column problems where the
> column names are matching, not others
>
> ~~~~
>
> 6.
> Also, as previously mentioned in internal reviews, this patch should
> include a 2nd test case to do pretty much the same testing but
> expecting to get a "missing replication column".
>
> The reasons to include this 2nd test are:
> a) The missing column was never tested properly before.
> b) This current patch has overlapping logic so you need to be assured
> that adding this new error doesn't break the existing one.
> c) Only one of these errors wins. Adding both tests will define the
> expected order if both error scenarios exist at the same time.
>

I have fixed the given comments. The attached Patch contains the
required changes.

Thanks and regards,
Shubham Khanna.
From 2886da4ad94bd2a4c0b99deaa21d57127b210a3e Mon Sep 17 00:00:00 2001
From: Shubham Khanna 
Date: Thu, 7 Nov 2024 15:54:19 +0530
Subject: [PATCH v2] Error-message-improvement

Currently, if logical replication attempts to target a subscriber-side table
column that is either missing or generated, it produces the following
identical error message:
ERROR: logical replication target relation \"%s.%s\" is missing
replicated columns: %s

While the error itself is valid, the message wording can be misleading for
generated columns. This patch introduces a distinct error message specifically
for the generated column scenario.
---
 src/backend/replication/logical/relation.c | 71 +---
 src/test/subscription/t/011_generated.pl   | 76 ++
 2 files changed, 124 insertions(+), 23 deletions(-)

diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c
index f5a0ef2bd9..276bf210bb 100644
--- a/src/backend/replication/logical/relation.c
+++ b/src/backend/replication/logical/relation.c
@@ -220,40 +220,52 @@ logicalrep_rel_att_by_name(LogicalRepRelation *remoterel, const char *attname)
 }
 
 /*
- * Report error with names of the missing local relation column(s), if any.
+ * Report error with names of the missing and generated local relation column(s), if any.
  */
 static void
-logicalrep_report_missing_attrs(LogicalRepRelation *remoterel,
-Bitmapset *missingatts)
+logicalrep_report_missing_and_gen_attrs(LogicalRepRelation *remoterel,
+		Bitmapset *atts,
+		bool ismissing)
 {
-	if (!bms_is_empty(missingatts))
+	if (!bms_is_empty(atts))
 	{
-		StringInfoData missingattsbuf;
-		int			missingattcnt = 0;
+		StringInfoData attsbuf;
+		int			attcnt = 0;
 		int			i;
 
-		initStringInfo(&missingattsbuf);
+		initStringInfo(&attsbuf);
 
 		i = -1;
-		while ((i = bms_next_member(missingatts, i)) >= 0)
+		while ((i = bms_next_member(atts, i)) >= 0)
 		{
-			missingattcnt++;
-			if (missingattcnt == 1)
-appendStringInfo(&missingattsbuf, _("\"%s\""),
+			attcnt++;
+			if (attcnt == 1)
+appendStringInfo(&attsbuf, _("\"%s\""),
  remoterel->attnames[i]);
 			else
-appendStringInfo(&missingattsbuf, _(", \"%s\""),
+appendStringInfo(&attsbuf, _(", \"%s\""),
  remoterel->attnames[i]);
 		}
 
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg_plural("logical replication target relation \"%s.%s\" is missing replicated column: %s",
-			   "logical replication target relation \"%s.%s\" is missing replicated columns: %s",
-			   missingattcnt,
-			   remoterel->nspname,
-			   remoterel->relname,
-			   missingattsbuf.data)));
+		if (ismissing)
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg_plural("logical replication target relation \"%s.%s\" is missing replicated column: %s",
+   "logical replication target relation \"%s.%s\" is missing replicated columns: %s",
+   attcnt,
+   remoterel->nspname,
+   remoterel->relname,
+   attsbuf.data)));
+
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg_plural("cannot replicate to target relation \"%s.%s\" generated column: %s",
+   "cannot replicate to target relation \"%s.%s\" generated columns: %s",
+   attcnt,
+   remoterel->nspname,
+   remoterel->relname,
+   attsbuf.data)));
 	}
 }
 
@@ -

Re: Improve the error message for logical replication of regular column to generated column.

2024-11-18 Thread Shubham Khanna
On Fri, Nov 15, 2024 at 7:07 PM vignesh C  wrote:
>
> On Fri, 15 Nov 2024 at 15:57, Shubham Khanna
>  wrote:
> >
> > I have fixed the given comments. The attached Patch contains the
> > required changes.
>
> Few comments:
> 1)
> a)You can mention that "If ismissing is true, report the error message
> as 'Missing replicated columns.' Otherwise, report the error message
> as 'Cannot replicate to generated column."
>  /*
> - * Report error with names of the missing local relation column(s), if any.
> + * Report error with names of the missing and generated local
> relation column(s), if any.
>   */
>
> b) You can keep the line within 80 chars in this case.
>
> 2) Spurious blank line:
> +   ereport(ERROR,
> +           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +            errmsg_plural("logical replication target relation \"%s.%s\" is missing replicated column: %s",
> +                          "logical replication target relation \"%s.%s\" is missing replicated columns: %s",
> +                          attcnt,
> +                          remoterel->nspname,
> +                          remoterel->relname,
> +                          attsbuf.data)));
> +
> +   else
> +   ereport(ERROR,
> +           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +            errmsg_plural("cannot replicate to target relation \"%s.%s\" generated column: %s",
> +                          "cannot replicate to target relation \"%s.%s\" generated columns: %s",
> +                          attcnt,
> +                          remoterel->nspname,
> +                          remoterel->relname,
> +                          attsbuf.data)));
>
> 3) This comment is not correct as the definition of
> generated(publisher) to generated(subscriber) can be same:
> +   /*
> +* Add to generatedattrs if names
> match but definitions
> +* differ.
> +*/
> +   if (attr->attgenerated)
> +   generatedattrs =
> bms_add_member(generatedattrs, i);
>
> 4)
> a) You can use "regular" instead of "normal":
> +# A "normal -> generated" and "generated -> generated" replication fails,
> +# reporting an error that the generated column on the subscriber side
> +# cannot be replicated.
> +#
> +# Test Case: normal -> generated and generated -> generated
> +# Publisher table has regular column 'c2' and generated column 'c3'.
> +# Subscriber table has generated columns 'c2' and 'c3'.
>
> b) similarly here too:
> +# ------
> +# A "normal -> missing" replication fails, reporting an error
> +# that the subscriber side is missing replicated columns.
> +#
> +# Testcase: normal -> missing
> +# Publisher table has normal columns 'c2' and 'c3'.
> +# Subscriber table is missing columns 'c2' and 'c3'.
> +# --
>

I have fixed the given comments. The attached Patch contains the
required changes.

Thanks and regards,
Shubham Khanna.


v3-0001-Error-message-improvement.patch
Description: Binary data


Re: Improve the error message for logical replication of regular column to generated column.

2024-11-18 Thread Shubham Khanna
On Mon, Nov 18, 2024 at 4:11 PM vignesh C  wrote:
>
> On Mon, 18 Nov 2024 at 15:47, Shubham Khanna
>  wrote:
> >
> > On Fri, Nov 15, 2024 at 7:07 PM vignesh C  wrote:
> >
> > I have fixed the given comments. The attached Patch contains the
> > required changes.
>
> Couple of minor comments:
> 1) Since the previous error is going to exit, this pfree is not required:
> +   else
> +   ereport(ERROR,
> +           (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +            errmsg_plural("cannot replicate to target relation \"%s.%s\" generated column: %s",
> +                          "cannot replicate to target relation \"%s.%s\" generated columns: %s",
> +                          attcnt,
> +                          remoterel->nspname,
> +                          remoterel->relname,
> +                          attsbuf.data)));
> +
> +   pfree(attsbuf.data);
>
>
> 2) "You can add single-line comments such as 'Report missing columns'
> and 'Report replicating to generated columns.'"
> +   logicalrep_report_missing_and_gen_attrs(remoterel, generatedattrs,
> +                                           false);
> +   logicalrep_report_missing_and_gen_attrs(remoterel, missingatts,
> +                                           true);
>

I have fixed the given comments. The attached Patch contains the
required changes.

Thanks and regards,
Shubham Khanna.


v4-0001-Error-message-improvement.patch
Description: Binary data


Improve the error message for logical replication of regular column to generated column.

2024-11-13 Thread Shubham Khanna
Hi all,

Recently there was an issue reported by Kuroda-san on a different
thread [1]. I have created this thread to discuss the issue
separately.

Currently, the ERROR message for the replication of a regular column
on the publisher node to a generated column on the subscriber node
is:-
ERROR: logical replication target relation \"%s.%s\" is missing
replicated columns: %s
For example:-
test_pub=# CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED);
test_pub=# CREATE PUBLICATION pub1 for table t1(c1, c2, c3);
test_pub=# INSERT INTO t1 VALUES (1);

test_sub=# CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 + 2)
STORED, c3 int GENERATED ALWAYS AS (c1 + 2) STORED);
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr'
PUBLICATION pub1;
-> ERROR: logical replication target relation "t1" is missing
replicated column: "c2","c3"

The error message was misleading, as it failed to clarify that the
replication of a regular column on the publisher to the corresponding
generated column on the subscriber is not supported.
To avoid and solve the issue, we can update the ERROR message stating
that the replication of the generated column on the subscriber is not
supported. I have attached a patch for the same.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5693AF061D62E55189490D2DF5562%40TYCPR01MB5693.jpnprd01.prod.outlook.com

Thanks and Regards,
Shubham Khanna.


v1-0001-Error-message-improvement.patch
Description: Binary data


Re: Improve the error message for logical replication of regular column to generated column.

2024-11-26 Thread Shubham Khanna
On Tue, Nov 26, 2024 at 5:45 AM Peter Smith  wrote:
>
> Hi Shubham,
>
> Here are my review comments for patch v5-0001.
>
> Please don't reply with a blanket "I have fixed the given comments"
> because it was not true. E.g., some of my previous comments are
> rejected in favour of Amit's better code suggestion, but then other
> comments seem not addressed for reasons unknown.
>
> ==
> Commit message.
>
> 1.
> Now that the errors for the 'missing' and 'generated' columns are
> separated, it means that if some subscriber table suffers both
> problems at the same time then only one of those errors can be
> reported. I think you should mention here that if that happens the
> missing column error takes precedence.
>
> ==
> src/backend/replication/logical/relation.c
>
> get_attrs_str:
>
> 2.
> + * Generates a comma-separated string of attribute names based on the 
> provided
> + * relation information and a bitmap indicating which attributes are 
> included.
> + *
> + * The result is a palloc'd string.
>
> "Generate"?
>
> I think you can simplify the function comment a bit (also mentioning
> the palloc'd string seemed overkill to me).
>
> SUGGESTION:
> Returns a comma-separated string of attribute names based on the
> provided relation and bitmap indicating which attributes to include.
>
> ~
>
> 3.
> +static char *
> +get_attrs_str(LogicalRepRelation *remoterel, Bitmapset *atts)
>
> All other static functions in this file have a common prefix
> 'logicalrep_', so it will be better for this to follow the same
> pattern.
>
> 
>
> logicalrep_report_missing_and_gen_attrs:
>
> 4.
> +/*
> + * If !bms_is_empty(missingatts), report the error message as 'Missing
> + * replicated columns.' Otherwise, report the error message as
> 'Cannot replicate
> + * to generated columns.'
> + */
>
> The function comment does not need to include code fragments or spell
> out the actual errorS because the code is self-explanatory. Anyway,
> the "Otherwise" here was not quite correct because the generated BMS
> is also checked for emptiness. Finally, I think here it is better to
> be explicit about the case when there are BOTH errors -- e.g. say that
> the 'missing' error wins.
>
> So the whole function comment can be simplified.
>
> SUGGESTION:
> /*
>  * If attempting to replicate to subscriber side missing columns or generated
>  * columns then report an error.
>  *
>  * (If there are both kinds of errors the 'missing' error takes precedence).
>  */
>
> ~
>
> 5.
> +static void
> +logicalrep_report_missing_and_gen_attrs(LogicalRepRelation *remoterel,
> + Bitmapset *missingatts,
> + Bitmapset *genatts)
>
> 5a.
> As I wrote in the previous review [1 - #1], because only one error can
> happen at a time, IMO this function name should be
> 'logicalrep_report_missing_or_gen_attrs' (e.g. 'or' not 'and').
>
> ~
>
> 5b.
> /genatts/generatedatts/  (that is what you called the BMS in the
> caller, so better to be consistent)
>
> ~
>
> logicalrep_rel_open:
>
> 6.
> + Bitmapset  *missingatts; /* Bitmapset for missing attributes. */
> + Bitmapset  *generatedattrs = NULL; /* Bitmapset for generated
> + * attributes. */
>
> Those comments don't achieve anything because they are just saying the
> same as the code. You might as well remove them.
>
> ~
>
> 7.
> + /*
> + * Report any missing and generated columns. Note, if there are both
> + * kinds then the 'missing' error takes precedence.
> + */
> + logicalrep_report_missing_and_gen_attrs(remoterel, missingatts,
> + generatedattrs);
>
> This comment can also be removed. The function name is already
> self-explanatory, and the information of the "Note" part belongs in
> the function comment.
>
> ==
> src/test/subscription/t/011_generated.pl
>
> The tests LGTM.
>
> ==
>
> Please refer to the attached diffs patch which includes most (but not
> all) of the suggestions mentioned above.
>
> ==
> [1] 
> https://www.postgresql.org/message-id/CAHut%2BPuoDsPUO1YDBOEWAsKT8dXA0PDoK6S_Yc6kO_s8yPKHfA%40mail.gmail.com
>

I have fixed the given comments. The attached patch contains the
required changes.

Thanks and regards,
Shubham Khanna.


v6-0001-Error-message-improvement.patch
Description: Binary data


Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-09 Thread Shubham Khanna
Hi all,

I am writing to propose the addition of the two_phase option in
pg_createsubscriber. As discussed in [1], supporting this feature
during the development of pg_createsubscriber was planned for this
version.
Currently, pg_createsubscriber creates subscriptions with the
two_phase option set to false. Enabling the two_phase option after a
subscription has been created is not straightforward, as it requires
the subscription to be disabled first. This patch addresses that
limitation by adding a two_phase option to pg_createsubscriber, so that
subscriptions are created with two_phase enabled from the start and can
take advantage of two-phase commit immediately.
The attached patch has the changes for the same.
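As background, enabling two_phase on an existing subscription (supported
since PostgreSQL 17) requires disabling it first, roughly along these
lines (the subscription name 'sub1' is hypothetical):

```sql
ALTER SUBSCRIPTION sub1 DISABLE;
ALTER SUBSCRIPTION sub1 SET (two_phase = on);
ALTER SUBSCRIPTION sub1 ENABLE;
```

Creating the subscription with two_phase enabled from the start avoids
these extra steps.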

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B7gRxiK3xNEH03CY3mMKxqi7BVz2k3VxxVZp8fgjHZxg%40mail.gmail.com

Thanks and regards,
Shubham Khanna.


v1-0001-Add-support-for-enabling-disabling-two-phase-comm.patch
Description: Binary data


Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

2024-12-29 Thread Shubham Khanna
On Mon, Dec 30, 2024 at 10:10 AM vignesh C  wrote:
>
> On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
>  wrote:
> >
> > Hi,
> >
> > Currently, there is a risk that pg_createsubscriber may fail to
> > complete successfully when the max_slot_wal_keep_size value is set too
> > low. This can occur if the WAL is removed before the standby using the
> > replication slot is able to complete replication, as the required WAL
> > files are no longer available.
> >
> > I was able to reproduce this issue using the following steps:
> > Set up a streaming replication environment.
> > Run pg_createsubscriber in a debugger.
> > Pause pg_createsubscriber at the setup_recovery stage.
> > Perform several operations on the primary node to generate a large
> > volume of WAL, causing older WAL segments to be removed due to the low
> > max_slot_wal_keep_size setting.
> > Once the necessary WAL segments are deleted, continue the execution of
> > pg_createsubscriber.
> > At this point, pg_createsubscriber fails with the following error:
> > 2024-12-29 01:21:37.590 IST [427353] FATAL:  could not receive data
> > from WAL stream: ERROR:  requested WAL segment
> > 00010003 has already been removed
> > 2024-12-29 01:21:37.592 IST [427345] LOG:  waiting for WAL to become
> > available at 0/3000110
> > 2024-12-29 01:21:42.593 IST [427358] LOG:  started streaming WAL from
> > primary at 0/300 on timeline 1
> > 2024-12-29 01:21:42.593 IST [427358] FATAL:  could not receive data
> > from WAL stream: ERROR:  requested WAL segment
> > 00010003 has already been removed
> >
> > This issue was previously reported in [1], with a suggestion to raise
> > a warning in [2]. I’ve implemented a patch that logs a warning in
> > dry-run mode. This will give users the opportunity to adjust the
> > max_slot_wal_keep_size value before running the command.
> >
> > Thoughts?
>
> +1 for throwing a warning in dry-run mode
>
> Few comments:
> 1) We can have this check only in dry-run mode, it is not required in
> non dry-run mode as there is nothing much user can do once the tool is
> running, we can change this:
> +   if (max_slot_wal_keep_size != -1)
> +   {
> +   pg_log_warning("publisher requires
> 'max_slot_wal_keep_size = -1', but only %d remain",
> +  max_slot_wal_keep_size);
> +   pg_log_warning_detail("Change the
> 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +   }
>
> to:
> +   if (dry_run && max_slot_wal_keep_size != -1)
> +   {
> +   pg_log_warning("publisher requires
> 'max_slot_wal_keep_size = -1', but only %d remain",
> +  max_slot_wal_keep_size);
> +   pg_log_warning_detail("Change the
> 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +   }
>
> 2) This error message is not quite right, can we change it to
> "publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
> +   if (max_slot_wal_keep_size != -1)
> +   {
> +   pg_log_warning("publisher requires
> 'max_slot_wal_keep_size = -1', but only %d remain",
> +  max_slot_wal_keep_size);
> +   pg_log_warning_detail("Change the
> 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> +   }
>
> 3) Also the configuration could be specified in format specifier like
> it is specified in the earlier case
>

I have fixed the given comments. The attached patch contains the
suggested changes.

Thanks and regards,
Shubham Khanna.


v2-0001-Validate-max_slot_wal_keep_size-in-pg_createsubsc.patch
Description: Binary data


Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-26 Thread Shubham Khanna
On Fri, Dec 27, 2024 at 11:30 AM vignesh C  wrote:
> > > > >
> The documentation requires a minor update: instead of specifying
> subscriptions, the user will specify multiple databases, and the
> subscription will be created on the specified databases. Documentation
> should be updated accordingly:
> +  
> +   Enables  linkend="sql-createsubscription-params-with-two-phase">two_phase
> +   commit for the subscription. If there are multiple subscriptions
> +   specified, this option applies to all of them.
> +   The default is false.
>

I have fixed the given comment. The attached patch contains the
suggested changes.

Thanks and regards,
Shubham Khanna.


v9-0001-Add-support-for-two-phase-commit-in-pg_createsubs.patch
Description: Binary data


Re: Log a warning in pg_createsubscriber for max_slot_wal_keep_size

2024-12-30 Thread Shubham Khanna
On Mon, Dec 30, 2024 at 4:03 PM vignesh C  wrote:
>
> On Mon, 30 Dec 2024 at 12:04, Shubham Khanna
>  wrote:
> >
> > On Mon, Dec 30, 2024 at 10:10 AM vignesh C  wrote:
> > >
> > > On Mon, 30 Dec 2024 at 09:34, Shubham Khanna
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently, there is a risk that pg_createsubscriber may fail to
> > > > complete successfully when the max_slot_wal_keep_size value is set too
> > > > low. This can occur if the WAL is removed before the standby using the
> > > > replication slot is able to complete replication, as the required WAL
> > > > files are no longer available.
> > > >
> > > > I was able to reproduce this issue using the following steps:
> > > > Set up a streaming replication environment.
> > > > Run pg_createsubscriber in a debugger.
> > > > Pause pg_createsubscriber at the setup_recovery stage.
> > > > Perform several operations on the primary node to generate a large
> > > > volume of WAL, causing older WAL segments to be removed due to the low
> > > > max_slot_wal_keep_size setting.
> > > > Once the necessary WAL segments are deleted, continue the execution of
> > > > pg_createsubscriber.
> > > > At this point, pg_createsubscriber fails with the following error:
> > > > 2024-12-29 01:21:37.590 IST [427353] FATAL:  could not receive data
> > > > from WAL stream: ERROR:  requested WAL segment
> > > > 00010003 has already been removed
> > > > 2024-12-29 01:21:37.592 IST [427345] LOG:  waiting for WAL to become
> > > > available at 0/3000110
> > > > 2024-12-29 01:21:42.593 IST [427358] LOG:  started streaming WAL from
> > > > primary at 0/300 on timeline 1
> > > > 2024-12-29 01:21:42.593 IST [427358] FATAL:  could not receive data
> > > > from WAL stream: ERROR:  requested WAL segment
> > > > 00010003 has already been removed
> > > >
> > > > This issue was previously reported in [1], with a suggestion to raise
> > > > a warning in [2]. I’ve implemented a patch that logs a warning in
> > > > dry-run mode. This will give users the opportunity to adjust the
> > > > max_slot_wal_keep_size value before running the command.
> > > >
> > > > Thoughts?
> > >
> > > +1 for throwing a warning in dry-run mode
> > >
> > > Few comments:
> > > 1) We can have this check only in dry-run mode, it is not required in
> > > non dry-run mode as there is nothing much user can do once the tool is
> > > running, we can change this:
> > > +   if (max_slot_wal_keep_size != -1)
> > > +   {
> > > +   pg_log_warning("publisher requires
> > > 'max_slot_wal_keep_size = -1', but only %d remain",
> > > +  max_slot_wal_keep_size);
> > > +   pg_log_warning_detail("Change the
> > > 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> > > +   }
> > >
> > > to:
> > > +   if (dry_run && max_slot_wal_keep_size != -1)
> > > +   {
> > > +   pg_log_warning("publisher requires
> > > 'max_slot_wal_keep_size = -1', but only %d remain",
> > > +  max_slot_wal_keep_size);
> > > +   pg_log_warning_detail("Change the
> > > 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> > > +   }
> > >
> > > 2) This error message is not quite right, can we change it to
> > > "publisher requires max_slot_wal_keep_size to be -1, but is set to %d"
> > > +   if (max_slot_wal_keep_size != -1)
> > > +   {
> > > +   pg_log_warning("publisher requires
> > > 'max_slot_wal_keep_size = -1', but only %d remain",
> > > +  max_slot_wal_keep_size);
> > > +   pg_log_warning_detail("Change the
> > > 'max_slot_wal_keep_size' configuration on the publisher to -1.");
> > > +   }
> > >
> > > 3) Also the configuration could be specified in format specifier like
> > > it is specified in the earlier case
> > >
> >
> > I have fixed the given comments. The attached patch contains the
> > suggested changes.

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-04 Thread Shubham Khanna
On Tue, Feb 4, 2025 at 3:49 PM Ashutosh Bapat
 wrote:
>
> Hi Shubham,
>
> On Tue, Feb 4, 2025 at 2:10 PM Shubham Khanna
>  wrote:
> > > >
> > >
> > > It could be a bit tricky to find that for users but they can devise a
> > > query to get the names and numbers of databases matching the given
> > > pattern.  OTOH, I am not sure there is a clear need at this stage for
> > > pattern matching for this tool. So, we can go with a simple switch as
> > > you are proposing at this stage.
> > >
> >
> > After reconsidering the idea of supporting '--all-databases' switch is
> > the better approach at this stage, I have added the new switch in the
> > latest patch.
> > The attached patch contains the suggested changes.
>
> + If neither -d nor -a is
> +   specified, pg_createsubscriber will use
> +   --all-databases by default.
>
> As pointed upthread by me and Peter, using --all-databases by default
> is not a good behaviour.
>
> But the code doesn't behave like --all-databases by default. Looks
> like we need to fix the documentation.

Updated the documentation accordingly and added the current behaviour
of the -a/--all-databases option.

> + /* Generate publication and slot names if not specified */
> + SimpleStringListCell *cell;
> +
> + fetch_all_databases(opt);
> +
> + cell = opt->database_names.head;
>
> We don't seem to check existence of publication and slot name
> specification as the comment indicates. Do we need to check that those
> names are not specified at all? and also mention in the documentation
> that those specifications are required when using -a/--all-databases
> option?
>

Added a check to verify that publication and slot names are not
manually specified when using -a/--all-databases option and updated
the documentation accordingly.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v3-0001-Enhance-pg_createsubscriber-to-fetch-and-append-a.patch
Description: Binary data


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-04 Thread Shubham Khanna
On Thu, Jan 30, 2025 at 12:08 PM Amit Kapila  wrote:
>
> On Thu, Jan 30, 2025 at 6:21 AM Peter Smith  wrote:
> >
> > On Wed, Jan 29, 2025 at 4:44 PM Amit Kapila  wrote:
> > >
> > > On Tue, Jan 28, 2025 at 12:01 PM Ashutosh Bapat
> > >  wrote:
> > > >
> > > > If we want to stick to --database= supporting a pattern looks better
> > > > than just a single special pattern *.
> > > >
> > >
> > > This sounds reasonable to me as well. Note that the interaction of
> > > other parameters like --replication-slot is not yet discussed. I think
> > > if the number of slots given matches with the number of databases
> > > fetched based on pattern matches then we can use them otherwise,
> > > return the ERROR. The other option could be that we don't allow
> > > options like --replication-slot along with pattern matching option.
> > >
> >
> > I have had second thoughts about my pattern idea. Now, I favour just
> > adding another --all-databases switch like Ashutosh had suggested [1]
> > in the first place.
> >
> > I had overlooked the rules saying that the user is allowed to specify
> > *multiple* --publication or --replication-slot or --subscription name
> > switches, but when doing so they have to match the same number of
> > --database switches. Using a --dbname=pattern would be fraught with
> > complications. e.g. How can we know up-front how many databases the
> > dbname pattern will resolve to, and even in what order they get
> > resolved?
> >
>
> It could be a bit tricky to find that for users but they can devise a
> query to get the names and numbers of databases matching the given
> pattern.  OTOH, I am not sure there is a clear need at this stage for
> pattern matching for this tool. So, we can go with a simple switch as
> you are proposing at this stage.
>

After reconsidering, I agree that supporting an '--all-databases'
switch is the better approach at this stage, so I have added the new
switch in the latest patch.
The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v2-0001-Enhance-pg_createsubscriber-to-fetch-and-append-a.patch
Description: Binary data


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-02-06 Thread Shubham Khanna
On Tue, Feb 4, 2025 at 5:39 PM Shlok Kyal  wrote:
>
> On Wed, 29 Jan 2025 at 15:14, Shubham Khanna
>  wrote:
> >
> > On Wed, Jan 29, 2025 at 10:42 AM Hayato Kuroda (Fujitsu)
> >  wrote:
> > >
> > > Dear Shubham,
> > >
> > > > I propose adding the --clean-publisher-objects option to the
> > > > pg_createsubscriber utility. As discussed in [1], this feature ensures
> > > > a clean and streamlined setup of logical replication by removing stale
> > > > or unnecessary publications from the subscriber node. These
> > > > publications, replicated during streaming replication, become
> > > > redundant after converting to logical replication and serve no further
> > > > purpose. This patch introduces the drop_all_publications() function,
> > > > which efficiently fetches and drops all publications on the subscriber
> > > > node within a single transaction.
> > >
> > > I think replication slot are also type of 'publisher-objects', but they 
> > > are not
> > > removed for now: API-name may not be accurate. And...
> > >
> > > > Additionally, other related objects, such as subscriptions and
> > > > replication slots, may also require cleanup. I plan to analyze this
> > > > further and include them in subsequent patches.
> > >
> > > I'm not sure replication slots should be cleaned up. Apart from other 
> > > items like
> > > publication/subscription, replication slots are not replicated when it is 
> > > created
> > > on the primary instance. This means they are intentionally created by 
> > > DBAs and there
> > > may not be no strong reasons to drop them after the conversion.
> > >
> > > Another question is the style of APIs. Do you plan to provide APIs like
> > > 'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one
> > > 'cleanup-logical-replication-objects'?
> > >
> >
> > Thanks for the suggestions, I will keep them in mind while preparing
> > the 0002 patch for the same.
> > Currently, I have changed the API to '--cleanup-publisher-objects'.
> >
> > > Regarding the patch:
> > >
> > > 1.
> > > ```
> > > +   The pg_createsubscriber now supports 
> > > the
> > > +   --clean-publisher-objects to remove all 
> > > publications on
> > > +   the subscriber node before creating a new subscription.
> > > ```
> > >
> > > This description is not suitable for the documentation. Something like:
> > >
> > > Remove all publications on the subscriber node.
> > >
> > > 2.
> > > ```
> > > +   /* Drop publications from the subscriber if requested */
> > > +   drop_all_publications(dbinfo);
> > > ```
> > >
> > > This should be called when `opts.clean_publisher_objects` is true.
> > >
> > > 3.
> > > You said publications are dropped within a single transaction, but the
> > > patch does not do. Which is correct?
> > >
> > > 4.
> > > ```
> > > +# Set up node A as primary
> > > +my $node_a = PostgreSQL::Test::Cluster->new('node_a');
> > > +my $aconnstr = $node_a->connstr;
> > > +$node_a->init(allows_streaming => 'logical');
> > > +$node_a->append_conf('postgresql.conf', 'autovacuum = off');
> > > +$node_a->start;
> > > +
> > > +# Set up node B as standby linking to node A
> > > +$node_a->backup('backup_3');
> > > +my $node_b = PostgreSQL::Test::Cluster->new('node_b');
> > > +$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1);
> > > +$node_b->append_conf(
> > > +   'postgresql.conf', qq[
> > > +   primary_conninfo = '$aconnstr'
> > > +   hot_standby_feedback = on
> > > +   max_logical_replication_workers = 5
> > > +   ]);
> > > +$node_b->set_standby_mode();
> > > +$node_b->start;
> > > ```
> > >
> >
> > Fixed the given comments. The attached patch contains the suggested changes.
> >
>
> Hi Shubham,
>
> I have reviewed the v2 patch. Here are my comments:
>
> 1. Initial of the comment should in smallcase to make it similar to
> other comments:
> +   bool cleanup_publisher_objects;  /* Drop all publications */

Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-10 Thread Shubham Khanna
> Actually, I can't see any code even doing what the new documentation
> is saying so is the documentation even telling the truth? The commit
> message will also be impacted.
>
> ~~~
>
> 6.
> +  
> +   --all-databases cannot be used with
> +   --publication, --subscription,
> +   or --replication-slot.
> +  
>
> What about "--database".
>
> ~~~
>
> 7.
> For all the other incompatible options you wrote:
>
> "Cannot be used together with -a."
>
> IMO it might be nicer to give the full name of the option.
> e.g. "Cannot be used together with --all-databases."
>
> ==
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> usage:
>
> 8.
> + printf(_("  -a, --all-databases create subscriptions for
> all matching databases on the target\n"));
>
> That seems ambiguous to me. How about something more like below to
> clarify the subscription is created on the target.
>
> "for all source databases create subscriptions on the same target database"
>
> ~~~
>
> fetch_source_databases:
>
> 9.
> + const char *query = "SELECT datname FROM pg_database WHERE
> datistemplate = false";
>
> Do we need this variable? It seems simpler/better just to put the SQL in-line.
>
> ~~~
>
> main:
>
> 10.
>   switch (c)
>   {
> + case 'a':
> + opt.all_databases = true;
> + break;
> +
>
> 10a.
> Missing the check for duplicate option --all-databases. OTOH maybe you
> don't care about this error, because probably it is harmless if you
> just ignore it.
>
> ~
>
> 10b.
> Missing the check for --database, --publication, --subscription,
> --replication-slot.
>
> (e.g. if user specified the --all-databases *after* those other options)
>
> ~~~
>
> 11.
> + if (opt.all_databases)
> + {
> + pg_log_error("--replslot cannot be used with --all-databases");
> + exit(1);
> + }
>
> Where'd this name come from? AFAICT there is no such option as "--replslot"
>
> ~~~
>
> 12.
> - * If --database option is not provided, try to obtain the dbname from
> - * the publisher conninfo. If dbname parameter is not available, error
> - * out.
> + * If neither --database nor --all-databases option is provided, try
> + * to obtain the dbname from the publisher conninfo. If dbname
> + * parameter is not available, error out.
>
> For this comment to make sense I think the previous comment (where
> fetch_source_databases is called) needs to explain that when
> --all-databases is defined then it is going to treat that internally
> as the equivalent of a whole lot of user-specified --database options.
>
> ==
> .../t/040_pg_createsubscriber.pl
>
> 13.
> + qr/cannot specify --publication or --replication-slot when using
> --all-databases/,
> + 'fail if --all-databases is used with publication and slot');
> +
>
> How are tests expecting this even passing?
>
> Searching the patch I cannot find any such error!
>
> ==

Fixed the given comments. The attached patch at [1] contains the
suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8Rj%2BmhYgh8XLFM8sN8J05z0ia%2BknfB1kP6kjbnB55H-b-mw%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-10 Thread Shubham Khanna
On Mon, Feb 10, 2025 at 6:58 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> > Fixed the given comments. The attached patch at [1] contains the
> > suggested changes.
>
> Thanks for updates. I registered the thread to the commitfest entry [1].
> Few comments.
>
> 01. fetch_source_databases
>
> ```
> +   const char *query = "SELECT datname FROM pg_database WHERE 
> datistemplate = false";
> ```
>
> You told given comments were fixed, but you ignored this. Please address it
> or clarify the reason why you didn't. My comment was:
>
> > We should verify the attribute datallowconn, which indicates we can connect 
> > to the
> > database. If it is false, no one would be able to connect to the db and 
> > define anything.
>

Fixed.
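For reference, a query along the suggested lines (not necessarily the
exact text used in the patch) would be:

```sql
SELECT datname
  FROM pg_database
 WHERE datistemplate = false
   AND datallowconn = true;
```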

> 02. fetch_source_databases
> ```
> +   /* Check for connection errors */
> +   if (PQstatus(conn) != CONNECTION_OK)
> +   {
> +   pg_log_error("connection to the source server failed: %s", 
> PQerrorMessage(conn));
> +   disconnect_database(conn, false);
> +   exit(1);
> +   }
> ```
>
> connect_database() does the error handling, no need to do manually.
>

Fixed.

> 03. fetch_source_databases
> ```
> +   pg_log_error("failed to execute query on the source server: 
> %s", PQerrorMessage(conn));
> ```
>
> I feel the error message is too general. Also, PQerrorMessage() is not 
> suitable for getting error messages
> generated by the query. How about:
> pg_log_error("could not obtain a list of databases: %s", 
> PQresultErrorMessage());
>

Fixed.

> 04. fetch_source_databases
> ```
> +   /* Error if no databases were found on the source server */
> +   if (num_rows == 0)
> +   {
> +   pg_log_error("no databases found on the source server");
> +   pg_log_error_hint("Ensure that there are user-created 
> databases on the source server.");
> +   PQclear(res);
> +   disconnect_database(conn, false);
> +   exit(1);
> +   }
> ```
>
> Can you clarify when this happens?
>

This can happen if all user databases (including postgres) have been
dropped, which is possible by connecting to a template database.
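For example (a sketch; it assumes no other sessions are connected to the
database being dropped):

```sql
-- Connected to template1:
DROP DATABASE postgres;
-- With no user databases left, the tool's catalog query returns zero rows.
```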

> 05. main
>
> ```
> case 'd':
> +   if (opt.all_databases)
> +   {
> +   pg_log_error("--database cannot be 
> used with --all-databases");
> +   exit(1);
> +
> ```
>
> I think this erroring is not enough. getopt_long() receives options with the 
> specified ordering, thus
> -d can be accepted if the -a is specified later.
> (Same thing can be said for 'P', '--replslot' and '--subscription'.)
>
> pg_createsubscriber ... -a -d postgres -> can be rejected,
> pg_createsubscriber ... -d postgres -a -> cannot be rejected.
>

Fixed.

The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v5-0001-Enhance-pg_createsubscriber-to-fetch-and-append-a.patch
Description: Binary data


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-07 Thread Shubham Khanna
On Wed, Feb 5, 2025 at 5:31 PM Shlok Kyal  wrote:
>
> On Wed, 5 Feb 2025 at 01:26, Shubham Khanna  
> wrote:
> >
> > On Tue, Feb 4, 2025 at 3:49 PM Ashutosh Bapat
> >  wrote:
> > >
> > > Hi Shubham,
> > >
> > > On Tue, Feb 4, 2025 at 2:10 PM Shubham Khanna
> > >  wrote:
> > > > > >
> > > > >
> > > > > It could be a bit tricky to find that for users but they can devise a
> > > > > query to get the names and numbers of databases matching the given
> > > > > pattern.  OTOH, I am not sure there is a clear need at this stage for
> > > > > pattern matching for this tool. So, we can go with a simple switch as
> > > > > you are proposing at this stage.
> > > > >
> > > >
> > > > After reconsidering the idea of supporting '--all-databases' switch is
> > > > the better approach at this stage, I have added the new switch in the
> > > > latest patch.
> > > > The attached patch contains the suggested changes.
> > >
> > > + If neither -d nor -a is
> > > +   specified, pg_createsubscriber will use
> > > +   --all-databases by default.
> > >
> > > As pointed upthread by me and Peter, using --all-databases by default
> > > is not a good behaviour.
> > >
> > > But the code doesn't behave like --all-databases by default. Looks
> > > like we need to fix the documentation.
> >
> > Updated the documentation accordingly and added the current behaviour
> > of -a/--all-databases option.
> >
> > > + /* Generate publication and slot names if not specified */
> > > + SimpleStringListCell *cell;
> > > +
> > > + fetch_all_databases(opt);
> > > +
> > > + cell = opt->database_names.head;
> > >
> > > We don't seem to check existence of publication and slot name
> > > specification as the comment indicates. Do we need to check that those
> > > names are not specified at all? and also mention in the documentation
> > > that those specifications are required when using -a/--all-databases
> > > option?
> > >
> >
> > Added a check to verify that publication and slot names are not
> > manually specified when using -a/--all-databases option and updated
> > the documentation accordingly.
> >
> > The attached patch contains the suggested changes.
> >
> Hi Shubham,
>
> Here are some of my comments:
>
> 1. We should start the comment text from the next line
> +
> +/* Function to validate all the databases and generate publication/slot names
> + * when using '--all-databases'.
> + */
>
> 2. Here in error message it should be '--database'
> + /* Check for conflicting options */
> + if (opt->all_databases && opt->database_names.head != NULL)
> + {
> + pg_log_error("cannot specify --dbname when using --all-databases");
> + exit(1);
> + }
> +
>
> 3. I think checking 'opt->all_databases' in if conditions in function
> 'validate_databases' is not required
> +static void
> +validate_databases(struct CreateSubscriberOptions *opt)
> +{
> + /* Check for conflicting options */
> + if (opt->all_databases && opt->database_names.head != NULL)
> + {
> + pg_log_error("cannot specify --dbname when using --all-databases");
> + exit(1);
> + }
>
> as we are already checking it while calling the function
> + /* Validate and process database options */
> + if (opt.all_databases)
> + validate_databases(&opt);
> +
>
> 4. Here we should update the count of 'num_replslots' and 'num_pubs'
>
> + while (cell != NULL)
> + {
> + char slot_name[NAMEDATALEN];
> +
> + snprintf(slot_name, sizeof(slot_name), "%s_slot", cell->val);
> + simple_string_list_append(&opt->replslot_names, slot_name);
> +
> + snprintf(slot_name, sizeof(slot_name), "%s_pub", cell->val);
> + simple_string_list_append(&opt->pub_names, slot_name);
> +
> + cell = cell->next;
> + }
> Since we are not updating these counts, the names are reflected as expected.
>
> Should also store subscription names similarly? Or maybe store
> subscription names and assign slot names same as subscription names?
>
> 5. Since --all-databases option is added we should update the comment:
> /*
>  * If --database option is not provided, try to obtain the dbname from
>  * the publisher conninfo. If dbname parameter is not available, error
>  * out.
>  */
>
> 6. Extra blank lines should be removed
>   }
>   }
>
> +
> +
>   /* Number of object names must match number of databases */
>

Fixed the given comments. The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v4-0001-Enhance-pg_createsubscriber-to-fetch-and-append-a.patch
Description: Binary data


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-07 Thread Shubham Khanna
codes, no need to do.
>
> 11. 040_pg_createsubscriber.pl
> ```
> +# Ensure there are some user databases on the publisher
> +$node_a->safe_psql('postgres', 'CREATE DATABASE db3');
> +$node_a->safe_psql('postgres', 'CREATE DATABASE db4');
> +
> +$node_b->stop;
> ```
>
> You must wait until all changes have been replicated to the standby here.
>
> 12. 040_pg_createsubscriber.pl
>
> Can we do the similar tests without adding new instances? E.g., runs with
> dry-run mode and confirms all databases become target.
>

Fixed the given comments. The attached patch at [1] contains the
suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8Rj%2BDOQ9vvkKCsmDeFKyM073yRfs47Tn_qywQOdx15pmOMA%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-07 Thread Shubham Khanna
> + cell = opt->database_names.head;
> +
> + while (cell != NULL)
> + {
> + char slot_name[NAMEDATALEN];
> +
> + snprintf(slot_name, sizeof(slot_name), "%s_slot", cell->val);
> + simple_string_list_append(&opt->replslot_names, slot_name);
> +
> + snprintf(slot_name, sizeof(slot_name), "%s_pub", cell->val);
> + simple_string_list_append(&opt->pub_names, slot_name);
> +
> + cell = cell->next;
> + }
> + }
>
>
> 16a.
> Why is this code checking 'opt->all_databases'? Isn't it only possible
> to get to this function via --all-databases. You can just Assert this.
>
> ~
>
> 16b.
> Why are you reusing 'slot_name' variable like this?
>
> ~
>
> 16c.
> Was there some ERROR checking missing here to ensure the length of the
> dbname is not going to cause pub/slot to exceed NAMEDATALEN?
>
> ~
>
> 16d.
> In hindsight, most of this function seems unnecessary to me. Probably
> you could've done all this pub/slot name assignment inside the
> fetch_all_databases() at the same time as you were doing
> simple_string_list_append(&opt->database_names, dbname); for each
> database.
>
> ~~~
>
> 17.
> - while ((c = getopt_long(argc, argv, "d:D:np:P:s:t:U:v",
> + while ((c = getopt_long(argc, argv, "ad:D:np:P:s:t:U:v",
>   long_options, &option_index)) != -1)
>
>
> Isn't the code within this loop missing all the early switch
> validation for checking you are not combining incompatible switches?
>
> e.g.
> --all-databases - "Cannot specify --all-databases more than once"
> --all-databases - "Cannot combine with --database, --publication,
> --subscription, --replication-slot"
> --database - "Cannot combine with --all-databases"
> --publication - "Cannot combine with --all-databases"
> --subscription - "Cannot combine with --all-databases"
> --replication-slot - "Cannot combine with --all-databases"
>
> ~~~
>
> 18.
> +
> +
>   /* Number of object names must match number of databases */
>
> Remove spurious whitespace.
>
> ==

Fixed the given comments. The attached patch at [1] contains the
suggested changes.
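
Regarding comment #16c above, here is a rough shell sketch of the kind of
length check meant, guarding a generated "<dbname>_slot" name against
PostgreSQL's NAMEDATALEN limit (64 bytes in a default build, so 63 usable
characters). The function name and messages are hypothetical and not taken
from the patch:

```shell
# NAMEDATALEN is 64 in a default PostgreSQL build; identifiers longer than
# NAMEDATALEN - 1 bytes are silently truncated by the server, so generated
# slot/publication names should be checked before use.
NAMEDATALEN=64

check_name_fits() {
  name="$1"
  # ${#name} counts characters; for ASCII database names this equals bytes.
  if [ "${#name}" -ge "$NAMEDATALEN" ]; then
    echo "error: generated name \"$name\" exceeds $((NAMEDATALEN - 1)) characters"
    return 1
  fi
  return 0
}

check_name_fits "db1_slot"                      # short name: passes silently
long_db=$(printf 'x%.0s' $(seq 1 70))           # a 70-character database name
check_name_fits "${long_db}_slot" || true       # prints the error message
```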

[1] - 
https://www.postgresql.org/message-id/CAHv8Rj%2BDOQ9vvkKCsmDeFKyM073yRfs47Tn_qywQOdx15pmOMA%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-11 Thread Shubham Khanna
On Tue, Feb 11, 2025 at 6:30 AM Peter Smith  wrote:
>
> Hi Shubham.
>
> Responding with a blanket statement "Fixed the given comments" makes
> it difficult to be sure what was done when I come to confirm the
> changes. Often embedded questions go unanswered, and if changes are
> *not* made, then I can't tell if there was a good reason for rejection
> or was the comment just overlooked. Please can you treat the itemised
> feedback as a checklist and reply individually to each item. Even just
> saying "Fixed" is useful because then I can trust the item was seen
> and addressed.
>

I appreciate the feedback. I will ensure that each item is addressed
individually.

> e.g. From previous review [1]
> #7. Not fixed. The same docs issue still exists for --publication and
> for --subscription.

Fixed this issue in the attached patch.

> #13. Unanswered question "How are tests expecting this even passing?".
> Was a reason identified? IOW, how can we be sure the latest tests
> don't have a similar problem?
>

In the v4-0001 patch [1], the tests were mistakenly using
'command_fails' instead of 'command_fails_like' to verify failed test
cases. Since 'command_fails' only checks for a non-zero exit code and
not specific error messages, the tests were passing even when the
expected errors were not being triggered correctly.
To address this, I have updated the patch to use 'command_fails_like',
ensuring that the test cases now explicitly verify the correct failure
messages.
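
The difference can be reproduced outside the TAP framework. In this
hypothetical shell sketch, a command fails for a different reason than the
test expects; an exit-code-only check (the 'command_fails' style) still
passes, while a check that also matches the error text (the
'command_fails_like' style) catches the mismatch:

```shell
# Simulate a test that expects a failure mentioning "No such file", while the
# command actually fails for an unrelated reason (an unrecognized flag).
status=0
ls --bogus-flag /tmp 2>err.log || status=$?

# 'command_fails' style: only the exit code is inspected, so this passes
# even though the failure is not the one the test meant to verify.
if [ "$status" -ne 0 ]; then
  echo "command_fails-style check: passes"
fi

# 'command_fails_like' style: the stderr text must also match the expectation,
# so the wrong failure reason is detected.
if [ "$status" -ne 0 ] && grep -q "No such file" err.log; then
  echo "command_fails_like-style check: passes"
else
  echo "command_fails_like-style check: fails (unexpected error message)"
fi
```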

> //
>
> Some more review comments for the patch v5-0001.
>
> ==
> doc/src/sgml/ref/pg_createsubscriber.sgml
>
> Synopsis
>
> 1.
> pg_createsubscriber [option...] { -a | --all-databases } { -d |
> --database }dbname { -D | --pgdata }datadir { -P | --publisher-server
> }connstr
>
> This looks misleading because '--all-databases' and '--database' cannot
> be combined.
>
> I think it will be better to have two completely different synopses, like
> I had originally suggested [2, comment #5]. E.g. just add a whole
> separate  block adjacent to the other one in the SGML:
>
> --
>   
>pg_createsubscriber
>option
>
> 
>  -a
>  --all-databases
> 
> 
>  -D 
>  --pgdata
> 
> datadir
> 
>  -P
>  --publisher-server
> 
> connstr
>
>   
> --
>
> ~~~
>

Fixed.

> 2. --all-databases
>
> +  
> +   --all-databases cannot be used with
> +   --database, --publication,
> +   --replication-slot or 
> --subscription.
> +  
>
> If you want consistent wording with the other places in this patch,
> then this should just be the last sentence of the previous paragraph
> and be worded like: "Cannot be used together with..."
>
> ~~~
>

Fixed.

> 3. --publication
>
> -   a generated name is assigned to the publication name.
> +   a generated name is assigned to the publication name. Cannot be used
> +   together with -a.
>
> Previously reported. Claimed fixed -a to --all-databases. Not fixed.
>
> ~~~
>

Fixed.

> 4. --subscription
>
> -   a generated name is assigned to the subscription name.
> +   a generated name is assigned to the subscription name. Cannot be used
> +   together with -a.
>
>
> (same as the previous review comment).
>
> Previously reported. Claimed fixed -a to --all-databases. Not fixed.
>

Fixed.

> ==
> src/bin/pg_basebackup/pg_createsubscriber.c
>
> 5.
> + bool all_databases; /* fetch and specify all databases */
>
> Comment wording. "and specified" (??). Maybe just "--all-databases
> option was specified"
>
> ==

Fixed.

The attached Patch contains the suggested changes.

[1] - 
https://www.postgresql.org/message-id/CAHv8Rj%2BDOQ9vvkKCsmDeFKyM073yRfs47Tn_qywQOdx15pmOMA%40mail.gmail.com

Thanks and regards,
Shubham Khanna.


v6-0001-Enhance-pg_createsubscriber-to-fetch-and-append-a.patch
Description: Binary data


Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-02-10 Thread Shubham Khanna
On Fri, Feb 7, 2025 at 7:46 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> Thanks for updating the patch.
>
> Previously you said that you had a plan to extend the patch to drop other 
> replication
> objects [1], but I think it is not needed. pg_createsubscriber is already 
> able to drop the existing subscriptions in 
> check_and_drop_existing_subscriptions().
> As for the replication slot, I said in [2] that it would be created 
> intentionally,
> thus I feel it should not be dropped.
> Thus I regard the patch as not having a concrete plan for extension.
>
> Below part contains my review comment.
>
> 01. Option name
>
> Based on the above discussion, "--cleanup-publisher-objects" is not suitable 
> because
> it won't drop replication slots. How about "--cleanup-publications"?
>

I have changed the name of the option to "--cleanup-existing-publications".

> 02. usage()
> ```
> +   printf(_("  -C  --cleanup-publisher-objects drop all publications on 
> the logical replica\n"));
> ```

Fixed.

> s/logical replica/subscriber
>
> 03. drop_all_publications
> ```
> +/* Drops all existing logical replication publications from all subscriber
> + * databases
> + */
> +static void
> ```
>
> Initial line of the comment must be blank [3].
>

Removed this function.

> 04. main
> ```
> +   {"cleanup-publisher-objects", no_argument, NULL, 'C'},
> ```
>
> Is there a reason why upper case is used? I feel lower one is enough.
>

Fixed.

> 05. main
> ```
> +   /* Drop publications from the subscriber if requested */
> +   if (opt.cleanup_publisher_objects)
> +   drop_all_publications(dbinfo);
> ```
>
> After considering more, I noticed that we have already called 
> drop_publication()
> in the setup_subscriber(). Can we call drop_all_publications() there instead 
> when
> -C is specified?
>

I agree with you on this. I have removed drop_all_publications() and
moved the "--cleanup-existing-publications" handling into
drop_publication().

> 06. 040_pg_createsubscriber.pl
>
> ```
> +$node_s->start;
> +# Create publications to test their removal
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");
> +
> +# Verify the existing publications
> +my $pub_count_before =
> +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;");
> +is($pub_count_before, '2',
> +   'two publications created before --cleanup-publisher-objects is run');
> +
> +$node_s->stop;
> ```
>
> I feel it requires an unnecessary startup and shutdown. IIUC, creating 
> publications and checking the
> counts can be done before stopping node_s, around line 331.
>

Fixed.

> 07. 040_pg_createsubscriber.pl
>
> ```
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub FOR ALL TABLES;");
> +$node_p->safe_psql($db1, "CREATE PUBLICATION test_pub2 FOR ALL TABLES;");
> +
> +# Verify the existing publications
> +my $pub_count_before =
> +  $node_s->safe_psql($db1, "SELECT COUNT(*) FROM pg_publication;");
> +is($pub_count_before, '2',
> +   'two publications created before --cleanup-publisher-objects is run');
> +
> ```
>
> Also, there is a possibility that CREATE PUBLICATION on node_p is not 
> replicated yet
> when SELECT COUNT(*) is executed. Please wait for the replay.
>
> [1]: 
> https://www.postgresql.org/message-id/CAHv8RjL4OvoYafofTb_U_JD5HuyoNowBoGpMfnEbhDSENA74Kg%40mail.gmail.com
> [2]: 
> https://www.postgresql.org/message-id/OSCPR01MB1496664FDC38DA40A441F449FF5EE2%40OSCPR01MB14966.jpnprd01.prod.outlook.com
> [3]: https://www.postgresql.org/docs/devel/source-format.html
>

Fixed.

The attached Patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v4-0001-Support-for-dropping-all-publications-in-pg_creat.patch
Description: Binary data


Re: Enhance 'pg_createsubscriber' to retrieve databases automatically when no database is provided.

2025-02-12 Thread Shubham Khanna
On Wed, Feb 12, 2025 at 1:15 PM Shubham Khanna
 wrote:
>
> On Wed, Feb 12, 2025 at 5:29 AM Peter Smith  wrote:
> >
> > On Tue, Feb 11, 2025 at 9:16 PM Shubham Khanna
> >  wrote:
> > >
> > > > #13. Unanswered question "How are tests expecting this even passing?".
> > > > Was a reason identified? IOW, how can we be sure the latest tests
> > > > don't have a similar problem?
> > > >
> > >
> > > In the v4-0001 patch [1], the tests were mistakenly using
> > > 'command_fails' instead of 'command_fails_like' to verify failed test
> > > cases. Since 'command_fails' only checks for a non-zero exit code and
> > > not specific error messages, the tests were passing even when the
> > > expected errors were not being triggered correctly.
> > > To address this, I have updated the patch to use 'command_fails_like',
> > > ensuring that the test cases now explicitly verify the correct failure
> > > messages.
> > >
> >
> > Ah, that makes sense. Thanks for sharing the reason. So in fact, it
> > was a valid concern because the v5 was still carrying over the same
> > flaw from v4... Anyway, it is good to know it is fixed now in v6.
> >
> > =
> >
> > Some general comments for the patch v6-0001:
> >
> > Do you need to test every possible bad option combination? It may be
> > fine because the error will be immediately raised so I expect the
> > execution overhead to be ~zero.
> >
> BTW, your bad option combination tests are only using --all-databases
> > *after* the other options. Maybe you should mix it up a bit, sometimes
> > putting it *before* the others as well, because rearranging will cause
> > different errors.
> >
> > Everything else now looks good to me.
>

Fixed the comment given by Shlok at [1]. The attached patch at [2]
contains the suggested changes.
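
As a rough illustration of the incompatible-switch validation those tests
exercise, the hypothetical shell function below rejects '--all-databases'
(-a) combined with '--database' (-d); it only mimics the shape of such
checks and is not pg_createsubscriber's actual code:

```shell
# Parse -a (all databases) and -d <name> (single database) and reject the
# incompatible combination, as the bad-option-combination tests expect.
check_opts() {
  OPTIND=1            # reset so the function can be called repeatedly
  all=0
  db=""
  while getopts "ad:" opt "$@"; do
    case "$opt" in
      a) all=1 ;;
      d) db="$OPTARG" ;;
    esac
  done
  if [ "$all" -eq 1 ] && [ -n "$db" ]; then
    echo "error: --all-databases cannot be used together with --database"
    return 1
  fi
  return 0
}

check_opts -a                  # valid: all databases
check_opts -d db1              # valid: one database
check_opts -a -d db1 || true   # invalid combination: prints the error
```

Because getopts consumes the whole argument list before the combination
check runs, the error is reported regardless of whether -a appears before
or after -d, which is the switch-order variation discussed above.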

[1] - 
https://www.postgresql.org/message-id/CANhcyEXGwFeQd2fuB2txm1maCC2zKyROUR5exEMGzPYdrZ4CPQ%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAHv8RjKc6LMJ86b4yCmwNTn0c65mz0BGMCF-vPJSKDMOGagVGA%40mail.gmail.com

Thanks and regards,
Shubham Khanna.




Re: Adding a '--clean-publisher-objects' option to 'pg_createsubscriber' utility.

2025-01-29 Thread Shubham Khanna
On Wed, Jan 29, 2025 at 10:42 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Shubham,
>
> > I propose adding the --clean-publisher-objects option to the
> > pg_createsubscriber utility. As discussed in [1], this feature ensures
> > a clean and streamlined setup of logical replication by removing stale
> > or unnecessary publications from the subscriber node. These
> > publications, replicated during streaming replication, become
> > redundant after converting to logical replication and serve no further
> > purpose. This patch introduces the drop_all_publications() function,
> > which efficiently fetches and drops all publications on the subscriber
> > node within a single transaction.
>
> I think replication slots are also a type of 'publisher-objects', but they are 
> not
> removed for now, so the API name may not be accurate. And...
>
> > Additionally, other related objects, such as subscriptions and
> > replication slots, may also require cleanup. I plan to analyze this
> > further and include them in subsequent patches.
>
> I'm not sure replication slots should be cleaned up. Apart from other items 
> like
> publications/subscriptions, replication slots are not replicated when they are 
> created
> on the primary instance. This means they are intentionally created by DBAs 
> and there
> may be no strong reasons to drop them after the conversion.
>
> Another question is the style of APIs. Do you plan to provide APIs like
> 'cleanup-subscriber-objects' and 'cleanup-publisher-objects', or just one
> 'cleanup-logical-replication-objects'?
>

Thanks for the suggestions; I will keep them in mind while preparing
the 0002 patch.
For now, I have changed the option name to '--cleanup-publisher-objects'.

> Regarding the patch:
>
> 1.
> ```
> +   The pg_createsubscriber now supports the
> +   --clean-publisher-objects to remove all publications 
> on
> +   the subscriber node before creating a new subscription.
> ```
>
> This description is not suitable for the documentation. Something like:
>
> Remove all publications on the subscriber node.
>
> 2.
> ```
> +   /* Drop publications from the subscriber if requested */
> +   drop_all_publications(dbinfo);
> ```
>
> This should be called when `opts.clean_publisher_objects` is true.
>
> 3.
> You said publications are dropped within a single transaction, but the
> patch does not do. Which is correct?
>
> 4.
> ```
> +# Set up node A as primary
> +my $node_a = PostgreSQL::Test::Cluster->new('node_a');
> +my $aconnstr = $node_a->connstr;
> +$node_a->init(allows_streaming => 'logical');
> +$node_a->append_conf('postgresql.conf', 'autovacuum = off');
> +$node_a->start;
> +
> +# Set up node B as standby linking to node A
> +$node_a->backup('backup_3');
> +my $node_b = PostgreSQL::Test::Cluster->new('node_b');
> +$node_b->init_from_backup($node_a, 'backup_3', has_streaming => 1);
> +$node_b->append_conf(
> +   'postgresql.conf', qq[
> +   primary_conninfo = '$aconnstr'
> +   hot_standby_feedback = on
> +   max_logical_replication_workers = 5
> +   ]);
> +$node_b->set_standby_mode();
> +$node_b->start;
> ```
>

Fixed the given comments. The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v2-0001-Support-for-dropping-all-publications-in-pg_creat.patch
Description: Binary data


Re: Adding a '--two-phase' option to 'pg_createsubscriber' utility.

2024-12-11 Thread Shubham Khanna
On Wed, Dec 11, 2024 at 2:08 PM vignesh C  wrote:
>
> On Wed, 11 Dec 2024 at 11:21, Shubham Khanna
>  wrote:
> >
> > On Wed, Dec 11, 2024 at 4:21 AM Peter Smith  wrote:
> >
> > I have fixed the given comments. The attached patch contains the
> > suggested changes.
>
> Since all the subscriptions are created based on the two_phase option
> provided, there is no need to store this for each database:
> @@ -53,6 +54,7 @@ struct LogicalRepInfo
> char   *pubname;/* publication name */
> char   *subname;/* subscription name */
> char   *replslotname;   /* replication slot name */
> +   booltwo_phase;  /* two-phase enabled
> for the subscription */
>
> dbinfo[i].dbname = cell->val;
> +   dbinfo[i].two_phase = opt->two_phase;
> if (num_pubs > 0)
>
> How about we handle something like in the attached changes.
>

Thank you for pointing this out and for suggesting the changes. I
agree with your approach.
Also, I found a mistake in the getopt_long() handling and fixed it in
this version of the patch.
The attached patch contains the suggested changes.

Thanks and regards,
Shubham Khanna.


v4-0001-Add-support-for-two-phase-commit-in-pg_createsubs.patch
Description: Binary data

