On Mon, Aug 26, 2024 at 10:13 PM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Monday, August 26, 2024 3:30 PM Peter Smith <smithpb2...@gmail.com> wrote: > > > > ====== > > src/include/replication/conflict.h > > > > nit - defined 'NUM_CONFLICT_TYPES' inside the enum (I think this way is > > often used in other PG source enums) > > I think we have recently tended to avoid doing that, as it has been commented > that this style is somewhat deceptive and can cause confusion. See a previous > similar comment[1]. The current style follows the other existing examples > like: > > #define IOOBJECT_NUM_TYPES (IOOBJECT_TEMP_RELATION + 1) > #define IOCONTEXT_NUM_TYPES (IOCONTEXT_VACUUM + 1) > #define IOOP_NUM_TYPES (IOOP_WRITEBACK + 1) > #define BACKEND_NUM_TYPES (B_LOGGER + 1)
OK. > > > > ====== > > src/test/subscription/t/026_stats.pl > > > > 1. > > + # Delete data from the test table on the publisher. This delete > > + operation # should be skipped on the subscriber since the table is already > > empty. > > + $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;)); > > + > > + # Wait for the subscriber to report tuple missing conflict. > > + $node_subscriber->poll_query_until( > > + $db, > > + qq[ > > + SELECT update_missing_count > 0 AND delete_missing_count > 0 FROM > > + pg_stat_subscription_stats WHERE subname = '$sub_name' > > + ]) > > + or die > > + qq(Timed out while waiting for tuple missing conflict for > > subscription '$sub_name'); > > > > Can you write a comment to explain why the replicated DELETE is > > expected to increment both the 'update_missing_count' and the > > 'delete_missing_count'? > > I think the comments several lines above the wait explained the reason[2]. I > slightly modified the comments to make it clear. > 1. Right, but it still was not obvious to me what caused the 'update_missing_count'. On further study, I see it was a hangover from the earlier UPDATE test case which was still stuck in an ERROR loop attempting to do the update operation. e.g. before it was giving the expected 'update_exists' conflicts but after the subscriber table TRUNCATE the update conflict changes to give a 'update_missing' conflict instead. I've updated the comment to reflect my understanding. Please have a look to see if you agree. ~~~~ 2. I separated the tests for 'update_missing' and 'delete_missing', putting the update_missing test *before* the DELETE. I felt the expected results were much clearer when each test did just one thing. Please have a look to see if you agree. ~~~ 3. +# Enable track_commit_timestamp to detect origin-differ conflicts in logical +# replication. Reduce wal_retrieve_retry_interval to 1ms to accelerate the +# restart of the logical replication worker after encountering a conflict. +$node_subscriber->append_conf( + 'postgresql.conf', q{ +track_commit_timestamp = on +wal_retrieve_retry_interval = 1ms +}); Later, after CDR resolvers are implemented, it might be good to revisit these conflict test cases and re-write them to use some conflict resolvers like 'skip'. Then the subscriber won't give ERRORs and restart apply workers all the time behind the scenes, so you won't need the above configuration for accelerating the worker restarts. In other words, running these tests might be more efficient if you can avoid restarting workers all the time. I suggest putting an XXX comment here as a reminder that these tests should be revisited to make use of conflict resolvers in the future. ~~~ nit - not caused by this patch, but other comment inconsistencies about "stats_reset timestamp" can be fixed in passing too. ====== Kind Regards, Peter Smith. Fujitsu Australia
diff --git a/src/test/subscription/t/026_stats.pl b/src/test/subscription/t/026_stats.pl index 0df31a6..d9589f0 100644 --- a/src/test/subscription/t/026_stats.pl +++ b/src/test/subscription/t/026_stats.pl @@ -134,24 +134,37 @@ sub create_sub_pub_w_errors or die qq(Timed out while waiting for update_exists conflict for subscription '$sub_name'); - # Truncate the test table to ensure that the conflicting update operations - # are skipped, allowing the test to continue. - $node_subscriber->safe_psql($db, qq(TRUNCATE $table_name)); + # Truncate the subscriber side test table. Now that the table is empty, + # the update conflict ('update_existing') ERRORs will stop happening. A + # single update conflict 'update_missing' will be reported, but the update + # will be skipped on the subscriber, allowing the test to continue. + $node_subscriber->safe_psql($db, qq(TRUNCATE $table_name)); + + # Wait for the subscriber to report update_missing conflict. + $node_subscriber->poll_query_until( + $db, + qq[ + SELECT update_missing_count > 0 + FROM pg_stat_subscription_stats + WHERE subname = '$sub_name' + ]) + or die + qq(Timed out while waiting for update_missing conflict for subscription '$sub_name'); # Delete data from the test table on the publisher. This delete operation # should be skipped on the subscriber since the table is already empty. $node_publisher->safe_psql($db, qq(DELETE FROM $table_name;)); - # Wait for the subscriber to report tuple missing conflict. + # Wait for the subscriber to report delete_missing conflict. $node_subscriber->poll_query_until( $db, qq[ - SELECT update_missing_count > 0 AND delete_missing_count > 0 + SELECT delete_missing_count > 0 FROM pg_stat_subscription_stats WHERE subname = '$sub_name' ]) or die - qq(Timed out while waiting for tuple missing conflict for subscription '$sub_name'); + qq(Timed out while waiting for delete_missing conflict for subscription '$sub_name'); # Prepare data for further tests. $node_publisher->safe_psql($db, qq(INSERT INTO $table_name VALUES (1))); @@ -216,7 +229,7 @@ my ($pub1_name, $sub1_name) = create_sub_pub_w_errors($node_publisher, $node_subscriber, $db, $table1_name); -# Apply errors, sync errors, and conflicts are > 0 and reset timestamp is NULL +# Apply errors, sync errors, and conflicts are > 0 and stats_reset timestamp is NULL is( $node_subscriber->safe_psql( $db, qq(SELECT apply_error_count > 0, @@ -240,7 +253,7 @@ $node_subscriber->safe_psql($db, qq(SELECT pg_stat_reset_subscription_stats((SELECT subid FROM pg_stat_subscription_stats WHERE subname = '$sub1_name'))) ); -# Apply errors, sync errors, and conflicts are 0 and stats reset is not NULL +# Apply errors, sync errors, and conflicts are 0 and stats_reset timestamp is not NULL is( $node_subscriber->safe_psql( $db, qq(SELECT apply_error_count = 0, @@ -286,7 +299,7 @@ my ($pub2_name, $sub2_name) = create_sub_pub_w_errors($node_publisher, $node_subscriber, $db, $table2_name); -# Apply errors, sync errors, and conflicts are > 0 and reset timestamp is NULL +# Apply errors, sync errors, and conflicts are > 0 and stats_reset timestamp is NULL is( $node_subscriber->safe_psql( $db, qq(SELECT apply_error_count > 0, @@ -309,7 +322,7 @@ is( $node_subscriber->safe_psql( $node_subscriber->safe_psql($db, qq(SELECT pg_stat_reset_subscription_stats(NULL))); -# Apply errors, sync errors, and conflicts are 0 and stats reset is not NULL +# Apply errors, sync errors, and conflicts are 0 and stats_reset timestamp is not NULL is( $node_subscriber->safe_psql( $db, qq(SELECT apply_error_count = 0,