Hi, On 2021-12-22 22:34:45 +0900, Masahiko Sawada wrote: > I've attached an updated patch. Please review it.
Sorry for dropping the ball on this again :(. I've pushed the fix with some very minor polishing. > > The attached detects that bug, but I'm not sure it's worth expending > > test time, or this might be in the server test suit. > > Thanks. It's convenient to test this issue but I'm also not sure it's > worth adding to the test suit. I think it's definitely worth adding a test, but I don't particularly like the specific test implementation. Primarily because I think it's better to test this in a cluster that stays running, so that we can verify that the slot drop worked. It also doesn't seem necessary to create a separate cluster. I wrote the attached isolation test. I ended up not committing it yet - I'm worried that there could be some OS dependent output difference, due to some libpq error handling issues. See [1], which Tom pointed out is caused by the issue discussed in [2]. Greetings, Andres Freund [1] https://postgr.es/m/20220215004143.dlzsn72oqsmqa7uw%40alap3.anarazel.de [2] https://postgr.es/m/20220215004143.dlzsn72oqsmqa7uw%40alap3.anarazel.de
>From 343106442eafde4fcc55cc75e78501010d50a883 Mon Sep 17 00:00:00 2001 From: Andres Freund <and...@anarazel.de> Date: Mon, 14 Feb 2022 17:17:37 -0800 Subject: [PATCH v1] Add isolation test for errors during logical slot creation. Discussion: https://postgr.es/m/cad21aodaeepabzeyyjspzjumsparicvsbobal7spaofnkz+...@mail.gmail.com --- contrib/test_decoding/Makefile | 2 +- .../expected/slot_creation_error.out | 113 ++++++++++++++++++ .../specs/slot_creation_error.spec | 41 +++++++ 3 files changed, 155 insertions(+), 1 deletion(-) create mode 100644 contrib/test_decoding/expected/slot_creation_error.out create mode 100644 contrib/test_decoding/specs/slot_creation_error.spec diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile index 56ddc3abaeb..36929dd97d3 100644 --- a/contrib/test_decoding/Makefile +++ b/contrib/test_decoding/Makefile @@ -9,7 +9,7 @@ REGRESS = ddl xact rewrite toast permissions decoding_in_xact \ sequence ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \ oldest_xmin snapshot_transfer subxact_without_top concurrent_stream \ - twophase_snapshot + twophase_snapshot slot_creation_error REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf diff --git a/contrib/test_decoding/expected/slot_creation_error.out b/contrib/test_decoding/expected/slot_creation_error.out new file mode 100644 index 00000000000..fc1c98b5248 --- /dev/null +++ b/contrib/test_decoding/expected/slot_creation_error.out @@ -0,0 +1,113 @@ +Parsed test spec with 2 sessions + +starting permutation: s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c +step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE; +step s1_xid: SELECT 'xid' FROM txid_current(); +?column? +-------- +xid +(1 row) + +step s2_init: + SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding'); + <waiting ...> +step s1_view_slot: + SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' + +slot_name |slot_type|active +-------------------+---------+------ +slot_creation_error|logical |t +(1 row) + +step s1_cancel_s2: + SELECT pg_cancel_backend(pid) + FROM pg_stat_activity + WHERE application_name = 'isolation/slot_creation_error/s2'; + +pg_cancel_backend +----------------- +t +(1 row) + +step s2_init: <... completed> +ERROR: canceling statement due to user request +step s1_view_slot: + SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' + +slot_name|slot_type|active +---------+---------+------ +(0 rows) + +step s1_c: COMMIT; + +starting permutation: s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot +step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE; +step s1_xid: SELECT 'xid' FROM txid_current(); +?column? +-------- +xid +(1 row) + +step s2_init: + SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding'); + <waiting ...> +step s1_c: COMMIT; +step s2_init: <... completed> +?column? +-------- +init +(1 row) + +step s1_view_slot: + SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' + +slot_name |slot_type|active +-------------------+---------+------ +slot_creation_error|logical |f +(1 row) + +step s1_drop_slot: + SELECT pg_drop_replication_slot('slot_creation_error'); + +pg_drop_replication_slot +------------------------ + +(1 row) + + +starting permutation: s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot +step s1_b: BEGIN ISOLATION LEVEL SERIALIZABLE; +step s1_xid: SELECT 'xid' FROM txid_current(); +?column? +-------- +xid +(1 row) + +step s2_init: + SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding'); + <waiting ...> +step s1_terminate_s2: + SELECT pg_terminate_backend(pid) + FROM pg_stat_activity + WHERE application_name = 'isolation/slot_creation_error/s2'; + +pg_terminate_backend +-------------------- +t +(1 row) + +step s2_init: <... completed> +FATAL: terminating connection due to administrator command +FATAL: terminating connection due to administrator command +server closed the connection unexpectedly + This probably means the server terminated abnormally + before or while processing the request. + +step s1_c: COMMIT; +step s1_view_slot: + SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' + +slot_name|slot_type|active +---------+---------+------ +(0 rows) + diff --git a/contrib/test_decoding/specs/slot_creation_error.spec b/contrib/test_decoding/specs/slot_creation_error.spec new file mode 100644 index 00000000000..582b282331f --- /dev/null +++ b/contrib/test_decoding/specs/slot_creation_error.spec @@ -0,0 +1,41 @@ +# Test that erroring out during logical slot creation is handled properly + +session "s1" +setup { SET synchronous_commit=on; } + +step s1_b { BEGIN ISOLATION LEVEL SERIALIZABLE; } +step s1_xid { SELECT 'xid' FROM txid_current(); } +step s1_c { COMMIT; } +step s1_cancel_s2 { + SELECT pg_cancel_backend(pid) + FROM pg_stat_activity + WHERE application_name = 'isolation/slot_creation_error/s2'; +} + +step s1_terminate_s2 { + SELECT pg_terminate_backend(pid) + FROM pg_stat_activity + WHERE application_name = 'isolation/slot_creation_error/s2'; +} + +step s1_view_slot { + SELECT slot_name, slot_type, active FROM pg_replication_slots WHERE slot_name = 'slot_creation_error' +} + +step s1_drop_slot { + SELECT pg_drop_replication_slot('slot_creation_error'); +} + +session s2 +setup { SET synchronous_commit=on; } +step s2_init { + SELECT 'init' FROM pg_create_logical_replication_slot('slot_creation_error', 'test_decoding'); +} + +# The tests first start a transaction with an xid assigned in s1, then create +# a slot in s2. The slot creation waits for s1's transaction to end. Instead +# we cancel / terminate s2. +permutation s1_b s1_xid s2_init s1_view_slot s1_cancel_s2 s1_view_slot s1_c +permutation s1_b s1_xid s2_init s1_c s1_view_slot s1_drop_slot # check slot creation still works +permutation s1_b s1_xid s2_init s1_terminate_s2 s1_c s1_view_slot +# can't run tests after this, due to s2's connection failure -- 2.34.0