> On 4 Apr 2024, at 23:46, Daniel Gustafsson <dan...@yesql.se> wrote: > >> On 4 Apr 2024, at 23:24, Tom Lane <t...@sss.pgh.pa.us> wrote: > >> A minimum fix that seems to make this work better is as attached, >> but I feel like somebody ought to examine all the IPC::Run::timer >> and IPC::Run::timeout calls and see which ones are mistaken. >> It's a little scary to convert a timeout to a timer because of >> the hazard that someplace that would need to be checking for >> is_expired isn't. > > Skimming this and a few callsites it seems reasonable to use a timer instead > of > a timeout, but careful study is needed to make sure we're not introducing > anything subtly wrong in the other direction.
Sharing a few preliminary results from looking at this, the attached passes check-world but need more study/testing. It seems wrong to me that we die() in query and query_until rather than giving the caller the power to decide how to proceed. We have even documented that we do just this: "Dies on failure to invoke psql, or if psql fails to connect. Errors occurring later are the caller's problem" Turning the timeout into a timer and returning undef along with logging a test failure in case of expiration seems a bit saner (maybe Andrew can suggest an API which has a better Perl feel to it). Most callsites don't need any changes to accommodate for this, the attached 0002 implements this timer change and modify the few sites that need it, converting one to plain query() where the added complexity of query_until isn't required. A few other comments on related things that stood out while reviewing: The tab completion test can use the API call for automatically restart the timer to reduce the complexity of check_completion a hair. Done in 0001 (but really not necessary). Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in the recovery tests: $back_q->query_until( qr/logical_slot_get_changes/, q( \echo logical_slot_get_changes SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL); )); ... <other tests> ... # Since there are no slots in standby_slot_names, the function # pg_logical_slot_get_changes should now return, and the session can be # stopped. $back_q->quit; There is no guarantee that pg_logical_slot_get_changes has returned when reaching this point. This might still work as intended, but the comment is slightly misleading IMO. recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e in a background session which it then skips terminating. Calling ->quit is mandated by the API, in turn required by IPC::Run. Calling ->quit on the process makes the test fail from the process having already exited, but we can call ->finish directly like we do in test_misc/t/005_timeouts.pl. 0003 fixes this. -- Daniel Gustafsson
v1-0001-Configure-interactive-instance-to-restart-timer.patch
Description: Binary data
v1-0002-Report-test-failure-rather-than-aborting-in-case-.patch
Description: Binary data
v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patch
Description: Binary data