On 3/19/21 3:17 PM, Euler Taveira wrote: > On Fri, Mar 19, 2021, at 12:23 AM, Tom Lane wrote: >> [ reads code... ] >> ... no, I think the problem is the test is still full of race conditions. >> >> In the first place, waiting till you see the output of a SELECT that's >> before the useful query is not enough to guarantee that the useful query >> is done, or even started. That's broken on both sessions. > That's an ugly and fragile mechanism to workaround the fact that pump_until > reacts after you have the query return. > >> In the second place, even if the second INSERT has started, you don't know >> that it's reached the point of blocking on the tuple conflict yet. >> Which in turn means that it might not've filled its tuplestore yet. >> >> In short, this script is designed to ensure that the test query can't >> finish too soon, but that proves nothing about whether the test query >> has even started. And since you also haven't really guaranteed that the >> intended-to-be-blocking query is done, I don't think that the first >> condition really holds either. > In order to avoid the race condition between filling the tuplestore and > killing > the backend, we could use a pool_query_until() before SIGKILL to wait the > temporary file being created. Do you think this modification will make this > test more stable? >
Wow, I thought I understand the perl code, but clearly I was wrong. I think the solution is not particularly difficult. For the initial insert, it's fine to switch the order of the SQL commands, so that the insert happens first, and only then emit the string. For the second insert, we can't do that, because it's expected to block on the lock. So we have to verify that by looking at the lock directly. I however don't understand what the pump_until function does. AFAICS it should run the query in a loop, and bail out once it matches the expected output. But that does not seem to happen, so when the query gets executed before the lock is there, it results in infinite loop. In the attached patch I've simulated this by random() < 0.5. If I replace this with a wait loop in a plpgsql block, that works perfectly fine (no infinite loops). Tested both on x86_64 and rpi. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/test/recovery/t/022_crash_temp_files.pl b/src/test/recovery/t/022_crash_temp_files.pl index 8044849b73..88478d44c1 100644 --- a/src/test/recovery/t/022_crash_temp_files.pl +++ b/src/test/recovery/t/022_crash_temp_files.pl @@ -79,8 +79,8 @@ my $killme2 = IPC::Run::start( # Insert one tuple and leave the transaction open $killme_stdin2 .= q[ BEGIN; -SELECT $$insert-tuple-to-lock-next-insert$$; INSERT INTO tab_crash (a) VALUES(1); +SELECT $$insert-tuple-to-lock-next-insert$$; ]; pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m); $killme_stdout2 = ''; @@ -100,6 +100,15 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m), $killme_stdout = ''; $killme_stderr = ''; +# Wait until the batch insert gets stuck on the lock. +$killme_stdin2 .= q[ +SELECT $$insert-tuple-lock-waiting$$ FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted AND random() < 0.5; +]; + +pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m); +$killme_stdout2 = ''; +$killme_stderr2 = ''; + # Kill with SIGKILL my $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid); is($ret, 0, 'killed process with KILL'); @@ -147,8 +156,8 @@ $killme2->run(); # Insert one tuple and leave the transaction open $killme_stdin2 .= q[ BEGIN; -SELECT $$insert-tuple-to-lock-next-insert$$; INSERT INTO tab_crash (a) VALUES(1); +SELECT $$insert-tuple-to-lock-next-insert$$; ]; pump_until($killme2, \$killme_stdout2, qr/insert-tuple-to-lock-next-insert/m); $killme_stdout2 = ''; @@ -168,6 +177,15 @@ ok(pump_until($killme, \$killme_stdout, qr/in-progress-before-sigkill/m), $killme_stdout = ''; $killme_stderr = ''; +# Wait until the batch insert gets stuck on the lock. +$killme_stdin2 .= q[ +SELECT $$insert-tuple-lock-waiting$$ FROM pg_locks WHERE pid = ] . $pid . q[ AND NOT granted AND random() < 0.5; +]; + +pump_until($killme2, \$killme_stdout2, qr/insert-tuple-lock-waiting/m); +$killme_stdout2 = ''; +$killme_stderr2 = ''; + # Kill with SIGKILL $ret = TestLib::system_log('pg_ctl', 'kill', 'KILL', $pid); is($ret, 0, 'killed process with KILL');