Hi,

On 2019-01-26 20:53:48 -0500, Tom Lane wrote:
> Recently, buildfarm member curculio has started to show a semi-repeatable
> failure in src/test/recovery/t/013_crash_restart.pl:
>
> # aborting wait: program died
> # stream contents: >>psql:<stdin>:8: no connection to the server
> # psql:<stdin>:8: connection to server was lost
> # <<
> # pattern searched for: (?^m:server closed the connection unexpectedly)
>
> # Failed test 'psql query died successfully after SIGKILL'
> # at t/013_crash_restart.pl line 198.
>
> The message this test is looking for is what libpq reports upon getting
> EOF or ECONNRESET from a socket read attempt. The message it's actually
> seeing is what libpq reports if it notices that the PQconn is *already*
> in CONNECTION_BAD state when it's trying to send a new query.
>
> I have no idea why we're seeing this in only one buildfarm member
> and only for the past week or so, as it doesn't appear that any
> related code has changed for months. (Perhaps something changed
> about curculio's host?)

I have no idea why it's just curculio, but I think I know why it only
started recently: curculio doesn't appear to have had TAP tests enabled
before
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2019-01-17%2021%3A30%3A02

> just change the test script to accept either message as a successful
> result. I think that 4247db625 made such races more likely, but I
> don't believe it was impossible before.

Sounds right to me - do you want to do the honors or shall I?

> Another idea is to change libpq so that both these cases emit identical
> messages, but I don't really feel that that'd be an improvement. Also,
> since 4247db625 was back-patched, we'd have to back-patch the message
> change as well, which I like even less. People might be relying on
> seeing either message spelling in some situations.

Yea, I don't think that's the way to go.

Greetings,

Andres Freund
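
PS: For illustration only, here is a minimal sketch of the "accept either message" idea, using the two message texts from the failure report quoted at the top. It is written in Python rather than the test's own Perl, and the names EXPECTED and died_as_expected are hypothetical; the actual change would presumably be a similar alternation in the pattern used by 013_crash_restart.pl.

```python
import re

# Either libpq message counts as "the query died as expected":
#  - reported on EOF/ECONNRESET from a socket read attempt
#  - reported when the connection is already known to be bad at send time
EXPECTED = re.compile(
    r"server closed the connection unexpectedly"
    r"|no connection to the server",
    re.MULTILINE,
)


def died_as_expected(stderr_text: str) -> bool:
    """Return True if the psql stderr contains either accepted message."""
    return EXPECTED.search(stderr_text) is not None
```

With this, the stream contents shown in the failure above would count as a pass, while unrelated errors would still fail.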