On Thu, Jul 18, 2024 at 7:00 AM Alexander Lakhin <exclus...@gmail.com> wrote:
> As far as I can see (having analyzed a number of runs), the hanging occurs
> when some itimer-related activity happens before "peek_socket" in this
> event sequence:
> [main] postgres {pid} select_stuff::wait: res after verify 0
> [main] postgres {pid} select_stuff::wait: returning 0
> [main] postgres {pid} select: sel.wait returns 0
> [main] postgres {pid} peek_socket: read_ready: 0, write_ready: 1,
> except_ready: 0
>
> (See the last occurrence of the sequence in the log.)

Yeah, right, there's a lot going on between those two lines from the
[main] thread. There are messages from helper threads [itimer], [sig]
and [socksel]. At a guess, [socksel] might be doing extra secret
communication over our socket in order to exchange SO_PEERCRED
information (huh, is that always there?). Seems worth filing a bug
report.

For the record, I know of one other occasional test failure on Cygwin:
it randomly panics in SnapBuildSerialize(). While I don't expect there
to be any users of PostgreSQL on Cygwin (it was unusably broken before
we refactored the postmaster in v16), that one is interesting because
(1) it also happens on native Windows builds, and (2) at least one
candidate fix[1] sounds like it would speed up logical replication on
all operating systems.

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BJ4jSFk%3D-hdoZdcx%2Bp7ru6xuipzCZY-kiKoDc2FjsV7g%40mail.gmail.com#afb5dc4208cc0776a060145f9571dec2