On Sat, Jan 7, 2023 at 3:40 AM Andrew Dunstan <and...@dunslane.net> wrote:
> OK, should I now try re-enabling TAP tests on lorikeet?

Not before https://commitfest.postgresql.org/41/4032/ is committed.
After that, it might be worth a try?  I have no idea if the PANIC
problem I mentioned last night would apply to lorikeet's kernel too.

To summarise the kinds of failure we have analysed in this thread:

1.  SysV semaphores are buggy; fixed, I hope, by a recent commit (=
just don't use them).
2.  The regular crashes we already knew about from other threads due
to signal masking being buggy seem to be fixed, coincidentally, by CF
#4032, not yet committed (= don't rely on sa_mask for correctness).
3.  PANIC apparently caused by non-atomic rename(), based on analysis
of similar failures seen on other old buggy OSes back in 2018.
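
To illustrate what "don't rely on sa_mask for correctness" in #2 can
mean in general, here is a minimal sketch.  It is not taken from CF
#4032 and makes no claim about how that patch works; the function and
flag names are hypothetical.  The idea is that each handler only sets a
flag, so nothing breaks if the kernel fails to block the other signal
while a handler is running, and the real work happens later in the main
loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-signal flags; the names are illustrative only. */
static volatile sig_atomic_t got_sigusr1 = 0;
static volatile sig_atomic_t got_sigusr2 = 0;

/*
 * The handlers only record that a signal arrived.  They touch no shared
 * state beyond their own flag, so it is harmless if one of them
 * interrupts the other because signal masking misbehaves.
 */
void on_sigusr1(int signo) { (void) signo; got_sigusr1 = 1; }
void on_sigusr2(int signo) { (void) signo; got_sigusr2 = 1; }

/* Install both handlers with an empty sa_mask.  Returns 0 on success. */
int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);   /* deliberately not depending on the mask */

    sa.sa_handler = on_sigusr1;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = on_sigusr2;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;

    return 0;
}

/*
 * Called from the main loop, outside signal context.  Clears each flag
 * that is set and returns how many were pending; the real work for each
 * signal would be done here.
 */
int drain_pending(void)
{
    int handled = 0;

    if (got_sigusr1) { got_sigusr1 = 0; handled++; }
    if (got_sigusr2) { got_sigusr2 = 0; handled++; }
    return handled;
}
```

A caller would run install_handlers() once at startup and then call
drain_pending() each time around its event loop.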

If lorikeet has problem #3 (which it may not; we know from CF #3951
that kernel versions differ in related respects and Server 2019 as
used on CI seems to have the most conservative/old Windows behaviour)
then it might fail in the TAP tests just like the proposed
CI-for-Cygwin patch, unless you also do data_sync_retry=on, which
seems like a pretty ugly workaround to me.  I don't know what else
might be broken by non-atomic rename(), and I'd rather not find out
:-D

I ended up here while trying to tidy up some weird-looking nonsense in
our code as part of general portability cleanup, since I needed a way
to check whether the __CYGWIN__ stuff still works.  What we found out
is that it's more broken than anyone realised, and now I have to pull
the emergency rabbit hole ejection cord, because I have less than zero
time for, or interest in, debugging Cygwin.
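
For reference, the data_sync_retry workaround mentioned above would
amount to something like the following in the postgresql.conf of the
test clusters.  This is only a sketch of the setting itself; how it
would get injected into lorikeet's TAP test clusters is not something
this thread has settled.

```
# Assumption: added to the test cluster's postgresql.conf.
# The default is off, where a failure to flush data files to disk
# causes a PANIC.  With on, the error is reported and the server keeps
# running so the flush can be retried.  Takes effect only at server
# start.
data_sync_retry = on
```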

