Re: "ERROR: latch already owned" on gharial

2024-02-09 Thread Soumyadeep Chakraborty
Hey, Deeply appreciate both your input! On Thu, Feb 8, 2024 at 4:57 AM Heikki Linnakangas wrote: > Hmm, there is a pair of SpinLockAcquire() and SpinLockRelease() in > ProcKill(), before step 3 can happen. Comment in spin.h about > SpinLockAcquire/Release: > > > *Load and store operations i

Re: "ERROR: latch already owned" on gharial

2024-02-08 Thread Andres Freund
Hi, On 2024-02-08 14:57:47 +0200, Heikki Linnakangas wrote: > On 08/02/2024 04:08, Soumyadeep Chakraborty wrote: > > A possible ordering of events: > > > > (1) DisownLatch() is called by pid Y during ProcKill() and the write for > > latch->owner_pid = 0 is NOT yet flushed to shmem. > > > > (2) T

Re: "ERROR: latch already owned" on gharial

2024-02-08 Thread Heikki Linnakangas
On 08/02/2024 04:08, Soumyadeep Chakraborty wrote: A possible ordering of events: (1) DisownLatch() is called by pid Y during ProcKill() and the write for latch->owner_pid = 0 is NOT yet flushed to shmem. (2) The PGPROC object for pid Y is returned to the free list. (3) Pid X sees the same PGP
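The ordering of events quoted above, together with Heikki's point that ProcKill() runs a SpinLockAcquire()/SpinLockRelease() pair before step (3), can be modeled outside PostgreSQL. The sketch below is a hypothetical, simplified model. `FakeProc`, `exiting_backend`, `starting_backend` and `run_handoff_once` are invented names, not PostgreSQL code. A pthread mutex stands in for the spinlock, on the assumption that it provides the same acquire/release ordering. The exiting backend stores `owner_pid = 0` and then publishes the slot on a free list under the lock. The new backend takes the slot under the same lock and only then reads `owner_pid`. Because the unlock by Y happens-before the lock by X, X is guaranteed to read 0.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a PGPROC slot; NOT PostgreSQL's real struct. */
typedef struct FakeProc {
    int owner_pid;              /* models latch->owner_pid */
    struct FakeProc *next;      /* free-list link */
} FakeProc;

/* The mutex models the spinlock taken in ProcKill() (assumption). */
static pthread_mutex_t freelist_lock = PTHREAD_MUTEX_INITIALIZER;
static FakeProc *freelist = NULL;
static FakeProc slot;
static int observed_owner = -1;

/* Exiting backend (pid Y): steps (1) and (2). */
static void *exiting_backend(void *arg)
{
    (void) arg;
    slot.owner_pid = 0;                    /* (1) "DisownLatch()": plain store */
    pthread_mutex_lock(&freelist_lock);    /* lock/unlock pair before step (3) */
    slot.next = freelist;
    freelist = &slot;                      /* (2) slot returned to the free list */
    pthread_mutex_unlock(&freelist_lock);
    return NULL;
}

/* New backend (pid X): step (3), taking the slot off the free list. */
static void *starting_backend(void *arg)
{
    (void) arg;
    FakeProc *p = NULL;
    while (p == NULL) {                    /* spin until Y has published the slot */
        pthread_mutex_lock(&freelist_lock);
        p = freelist;
        if (p != NULL)
            freelist = p->next;
        pthread_mutex_unlock(&freelist_lock);
    }
    /* Y's unlock happens-before our successful lock, so the store of 0 is visible. */
    observed_owner = p->owner_pid;
    return NULL;
}

/* Runs one hand-off and returns the owner_pid that pid X observed. */
int run_handoff_once(void)
{
    pthread_t x, y;
    slot.owner_pid = 4242;                 /* pretend pid Y still owns the latch */
    slot.next = NULL;
    freelist = NULL;
    observed_owner = -1;
    pthread_create(&x, NULL, starting_backend, NULL);
    pthread_create(&y, NULL, exiting_backend, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    return observed_owner;                 /* always 0 under the lock's ordering */
}
```

If the lock really does provide acquire/release semantics, `run_handoff_once()` can never return the stale 4242. That is why the sequence (1) to (3) on its own does not explain the error, and why the later replies in the thread look at what ordering guarantees the real spinlock implementation gives.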

Re: "ERROR: latch already owned" on gharial

2024-02-07 Thread Soumyadeep Chakraborty
Hey hackers, I wanted to report that we have seen this issue (with the procLatch) a few times very sporadically on Greenplum 6X (based on 9.4), with relatively newer versions of GCC. I realize that 9.4 is out of support, so this email is purely to add on to the existing thread, in case the info c

Re: "ERROR: latch already owned" on gharial

2022-07-13 Thread Alvaro Herrera
On 2022-Jul-13, Sandeep Thakkar wrote: > Thanks Robert. > > We are receiving the alerts from buildfarm-admins for anole and gharial not > reporting. Who can help to stop these? Thanks Probably Andrew knows how to set buildsystems.no_alerts for these animals. -- Álvaro Herrera PostgreSQ

Re: "ERROR: latch already owned" on gharial

2022-07-12 Thread Sandeep Thakkar
Thanks Robert. We are receiving the alerts from buildfarm-admins for anole and gharial not reporting. Who can help to stop these? Thanks On Wed, Jul 6, 2022 at 1:27 AM Robert Haas wrote: > On Sun, Jul 3, 2022 at 11:51 PM Thomas Munro > wrote: > > On Wed, Jun 1, 2022 at 12:55 AM Robert Haas >

Re: "ERROR: latch already owned" on gharial

2022-07-05 Thread Robert Haas
On Sun, Jul 3, 2022 at 11:51 PM Thomas Munro wrote: > On Wed, Jun 1, 2022 at 12:55 AM Robert Haas wrote: > > OK, I have access to the box now. I guess I might as well leave the > > crontab jobs enabled until the next time this happens, since Thomas > > just took steps to improve the logging, but

Re: "ERROR: latch already owned" on gharial

2022-07-03 Thread Thomas Munro
On Wed, Jun 1, 2022 at 12:55 AM Robert Haas wrote: > OK, I have access to the box now. I guess I might as well leave the > crontab jobs enabled until the next time this happens, since Thomas > just took steps to improve the logging, but I do think these BF > members are overdue to be killed off, a

Re: "ERROR: latch already owned" on gharial

2022-05-31 Thread Robert Haas
On Tue, May 31, 2022 at 8:20 AM Robert Haas wrote: > On Mon, May 30, 2022 at 8:31 PM Thomas Munro wrote: > > On Sat, May 28, 2022 at 1:56 AM Robert Haas wrote: > > > What I'm inclined to do is get gharial and anole removed from the > > > buildfarm. anole was set up by Heikki in 2011. I don't kno

Re: "ERROR: latch already owned" on gharial

2022-05-31 Thread Robert Haas
On Mon, May 30, 2022 at 8:31 PM Thomas Munro wrote: > On Sat, May 28, 2022 at 1:56 AM Robert Haas wrote: > > What I'm inclined to do is get gharial and anole removed from the > > buildfarm. anole was set up by Heikki in 2011. I don't know when > > gharial was set up, or by whom. I don't think any

Re: "ERROR: latch already owned" on gharial

2022-05-30 Thread Thomas Munro
On Sat, May 28, 2022 at 1:56 AM Robert Haas wrote: > What I'm inclined to do is get gharial and anole removed from the > buildfarm. anole was set up by Heikki in 2011. I don't know when > gharial was set up, or by whom. I don't think anyone at EDB cares > about these machines any more, or has any

Re: "ERROR: latch already owned" on gharial

2022-05-30 Thread Thomas Munro
On Sat, May 28, 2022 at 8:11 AM Tom Lane wrote: > Robert Haas writes: > > On Fri, May 27, 2022 at 10:21 AM Tom Lane wrote: > >> What I'd suggest is to promote that failure to elog(PANIC), which > >> would at least give us the PID and if we're lucky a stack trace. > > > That proposed change is fi

Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Tom Lane
Robert Haas writes: > On Fri, May 27, 2022 at 10:21 AM Tom Lane wrote: >> What I'd suggest is to promote that failure to elog(PANIC), which >> would at least give us the PID and if we're lucky a stack trace. > That proposed change is fine with me. > As to the question of whether it's a real bug
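The two diagnostic ideas in this part of the thread can be sketched together. Thomas suggested putting the mystery PID in the message, and Tom suggested promoting the failure to elog(PANIC). The code below is a standalone, hypothetical model, not PostgreSQL's OwnLatch(). `DemoLatch` and `demo_own_latch` are invented names, and the message text is an assumption. The function reads `owner_pid` exactly once, so the PID it reports is the same value that failed the check. Where the backend would call elog(PANIC, ...), this version formats the message into a buffer and returns -1.

```c
#include <stdio.h>

/* Hypothetical latch model; only the owner field matters here. */
typedef struct {
    int owner_pid;      /* 0 means "not owned" */
} DemoLatch;

/*
 * Try to take ownership of the latch for my_pid.
 * Returns 0 on success. On conflict, writes the diagnostic that a PANIC
 * would carry into msg and returns -1.
 */
int demo_own_latch(DemoLatch *latch, int my_pid, char *msg, size_t msglen)
{
    int owner_pid = latch->owner_pid;   /* single read: report what we tested */

    if (owner_pid != 0) {
        snprintf(msg, msglen, "latch already owned by PID %d", owner_pid);
        return -1;                      /* backend would elog(PANIC, ...) here */
    }
    latch->owner_pid = my_pid;
    return 0;
}
```

Usage: a first call on a latch with `owner_pid == 0` succeeds and records the caller's PID. A second call from another PID fails and names the current owner, which is the clue the thread was hoping the next failure would reveal.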

Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Robert Haas
On Fri, May 27, 2022 at 10:21 AM Tom Lane wrote: > That's possible, certainly. It's also possible that it's a real bug > that so far has only manifested there for (say) timing reasons. > The buildfarm is not so large that we can write off single-machine > failures as being unlikely to hit in the

Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Tom Lane
Robert Haas writes: > On Fri, May 27, 2022 at 7:55 AM Thomas Munro wrote: >> Thanks. Hmm. So far it's always a parallel worker. The best idea I >> have is to include the ID of the mystery PID in the error message and >> see if that provides a clue next time. > ... Even if we find a bug in Pos

Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Robert Haas
On Fri, May 27, 2022 at 7:55 AM Thomas Munro wrote: > Thanks. Hmm. So far it's always a parallel worker. The best idea I > have is to include the ID of the mystery PID in the error message and > see if that provides a clue next time. What I'm inclined to do is get gharial and anole removed fro

Re: "ERROR: latch already owned" on gharial

2022-05-27 Thread Thomas Munro
On Thu, May 26, 2022 at 2:35 PM Tom Lane wrote: > Thomas Munro writes: > > On a more practical note, I don't have access to the BF database right > > now. Would you mind checking if "latch already owned" has occurred on > > any other animals? > > Looking back 6 months, these are the only occurre

Re: "ERROR: latch already owned" on gharial

2022-05-25 Thread Tom Lane
Thomas Munro writes: > Sorry for the ambiguity -- I have no evidence of miscompilation. My > "just BTW" paragraph was a reaction to the memory of the last couple > of times Noah and I wasted hours chasing red herrings on this system, > which is pretty demotivating when looking into an unexplained

Re: "ERROR: latch already owned" on gharial

2022-05-25 Thread Thomas Munro
On Thu, May 26, 2022 at 2:25 AM Tom Lane wrote: > Noah Misch writes: > > +1, this is at least the third non-obvious miscompilation from gharial. > > Is there any evidence that this is a compiler-sourced problem? > Maybe it is, but it's sure not obvious to me (he says, eyeing his > buildfarm anima

Re: "ERROR: latch already owned" on gharial

2022-05-25 Thread Tom Lane
Noah Misch writes: > +1, this is at least the third non-obvious miscompilation from gharial. Is there any evidence that this is a compiler-sourced problem? Maybe it is, but it's sure not obvious to me (he says, eyeing his buildfarm animals with even older gcc versions). r

Re: "ERROR: latch already owned" on gharial

2022-05-24 Thread Noah Misch
On Tue, May 24, 2022 at 06:24:39PM -0700, Andres Freund wrote: > On 2022-05-25 12:45:21 +1200, Thomas Munro wrote: > > Just BTW, that animal has shown signs of a flaky toolchain before[1]. > > I know we have quite a lot of museum exhibits in the 'farm, in terms > > of hardware, OS, and tool chain.

Re: "ERROR: latch already owned" on gharial

2022-05-24 Thread Tom Lane
Andres Freund writes: > On 2022-05-25 12:45:21 +1200, Thomas Munro wrote: >> I know we have quite a lot of museum exhibits in the 'farm, in terms >> of hardware, OS, and tool chain. In some cases, they're probably just >> forgotten/not on anyone's upgrade radar. If they've shown signs of >> misbe

Re: "ERROR: latch already owned" on gharial

2022-05-24 Thread Andres Freund
Hi, On 2022-05-25 12:45:21 +1200, Thomas Munro wrote: > A couple of recent isolation test failures reported $SUBJECT. Was that just on gharial? > It could be a bug in recent-ish latch refactoring work, though I don't > know why it would show up twice just recently. Yea, that's weird. > Just

"ERROR: latch already owned" on gharial

2022-05-24 Thread Thomas Munro
Hi, A couple of recent isolation test failures reported $SUBJECT. It could be a bug in recent-ish latch refactoring work, though I don't know why it would show up twice just recently. Just BTW, that animal has shown signs of a flaky toolchain before[1]. I know we have quite a lot of museum exhib