Re: the s_lock_stuck on perform_spin_delay

2024-02-08 Thread Andy Fan
Hi, > There are some similarities between this and > https://www.postgresql.org/message-id/20240207184050.rkvpuudq7huijmkb%40awork3.anarazel.de > as described in that email. Thanks for this information. > > > Hm, I think this might make this code a bit more expensive. It's cheaper, both > in t

Re: the s_lock_stuck on perform_spin_delay

2024-02-07 Thread Andres Freund
Hi, There are some similarities between this and https://www.postgresql.org/message-id/20240207184050.rkvpuudq7huijmkb%40awork3.anarazel.de as described in that email. On 2024-01-25 15:24:17 +0800, Andy Fan wrote: > From 4c8fd0ab71299e57fbeb18dec70051bd1d035c7c Mon Sep 17 00:00:00 2001 > From: "

Re: the s_lock_stuck on perform_spin_delay

2024-01-24 Thread Andy Fan
Andres Freund writes: > On 2024-01-18 14:00:58 -0500, Robert Haas wrote: >> > The LockBufHdr also used init_local_spin_delay / perform_spin_delay >> > infrastruce and then it has the same issue like ${subject}, it is pretty >> > like the code in s_lock; Based on my current knowledge, I think we

Re: the s_lock_stuck on perform_spin_delay

2024-01-24 Thread Andy Fan
Hi, > Hi, > > On 2024-01-22 15:18:35 +0800, Andy Fan wrote: >> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a >> quickdie, however this is bad not only because it doesn't work on >> Windows, but also it has too poor performance even it impacts on >> USE_ASSERT_CHECKING

Re: the s_lock_stuck on perform_spin_delay

2024-01-23 Thread Andres Freund
Hi, On 2024-01-22 15:18:35 +0800, Andy Fan wrote: > I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a > quickdie, however this is bad not only because it doesn't work on > Windows, but also it has too poor performance even it impacts on > USE_ASSERT_CHECKING build only. In v

Re: the s_lock_stuck on perform_spin_delay

2024-01-23 Thread Andres Freund
Hi, On 2024-01-18 14:00:58 -0500, Robert Haas wrote: > > The LockBufHdr also used init_local_spin_delay / perform_spin_delay > > infrastruce and then it has the same issue like ${subject}, it is pretty > > like the code in s_lock; Based on my current knowledge, I think we > > should add the check

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Andy Fan
Robert Haas writes: > On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote: >> > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: >> >> I get your point! Acquiring an already held spinlock in quickdie is >> >> unlikely to happen, but since our existing infrastructure can handle it, >> >> then ther

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Robert Haas
On Mon, Jan 22, 2024 at 12:13 PM Andy Fan wrote: > > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: > >> I get your point! Acquiring an already held spinlock in quickdie is > >> unlikely to happen, but since our existing infrastructure can handle it, > >> then there is no reason to bypass it. >

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Andy Fan
Robert Haas writes: > On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: >> I get your point! Acquiring an already held spinlock in quickdie is >> unlikely to happen, but since our existing infrastructure can handle it, >> then there is no reason to bypass it. > > No, the existing infrastructure

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Robert Haas
On Mon, Jan 22, 2024 at 11:58 AM Andy Fan wrote: > I get your point! Acquiring an already held spinlock in quickdie is > unlikely to happen, but since our existing infrastructure can handle it, > then there is no reason to bypass it. No, the existing infrastructure cannot handle that at all. --

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Andy Fan
Robert Haas writes: > On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote: >> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a >> quickdie, however this is bad not only because it doesn't work on >> Windows, but also it has too poor performance even it impacts on >> USE_ASSE

Re: the s_lock_stuck on perform_spin_delay

2024-01-22 Thread Robert Haas
On Mon, Jan 22, 2024 at 2:22 AM Andy Fan wrote: > I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a > quickdie, however this is bad not only because it doesn't work on > Windows, but also it has too poor performance even it impacts on > USE_ASSERT_CHECKING build only. In v8,

Re: the s_lock_stuck on perform_spin_delay

2024-01-21 Thread Andy Fan
Andy Fan writes: > I found a speical case about checking it in errstart. So commit 3 in v7 > is added. > > commit 757c67c1d4895ce6a523bcf5217af8eb2351e2a1 (HEAD -> s_stuck_v3) > Author: yizhi.fzh > Date: Mon Jan 22 07:14:29 2024 +0800 > > Bypass SpinLock checking in SIGQUIT signal hander

Re: the s_lock_stuck on perform_spin_delay

2024-01-21 Thread Andy Fan
Hi, Andy Fan writes: > Hi, > > v6 attached which addressed all the items Robert suggested except the > following 2 open items. They are handled differently. > >> >> Here is the summary of the open-items, it would be great that Andres and >> Matthias have a look at this when they have time. >> >

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Andy Fan
Hi, v6 attached which addressed all the items Robert suggested except the following 2 open items. They are handled differently. > > Here is the summary of the open-items, it would be great that Andres and > Matthias have a look at this when they have time. > >>> The LockBufHdr also used init_loc

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Andy Fan
Hi, Here is the summary of the open-items, it would be great that Andres and Matthias have a look at this when they have time. 1. Shall we treat the LockBufHdr as a SpinLock usage. >> The LockBufHdr also used init_local_spin_delay / perform_spin_delay >> infrastruce and then it has the same is

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Robert Haas
On Thu, Jan 18, 2024 at 1:30 PM Andy Fan wrote: > > You added an #include to dynahash.c despite making no other changes to > > the file. > > That's mainly because I put the code into miscadmin.h and spin.h depends > on miscadmin.h with MACROs. That's not a good reason. Headers need to include hea

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Andy Fan
Hi Robert, Thanks for your attention! > On Thu, Jan 18, 2024 at 7:56 AM Andy Fan wrote: >> Do you still have interest with making some progress on this topic? > > Some, but it's definitely not my top priority. I wish I could give as > much attention as everyone would like to everyone's patches

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Robert Haas
On Thu, Jan 18, 2024 at 7:56 AM Andy Fan wrote: > Do you still have interest with making some progress on this topic? Some, but it's definitely not my top priority. I wish I could give as much attention as everyone would like to everyone's patches, but I can't even come close. I think that the s

Re: the s_lock_stuck on perform_spin_delay

2024-01-18 Thread Andy Fan
Hi Matthias / Robert: Do you still have interest with making some progress on this topic? > Robert Haas writes: > >> On Wed, Jan 10, 2024 at 10:17 PM Andy Fan wrote: >>> fixed in v2. >> >> Timing the spinlock wait seems like a separate patch from the new sanity >> checks. > > Yes, a separate

Re: the s_lock_stuck on perform_spin_delay

2024-01-14 Thread Andy Fan
Robert Haas writes: > On Wed, Jan 10, 2024 at 10:17 PM Andy Fan wrote: >> fixed in v2. > > Timing the spinlock wait seems like a separate patch from the new sanity > checks. Yes, a separate patch would be better, so removed it from v4. > I suspect that the new sanity checks should only be ar

Re: the s_lock_stuck on perform_spin_delay

2024-01-12 Thread Robert Haas
On Wed, Jan 10, 2024 at 10:17 PM Andy Fan wrote: > fixed in v2. Timing the spinlock wait seems like a separate patch from the new sanity checks. I suspect that the new sanity checks should only be armed in assert-enabled builds. I'm doubtful that this method of tracking the wait time will be ac

Re: the s_lock_stuck on perform_spin_delay

2024-01-10 Thread Andy Fan
Robert Haas writes: > Thanks for jumping in with a review, Matthias! FWIW, Matthias is also the first one for this proposal at this thread, thanks for that as well! -- Best Regards Andy Fan

Re: the s_lock_stuck on perform_spin_delay

2024-01-10 Thread Andy Fan
Hi Matthias, Thanks for the review! Matthias van de Meent writes: > On Wed, 10 Jan 2024 at 02:44, Andy Fan wrote: >> Hi, >> >> I want to know if Andres or you have plan >> to do some code review. I don't expect this would happen very soon, just >> want to make sure this will not happen that b

Re: the s_lock_stuck on perform_spin_delay

2024-01-10 Thread Robert Haas
Thanks for jumping in with a review, Matthias! On Wed, Jan 10, 2024 at 8:03 AM Matthias van de Meent wrote: > I'm not 100% sure on the policy of this, but theoretically you could > use LockAquireExtended(dontWait=true) while holding a spin lock, as > that would not have an unknown duration. Then

Re: the s_lock_stuck on perform_spin_delay

2024-01-10 Thread Matthias van de Meent
On Wed, 10 Jan 2024 at 02:44, Andy Fan wrote: > Hi, > > I want to know if Andres or you have plan > to do some code review. I don't expect this would happen very soon, just > want to make sure this will not happen that both of you think the other > one will do, but actually none of them does it in

Re: the s_lock_stuck on perform_spin_delay

2024-01-09 Thread Andy Fan
Hi, Robert Haas writes: > On Mon, Jan 8, 2024 at 9:40 PM Andy Fan wrote: >> The singler handler I was refering to is 'CHECK_FOR_INTERRUPTS', Based >> on this, spin_lock and lwlock are acted pretty differently. > > CHECK_FOR_INTERRUPTS() is not a signal handler, hmm, I knew this but I th

Re: the s_lock_stuck on perform_spin_delay

2024-01-09 Thread Robert Haas
On Mon, Jan 8, 2024 at 9:40 PM Andy Fan wrote: > The singler handler I was refering to is 'CHECK_FOR_INTERRUPTS', Based > on this, spin_lock and lwlock are acted pretty differently. CHECK_FOR_INTERRUPTS() is not a signal handler, and it's OK to acquire and release spin locks or lwlocks there. We

Re: the s_lock_stuck on perform_spin_delay

2024-01-08 Thread Andy Fan
Hi, Robert Haas writes: > On Sun, Jan 7, 2024 at 9:52 PM Andy Fan wrote: >> > I think we should add cassert-only infrastructure tracking whether we >> > currently hold spinlocks, are in a signal handler and perhaps a few other >> > states. That'd allow us to add assertions like: >> .. >> > - n

Re: the s_lock_stuck on perform_spin_delay

2024-01-08 Thread Robert Haas
On Sun, Jan 7, 2024 at 9:52 PM Andy Fan wrote: > > I think we should add cassert-only infrastructure tracking whether we > > currently hold spinlocks, are in a signal handler and perhaps a few other > > states. That'd allow us to add assertions like: > .. > > - no lwlocks or ... while in signal ha

Re: the s_lock_stuck on perform_spin_delay

2024-01-07 Thread Andy Fan
Hi! > > I think we should add cassert-only infrastructure tracking whether we > currently hold spinlocks, are in a signal handler and perhaps a few other > states. That'd allow us to add assertions like: .. > - no lwlocks or ... while in signal handlers I *wish* lwlocks should *not* be held whi

Re: the s_lock_stuck on perform_spin_delay

2024-01-06 Thread Andy Fan
Hi, > On 2024-01-05 14:19:23 -0500, Robert Haas wrote: >> On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote: >> > I see it fairly regularly. Including finding several related bugs that >> > lead to >> > stuck systems last year (signal handlers are a menace). >> >> In that case, I think this

Re: the s_lock_stuck on perform_spin_delay

2024-01-05 Thread Andres Freund
Hi, On 2024-01-05 14:19:23 -0500, Robert Haas wrote: > On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote: > > I see it fairly regularly. Including finding several related bugs that lead > > to > > stuck systems last year (signal handlers are a menace). > > In that case, I think this proposal i

Re: the s_lock_stuck on perform_spin_delay

2024-01-05 Thread Andres Freund
Hi, On 2024-01-05 10:20:39 +0800, Andy Fan wrote: > Andres Freund writes: > > On 2024-01-04 14:59:06 +0800, Andy Fan wrote: > >> My question is if someone doesn't obey the rule by mistake (everyone > >> can make mistake), shall we PANIC on a production environment? IMO I > >> think it can be a WA

Re: the s_lock_stuck on perform_spin_delay

2024-01-05 Thread Robert Haas
On Fri, Jan 5, 2024 at 2:11 PM Andres Freund wrote: > I see it fairly regularly. Including finding several related bugs that lead to > stuck systems last year (signal handlers are a menace). In that case, I think this proposal is dead. I can't personally testify to this code being a force for goo

Re: the s_lock_stuck on perform_spin_delay

2024-01-05 Thread Andres Freund
Hi, On 2024-01-05 08:51:53 -0500, Robert Haas wrote: > On Thu, Jan 4, 2024 at 6:06 PM Andres Freund wrote: > > I think we should add infrastructure to detect bugs like this during > > development, but not PANICing when this happens in production seems > > completely > > non-viable. > > I mean +

Re: the s_lock_stuck on perform_spin_delay

2024-01-05 Thread Robert Haas
On Thu, Jan 4, 2024 at 6:06 PM Andres Freund wrote: > I think we should add infrastructure to detect bugs like this during > development, but not PANICing when this happens in production seems completely > non-viable. I mean +1 for the infrastructure, but "completely non-viable"? Why? I've only

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Andy Fan
Hi, Andres Freund writes: > > On 2024-01-04 14:59:06 +0800, Andy Fan wrote: >> My question is if someone doesn't obey the rule by mistake (everyone >> can make mistake), shall we PANIC on a production environment? IMO I >> think it can be a WARNING on a production environment and be a stuck >>

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Andres Freund
Hi, On 2024-01-04 17:03:18 -0600, Jim Nasby wrote: > On 1/4/24 10:33 AM, Tom Lane wrote: > > Robert Haas writes: > > On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote: > > We should be making an effort to ban coding patterns like > "return with spinlock still

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Andres Freund
Hi, On 2024-01-04 14:59:06 +0800, Andy Fan wrote: > My question is if someone doesn't obey the rule by mistake (everyone > can make mistake), shall we PANIC on a production environment? IMO I > think it can be a WARNING on a production environment and be a stuck > when 'ifdef USE_ASSERT_CHECKING'.

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Jim Nasby
On 1/4/24 10:33 AM, Tom Lane wrote: Robert Haas writes: On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote: We should be making an effort to ban coding patterns like "return with spinlock still held", because they're just too prone to errors similar to this one. I agree th

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Andres Freund
Hi, On 2024-01-04 12:04:07 -0500, Robert Haas wrote: > On Thu, Jan 4, 2024 at 11:33 AM Tom Lane wrote: > > I believe it's (a). No matter what the reason for a stuck spinlock > > is, the only reliable method of getting out of the problem is to > > blow things up and start over. The patch propose

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Robert Haas
On Thu, Jan 4, 2024 at 11:33 AM Tom Lane wrote: > I believe it's (a). No matter what the reason for a stuck spinlock > is, the only reliable method of getting out of the problem is to > blow things up and start over. The patch proposed at the top of this > thread would leave the system unable to

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Tom Lane
Robert Haas writes: > On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote: >> We should be making an effort to ban coding patterns like >> "return with spinlock still held", because they're just too prone >> to errors similar to this one. > I agree that we don't want to add overhead, and also about h

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Robert Haas
On Thu, Jan 4, 2024 at 10:22 AM Tom Lane wrote: > I'm not a fan of adding overhead to such a performance-critical > thing as spinlocks in order to catch coding errors that are easily > detectable statically. IMV the correct usage of spinlocks is that > they should only be held across *short, stra

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Tom Lane
Robert Haas writes: > I'm not sure that the approach this patch takes is correct in detail, > but I kind of agree with you about the overall point. I mean, the idea > of the PANIC is to avoid having the system just sit there in a state > from which it will never recover ... but it can also have th

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Andy Fan
Hi Matthias and Robert, Matthias van de Meent writes: > On Thu, 4 Jan 2024 at 08:09, Andy Fan wrote: >> >> My question is if someone doesn't obey the rule by mistake (everyone >> can make mistake), shall we PANIC on a production environment? IMO I >> think it can be a WARNING on a production e

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Robert Haas
On Thu, Jan 4, 2024 at 2:09 AM Andy Fan wrote: > My question is if someone doesn't obey the rule by mistake (everyone > can make mistake), shall we PANIC on a production environment? IMO I > think it can be a WARNING on a production environment and be a stuck > when 'ifdef USE_ASSERT_CHECKING'. >

Re: the s_lock_stuck on perform_spin_delay

2024-01-04 Thread Matthias van de Meent
On Thu, 4 Jan 2024 at 08:09, Andy Fan wrote: > > My question is if someone doesn't obey the rule by mistake (everyone > can make mistake), shall we PANIC on a production environment? IMO I > think it can be a WARNING on a production environment and be a stuck > when 'ifdef USE_ASSERT_CHECKING'. >