Reply: Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-26 Thread 王伟(学弈)
On Fri, Jul 22, 2022 at 14:49 Peter Geoghegan wrote: > The line numbers from your stack trace don't match up with > REL_14_STABLE. Is > this actually a fork of Postgres 14? (Oh, looks like > it's an old beta release.) Yeah, I had been testing on the 14beta2 branch. So I considered your advice and tes

Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread Tom Lane
Peter Geoghegan writes: > It would also be helpful if you told us about the specific table > involved. Though the important thing (the essential thing) is to test > today's REL_14_STABLE. There have been *lots* of bug fixes since > Postgres 14 beta2 was current. Yeah. To be blunt, you're wasting

Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread Peter Geoghegan
On Fri, Jul 22, 2022 at 1:22 AM 王伟(学弈) wrote: > I recently found this problem while testing PG14 with sysbench. The line numbers from your stack trace don't match up with REL_14_STABLE. Is this actually a fork of Postgres 14? (Oh, looks like it's an old beta release.) > Then I looked through the em

Re: Reply: Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread Tomas Vondra
On 7/22/22 14:17, 王伟(学弈) wrote: > On 7/22/22 18:06, Tomas Vondra wrote: >> Which PG14 version / commit is this, exactly? What sysbench parameters >> did you use, how likely is hitting the issue? > PG_VERSION is '14beta2'. > The head commit id is 'e1c1c30f635390b6a3ae4993e8cac213a33e6e3f'. Why not

Reply: Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread 王伟(学弈)
From: Tomas Vondra Date: 2022-07-22 18:06:21 To: 王伟(学弈); pgsql-hackers Subject: Re: PANIC: wrong buffer passed to visibilitymap_clear On 7/22/22 10:22, 王伟(学弈) wrote: > Hi, > I recently found this problem while testing PG14 with sysbench. > Then I looked through

Re: PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread Tomas Vondra
On 7/22/22 10:22, 王伟(学弈) wrote: > Hi, > I recently found this problem while testing PG14 with sysbench. > Then I looked through the emails from pgsql-hackers and found a previous > similar bug, which > is > https://www.postgresql.org/message-id/flat/2247102.1618008027%40sss.pgh.pa.us >

PANIC: wrong buffer passed to visibilitymap_clear

2022-07-22 Thread 王伟(学弈)
Hi, I recently found this problem while testing PG14 with sysbench. Then I looked through the emails from pgsql-hackers and found a previous similar bug, which is https://www.postgresql.org/message-id/flat/2247102.1618008027%40sss.pgh.pa.us. But the bugfix commit (34f581c39e97e2ea237255cf75cccebccc02

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 6:33 PM Tom Lane wrote: > Thanks for looking it over. Do you have an opinion on whether or not > to back-patch? As far as we know, these bugs aren't exposed in the > back branches for lack of code that would set the all-visible flag > without superexclusive lock. But I'd

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Tom Lane
Peter Geoghegan writes: > On Mon, Apr 12, 2021 at 9:19 AM Tom Lane wrote: >> Hence, I propose the attached. It passes check-world, but that proves >> absolutely nothing of course :-(. I wonder if there is any way to >> exercise these code paths deterministically. > This approach seems reasonab

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Andres Freund
Hi, On 2021-04-11 13:55:30 -0400, Tom Lane wrote: > Either way, it's hard to argue that heap_update hasn't crossed the > complexity threshold where it's impossible to maintain safely. We > need to simplify it. Yea, I think we're well beyond that point. I can see a few possible steps to wrangle t

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Peter Geoghegan
On Mon, Apr 12, 2021 at 9:19 AM Tom Lane wrote: > So I think we have to stick with the current basic design, and just > tighten things up to make sure that visibility pins are accounted for > in the places that are missing it. > > Hence, I propose the attached. It passes check-world, but that pro

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-12 Thread Tom Lane
Peter Geoghegan writes: > On Sun, Apr 11, 2021 at 11:16 AM Tom Lane wrote: >> It wasn't very clear, because I hadn't thought it through very much; >> but what I'm imagining is that we discard most of the thrashing around >> all-visible rechecks and have just one such test somewhere very late >> i

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 11:16 AM Tom Lane wrote: > It wasn't very clear, because I hadn't thought it through very much; > but what I'm imagining is that we discard most of the thrashing around > all-visible rechecks and have just one such test somewhere very late > in heap_update, after we've succ

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan writes: > On Sun, Apr 11, 2021 at 10:55 AM Tom Lane wrote: >> Either way, it's hard to argue that >> heap_update hasn't crossed the complexity threshold where it's >> impossible to maintain safely. We need to simplify it. > It is way too complicated. I don't think that I quite u

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 10:55 AM Tom Lane wrote: > Alternatively, we could do what you suggested and redefine things > so that one is only allowed to set the all-visible bit while holding > superexclusive lock; which again would allow an enormous simplification > in heap_update and cohorts. Great

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
I wrote: > I'm now inclined to think that we should toss every single line of that > code, take RelationGetBufferForTuple out of the equation, and have just > *one* place that rechecks for PageAllVisible having just become set. > It's a rare enough case that optimizing it is completely not worth th

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
I wrote: > (It does look like RelationGetBufferForTuple > knows about updating vmbuffer, but there's one code path through the > if-nest at 3850ff that doesn't call that.) Although ... isn't RelationGetBufferForTuple dropping the ball on this point too, in the code path at the end where it has to

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan writes: > This isn't just any super-exclusive lock, either -- we were calling > ConditionalLockBufferForCleanup() at this point. > I now think that there is a good chance that we are seeing these > symptoms because the "conditional-ness" went away -- we accidentally > relied on th

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 9:10 AM Peter Geoghegan wrote: > I don't have any reason to believe that using a super-exclusive lock > during heap page vacuuming is necessary. My guess is that returning to > doing it that way might make the buildfarm green again. That would at > least confirm my suspicio

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sun, Apr 11, 2021 at 8:57 AM Tom Lane wrote: > > Does this patch seem to fix the problem? > > Hmm ... that looks pretty suspicious, I agree, but why wouldn't an > exclusive buffer lock be enough to prevent concurrency with heap_update? I don't have any reason to believe that using a super-excl

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Tom Lane
Peter Geoghegan writes: > On Sat, Apr 10, 2021 at 10:04 PM Tom Lane wrote: >> Just eyeing the evidence on hand, I'm wondering if something has decided >> it can start setting the page-all-visible bit without adequate lock, >> perhaps only in system catalogs. heap_update is clearly assuming that

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-11 Thread Peter Geoghegan
On Sat, Apr 10, 2021 at 10:04 PM Tom Lane wrote: > Just eyeing the evidence on hand, I'm wondering if something has decided > it can start setting the page-all-visible bit without adequate lock, > perhaps only in system catalogs. heap_update is clearly assuming that > that flag won't change under

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-10 Thread Tom Lane
I've managed to reproduce this locally, by dint of running the src/bin/scripts tests over and over and tweaking the timing by trying different "taskset" parameters to vary the number of CPUs available. I find that I duplicated the report from spurfowl, particularly (gdb) bt #0 0x7f67bb6807d5

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-09 Thread Andres Freund
On 2021-04-09 16:27:39 -0700, Andres Freund wrote: > Just looking at the code in heap_update: I'm a bit confused about > RelationGetBufferForTuple()'s vmbuffer and vmbuffer_other > arguments. It looks like it's not at all clear which of the two > arguments will have the vmbuffer for which of the pa

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-09 Thread Andres Freund
Hi, On 2021-04-09 16:27:12 -0700, Peter Geoghegan wrote: > They're both VACUUM ANALYZE. They must be, because the calls to > visibilitymap_clear PANIC (they don't ERROR) -- the failing > visibilitymap_clear() call must occur inside a critical section, and > all such calls are made within heapam.c
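
The inference quoted above (the failure is a PANIC rather than an ERROR, so the failing call must sit inside a critical section) relies on PostgreSQL's error reporting promoting an ERROR raised inside a critical section to PANIC. The following is a toy model of that rule only, written in Python for brevity; it is not PostgreSQL source, and `BackendModel`, `clear_vm_bit`, and the block-number arguments are hypothetical names introduced for illustration.

```python
"""Toy model (not PostgreSQL source) of ERROR-to-PANIC promotion."""

ERROR = "ERROR"
PANIC = "PANIC"


class BackendModel:
    """Hypothetical stand-in for one backend's critical-section state."""

    def __init__(self):
        # Nesting depth of critical sections; 0 means "not in one".
        self.crit_section_count = 0

    def start_crit_section(self):
        self.crit_section_count += 1

    def end_crit_section(self):
        if self.crit_section_count <= 0:
            raise RuntimeError("not inside a critical section")
        self.crit_section_count -= 1

    def effective_level(self, level):
        # An ERROR cannot be recovered from inside a critical section,
        # so the model escalates it to PANIC there.
        if level == ERROR and self.crit_section_count > 0:
            return PANIC
        return level


def clear_vm_bit(backend, pinned_map_block, needed_map_block):
    """Return None on success, or (level, message) when the caller has
    pinned a different visibility-map page than the one required."""
    if pinned_map_block != needed_map_block:
        return (backend.effective_level(ERROR),
                "wrong buffer passed to visibilitymap_clear")
    return None


backend = BackendModel()

# Same mistake (no map page pinned), reported outside a critical section.
outside = clear_vm_bit(backend, pinned_map_block=None, needed_map_block=0)

# The same mistake again, this time inside a critical section.
backend.start_crit_section()
inside = clear_vm_bit(backend, pinned_map_block=None, needed_map_block=0)
backend.end_crit_section()

print(outside[0], inside[0])
```

In this model the same "wrong buffer" condition is reported at a different level depending only on whether a critical section is open, which is why the observed severity narrows down where the failing call can be.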

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-09 Thread Andres Freund
Hi, On 2021-04-09 18:40:27 -0400, Tom Lane wrote: > Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT > once in the past two days. The circumstances are not quite the same; > spurfowl's failure is in autovacuum while thorntail's is in a manual > VACUUM command. Still, it se

Re: PANIC: wrong buffer passed to visibilitymap_clear

2021-04-09 Thread Peter Geoghegan
On Fri, Apr 9, 2021 at 3:40 PM Tom Lane wrote: > Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT > once in the past two days. The circumstances are not quite the same; > spurfowl's failure is in autovacuum while thorntail's is in a manual > VACUUM command. Still, it seems

PANIC: wrong buffer passed to visibilitymap_clear

2021-04-09 Thread Tom Lane
Buildfarm members spurfowl[1] and thorntail[2] have each shown $SUBJECT once in the past two days. The circumstances are not quite the same; spurfowl's failure is in autovacuum while thorntail's is in a manual VACUUM command. Still, it seems clear that there's a recently-introduced bug here somew