Re: pg_amcheck contrib application

2021-05-03 Thread Robert Haas
On Fri, Apr 30, 2021 at 5:07 PM Robert Haas wrote: > On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger > wrote: > > After further reflection, no other verbiage seems any better. I'd say go > > ahead and commit it this way. > > OK. I'll plan to do that on Monday, barring objections. Done now. -- Ro

Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger wrote: > After further reflection, no other verbiage seems any better. I'd say go > ahead and commit it this way. OK. I'll plan to do that on Monday, barring objections. -- Robert Haas EDB: http://www.enterprisedb.com

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 1:04 PM, Mark Dilger wrote: > >> toast value %u was expected to end at chunk %d, but ended while >> expecting chunk %d >> >> i.e. same as the currently-committed code, except for changing "ended >> at" to "ended while expecting." > > I find the grammar of this new formu

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 12:47 PM, Robert Haas wrote: > > Hmm, I think that might need adjustment, actually. What I was trying > to do is compensate for the fact that what we now have is the next > chunk_seq value we expect, not the last one we saw, nor the total > number of chunks we've seen reg

Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 3:41 PM Mark Dilger wrote: > I think that's committable. > > The only nitpick might be > > - psprintf("toast value %u was expected to end > at chunk %d, but ended at chunk %d", > + psprintf("toast value %u index s

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 12:29 PM, Robert Haas wrote: > > OK, how about this version? I think that's committable. The only nitpick might be - psprintf("toast value %u was expected to end at chunk %d, but ended at chunk %d", + psprint

Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 3:26 PM Mark Dilger wrote: > It looks mostly good to me. There is a off-by-one error introduced with: > > - else if (chunkno != (endchunk + 1)) > + else if (expected_chunk_seq < last_chunk_seq) > > I think that needs to be > > + else if (expected_chunk_seq <= last_chu

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 11:56 AM, Robert Haas wrote: > > Can you review this version? It looks mostly good to me. There is a off-by-one error introduced with: - else if (chunkno != (endchunk + 1)) + else if (expected_chunk_seq < last_chunk_seq) I think that needs to be + else if (expec

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 11:56 AM, Robert Haas wrote: > > On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger > wrote: >> Just to be clear, I did not use your patch v1 as the starting point. > > I thought that might be the case, but I was trying to understand what > you didn't like about my version, and

Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger wrote: > Just to be clear, I did not use your patch v1 as the starting point. I thought that might be the case, but I was trying to understand what you didn't like about my version, and comparing them seemed like a way to figure that out. > I took the

Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger
> On Apr 30, 2021, at 9:39 AM, Robert Haas wrote: > > On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger > wrote: >> The attached patch changes amcheck corruption reports as discussed upthread. >> This patch is submitted for the v14 development cycle as a bug fix, per >> your complaint that the c

Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger wrote: > The attached patch changes amcheck corruption reports as discussed upthread. > This patch is submitted for the v14 development cycle as a bug fix, per your > complaint that the committed code generates reports sufficiently confusing to > a u

Re: pg_amcheck contrib application

2021-04-26 Thread Mark Dilger
> On Apr 23, 2021, at 3:01 PM, Mark Dilger wrote: > > I'll try to post something that accomplishes the changes to the reports that > you are looking for. The attached patch changes amcheck corruption reports as discussed upthread. This patch is submitted for the v14 development cycle as a b

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 1:31 PM, Robert Haas wrote: > > Perhaps something like this, closer to the way you had it? > > expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE > : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE); It still suffers the same failures

Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:36 PM Mark Dilger wrote: > > What's different? > > for one thing, if a sequence of chunks happens to fit perfectly, the final > chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger > than one less than that, given how modulo arithmetic works. Good

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 11:29 AM, Robert Haas wrote: > > + expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE > + : extsize % TOAST_MAX_CHUNK_SIZE; > > What's different? for one thing, if a sequence of chunks happens to fit perfectly, the final chunk will ha

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 11:29 AM, Robert Haas wrote: > > On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger > wrote: >> Another difference I should probably mention is that a bunch of unrelated >> tests are failing with errors like: >> >>toast value 13465 chunk 0 has size 1995, but expected size

Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger wrote: > Another difference I should probably mention is that a bunch of unrelated > tests are failing with errors like: > > toast value 13465 chunk 0 has size 1995, but expected size 1996 > > which leads me to suspect your changes to how the size i

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 11:05 AM, Mark Dilger > wrote: > > Here are the differences between master and you patch: Another difference I should probably mention is that a bunch of unrelated tests are failing with errors like: toast value 13465 chunk 0 has size 1995, but expected size 1996

Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:05 PM Mark Dilger wrote: > Here are the differences between master and you patch: Thanks. Those messages look reasonable to me. -- Robert Haas EDB: http://www.enterprisedb.com

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 10:31 AM, Mark Dilger > wrote: > > I will test your patch and see what differs. Here are the differences between master and you patch: UPDATE $toastname SET chunk_seq = chunk_seq + 1000 WHERE chunk_id = $value_id_to_corrupt - qr/${header}toast v

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger
> On Apr 23, 2021, at 10:28 AM, Robert Haas wrote: > > On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger > wrote: >> I have refactored the patch to address your other concerns. Breaking the >> patch into multiple pieces didn't add any clarity, but refactoring portions >> of it made things simple

Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger wrote: > I have refactored the patch to address your other concerns. Breaking the > patch into multiple pieces didn't add any clarity, but refactoring portions > of it made things simpler to read, I think, so here it is as one patch file. I was hopin

Re: pg_amcheck contrib application

2021-04-22 Thread Mark Dilger
> On Apr 19, 2021, at 5:07 PM, Mark Dilger wrote: > > > >> On Apr 19, 2021, at 12:50 PM, Robert Haas wrote: >> >> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger >> wrote: >>> I have added the verb "has" rather than "contains" because "has" is more >>> consistent with the phrasing of other si

Re: pg_amcheck contrib application

2021-04-19 Thread Mark Dilger
> On Apr 19, 2021, at 12:50 PM, Robert Haas wrote: > > On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger > wrote: >> I have added the verb "has" rather than "contains" because "has" is more >> consistent with the phrasing of other similar corruption reports. > > That makes sense. > > I think it'

Re: pg_amcheck contrib application

2021-04-19 Thread Robert Haas
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger wrote: > I have added the verb "has" rather than "contains" because "has" is more > consistent with the phrasing of other similar corruption reports. That makes sense. I think it's odd that a range of extraneous chunks is collapsed into a single repor

Re: pg_amcheck contrib application

2021-04-15 Thread Mark Dilger
> On Apr 14, 2021, at 10:17 AM, Robert Haas wrote: > > On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger > wrote: >> It now reports: >> >> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: >> # toast value 16461 missing chunk 3 with expected size 1996 >> # heap table "p

Re: pg_amcheck contrib application

2021-04-14 Thread Robert Haas
On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger wrote: > It now reports: > > # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: > # toast value 16461 missing chunk 3 with expected size 1996 > # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: > #

Re: pg_amcheck contrib application

2021-04-12 Thread Mark Dilger
> On Apr 9, 2021, at 1:51 PM, Robert Haas wrote: > > On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger > wrote: >> I think #4, above, requires some clarification. If there are missing >> chunks, the very definition of how large we expect subsequent chunks to be >> is ill-defined. I took a fairly

Re: pg_amcheck contrib application

2021-04-09 Thread Robert Haas
On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger wrote: > I think #4, above, requires some clarification. If there are missing chunks, > the very definition of how large we expect subsequent chunks to be is > ill-defined. I took a fairly conservative approach to avoid lots of bogus > complaints abo

Re: pg_amcheck contrib application

2021-04-09 Thread Mark Dilger
> On Apr 8, 2021, at 3:11 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger > wrote: >> All this leads me to believe that we should report the following: >> >> 1) If the total number of chunks retrieved differs from the expected number, >> report how many we expected vs

Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 6:51 PM Mark Dilger wrote: > #2 is if chunk_seq goes up but skips numbers. #3 is if chunk_seq ever goes > down, meaning the index scan did something unexpected. Yeah, sure. But I think we could probably treat those the same way. -- Robert Haas EDB: http://www.enterprise

Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger
> On Apr 8, 2021, at 3:11 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger > wrote: >> All this leads me to believe that we should report the following: >> >> 1) If the total number of chunks retrieved differs from the expected number, >> report how many we expected vs

Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger wrote: > All this leads me to believe that we should report the following: > > 1) If the total number of chunks retrieved differs from the expected number, > report how many we expected vs. how many we got > 2) If the chunk_seq numbers are discontiguous,

Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger
> On Apr 8, 2021, at 1:05 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger > wrote: >> Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update >> the toast to have only 6 chunks numbered [0..5] except we corruptly keep >> chunks numbered [12..17] i

Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger wrote: > Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update > the toast to have only 6 chunks numbered [0..5] except we corruptly keep > chunks numbered [12..17] in the toast table. We'd rather see a report like > this: [ toa

Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger
> On Apr 7, 2021, at 1:16 PM, Robert Haas wrote: > > On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger > wrote: >> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked >> owing to how I had separated the changes in v17-0002 vs. v17-0003 > > Committed. Thank you. >> v18-0002

Re: pg_amcheck contrib application

2021-04-07 Thread Robert Haas
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger wrote: > v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked > owing to how I had separated the changes in v17-0002 vs. v17-0003 Committed. > v18-0002 - Postpones the toast checks for a page until after the main table > page lock

Re: pg_amcheck contrib application

2021-04-04 Thread Mark Dilger
> On Apr 1, 2021, at 1:08 PM, Robert Haas wrote: > > > > - There are a total of two (2) calls in the current source code to > palloc0fast, and hundreds of calls to palloc0. So I think you should > forget about using the fast variant and just do what almost every > other caller does. Done. >

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:41 PM Robert Haas wrote: > OK, committed. We still need to deal with what you had as 0003 > upthread, so I guess the next step is for me to spend some time > reviewing that one. I did this, and it was a bit depressing. It appears that we now have duplicate checks for xmin

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:24 PM Mark Dilger wrote: > > On Apr 1, 2021, at 10:20 AM, Robert Haas wrote: > > OK, let's try that again. > Looks good! OK, committed. We still need to deal with what you had as 0003 upthread, so I guess the next step is for me to spend some time reviewing that one. --

Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger
> On Apr 1, 2021, at 10:20 AM, Robert Haas wrote: > > On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger > wrote: >> Seems fine other than the typo. > > OK, let's try that again. Looks good! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger wrote: > Seems fine other than the typo. OK, let's try that again. -- Robert Haas EDB: http://www.enterprisedb.com v17-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch Description: Binary data

Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger
> On Apr 1, 2021, at 9:56 AM, Robert Haas wrote: > > On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger > wrote: >>> - If xmax is a multi but seems to be garbled, I changed it to return >>> true rather than false. The inserter is known to have committed by >>> that point, so I think it's OK to try t

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger wrote: > > - If xmax is a multi but seems to be garbled, I changed it to return > > true rather than false. The inserter is known to have committed by > > that point, so I think it's OK to try to deform the tuple. We just > > shouldn't try to check TOAST

Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger
> On Apr 1, 2021, at 8:08 AM, Robert Haas wrote: > > On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger > wrote: >> These changes got lost between v11 and v12. I've put them back, as well as >> updating to use your language. > > Here's an updated patch that includes your 0001 and 0002 plus a bun

Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger wrote: > These changes got lost between v11 and v12. I've put them back, as well as > updating to use your language. Here's an updated patch that includes your 0001 and 0002 plus a bunch of changes by me. I intend to commit this version unless somebo

Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 1:44 PM Mark Dilger wrote: > I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code > shows that it takes an AccessExclusiveLock. I think the docs are pretty > misleading here, though I understand that grammatically it is hard to say > "accessivel

Re: pg_amcheck contrib application

2021-03-31 Thread Mark Dilger
> On Mar 31, 2021, at 10:31 AM, Mark Dilger > wrote: > > > >> On Mar 31, 2021, at 10:11 AM, Robert Haas wrote: >> >> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger >> wrote: >>> I'm not looking at the old VACUUM FULL code, but my assumption is that if >>> the xvac code were resurrected, t

Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 1:31 PM Mark Dilger wrote: > Actually, that makes a lot of sense without even looking at the old code. I > was implicitly assuming that the toast table was undergoing a VF also, and > that the toast pointers in the main table tuples would be updated to point to > the ne

Re: pg_amcheck contrib application

2021-03-31 Thread Mark Dilger
> On Mar 31, 2021, at 10:11 AM, Robert Haas wrote: > > On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger > wrote: >> I'm not looking at the old VACUUM FULL code, but my assumption is that if >> the xvac code were resurrected, then when a tuple is moved off by a VACUUM >> FULL, the old tuple and

Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger wrote: > I'm not looking at the old VACUUM FULL code, but my assumption is that if the > xvac code were resurrected, then when a tuple is moved off by a VACUUM FULL, > the old tuple and associated toast cannot be pruned until concurrent > transaction

Re: pg_amcheck contrib application

2021-03-30 Thread Mark Dilger
> On Mar 30, 2021, at 12:45 PM, Robert Haas wrote: > > On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger > wrote: >> Sure, here are four patches which do the same as the single v12 patch did. > > Thanks. Here are some comments on 0003 and 0004: > > When you posted v11, you said that "Rather than p

Re: pg_amcheck contrib application

2021-03-30 Thread Robert Haas
On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger wrote: > Sure, here are four patches which do the same as the single v12 patch did. Thanks. Here are some comments on 0003 and 0004: When you posted v11, you said that "Rather than print out all four toast pointer fields for each toast failure, va_raws

Re: pg_amcheck contrib application

2021-03-29 Thread Mark Dilger
> On Mar 29, 2021, at 1:06 PM, Robert Haas wrote: > > Hmm, so this got ~10x bigger from my version. Could you perhaps > separate it out into a series of patches for easier review? Say, one > that just fixes the visibility logic, and then a second to avoid doing > the TOAST check with a buffer l

Re: pg_amcheck contrib application

2021-03-29 Thread Robert Haas
On Mon, Mar 29, 2021 at 1:45 PM Mark Dilger wrote: > Thanks! The attached patch addresses your comments here and in your prior > email. In particular, this patch changes the tuple visibility logic to not > check tuples for which the inserting transaction aborted or is still in > progress, and

Re: pg_amcheck contrib application

2021-03-29 Thread Mark Dilger
> On Mar 24, 2021, at 1:46 PM, Robert Haas wrote: > > Mark, > > Here's a quick and very dirty sketch of what I think perhaps this > logic could look like. This is pretty much untested and it might be > buggy, but at least you can see whether we're thinking at all in the > same direction. Than

Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
Mark, Here's a quick and very dirty sketch of what I think perhaps this logic could look like. This is pretty much untested and it might be buggy, but at least you can see whether we're thinking at all in the same direction. -- Robert Haas EDB: http://www.enterprisedb.com very-rough-visibility

Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 9:12 AM Robert Haas wrote: > On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger > wrote: > > The visibility rules fix is different in v11, relying on a visibility check > > which more closely follows the implementation of > > HeapTupleSatisfiesVacuumHorizon. > > Hmm. The header

Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger wrote: > The visibility rules fix is different in v11, relying on a visibility check > which more closely follows the implementation of > HeapTupleSatisfiesVacuumHorizon. Hmm. The header comment you wrote says "If a tuple might not be visible to any r

Re: pg_amcheck contrib application

2021-03-23 Thread Mark Dilger
> On Mar 17, 2021, at 9:00 PM, Mark Dilger wrote: > > Of the toast pointer fields: > >int32 va_rawsize; /* Original data size (includes header) */ >int32 va_extsize; /* External saved size (doesn't) */ >Oid va_valueid; /* Unique ID of value within TO

Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:53 PM Peter Geoghegan wrote: > One of the advantages of this design is that we verify practically all > of the work involved in deleting an entire subtree up-front, inside > _bt_lock_subtree_parent(). It's clearly safe to back out of it if it > looks dicey. That's taken

Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:44 PM Tom Lane wrote: > > I will make this change to HEAD soon, barring objections. > > +1. Not deleting the upper page seems better than the alternatives. FWIW it might also work that way as a holdover from the old page deletion algorithm. These days we decide exactly

Re: pg_amcheck contrib application

2021-03-23 Thread Tom Lane
Peter Geoghegan writes: > That being said, I should make _bt_lock_subtree_parent() return false > and back out of page deletion without raising an error in the case > where we really cannot locate a valid downlink. We really ought to > soldier on when that happens, since we'll do that for a bunch

Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas wrote: > Right, good point. But... does that really apply to > 005_opclass_damage.pl? I feel like with the kind of physical damage > we're doing in 003_check.pl, it makes a lot of sense to stop vacuum > from crashing headlong into that table. But, 005 i

Re: pg_amcheck contrib application

2021-03-23 Thread Mark Dilger
> On Mar 23, 2021, at 12:05 PM, Robert Haas wrote: > > 005 is doing "logical" > damage rather than "physical" damage, and I don't see why autovacuum > should misbehave in that kind of case. In fact, the fact that > autovacuum can handle such cases is one of the selling points for the > whole d

Re: pg_amcheck contrib application

2021-03-23 Thread Robert Haas
On Thu, Mar 18, 2021 at 12:12 AM Tom Lane wrote: > Mark Dilger writes: > >> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: > >> Since we now know that shutting autovacuum off makes the problem go > >> away, I don't see a reason to commit 0001. We should fix pg_amcheck > >> instead, if, as pres

Re: pg_amcheck contrib application

2021-03-17 Thread Tom Lane
Mark Dilger writes: >> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: >> Since we now know that shutting autovacuum off makes the problem go >> away, I don't see a reason to commit 0001. We should fix pg_amcheck >> instead, if, as presently seems to be the case, that's where the >> problem is.

Re: pg_amcheck contrib application

2021-03-17 Thread Mark Dilger
> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: > > On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger > wrote: >> It is unfortunate that the failing test only runs pg_amcheck after creating >> numerous corruptions, as we can't know if pg_amcheck would have complained >> about pg_statistic befo

Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger wrote: > It is unfortunate that the failing test only runs pg_amcheck after creating > numerous corruptions, as we can't know if pg_amcheck would have complained > about pg_statistic before the corruptions were created in other tables, or if > it onl

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 16, 2021, at 11:40 AM, Robert Haas wrote: > > On Tue, Mar 16, 2021 at 2:22 PM Tom Lane wrote: >> I'm circling back around to the idea that amcheck is trying to >> validate TOAST references that are already dead, and it's getting >> burnt because something-or-other has already removed

Re: pg_amcheck contrib application

2021-03-16 Thread Andrew Dunstan
On 3/13/21 1:30 AM, Andres Freund wrote: > Hi, > > On 2021-03-13 01:22:54 -0500, Tom Lane wrote: >> Mark Dilger writes: >>> On Mar 12, 2021, at 10:16 PM, Noah Misch wrote: hoverfly does configure with PERL=perl64. /usr/bin/prove is from the 32-bit Perl, so I suspect the TAP sui

Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 2:45 PM Tom Lane wrote: > > In what context is it OK to just add extra alignment padding? > > It's *not* extra, according to pg_statistic's tuple descriptor. > Both forming and deforming of pg_statistic tuples should honor > that and locate stavaluesX values on d-aligned bo

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger writes: > CopyStatistics seems to just copy Form_pg_statistic without regard for > who owns the toast. Is this safe? No less so than a ton of other places that insert values that might've come from on-disk storage. heap_toast_insert_or_update() is responsible for dealing with the pr

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Robert Haas writes: > On Tue, Mar 16, 2021 at 1:48 PM Tom Lane wrote: >> No. What should be happening there is that some arrays in the column >> get larger alignment than they actually need, but that shouldn't cause >> a problem (and has not done so, AFAIK, in the decades that it's been >> like

Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 2:22 PM Tom Lane wrote: > I'm circling back around to the idea that amcheck is trying to > validate TOAST references that are already dead, and it's getting > burnt because something-or-other has already removed the toast > rows, though not the referencing datums. That's l

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: > > I'm not entirely sure what's going on, but I think coming at this > with the mindset that "amcheck has detected some corruption" is > just going to lead you astray. Almost certainly, it's "amcheck > is incorrectly claiming corruption". That

Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 1:48 PM Tom Lane wrote: > No. What should be happening there is that some arrays in the column > get larger alignment than they actually need, but that shouldn't cause > a problem (and has not done so, AFAIK, in the decades that it's been > like this). As you say, deformi

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger writes: > On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: >> (Too bad the message doesn't report the >> TOAST OID it probed for, so we can see if that's sane or not.) > I've added that and now get the toast pointer's va_valueid in the message: > heap table "postgres"."pg_catalog"."pg_

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: > > Robert Haas writes: >> On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger >> wrote: >>> It shows them all has having attalign = 'd', but for some array types the >>> alignment will be 'i', not 'd'. So it's lying a bit about the contents. >>>

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
... btw, I now see that tern and hornet are passing this test at least as much as they're failing, proving that there's some timing or random chance involved. That doesn't completely eliminate the idea that there may be an architecture component to the issue, but it sure reduces its credibility.

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Robert Haas writes: > On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger > wrote: >> It shows them all has having attalign = 'd', but for some array types the >> alignment will be 'i', not 'd'. So it's lying a bit about the contents. >> But I'm now confused why this caused problems on the two hosts

Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger wrote: > Yeah, that looks related: > > regression=# select attname, attlen, attnum, attalign from pg_attribute where > attrelid = 2619 and attname like 'stavalue%'; > attname | attlen | attnum | attalign > +++--

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 16, 2021, at 9:30 AM, Mark Dilger wrote: > > > >> On Mar 16, 2021, at 9:07 AM, Tom Lane wrote: >> >> Mark Dilger writes: >>> I think autovacuum simply triggers the bug, and is not the cause of the >>> bug. If I turn autovacuum off and instead do an ANALYZE in each test >>> data

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 16, 2021, at 9:07 AM, Tom Lane wrote: > > Mark Dilger writes: >> I think autovacuum simply triggers the bug, and is not the cause of the bug. >> If I turn autovacuum off and instead do an ANALYZE in each test database >> rather than performing the corruptions, I get reports about

Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger writes: > I think autovacuum simply triggers the bug, and is not the cause of the bug. > If I turn autovacuum off and instead do an ANALYZE in each test database > rather than performing the corruptions, I get reports about problems in > pg_statistic. This is on my mac laptop. Th

Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger
> On Mar 15, 2021, at 11:09 PM, Noah Misch wrote: > >> Not sure that I believe the theory that this is from bad luck of >> concurrent autovacuum timing, though. > > With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve > runs. With v6-0001-Turning-off-autovacuum-durin

Re: pg_amcheck contrib application

2021-03-15 Thread Noah Misch
On Mon, Mar 15, 2021 at 02:57:20PM -0400, Tom Lane wrote: > Mark Dilger writes: > > On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: > >> These animals have somewhat weird alignment properties: MAXALIGN is 8 > >> but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their > >> choices ab

Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Not sure that I believe the theory that this is from bad luck of > concurrent autovacuum timing, though. The fact that we're seeing > this on just those two animals suggests strongly to me that it's > architecture-correlated, instead. I find

Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Yeah, that could be phrased better. Attaching the 0001 patch submitted earlier, plus 0002 which fixes the confusing corruption message. v6-0001-Turning-off-autovacuum-during-corruption-tests.patch Description: Binary data v6-0002-Fixin

Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Do we have a strong enough lock on > the table under examination to be sure that autovacuum couldn't remove > a dead toast entry before we reach it? The main table and the toast table are only locked with AccessShareLock. Each page in the

Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Mark Dilger writes: > On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: >> These animals have somewhat weird alignment properties: MAXALIGN is 8 >> but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their >> choices about whether an out-of-line TOAST value is needed, breaking >> this t

Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger
> On Mar 15, 2021, at 11:11 AM, Mark Dilger > wrote: > > I will submit a patch that turns off autovacuum for the test node shortly. v5-0001-Turning-off-autovacuum-during-corruption-tests.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise P

Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger
> On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: > > Looks like we're not quite out of the woods, as hornet and tern are > still unhappy: > > # Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs > expected 0)' > # at t/003_check.pl line 498. > > # Failed test 'pg_am

Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Looks like we're not quite out of the woods, as hornet and tern are still unhappy: # Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs expected 0)' # at t/003_check.pl line 498. # Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/' # at t/003_chec

Re: pg_amcheck contrib application

2021-03-13 Thread Noah Misch
On Sat, Mar 13, 2021 at 10:51:27AM -0800, Mark Dilger wrote: > > On Mar 13, 2021, at 10:46 AM, Noah Misch wrote: > > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger > >>> wrote: > >>> and the second deals with an apparent problem with I

Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger
> On Mar 13, 2021, at 10:46 AM, Noah Misch wrote: > > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger >>> wrote: >>> >>> and the second deals with an apparent problem with IPC::Run shell expanding >>> an asterisk on some platforms b

Re: pg_amcheck contrib application

2021-03-13 Thread Noah Misch
On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > > On Mar 12, 2021, at 3:24 PM, Mark Dilger > > wrote: > > > > and the second deals with an apparent problem with IPC::Run shell expanding > > an asterisk on some platforms but not others. That second one, if true, > > seems like a

Re: pg_amcheck contrib application

2021-03-13 Thread Robert Haas
On Sat, Mar 13, 2021 at 10:20 AM Mark Dilger wrote: > We want to use values that don't look like any of the others. The complete > set of nibbles in the values above is [012345678B], leaving the set [9ACDEF] > unused. The attached patch uses the value DEADF9F9 as it seems a little > easier to

Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger
> On Mar 13, 2021, at 6:50 AM, Tom Lane wrote: > > Robert Haas writes: >> I think it would be good to use a non-zero value here. We're doing a >> lot of poking into raw bytes here, and if something goes wrong, a zero >> value is more likely to look like something normal than whatever else. >>

  1   2   >