Re: [BUGS] Violation of primary key constraint

Tom Lane Fri, 01 Feb 2013 13:58:33 -0800

Toby Murray <toby.mur...@gmail.com> writes:
> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Could we see the full results of heap_page_items(get_raw_page()) for
>> each of the pages where any of these tuples live, viz


> Seems awkward to put inline so I made a text file with these results
> and am attaching it. Hoping the mailing list allows attachments.

Thanks.  These are a bit odd, because none of the tuples in question
look like they've been updated, except for (11249622,6) which we
already thought had been re-modified by some later transaction.

What it looks like to me is that the writes of the source pages simply
got lost somehow, so that what's on disk now is the pre-update copies
of these tuples as well as the post-update copies.  I wonder whether
your SSD dropped a batch of updates.  Speculating wildly: maybe that
was triggered by a power glitch that also caused the reported error
on your RAID array?

One thing that I learned from examining the PK index is that for
three out of these four pairs, there were intermediate versions,
ie there must have been multiple UPDATEs issued, probably all in
the same transaction (does that square with what you know of the
update process?).  For instance, the index entries for 26245218
look like

 Item 181 -- Length:   16  Offset: 5024 (0x13a0)  Flags: NORMAL
  Block Id: 11249625  linp Index: 3  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  13a0: ab00d9a7 03001000 62789001 00000000  ........bx......

 Item 182 -- Length:   16  Offset: 5040 (0x13b0)  Flags: DEAD
  Block Id: 1501614  linp Index: 16  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  13b0: 1600aee9 10001000 62789001 00000000  ........bx......

 Item 183 -- Length:   16  Offset: 5328 (0x14d0)  Flags: NORMAL
  Block Id: 1501614  linp Index: 5  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  14d0: 1600aee9 05001000 62789001 00000000  ........bx......

We can see the heap entries at (11249625,3) and (1501614,5)
but there's nothing at (1501614,16).  This is explainable if
you believe that a write of page 1501614 got lost.

It's conceivable that this was a software glitch not a hardware glitch,
ie Postgres forgetting the dirty-bits for a batch of pages, but we've
not seen any similar reports elsewhere.  So I'm leaning to the hardware
explanation.

                        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Violation of primary key constraint

Reply via email to