On Thu, December 5, 2019 at 5:34 PM Peter Geoghegan wrote:
> > We have a Postgres 10 database that we recently upgraded to Postgres 12 
> > using pg_upgrade. We recently discovered that there are rows in one of the 
> > tables that have duplicate primary keys:
> 
> What's the timeline here? In other words, does it look like these rows
> were updated and/or deleted before, around the same time as, or after
> the upgrade?

The Postgres 12 upgrade was performed on 2019-11-22, and the affected rows were 
modified after that date (although some of them were originally inserted before 
the upgrade and only later modified/duplicated).
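
For anyone who wants to run the same timeline check on their own data, here is a 
minimal sketch in Python. It assumes the suspect rows have been exported as 
(primary key, last-modified date) pairs. That export format, the existence of a 
last-modified column, and the names duplicated_keys and classify are all 
hypothetical and not taken from this thread; only the upgrade date is.

```python
from collections import defaultdict
from datetime import date

# Upgrade date as stated in the message above.
UPGRADE_DATE = date(2019, 11, 22)


def duplicated_keys(rows):
    """Return {key: sorted modification dates} for keys seen more than once.

    rows is an iterable of (key, modified_on) tuples. This is a hypothetical
    export format; a real table may not track modification dates at all.
    """
    by_key = defaultdict(list)
    for key, modified_on in rows:
        by_key[key].append(modified_on)
    return {key: sorted(dates) for key, dates in by_key.items() if len(dates) > 1}


def classify(dates, upgrade_date=UPGRADE_DATE):
    """Say whether a key's row versions were modified before the upgrade,
    on or after it, or a mix of both."""
    if all(d < upgrade_date for d in dates):
        return "before"
    if all(d >= upgrade_date for d in dates):
        return "after"
    return "mixed"
```

Calling classify on each value returned by duplicated_keys gives a rough picture 
of whether the duplicates cluster on one side of the upgrade.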

> > This database runs inside Docker, with the data directory bind-mounted to a 
> > reflink-enabled XFS filesystem. The VM is running Debian's 4.19.16-1~bpo9+1 
> > kernel inside an AWS EC2 instance. We have Debezium stream data from this 
> > database via pgoutput.
> 
> That seems suspicious, since reflink support for XFS is rather immature.

Good point. Looking at the kernel commits since 4.19.16, it appears that later 
kernel versions include a few bug fixes that address XFS corruption issues. 
Regardless of whether FS bugs are responsible for this corruption, I plan on 
upgrading to a newer kernel.

> How did you invoke pg_upgrade? Did you use the --link (hard link) option?

Yes, we first created a backup using "cp -a --reflink=always", ran initdb on 
the new directory, and then upgraded using "pg_upgrade -b ... -B ... -d ... -D 
... -k".

Alex
