On Sat, Jun 23, 2018 at 10:29:49PM -0400, Bruce Momjian wrote:
> On Fri, Jun 15, 2018 at 03:01:37PM -0700, Vimalraj A wrote:
> > I would like to understand why it happens so.
> > 1. What transient state corrupts the db?
> > 2. Is it a known issue with pg_upgrade?
> > 3. Is there a way to get the data from pg_upgrade after "immediate"
> >    mode stop of previous version?
>
> Well, that's interesting.  We document to shut down the old and new
> server with pg_ctl stop, but don't specify to avoid immediate.
>
> The reason you are having problems is that pg_upgrade does not copy the
> WAL from the old cluster to the new one, so there is no way to replay
> the needed WAL during startup of the new server, which leads to
> corruption.  Did you find this out in testing or in actual use?
>
> What is also interesting is how pg_upgrade tries to avoid problems with
> _crash_ shutdowns --- if it sees a postmaster lock file, it tries to
> start the server, and if that works, it then stops it, causing the WAL
> to be replayed and the server to be cleanly shut down.  What it
> _doesn't_ handle is pg_ctl -m immediate, which removes the lock file,
> but does leave WAL in need of replay.  Oops!
>
> Ideally we could detect this before we check pg_controldata and then do
> the start/stop trick to fix the WAL, but the ordering of the code makes
> that hard.  Instead, I have developed the attached patch which does a
> check for the cluster state at the same time we are checking
> pg_controldata, and reports an error if there is not a clean shutdown.
> Based on how rare this is, this is probably the cleanest solution, and I
> think can be backpatched.
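
For anyone who wants to run a similar check by hand before invoking
pg_upgrade, here is a minimal sketch in Python.  It is not the code from
the patch.  It assumes pg_controldata is on the PATH, that its output is
in the C locale, and that only the state "shut down" counts as clean (a
standby's "shut down in recovery" is deliberately left out).  The
function names are made up for this example.

```python
import os
import subprocess

# Assumption of this sketch: only this state is treated as a clean shutdown.
CLEAN_STATES = {"shut down"}


def cluster_state(controldata_output):
    """Return the value of the 'Database cluster state' line, or None."""
    for line in controldata_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Database cluster state":
            return value.strip()
    return None


def is_cleanly_shut_down(controldata_output):
    """True if the reported cluster state is one we consider clean."""
    return cluster_state(controldata_output) in CLEAN_STATES


def read_controldata(datadir):
    """Hypothetical helper: run pg_controldata on datadir, return stdout."""
    env = dict(os.environ, LC_ALL="C")  # keep the field labels untranslated
    result = subprocess.run(
        ["pg_controldata", datadir],
        capture_output=True, text=True, check=True, env=env,
    )
    return result.stdout
```

Calling is_cleanly_shut_down(read_controldata("/path/to/old/datadir"))
on the old cluster (the path is a placeholder) should return False if
the server was stopped with -m immediate, since no shutdown checkpoint
was written to the control file.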

Updated patch applied through 9.3:

	https://git.postgresql.org/pg/commitdiff/260fe9f2b02b67de1e5ff29faf123e4220586c43

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +