* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas <robertmh...@gmail.com> writes:
> > On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> That's certainly impossible for the system catalogs, which means you
> >> have to be able to deal with relfilenode discrepancies for them, which
> >> means that maintaining the same relfilenodes for user tables is of
> >> dubious value.
> >
> > Why is that impossible for the system catalogs?
>
> New versions aren't guaranteed to have the same system catalogs, let alone
> the same relfilenodes for them.

Indeed, new versions almost certainly have wholly new system catalogs.

While there might be a reason to keep the relfilenodes the same, it
doesn't actually help with the pg_upgrade use-case we're currently
discussing (at least, not without additional help).  The problem is
that we certainly must transfer all the new catalogs, but how would
rsync know that those catalog files have to be transferred but not the
user relations?  Using --size-only would mean that system catalogs whose
sizes happen to match after the upgrade wouldn't be transferred and that
would certainly lead to a corrupt situation.

Andres proposed a helper script which would go through the entire tree
on the remote side and set all the timestamps on the remote side to
match those on the local side (prior to the pg_upgrade).  If all the
relfilenodes remained the same and the timestamps on the catalog tables
all changed then it might work to do (without using --size-only):

stop-cluster
set-timestamp-script
pg_upgrade
rsync new_data_dir -> remote:existing_cluster

This would mean that any other files which happened to be changed by
pg_upgrade beyond the catalog tables would also get copied across.  The
issue that I see with that is that if the pg_upgrade process does touch
anything outside of the system catalogs, then its documented revert
mechanism (rename the control file and start the old cluster back up,
prior to having started the new cluster) wouldn't be valid.  Requiring
an extra script which runs around changing timestamps on files is a bit
awkward too, though I suppose possible, and then we'd also have to
document that this process only works with $version of pg_upgrade that
does the preservation of the relfilenodes.

I suppose there's also technically a race condition to consider, if the
whole thing is scripted and pg_upgrade manages to change an existing
file in the same second that the old cluster did then that file wouldn't
be recognized by the rsync as having been updated.
That's not too hard to address though: just wait a second somewhere in
there.

Still, I'm not sure that this approach really gains us much over the
approach that Bruce is proposing.

Thanks,

Stephen
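
P.S. To make the timestamp idea a bit more concrete, here is a rough
sketch in Python of what such a helper might look like, along with a
simplified model of the check rsync makes when deciding whether to copy
a file.  The names sync_mtimes and would_transfer are hypothetical, both
trees are assumed to be reachable as local paths, and comparing mtimes
at whole-second granularity is a simplification on my part rather than
a description of what any particular rsync version does.

```python
import os


def sync_mtimes(local_root, remote_root):
    """Copy each local file's timestamps onto the same relative path under
    remote_root.  Hypothetical helper: both trees are assumed to be
    reachable as ordinary local paths."""
    updated = 0
    for dirpath, _dirnames, filenames in os.walk(local_root):
        rel = os.path.relpath(dirpath, local_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(remote_root, rel, name)
            if not os.path.isfile(dst):
                continue  # no counterpart to align; it would be copied anyway
            st = os.stat(src)
            os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
            updated += 1
    return updated


def would_transfer(src, dst, size_only=False):
    """Simplified model (an assumption, not rsync's actual code) of the
    quick check: copy unless size, and by default whole-second mtime,
    match."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True
    if size_only:
        return False  # same size is treated as "unchanged"
    return int(s.st_mtime) != int(d.st_mtime)
```

With size_only=True a rewritten catalog file that happens to keep its
size is skipped, which is the corruption case above.  With the default
check, a file modified again within the same second as its recorded
mtime is also skipped, which is the race I mentioned.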