On Thu, Sep 24, 2020 at 10:27 AM Heikki Linnakangas <hlinn...@iki.fi> wrote: > >> /* > >> * If this is a relation file, copy the modified blocks. > >> * > >> * This is in addition to any other changes. > >> */ > >> iter = datapagemap_iterate(&entry->target_modified_pages); > >> while (datapagemap_next(iter, &blkno)) > >> { > >> offset = blkno * BLCKSZ; > >> > >> source->queue_fetch_range(source, entry->path, offset, BLCKSZ); > >> } > >> pg_free(iter); > > > > Can we put this hunk into a static function overwrite_pages()? > > Meh, it's only about 10 lines of code, and one caller.
Fair. > > > 7. Please address the FIXME for the symlink case: > > /* FIXME: Check if it points to the same target? */ > > It's not a new issue. Would be nice to fix, of course. I'm not sure what > the right thing to do would be. If you have e.g. replaced > postgresql.conf with a symlink that points outside the data directory, > would it be appropriate to overwrite it? Or perhaps we should throw an > error? We also throw an error if a file is a symlink in the source but a > regular file in the target, or vice versa. > Hmm, I can imagine a use case for 2 different symlink targets on the source and target clusters. For example the primary's pg_wal directory can have a different symlink target as compared to a standby's (different mount points on the same network maybe?). An end user might not desire pg_rewind meddling with that setup or may desire pg_rewind to treat the source as a source-of-truth with respect to this as well and would want pg_rewind to overwrite the target's symlink. Maybe doing a check and emitting a warning with hint/detail is prudent here while taking no action. > > 8. > > > > * it anyway. But there's no harm in copying it now.) > > > > and > > > > * copy them here. But we don't know which scenario we're > > * dealing with, and there's no harm in copying the missing > > * blocks now, so do it now. > > > > Could you add a line or two explaining why there is "no harm" in these > > two hunks above? > > The previous sentences explain that there's a WAL record covering them. > So they will be overwritten by WAL replay anyway. Does it need more > explanation? Yeah you are right, that is reason enough. > > 14. queue_overwrite_range(), finish_overwrite() instead of > > queue_fetch_range(), finish_fetch()? Similarly update\ > > *_fetch_file_range() and *_finish_fetch() > > > > > > 15. Let's have local_source.c and libpq_source.c instead of *_fetch.c > > Good idea! And fetch.h -> rewind_source.h. +1. You might have missed the changes to rename "fetch" -> "overwrite" as was mentioned in 14. > > I also moved the code in execute_file_actions() function to pg_rewind.c, > into a new function: perform_rewind(). It felt a bit silly to have just > execute_file_actions() in a file of its own. perform_rewind() now does > all the modifications to the data directory, writing the backup file. > Except for writing the recovery config: that also needs to be when > there's no rewind to do, so it makes sense to keep it separate. What do > you think? I don't mind inlining that functionality into perform_rewind(). +1. Heads up: The function declaration for execute_file_actions() is still there in rewind_source.h. > > 16. > > > >> conn = PQconnectdb(connstr_source); > >> > >> if (PQstatus(conn) == CONNECTION_BAD) > >> pg_fatal("could not connect to server: %s", > >> PQerrorMessage(conn)); > >> > >> if (showprogress) > >> pg_log_info("connected to server"); > > > > > > The above hunk should be part of init_libpq_source(). Consequently, > > init_libpq_source() should take a connection string instead of a conn. > > The libpq connection is also needed by WriteRecoveryConfig(), that's why > it's not fully encapsulated in libpq_source. Ah. I find it pretty weird why we need to specify --source-server to have ----write-recovery-conf work. From the code, we only need the conn for calling PQserverVersion(), something we can easily get by slurping pg_controldata on the source side? Maybe we can remove this limitation? Regards, Soumyadeep (VMware)