On Thu, Jan 25, 2024, at 6:05 AM, Hayato Kuroda (Fujitsu) wrote: > 01. > ``` > /* Options */ > static char *pub_conninfo_str = NULL; > static SimpleStringList database_names = {NULL, NULL}; > static int wait_seconds = DEFAULT_WAIT; > static bool retain = false; > static bool dry_run = false; > ``` > > Just to confirm - is there a policy why we store the specified options? If you > want to store as global ones, username and port should follow (my fault...). > Or, should we have a structure to store them?
It is a matter of style I would say. Check other client applications. Some of them also use global variable. There are others that group options into a struct. I would say that since it has a short lifetime, I don't think the current style is harmful. > 04. > ``` > {"dry-run", no_argument, NULL, 'n'}, > ``` > > I'm not sure why the dry_run mode exists. In terms pg_resetwal, it shows the > which value would be changed based on the input. As for the pg_upgrade, it > checks > whether the node can be upgraded for now. I think, we should have the checking > feature, so it should be renamed to --check. Also, the process should exit > earlier > at that time. It is extremely useful because (a) you have a physical replication setup and don't know if it is prepared for logical replication, (b) check GUCs (is max_wal_senders sufficient for this pg_subscriber command? Or is max_replication_slots sufficient to setup the logical replication even though I already have some used replication slots?), (c) connectivity and (d) credentials. > 05. > I felt we should accept some settings from enviroment variables, like > pg_upgrade. > Currently, below items should be acceted. > > - data directory > - username > - port > - timeout Maybe PGDATA. > 06. > ``` > pg_logging_set_level(PG_LOG_WARNING); > ``` > > If the default log level is warning, there are no ways to output debug logs. > (-v option only raises one, so INFO would be output) > I think it should be PG_LOG_INFO. You need to specify multiple -v options. > 07. > Can we combine verifications into two functions, e.g., check_primary() and > check_standby/check_subscriber()? I think v9 does it. > 08. > Not sure, but if we want to record outputs by pg_subscriber, the sub-directory > should be created. The name should contain the timestamp. The log file already contains the timestamp. Why? > 09. > Not sure, but should we check max_slot_wal_keep_size of primary server? It can > avoid to fail starting of logical replicaiton. A broken physical replication *before* running this tool is its responsibility? Hmm. We might add another check that can be noticed during dry run mode. > 10. > ``` > nslots_new = nslots_old + dbarr.ndbs; > > if (nslots_new > max_replication_slots) > { > pg_log_error("max_replication_slots (%d) must be greater than or equal to " > "the number of replication slots required (%d)", max_replication_slots, > nslots_new); > exit(1); > } > ``` > > I think standby server must not have replication slots. Because subsequent > pg_resetwal command discards all the WAL file, so WAL records pointed by them > are removed. Currently pg_resetwal does not raise ERROR at that time. Again, dry run mode might provide a message for it. > 11. > ``` > /* > * Stop the subscriber if it is a standby server. Before executing the > * transformation steps, make sure the subscriber is not running because > * one of the steps is to modify some recovery parameters that require a > * restart. > */ > if (stat(pidfile, &statbuf) == 0) > ``` > > I kept just in case, but I'm not sure it is still needed. How do you think? > Removing it can reduce an inclusion of pidfile.h. Are you suggesting another way to check if the standby is up and running? > 12. > ``` > pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s", > standby.bindir, standby.pgdata); > rc = system(pg_ctl_cmd); > pg_ctl_status(pg_ctl_cmd, rc, 0); > ``` > > > There are two places to stop the instance. Can you divide it into a function? Yes. > 13. > ``` > * A temporary replication slot is not used here to avoid keeping a > * replication connection open (depending when base backup was taken, the > * connection should be open for a few hours). > */ > conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname); > if (conn == NULL) > exit(1); > consistent_lsn = create_logical_replication_slot(conn, true, > &dbarr.perdb[0]); > ``` > > I didn't notice the comment, but still not sure the reason. Why we must > reserve > the slot until pg_subscriber finishes? IIUC, the slot would be never used, it > is created only for getting a consistent_lsn. So we do not have to keep. > Also, just before, logical replication slots for each databases are created, > so > WAL records are surely reserved. This comment needs to be updated. It was written at the time I was pursuing base backup support too. It doesn't matter if you remove this transient replication slot earlier because all of the replication slots created to the subscriptions were created *before* the one for the consistent LSN. Hence, no additional WAL retention due to this transient replication slot. > 14. > > ``` > pg_log_info("starting the subscriber"); > start_standby_server(&standby, subport, server_start_log); > ``` > > This info should be in the function. Ok. > 15. > ``` > /* > * Create a subscription for each database. > */ > for (i = 0; i < dbarr.ndbs; i++) > ``` > > This can be divided into a function, like create_all_subscriptions(). Ok. > 16. > My fault: usage() must be updated. > > 17. use_primary_slot_name > ``` > if (PQntuples(res) != 1) > { > pg_log_error("could not obtain replication slot information: got %d rows, > expected %d row", > PQntuples(res), 1); > return NULL; > } > ``` > > Error message should be changed. I think this error means the standby has > wrong primary_slot_name, right? I refactored this code a bit but the message is the same. It detects 2 cases: (a) you set primary_slot_name but you don't have a replication slot with the same name and (b) a cannot-happen bug that provides > 1 rows. It is a broken setup so maybe a hint saying so is enough. > 18. misc > Sometimes the pid of pg_subscriber is referred. It can be stored as global > variable. I prefer to keep getpid() call. > 19. > C99-style has been allowed, so loop variables like "i" can be declared in the > for-statement, like > > ``` > for (int i = 0; i < MAX; i++) > ``` v9 does it. > 20. > Some comments, docs, and outputs must be fixed when the name is changed. Next patch. -- Euler Taveira EDB https://www.enterprisedb.com/