Dear hackers, Here are comments for v8 patch set. I may revise them by myself, but I want to post here to share all of them.
01. ``` /* Options */ static char *pub_conninfo_str = NULL; static SimpleStringList database_names = {NULL, NULL}; static int wait_seconds = DEFAULT_WAIT; static bool retain = false; static bool dry_run = false; ``` Just to confirm - is there a policy why we store the specified options? If you want to store as global ones, username and port should follow (my fault...). Or, should we have a structure to store them? 02. ``` {"subscriber-conninfo", required_argument, NULL, 'S'}, ``` This is my fault, but "--subscriber-conninfo" is still remained. It should be removed if it is not really needed. 03. ``` {"username", required_argument, NULL, 'u'}, ``` Should we accept 'U' instead of 'u'? 04. ``` {"dry-run", no_argument, NULL, 'n'}, ``` I'm not sure why the dry_run mode exists. In terms pg_resetwal, it shows the which value would be changed based on the input. As for the pg_upgrade, it checks whether the node can be upgraded for now. I think, we should have the checking feature, so it should be renamed to --check. Also, the process should exit earlier at that time. 05. I felt we should accept some settings from enviroment variables, like pg_upgrade. Currently, below items should be acceted. - data directory - username - port - timeout 06. ``` pg_logging_set_level(PG_LOG_WARNING); ``` If the default log level is warning, there are no ways to output debug logs. (-v option only raises one, so INFO would be output) I think it should be PG_LOG_INFO. 07. Can we combine verifications into two functions, e.g., check_primary() and check_standby/check_subscriber()? 08. Not sure, but if we want to record outputs by pg_subscriber, the sub-directory should be created. The name should contain the timestamp. 09. Not sure, but should we check max_slot_wal_keep_size of primary server? It can avoid to fail starting of logical replicaiton. 10. ``` nslots_new = nslots_old + dbarr.ndbs; if (nslots_new > max_replication_slots) { pg_log_error("max_replication_slots (%d) must be greater than or equal to " "the number of replication slots required (%d)", max_replication_slots, nslots_new); exit(1); } ``` I think standby server must not have replication slots. Because subsequent pg_resetwal command discards all the WAL file, so WAL records pointed by them are removed. Currently pg_resetwal does not raise ERROR at that time. 11. ``` /* * Stop the subscriber if it is a standby server. Before executing the * transformation steps, make sure the subscriber is not running because * one of the steps is to modify some recovery parameters that require a * restart. */ if (stat(pidfile, &statbuf) == 0) ``` I kept just in case, but I'm not sure it is still needed. How do you think? Removing it can reduce an inclusion of pidfile.h. 12. ``` pg_ctl_cmd = psprintf("\"%s/pg_ctl\" stop -D \"%s\" -s", standby.bindir, standby.pgdata); rc = system(pg_ctl_cmd); pg_ctl_status(pg_ctl_cmd, rc, 0); ``` There are two places to stop the instance. Can you divide it into a function? 13. ``` * A temporary replication slot is not used here to avoid keeping a * replication connection open (depending when base backup was taken, the * connection should be open for a few hours). */ conn = connect_database(primary.base_conninfo, dbarr.perdb[0].dbname); if (conn == NULL) exit(1); consistent_lsn = create_logical_replication_slot(conn, true, &dbarr.perdb[0]); ``` I didn't notice the comment, but still not sure the reason. Why we must reserve the slot until pg_subscriber finishes? IIUC, the slot would be never used, it is created only for getting a consistent_lsn. So we do not have to keep. Also, just before, logical replication slots for each databases are created, so WAL records are surely reserved. 14. ``` pg_log_info("starting the subscriber"); start_standby_server(&standby, subport, server_start_log); ``` This info should be in the function. 15. ``` /* * Create a subscription for each database. */ for (i = 0; i < dbarr.ndbs; i++) ``` This can be divided into a function, like create_all_subscriptions(). 16. My fault: usage() must be updated. 17. use_primary_slot_name ``` if (PQntuples(res) != 1) { pg_log_error("could not obtain replication slot information: got %d rows, expected %d row", PQntuples(res), 1); return NULL; } ``` Error message should be changed. I think this error means the standby has wrong primary_slot_name, right? 18. misc Sometimes the pid of pg_subscriber is referred. It can be stored as global variable. 19. C99-style has been allowed, so loop variables like "i" can be declared in the for-statement, like ``` for (int i = 0; i < MAX; i++) ``` 20. Some comments, docs, and outputs must be fixed when the name is changed. Best Regards, Hayato Kuroda FUJITSU LIMITED