Dear hackers,

While monitoring the buildfarm, I found that drongo failed the test pg_upgrade/003_logical_slots [1]. Strangely, the test passed in the next run. I am not yet sure of the cause, but I will keep an eye on it and investigate if it happens again.

I think this failure is not related to our logical slots work, even though the failing test was 003_logical_slots.pl. See the analysis below for details.

Further investigation may need the server log written during the upgrade. It is stored in the data directory, so the buildfarm does not upload it. I may need additional information if the test fails again.

# Analysis of failure

According to the output, pg_upgrade appears to have failed while restoring objects to the new cluster [2].

At the code level, the pg_upgrade command failed in exec_prog(). In that function, pg_restore was being executed for the database "postgres". Below is a brief call stack. Note that pg_restore is not used for migrating logical replication slots; that is done by the pg_upgrade binary itself. Also, the slot migration is done after all objects are copied, not in create_new_objects().

```
exec_prog()
parallel_exec_prog("pg_restore ... ") <-- Since -j option is not specified, it is just a wrapper
create_new_objects()
main()
```

In exec_prog(), system() was called but returned a non-zero value. According to the documentation, system() returns the value returned by the command interpreter when the command argument is not NULL [3]. Unfortunately, the current code does not output the return code. Also, the buildfarm does not upload the data directory for failed tests. Therefore, I could not get more information for the investigation.

[1]: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=drongo&dt=2023-11-07%2013%3A43%3A23&stg=pg_upgrade-check

[2]:
```
...
# No postmaster PID for node "oldpub"
# Running: pg_upgrade --no-sync -d C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_oldpub_data/pgdata -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -b C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -B C:/prog/bf/root/HEAD/PGSQL~1.BUI/TMP_IN~1/prog/bf/root/HEAD/inst/bin -s 127.0.0.1 -p 54813 -P 54814 --copy
Performing Consistency Checks
...
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster *failure*
Consult the last few lines of "C:/prog/bf/root/HEAD/pgsql.build/testrun/pg_upgrade/003_logical_slots/data/t_003_logical_slots_newpub_data/pgdata/pg_upgrade_output.d/20231107T142224.580/log/pg_upgrade_dump_5.log" for the probable cause of the failure.
Failure, exiting
[14:23:26.632](70.141s) not ok 10 - run of pg_upgrade of old cluster
[14:23:26.632](0.000s) # Failed test 'run of pg_upgrade of old cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 170.
### Starting node "newpub"
# Running: pg_ctl -w -D C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\data/t_003_logical_slots_newpub_data/pgdata -l C:\\prog\\bf\\root\\HEAD\\pgsql.build/testrun/pg_upgrade/003_logical_slots\\log/003_logical_slots_newpub.log -o --cluster-name=newpub start
waiting for server to start.... done
server started
# Postmaster PID for node "newpub" is 4604
[14:23:28.398](1.766s) not ok 11 - check the slot exists on new cluster
[14:23:28.398](0.001s) # Failed test 'check the slot exists on new cluster'
# at C:/prog/bf/root/HEAD/pgsql/src/bin/pg_upgrade/t/003_logical_slots.pl line 176.
[14:23:28.399](0.000s) # got: ''
# expected: 'regress_sub|t'
...
```

[3]: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/system-wsystem?view=msvc-170

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
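
Addendum on the missing return code noted in the analysis: below is a minimal sketch of how the value returned by system() could be written to a log when a command fails. This is not the actual exec_prog() code; the helper name run_and_report() and its log message format are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of pg_upgrade): run a command through
 * system() and, on failure, report the raw return value to the given log.
 *
 * The raw value is platform dependent. With the Microsoft C runtime it is
 * the value returned by the command interpreter; on POSIX systems it is a
 * wait status that would need WIFEXITED()/WEXITSTATUS() to decode.
 */
int
run_and_report(const char *cmd, FILE *log)
{
	int			rc;

	/* system(NULL) only tests for a command interpreter, so reject it. */
	assert(cmd != NULL);

	rc = system(cmd);
	if (rc != 0)
		fprintf(log, "command \"%s\" failed: system() returned %d\n",
				cmd, rc);

	return rc;
}
```

For example, run_and_report("exit 3", stderr) would print the non-zero value that is currently lost, which might help narrow down intermittent failures like this one.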