On Sat, Jan 07, 2023 at 12:39:11AM +1300, Thomas Munro wrote: > On Wed, Nov 9, 2022 at 2:04 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > +data_sync_retry = on > > Sharing with the list some clues that Justin and I figured out about > what that part is doing. Without it, you get failures like: > > PANIC: could not open file "pg_logical/snapshots/0-14FE6B0.snap": > No such file or directory > > That's been seen before: > > https://www.postgresql.org/message-id/flat/17827.1549866683%40sss.pgh.pa.us > > That thread concluded that the operating system must have a non-atomic > rename(), ie a kernel bug. I don't know why Cygwin would display that > behaviour and our native Windows build not; maybe timing, or maybe our > own open() and rename() wrappers for Windows do something important > differently than Cygwin's open() and rename(). > > On reflection, that seems a bit too flimsy to have in-tree without > more investigation, which I won't have time for myself, so I'm going > to withdraw this entry.
Not so fast :) Here's my latest copy of the patch. Most recently, rather than setting data_sync_retry=no, I changed to call fsync_fname_ext() rather than fsync_fname(), which uses PANIC (except when data_sync_retry is disabled). That seems to work, showing that the problem is limited to SnapBuildSerialize(), and not a problem with all fsync()... https://cirrus-ci.com/task/5990885733695488 Thomas raised a good question, which was how the tests were passing when SnapBuildSerialize() was raising an error, which is what it would've been doing when I used data_sync_retry=no. So .. why is wal_sync_method being used to control fsync for things other than WAL? See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which point (since 9b178555f, c. 2004) wal_sync_method was already being used for SLOG. Now, it's also being used for logical decoding (since b89e1510 and 858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that data_sync_retry wasn't added until 9ccdd7f66 (c. 2018) It looks like logical decoding may be the "most wrong" place that wal_sync_method is being used, so maybe my change is reasonable to consider, and not just a workaround. I'm going to re-open the CF entry to let this run for a while to see how it works out. -- Justin
>From b07add11b8bf39f5bfbae4f9072470f31da97360 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Mon, 25 Jul 2022 23:05:10 +1200 Subject: [PATCH] WIP CI support for Cygwin. ci-os-only: cygwin See also: d8e78714-dc77-4a64-783f-e863ba4d9...@2ndquadrant.com https://cirrus-ci.com/task/5145086722834432 XXX This should use a canned Docker image with all the right packages installed? But if the larger image is slower to start, then maybe not... --- .cirrus.yml | 83 +++++++++++++++++++++ configure | 2 +- configure.ac | 2 +- src/backend/replication/logical/snapbuild.c | 4 +- src/test/perl/PostgreSQL/Test/Cluster.pm | 4 +- src/test/perl/PostgreSQL/Test/Utils.pm | 12 ++- src/test/recovery/t/020_archive_status.pl | 2 +- src/tools/ci/cores_backtrace.sh | 19 ++++- 8 files changed, 118 insertions(+), 10 deletions(-) diff --git a/.cirrus.yml b/.cirrus.yml index d13726ed893..4507f734e94 100644 --- a/.cirrus.yml +++ b/.cirrus.yml @@ -737,6 +737,89 @@ task: type: text/plain +task: + name: Windows - Cygwin + + # due to resource constraints we don't run this task by default for now + trigger_type: manual + # worth using only_if despite being manual, otherwise this task will show up + # when e.g. ci-os-only: linux is used. + only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*' + # otherwise it'll be sorted before other tasks + depends_on: SanityCheck + + #XXX only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*' + #timeout_in: 120m + + env: + CPUS: 4 + BUILD_JOBS: $CPUS + TEST_JOBS: $CPUS + CCACHE_DIR: /tmp/ccache + CCACHE_LOGFILE: ccache.log + # --disable-dynamicbase + # --with-gssapi + CONFIGURE_CACHE: /tmp/ccache/configure.cache + PG_TEST_USE_UNIX_SOCKETS: 1 + EXTRA_REGRESS_OPTS: --max-connections=1 + PG_TEST_EXTRA: ldap ssl # disable kerberos + CC: ccache gcc + CFLAGS: -Og -ggdb + BASH: C:\tools\cygwin\bin\bash.exe --login + + #windows_container: + #image: cirrusci/windowsservercore:2019-2022.06.23 + #os_version: 2019 + compute_engine_instance: + image_project: $IMAGE_PROJECT + image: family/pg-ci-windows-ci-vs-2019 + platform: windows + cpu: $CPUS + memory: 4G + + setup_additional_packages_script: | + choco install -y --no-progress cygwin + C:\tools\cygwin\cygwinsetup.exe -q -P cygrunsrv,make,gcc-core,ccache,binutils,libtool,pkg-config,flex,bison,zlib-devel,libxml2-devel,libxslt-devel,libssl-devel,openldap-devel,libreadline-devel,perl,meson,ninja,perl-IPC-Run + REM libkrb5-devel,krb5-server + %BASH% -c "cygserver-config -y" + %BASH% -c "echo 'kern.ipc.semmni 1024' >> /etc/cygserver.conf" + %BASH% -c "echo 'kern.ipc.semmns 1024' >> /etc/cygserver.conf" + %BASH% -c "net start cygserver" + + sysinfo_script: | + chcp + systeminfo + powershell -Command get-psdrive -psprovider filesystem + set + %BASH% -c "id; uname -a; ulimit -a -H; ulimit -a -S; export" + + ccache_cache: + folder: C:\tools\cygwin\tmp\ccache + fingerprint_key: ccache/cygwin + reupload_on_changes: true + + configure_script: | + %BASH% -c "cd '%cd%' && meson setup --buildtype=debug -Dcassert=true -Dssl=openssl -Duuid=e2fs -DPG_TEST_EXTRA='$PG_TEST_EXTRA' build" + + build_script: | + %BASH% -c "cd '%cd%' && ninja -C build -j${BUILD_JOBS}" + %BASH% -c "ccache --show-stats" + + always: + upload_caches: ccache + + #%BASH% -c "cd '%cd%' && echo 'data_sync_retry = on' >> src/tools/ci/pg_ci_base.conf" + #%BASH% -c "cd '%cd%' && echo 'wal_sync_method = fdatasync' >> src/tools/ci/pg_ci_base.conf" + # --repeat 9 + test_world_script: | + %BASH% -c "cd '%cd%' && meson test $MTEST_ARGS --num-processes ${TEST_JOBS}" + + on_failure: + <<: *on_failure_meson + cores_script: | + %BASH% -c "cd '%cd%' && src/tools/ci/cores_backtrace.sh cygwin ." + + task: name: CompilerWarnings # task that did not run, count as a success, so we need to recheck Linux' diff --git a/configure b/configure index 5d07fd0bb91..72d56f00534 100755 --- a/configure +++ b/configure @@ -16477,7 +16477,7 @@ fi # mingw has adopted a GNU-centric interpretation of optind/optreset, # so always use our version on Windows. -if test "$PORTNAME" = "win32"; then +if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then case " $LIBOBJS " in *" getopt.$ac_objext "* ) ;; *) LIBOBJS="$LIBOBJS getopt.$ac_objext" diff --git a/configure.ac b/configure.ac index e9b74ced6ca..e0a9c332060 100644 --- a/configure.ac +++ b/configure.ac @@ -1899,7 +1899,7 @@ fi # mingw has adopted a GNU-centric interpretation of optind/optreset, # so always use our version on Windows. -if test "$PORTNAME" = "win32"; then +if test "$PORTNAME" = "win32" -o "$PORTNAME" = "cygwin"; then AC_LIBOBJ(getopt) AC_LIBOBJ(getopt_long) fi diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c index 829c5681120..f0929600fcc 100644 --- a/src/backend/replication/logical/snapbuild.c +++ b/src/backend/replication/logical/snapbuild.c @@ -1812,7 +1812,9 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn) } /* make sure we persist */ - fsync_fname(path, false); + if (fsync_fname_ext(path, false, false, ERROR)) + elog(ERROR, "failed to fsync"); + fsync_fname("pg_logical/snapshots", true); /* diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index 04921ca3a3d..31ac1d020a4 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -1097,7 +1097,7 @@ sub enable_restoring # the path contains spaces. $path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os); my $copy_command = - $PostgreSQL::Test::Utils::windows_os + $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin ? qq{copy "$path\\\\%f" "%p"} : qq{cp "$path/%f" "%p"}; @@ -1167,7 +1167,7 @@ sub enable_archiving # the path contains spaces. $path =~ s{\\}{\\\\}g if ($PostgreSQL::Test::Utils::windows_os); my $copy_command = - $PostgreSQL::Test::Utils::windows_os + $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin ? qq{copy "%p" "$path\\\\%f"} : qq{cp "%p" "$path/%f"}; diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm b/src/test/perl/PostgreSQL/Test/Utils.pm index 878e12b15ed..0c3f4dc35a0 100644 --- a/src/test/perl/PostgreSQL/Test/Utils.pm +++ b/src/test/perl/PostgreSQL/Test/Utils.pm @@ -88,10 +88,11 @@ our @EXPORT = qw( $windows_os $is_msys2 + $is_cygwin $use_unix_sockets ); -our ($windows_os, $is_msys2, $use_unix_sockets, $timeout_default, +our ($windows_os, $is_msys2, $is_cygwin, $use_unix_sockets, $timeout_default, $tmp_check, $log_path, $test_logfile); BEGIN @@ -140,13 +141,18 @@ BEGIN $ENV{PGAPPNAME} = basename($0); # Must be set early - $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys'; + $windows_os = $Config{osname} eq 'MSWin32' || $Config{osname} eq 'msys' + || $Config{osname} eq 'cygwin'; + # Check if this environment is MSYS2. $is_msys2 = $windows_os && -x '/usr/bin/uname' && `uname -or` =~ /^[2-9].*Msys/; + # Check if this environment is Cygwin + $is_cygwin = $Config{osname} eq 'cygwin'; + if ($windows_os) { require Win32API::File; @@ -707,7 +713,7 @@ sub dir_symlink { my $oldname = shift; my $newname = shift; - if ($windows_os) + if ($windows_os && !$is_cygwin) { $oldname =~ s,/,\\,g; $newname =~ s,/,\\,g; diff --git a/src/test/recovery/t/020_archive_status.pl b/src/test/recovery/t/020_archive_status.pl index 13ada994dbb..0462d1d90c2 100644 --- a/src/test/recovery/t/020_archive_status.pl +++ b/src/test/recovery/t/020_archive_status.pl @@ -26,7 +26,7 @@ my $primary_data = $primary->data_dir; # a portable solution, use an archive command based on a command known to # work but will fail: copy with an incorrect original path. my $incorrect_command = - $PostgreSQL::Test::Utils::windows_os + $PostgreSQL::Test::Utils::windows_os && !$PostgreSQL::Test::Utils::is_cygwin ? qq{copy "%p_does_not_exist" "%f_does_not_exist"} : qq{cp "%p_does_not_exist" "%f_does_not_exist"}; $primary->safe_psql( diff --git a/src/tools/ci/cores_backtrace.sh b/src/tools/ci/cores_backtrace.sh index 28d3cecfc67..c49f9b07752 100755 --- a/src/tools/ci/cores_backtrace.sh +++ b/src/tools/ci/cores_backtrace.sh @@ -1,5 +1,7 @@ #! /bin/sh +#set -e + if [ $# -ne 2 ]; then echo "cores_backtrace.sh <os> <directory>" exit 1 @@ -8,9 +10,22 @@ fi os=$1 directory=$2 +findargs='' case $os in freebsd|linux|macos) - ;; + ;; + + cygwin) + for stack in $(find "$directory" -type f -name "*.stackdump") ; do + binary=`basename "$stack" .stackdump` + echo;echo; + echo "dumping ${stack} for ${binary}" + awk '/^0/{print $2}' $stack |addr2line -f -i -e ./build/tmp_install/usr/local/pgsql/bin/postgres.exe + #awk '/^0/{print $2}' $stack |addr2line -f -i -e "./build/src/backend/$binary.exe" + done + exit 0 + ;; + *) echo "unsupported operating system ${os}" exit 1 @@ -48,3 +63,5 @@ for corefile in $(find "$directory" -type f) ; do gdb --batch --quiet -ex "thread apply all bt full" -ex "quit" "$binary" "$corefile" 2>/dev/null fi done + +exit 0 -- 2.25.1