Re: pg_preadv() and pg_pwritev()

2021-01-13 Thread Sergey Shinderuk

On 11.01.2021 05:59, Thomas Munro wrote:

Since only sifaka has managed to return a result so far (nice CPU), I
had plenty of time to notice that macOS Big Sur has introduced
preadv/pwritev.  They were missing on Catalina[1].

[1] https://cirrus-ci.com/task/6002157537198080


Hi, Thomas!

Indeed, pwritev is not available on macOS Catalina. So I get compiler 
warnings about that:


/Users/shinderuk/src/pgwork/devel/build/../src/port/pwrite.c:117:10: 
warning: 'pwritev' is only available on macOS 11.0 or newer 
[-Wunguarded-availability-new]

part = pg_pwritev(fd, iov, iovcnt, offset);
   ^~
/Users/shinderuk/src/pgwork/devel/build/../src/include/port/pg_iovec.h:49:20: 
note: expanded from macro 'pg_pwritev'

#define pg_pwritev pwritev
   ^~~
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include/sys/uio.h:104:9: 
note: 'pwritev' has been marked as being introduced in macOS 11.0 here, 
but the deployment target is macOS

  10.15.0
ssize_t pwritev(int, const struct iovec *, int, off_t) 
__DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0), 
watchos(7.0), tvos(14.0));

^
/Users/shinderuk/src/pgwork/devel/build/../src/port/pwrite.c:117:10: 
note: enclose 'pwritev' in a __builtin_available check to silence this 
warning

part = pg_pwritev(fd, iov, iovcnt, offset);
   ^~
/Users/shinderuk/src/pgwork/devel/build/../src/include/port/pg_iovec.h:49:20: 
note: expanded from macro 'pg_pwritev'

#define pg_pwritev pwritev
   ^~~
1 warning generated.
(... several more warnings ...)


And initdb fails:

running bootstrap script ... dyld: lazy symbol binding failed: Symbol 
not found: _pwritev

  Referenced from: /Users/shinderuk/src/pgwork/devel/install/bin/postgres
  Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _pwritev
  Referenced from: /Users/shinderuk/src/pgwork/devel/install/bin/postgres
  Expected in: /usr/lib/libSystem.B.dylib


Regards.

--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: pg_preadv() and pg_pwritev()

2021-01-13 Thread Sergey Shinderuk

On 13.01.2021 12:56, Thomas Munro wrote:

On Wed, Jan 13, 2021 at 10:40 PM Sergey Shinderuk
 wrote:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk/usr/include/sys/uio.h:104:9:
note: 'pwritev' has been marked as being introduced in macOS 11.0 here,
but the deployment target is macOS
10.15.0
ssize_t pwritev(int, const struct iovec *, int, off_t)
__DARWIN_NOCANCEL(pwritev) __API_AVAILABLE(macos(11.0), ios(14.0),
watchos(7.0), tvos(14.0));
  ^


Hrm...  So why did "configure" think you have pwritev, then?  It seems
like you must have been using different compilers or options at
configure time and compile time, no?



No, i've just rerun configure from clean checkout without any options. 
It does think that pwritev is available. I'll try to figure this out 
later and come back to you. Thanks.





Re: pg_preadv() and pg_pwritev()

2021-01-13 Thread Sergey Shinderuk

The symptoms sound consistent with using bleeding-edge Xcode on a
Catalina machine ... please report exact OS and Xcode versions.


macOS 10.15.7 (19H2)
Xcode 12.3 (12C33)
macOS SDK 11.1 (20C63)



Everything is fine if I run "configure" with
PG_SYSROOT=/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk

I noticed that "cc" invoked from command line uses:
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk

But "xcodebuild -version -sdk macosx Path" invoked by "configure" when 
PG_SYSROOT is not provided gives:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

Now I'm confused about different SDK versions and locations used by 
Xcode and CommandLineTools :)





Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 14.01.2021 18:42, Tom Lane wrote:

I noticed that "cc" invoked from command line uses:
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk


Hm, how did you determine that exactly?



% echo 'int main(void){}' >tmp.c
% cc -v tmp.c
Apple clang version 12.0.0 (clang-1200.0.32.28)
Target: x86_64-apple-darwin19.6.0
Thread model: posix
InstalledDir: 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin


"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang" 
-cc1 -triple x86_64-apple-macosx10.15.0 -Wdeprecated-objc-isa-usage 
-Werror=deprecated-objc-isa-usage -Werror=implicit-function-declaration 
-emit-obj -mrelax-all -disable-free -disable-llvm-verifier 
-discard-value-names -main-file-name tmp.c -mrelocation-model pic 
-pic-level 2 -mthread-model posix -mframe-pointer=all -fno-strict-return 
-masm-verbose -munwind-tables -target-sdk-version=10.15.6 
-fcompatibility-qualified-id-block-type-checking -target-cpu penryn 
-dwarf-column-info -debugger-tuning=lldb -target-linker-version 609.8 -v 
-resource-dir 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0 
-isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk 
-I/usr/local/include -internal-isystem 
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/local/include 
-internal-isystem 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/include 
-internal-externc-isystem 
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include 
-internal-externc-isystem 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include 
-Wno-reorder-init-list -Wno-implicit-int-float-conversion 
-Wno-c99-designator -Wno-final-dtor-non-final-class -Wno-extra-semi-stmt 
-Wno-misleading-indentation -Wno-quoted-include-in-framework-header 
-Wno-implicit-fallthrough -Wno-enum-enum-conversion 
-Wno-enum-float-conversion -fdebug-compilation-dir /Users/shinderuk 
-ferror-limit 19 -fmessage-length 238 -stack-protector 1 -fstack-check 
-mdarwin-stkchk-strong-link -fblocks -fencode-extended-block-signature 
-fregister-global-dtors-with-atexit -fgnuc-version=4.2.1 
-fobjc-runtime=macosx-10.15.0 -fmax-type-align=16 
-fdiagnostics-show-option -fcolor-diagnostics -o 
/var/folders/8x/jvqv7hyd5h98m7tz2zm9r0ycgn/T/tmp-91fb5e.o -x c tmp.c
clang -cc1 version 12.0.0 (clang-1200.0.32.28) default target 
x86_64-apple-darwin19.6.0
ignoring nonexistent directory 
"/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/local/include"
ignoring nonexistent directory 
"/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/Library/Frameworks"

#include "..." search starts here:
#include <...> search starts here:
 /usr/local/include

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/include
 /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/usr/include

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include

/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks 
(framework directory)

End of search list.

"/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" 
-demangle -lto_library 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib 
-no_deduplicate -dynamic -arch x86_64 -platform_version macos 10.15.0 
10.15.6 -syslibroot 
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk -o a.out 
-L/usr/local/lib 
/var/folders/8x/jvqv7hyd5h98m7tz2zm9r0ycgn/T/tmp-91fb5e.o -lSystem 
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.osx.a





Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 15.01.2021 01:13, Tom Lane wrote:

I borrowed my wife's Mac, which is still on Catalina and up to now
never had Xcode on it, and found some very interesting things.

Step 1: download/install Xcode 12.3, open it, agree to license,
wait for it to finish "installing components".

At this point, /Library/Developer/CommandLineTools doesn't exist,
and we have the following outputs from various probe commands:

% xcrun --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
% xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
% xcodebuild -version -sdk macosx Path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

Also, cc -v reports
   -isysroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk

Unsurprisingly, Xcode 12.3 itself only contains

% ls -l 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs
total 0
drwxr-xr-x  5 root  wheel  160 Nov 30 07:27 DriverKit20.2.sdk
drwxr-xr-x  7 root  wheel  224 Nov 30 07:27 MacOSX.sdk
lrwxr-xr-x  1 root  wheel   10 Jan 14 15:57 MacOSX11.1.sdk -> MacOSX.sdk

Step 2: install command line tools (I used "xcode-select --install"
to fire this off, rather than the Xcode menu item).

Now I have

% ls -l /Library/Developer/CommandLineTools/SDKs
total 0
lrwxr-xr-x  1 root  wheel   14 Jan 14 16:42 MacOSX.sdk -> MacOSX11.1.sdk
drwxr-xr-x  8 root  wheel  256 Jul  9  2020 MacOSX10.15.sdk
drwxr-xr-x  7 root  wheel  224 Nov 30 07:33 MacOSX11.1.sdk

which is pretty interesting in itself, because the same directory on
my recently-updated-to-Big-Sur Macs does NOT have the 11.1 SDK.
I wonder what determines which versions get installed here.

More interesting yet:

% xcrun --show-sdk-path
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
% xcrun --sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
% xcodebuild -version -sdk macosx Path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk

and cc -v reports
   -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk

So apparently, "xcrun --show-sdk-path" (without any -sdk option)
is the most authoritative guide to the compiler's default sysroot.

However, googling turns up various people reporting that "xcrun
--show-sdk-path" returns an empty string for them, and our last
major investigation into this [1] found that there are some system
states where the compiler appears to have no default sysroot,
which I bet is the same thing.  I do not at this point have a recipe
to reproduce such a state, but we'd be fools to imagine it's no
longer possible.  My guess about it is that Apple's processes for
updating the default sysroot during system updates are just plain
buggy, with various corner cases that have ill-understood causes.

Also, after re-reading [1] I am not at all excited about trying to
remove the -isysroot switches from our *FLAGS.  What I propose to do
is keep that, but improve our mechanism for choosing a default value
for PG_SYSROOT.  It looks like first trying "xcrun --show-sdk-path",
and falling back to "xcodebuild -version -sdk macosx Path" if that
doesn't yield a valid path, is more likely to give a working build
than relying entirely on xcodebuild.  Maybe there's a case for trying
"xcrun --sdk macosx --show-sdk-path" in between; in my tests that
seemed noticeably faster than invoking xcodebuild, and I've not yet
seen a case where it gave a different answer.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/20840.1537850987%40sss.pgh.pa.us



Thanks for thorough investigation and sorry for the late reply.

I spent quite some time trying to understand / reverse engineer the 
logic behind xcrun's default SDK selection. Apparently, "man xcrun" is 
not accurate saying:


	The  SDK  which  will  be searched defaults to the most recent 
available...


I didn't find anything really useful or helpful. 
"/Library/Developer/CommandLineTools" is hardcoded into 
"libxcrun.dylib". On my machine xcrun scans


/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs

and

/Library/Developer/CommandLineTools/SDKs

in that order, and loads "SDKSettings.plist" from each subdirectory. I 
looked into plists, but couldn't find anything special about 
"MacOSX10.15.sdk".



Okay, here is what I have:

% ls -l 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs

total 0
drwxr-xr-x  5 root  wheel  160 Nov 30 15:27 DriverKit20.2.sdk
drwxr-xr-x  7 root  wheel  224 Nov 30 15:27 MacOSX.sdk
lrwxr-xr-x  1 root  wheel   10 Dec 17 14:25 MacOSX11.1.sdk -> MacOSX.sdk

% ls -l /Library/Developer/CommandLineTools/SDKs
total 0
lrwxr-xr-x  1 root  wheel   14 

Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 14.01.2021 21:05, Tom Lane wrote:

After considerable playing around, I'm guessing that the reason
-no_weak_imports doesn't help is that it rejects calls that are
marked as weak references on the *calling* side.  Since AC_CHECK_FUNCS
doesn't bother to #include the relevant header file, the compiler
doesn't know that preadv() ought to be marked as a weak reference.
Then, when the test program gets linked against the stub libc that's
provided by the SDK, there is a version of preadv() there so no link
failure occurs.  (There are way more moving parts in this weak-reference
thing than I'd realized.)



Oh, that's interesting. I've just played with it a bit and it looks 
exactly as you say.



Another thing I've been realizing while poking at this is that we
might not need to set -isysroot explicitly at all, which would then
lead to the compiler using its default sysroot automatically.
In some experimentation, it seems like what we need PG_SYSROOT for
is just for configure to be able to find tclConfig.sh and the Perl
header files.  So at this point I'm tempted to try ripping that
out altogether.  If you remove the lines in src/template/darwin
that inject PG_SYSROOT into CPPFLAGS and LDFLAGS, do things
work for you?


Yes, it works fine.




Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 15.01.2021 01:13, Tom Lane wrote:


than relying entirely on xcodebuild.  Maybe there's a case for trying
"xcrun --sdk macosx --show-sdk-path" in between; in my tests that
seemed noticeably faster than invoking xcodebuild, and I've not yet
seen a case where it gave a different answer.



I see that "xcrun --sdk macosx --show-sdk-path" really calls
"/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -sdk 
macosx -version Path" under the hood.



% lldb -- xcrun --no-cache --sdk macosx --show-sdk-path
(lldb) target create "xcrun"
Current executable set to 'xcrun' (x86_64).
(lldb) settings set -- target.run-args  "--no-cache" "--sdk" "macosx" 
"--show-sdk-path"

(lldb) settings set target.unset-env-vars SDKROOT
(lldb) b popen
Breakpoint 1: where = libsystem_c.dylib`popen, address = 0x7fff67265607
(lldb) r
Process 26857 launched: '/usr/bin/xcrun' (x86_64)
Process 26857 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x7fff6e313607 libsystem_c.dylib`popen
libsystem_c.dylib`popen:
->  0x7fff6e313607 <+0>: pushq  %rbp
0x7fff6e313608 <+1>: movq   %rsp, %rbp
0x7fff6e31360b <+4>: pushq  %r15
0x7fff6e31360d <+6>: pushq  %r14
Target 0: (xcrun) stopped.
(lldb) p (char *)$arg1
(char *) $1 = 0x000100406960 
"/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -sdk 
macosx -version Path"



% sudo dtrace -n 'pid$target::popen:entry { trace(copyinstr(arg0)) }' -c 
'xcrun --sdk macosx --show-sdk-path'

dtrace: description 'pid$target::popen:entry ' matched 1 probe
CPU IDFUNCTION:NAME
  0 413269  popen:entry 
/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -sdk 
macosx -version Path

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
dtrace: pid 26905 has exited




Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 15.01.2021 04:53, Sergey Shinderuk wrote:


I see that "xcrun --sdk macosx --show-sdk-path" really calls
"/Applications/Xcode.app/Contents/Developer/usr/bin/xcodebuild -sdk 
macosx -version Path" under the hood.




... and caches the result. xcodebuild not called without --no-cache.
So it still make sense to fall back on xcodebuild.

% sudo dtrace -n 'pid$target::popen:entry { trace(copyinstr(arg0)) }' -c 
'xcrun --sdk macosx --show-sdk-path'

dtrace: description 'pid$target::popen:entry ' matched 1 probe
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX11.1.sdk
dtrace: pid 26981 has exited




Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 15.01.2021 05:04, Tom Lane wrote:


on her machine there's no detail at all:

% xcrun --verbose --no-cache --show-sdk-path
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk



The same on my machine. I get details for --find, but not for 
--show-sdk-path.




So I'm not sure what to make of that.  But I'm hesitant to assume
that xcrun is just a wrapper around xcodebuild.



I agree.




Re: pg_preadv() and pg_pwritev()

2021-01-14 Thread Sergey Shinderuk

On 15.01.2021 04:45, Tom Lane wrote:

Hence, I propose the attached.  This works as far as I can tell
to fix the problem you're seeing.

Yes, it works fine. Thank you very much.




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-21 Thread Sergey Shinderuk

On 21.01.2021 02:07, Tom Lane wrote:

I now believe what is actually happening with the short command is
that it's iterating through the available SDKs (according to some not
very clear search path) and picking the first one it finds that
matches the host system version.  That matches the ktrace evidence
that shows it reading the SDKSettings.plist file in each SDK
directory.


Yes, you are right. After some more digging...

It searches the DEVELOPER_DIR first and then 
/Library/Developer/CommandLineTools, which is hardcoded.


My DEVELOPER_DIR is
% xcode-select -p
/Applications/Xcode.app/Contents/Developer

(For more detail try "otool -tV /usr/lib/libxcselect.dylib -p 
_xcselect_get_developer_dir_path".)


It reads ProductVersion from 
/System/Library/CoreServices/SystemVersion.plist


% plutil -p /System/Library/CoreServices/SystemVersion.plist | grep 
ProductVersion

  "ProductVersion" => "10.15.7"

Strips anything after the second dot, and prepends "macosx" to it, which 
gives "macosx10.15".


Then it scans through SDK dirs looking up CanonicalName from 
SDKSettings.plist until it finds a match with "macosx10.15".



The overall callstack:

% sudo dtrace -n 'syscall::getdirentries64:entry { ustack() }' -c 'xcrun 
--show-sdk-path'

dtrace: description 'syscall::getdirentries64:entry ' matched 1 probe
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
dtrace: pid 20183 has exited
CPU IDFUNCTION:NAME
  0846getdirentries64:entry
  libsystem_kernel.dylib`__getdirentries64+0xa
  libsystem_c.dylib`readdir$INODE64+0x23
  libsystem_c.dylib`scandir$INODE64+0x6c
  libxcrun.dylib`cltools_lookup_sdk_by_key+0x5f
  libxcrun.dylib`cltools_lookup_boot_system_sdk+0xda
  libxcrun.dylib`xcinfocache_resolve_sdkroot+0xc0
  libxcrun.dylib`xcrun_main2+0x57a
  libxcrun.dylib`xcrun_main+0x9
  libxcselect.dylib`xcselect_invoke_xcrun_via_library+0xc8
  libxcselect.dylib`xcselect_invoke_xcrun+0x25a
  xcrun`DYLD-STUB$$getprogname
  libdyld.dylib`start+0x1
  xcrun`0x2

  0846getdirentries64:entry
  libsystem_kernel.dylib`__getdirentries64+0xa
  libsystem_c.dylib`readdir$INODE64+0x23
  libsystem_c.dylib`scandir$INODE64+0x6c
  libxcrun.dylib`cltools_lookup_sdk_by_key+0x5f
  libxcrun.dylib`cltools_lookup_boot_system_sdk+0xf3
  libxcrun.dylib`xcinfocache_resolve_sdkroot+0xc0
  libxcrun.dylib`xcrun_main2+0x57a
  libxcrun.dylib`xcrun_main+0x9
  libxcselect.dylib`xcselect_invoke_xcrun_via_library+0xc8
  libxcselect.dylib`xcselect_invoke_xcrun+0x25a
  xcrun`DYLD-STUB$$getprogname
  libdyld.dylib`start+0x1
  xcrun`0x2


The SDK search path:

% sudo dtrace -n 'pid$target:::entry 
/probefunc=="cltools_lookup_sdk_by_key"/ { trace(copyinstr(arg0)); 
trace(copyinstr(arg1)) }' -c 'xcrun --show-sdk-path'

dtrace: description 'pid$target:::entry ' matched 17293 probes
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
dtrace: pid 20191 has exited
CPU IDFUNCTION:NAME
  8 398290  cltools_lookup_sdk_by_key:entry 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer 
 macosx10.15
  9 398290  cltools_lookup_sdk_by_key:entry 
/Library/Developer/CommandLineTools  macosx10.15



The properties read from SDKSettings.plist:

% sudo dtrace -n 'pid$target:::entry 
/probefunc=="_cltools_lookup_property_in_path"/ { 
trace(copyinstr(arg0)); trace(copyinstr(arg1)); trace(copyinstr(arg2)) 
}' -c 'xcrun --show-sdk-path'

dtrace: description 'pid$target:::entry ' matched 17293 probes
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
dtrace: pid 20195 has exited
CPU IDFUNCTION:NAME
  8 398288 _cltools_lookup_property_in_path:entry   / 
System/Library/CoreServices/SystemVersion.plist 
ProductVersion
  8 398288 _cltools_lookup_property_in_path:entry 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/DriverKit20.2.sdk 
 SDKSettings.plist  IsBaseSDK
  8 398288 _cltools_lookup_property_in_path:entry 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/DriverKit20.2.sdk 
 SDKSettings.plist  CanonicalName
  4 398288 _cltools_lookup_property_in_path:entry 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/DriverKit20.2.sdk 
 SDKSettings.plist  CanonicalNameForBuildSettings
  4 398288 _cltools_lookup_property_in_path:entry 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk 
 SDKSettings.plist  IsBaseSDK
  4 398288 _cltools_lookup_property_in_path:entry 
/Applications/Xcode.app/Contents/Developer/Platforms

Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-21 Thread Sergey Shinderuk

On 22.01.2021 01:17, James Hilliard wrote:

On Thu, Jan 21, 2021 at 11:38 AM Tom Lane  wrote:


James Hilliard  writes:

On Wed, Jan 20, 2021 at 4:07 PM Tom Lane  wrote:

I'm not sure that the case of not having the "command line tools"
installed is interesting for our purposes.  AFAIK you have to have
that in order to have access to required tools like bison and gmake.
(That reminds me, I was intending to add something to our docs
about how-to-build-from-source to say that you need to install those.)



Yeah, not 100% sure but I was able to build just fine after deleting my
command line tools.


Hm.  I've never been totally clear on what's included in the "command line
tools", although it's now apparent that one thing that gets installed is
an SDK matching the host OS version.  However, Apple's description at [1]
says

 Command Line Tools

 Download the macOS SDK, headers, and build tools such as the Apple
 LLVM compiler and Make. These tools make it easy to install open
 source software or develop on UNIX within Terminal. macOS can
 automatically download these tools the first time you try to build
 software, and they are available on the downloads page.

which certainly strongly implies that gmake is not there otherwise.
At this point I lack any "bare" macOS system to check it on.  I wonder
whether you have a copy of make available from MacPorts or Homebrew.
Or maybe uninstalling the command line tools doesn't really remove
everything?

Yeah, not entirely sure there but I do use homebrew.



FWIW, I tested with a clean install of Catalina. Before I install 
anything at all, I already have xcode-select, xcrun and all the shims in 
/usr/bin for developer tools, including cc, make, git, xcodebuild... 
Just about everything listed in the FILES section of "man xcode-select".


When I run any tool (except xcode-select), a GUI dialog pops up offering 
to install the Command Line Tools. So apparently those shims are not 
functional yet. I rejected the installation.


Instead I downloaded Xcode12.1.xip via [1], the latest version with 
macosx10.15 SDK. I unpacked it and installed by dragging Xcode.app to 
/Applications. It seems to me there is no magic behind the scenes, just 
moving the directory. I selectively checked that the shims in /usr/bin 
didn't change after that.


Now, running "cc" tells me that I have to accept the Xcode license 
agreement. After accepting it, all the shims in /usr/bin start to work, 
forwarding to the real tools inside Xcode.app.


If I run the Homebrew installer, it says that it's going to install the 
Command Line Tools. I don't know why it needs them, all the tools are 
there already. I thought that CLT is a lighter-weight option when you 
don't want the full Xcode installation, but Homebrew requires them anyway.


I rejected to install CLT and abandoned Homebrew. Then I just cloned and 
built Postgres successfully. So it looks like Xcode is really enough, at 
least on a recent macOS version.



[1] https://xcodereleases.com

--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-22 Thread Sergey Shinderuk

On 22.01.2021 20:12, Tom Lane wrote:

[ pokes more carefully... ]  Ah-hah, I see why I needed the CLT.
I bet you'll find that you can't build from "git clean -dfx" state
with only Xcode, because comparing the contents of
/Applications/Xcode.app/Contents/Developer/usr/bin and
/Library/Developer/CommandLineTools/usr/bin on my own Mac,
I observe that only the CLT provides bison and flex.  I also see
install_name_tool only in the CLT; we don't depend on that today,
but may soon (see the latest thread about coping with SIP).



I did git clone from scratch. Xcode really has all the tools.

configure:9519: checking for bison
configure:9537: found /usr/bin/bison
configure:9549: result: /usr/bin/bison
configure:9571: using bison (GNU Bison) 2.3
configure:9609: checking for flex
configure:9654: result: /usr/bin/flex
configure:9674: using flex 2.5.35 Apple(flex-32)

% xcrun --find bison
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/bison

% xcrun --find install_name_tool
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/install_name_tool



On the whole it looks like we should recommend installing the CLT
and not bothering with Xcode, which is about 10X the size:

$ du -hs /Library/Developer/CommandLineTools
1.1G/Library/Developer/CommandLineTools
$ du -hs /Applications/Xcode.app
  15G/Applications/Xcode.app



Fair.



BTW, reading [1] I see

You can install Xcode, the CLT, or both; Homebrew supports all three
configurations.

So I'm not sure why you got that prompt, unless you were using a formula
that knew you were going to need bison.

[1] https://docs.brew.sh/Installation#3


Apparently, this documentation is wrong. I’m not installing any 
particular formula, just running the Homebrew installer script.


% /bin/bash -c "$(curl -fsSL 
https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Password:
==> This script will install:
[...]
==> The following new directories will be created:
[...]
==> The Xcode Command Line Tools will be installed.

Press RETURN to continue or any other key to abort

==> Installing Command Line Tools for Xcode-12.3
==> /usr/bin/sudo /usr/sbin/softwareupdate -i Command\ Line\ Tools\ for\ 
Xcode-12.3

Software Update Tool

Downloading Command Line Tools for Xcode
[...]

I checked the script [1], and it really requires the CLT. Here is the 
explanation [2] for this:


There is actually no such requirement. However, there are
formulae that will be forced to build from source if you do not
have the CLT. They can still be built from source with Xcode
only, but because the pre-built bottles are compiled in an
environment that has both Xcode and the CLT installed, there are
some cases where the bottles end up having a hard dependency on
the CLT. A major example is gcc. So installing the CLT may help
you avoid some lengthy source builds.

We ensure that all Homebrew formulae can be built with Xcode.app
alone. Most formulae can be built with just the CLT, and those
that require the full Xcode.app have an explicit depends_on
:xcode => :build. Some users would prefer to use only the CLT
because it's a much smaller download and takes less time to
install and upgrade than Xcode.


[1] https://github.com/Homebrew/install/blob/master/install.sh#L191
[2] https://github.com/Homebrew/brew/issues/1613


Regards.

--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-22 Thread Sergey Shinderuk

On 23.01.2021 08:02, Sergey Shinderuk wrote:
I checked the script [1], and it really requires the CLT. Here is the 
explanation [2] for this:


 There is actually no such requirement. However, there are
 formulae that will be forced to build from source if you do not
 have the CLT. They can still be built from source with Xcode
 only, but because the pre-built bottles are compiled in an
 environment that has both Xcode and the CLT installed, there are
 some cases where the bottles end up having a hard dependency on
 the CLT. A major example is gcc. So installing the CLT may help
 you avoid some lengthy source builds.

 We ensure that all Homebrew formulae can be built with Xcode.app
 alone. Most formulae can be built with just the CLT, and those
 that require the full Xcode.app have an explicit depends_on
 :xcode => :build. Some users would prefer to use only the CLT
 because it's a much smaller download and takes less time to
 install and upgrade than Xcode.



In the gcc formula [1]:

  # The bottles are built on systems with the CLT installed, and do not 
work

  # out of the box on Xcode-only systems due to an incorrect sysroot.
  pour_bottle? do
reason "The bottle needs the Xcode CLT to be installed."
satisfy { MacOS::CLT.installed? }
  end


I guess this is the "xcrun --show-sdk-path" thing we've alredy disccussed :)


[1] https://github.com/Homebrew/homebrew-core/blob/master/Formula/gcc.rb#L36

--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [PATCH 1/1] Fix detection of pwritev support for OSX.

2021-01-23 Thread Sergey Shinderuk

On 23.01.2021 08:02, Sergey Shinderuk wrote:

On the whole it looks like we should recommend installing the CLT
and not bothering with Xcode, which is about 10X the size:

$ du -hs /Library/Developer/CommandLineTools
1.1G    /Library/Developer/CommandLineTools
$ du -hs /Applications/Xcode.app
  15G    /Applications/Xcode.app



Fair.


BTW, Homebrew prefers the CLT SDK:
https://github.com/Homebrew/brew/blob/master/Library/Homebrew/os/mac.rb#L138

  # Prefer CLT SDK when both Xcode and the CLT are installed.
  # Expected results:
  # 1. On Xcode-only systems, return the Xcode SDK.
  # 2. On Xcode-and-CLT systems where headers are provided by the 
system, return nil.

  # 3. On CLT-only systems with no CLT SDK, return nil.
  # 4. On CLT-only systems with a CLT SDK, where headers are 
provided by the system, return nil.
  # 5. On CLT-only systems with a CLT SDK, where headers are not 
provided by the system, return the CLT SDK.



Here is the relevant discussion:
https://github.com/Homebrew/brew/pull/7134

I like the example of Git compiled against the wrong 
LIBCURL_VERSION_NUM. Clearly, there are other issues with 
cross-compiling to a newer SDK, besides autoconf probes and weak imports.


--
Sergey Shinderuk
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Improve error handling of HMAC computations and SCRAM

2022-01-10 Thread Sergey Shinderuk

Hi,

On 11.01.2022 07:56, Michael Paquier wrote:
> Thoughts?

A few comments after a quick glance...

+ * Returns a static string providing errors about an error that happened

"errors about an error" looks odd.


+static const char *
+SSLerrmessage(unsigned long ecode)
+{
+   if (ecode == 0)
+   return NULL;
+
+   /*
+* This may return NULL, but we would fall back to a default error path 
if
+* that were the case.
+*/
+   return ERR_reason_error_string(ecode);
+}

We already have SSLerrmessage elsewhere and it's documented to never 
return NULL.  I find that confusing.


If I have two distinct pg_hmac_ctx's, are their errreason's idependent 
from one another or do they really point to the same static buffer?



Regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: Improve error handling of HMAC computations and SCRAM

2022-01-11 Thread Sergey Shinderuk

On 11.01.2022 10:57, Michael Paquier wrote:

On Tue, Jan 11, 2022 at 10:50:50AM +0300, Sergey Shinderuk wrote:

+ * Returns a static string providing errors about an error that happened

"errors about an error" looks odd.


Sure, that could be reworded.  What about "providing details about an
error"?


Yeah, that's better.  I thought "providing errors about an error" was a 
typo, but now I see the same comment was committed in b69aba745.  Is it 
just me? :)


Thanks,

--
Sergey Shinderukhttps://postgrespro.com/




Re: Improve error handling of HMAC computations and SCRAM

2022-01-12 Thread Sergey Shinderuk

On 12.01.2022 14:32, Michael Paquier wrote:

On Wed, Jan 12, 2022 at 12:56:17PM +0900, Michael Paquier wrote:

Attached is a rebased patch for the HMAC portions, with a couple of
fixes I noticed while going through this stuff again (mostly around
SASLprep and pg_fe_scram_build_secret), and a fix for a conflict
coming from 9cb5518.  psql's \password is wrong to assume that the
only error that can happen for scran-sha-256 is an OOM, but we'll get
there.


With an attachment, that's even better.  (Thanks, Daniel.)
Gave it a thorough read.  Looks good, except for errstr not set in a 
couple of places (see the diff attached).


Didn't test it.

--
Sergey Shinderukhttps://postgrespro.com/diff --git a/src/common/hmac.c b/src/common/hmac.c
index 592f9b20a38..a27778e86b3 100644
--- a/src/common/hmac.c
+++ b/src/common/hmac.c
@@ -46,9 +46,7 @@ typedef enum pg_hmac_errno
PG_HMAC_ERROR_INTERNAL
 } pg_hmac_errno;
 
-/*
- * Internal structure for pg_hmac_ctx->data with this implementation.
- */
+/* Internal pg_hmac_ctx structure */
 struct pg_hmac_ctx
 {
pg_cryptohash_ctx *hash;
diff --git a/src/common/hmac_openssl.c b/src/common/hmac_openssl.c
index c352f9db9e9..44f36d51dcb 100644
--- a/src/common/hmac_openssl.c
+++ b/src/common/hmac_openssl.c
@@ -60,9 +60,7 @@ typedef enum pg_hmac_errno
PG_HMAC_ERROR_OPENSSL
 } pg_hmac_errno;
 
-/*
- * Internal structure for pg_hmac_ctx->data with this implementation.
- */
+/* Internal pg_hmac_ctx structure */
 struct pg_hmac_ctx
 {
HMAC_CTX   *hmacctx;
diff --git a/src/common/scram-common.c b/src/common/scram-common.c
index 5f90397c66d..8896b1e73e4 100644
--- a/src/common/scram-common.c
+++ b/src/common/scram-common.c
@@ -44,7 +44,10 @@ scram_SaltedPassword(const char *password,
pg_hmac_ctx *hmac_ctx = pg_hmac_create(PG_SHA256);
 
if (hmac_ctx == NULL)
+   {
+   *errstr = pg_hmac_error(NULL);  /* returns OOM */
return -1;
+   }
 
/*
 * Iterate hash calculation of HMAC entry using given salt.  This is
@@ -126,7 +129,10 @@ scram_ClientKey(const uint8 *salted_password, uint8 
*result,
pg_hmac_ctx *ctx = pg_hmac_create(PG_SHA256);
 
if (ctx == NULL)
+   {
+   *errstr = pg_hmac_error(NULL);  /* returns OOM */
return -1;
+   }
 
if (pg_hmac_init(ctx, salted_password, SCRAM_KEY_LEN) < 0 ||
pg_hmac_update(ctx, (uint8 *) "Client Key", strlen("Client 
Key")) < 0 ||


Re: Improve error handling of HMAC computations and SCRAM

2022-01-13 Thread Sergey Shinderuk

On 13.01.2022 10:24, Michael Paquier wrote:

Thanks for the review.  The comments about pg_hmac_ctx->data were
wrong from the beginning, coming, I guess, from one of the earlier
patch versions where this was discussed.  So I have applied that
independently.

I have also spent a good amount of time on that to close the loop and
make sure that no code paths are missing an error context, adjusted a
couple of comments to explain more the role of *errstr in all the
SCRAM routines, and finally applied it on HEAD.

Thanks!

--
Sergey Shinderukhttps://postgrespro.com/




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 02.12.2021 22:31, Tom Lane wrote:

I'll push the close-only change in a little bit.


Unexpectedly, this changes the error message:

postgres=# set idle_session_timeout = '1s';
SET
postgres=# select 1;
	could not receive data from server: Software caused connection abort 
(0x2745/10053)

The connection to the server was lost. Succeeded.
postgres=#

Without shutdown/closesocket it would most likely be:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

When the timeout expires, the server sends the error message and 
gracefully closes the connection by sending a FIN.  Later, psql sends 
another query to the server, and the server responds with a RST.  But 
now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).


Without shutdown/closesocket, after the timeout expires, the server 
sends the error message, the client sends an ACK, and the server 
responds with a RST.  Then psql tries to sends the next query, but 
nothing is sent at the TCP level, and the next recv() returns WSAECONNRESET.


IIUIC, in both cases we may or may not recv() the error message from the 
server depending on how fast the RST arrives from the server.


Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?

--
Sergey Shinderukhttps://postgrespro.com/




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 14.01.2022 13:01, Sergey Shinderuk wrote:
When the timeout expires, the server sends the error message and 
gracefully closes the connection by sending a FIN.  Later, psql sends 
another query to the server, and the server responds with a RST.  But 
now recv() returns WSAECONNABORTED(10053) instead of WSAECONNRESET(10054).


On the other hand, I cannot reproduce this behavior with a remote server 
even if pause psql just before the recv() call to let the RST win the race.


So I get:

postgres=# set idle_session_timeout = '1s';
recv() returned 15 errno 0
SET
recv() returned -1 errno 10035 (WSAEWOULDBLOCK)
postgres=# select 1;
recv() returned 116 errno 0
recv() returned 0 errno 0
recv() returned 0 errno 0
FATAL:  terminating connection due to idle-session timeout
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

recv() signals EOF like on Unix.

Here I connected from a Windows virtual machine to the macOS host, but 
the Wireshark dump looks the same (there is a RST) as for a localhost 
connection.


Is this "error-eating" behavior of RST on Windows specific only to 
localhost connections?


--
Sergey Shinderukhttps://postgrespro.com/




Re: Windows: Wrong error message at connection termination

2022-01-14 Thread Sergey Shinderuk

On 14.01.2022 13:01, Sergey Shinderuk wrote:

Unexpectedly, this changes the error message:

 postgres=# set idle_session_timeout = '1s';
 SET
 postgres=# select 1;
 could not receive data from server: Software caused connection 
abort (0x2745/10053)


For the record, after more poking I realized that it depends on timing. 
 By injecting delays I can get any of the following from libpq:


* could not receive data from server: Software caused connection abort
* server closed the connection unexpectedly
* no connection to the server



Should we handle ECONNABORTED similarly to ECONNRESET in pqsecure_raw_read?


So this doesn't make sense anymore.

Sorry for the noise.

--
Sergey Shinderukhttps://postgrespro.com/




Re: Bug in DefineRange() with multiranges

2021-10-11 Thread Sergey Shinderuk

On 10.10.2021 20:12, Peter Eisentraut wrote:

On 04.10.21 19:09, Sergey Shinderuk wrote:

I wonder what is the proper fix.  Just drop pfree() altogether or add
pstrdup() instead?  I see that makeMultirangeTypeName() doesn't bother
freeing its buf.


I think removing the pfree()s is a correct fix.



Thanks, here is a patch.


--
Sergey Shinderukhttps://postgrespro.com/
From 6c548f07d2a254da46cd0b6f6e99b7ed24a6b811 Mon Sep 17 00:00:00 2001
From: Sergey Shinderuk 
Date: Tue, 12 Oct 2021 07:57:42 +0300
Subject: [PATCH] Fix premature pfree() of multirange_type_name in
 DefineRange()

If the mutlirange_type_name parameter is given in the query, this would
erroneously pfree() the string in the parse tree.  Oversight in 6df7a9698bb0.
---
 src/backend/commands/typecmds.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index b290629a450..9ab40341793 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -1707,7 +1707,6 @@ DefineRange(ParseState *pstate, CreateRangeStmt *stmt)
/* Create cast from the range type to its multirange type */
CastCreate(typoid, multirangeOid, castFuncOid, 'e', 'f', 
DEPENDENCY_INTERNAL);
 
-   pfree(multirangeTypeName);
pfree(multirangeArrayName);
 
return address;
-- 
2.24.3 (Apple Git-128)



Re: Bug in DefineRange() with multiranges

2021-10-13 Thread Sergey Shinderuk

On 13.10.2021 07:21, Michael Paquier wrote:

Looks fine seen from here, so I'll apply shortly.


Thank you!

--
Sergey Shinderukhttps://postgrespro.com/




Re: [RFC] building postgres with meson

2021-10-14 Thread Sergey Shinderuk

Hi,

On 14.10.2021 23:54, John Naylor wrote:
On Thu, Oct 14, 2021 at 4:34 PM Andres Freund <mailto:and...@anarazel.de>> wrote:


 > Is this a Mac with SIP enabled? The Mac CI presumably has that 
disabled, which is why I didn't see this issue there. Probably need to 
implement whatever Tom figured out to do about that for the current way 
of running tests.


System Information says it's disabled. Running "csrutil status" 
complains of an unsupported configuration, which doesn't sound good, so 
I should probably go fix that independent of anything else. :-/



Maybe you could check that DYLD_LIBRARY_PATH is working for you?

% DYLD_FALLBACK_LIBRARY_PATH= 
DYLD_LIBRARY_PATH=./tmp_install/usr/local/lib 
./tmp_install/usr/local/bin/psql --version

psql (PostgreSQL) 15devel


Without DYLD_LIBRARY_PATH I get the error, as expected:

% DYLD_FALLBACK_LIBRARY_PATH= ./tmp_install/usr/local/bin/psql --version
dyld: Library not loaded: /usr/local/lib/libpq.5.dylib
  Referenced from: 
/Users/shinderuk/src/postgres-meson/build/./tmp_install/usr/local/bin/psql

  Reason: image not found


I add "DYLD_FALLBACK_LIBRARY_PATH=" because otherwise dyld falls back to 
/usr/lib/libpq.5.dylib provided by Apple (I am testing on Catalina).


% DYLD_PRINT_LIBRARIES=1 ./tmp_install/usr/local/bin/psql --version 2>&1 
| grep libpq

dyld: loaded: <4EDF735E-2104-32AD-BE7B-B400ABFCF57C> /usr/lib/libpq.5.dylib


Regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: Fix error handling in be_tls_open_server()

2023-09-19 Thread Sergey Shinderuk

On 19.09.2023 03:54, Michael Paquier wrote:

One doubt that I have is if we shouldn't let X509_NAME_print_ex() be
as it is now, and not force a failure on the bio if this calls fails.



If malloc fails inside X509_NAME_print_ex, then we will be left with 
empty port->peer_dn. Here is a gdb session showing this:


(gdb) b X509_NAME_print_ex
Breakpoint 1 at 0x7f539f6c0cf0
(gdb) c
Continuing.

Breakpoint 1, 0x7f539f6c0cf0 in X509_NAME_print_ex () from 
/lib/x86_64-linux-gnu/libcrypto.so.3

(gdb) bt
#0  0x7f539f6c0cf0 in X509_NAME_print_ex () from 
/lib/x86_64-linux-gnu/libcrypto.so.3
#1  0x56026d2fbe8d in be_tls_open_server 
(port=port@entry=0x56026ed5d730) at be-secure-openssl.c:635
#2  0x56026d2eefa5 in secure_open_server 
(port=port@entry=0x56026ed5d730) at be-secure.c:118
#3  0x56026d3dc412 in ProcessStartupPacket 
(port=port@entry=0x56026ed5d730, ssl_done=ssl_done@entry=false, 
gss_done=gss_done@entry=false) at postmaster.c:2065
#4  0x56026d3dcd8e in BackendInitialize 
(port=port@entry=0x56026ed5d730) at postmaster.c:4377
#5  0x56026d3def6a in BackendStartup 
(port=port@entry=0x56026ed5d730) at postmaster.c:4155

#6  0x56026d3df115 in ServerLoop () at postmaster.c:1781
#7  0x56026d3e0645 in PostmasterMain (argc=argc@entry=20, 
argv=argv@entry=0x56026ec5a0d0) at postmaster.c:1465

#8  0x56026d2fcd7c in main (argc=20, argv=0x56026ec5a0d0) at main.c:198
(gdb) b malloc
Breakpoint 2 at 0x7f539eca5120: file ./malloc/malloc.c, line 3287.
(gdb) c
Continuing.

Breakpoint 2, __GI___libc_malloc (bytes=4) at ./malloc/malloc.c:3287
3287./malloc/malloc.c: No such file or directory.
(gdb) bt
#0  __GI___libc_malloc (bytes=4) at ./malloc/malloc.c:3287
#1  0x7f539f6f6e09 in BUF_MEM_grow_clean () from 
/lib/x86_64-linux-gnu/libcrypto.so.3

#2  0x7f539f6e4fb8 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#3  0x7f539f6d22fb in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#4  0x7f539f6d5c06 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#5  0x7f539f6d5d37 in BIO_write () from 
/lib/x86_64-linux-gnu/libcrypto.so.3

#6  0x7f539f6bdb41 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#7  0x7f539f6c0b7d in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.3
#8  0x56026d2fbe8d in be_tls_open_server 
(port=port@entry=0x56026ed5d730) at be-secure-openssl.c:635
#9  0x56026d2eefa5 in secure_open_server 
(port=port@entry=0x56026ed5d730) at be-secure.c:118
#10 0x56026d3dc412 in ProcessStartupPacket 
(port=port@entry=0x56026ed5d730, ssl_done=ssl_done@entry=false, 
gss_done=gss_done@entry=false) at postmaster.c:2065
#11 0x56026d3dcd8e in BackendInitialize 
(port=port@entry=0x56026ed5d730) at postmaster.c:4377
#12 0x56026d3def6a in BackendStartup 
(port=port@entry=0x56026ed5d730) at postmaster.c:4155

#13 0x56026d3df115 in ServerLoop () at postmaster.c:1781
#14 0x56026d3e0645 in PostmasterMain (argc=argc@entry=20, 
argv=argv@entry=0x56026ec5a0d0) at postmaster.c:1465

#15 0x56026d2fcd7c in main (argc=20, argv=0x56026ec5a0d0) at main.c:198
(gdb) return 0
Make __GI___libc_malloc return now? (y or n) y
#0  0x7f539f6f6e09 in BUF_MEM_grow_clean () from 
/lib/x86_64-linux-gnu/libcrypto.so.3

(gdb) delete
Delete all breakpoints? (y or n) y
(gdb) c
Continuing.

And in the server log we see:

DEBUG:  SSL connection from DN:"" CN:"ssltestuser"

While in the normal case we get:

DEBUG:  SSL connection from DN:"CN=ssltestuser" CN:"ssltestuser"\

Probably we shouldn't ignore the error from X509_NAME_print_ex?


--
Sergey Shinderukhttps://postgrespro.com/





Re: Fix error handling in be_tls_open_server()

2023-09-20 Thread Sergey Shinderuk

On 20.09.2023 11:42, Daniel Gustafsson wrote:

Attached is a v2 on top of HEAD with commit message etc, which I propose to
backpatch to v15 where it was introduced.


There is a typo: upon en error. Otherwise, looks good to me. Thank you.

--
Sergey Shinderukhttps://postgrespro.com/





Re: Schema variables - new implementation for Postgres 15

2022-11-13 Thread Sergey Shinderuk

On 13.11.2022 20:59, Pavel Stehule wrote:

fresh rebase


Hello,

Sorry, I haven't been following this thread, but I'd like to report a 
memory management bug. I couldn't apply the latest patches, so I tested 
with v20221104-1-* patches applied atop of commit b0284bfb1db.



postgres=# create variable s text default 'abc';

create function f() returns text as $$
begin
return g(s);
end;
$$ language plpgsql;

create function g(t text) returns text as $$
begin
let s = 'BOOM!';
return t;
end;
$$ language plpgsql;

select f();
CREATE VARIABLE
CREATE FUNCTION
CREATE FUNCTION
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

LOG:  server process (PID 55307) was terminated by signal 11: 
Segmentation fault

DETAIL:  Failed process was running: select f();


I believe it's a use-after-free error, triggered by assigning a new 
value to s in g(), thus making t a dangling pointer.


After reconnecting I get a scary error:

postgres=# select f();
ERROR:  compressed pglz data is corrupt


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/





Bug in row_number() optimization

2022-11-15 Thread Sergey Shinderuk

Hello,

We've accidentally found a subtle bug introduced by

commit 9d9c02ccd1aea8e9131d8f4edb21bf1687e40782
Author: David Rowley
Date:   Fri Apr 8 10:34:36 2022 +1200

Teach planner and executor about monotonic window funcs


On a 32-bit system Valgrind reports use-after-free when running the 
"window" test:


==35487== Invalid read of size 4
==35487==at 0x48398A4: memcpy (vg_replace_strmem.c:1035)
==35487==by 0x1A2902: fill_val (heaptuple.c:287)
==35487==by 0x1A2902: heap_fill_tuple (heaptuple.c:336)
==35487==by 0x1A3C29: heap_form_minimal_tuple (heaptuple.c:1412)
==35487==by 0x3D4555: tts_virtual_copy_minimal_tuple (execTuples.c:290)
==35487==by 0x72FC33: ExecCopySlotMinimalTuple (tuptable.h:473)
==35487==by 0x72FC33: tuplesort_puttupleslot (tuplesortvariants.c:610)
==35487==by 0x403463: ExecSort (nodeSort.c:153)
==35487==by 0x3D0C8E: ExecProcNodeFirst (execProcnode.c:464)
==35487==by 0x40AF09: ExecProcNode (executor.h:259)
==35487==by 0x40AF09: begin_partition (nodeWindowAgg.c:1106)
==35487==by 0x40D259: ExecWindowAgg (nodeWindowAgg.c:2125)
==35487==by 0x3D0C8E: ExecProcNodeFirst (execProcnode.c:464)
==35487==by 0x405E17: ExecProcNode (executor.h:259)
==35487==by 0x405E17: SubqueryNext (nodeSubqueryscan.c:53)
==35487==by 0x3D41C7: ExecScanFetch (execScan.c:133)
==35487==by 0x3D41C7: ExecScan (execScan.c:199)
==35487==  Address 0xe3e8af0 is 168 bytes inside a block of size 8,192 
alloc'd

==35487==at 0x483463B: malloc (vg_replace_malloc.c:299)
==35487==by 0x712B63: AllocSetContextCreateInternal (aset.c:446)
==35487==by 0x3D82BE: CreateExprContextInternal (execUtils.c:253)
==35487==by 0x3D84DC: CreateExprContext (execUtils.c:303)
==35487==by 0x3D8750: ExecAssignExprContext (execUtils.c:482)
==35487==by 0x40BC1A: ExecInitWindowAgg (nodeWindowAgg.c:2382)
==35487==by 0x3D1232: ExecInitNode (execProcnode.c:346)
==35487==by 0x4035E0: ExecInitSort (nodeSort.c:265)
==35487==by 0x3D11AB: ExecInitNode (execProcnode.c:321)
==35487==by 0x40BD36: ExecInitWindowAgg (nodeWindowAgg.c:2432)
==35487==by 0x3D1232: ExecInitNode (execProcnode.c:346)
==35487==by 0x405E99: ExecInitSubqueryScan (nodeSubqueryscan.c:126)


It's faster to run just this test under Valgrind:

make installcheck-test TESTS='test_setup window'


This can also be reproduced on a 64-bit system by forcing int8 to be 
passed by reference:


--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -82,9 +82,7 @@
  *
  * Changing this requires an initdb.
  */
-#if SIZEOF_VOID_P >= 8
-#define USE_FLOAT8_BYVAL 1
-#endif
+#undef USE_FLOAT8_BYVAL

 /*
  * When we don't have native spinlocks, we use semaphores to simulate 
them.



Futhermore, zeroing freed memory makes the test fail:

--- a/src/include/utils/memdebug.h
+++ b/src/include/utils/memdebug.h
@@ -39,7 +39,7 @@ static inline void
 wipe_mem(void *ptr, size_t size)
 {
VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
-   memset(ptr, 0x7F, size);
+   memset(ptr, 0, size);
VALGRIND_MAKE_MEM_NOACCESS(ptr, size);
 }

$ cat src/test/regress/regression.diffs
diff -U3 
/home/sergey/pgwork/devel/src/src/test/regress/expected/window.out 
/home/sergey/pgwork/devel/src/src/test/regress/results/window.out
--- /home/sergey/pgwork/devel/src/src/test/regress/expected/window.out 
2022-11-03 18:26:52.203624217 +0300
+++ /home/sergey/pgwork/devel/src/src/test/regress/results/window.out 
2022-11-16 01:47:18.494273352 +0300

@@ -3721,7 +3721,8 @@
 ---+---++-++++
  personnel | 5 |   3500 | 12-10-2007  |  2 |  1 |  2 |  2
  sales | 3 |   4800 | 08-01-2007  |  3 |  1 |  3 |  3
-(2 rows)
+ sales | 4 |   4800 | 08-08-2007  |  3 |  0 |  3 |  3
+(3 rows)

 -- Tests to ensure we don't push down the run condition when it's not 
valid to

 -- do so.


The failing query is:

SELECT * FROM
  (SELECT *,
  count(salary) OVER (PARTITION BY depname || '') c1, -- w1
  row_number() OVER (PARTITION BY depname) rn, -- w2
  count(*) OVER (PARTITION BY depname) c2, -- w2
  count(*) OVER (PARTITION BY '' || depname) c3 -- w3
   FROM empsalary
) e WHERE rn <= 1 AND c1 <= 3;


As far as I understand, ExecWindowAgg for the intermediate WindowAgg 
node switches into pass-through mode, stops evaluating row_number(), and 
returns the previous value instead. But if int8 is passed by reference, 
the previous value stored in econtext->ecxt_aggvalues becomes a dangling 
pointer when the per-output-tuple memory context is reset.


Attaching a patch that makes the window test fail on a 64-bit system.

Best regards,

--
Sergey Shinderukhttps://postgrespro.com/diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index f2a106f983d..66ab343d51c 100644
--- a/src/include/pg_config_manual.h
+++ b/

Re: Bug in row_number() optimization

2022-11-24 Thread Sergey Shinderuk

On 24.11.2022 06:16, Richard Guo wrote:

Regarding how to fix this problem, firstly I believe we need to evaluate
window functions in the per-tuple memory context, as the HEAD does.
When we decide we need to go into pass-through mode, I'm thinking that
we can just copy out the results of the last evaluation to the per-query
memory context, while still storing their pointers in ecxt_aggvalues.

Does this idea work?

Although I'm not familiar with the code, this makes sense to me.

You proposed:

+#ifdef USE_FLOAT8_BYVAL
+   evalWfuncContext = 
winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory;

+#else
+   evalWfuncContext = 
winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;

+#endif

Shouldn't we handle any pass-by-reference type the same? I suppose, a 
user-defined window function can return some other type, not int8.


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/





Re: Bug in row_number() optimization

2022-11-25 Thread Sergey Shinderuk

On 25.11.2022 15:46, Richard Guo wrote:

Considering the 'Filter' is a strict function, marking it as
NULL would do.  I think this is why this patch works.


What about user-defined operators? I created my own <= operator for int8 
which returns true on null input, and put it in a btree operator class. 
With my operator I get:


  depname  | empno | salary | enroll_date | c1 | rn | c2 | c3
---+---++-++++
 personnel | 5 |   3500 | 2007-12-10  |  2 |  1 |  2 |  2
 sales | 3 |   4800 | 2007-08-01  |  3 |  1 |  3 |  3
 sales | 4 |   4800 | 2007-08-08  |  3 |||  3
(3 rows)

Admittedly, it's weird that (null <= 1) evaluates to true. But does it 
violate  the contract of the btree operator class or something? Didn't 
find a clear answer in the docs.


--
Sergey Shinderukhttps://postgrespro.com/





Re: Bug in row_number() optimization

2022-11-28 Thread Sergey Shinderuk

On 28.11.2022 03:23, David Rowley wrote:

On Sat, 26 Nov 2022 at 05:19, Tom Lane  wrote:


Sergey Shinderuk  writes:

What about user-defined operators? I created my own <= operator for int8
which returns true on null input, and put it in a btree operator class.
Admittedly, it's weird that (null <= 1) evaluates to true. But does it
violate  the contract of the btree operator class or something? Didn't
find a clear answer in the docs.


It's pretty unlikely that this would work during an actual index scan.
I'm fairly sure that btree (and other index AMs) have hard-wired
assumptions that index operators are strict.


If we're worried about that then we could just restrict this
optimization to only work with strict quals.


Not sure this is necessary if btree operators must be strict anyway.



The proposal to copy the datums into the query context does not seem
to me to be a good idea. If there are a large number of partitions
then it sounds like we'll leak lots of memory.  We could invent some
partition context that we reset after each partition, but that's
probably more complexity than it would be worth.


Ah, good point.



I've attached a draft patch to move the code to nullify the aggregate
results so that's only done once per partition and adjusted the
planner to limit this to strict quals.


Not quite sure that we don't need to do anything for the 
WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more 
tuples for the current partition, we still call ExecProject with 
dangling pointers. Is it okay?



+   if (!func_strict(opexpr->opfuncid))
+   return false;

Should return true instead?

--
Sergey Shinderukhttps://postgrespro.com/





Re: Bug in row_number() optimization

2022-12-01 Thread Sergey Shinderuk

On 01.12.2022 11:18, Richard Guo wrote:


On Mon, Nov 28, 2022 at 5:59 PM Sergey Shinderuk 
mailto:s.shinde...@postgrespro.ru>> wrote:


Not quite sure that we don't need to do anything for the
WINDOWAGG_PASSTHROUGH_STRICT case. Although, we won't return any more
tuples for the current partition, we still call ExecProject with
dangling pointers. Is it okay?

AFAIU once we go into WINDOWAGG_PASSTHROUGH_STRICT we will spool all the
remaining tuples in the current partition without storing them and then
move to the next partition if available and become WINDOWAGG_RUN again
or become WINDOWAGG_DONE if there are no further partitions.  It seems
we would not have chance to see the dangling pointers.


Maybe I'm missing something, but the previous call to spool_tuples() 
might have read extra tuples (if the tuplestore spilled to disk), and 
after switching to WINDOWAGG_PASSTHROUGH_STRICT mode we nevertheless 
would loop through these extra tuples and call ExecProject if only to 
increment winstate->currentpos.


--
Sergey Shinderukhttps://postgrespro.com/





Add PL/pgSQL extra check no_data_found

2022-12-08 Thread Sergey Shinderuk

Hello,

I propose to add a new value "no_data_found" for the 
plpgsql.extra_errors and plpgsql.extra_warnings parameters [1].


With plpgsql.extra_errors=no_data_found SELECT INTO raises no_data_found 
exception when the result set is empty. With 
plpgsql.extra_errors=too_many_rows,no_data_found SELECT INTO behaves 
like SELECT INTO STRICT [2]. This could simplify migration from PL/SQL 
and may be just more convenient.


One potential downside is that plpgsql.extra_errors=no_data_found could 
break existing functions expecting to get null or checking IF found 
explicitly. This is also true for the too_many_rows exception, but 
arguably it's a programmer error, while no_data_found switches to a 
different convention for handling (or not handling) an empty result with 
SELECT INTO.


Otherwise the patch is straightforward.

What do you think?

--
Sergey Shinderukhttps://postgrespro.com/


[1] 
https://www.postgresql.org/docs/devel/plpgsql-development-tips.html#PLPGSQL-EXTRA-CHECKS
[2] 
https://www.postgresql.org/docs/devel/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROWdiff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index a6473429480..80672a3672f 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -4197,8 +4197,14 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	long		tcount;
 	int			rc;
 	PLpgSQL_expr *expr = stmt->sqlstmt;
+	int			no_data_found_level = 0;
 	int			too_many_rows_level = 0;
 
+	if (plpgsql_extra_errors & PLPGSQL_XCHECK_NODATAFOUND)
+		no_data_found_level = ERROR;
+	else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_NODATAFOUND)
+		no_data_found_level = WARNING;
+
 	if (plpgsql_extra_errors & PLPGSQL_XCHECK_TOOMANYROWS)
 		too_many_rows_level = ERROR;
 	else if (plpgsql_extra_warnings & PLPGSQL_XCHECK_TOOMANYROWS)
@@ -4355,16 +4361,19 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 		 */
 		if (n == 0)
 		{
-			if (stmt->strict)
+			if (stmt->strict || no_data_found_level)
 			{
 char	   *errdetail;
+int			errlevel;
 
 if (estate->func->print_strict_params)
 	errdetail = format_expr_params(estate, expr);
 else
 	errdetail = NULL;
 
-ereport(ERROR,
+errlevel = stmt->strict ? ERROR : no_data_found_level;
+
+ereport(errlevel,
 		(errcode(ERRCODE_NO_DATA_FOUND),
 		 errmsg("query returned no rows"),
 		 errdetail ? errdetail_internal("parameters: %s", errdetail) : 0));
diff --git a/src/pl/plpgsql/src/pl_handler.c b/src/pl/plpgsql/src/pl_handler.c
index 5dc334b61b3..80151540c38 100644
--- a/src/pl/plpgsql/src/pl_handler.c
+++ b/src/pl/plpgsql/src/pl_handler.c
@@ -90,6 +90,8 @@ plpgsql_extra_checks_check_hook(char **newvalue, void **extra, GucSource source)
 
 			if (pg_strcasecmp(tok, "shadowed_variables") == 0)
 extrachecks |= PLPGSQL_XCHECK_SHADOWVAR;
+			else if (pg_strcasecmp(tok, "no_data_found") == 0)
+extrachecks |= PLPGSQL_XCHECK_NODATAFOUND;
 			else if (pg_strcasecmp(tok, "too_many_rows") == 0)
 extrachecks |= PLPGSQL_XCHECK_TOOMANYROWS;
 			else if (pg_strcasecmp(tok, "strict_multi_assignment") == 0)
diff --git a/src/pl/plpgsql/src/plpgsql.h b/src/pl/plpgsql/src/plpgsql.h
index 088768867d9..b69c058ba86 100644
--- a/src/pl/plpgsql/src/plpgsql.h
+++ b/src/pl/plpgsql/src/plpgsql.h
@@ -1202,8 +1202,9 @@ extern bool plpgsql_check_asserts;
 /* extra compile-time and run-time checks */
 #define PLPGSQL_XCHECK_NONE		0
 #define PLPGSQL_XCHECK_SHADOWVAR(1 << 1)
-#define PLPGSQL_XCHECK_TOOMANYROWS(1 << 2)
-#define PLPGSQL_XCHECK_STRICTMULTIASSIGNMENT	(1 << 3)
+#define PLPGSQL_XCHECK_NODATAFOUND(1 << 2)
+#define PLPGSQL_XCHECK_TOOMANYROWS(1 << 3)
+#define PLPGSQL_XCHECK_STRICTMULTIASSIGNMENT	(1 << 4)
 #define PLPGSQL_XCHECK_ALL		((int) ~0)
 
 extern int	plpgsql_extra_warnings;
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 08e42f17dc2..454e517fcc9 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3086,6 +3086,25 @@ select shadowtest(1);
 (1 row)
 
 -- runtime extra checks
+set plpgsql.extra_warnings to 'no_data_found';
+do $$
+declare x int;
+begin
+  select 1 into x where 0 = 1;
+end;
+$$;
+WARNING:  query returned no rows
+set plpgsql.extra_errors to 'no_data_found';
+do $$
+declare x int;
+begin
+  select 1 into x where 0 = 1;
+end;
+$$;
+ERROR:  query returned no rows
+CONTEXT:  PL/pgSQL function inline_code_block line 4 at SQL statement
+reset plpgsql.extra_errors;
+reset plpgsql.extra_warnings;
 set plpgsql.extra_warnings to 'too_many_rows';
 do $$
 declare x int;
diff --git a/src/test/regress/sql/plpgsql.sql b/src/test/regress/sql/plpgsql.sql
index 588c3310337..60896b3d0da 100644
--- a/src/test/regress/sql/plpgsql.sql
+++ b/src/test/regress/sql/plpgsql.sql
@@ -2620

Re: Add PL/pgSQL extra check no_data_found

2022-12-09 Thread Sergey Shinderuk

On 09.12.2022 09:46, Pavel Stehule wrote:
I don't like the idea about possible replacement of INTO STRICT by INTO 
+ extra warnings.


Handling exceptions is significantly more expensive than in Oracle, and 
using INTO without STRICT with the next test IF NOT FOUND THEN can save 
one safepoint and one handling an exception. It should be mentioned in 
the documentation. Using this very common Oracle's pattern can have a 
very negative impact on performance in Postgres. If somebody does port 
from Oracle, and wants compatible behavior then he should use INTO 
STRICT. I think it is counterproductive to hide syntax differences when 
there is a significant difference in performance (and will be).


Fair enough. Thank you, Pavel.

--
Sergey Shinderukhttps://postgrespro.com/





Re: A bug with ExecCheckPermissions

2023-02-09 Thread Sergey Shinderuk

On 08.02.2023 21:23, Alvaro Herrera wrote:

On 2023-Feb-08, Amit Langote wrote:


On Wed, Feb 8, 2023 at 16:19 Alvaro Herrera  wrote:



I think we should also patch ExecCheckPermissions to use forboth(),
scanning the RTEs as it goes over the perminfos, and make sure that the
entries are consistent.


Hmm, we can’t use forboth here, because not all RTEs have the corresponding
RTEPermissionInfo, inheritance children RTEs, for example.


Doh, of course.


Also, it doesn’t make much sense to reinstate the original loop over
range table and fetch the RTEPermissionInfo for the RTEs with non-0
perminfoindex, because the main goal of the patch was to make
ExecCheckPermissions() independent of range table length.


Yeah, I'm thinking in a mechanism that would allow us to detect bugs in
development builds — no need to have it run in production builds.
However, I can't see any useful way to implement it.




Maybe something like the attached would do?


--
Sergey Shinderukhttps://postgrespro.com/
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 39bfb48dc22..94866743f48 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -577,6 +577,28 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
ListCell   *l;
boolresult = true;
 
+#ifdef USE_ASSERT_CHECKING
+   Bitmapset  *indexset = NULL;
+
+   /* Check that rteperminfos is consistent with rangeTable */
+   foreach(l, rangeTable)
+   {
+   RangeTblEntry *rte = lfirst_node(RangeTblEntry, l);
+
+   if (rte->perminfoindex != 0)
+   {
+   /* Sanity checks */
+   (void) getRTEPermissionInfo(rteperminfos, rte);
+   /* Many-to-one mapping not allowed */
+   Assert(!bms_is_member(rte->perminfoindex, indexset));
+   indexset = bms_add_member(indexset, rte->perminfoindex);
+   }
+   }
+
+   /* All rteperminfos are referenced */
+   Assert(bms_num_members(indexset) == list_length(rteperminfos));
+#endif
+
foreach(l, rteperminfos)
{
RTEPermissionInfo *perminfo = lfirst_node(RTEPermissionInfo, l);
diff --git a/src/backend/utils/adt/ri_triggers.c 
b/src/backend/utils/adt/ri_triggers.c
index 375b17b9f3b..f6d19226a0a 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -1399,6 +1399,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, 
Relation pk_rel)
pkrte->relid = RelationGetRelid(pk_rel);
pkrte->relkind = pk_rel->rd_rel->relkind;
pkrte->rellockmode = AccessShareLock;
+   pkrte->perminfoindex = 2;
 
pk_perminfo = makeNode(RTEPermissionInfo);
pk_perminfo->relid = RelationGetRelid(pk_rel);
@@ -1409,6 +1410,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, 
Relation pk_rel)
fkrte->relid = RelationGetRelid(fk_rel);
fkrte->relkind = fk_rel->rd_rel->relkind;
fkrte->rellockmode = AccessShareLock;
+   fkrte->perminfoindex = 1;
 
fk_perminfo = makeNode(RTEPermissionInfo);
fk_perminfo->relid = RelationGetRelid(fk_rel);


Re: Compile fail on macos big sur

2021-09-23 Thread Sergey Shinderuk
Hi,

On 23.09.2021 10:09, zhang listar wrote:
> Hi, guys, I encount a problem on compiling pssql, the environment is:
> os: macos big sur version 11.5.2 (20G95)
> compiler:  gcc-11 (Homebrew GCC 11.2.0) 11.2.0

I've just tried building with gcc-11 on Catalina (yes, it's time to
upgrade) and it went fine, no warnings.

Maybe `git clean -fdx` would help? (Be careful, though, it removes any
changes you may have maid.)


> /usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla -Wendif-labels
> -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
> -Wformat-security -fno-strict-aliasing -fwrapv
> -fexcess-precision=standard -Wno-format-truncation
> -Wno-stringop-truncation -O2  zic.o -L../../src/port -L../../src/common
> -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk
> -L/usr/local/opt/binutils/lib  -Wl,-dead_strip_dylibs   -lpgcommon
> -lpgport -lz -lreadline -lm  -o zic

Here is what I have:

/usr/local/bin/gcc-11 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -Wno-format-truncation
-Wno-stringop-truncation -g -O2  zic.o -L../../src/port
-L../../src/common -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk
-Wl,-dead_strip_dylibs   -lpgcommon -lpgport -lz -lreadline -lm  -o zic

Looks the same, and gives no warnings.

Just in case, I configured like that:

./configure --prefix=$(cd ..;pwd)/install-gcc-11 --enable-cassert
--enable-debug --enable-tap-tests CC=/usr/local/bin/gcc-11

Hope that helps.

-- 
Sergey Shinderukhttps://postgrespro.com/




Re: Compile fail on macos big sur

2021-09-23 Thread Sergey Shinderuk
On 23.09.2021 10:50, zhang listar wrote:
> Thanks for your reply, I do make distclean and git clean -fdx, but it
> does no help.
> 
> the code: master, c7aeb775df895db240dcd6f47242f7e08899adfb
> It looks like the macos issue, because of the ignoring of some lib, it
> drives the compiling error. 

Maybe you could try adding -v to the problematic gcc command to see what
really goes on.

I see that gcc calls /usr/bin/ld, not binutils ld installed with
Homebrew.  I saw an advice to `brew unlink binutils` somewhere.




Memory leak in pg_hmac_final

2021-10-01 Thread Sergey Shinderuk
Hi,

Here is a patch fixing the subject.

Regards,

-- 
Sergey Shinderukhttps://postgrespro.com/
diff --git a/src/common/hmac.c b/src/common/hmac.c
index 1089db67443..bfe2e7cb5e9 100644
--- a/src/common/hmac.c
+++ b/src/common/hmac.c
@@ -232,7 +232,10 @@ pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest, size_t len)
memset(h, 0, ctx->digest_size);
 
if (pg_cryptohash_final(ctx->hash, h, ctx->digest_size) < 0)
+   {
+   FREE(h);
return -1;
+   }
 
/* H(K XOR opad, tmp) */
if (pg_cryptohash_init(ctx->hash) < 0 ||
@@ -240,9 +243,11 @@ pg_hmac_final(pg_hmac_ctx *ctx, uint8 *dest, size_t len)
pg_cryptohash_update(ctx->hash, h, ctx->digest_size) < 0 ||
pg_cryptohash_final(ctx->hash, dest, len) < 0)
{
+   FREE(h);
return -1;
}
 
+   FREE(h);
return 0;
 }
 


Re: Memory leak in pg_hmac_final

2021-10-01 Thread Sergey Shinderuk
On 01.10.2021 15:05, Daniel Gustafsson wrote:
>> On 1 Oct 2021, at 12:39, Sergey Shinderuk  wrote:
> 
>> Here is a patch fixing the subject.
> 
> Seems reasonable on a quick glance, the interim h buffer should be freed (this
> is present since 14).  I'll have another look at this in a bit and will take
> care of it.

Thanks.  I found it using the leaks tool on macOS.

Without the patch:

% MallocStackLogging=1 leaks -quiet -atExit -- psql -d 'dbname=postgres
user=alice password=secret' -XAtc 'select 1'
...
Process 91635: 4390 nodes malloced for 252 KB
Process 91635: 4103 leaks for 131296 total leaked bytes.
...

(User alice has a SCRAM-encrypted password.)


With the patch:

Process 98250: 290 nodes malloced for 124 KB
Process 98250: 3 leaks for 96 total leaked bytes.


The remaining leaks are expected and not worth fixing, I guess:

STACK OF 1 INSTANCE OF 'ROOT LEAK: malloc<32>':
4   libdyld.dylib  0x7fff68d80cc9 start + 1
3   psql  0x10938b9f9 main + 2393
startup.c:207
2   psql  0x1093ab5a5 pg_malloc + 21
fe_memutils.c:49
1   libsystem_malloc.dylib 0x7fff68f36cf5 malloc + 21
0   libsystem_malloc.dylib 0x7fff68f36d9e malloc_zone_malloc
+ 140

2 (48 bytes) ROOT LEAK: 0x7ffbb75040d0 [32]
   1 (16 bytes) 0x7ffbb75040f0 [16]  length: 8  "select 1"

STACK OF 1 INSTANCE OF 'ROOT LEAK: malloc<48>':
5   libdyld.dylib  0x7fff68d80cc9 start + 1
4   psql  0x10938b8b0 main + 2064
startup.c:207
3   psql  0x1093ab78e pg_strdup + 14
fe_memutils.c:96
2   libsystem_c.dylib  0x7fff68e26ce6 strdup + 32
1   libsystem_malloc.dylib 0x7fff68f36cf5 malloc + 21
0   libsystem_malloc.dylib 0x7fff68f36d9e malloc_zone_malloc
+ 140

1 (48 bytes) ROOT LEAK: 0x7ffbb75040a0 [48]  length: 42
"dbname=postgres user=alice password=secret"



-- 
Sergey Shinderukhttps://postgrespro.com/




Re: pgcrypto support for bcrypt $2b$ hashes

2021-10-01 Thread Sergey Shinderuk
On 02.10.2021 06:48, Daniel Fone wrote:
> I don’t get these compiler warnings and I can’t find any settings to use that 
> might generate them. I’m compiling on macOS 11.6 configured with 
> `--enable-cassert --enable-depend --enable-debug CFLAGS=-O0`
>

Hi, Daniel!
I don't get them from clang on macOS either.


> I’ve optimistically updated the patch to hopefully address them, but I’d like 
> to know what I need to do to get those warnings.

But gcc-11 on Ubuntu 20.04 emits them.

Regards,

-- 
Sergey Shinderukhttps://postgrespro.com/




Bug in DefineRange() with multiranges

2021-10-04 Thread Sergey Shinderuk

Hi,

My colleague, Alex Kozhemyakin, stumbled upon a bug in DefineRange().
The problem is here:

@@ -1707,7 +1707,6 @@ DefineRange(ParseState *pstate, 
CreateRangeStmt *stmt)

/* Create cast from the range type to its multirange type */
CastCreate(typoid, multirangeOid, castFuncOid, 'e', 'f', 
DEPENDENCY_INTERNAL);


-   pfree(multirangeTypeName);
pfree(multirangeArrayName);

return address;


Given a query

create type textrange1 as range(subtype=text, 
multirange_type_name=multirange_of_text, collation="C");


the string "multirange_of_text" in the parse tree is erroneously
pfree'd.  The corrupted parse tree is then passed to event triggers.

There is another branch in DefineRange() that genereates a multirange
type name which is fine to free.

I wonder what is the proper fix.  Just drop pfree() altogether or add
pstrdup() instead?  I see that makeMultirangeTypeName() doesn't bother
freeing its buf.

Here is a gdb session demonstating the bug:

Breakpoint 1, ProcessUtilitySlow (pstate=0x5652e80c7730, 
pstmt=0x5652e80a6a40, queryString=0x5652e80a5790 "create type textrange1 
as range(subtype=text, multirange_type_name=multirange_of_text, 
collation=\"C\");",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, 
qc=0x7ffe835b4be0, dest=) at 
/pgwork/REL_14_STABLE/src/src/backend/tcop/utility.c:1621
1621address = DefineRange((CreateRangeStmt *) 
parsetree);
(gdb) p *(Value *)((TypeName *)((DefElem *)((CreateRangeStmt 
*)parsetree)->params->elements[1].ptr_value)->arg)->names->elements[0].ptr_value
$1 = {type = T_String, val = {ival = -401972176, str = 0x5652e80a6430 
"multirange_of_text"}}

(gdb) n
1900if (!commandCollected)
(gdb) p *(Value *)((TypeName *)((DefElem *)((CreateRangeStmt 
*)parsetree)->params->elements[1].ptr_value)->arg)->names->elements[0].ptr_value
$2 = {type = T_String, val = {ival = -401972176, str = 0x5652e80a6430 
'\177' , "\020"}}



Regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: gcc -Wclobbered in PostgresMain

2023-07-10 Thread Sergey Shinderuk

Hello, Tom,


On 08.07.2023 18:11, Tom Lane wrote:

What we ought to be doing is resetting these two flags
after the disable_all_timeouts call.



Oops, I missed that.



Having done that, it wouldn't really be necessary to mark these
as volatile.  I kept that marking anyway for consistency with
send_ready_for_query, but perhaps we shouldn't?



I don't know. Maybe marking them volatile is more future proof. Not sure.



I also moved firstchar's declaration inside the loop where it's used, to
make it clear that this variable needn't be volatile and is not
preserved after longjmp().


Good idea, but then why not the same for input_message?  It's fully
reinitialized each time through the loop, too.



Yeah, that's better.



In short, something like the attached, except I'm not totally sold
on changing the volatility of the timeout flags.


Looks good to me.


Thank you.

--
Sergey Shinderukhttps://postgrespro.com/





Fix error handling in be_tls_open_server()

2023-08-01 Thread Sergey Shinderuk

Hi,

A static analyzer reported a possible pfree(NULL) in 
be_tls_open_server(). Here is a fix. Also handle an error from 
X509_NAME_print_ex().


AFAICS, the error "SSL certificate's distinguished name contains 
embedded null" could not be reached at all, because XN_FLAG_RFC2253 
passed to X509_NAME_print_ex() ensures that null bytes are escaped.


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/diff --git a/src/backend/libpq/be-secure-openssl.c 
b/src/backend/libpq/be-secure-openssl.c
index 658b09988d6..31b6a6eacdf 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -620,8 +620,11 @@ aloop:
bio = BIO_new(BIO_s_mem());
if (!bio)
{
-   pfree(port->peer_cn);
-   port->peer_cn = NULL;
+   if (port->peer_cn != NULL)
+   {
+   pfree(port->peer_cn);
+   port->peer_cn = NULL;
+   }
return -1;
}
 
@@ -632,12 +635,15 @@ aloop:
 * which make regular expression matching a bit easier. Also 
note that
 * it prints the Subject fields in reverse order.
 */
-   X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_RFC2253);
-   if (BIO_get_mem_ptr(bio, &bio_buf) <= 0)
+   if (X509_NAME_print_ex(bio, x509name, 0, XN_FLAG_RFC2253) == -1 
||
+   BIO_get_mem_ptr(bio, &bio_buf) <= 0)
{
BIO_free(bio);
-   pfree(port->peer_cn);
-   port->peer_cn = NULL;
+   if (port->peer_cn != NULL)
+   {
+   pfree(port->peer_cn);
+   port->peer_cn = NULL;
+   }
return -1;
}
peer_dn = MemoryContextAlloc(TopMemoryContext, bio_buf->length 
+ 1);
@@ -651,8 +657,11 @@ aloop:
(errcode(ERRCODE_PROTOCOL_VIOLATION),
 errmsg("SSL certificate's 
distinguished name contains embedded null")));
pfree(peer_dn);
-   pfree(port->peer_cn);
-   port->peer_cn = NULL;
+   if (port->peer_cn != NULL)
+   {
+   pfree(port->peer_cn);
+   port->peer_cn = NULL;
+   }
return -1;
}
 


Re: Fix error handling in be_tls_open_server()

2023-08-24 Thread Sergey Shinderuk

On 23.08.2023 16:23, Daniel Gustafsson wrote:

On 1 Aug 2023, at 16:44, Sergey Shinderuk  wrote:



A static analyzer reported a possible pfree(NULL) in be_tls_open_server().


This has the smell of a theoretical problem, I can't really imagine a
certificate where which would produce this.  Have you been able to trigger it?



I triggered a crash by generating a certificate without a CN and forcing 
malloc to return NULL when called from X509_NAME_print_ex or 
BIO_get_mem_ptr with gdb.


Initially I tried to trigger a crash by generating a certificate without 
a CN and with a DN contaning the null byte. But as I said, the error 
condition "SSL certificate's distinguished name contains embedded null" 
isn't really reachable, because XN_FLAG_RFC2253 escapes null bytes.



--
Sergey Shinderukhttps://postgrespro.com/





Re: Fix error handling in be_tls_open_server()

2023-08-24 Thread Sergey Shinderuk

On 24.08.2023 11:38, Daniel Gustafsson wrote:

On 24 Aug 2023, at 10:11, Sergey Shinderuk  wrote:



I triggered a crash by generating a certificate without a CN and forcing malloc 
to return NULL when called from X509_NAME_print_ex or BIO_get_mem_ptr with gdb.


Can you extend the patch with that certificate and a test using it?  The
certificates are generated from config files kept in the repo in src/test/ssl
in order to be reproducible.


But I need to force malloc to fail at the right time during the test. I 
don't think I can make a stable and portable test from this.



--
Sergey Shinderukhttps://postgrespro.com/





Re: Fix error handling in be_tls_open_server()

2023-08-28 Thread Sergey Shinderuk

On 28.08.2023 18:12, Jacob Champion wrote:

On Thu, Aug 24, 2023 at 6:25 PM Michael Paquier  wrote:

LD_PRELOAD is the only thing I can think about, but that's very fancy.
Even with that, having a certificate with a NULL peer_cn could prove
to be useful in the SSL suite to stress more patterns around it?


+1. Last we tried it, OpenSSL didn't want to create a certificate with
an embedded null, but maybe things have changed?



To embed a null byte into the Subject, I first generated a regular 
certificate request in the DER (binary) format, then manually inserted 
null into the file and recomputed the checksum. Like this:

https://security.stackexchange.com/a/58845

I'll try to add a client certificate lacking a CN to the SSL test suite.

--
Sergey Shinderukhttps://postgrespro.com/





Re: On login trigger: take three

2022-09-20 Thread Sergey Shinderuk

On 02.09.2022 18:36, Daniel Gustafsson wrote:

This had bitrotted a fair bit, attached is a rebase along with (mostly)
documentation fixes.  0001 adds a generic GUC for ignoring event triggers and
0002 adds the login event with event trigger support, and hooks it up to the
GUC such that broken triggers wont require single-user mode.  Moving the CF
entry back to Needs Review.



Hello!

There is a race around setting and clearing of dathasloginevt.

Steps to reproduce:

1. Create a trigger:

  CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
  BEGIN
RAISE NOTICE 'You are welcome!';
  END;
  $$ LANGUAGE plpgsql;

  CREATE EVENT TRIGGER on_login_trigger ON login EXECUTE PROCEDURE 
on_login_proc();


2. Then drop it, but don't start new sessions:

  DROP EVENT TRIGGER on_login_trigger;

3. Create another trigger, but don't commit yet:

  BEGIN;
  CREATE EVENT TRIGGER on_login_trigger ON login EXECUTE PROCEDURE 
on_login_proc();


4. Start a new session. This clears dathasloginevt.

5. Commit the transaction:

  COMMIT;

Now we have a trigger, but it doesn't fire, because dathasloginevt=false.


If two sessions create triggers concurrently, one of them will fail.

Steps:

1. In the first session, start a transaction and create a trigger:

  BEGIN;
  CREATE EVENT TRIGGER on_login_trigger1 ON login EXECUTE PROCEDURE 
on_login_proc();


2. In the second session, create another trigger (this query blocks):

  CREATE EVENT TRIGGER on_login_trigger2 ON login EXECUTE PROCEDURE 
on_login_proc();


3. Commit in the first session:

  COMMIT;

The second session fails:

  postgres=# CREATE EVENT TRIGGER on_login_trigger2 ON login EXECUTE 
PROCEDURE on_login_proc();

  ERROR:  tuple concurrently updated



What else bothers me is that login triggers execute in an environment 
under user control and one has to be very careful. The example trigger 
from the documentation


+DECLARE

+  hour integer = EXTRACT('hour' FROM current_time);

+  rec boolean;

+BEGIN

+-- 1. Forbid logging in between 2AM and 4AM.

+IF hour BETWEEN 2 AND 4 THEN

+  RAISE EXCEPTION 'Login forbidden';

+END IF;


can be bypassed with PGOPTIONS='-c timezone=...'. Probably this is 
nothing new and concerns any SECURITY DEFINER function, but still...



Best regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: On login trigger: take three

2022-09-21 Thread Sergey Shinderuk

On 20.09.2022 17:10, Daniel Gustafsson wrote:
On 20 Sep 2022, at 15:43, Sergey Shinderuk  wrote: 
There is a race around setting and clearing of dathasloginevt.


Thanks for the report!  The whole dathasloginevt mechanism is IMO too clunky
and unelegant to go ahead with, I wouldn't be surprised if there are other bugs
lurking there.

Indeed.

CREATE DATABASE doesn't copy dathasloginevt from the template database.
ALTER EVENT TRIGGER ... ENABLE doesn't set dathasloginevt.

In both cases triggers are enabled according to \dy output, but don't 
fire. The admin may not notice it immediately.


--
Sergey Shinderukhttps://postgrespro.com/




gcc -Wclobbered in PostgresMain

2023-07-07 Thread Sergey Shinderuk

Hi, hackers!

While analyzing -Wclobbered warnings from gcc we found a true one in 
PostgresMain():


postgres.c: In function ‘PostgresMain’:
postgres.c:4118:25: warning: variable 
‘idle_in_transaction_timeout_enabled’ might be clobbered by ‘longjmp’ or 
‘vfork’ [-Wclobbered]
 4118 | boolidle_in_transaction_timeout_enabled = 
false;

  | ^~~
postgres.c:4119:25: warning: variable ‘idle_session_timeout_enabled’ 
might be clobbered by ‘longjmp’ or ‘vfork’ [-Wclobbered]

 4119 | boolidle_session_timeout_enabled = false;
  | ^~~~

These variables must be declared volatile, because they are read after 
longjmp(). send_ready_for_query declared there is volatile.


Without volatile, these variables are kept in registers and restored by 
longjump(). I think, this is harmless because the error handling code 
calls disable_all_timeouts() anyway. But strictly speaking the code is 
invoking undefined behavior by reading those variables after longjmp(), 
so it's worth fixing. And for consistency with send_ready_for_query too. 
I believe, making them volatile doesn't affect performance in any way.


I also moved firstchar's declaration inside the loop where it's used, to 
make it clear that this variable needn't be volatile and is not 
preserved after longjmp().


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01b6cc1f7d3..56d8b0814b5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4111,12 +4111,11 @@ PostgresSingleUserMain(int argc, char *argv[],
 void
 PostgresMain(const char *dbname, const char *username)
 {
-	int			firstchar;
 	StringInfoData input_message;
 	sigjmp_buf	local_sigjmp_buf;
 	volatile bool send_ready_for_query = true;
-	bool		idle_in_transaction_timeout_enabled = false;
-	bool		idle_session_timeout_enabled = false;
+	volatile bool idle_in_transaction_timeout_enabled = false;
+	volatile bool idle_session_timeout_enabled = false;
 
 	Assert(dbname != NULL);
 	Assert(username != NULL);
@@ -4418,6 +4417,8 @@ PostgresMain(const char *dbname, const char *username)
 
 	for (;;)
 	{
+		int			firstchar;
+
 		/*
 		 * At top of loop, reset extended-query-message flag, so that any
 		 * errors encountered in "idle" state don't provoke skip.


Postmaster fails to shut down right after crash restart

2025-04-24 Thread Sergey Shinderuk

Hello,

While developing a patch and running regression tests I noticed that the 
postmaster could fail to shut down right after crash restart. It could 
get stuck in the PM_WAIT_BACKENDS state forever.


As far as I understand, the problem occurs when a shutdown signal is 
received before getting PMSIGNAL_RECOVERY_STARTED from the startup 
process. In that case the FatalError flag is not cleared, and the 
postmaster is stuck in PM_WAIT_BACKENDS waiting for the checkpointer, 
which ignores SIGTERM.


To easily reproduce the problem I added pg_usleep in xlogrecovery.c just 
before SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED). See the patch 
attached.


Then I run a script that simulates a crash and does pg_ctl stop:

$ ./init.sh
[...]

$ ./stop-after-crash.sh
waiting for server to start done
server started
waiting for server to shut 
down... failed

pg_ctl: server does not shut down


Some processes are still alive:

$ ps uf -C postgres
USER PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME COMMAND
sergey279874  0.0  0.0 222816 28560 ?Ss   14:25   0:00 
/home/sergey/pgwork/devel/install/bin/postgres -D data
sergey279887  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 0
sergey279888  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 1
sergey279889  0.0  0.0 222772  5664 ?Ss   14:25   0:00  \_ 
postgres: io worker 2
sergey279891  0.0  0.0 222884  8480 ?Ss   14:25   0:00  \_ 
postgres: checkpointer



Here is an excerpt from the debug log:

postmaster[279874] LOG:  all server processes terminated; reinitializing
startup[279890] LOG:  database system was interrupted; last known up at 
2025-04-24 14:25:58 MSK
startup[279890] LOG:  database system was not properly shut down; 
automatic recovery in progress


postmaster[279874] DEBUG:  postmaster received shutdown request signal
postmaster[279874] LOG:  received fast shutdown request
postmaster[279874] DEBUG:  updating PMState from PM_STARTUP to 
PM_STOP_BACKENDS
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to background 
writer process with pid 279892
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to checkpointer 
process with pid 279891
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to startup process 
with pid 279890
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279889
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279888
postmaster[279874] DEBUG:  sending signal 15/SIGTERM to io worker 
process with pid 279887
postmaster[279874] DEBUG:  updating PMState from PM_STOP_BACKENDS to 
PM_WAIT_BACKENDS


startup[279890] LOG:  invalid record length at 0/175A4D8: expected at 
least 24, got 0

postmaster[279874] DEBUG:  postmaster received pmsignal signal
startup[279890] LOG:  redo is not required

checkpointer[279891] LOG:  checkpoint starting: end-of-recovery 
immediate wait
checkpointer[279891] LOG:  checkpoint complete: wrote 0 buffers (0.0%), 
wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; 
write=0.007 s, sync=0.002 s, total=0.026 s; sync files=2, longest=0.001 
s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/175A4D8, redo 
lsn=0/175A4D8


startup[279890] DEBUG:  exit(0)
postmaster[279874] DEBUG:  updating PMState from PM_WAIT_BACKENDS to 
PM_WAIT_BACKENDS


checkpointer[279891] DEBUG:  checkpoint skipped because system is idle
checkpointer[279891] DEBUG:  checkpoint skipped because system is idle


I don't know how to fix this, but thought it's worth reporting.

Best regards,

--
Sergey Shinderukhttps://postgrespro.com/diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 6ce979f2d8b..19ee8b09685 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1696,7 +1696,10 @@ PerformWalRecovery(void)
 	 * archiver if necessary.
 	 */
 	if (IsUnderPostmaster)
+	{
+		pg_usleep(300);
 		SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
+	}
 
 	/*
 	 * Allow read-only connections immediately if we're consistent already.


init.sh
Description: application/shellscript


stop-after-crash.sh
Description: application/shellscript


logfile.gz
Description: application/gzip


Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)

2025-06-25 Thread Sergey Shinderuk

On 16.06.2025 17:41, Andres Freund wrote:

TBH, I don't see a point in continuing with this thread without something that
others can test.  I rather doubt that the right fix here is to just change the
lock model over, but without a repro I can't evaluate that.



Hello,

I think I can reproduce the issue with pgbench on a muti-core server. I 
start a regular select-only test with 64 clients, and while it's 
running, I start a plpgsql loop creating and dropping temporary tables 
from a single psql session. I observe ~25% drop in tps reported by 
pgbench until I cancel the query in psql.



$ pgbench -n -S -c64 -j64 -T300 -P1

progress: 10.0 s, 1249724.7 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 11.0 s, 1248289.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 12.0 s, 1246001.0 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 13.0 s, 1247832.5 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 14.0 s, 1248205.8 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 15.0 s, 1247737.3 tps, lat 0.051 ms stddev 0.002, 0 failed
progress: 16.0 s, 1219444.3 tps, lat 0.052 ms stddev 0.039, 0 failed
progress: 17.0 s, 893943.4 tps, lat 0.071 ms stddev 0.159, 0 failed
progress: 18.0 s, 927861.3 tps, lat 0.069 ms stddev 0.150, 0 failed
progress: 19.0 s, 886317.1 tps, lat 0.072 ms stddev 0.163, 0 failed
progress: 20.0 s, 877200.1 tps, lat 0.073 ms stddev 0.164, 0 failed
progress: 21.0 s, 875424.4 tps, lat 0.073 ms stddev 0.163, 0 failed
progress: 22.0 s, 877693.0 tps, lat 0.073 ms stddev 0.165, 0 failed
progress: 23.0 s, 897202.8 tps, lat 0.071 ms stddev 0.158, 0 failed
progress: 24.0 s, 917853.4 tps, lat 0.070 ms stddev 0.153, 0 failed
progress: 25.0 s, 907865.1 tps, lat 0.070 ms stddev 0.154, 0 failed

Here I started the following loop in psql around 17s and tps dropped by 
~25%:


do $$
begin
  for i in 1..100 loop
create temp table tt1 (a bigserial primary key, b text);
drop table tt1;
commit;
  end loop;
end;
$$;

Now, if I simply remove the spinlock in SIGetDataEntries, I see a drop 
of just ~6% under concurrent DDL. I think this strongly suggests that 
the spinlock is the bottleneck.


Before that, I tried removing `if (!hasMessages) return` optimization in 
SIGetDataEntries to stress the spinlock and observed ~35% drop in tps of 
select-only with an empty sinval queue (no DDL running in background). 
Then I also removed the spinlock in SIGetDataEntries, and the loss was 
just ~4%, which may be noise. I think this also suggests that the 
spinlock could be the bottleneck.


I'm running this on a 2 socket AMD EPYC 9654 96-Core server with 
postgres and pgbench bound to distinct CPUs. PGDATA is placed on tmpfs. 
postgres is running with the default settings. pgbench tables are of 
scale 1. pgbench is connecting via loopback/127.0.0.1.


Does this sound convincing?

Best regards,

--
Sergey Shinderukhttps://postgrespro.com/





Re: Error with DEFAULT VALUE in temp table

2025-09-12 Thread Sergey Shinderuk

Hi, Tom!

Thank you for working on this. I see you've fixed the patch and 
committed it as a0b99fc1220. I tested it a bit and see some side effects 
which may be unintentional.


1. SCHEMA lost object_name.

Before:

postgres=# create schema foo;
CREATE SCHEMA
postgres=# drop schema foo;
DROP SCHEMA
postgres=# select * from dropped_objects \gx
-[ RECORD 1 ]---+---
n   | 1
classid | 2615
objid   | 16404
objsubid| 0
original| t
normal  | f
is_temporary| f
object_type | schema
schema_name |
object_name | foo
object_identity | foo
address_names   | {foo}
address_args| {}

After:

postgres=# select * from dropped_objects \gx
-[ RECORD 1 ]---+---
n   | 1
classid | 2615
objid   | 16394
objsubid| 0
original| t
normal  | f
is_temporary| f
object_type | schema
schema_name |
object_name |
object_identity | foo
address_names   | {foo}
address_args| {}


2. DEFAULT VALUE now has schema_name and object_name.

Before:

postgres=# create temp table bar (a int default 0);
CREATE TABLE
postgres=# drop table bar;
DROP TABLE
postgres=# select * from dropped_objects where object_type = 
'default value' \gx

-[ RECORD 1 ]---+--
n   | 4
classid | 2604
objid   | 16422
objsubid| 0
original| f
normal  | f
is_temporary| f
object_type | default value
schema_name |
object_name |
object_identity | for pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}

After:

postgres=# select * from dropped_objects where object_type = 
'default value' \gx

-[ RECORD 1 ]---+--
n   | 4
classid | 2604
objid   | 16430
objsubid| 0
original| f
normal  | f
is_temporary| t
object_type | default value
schema_name | pg_temp
object_name | bar
object_identity | for pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}

This may be intentional, but doesn't quite match the description for 
object_name in the docs:


Name of the object, if the combination of schema and name can be 
used as a unique identifier for the object; otherwise NULL.


Also it doesn't match with the record for the column itself:

postgres=# create temp table bar (a int default 0);
CREATE TABLE
postgres=# alter table bar drop column a;
ALTER TABLE
postgres=# select * from dropped_objects \gx
-[ RECORD 1 ]---+--
n   | 1
classid | 1259
objid   | 16435
objsubid| 1
original| t
normal  | f
is_temporary| t
object_type | table column
schema_name | pg_temp
object_name |
object_identity | pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}
-[ RECORD 2 ]---+--
n   | 2
classid | 2604
objid   | 16438
objsubid| 0
original| f
normal  | f
is_temporary| t
object_type | default value
schema_name | pg_temp
object_name | bar
object_identity | for pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}

object_name is null for the table column, but not null for its default 
value.


As for schema_name, I'm not sure whether it should be null or not. 
Currently schema_name is null for triggers and policy objects, but that 
may be accidental.


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: Error with DEFAULT VALUE in temp table

2025-09-12 Thread Sergey Shinderuk

On 12.09.2025 14:01, Sergey Shinderuk wrote:
object_name is null for the table column, but not null for its default 
value.


As for schema_name, I'm not sure whether it should be null or not. 
Currently schema_name is null for triggers and policy objects, but that 
may be accidental.



Perhaps "default value" should be like "table constraint", which have 
schema_name and null object_name.



postgres=# create temp table bar (a int not null default 0);
CREATE TABLE
postgres=# alter table bar drop column a;
ALTER TABLE
postgres=# select * from dropped_objects \gx
-[ RECORD 1 ]---+--
n   | 1
classid | 1259
objid   | 16445
objsubid| 1
original| t
normal  | f
is_temporary| t
object_type | table column
schema_name | pg_temp
object_name |
object_identity | pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}
-[ RECORD 2 ]---+--
n   | 2
classid | 2604
objid   | 16448
objsubid| 0
original| f
normal  | f
is_temporary| t
object_type | default value
schema_name | pg_temp
object_name | bar
object_identity | for pg_temp.bar.a
address_names   | {pg_temp,bar,a}
address_args| {}
-[ RECORD 3 ]---+--
n   | 3
classid | 2606
objid   | 16449
objsubid| 0
original| f
normal  | f
is_temporary| t
object_type | table constraint
schema_name | pg_temp
object_name |
object_identity | bar_a_not_null on pg_temp.bar
address_names   | {pg_temp,bar,bar_a_not_null}
address_args| {}


Best regards,

--
Sergey Shinderukhttps://postgrespro.com/




Re: Error with DEFAULT VALUE in temp table

2025-09-13 Thread Sergey Shinderuk

On 13.09.2025 00:19, Tom Lane wrote:

Fixed (and tested) in the attached.


Great! Thank you.


> So that "else" action was unreachable, and the code failed
> to set "istemp" true for its own temp schema.

As for dropping my own temp schema, it's still a bit inconsistent (as it 
was before):


postgres=# select pg_my_temp_schema()::regnamespace;
 pg_my_temp_schema
---
 pg_temp_0
(1 row)
postgres=# drop schema pg_temp_0;
DROP SCHEMA

postgres=# select * from dropped_objects where object_type = 
'schema' and is_temporary \gx

-[ RECORD 1 ]---+--
n   | 7
classid | 2615
objid   | 16398
objsubid| 0
original| t
normal  | f
is_temporary| t
object_type | schema
schema_name |
object_name | pg_temp_0
object_identity | pg_temp
address_names   | {pg_temp}
address_args| {}

object_identity is pg_temp, but object_name is pg_temp_0. But maybe 
that's okay. Anyway, I don't think that dropping my own temp schema 
makes sense.



Also I noticed that schema_name for temp functions doesn't match with 
object_identity (pg_temp vs pg_temp_1):


postgres=# create function pg_temp.bar(int) returns int as 'select 
$1' language sql;

CREATE FUNCTION
postgres=# drop function pg_temp.bar(int);
DROP FUNCTION

postgres=# select * from dropped_objects where object_type = 
'function' and is_temporary \gx

-[ RECORD 1 ]---+---
n   | 8
classid | 1255
objid   | 16412
objsubid| 0
original| t
normal  | f
is_temporary| t
object_type | function
schema_name | pg_temp
object_name |
object_identity | pg_temp_1.bar(integer)
address_names   | {pg_temp,bar}
address_args| {integer}

There should be a call to get_namespace_name_or_temp somewhere, I guess.

If you say this should be fixed, I can come up with a patch later. But 
maybe it's trivial.



Thanks again!

--
Sergey Shinderukhttps://postgrespro.com/