Re: Making background psql nicer to use in tap tests
> On 8 Apr 2023, at 02:49, Tom Lane wrote: > > I wrote: >> I've been doing some checking with perlbrew locally. It appears to not >> be about IO::Pty so much as IPC::Run: it works with IPC::Run 0.99 but >> not 0.79. Still bisecting to identify exactly what's the minimum >> okay version. > > The answer is: it works with IPC::Run >= 0.98. The version of IO::Pty > doesn't appear significant; it works at least back to 1.00 from early > 2002. Thanks for investigating this! > IPC::Run 0.98 is relatively new (2018), so I don't think it'd fly > to make that our new minimum version across-the-board. Absolutely, that's not an option. > I recommend > just setting up this one test to SKIP if IPC::Run is too old. Yes, will do that when I have a little more time at hand for monitoring the BF later today. -- Daniel Gustafsson
broken master branch
Hi on fresh Fedora 38, I cannot to run regress tests +ERROR: could not load library "/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/master/lib/llvmjit.so": /home/pavel/src/p ostgresql.master/tmp_install/usr/local/pgsql/master/lib/llvmjit.so: undefined symbol: LLVMBuildGEP SELECT BOOLTBL1.*, BOOLTBL2.* FROM BOOLTBL1, BOOLTBL2 WHERE boolne(BOOLTBL2.f1,BOOLTBL1.f1); There is lot of compile warnings In file included from llvmjit_expr.c:31: ../../../../src/include/jit/llvmjit_emit.h: In function ‘l_load_struct_gep’: ../../../../src/include/jit/llvmjit_emit.h:112:30: warning: implicit declaration of function ‘LLVMBuildStructGEP’; did you mean ‘LLVMBuildStructGEP2’? [-Wimplicit-function-declaration] 112 | LLVMValueRef v_ptr = LLVMBuildStructGEP(b, v, idx, ""); | ^~ | LLVMBuildStructGEP2 ../../../../src/include/jit/llvmjit_emit.h:112:30: warning: initialization of ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion] ../../../../src/include/jit/llvmjit_emit.h:114:16: warning: implicit declaration of function ‘LLVMBuildLoad’; did you mean ‘LLVMBuildLoad2’? [-Wimplicit-function-declaration] 114 | return LLVMBuildLoad(b, v_ptr, name); |^ |LLVMBuildLoad2 ../../../../src/include/jit/llvmjit_emit.h:114:16: warning: returning ‘int’ from a function with return type ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} makes pointer from integer without a cast [-Wint-conversion] 114 | return LLVMBuildLoad(b, v_ptr, name); |^ ../../../../src/include/jit/llvmjit_emit.h: In function ‘l_load_gep1’: ../../../../src/include/jit/llvmjit_emit.h:123:30: warning: implicit declaration of function ‘LLVMBuildGEP’; did you mean ‘LLVMBuildGEP2’? [-Wimplicit-function-declaration] 123 | LLVMValueRef v_ptr = LLVMBuildGEP(b, v, &idx, 1, ""); | ^~~~ | LLVMBuildGEP2 ../../../../src/include/jit/llvmjit_emit.h:123:30: warning: initialization of ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion] ../../../../src/include/jit/llvmjit_emit.h:125:16: warning: returning ‘int’ from a function with return type ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} makes pointer from integer without a cast [-Wint-conversion] Regards Pavel
Re: broken master branch
On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule wrote: > on fresh Fedora 38, I cannot to run regress tests Looks like the new LLVM 16. I'll try to look at this again next week. In the meantime you could try using 15.
Re: broken master branch
so 8. 4. 2023 v 10:38 odesílatel Thomas Munro napsal: > On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule > wrote: > > on fresh Fedora 38, I cannot to run regress tests > > Looks like the new LLVM 16. I'll try to look at this again next week. > In the meantime you could try using 15. > ok Thank you for info Pavel
Re: Direct I/O
On Sat, Apr 8, 2023 at 4:59 PM Thomas Munro wrote: > On Sat, Apr 8, 2023 at 4:47 PM Thomas Munro wrote: > > After a bit more copy-editing on docs and comments and a round of > > automated indenting, I have now pushed this. I will now watch the > > build farm. I tested on quite a few OSes that I have access to, but > > this is obviously a very OS-sensitive kind of a thing. > > Hmm. I see a strange "invalid page" failure on Andrew's machine crake > in 004_io_direct.pl. Let's see what else comes in. No particular ideas about what happened there yet. It *looks* like we wrote out a page, and then read it back in very soon afterwards, all via the usual locked bufmgr/smgr pathways, and it failed basic page validation. The reader was a parallel worker, because of the debug_parallel_mode setting on that box. The page number looks reasonable (I guess it's reading a page created by the UPDATE full of new tuples, but I don't know which process wrote it). It's also not immediately obvious how this could be connected to the 32 pinned buffer problem (all that would have happened in the CREATE TABLE process which ended already before the UPDATE and then the SELECT backends even started). Andrew, what file system and type of disk is that machine using?
Re: Minimal logical decoding on standbys
Hi, On 2023-04-07 14:27:09 -0700, Andres Freund wrote: > I think I'll push these in a few hours. While this needed more changes than > I'd like shortly before the freeze, I think they're largely not in very > interesting bits and pieces - and this feature has been in the works for about > three eternities, and it is blocking a bunch of highly requested features. > > If anybody still has energy, I would appreciate a look at 0001, 0002, the new > pieces I added, to make what's now 0003 and 0004 cleaner. Pushed. Thanks all! I squashed some of the changes. There didn't seem to be a need for a separate stats commit, or a separate docs commit. Besides that, I did find plenty of grammar issues, and a bunch of formatting issues. Let's see what the buildfarm says. Greetings, Andres Freund
Re: longfin missing gssapi_ext.h
Hi, On 2023-04-07 22:50:18 -0400, Tom Lane wrote: > Or should credential delegation be viewed as an incremental feature that we > can support or not? That seems like the best way forward here. Greetings, Andres Freund
Re: longfin missing gssapi_ext.h
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > I'm open to considering support for older versions, however ... > > NetBSD 9.3, which is their *latest production release*, doesn't have > gssapi_ext.h [1]. For that matter, it doesn't look like their > not-yet-released 10.0BETA does either (my NetBSD 10 animals would > be failing if they had --with-gssapi). I do not think it's going > to be acceptable to require this feature. I'm certainly curious to understand what Kerberos library they're using and how they're maintaining it. At least some of the documentation I've found seems to indicate that it might be Heimdal, which does have gss_store_cred_into in gssapi.h. > I'm now going to reiterate my opinion that this patch should not > have been pushed in at this point of the dev cycle. A month ago, > there was time to deal with these sorts of issues. I suspected there would be an issue with OSX but hadn't expected an issue with NetBSD. I had tested this across a few Linux platforms and cfbot showed it wasn't causing issues on Windows or the platforms that are run there. Would be really great to have a way to test these things out on these other platforms other than just committing them and seeing what happens on the buildfarm. In any case, I've reverted it and we can pick this up for the next cycle. I'll play around with Heimdal locally as it appears to be available on Ubuntu (I had actually thought Heimdal to be mostly gone at this point since it only ever existed really due to silly export restrictions...) and see if there's actually anything other than making gssapi_ext.h itself be optional to be pulled in that's needed to make it work. Thanks, Stephen signature.asc Description: PGP signature
Re: longfin missing gssapi_ext.h
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2023-04-07 22:50:18 -0400, Tom Lane wrote: > > Or should credential delegation be viewed as an incremental feature that we > > can support or not? > > That seems like the best way forward here. Yeah, that's certainly doable too, though I'm really not sure we should be accepting OSX's GSSAPI library and that might really be the only case at issue here. Either way, I've reverted it and will see about picking it up for the next cycle (again) and hopefully be able to work through these issues either by having it be optional or, if NetBSD and Heimdal actually support all the APIs just with different headers, perhaps deciding we're willing to require them. Thanks, Stephen signature.asc Description: PGP signature
Re: daitch_mokotoff module
On 2023-04-07 Fr 23:25, Tom Lane wrote: I wrote: Anyway, I assume this is just syntactic sugar for something we can do another way? If it's at all fundamental, I'll have to back the patch out. On closer inspection, this script is completely devoid of any need to deal in non-ASCII data at all. So I just nuked the "use" lines. Yeah. I just spent a little while staring at the perl code. I have to say it seems rather opaque, the data structure seems a bit baroque. I'll try to simplify it. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: PostgreSQL 16 Release Management Team & Feature Freeze
On 4/6/23 5:37 PM, Jonathan S. Katz wrote: On 3/21/23 11:35 AM, Jonathan S. Katz wrote: Additionally, the RMT has set the feature freeze to be **April 8, 2023 at 0:00 AoE**[1]. This is the last time to commit features for PostgreSQL 16. In other words, no new PostgreSQL 16 feature can be committed after April 8, 2023 at 0:00 AoE. This is a reminder that feature freeze is rapidly approach. The freeze begins at April 8, 2023 at 0:00 AoE. No new PostgreSQL 16 features can be committed after this time. The feature freeze for PostgreSQL 16 has begun. Thank you everyone for all of your hard work and contributions towards this release! Now we can begin to prepare for the beta period. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Making background psql nicer to use in tap tests
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?= writes: > So what do you want me to do about grison and morepork? I guess I could > try to install a newer version of IPC::Run from CPAN or should I just > leave it be? I think "leave it be" is fine. regards, tom lane
Re: Partial aggregates pushdown
On Fri, Apr 7, 2023 at 09:16:14PM -0700, Andres Freund wrote: > On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote: > > > postgres_fdw has no business pushing down calls to non-builtin functions > > > unless the user has explicitly authorized that via the existing > > > whitelisting mechanism. I think you're reinventing the wheel, > > > and not very well. > > > > The patch has you assign an option at CREATE AGGREGATE time if it can do > > push down, and postgres_fdw checks that. What whitelisting mechanism > > are you talking about? async_capable? > > extensions (string) > > This option is a comma-separated list of names of PostgreSQL extensions > that are installed, in compatible versions, on both the local and remote > servers. Functions and operators that are immutable and belong to a listed > extension will be considered shippable to the remote server. This option can > only be specified for foreign servers, not per-table. > > When using the extensions option, it is the user's responsibility that > the listed extensions exist and behave identically on both the local and > remote servers. Otherwise, remote queries may fail or behave unexpectedly. Okay, this is very helpful --- it is exactly the issue we are dealing with --- how can we know if partial aggregate functions exists on the remote server. (I knew I was going to need API help on this.) So, let's remove the PARTIALAGG_MINVERSION option from the patch and just make it automatic --- we push down builtin partial aggregates if the remote server is the same or newer _major_ version than the sending server. For extensions, if people have older extensions on the same or newer foreign servers, they can adjust 'extensions' above. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
On Fri, 7 Apr 2023 at 08:05, Michael Paquier wrote: > > On Fri, Apr 07, 2023 at 08:59:22AM +0900, Michael Paquier wrote: > > Okay, cool! > > Done this one with 8fcb32d. Thanks a lot! I'll post the separation of record construction and write-out to xlog in a future thread for 17. One remaining question: Considering that the changes and checks of that commit are mostly internal to xloginsert.c (or xlog.c in older releases), and that no special public-facing changes were made, would it be safe to backport this to older releases? PostgreSQL 15 specifically would benefit from this as it supports external rmgrs which may generate WAL records and would benefit from these additional checks, but all supported releases of PostgreSQL have pg_logical_emit_message and are thus easily subject to the issue of writing oversized WAL records and subsequent recovery- and replication stream failures. Kind regards, Matthias van de Meent
Re: is_superuser is not documented
On Mon, Apr 3, 2023 at 10:47 AM Fujii Masao wrote: >Yes, the patch has not been committed yet because of lack of review comments. >Do you have any review comments on this patch? >Or you think it's ready for committer? I'm not very familiar with this code, so I'm not sure how much my review is worth, but maybe it will spark some discussion. > Yes, this patch moves the descriptions of is_superuser to config.sgml > and changes its group to PRESET_OPTIONS. is_superuser feels a little out of place in this file. All of the options here apply to the entire PostgreSQL server, while is_superuser only applies to the current session. The description of this file says : > These options report various aspects of PostgreSQL behavior that > might be of interest to certain applications, particularly > administrative front-ends. Most of them are determined when > PostgreSQL is compiled or when it is installed. Which doesn't seem to apply to is_superuser. It doesn't affect the behavior of PostgreSQL, only what the current session is allowed to do. It's also not determined when PostgreSQL is compiled/installed. Is there some update that we can make to the description that would make is_superuser fit in better? I'm not familiar with the origins of is_superuser and it may be too late for this, but it seems like is_superuser would fit in much better as a system information function [0] rather than a GUC. Particularly in the Session Information Functions. > - GUC_REPORT | GUC_NO_SHOW_ALL | GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE > + GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE This looks good to me. The lack of is_superuser from SHOW ALL has been a source of confusion to me in the past. As a side note server_version, server_encoding, lc_collate, and lc_ctype all appear in both the preset options section of config.sgml and in show.sgml. I'm not sure what the logic is for just including these three parameters in show.sgml, but I think we should either include all of the preset options or none of them. Thanks, Joe Koshakow [0] https://www.postgresql.org/docs/current/functions-info.html
Re: broken master branch
Thomas Munro writes: > On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule wrote: >> on fresh Fedora 38, I cannot to run regress tests > Looks like the new LLVM 16. I'll try to look at this again next week. > In the meantime you could try using 15. I've become entirely desensitized to seawasp failing, which is probably a bad thing, but today I happened to look at it and discovered that its compiler has been dumping core for some time now: clang: /home/fabien/llvm-src/llvm/lib/Transforms/Scalar/SROA.cpp:1745: llvm::Value* getAdjustedPtr({anonymous}::IRBuilderTy&, const llvm::DataLayout&, llvm::Value*, llvm::APInt, llvm::Type*, const llvm::Twine&): Assertion `Ptr->getType()->isOpaquePointerTy() && "Only opaque pointers supported"' failed. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script. Stack dump: 0. Program arguments: /home/fabien/clgtk/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Xclang -no-opaque-pointers -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -O2 -I../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -flto=thin -emit-llvm -c -o strftime.bc strftime.c 1. parser at end of file 2. Optimizer Seems like we ought to look into that, and report it as requested. regards, tom lane
Re: Commitfest 2023-03 starting tomorrow!
So here we are at the end of the CF: Status summary: March 15March 22March 28April 4 April 8 Needs review: 152 128 116 96 74 Waiting on Author: 42 36 30 21 14 Ready for Committer: 39 32 32 35 27 Committed: 61 82 94 101 124 Moved to next CF: 4 15 17 28 39 Withdrawn: 17 16 18 20 10 Rejected: 0 5 6 8 9 Returned with Feedback: 4 5 6 10 22 Total: 319. I'm now going to go through and: a) Mark Waiting on Author any patches that aren't building Waiting on Author. There was some pushback about asking authors to do trivial rebases but in three months it won't make sense to start a CF with already-non-applying patches. b) Mark any patches that received solid feedback in the last week with either Returned with Feedback or Rejected. I think this was already done though with 12 RwF and 1 Rejected in the past four days alone. c) Pick out the Bug Fixes, cleanup patches, and other non-feature patches that might be open issues for v16. d) Move to Next CF any patches that remain. -- greg
Re: Partial aggregates pushdown
On Sat, Apr 8, 2023 at 10:15:40AM -0400, Bruce Momjian wrote: > On Fri, Apr 7, 2023 at 09:16:14PM -0700, Andres Freund wrote: > > extensions (string) > > > > This option is a comma-separated list of names of PostgreSQL extensions > > that are installed, in compatible versions, on both the local and remote > > servers. Functions and operators that are immutable and belong to a listed > > extension will be considered shippable to the remote server. This option > > can only be specified for foreign servers, not per-table. > > > > When using the extensions option, it is the user's responsibility that > > the listed extensions exist and behave identically on both the local and > > remote servers. Otherwise, remote queries may fail or behave unexpectedly. > > Okay, this is very helpful --- it is exactly the issue we are dealing > with --- how can we know if partial aggregate functions exists on the > remote server. (I knew I was going to need API help on this.) > > So, let's remove the PARTIALAGG_MINVERSION option from the patch and > just make it automatic --- we push down builtin partial aggregates if > the remote server is the same or newer _major_ version than the sending > server. For extensions, if people have older extensions on the same or > newer foreign servers, they can adjust 'extensions' above. Looking further, I don't see any cases where we check if a builtin function added in a major release also exists on the foreign server, so maybe we don't need any checks but just need a mention in the release notes. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Embrace your flaws. They make you human, rather than perfect, which you will never be.
Re: Making background psql nicer to use in tap tests
On 2023-04-08 09:53, Daniel Gustafsson wrote: I recommend just setting up this one test to SKIP if IPC::Run is too old. Yes, will do that when I have a little more time at hand for monitoring the BF later today. So what do you want me to do about grison and morepork? I guess I could try to install a newer version of IPC::Run from CPAN or should I just leave it be? /Mikael
Re: Parallel Full Hash Join
Thomas Munro writes: > I committed the main patch. BTW, it was easy to miss in all the buildfarm noise from last-possible-minute patches, but chimaera just showed something that looks like a bug in this code [1]: 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG: statement: savepoint settings; 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG: statement: set local max_parallel_workers_per_gather = 2; 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG: statement: explain (costs off) select count(*) from simple r full outer join simple s on (r.id = 0 - s.id); 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG: statement: select count(*) from simple r full outer join simple s on (r.id = 0 - s.id); TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: "nodeHash.c", Line: 2118, PID: 19147 postgres: parallel worker for PID 18027 (ExceptionalCondition+0x84)[0x10ae2bfa4] postgres: parallel worker for PID 18027 (ExecParallelPrepHashTableForUnmatched+0x224)[0x10aa67544] postgres: parallel worker for PID 18027 (+0x3db868)[0x10aa6b868] postgres: parallel worker for PID 18027 (+0x3c4204)[0x10aa54204] postgres: parallel worker for PID 18027 (+0x3c81b8)[0x10aa581b8] postgres: parallel worker for PID 18027 (+0x3b3d28)[0x10aa43d28] postgres: parallel worker for PID 18027 (standard_ExecutorRun+0x208)[0x10aa39768] postgres: parallel worker for PID 18027 (ParallelQueryMain+0x2bc)[0x10aa4092c] postgres: parallel worker for PID 18027 (ParallelWorkerMain+0x660)[0x10a874870] postgres: parallel worker for PID 18027 (StartBackgroundWorker+0x2a8)[0x10ab8abf8] postgres: parallel worker for PID 18027 (+0x50290c)[0x10ab9290c] postgres: parallel worker for PID 18027 (+0x5035e4)[0x10ab935e4] postgres: parallel worker for PID 18027 (PostmasterMain+0x1304)[0x10ab96334] postgres: parallel worker for PID 18027 (main+0x86c)[0x10a79daec] regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chimaera&dt=2023-04-08%2012%3A07%3A08
Re: Minimal logical decoding on standbys
On 4/8/23 5:27 AM, Andres Freund wrote: Hi, On 2023-04-07 14:27:09 -0700, Andres Freund wrote: I think I'll push these in a few hours. While this needed more changes than I'd like shortly before the freeze, I think they're largely not in very interesting bits and pieces - and this feature has been in the works for about three eternities, and it is blocking a bunch of highly requested features. If anybody still has energy, I would appreciate a look at 0001, 0002, the new pieces I added, to make what's now 0003 and 0004 cleaner. Pushed. Thanks all! I squashed some of the changes. There didn't seem to be a need for a separate stats commit, or a separate docs commit. Besides that, I did find plenty of grammar issues, and a bunch of formatting issues. Let's see what the buildfarm says. Thanks to everyone for working on this feature -- this should have a big impact on users of logical replication! While it still needs to get through the beta period etc. this is a big milestone for what's been a multi-year effort to support this. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: Parallel Full Hash Join
On Sat, Apr 8, 2023 at 12:33 PM Tom Lane wrote: > > Thomas Munro writes: > > I committed the main patch. > > BTW, it was easy to miss in all the buildfarm noise from > last-possible-minute patches, but chimaera just showed something > that looks like a bug in this code [1]: > > 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG: statement: > savepoint settings; > 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG: statement: > set local max_parallel_workers_per_gather = 2; > 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG: statement: > explain (costs off) > select count(*) from simple r full outer join simple s on (r.id > = 0 - s.id); > 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG: statement: > select count(*) from simple r full outer join simple s on (r.id = 0 - s.id); > TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: > "nodeHash.c", Line: 2118, PID: 19147 > postgres: parallel worker for PID 18027 > (ExceptionalCondition+0x84)[0x10ae2bfa4] > postgres: parallel worker for PID 18027 > (ExecParallelPrepHashTableForUnmatched+0x224)[0x10aa67544] > postgres: parallel worker for PID 18027 (+0x3db868)[0x10aa6b868] > postgres: parallel worker for PID 18027 (+0x3c4204)[0x10aa54204] > postgres: parallel worker for PID 18027 (+0x3c81b8)[0x10aa581b8] > postgres: parallel worker for PID 18027 (+0x3b3d28)[0x10aa43d28] > postgres: parallel worker for PID 18027 > (standard_ExecutorRun+0x208)[0x10aa39768] > postgres: parallel worker for PID 18027 (ParallelQueryMain+0x2bc)[0x10aa4092c] > postgres: parallel worker for PID 18027 > (ParallelWorkerMain+0x660)[0x10a874870] > postgres: parallel worker for PID 18027 > (StartBackgroundWorker+0x2a8)[0x10ab8abf8] > postgres: parallel worker for PID 18027 (+0x50290c)[0x10ab9290c] > postgres: parallel worker for PID 18027 (+0x5035e4)[0x10ab935e4] > postgres: parallel worker for PID 18027 (PostmasterMain+0x1304)[0x10ab96334] > postgres: parallel worker for PID 18027 (main+0x86c)[0x10a79daec] Having not done much debugging on buildfarm animals before, I don't suppose there is any way to get access to the core itself? I'd like to see how many participants the batch barrier had at the time of the assertion failure. I assume it was 2, but I just wanted to make sure I understand the race. - Melanie
Re: longfin missing gssapi_ext.h
Stephen Frost writes: > I suspected there would be an issue with OSX but hadn't expected an > issue with NetBSD. I had tested this across a few Linux platforms and > cfbot showed it wasn't causing issues on Windows or the platforms that > are run there. Would be really great to have a way to test these things > out on these other platforms other than just committing them and seeing > what happens on the buildfarm. I poked around a bit more and found that: * NetBSD's package collection[1] includes both Heimdal and MIT Kerberos (mit-krb5). Apparently what's installed on at least some of the buildfarm animals is the former. * FreeBSD seems to offer *only* Heimdal [2]; OpenBSD ditto [3]. * I cannot find any sign of either gss_store_cred_into or gssapi_ext.h in FreeBSD's Heimdal (7.8.0_6). So it does not look like supporting Heimdal is going to be optional, and that means the credential delegation feature is going to have to be optional, or else we need to find some equivalent Heimdal APIs. I share your feeling that we could probably blow off Apple's built-in GSSAPI. MacPorts offers both Heimdal and kerberos5, and I imagine Homebrew has at least one of them, so Mac people could easily get hold of newer implementations. But the BSDen are going to be a problem. regards, tom lane [1] https://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/security/index.html [2] https://ports.freebsd.org/cgi/ports.cgi?query=kerberos&stype=all&sektion=all [3] https://cdn.openbsd.org/pub/OpenBSD/snapshots/packages/amd64/
Re: Parallel Full Hash Join
Melanie Plageman writes: > Having not done much debugging on buildfarm animals before, I don't > suppose there is any way to get access to the core itself? I'd like to > see how many participants the batch barrier had at the time of the > assertion failure. I assume it was 2, but I just wanted to make sure I > understand the race. I don't know about chimaera in particular, but buildfarm animals are not typically configured to save any build products. They'd run out of disk space after awhile :-(. If you think the number of participants would be useful data, I'd suggest replacing that Assert with an elog() that prints what you want to know. regards, tom lane
Re: longfin missing gssapi_ext.h
Greetings, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > I suspected there would be an issue with OSX but hadn't expected an > > issue with NetBSD. I had tested this across a few Linux platforms and > > cfbot showed it wasn't causing issues on Windows or the platforms that > > are run there. Would be really great to have a way to test these things > > out on these other platforms other than just committing them and seeing > > what happens on the buildfarm. > > I poked around a bit more and found that: > > * NetBSD's package collection[1] includes both Heimdal and MIT Kerberos > (mit-krb5). Apparently what's installed on at least some of the buildfarm > animals is the former. > > * FreeBSD seems to offer *only* Heimdal [2]; OpenBSD ditto [3]. > > * I cannot find any sign of either gss_store_cred_into or gssapi_ext.h > in FreeBSD's Heimdal (7.8.0_6). > > So it does not look like supporting Heimdal is going to be optional, > and that means the credential delegation feature is going to have > to be optional, or else we need to find some equivalent Heimdal APIs. Thanks for doing that digging! I've been looking too and while Heimdal added gss_store_cred_into in their development branch 5 years ago[1] (!), it's not made it into an actual release. Good that they seem to at least be maintaining it enough to deal with CVEs, but unfortunately I'm fairly confident that there won't be a way to support constrained delegation (which is the next goal, once unconstrained delegation is in and working) on the Heimdal platforms. I suspected that would have to be optional anyway, but I hadn't expected it to hit all the BSD platforms. In any case, for this I'm working switching over to gss_store_cred() which does seem to be available in the Heimdal Debian packages that I was able to install locally (looks to be 7.7.0) and should work just fine for these purposes, though it requires a bit more work on the libpq side as we need to tell libpq explicitly the name which was on the delegated credential when we call gss_acquire_cred(). Once that's done, should be able to drop the gssapi_ext.h include entirely and still have the test suite able to run with MIT Kerberos. One thing I'm on the fence about is trying to make the test suite actually work with Heimdal.. I'm planning to install the Heimdal KDC, et al, and see what happens but if it ends up looking like it's a lot of work then I might forgo that effort. I'm not sure it's really necessary but I could be argued out of that position without too much effort. The stated Heimdal goal is to be a re-implementation of MIT Kerberos and these are all documented APIs with RFCs, after all. > I share your feeling that we could probably blow off Apple's built-in > GSSAPI. MacPorts offers both Heimdal and kerberos5, and I imagine > Homebrew has at least one of them, so Mac people could easily get > hold of newer implementations. But the BSDen are going to be a > problem. Yeah. Unfortunate that Heimdal doesn't seem to really be moving forward in terms of new development. Thanks, Stephen [1] https://github.com/heimdal/heimdal/commit/e0bb9c10cad0fd98245caecf8af8fca855b2df49 signature.asc Description: PGP signature
Re: Direct I/O
Hi, On 2023-04-07 23:04:08 -0700, Andres Freund wrote: > There were some failures in CI (e.g. [1] (and perhaps also bf, didn't yet > check), about "no unpinned buffers available". I was worried for a moment > that this could actually be relation to the bulk extension patch. > > But it looks like it's older - and not caused by direct_io support (except by > way of the test existing). I reproduced the issue locally by setting s_b even > lower, to 16 and made the ERROR a PANIC. > > [backtrace] > > If you look at log_newpage_range(), it's not surprising that we get this error > - it pins up to 32 buffers at once. > > Afaics log_newpage_range() originates in 9155580fd5fc, but this caller is from > c6b92041d385. > > > It doesn't really seem OK to me to unconditionally pin 32 buffers. For the > relation extension patch I introduced LimitAdditionalPins() to deal with this > concern. Perhaps it needs to be exposed and log_newpage_buffers() should use > it? > > > Do we care about fixing this in the backbranches? Probably not, given there > haven't been user complaints? Here's a quick prototype of this approach. If we expose LimitAdditionalPins(), we'd probably want to add "Buffer" to the name, and pass it a relation, so that it can hand off LimitAdditionalLocalPins() when appropriate? The callsite in question doesn't need it, but ... Without the limiting of pins the modified 004_io_direct.pl fails 100% of the time for me. Presumably the reason it fails occasionally with 256kB of shared buffers (i.e. NBuffers=32) is that autovacuum or checkpointer briefly pins a single buffer. As log_newpage_range() thinks it can just pin 32 buffers unconditionally, it fails in that case. Greetings, Andres Freund diff --git i/src/include/storage/bufmgr.h w/src/include/storage/bufmgr.h index 6ab00daa2ea..e5788309c86 100644 --- i/src/include/storage/bufmgr.h +++ w/src/include/storage/bufmgr.h @@ -223,6 +223,7 @@ extern void DropRelationBuffers(struct SMgrRelationData *smgr_reln, extern void DropRelationsAllBuffers(struct SMgrRelationData **smgr_reln, int nlocators); extern void DropDatabaseBuffers(Oid dbid); +extern void LimitAdditionalPins(uint32 *additional_pins); #define RelationGetNumberOfBlocks(reln) \ RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM) diff --git i/src/backend/access/transam/xloginsert.c w/src/backend/access/transam/xloginsert.c index e2a5a3d13ba..2189fc9f71f 100644 --- i/src/backend/access/transam/xloginsert.c +++ w/src/backend/access/transam/xloginsert.c @@ -1268,8 +1268,8 @@ log_newpage_range(Relation rel, ForkNumber forknum, /* * Iterate over all the pages in the range. They are collected into - * batches of XLR_MAX_BLOCK_ID pages, and a single WAL-record is written - * for each batch. + * batches of up to XLR_MAX_BLOCK_ID pages, and a single WAL-record is + * written for each batch. */ XLogEnsureRecordSpace(XLR_MAX_BLOCK_ID - 1, 0); @@ -1278,14 +1278,18 @@ log_newpage_range(Relation rel, ForkNumber forknum, { Buffer bufpack[XLR_MAX_BLOCK_ID]; XLogRecPtr recptr; - int nbufs; + uint32 limit = XLR_MAX_BLOCK_ID; + int nbufs; int i; CHECK_FOR_INTERRUPTS(); + /* avoid running out pinnable buffers */ + LimitAdditionalPins(&limit); + /* Collect a batch of blocks. */ nbufs = 0; - while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk) + while (nbufs < limit && blkno < endblk) { Buffer buf = ReadBufferExtended(rel, forknum, blkno, RBM_NORMAL, NULL); diff --git i/src/backend/storage/buffer/bufmgr.c w/src/backend/storage/buffer/bufmgr.c index 7778dde3e57..31c75d6240e 100644 --- i/src/backend/storage/buffer/bufmgr.c +++ w/src/backend/storage/buffer/bufmgr.c @@ -1742,7 +1742,7 @@ again: * pessimistic, but outside of toy-sized shared_buffers it should allow * sufficient pins. */ -static void +void LimitAdditionalPins(uint32 *additional_pins) { uint32 max_backends; diff --git i/src/test/modules/test_misc/t/004_io_direct.pl w/src/test/modules/test_misc/t/004_io_direct.pl index f5bf0b11e4e..5791c2ab7bd 100644 --- i/src/test/modules/test_misc/t/004_io_direct.pl +++ w/src/test/modules/test_misc/t/004_io_direct.pl @@ -22,7 +22,7 @@ $node->init; $node->append_conf( 'postgresql.conf', qq{ io_direct = 'data,wal,wal_init' -shared_buffers = '256kB' # tiny to force I/O +shared_buffers = '16' # tiny to force I/O }); $node->start;
Re: Parallel Full Hash Join
On Sat, Apr 8, 2023 at 1:30 PM Melanie Plageman wrote: > > On Sat, Apr 8, 2023 at 12:33 PM Tom Lane wrote: > > > > Thomas Munro writes: > > > I committed the main patch. > > > > BTW, it was easy to miss in all the buildfarm noise from > > last-possible-minute patches, but chimaera just showed something > > that looks like a bug in this code [1]: > > > > 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG: > > statement: savepoint settings; > > 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG: > > statement: set local max_parallel_workers_per_gather = 2; > > 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG: > > statement: explain (costs off) > > select count(*) from simple r full outer join simple s on > > (r.id = 0 - s.id); > > 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG: > > statement: select count(*) from simple r full outer join simple s on (r.id > > = 0 - s.id); > > TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), > > File: "nodeHash.c", Line: 2118, PID: 19147 So, after staring at this for awhile, I suspect this assertion is just plain wrong. BarrierArriveAndDetachExceptLast() contains this code: if (barrier->participants > 1) { --barrier->participants; SpinLockRelease(&barrier->mutex); return false; } Assert(barrier->participants == 1); So in between this assertion and the one we tripped, if (!BarrierArriveAndDetachExceptLast(&batch->batch_barrier)) { ... return false; } /* Now we are alone with this batch. */ Assert(BarrierPhase(&batch->batch_barrier) == PHJ_BATCH_SCAN); Assert(BarrierParticipants(&batch->batch_barrier) == 1); Another worker attached to the batch barrier, saw that it was in PHJ_BATCH_SCAN, marked it done and detached. This is fine. BarrierArriveAndDetachExceptLast() is meant to ensure no one waits (deadlock hazard) and that at least one worker stays to do the unmatched scan. It doesn't hurt anything for another worker to join and find out there is no work to do. We should simply delete this assertion. - Melanie
Re: longfin missing gssapi_ext.h
I wrote: > * NetBSD's package collection[1] includes both Heimdal and MIT Kerberos > (mit-krb5). Apparently what's installed on at least some of the buildfarm > animals is the former. Oh! New data: the core NetBSD OS includes a copy of Heimdal (looks to be 7.7.0 in the 10.0_BETA sources). The installable package is a slightly newer version, 7.8.0, but I think it's a very solid bet that the relevant buildfarm animals are just using the core copy and haven't installed the add-on package. Even if they had, it would take some fooling around with include and link paths to pull in the packaged version rather than the built-in one. The exact same thing applies to FreeBSD, except that their in-core Heimdal is ancient (1.5.2). Also, they do have MIT Kerberos available as a package [1]. I'd been misled by the lack of a hit on "kerberos", but "krb5" finds it. Our code does compile against that version of Heimdal, but src/test/kerberos/ refuses to try to run. I've not dug into OpenBSD any further. regards, tom lane [1] https://ports.freebsd.org/cgi/ports.cgi?query=krb5&stype=all&sektion=all
Re: Direct I/O
Hi, Given the frequency of failures on this in the buildfarm, I propose using the temporary workaround of using wal_level=replica. That avoids the use of the over-eager log_newpage_range(). Greetings, Andres Freund
When to drop src/tools/msvc support
Hi, I'd planned to write this soon anyway, but it was just brought up in [1]. Originally we had planned to drop src/tools/msvc support shortly after meson went in. Unfortunately, it took a bit longer than originally hoped for to merge meson support and then longer than hoped to add buildfarm support. I don't think there's been any buildfarm client release with meson support yet - but we do have windows buildfarm animals using it. Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in 17? I do have a set of patches removing src/tools/msvc. There are a few loose ends that I know of (my eyes glaze over every time I try to reconcile the src/tools/perl.m4 comments with the referenced comments in Mkvcbuild.pm), and probably more small references that grep terms didn't find. Greetings, Andres Freund [1] https://postgr.es/m/3598664.1680976414%40sss.pgh.pa.us
Re: When to drop src/tools/msvc support
On 4/8/23 3:10 PM, Andres Freund wrote: Hi, I'd planned to write this soon anyway, but it was just brought up in [1]. Originally we had planned to drop src/tools/msvc support shortly after meson went in. Unfortunately, it took a bit longer than originally hoped for to merge meson support and then longer than hoped to add buildfarm support. I don't think there's been any buildfarm client release with meson support yet - but we do have windows buildfarm animals using it. Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in 17? I do have a set of patches removing src/tools/msvc. There are a few loose ends that I know of (my eyes glaze over every time I try to reconcile the src/tools/perl.m4 comments with the referenced comments in Mkvcbuild.pm), and probably more small references that grep terms didn't find. (reads [1]) Can we treat this as an "open item" for completing the transition to meson for builds as part of v16? With my personal hat on, it seems silly to wait until v17 to do this, and I don't see why we would want to wait. If there's limited risk in doing this and it'll make our builds both more stable + faster, it seems like we should just do it. Thanks, Jonathan OpenPGP_signature Description: OpenPGP digital signature
Re: When to drop src/tools/msvc support
Andres Freund writes: > Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in > 17? On the one hand, it feels like something we shouldn't do after feature freeze. On the other hand, continuing to maintain three build systems is a real drag (although you could argue that there shouldn't be much churn there until the tree opens for 17). We clearly can't consider it in any case until the buildfarm is prepared, with all the Windows animals updated to a compatible client script. I don't know what timeline Andrew has in mind for that. I guess I'd vote for pulling the trigger in v16 if we can get that done by the end of April. Once we're close to beta I think it must wait for v17 to open. regards, tom lane
Re: Direct I/O
On Sun, Apr 9, 2023 at 6:55 AM Andres Freund wrote: > Given the frequency of failures on this in the buildfarm, I propose using the > temporary workaround of using wal_level=replica. That avoids the use of the > over-eager log_newpage_range(). Will do.
Re: Direct I/O
Thomas Munro writes: > On Sun, Apr 9, 2023 at 6:55 AM Andres Freund wrote: >> Given the frequency of failures on this in the buildfarm, I propose using the >> temporary workaround of using wal_level=replica. That avoids the use of the >> over-eager log_newpage_range(). > Will do. Now crake is doing this: 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: select count(*) from t1 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT: select count(*) from t1 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: background worker "parallel worker" (PID 3257646) exited with exit code 1 The fact that the error is happening in a parallel worker seems interesting ... (BTW, why are the log lines doubly timestamped?) regards, tom lane
Re: Direct I/O
On Sun, Apr 9, 2023 at 9:10 AM Tom Lane wrote: > 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] > 004_io_direct.pl LOG: statement: select count(*) from t1 > 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: > invalid page in block 56 of relation base/5/16384 > The fact that the error is happening in a parallel worker seems > interesting ... That's because it's running with debug_parallel_query=regress. I've been trying to repro that but no luck... A different kind of failure also showed up, where it counted the wrong number of tuples: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2023-04-08%2015%3A52%3A03 A paranoid explanation would be that this system is failing to provide basic I/O coherency, we're writing pages out and not reading them back in. Or of course there is a dumb bug... but why only here? Can of course be timing-sensitive and it's interesting that crake suffers from the "no unpinned buffers available" thing (which should now be gone) with higher frequency; I'm keen to see if the dodgy-read problem continues with a similar frequency now.
Re: Direct I/O
Hi, On 2023-04-08 17:10:19 -0400, Tom Lane wrote: > Thomas Munro writes: > Now crake is doing this: > > 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] > 004_io_direct.pl LOG: statement: select count(*) from t1 > 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: > invalid page in block 56 of relation base/5/16384 > 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: > select count(*) from t1 > 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] > 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384 > 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] > 004_io_direct.pl STATEMENT: select count(*) from t1 > 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: > background worker "parallel worker" (PID 3257646) exited with exit code 1 > > The fact that the error is happening in a parallel worker seems > interesting ... There were a few prior instances of that error. One that I hadn't seen before is this: [11:35:07.190](0.001s) # Failed test 'read back from shared' # at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43. [11:35:07.190](0.000s) # got: '1' # expected: '10098' For one it points to the arguments to is() being switched around, but that's a sideshow. > (BTW, why are the log lines doubly timestamped?) It's odd. It's also odd that it's just crake having the issue. It's just a linux host, afaics. Andrew, is there any chance you can run that test in isolation and see whether it reproduces? If so, does the problem vanish, if you comment out the io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or whether it's an independent issue exposed by the new test. I wonder if we should make the test use data checksum - if we continue to see the wrong query results, the corruption is more likely to be in memory. Greetings, Andres Freund
Re: Direct I/O
Andres Freund writes: > On 2023-04-08 17:10:19 -0400, Tom Lane wrote: >> (BTW, why are the log lines doubly timestamped?) > It's odd. Oh, I guess that's intentional, because crake has 'log_line_prefix = \'%m [%s %p:%l] %q%a \'', > It's also odd that it's just crake having the issue. It's just a linux host, > afaics. Indeed. I'm guessing from the compiler version that it's Fedora 37 now (the lack of such basic information in the meson configuration output is pretty annoying). I've been trying to repro it here on an F37 box, with no success, suggesting that it's very timing sensitive. Or maybe it's inside a VM and that matters? regards, tom lane
Re: Direct I/O
Hi, On 2023-04-08 17:31:02 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2023-04-08 17:10:19 -0400, Tom Lane wrote: > > It's also odd that it's just crake having the issue. It's just a linux host, > > afaics. > > Indeed. I'm guessing from the compiler version that it's Fedora 37 now The 15 branch says: hostname = neoemma uname -m = x86_64 uname -r = 6.2.8-100.fc36.x86_64 uname -s = Linux uname -v = #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023 So at least the kernel claims to be 36... > (the lack of such basic information in the meson configuration output > is pretty annoying). Yea, I was thinking yesterday that we should add uname output to meson's configure (if available). I'm sure we can figure out a reasonably fast windows command for the version, too. > I've been trying to repro it here on an F37 box, with no success, suggesting > that it's very timing sensitive. Or maybe it's inside a VM and that > matters? Could also be filesystem specific? Greetings, Andres Freund
Re: Direct I/O
On 2023-04-08 Sa 17:42, Andres Freund wrote: Hi, On 2023-04-08 17:31:02 -0400, Tom Lane wrote: Andres Freund writes: On 2023-04-08 17:10:19 -0400, Tom Lane wrote: It's also odd that it's just crake having the issue. It's just a linux host, afaics. Indeed. I'm guessing from the compiler version that it's Fedora 37 now The 15 branch says: hostname = neoemma uname -m = x86_64 uname -r = 6.2.8-100.fc36.x86_64 uname -s = Linux uname -v = #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023 So at least the kernel claims to be 36... (the lack of such basic information in the meson configuration output is pretty annoying). Yea, I was thinking yesterday that we should add uname output to meson's configure (if available). I'm sure we can figure out a reasonably fast windows command for the version, too. I've been trying to repro it here on an F37 box, with no success, suggesting that it's very timing sensitive. Or maybe it's inside a VM and that matters? Could also be filesystem specific? I migrated it in February from a VM to a non-virtual instance. Almost nothing else runs on the machine. The personality info shown on the BF server is correct. andrew@neoemma:~ $ cat /etc/fedora-release Fedora release 36 (Thirty Six) andrew@neoemma:~ $ uname -a Linux neoemma 6.2.8-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux andrew@neoemma:~ $ gcc --version gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4) andrew@neoemma:~ $ mount | grep home /dev/mapper/luks-xxx on /home type btrfs (rw,relatime,seclabel,compress=zstd:1,ssd,discard=async,space_cache,subvolid=256,subvol=/home) I guess it could be btrfs-specific. I'll be somewhat annoyed if I have to re-init the machine to use something else. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Direct I/O
On Sun, Apr 9, 2023 at 10:08 AM Andrew Dunstan wrote: > btrfs Aha!
Re: Direct I/O
On 2023-04-08 Sa 17:23, Andres Freund wrote: Hi, On 2023-04-08 17:10:19 -0400, Tom Lane wrote: Thomas Munro writes: Now crake is doing this: 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 004_io_direct.pl LOG: statement: select count(*) from t1 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR: invalid page in block 56 of relation base/5/16384 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT: select count(*) from t1 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 004_io_direct.pl ERROR: invalid page in block 56 of relation base/5/16384 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 004_io_direct.pl STATEMENT: select count(*) from t1 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG: background worker "parallel worker" (PID 3257646) exited with exit code 1 The fact that the error is happening in a parallel worker seems interesting ... There were a few prior instances of that error. One that I hadn't seen before is this: [11:35:07.190](0.001s) # Failed test 'read back from shared' # at /home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl line 43. [11:35:07.190](0.000s) # got: '1' # expected: '10098' For one it points to the arguments to is() being switched around, but that's a sideshow. It's also odd that it's just crake having the issue. It's just a linux host, afaics. Andrew, is there any chance you can run that test in isolation and see whether it reproduces? If so, does the problem vanish, if you comment out the io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or whether it's an independent issue exposed by the new test. I wonder if we should make the test use data checksum - if we continue to see the wrong query results, the corruption is more likely to be in memory. I can run the test in isolation, and it's get an error reliably. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: longfin missing gssapi_ext.h
On Sun, Apr 9, 2023 at 6:40 AM Tom Lane wrote: > The exact same thing applies to FreeBSD, except that their in-core > Heimdal is ancient (1.5.2). Also, they do have MIT Kerberos > available as a package [1]. I'd been misled by the lack of a hit > on "kerberos", but "krb5" finds it. Our code does compile against > that version of Heimdal, but src/test/kerberos/ refuses to try to > run. FWIW my FBSD animal elver has krb5 installed. Sorry it wasn't running when the relevant commit landed. Stupid network cable wriggled out.
Re: Direct I/O
On Sun, Apr 9, 2023 at 10:17 AM Andrew Dunstan wrote: > I can run the test in isolation, and it's get an error reliably. Random idea: it looks like you have compression enabled. What if you turn it off in the directory where the test runs? Something like btrfs property set compression ... according to the intergoogles. (I have never used btrfs before 6 minutes ago but I can't seem to repro this with basic settings in a loopback btrfs filesystems).
Re: Direct I/O
Thomas Munro writes: > On Sun, Apr 9, 2023 at 10:08 AM Andrew Dunstan wrote: >> btrfs > Aha! Googling finds a lot of suggestions that O_DIRECT doesn't play nice with btrfs, for example https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg92824.html It's not clear to me how much of that lore is still current, but it's disturbing. regards, tom lane
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
On Wed, 22 Feb 2023 at 13:38, Kirk Wolak wrote: > > I already requested ONLY the HH24 format. 8 characters of output. no > options. It's a waste of time. > After all these years, sqlplus still has only one setting (show it, or not). > I am asking the same here. > And I will gladly defend not changing it! Ever! Yeah, well, it's kind of beside the point that you're satisfied with this one format. We tend to think about what all users would expect and what a complete feature would look like. I actually tend to think this would be a nice feature. It's telling that log files and other tracing tools tend to produce exactly this type of output with every line prefixed with either a relative or absolute timestamp. I'm not sure if the *prompt* is a sensible place for it though. The place it seems like it would be most useful is reading the output of script executions where there would be no prompts. Perhaps it's the command tags and \echo statements that should be timestamped. And I think experience shows that there are three reasonable formats for dates, the default LC_TIME format, ISO8601, and a relative "seconds (with milliseconds) since starting". I think having a feature that doesn't support those three would feel incomplete and eventually need to be finished. -- greg
Re: Commitfest 2023-03 starting tomorrow!
On Sat, 8 Apr 2023 at 11:37, Greg Stark wrote: > > c) Pick out the Bug Fixes, cleanup patches, and other non-feature > patches that might be open issues for v16. So on further examination it seems there are multiple category of patches that are worth holding onto rather than shifting to the next release: * Bug Fixes * Documentation patches * Build system patches, especially meson-related patches * Testing patches, especially if they're testing new features * Patches that are altering new features that were committed in this release Some of these, especially the last category, are challenging for me to determine. If I move forward a patch of yours that you think makes sense to treat as an open issue that should be resolved in this release then feel free to say. -- Gregory Stark As Commitfest Manager
Re: Commitfest 2023-03 starting tomorrow!
"Gregory Stark (as CFM)" writes: > Some of these, especially the last category, are challenging for me to > determine. If I move forward a patch of yours that you think makes > sense to treat as an open issue that should be resolved in this > release then feel free to say. I think that's largely independent. We don't look back at closed-out CFs as a kind of TODO list; anything that's left behind there is basically never going to be seen again, until the author makes a new CF entry. Anything that's to be treated as an open item for v16 should get added to the wiki page at https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items It's not necessary that a patch exist to do that. regards, tom lane
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
Greg Stark writes: > I'm not sure if the *prompt* is a sensible place for it though. The > place it seems like it would be most useful is reading the output of > script executions where there would be no prompts. Perhaps it's the > command tags and \echo statements that should be timestamped. Hmm, that is an interesting idea. I kind of like it, not least because it eliminates most of the tension between wanting a complete timestamp and wanting a short prompt. Command tags are short enough that there's plenty of room. > And I think experience shows that there are three reasonable formats > for dates, the default LC_TIME format, ISO8601, and a relative > "seconds (with milliseconds) since starting". I think having a feature > that doesn't support those three would feel incomplete and eventually > need to be finished. Yeah, I don't believe that one timestamp format is going to satisfy everyone. But that was especially true when trying to wedge it into the prompt, where the need for brevity adds more constraints. regards, tom lane
Re: Direct I/O
On Sun, Apr 9, 2023 at 11:05 AM Tom Lane wrote: > Googling finds a lot of suggestions that O_DIRECT doesn't play nice > with btrfs, for example > > https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg92824.html > > It's not clear to me how much of that lore is still current, > but it's disturbing. I think that particular thing might relate to modifications of the user buffer while a write is in progress (breaking btrfs's internal checksums). I don't think we should ever do that ourselves (not least because it'd break our own checksums). We lock the page during the write so no one can do that, and then we sleep in a synchronous syscall. Here's something recent. I guess it's probably not relevant (a fault on our buffer that we recently touched sounds pretty unlikely), but who knows... (developer lists for file systems are truly terrifying places to drive through). https://lore.kernel.org/linux-btrfs/20230315195231.gw10...@twin.jikos.cz/T/ It's odd, though, if it is their bug and not ours: I'd expect our friends in other databases to have hit all that sort of thing years ago, since many comparable systems have a direct I/O knob*. What are we doing differently? Are our multiple processes a factor here, breaking some coherency logic? Unsurprisingly, having compression on as Andrew does actually involves buffering anyway[1] despite our O_DIRECT flag, but maybe that's saying writes are buffered but reads are still direct (?), which sounds like the sort of initial conditions that might produce a coherency bug. I dunno. I gather that btrfs is actually Fedora's default file system (or maybe it's just "laptops and desktops"[2]?). I wonder if any of the several green Fedora systems in the 'farm are using btrfs. I wonder if they are using different mount options (thinking again of compression). *Probably a good reason to add a more prominent warning that the feature is developer-only, experimental and not for production use. I'm thinking a warning at startup or something. [1] https://btrfs.readthedocs.io/en/latest/Compression.html [2] https://fedoraproject.org/wiki/Changes/BtrfsByDefault
RE: PGDOCS - function pg_get_publication_tables is not documented?
On Fri, Mar 24, 2023 6:26 AM Tom Lane wrote: > > I do see a docs change that I think would be worth making: get > rid of the explicit mention of it in create_subscription.sgml > in favor of using that view. > I agree and I tried to modify the query to use the view. Please see the attached patch. Regards, Shi Yu v1-0001-Avoid-using-pg_get_publication_tables-in-create_s.patch Description: v1-0001-Avoid-using-pg_get_publication_tables-in-create_s.patch
Re: Direct I/O
Thomas Munro writes: > It's odd, though, if it is their bug and not ours: I'd expect our > friends in other databases to have hit all that sort of thing years > ago, since many comparable systems have a direct I/O knob*. Yeah, it seems moderately likely that it's our own bug ... but this code's all file-system-ignorant, so how? Maybe we are breaking some POSIX rule that btrfs exploits but others don't? > I gather that btrfs is actually Fedora's default file system (or maybe > it's just "laptops and desktops"[2]?). I have a ton of Fedora images laying about, and I doubt that any of them use btrfs, mainly because that's not the default in the "server spin" which is what I usually install from. It's hard to guess about the buildfarm, but it wouldn't surprise me that most of them are on xfs. (If we haven't figured this out pretty shortly, I'm probably going to put together a btrfs-on-bare-metal machine to try to duplicate crake's results.) regards, tom lane
Re: Direct I/O
Hi, On 2023-04-09 13:55:33 +1200, Thomas Munro wrote: > I think that particular thing might relate to modifications of the > user buffer while a write is in progress (breaking btrfs's internal > checksums). I don't think we should ever do that ourselves (not least > because it'd break our own checksums). We lock the page during the > write so no one can do that, and then we sleep in a synchronous > syscall. Oh, but we actually *do* modify pages while IO is going on. I wonder if you hit the jack pot here. The content lock doesn't prevent hint bit writes. That's why we copy the page to temporary memory when computing checksums. I think we should modify the test to enable checksums - if the problem goes away, then it's likely to be related to modifying pages while an O_DIRECT write is ongoing... Greetings, Andres Freund
Re: Direct I/O
On Sun, Apr 9, 2023 at 2:18 PM Andres Freund wrote: > On 2023-04-09 13:55:33 +1200, Thomas Munro wrote: > > I think that particular thing might relate to modifications of the > > user buffer while a write is in progress (breaking btrfs's internal > > checksums). I don't think we should ever do that ourselves (not least > > because it'd break our own checksums). We lock the page during the > > write so no one can do that, and then we sleep in a synchronous > > syscall. > > Oh, but we actually *do* modify pages while IO is going on. I wonder if you > hit the jack pot here. The content lock doesn't prevent hint bit > writes. That's why we copy the page to temporary memory when computing > checksums. More like the jackpot hit me. Woo, I can now reproduce this locally on a loop filesystem. Previously I had missed a step, the parallel worker seems to be necessary. More soon.
Re: Use fadvise in wal replay
On Thu, 19 Jan 2023 at 18:19, Andres Freund wrote: > > On 2023-01-19 22:19:10 +0100, Tomas Vondra wrote: > > > So I'm a bit unsure about this patch. I doesn't seem like it can perform > > better than read-ahead (although perhaps it does, on a different storage > > system). > > I really don't see the point of the patch as-is. ... > I don't disagree fundamentally. But that doesn't make this patch a useful > starting point. It sounds like this patch has gotten off on the wrong foot and is not worth moving forward to the next commitfest. Hopefully a starting over from a different approach might target i/o that is more amenable to fadvise. I'll mark it RwF. -- Gregory Stark As Commitfest Manager
Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
ne 9. 4. 2023 v 3:54 odesílatel Tom Lane napsal: > Greg Stark writes: > > I'm not sure if the *prompt* is a sensible place for it though. The > > place it seems like it would be most useful is reading the output of > > script executions where there would be no prompts. Perhaps it's the > > command tags and \echo statements that should be timestamped. > > Hmm, that is an interesting idea. I kind of like it, not least because > it eliminates most of the tension between wanting a complete timestamp > and wanting a short prompt. Command tags are short enough that there's > plenty of room. > I don't agree so there is a common request for a short prompt. Usually I use four terminals on screen, and still my terminal has a width of 124 characters (and I use relatively small display of my Lenovo T520). Last years I use prompt like: (2023-04-09 06:08:30) postgres=# select 1; ┌──┐ │ ?column? │ ╞══╡ │1 │ └──┘ (1 row) and it is working. Nice thing when I paste the timestamp in examples. I have not any problems with prompt width Regards Pavel
Re: Direct I/O
On Sat, Apr 08, 2023 at 11:08:16AM -0700, Andres Freund wrote: > On 2023-04-07 23:04:08 -0700, Andres Freund wrote: > > There were some failures in CI (e.g. [1] (and perhaps also bf, didn't yet > > check), about "no unpinned buffers available". I was worried for a moment > > that this could actually be relation to the bulk extension patch. > > > > But it looks like it's older - and not caused by direct_io support (except > > by > > way of the test existing). I reproduced the issue locally by setting s_b > > even > > lower, to 16 and made the ERROR a PANIC. > > > > [backtrace] I get an ERROR, not a PANIC: $ git rev-parse HEAD 2e57ffe12f6b5c1498f29cb7c0d9e17c797d9da6 $ git diff -U0 diff --git a/src/test/modules/test_misc/t/004_io_direct.pl b/src/test/modules/test_misc/t/004_io_direct.pl index f5bf0b1..8f0241b 100644 --- a/src/test/modules/test_misc/t/004_io_direct.pl +++ b/src/test/modules/test_misc/t/004_io_direct.pl @@ -25 +25 @@ io_direct = 'data,wal,wal_init' -shared_buffers = '256kB' # tiny to force I/O +shared_buffers = 16 $ ./configure -C --enable-debug --enable-cassert --enable-depend --enable-tap-tests --with-tcl --with-python --with-perl $ make -C src/test/modules/test_misc check PROVE_TESTS=t/004_io_direct.pl # +++ tap check in src/test/modules/test_misc +++ t/004_io_direct.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00) No subtests run Test Summary Report --- t/004_io_direct.pl (Wstat: 7424 Tests: 0 Failed: 0) Non-zero exit status: 29 Parse errors: No plan found in TAP output Files=1, Tests=0, 1 wallclock secs ( 0.01 usr 0.00 sys + 0.41 cusr 0.14 csys = 0.56 CPU) Result: FAIL make: *** [../../../../src/makefiles/pgxs.mk:460: check] Error 1 $ grep pinned src/test/modules/test_misc/tmp_check/log/* src/test/modules/test_misc/tmp_check/log/004_io_direct_main.log:2023-04-08 21:12:46.781 PDT [929628] 004_io_direct.pl ERROR: no unpinned buffers available src/test/modules/test_misc/tmp_check/log/regress_log_004_io_direct:error running SQL: 'psql::1: ERROR: no unpinned buffers available' No good reason to PANIC there, so the path to PANIC may be fixable. > > If you look at log_newpage_range(), it's not surprising that we get this > > error > > - it pins up to 32 buffers at once. > > > > Afaics log_newpage_range() originates in 9155580fd5fc, but this caller is > > from > > c6b92041d385. > > Do we care about fixing this in the backbranches? Probably not, given there > > haven't been user complaints? I would not. This is only going to come up where the user goes out of the way to use near-minimum shared_buffers. > Here's a quick prototype of this approach. This looks fine. I'm not enthusiastic about incurring post-startup cycles to cater to allocating less than 512k*max_connections of shared buffers, but I expect the cycles in question are negligible here.
Re: Direct I/O
Indeed, I can't reproduce this with (our) checksums on. I also can't reproduce it with O_DIRECT off. I also can't reproduce it if I use "mkdir pgdata && chattr +C pgdata && initdb -D pgdata" to have a pgdata directory with copy-on-write and (their) checksums disabled. But it reproduces quite easily with COW on (default behaviour) with io_direct=data, debug_parallel_query=debug, create table as ...; update ...; select count(*) ...; from that test. Unfortunately my mental model of btrfs is extremely limited, basically just "something a bit like ZFS". FWIW I've been casually following along with OpenZFS's ongoing O_DIRECT project, and I know that the plan there is to make a temporary stable copy if checksums and other features are on (a bit like PostgreSQL does for the same reason, as you reminded us). Time will tell how that works out but it *seems* like all available modes would therefore work correctly for us, with different tradeoffs (ie if you want the fastest zero-copy I/O, don't use checksums, compression, etc). Here, btrfs seems to be taking a different path that I can't quite make out... I see no warning/error about a checksum failure like [1], and we apparently managed to read something other than a mix of the old and new page contents (which, based on your hypothesis, should just leave it indeterminate whether the hint bit changes were captured or not, and the rest of the page should be stable, right). It's like the page time-travelled or got scrambled in some other way, but it didn't tell us? I'll try to dig further... [1] https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Gotchas.html#Direct_IO_and_CRCs
Re: CREATE SUBSCRIPTION -- add missing tab-completes
On Fri, 7 Apr 2023 at 01:28, Amit Kapila wrote: > > On Wed, Apr 5, 2023 at 5:58 AM Peter Smith wrote: > > > PSA patches to add those tab completions. > > LGTM, so pushed. I moved this to the next CF but actually I just noticed the thread starts with the original patch being pushed. Maybe we should just close the CF entry? Is this further change relevant? -- Gregory Stark As Commitfest Manager