Re: Making background psql nicer to use in tap tests

2023-04-08 Thread Daniel Gustafsson
> On 8 Apr 2023, at 02:49, Tom Lane  wrote:
> 
> I wrote:
>> I've been doing some checking with perlbrew locally.  It appears to not
>> be about IO::Pty so much as IPC::Run: it works with IPC::Run 0.99 but
>> not 0.79.  Still bisecting to identify exactly what's the minimum
>> okay version.
> 
> The answer is: it works with IPC::Run >= 0.98.  The version of IO::Pty
> doesn't appear significant; it works at least back to 1.00 from early
> 2002.

Thanks for investigating this!

> IPC::Run 0.98 is relatively new (2018), so I don't think it'd fly
> to make that our new minimum version across-the-board.  

Absolutely, that's not an option.

> I recommend
> just setting up this one test to SKIP if IPC::Run is too old.

Yes, will do that when I have a little more time at hand for monitoring the BF
later today.

--
Daniel Gustafsson





broken master branch

2023-04-08 Thread Pavel Stehule
Hi

on fresh Fedora 38, I cannot to run regress tests

+ERROR:  could not load library
"/home/pavel/src/postgresql.master/tmp_install/usr/local/pgsql/master/lib/llvmjit.so":
/home/pavel/src/p
ostgresql.master/tmp_install/usr/local/pgsql/master/lib/llvmjit.so:
undefined symbol: LLVMBuildGEP
 SELECT BOOLTBL1.*, BOOLTBL2.*
FROM BOOLTBL1, BOOLTBL2
WHERE boolne(BOOLTBL2.f1,BOOLTBL1.f1);

There is lot of compile warnings

In file included from llvmjit_expr.c:31:
../../../../src/include/jit/llvmjit_emit.h: In function ‘l_load_struct_gep’:
../../../../src/include/jit/llvmjit_emit.h:112:30: warning: implicit
declaration of function ‘LLVMBuildStructGEP’; did you mean
‘LLVMBuildStructGEP2’? [-Wimplicit-function-declaration]
  112 | LLVMValueRef v_ptr = LLVMBuildStructGEP(b, v, idx, "");
  |  ^~
  |  LLVMBuildStructGEP2
../../../../src/include/jit/llvmjit_emit.h:112:30: warning: initialization
of ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} from ‘int’ makes pointer
from integer without a cast [-Wint-conversion]
../../../../src/include/jit/llvmjit_emit.h:114:16: warning: implicit
declaration of function ‘LLVMBuildLoad’; did you mean ‘LLVMBuildLoad2’?
[-Wimplicit-function-declaration]
  114 | return LLVMBuildLoad(b, v_ptr, name);
  |^
  |LLVMBuildLoad2
../../../../src/include/jit/llvmjit_emit.h:114:16: warning: returning ‘int’
from a function with return type ‘LLVMValueRef’ {aka ‘struct
LLVMOpaqueValue *’} makes pointer from integer without a cast
[-Wint-conversion]
  114 | return LLVMBuildLoad(b, v_ptr, name);
  |^
../../../../src/include/jit/llvmjit_emit.h: In function ‘l_load_gep1’:
../../../../src/include/jit/llvmjit_emit.h:123:30: warning: implicit
declaration of function ‘LLVMBuildGEP’; did you mean ‘LLVMBuildGEP2’?
[-Wimplicit-function-declaration]
  123 | LLVMValueRef v_ptr = LLVMBuildGEP(b, v, &idx, 1, "");
  |  ^~~~
  |  LLVMBuildGEP2
../../../../src/include/jit/llvmjit_emit.h:123:30: warning: initialization
of ‘LLVMValueRef’ {aka ‘struct LLVMOpaqueValue *’} from ‘int’ makes pointer
from integer without a cast [-Wint-conversion]
../../../../src/include/jit/llvmjit_emit.h:125:16: warning: returning ‘int’
from a function with return type ‘LLVMValueRef’ {aka ‘struct
LLVMOpaqueValue *’} makes pointer from integer without a cast
[-Wint-conversion]


Regards

Pavel


Re: broken master branch

2023-04-08 Thread Thomas Munro
On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule  wrote:
> on fresh Fedora 38, I cannot to run regress tests

Looks like the new LLVM 16.  I'll try to look at this again next week.
In the meantime you could try using 15.




Re: broken master branch

2023-04-08 Thread Pavel Stehule
so 8. 4. 2023 v 10:38 odesílatel Thomas Munro 
napsal:

> On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule 
> wrote:
> > on fresh Fedora 38, I cannot to run regress tests
>
> Looks like the new LLVM 16.  I'll try to look at this again next week.
> In the meantime you could try using 15.
>

ok

Thank you for info

Pavel


Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sat, Apr 8, 2023 at 4:59 PM Thomas Munro  wrote:
> On Sat, Apr 8, 2023 at 4:47 PM Thomas Munro  wrote:
> > After a bit more copy-editing on docs and comments and a round of
> > automated indenting, I have now pushed this.  I will now watch the
> > build farm.  I tested on quite a few OSes that I have access to, but
> > this is obviously a very OS-sensitive kind of a thing.
>
> Hmm.  I see a strange "invalid page" failure on Andrew's machine crake
> in 004_io_direct.pl.  Let's see what else comes in.

No particular ideas about what happened there yet.  It *looks* like we
wrote out a page, and then read it back in very soon afterwards, all
via the usual locked bufmgr/smgr pathways, and it failed basic page
validation.  The reader was a parallel worker, because of the
debug_parallel_mode setting on that box.  The page number looks
reasonable (I guess it's reading a page created by the UPDATE full of
new tuples, but I don't know which process wrote it).  It's also not
immediately obvious how this could be connected to the 32 pinned
buffer problem (all that would have happened in the CREATE TABLE
process which ended already before the UPDATE and then the SELECT
backends even started).

Andrew, what file system and type of disk is that machine using?




Re: Minimal logical decoding on standbys

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-07 14:27:09 -0700, Andres Freund wrote:
> I think I'll push these in a few hours. While this needed more changes than
> I'd like shortly before the freeze, I think they're largely not in very
> interesting bits and pieces - and this feature has been in the works for about
> three eternities, and it is blocking a bunch of highly requested features.
>
> If anybody still has energy, I would appreciate a look at 0001, 0002, the new
> pieces I added, to make what's now 0003 and 0004 cleaner.

Pushed. Thanks all!

I squashed some of the changes. There didn't seem to be a need for a separate
stats commit, or a separate docs commit. Besides that, I did find plenty of
grammar issues, and a bunch of formatting issues.

Let's see what the buildfarm says.

Greetings,

Andres Freund




Re: longfin missing gssapi_ext.h

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-07 22:50:18 -0400, Tom Lane wrote:
> Or should credential delegation be viewed as an incremental feature that we
> can support or not?

That seems like the best way forward here.

Greetings,

Andres Freund




Re: longfin missing gssapi_ext.h

2023-04-08 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I'm open to considering support for older versions, however ...
> 
> NetBSD 9.3, which is their *latest production release*, doesn't have
> gssapi_ext.h [1].  For that matter, it doesn't look like their
> not-yet-released 10.0BETA does either (my NetBSD 10 animals would
> be failing if they had --with-gssapi).  I do not think it's going
> to be acceptable to require this feature.

I'm certainly curious to understand what Kerberos library they're using
and how they're maintaining it.  At least some of the documentation I've
found seems to indicate that it might be Heimdal, which does have
gss_store_cred_into in gssapi.h.

> I'm now going to reiterate my opinion that this patch should not
> have been pushed in at this point of the dev cycle.  A month ago,
> there was time to deal with these sorts of issues.

I suspected there would be an issue with OSX but hadn't expected an
issue with NetBSD.  I had tested this across a few Linux platforms and
cfbot showed it wasn't causing issues on Windows or the platforms that
are run there.  Would be really great to have a way to test these things
out on these other platforms other than just committing them and seeing
what happens on the buildfarm.

In any case, I've reverted it and we can pick this up for the next
cycle.  I'll play around with Heimdal locally as it appears to be
available on Ubuntu (I had actually thought Heimdal to be mostly gone at
this point since it only ever existed really due to silly export
restrictions...) and see if there's actually anything other than making
gssapi_ext.h itself be optional to be pulled in that's needed to make it
work.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: longfin missing gssapi_ext.h

2023-04-08 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2023-04-07 22:50:18 -0400, Tom Lane wrote:
> > Or should credential delegation be viewed as an incremental feature that we
> > can support or not?
> 
> That seems like the best way forward here.

Yeah, that's certainly doable too, though I'm really not sure we should
be accepting OSX's GSSAPI library and that might really be the only case
at issue here.  Either way, I've reverted it and will see about picking
it up for the next cycle (again) and hopefully be able to work through
these issues either by having it be optional or, if NetBSD and Heimdal
actually support all the APIs just with different headers, perhaps
deciding we're willing to require them.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: daitch_mokotoff module

2023-04-08 Thread Andrew Dunstan


On 2023-04-07 Fr 23:25, Tom Lane wrote:

I wrote:

Anyway, I assume this is just syntactic sugar for something
we can do another way?  If it's at all fundamental, I'll have
to back the patch out.

On closer inspection, this script is completely devoid of any
need to deal in non-ASCII data at all.  So I just nuked the
"use" lines.





Yeah.

I just spent a little while staring at the perl code. I have to say it 
seems rather opaque, the data structure seems a bit baroque. I'll try to 
simplify it.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: PostgreSQL 16 Release Management Team & Feature Freeze

2023-04-08 Thread Jonathan S. Katz

On 4/6/23 5:37 PM, Jonathan S. Katz wrote:

On 3/21/23 11:35 AM, Jonathan S. Katz wrote:

Additionally, the RMT has set the feature freeze to be **April 8, 2023 
at 0:00 AoE**[1]. This is the last time to commit features for 
PostgreSQL 16. In  other words, no new PostgreSQL 16 feature can be 
committed after April 8, 2023 at 0:00 AoE.


This is a reminder that feature freeze is rapidly approach. The freeze 
begins at April 8, 2023 at 0:00 AoE. No new PostgreSQL 16 features can 
be committed after this time.


The feature freeze for PostgreSQL 16 has begun.

Thank you everyone for all of your hard work and contributions towards 
this release! Now we can begin to prepare for the beta period.


Thanks,

Jonathan



OpenPGP_signature
Description: OpenPGP digital signature


Re: Making background psql nicer to use in tap tests

2023-04-08 Thread Tom Lane
=?UTF-8?Q?Mikael_Kjellstr=c3=b6m?=  writes:
> So what do you want me to do about grison and morepork?  I guess I could 
> try to install a newer version of IPC::Run from CPAN or should I just 
> leave it be?

I think "leave it be" is fine.

regards, tom lane




Re: Partial aggregates pushdown

2023-04-08 Thread Bruce Momjian
On Fri, Apr  7, 2023 at 09:16:14PM -0700, Andres Freund wrote:
> On 2023-04-07 22:53:53 -0400, Bruce Momjian wrote:
> > > postgres_fdw has no business pushing down calls to non-builtin functions
> > > unless the user has explicitly authorized that via the existing
> > > whitelisting mechanism.  I think you're reinventing the wheel,
> > > and not very well.
> > 
> > The patch has you assign an option at CREATE AGGREGATE time if it can do
> > push down, and postgres_fdw checks that.  What whitelisting mechanism
> > are you talking about?  async_capable?
> 
> extensions (string)
> 
> This option is a comma-separated list of names of PostgreSQL extensions 
> that are installed, in compatible versions, on both the local and remote 
> servers. Functions and operators that are immutable and belong to a listed 
> extension will be considered shippable to the remote server. This option can 
> only be specified for foreign servers, not per-table.
> 
> When using the extensions option, it is the user's responsibility that 
> the listed extensions exist and behave identically on both the local and 
> remote servers. Otherwise, remote queries may fail or behave unexpectedly.

Okay, this is very helpful --- it is exactly the issue we are dealing
with --- how can we know if partial aggregate functions exists on the
remote server.  (I knew I was going to need API help on this.)

So, let's remove the PARTIALAGG_MINVERSION option from the patch and
just make it automatic --- we push down builtin partial aggregates if
the remote server is the same or newer _major_ version than the sending
server.  For extensions, if people have older extensions on the same or
newer foreign servers, they can adjust 'extensions' above.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.




Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

2023-04-08 Thread Matthias van de Meent
On Fri, 7 Apr 2023 at 08:05, Michael Paquier  wrote:
>
> On Fri, Apr 07, 2023 at 08:59:22AM +0900, Michael Paquier wrote:
> > Okay, cool!
>
> Done this one with 8fcb32d.

Thanks a lot! I'll post the separation of record construction and
write-out to xlog in a future thread for 17.

One remaining question: Considering that the changes and checks of
that commit are mostly internal to xloginsert.c (or xlog.c in older
releases), and that no special public-facing changes were made, would
it be safe to backport this to older releases?

PostgreSQL 15 specifically would benefit from this as it supports
external rmgrs which may generate WAL records and would benefit from
these additional checks, but all supported releases of PostgreSQL have
pg_logical_emit_message and are thus easily subject to the issue of
writing oversized WAL records and subsequent recovery- and replication
stream failures.

Kind regards,

Matthias van de Meent




Re: is_superuser is not documented

2023-04-08 Thread Joseph Koshakow
On Mon, Apr 3, 2023 at 10:47 AM Fujii Masao 
wrote:
>Yes, the patch has not been committed yet because of lack of review
comments.
>Do you have any review comments on this patch?
>Or you think it's ready for committer?

I'm not very familiar with this code, so I'm not sure how much my
review is worth, but maybe it will spark some discussion.

> Yes, this patch moves the descriptions of is_superuser to config.sgml
> and changes its group to PRESET_OPTIONS.

is_superuser feels a little out of place in this file. All of
the options here apply to the entire PostgreSQL server, while
is_superuser only applies to the current session. The description of
this file says :

> These options report various aspects of PostgreSQL behavior that
> might be of interest to certain applications, particularly
> administrative front-ends. Most of them are determined when
> PostgreSQL is compiled or when it is installed.

Which doesn't seem to apply to is_superuser. It doesn't affect
the behavior of PostgreSQL, only what the current session is allowed to
do. It's also not determined when PostgreSQL is compiled/installed. Is
there some update that we can make to the description that would make
is_superuser fit in better?

I'm not familiar with the origins of is_superuser and it may be too
late for this, but it seems like is_superuser would fit in much better
as a system information function [0] rather than a GUC. Particularly
in the Session Information Functions.

> - GUC_REPORT | GUC_NO_SHOW_ALL | GUC_NO_RESET_ALL | GUC_NOT_IN_SAMPLE |
GUC_DISALLOW_IN_FILE
> + GUC_REPORT | GUC_NOT_IN_SAMPLE | GUC_DISALLOW_IN_FILE

This looks good to me. The lack of is_superuser from SHOW ALL has been
a source of confusion to me in the past.

As a side note server_version, server_encoding, lc_collate, and
lc_ctype all appear in both the preset options section of config.sgml
and in show.sgml. I'm not sure what the logic is for just including
these three parameters in show.sgml, but I think we should either
include all of the preset options or none of them.

Thanks,
Joe Koshakow

[0] https://www.postgresql.org/docs/current/functions-info.html


Re: broken master branch

2023-04-08 Thread Tom Lane
Thomas Munro  writes:
> On Sat, Apr 8, 2023 at 8:04 PM Pavel Stehule  wrote:
>> on fresh Fedora 38, I cannot to run regress tests

> Looks like the new LLVM 16.  I'll try to look at this again next week.
> In the meantime you could try using 15.

I've become entirely desensitized to seawasp failing, which is probably
a bad thing, but today I happened to look at it and discovered that
its compiler has been dumping core for some time now:

clang: /home/fabien/llvm-src/llvm/lib/Transforms/Scalar/SROA.cpp:1745: 
llvm::Value* getAdjustedPtr({anonymous}::IRBuilderTy&, const llvm::DataLayout&, 
llvm::Value*, llvm::APInt, llvm::Type*, const llvm::Twine&): Assertion 
`Ptr->getType()->isOpaquePointerTy() && "Only opaque pointers supported"' 
failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and 
include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.  Program arguments: /home/fabien/clgtk/bin/clang -Wno-ignored-attributes 
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -Xclang 
-no-opaque-pointers -Wno-unused-command-line-argument 
-Wno-compound-token-split-by-macro -O2 -I../../src/include -D_GNU_SOURCE 
-I/usr/include/libxml2 -flto=thin -emit-llvm -c -o strftime.bc strftime.c
1.   parser at end of file
2.  Optimizer

Seems like we ought to look into that, and report it as requested.

regards, tom lane




Re: Commitfest 2023-03 starting tomorrow!

2023-04-08 Thread Greg Stark
So here we are at the end of the CF:

 Status summary: March 15March 22March 28April 4 April 8
   Needs review: 152 128 116  96  74
   Waiting on Author: 42  36  30  21  14
   Ready for Committer:   39  32  32  35  27
   Committed: 61  82  94 101 124
   Moved to next CF:   4  15  17  28  39
   Withdrawn: 17  16  18  20  10
   Rejected:   0   5   6   8   9
   Returned with Feedback: 4   5   6  10  22
 Total: 319.

I'm now going to go through and:

a) Mark Waiting on Author any patches that aren't building Waiting on
Author. There was some pushback about asking authors to do trivial
rebases but in three months it won't make sense to start a CF with
already-non-applying patches.

b) Mark any patches that received solid feedback in the last week with
either Returned with Feedback or Rejected. I think this was already
done though with 12 RwF and 1 Rejected in the past four days alone.

c) Pick out the Bug Fixes, cleanup patches, and other non-feature
patches that might be open issues for v16.

d) Move to Next CF any patches that remain.

-- 
greg




Re: Partial aggregates pushdown

2023-04-08 Thread Bruce Momjian
On Sat, Apr  8, 2023 at 10:15:40AM -0400, Bruce Momjian wrote:
> On Fri, Apr  7, 2023 at 09:16:14PM -0700, Andres Freund wrote:
> > extensions (string)
> > 
> > This option is a comma-separated list of names of PostgreSQL extensions 
> > that are installed, in compatible versions, on both the local and remote 
> > servers. Functions and operators that are immutable and belong to a listed 
> > extension will be considered shippable to the remote server. This option 
> > can only be specified for foreign servers, not per-table.
> > 
> > When using the extensions option, it is the user's responsibility that 
> > the listed extensions exist and behave identically on both the local and 
> > remote servers. Otherwise, remote queries may fail or behave unexpectedly.
> 
> Okay, this is very helpful --- it is exactly the issue we are dealing
> with --- how can we know if partial aggregate functions exists on the
> remote server.  (I knew I was going to need API help on this.)
> 
> So, let's remove the PARTIALAGG_MINVERSION option from the patch and
> just make it automatic --- we push down builtin partial aggregates if
> the remote server is the same or newer _major_ version than the sending
> server.  For extensions, if people have older extensions on the same or
> newer foreign servers, they can adjust 'extensions' above.

Looking further, I don't see any cases where we check if a builtin
function added in a major release also exists on the foreign server, so
maybe we don't need any checks but just need a mention in the release
notes.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Embrace your flaws.  They make you human, rather than perfect,
  which you will never be.




Re: Making background psql nicer to use in tap tests

2023-04-08 Thread Mikael Kjellström



On 2023-04-08 09:53, Daniel Gustafsson wrote:


I recommend
just setting up this one test to SKIP if IPC::Run is too old.


Yes, will do that when I have a little more time at hand for monitoring the BF
later today.


So what do you want me to do about grison and morepork?  I guess I could 
try to install a newer version of IPC::Run from CPAN or should I just 
leave it be?


/Mikael





Re: Parallel Full Hash Join

2023-04-08 Thread Tom Lane
Thomas Munro  writes:
> I committed the main patch.

BTW, it was easy to miss in all the buildfarm noise from
last-possible-minute patches, but chimaera just showed something
that looks like a bug in this code [1]:

2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG:  statement: 
savepoint settings;
2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG:  statement: 
set local max_parallel_workers_per_gather = 2;
2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG:  statement: 
explain (costs off)
 select  count(*) from simple r full outer join simple s on (r.id = 
0 - s.id);
2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG:  statement: 
select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: 
"nodeHash.c", Line: 2118, PID: 19147
postgres: parallel worker for PID 18027 (ExceptionalCondition+0x84)[0x10ae2bfa4]
postgres: parallel worker for PID 18027 
(ExecParallelPrepHashTableForUnmatched+0x224)[0x10aa67544]
postgres: parallel worker for PID 18027 (+0x3db868)[0x10aa6b868]
postgres: parallel worker for PID 18027 (+0x3c4204)[0x10aa54204]
postgres: parallel worker for PID 18027 (+0x3c81b8)[0x10aa581b8]
postgres: parallel worker for PID 18027 (+0x3b3d28)[0x10aa43d28]
postgres: parallel worker for PID 18027 
(standard_ExecutorRun+0x208)[0x10aa39768]
postgres: parallel worker for PID 18027 (ParallelQueryMain+0x2bc)[0x10aa4092c]
postgres: parallel worker for PID 18027 (ParallelWorkerMain+0x660)[0x10a874870]
postgres: parallel worker for PID 18027 
(StartBackgroundWorker+0x2a8)[0x10ab8abf8]
postgres: parallel worker for PID 18027 (+0x50290c)[0x10ab9290c]
postgres: parallel worker for PID 18027 (+0x5035e4)[0x10ab935e4]
postgres: parallel worker for PID 18027 (PostmasterMain+0x1304)[0x10ab96334]
postgres: parallel worker for PID 18027 (main+0x86c)[0x10a79daec]

regards, tom lane

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chimaera&dt=2023-04-08%2012%3A07%3A08




Re: Minimal logical decoding on standbys

2023-04-08 Thread Jonathan S. Katz

On 4/8/23 5:27 AM, Andres Freund wrote:

Hi,

On 2023-04-07 14:27:09 -0700, Andres Freund wrote:

I think I'll push these in a few hours. While this needed more changes than
I'd like shortly before the freeze, I think they're largely not in very
interesting bits and pieces - and this feature has been in the works for about
three eternities, and it is blocking a bunch of highly requested features.

If anybody still has energy, I would appreciate a look at 0001, 0002, the new
pieces I added, to make what's now 0003 and 0004 cleaner.


Pushed. Thanks all!

I squashed some of the changes. There didn't seem to be a need for a separate
stats commit, or a separate docs commit. Besides that, I did find plenty of
grammar issues, and a bunch of formatting issues.

Let's see what the buildfarm says.


Thanks to everyone for working on this feature -- this should have a big 
impact on users of logical replication!


While it still needs to get through the beta period etc. this is a big 
milestone for what's been a multi-year effort to support this.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: Parallel Full Hash Join

2023-04-08 Thread Melanie Plageman
On Sat, Apr 8, 2023 at 12:33 PM Tom Lane  wrote:
>
> Thomas Munro  writes:
> > I committed the main patch.
>
> BTW, it was easy to miss in all the buildfarm noise from
> last-possible-minute patches, but chimaera just showed something
> that looks like a bug in this code [1]:
>
> 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG:  statement: 
> savepoint settings;
> 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG:  statement: 
> set local max_parallel_workers_per_gather = 2;
> 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG:  statement: 
> explain (costs off)
>  select  count(*) from simple r full outer join simple s on (r.id 
> = 0 - s.id);
> 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG:  statement: 
> select  count(*) from simple r full outer join simple s on (r.id = 0 - s.id);
> TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), File: 
> "nodeHash.c", Line: 2118, PID: 19147
> postgres: parallel worker for PID 18027 
> (ExceptionalCondition+0x84)[0x10ae2bfa4]
> postgres: parallel worker for PID 18027 
> (ExecParallelPrepHashTableForUnmatched+0x224)[0x10aa67544]
> postgres: parallel worker for PID 18027 (+0x3db868)[0x10aa6b868]
> postgres: parallel worker for PID 18027 (+0x3c4204)[0x10aa54204]
> postgres: parallel worker for PID 18027 (+0x3c81b8)[0x10aa581b8]
> postgres: parallel worker for PID 18027 (+0x3b3d28)[0x10aa43d28]
> postgres: parallel worker for PID 18027 
> (standard_ExecutorRun+0x208)[0x10aa39768]
> postgres: parallel worker for PID 18027 (ParallelQueryMain+0x2bc)[0x10aa4092c]
> postgres: parallel worker for PID 18027 
> (ParallelWorkerMain+0x660)[0x10a874870]
> postgres: parallel worker for PID 18027 
> (StartBackgroundWorker+0x2a8)[0x10ab8abf8]
> postgres: parallel worker for PID 18027 (+0x50290c)[0x10ab9290c]
> postgres: parallel worker for PID 18027 (+0x5035e4)[0x10ab935e4]
> postgres: parallel worker for PID 18027 (PostmasterMain+0x1304)[0x10ab96334]
> postgres: parallel worker for PID 18027 (main+0x86c)[0x10a79daec]

Having not done much debugging on buildfarm animals before, I don't
suppose there is any way to get access to the core itself? I'd like to
see how many participants the batch barrier had at the time of the
assertion failure. I assume it was 2, but I just wanted to make sure I
understand the race.

- Melanie




Re: longfin missing gssapi_ext.h

2023-04-08 Thread Tom Lane
Stephen Frost  writes:
> I suspected there would be an issue with OSX but hadn't expected an
> issue with NetBSD.  I had tested this across a few Linux platforms and
> cfbot showed it wasn't causing issues on Windows or the platforms that
> are run there.  Would be really great to have a way to test these things
> out on these other platforms other than just committing them and seeing
> what happens on the buildfarm.

I poked around a bit more and found that:

* NetBSD's package collection[1] includes both Heimdal and MIT Kerberos
(mit-krb5).  Apparently what's installed on at least some of the buildfarm
animals is the former.

* FreeBSD seems to offer *only* Heimdal [2]; OpenBSD ditto [3].

* I cannot find any sign of either gss_store_cred_into or gssapi_ext.h
in FreeBSD's Heimdal (7.8.0_6).

So it does not look like supporting Heimdal is going to be optional,
and that means the credential delegation feature is going to have
to be optional, or else we need to find some equivalent Heimdal APIs.

I share your feeling that we could probably blow off Apple's built-in
GSSAPI.  MacPorts offers both Heimdal and kerberos5, and I imagine
Homebrew has at least one of them, so Mac people could easily get
hold of newer implementations.  But the BSDen are going to be a
problem.

regards, tom lane

[1] https://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/security/index.html
[2] https://ports.freebsd.org/cgi/ports.cgi?query=kerberos&stype=all&sektion=all
[3] https://cdn.openbsd.org/pub/OpenBSD/snapshots/packages/amd64/




Re: Parallel Full Hash Join

2023-04-08 Thread Tom Lane
Melanie Plageman  writes:
> Having not done much debugging on buildfarm animals before, I don't
> suppose there is any way to get access to the core itself? I'd like to
> see how many participants the batch barrier had at the time of the
> assertion failure. I assume it was 2, but I just wanted to make sure I
> understand the race.

I don't know about chimaera in particular, but buildfarm animals are
not typically configured to save any build products.  They'd run out
of disk space after awhile :-(.

If you think the number of participants would be useful data, I'd
suggest replacing that Assert with an elog() that prints what you
want to know.

regards, tom lane




Re: longfin missing gssapi_ext.h

2023-04-08 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I suspected there would be an issue with OSX but hadn't expected an
> > issue with NetBSD.  I had tested this across a few Linux platforms and
> > cfbot showed it wasn't causing issues on Windows or the platforms that
> > are run there.  Would be really great to have a way to test these things
> > out on these other platforms other than just committing them and seeing
> > what happens on the buildfarm.
> 
> I poked around a bit more and found that:
> 
> * NetBSD's package collection[1] includes both Heimdal and MIT Kerberos
> (mit-krb5).  Apparently what's installed on at least some of the buildfarm
> animals is the former.
> 
> * FreeBSD seems to offer *only* Heimdal [2]; OpenBSD ditto [3].
> 
> * I cannot find any sign of either gss_store_cred_into or gssapi_ext.h
> in FreeBSD's Heimdal (7.8.0_6).
> 
> So it does not look like supporting Heimdal is going to be optional,
> and that means the credential delegation feature is going to have
> to be optional, or else we need to find some equivalent Heimdal APIs.

Thanks for doing that digging!

I've been looking too and while Heimdal added gss_store_cred_into in
their development branch 5 years ago[1] (!), it's not made it into an
actual release.  Good that they seem to at least be maintaining it
enough to deal with CVEs, but unfortunately I'm fairly confident that
there won't be a way to support constrained delegation (which is the
next goal, once unconstrained delegation is in and working) on the
Heimdal platforms.  I suspected that would have to be optional anyway,
but I hadn't expected it to hit all the BSD platforms.

In any case, for this I'm working switching over to gss_store_cred()
which does seem to be available in the Heimdal Debian packages that I
was able to install locally (looks to be 7.7.0) and should work just
fine for these purposes, though it requires a bit more work on the
libpq side as we need to tell libpq explicitly the name which was on
the delegated credential when we call gss_acquire_cred().

Once that's done, should be able to drop the gssapi_ext.h include
entirely and still have the test suite able to run with MIT Kerberos.

One thing I'm on the fence about is trying to make the test suite
actually work with Heimdal..  I'm planning to install the Heimdal KDC,
et al, and see what happens but if it ends up looking like it's a lot of
work then I might forgo that effort.  I'm not sure it's really necessary
but I could be argued out of that position without too much effort.  The
stated Heimdal goal is to be a re-implementation of MIT Kerberos and
these are all documented APIs with RFCs, after all.

> I share your feeling that we could probably blow off Apple's built-in
> GSSAPI.  MacPorts offers both Heimdal and kerberos5, and I imagine
> Homebrew has at least one of them, so Mac people could easily get
> hold of newer implementations.  But the BSDen are going to be a
> problem.

Yeah.  Unfortunate that Heimdal doesn't seem to really be moving forward
in terms of new development.

Thanks,

Stephen

[1] 
https://github.com/heimdal/heimdal/commit/e0bb9c10cad0fd98245caecf8af8fca855b2df49


signature.asc
Description: PGP signature


Re: Direct I/O

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-07 23:04:08 -0700, Andres Freund wrote:
> There were some failures in CI (e.g. [1] (and perhaps also bf, didn't yet
> check), about "no unpinned buffers available".  I was worried for a moment
> that this could actually be relation to the bulk extension patch.
> 
> But it looks like it's older - and not caused by direct_io support (except by
> way of the test existing). I reproduced the issue locally by setting s_b even
> lower, to 16 and made the ERROR a PANIC.
>
> [backtrace]
> 
> If you look at log_newpage_range(), it's not surprising that we get this error
> - it pins up to 32 buffers at once.
> 
> Afaics log_newpage_range() originates in 9155580fd5fc, but this caller is from
> c6b92041d385.
> 
> 
> It doesn't really seem OK to me to unconditionally pin 32 buffers. For the
> relation extension patch I introduced LimitAdditionalPins() to deal with this
> concern. Perhaps it needs to be exposed and log_newpage_buffers() should use
> it?
> 
> 
> Do we care about fixing this in the backbranches? Probably not, given there
> haven't been user complaints?

Here's a quick prototype of this approach. If we expose LimitAdditionalPins(),
we'd probably want to add "Buffer" to the name, and pass it a relation, so
that it can hand off LimitAdditionalLocalPins() when appropriate? The callsite
in question doesn't need it, but ...

Without the limiting of pins the modified 004_io_direct.pl fails 100% of the
time for me.

Presumably the reason it fails occasionally with 256kB of shared buffers
(i.e. NBuffers=32) is that autovacuum or checkpointer briefly pins a single
buffer. As log_newpage_range() thinks it can just pin 32 buffers
unconditionally, it fails in that case.

Greetings,

Andres Freund
diff --git i/src/include/storage/bufmgr.h w/src/include/storage/bufmgr.h
index 6ab00daa2ea..e5788309c86 100644
--- i/src/include/storage/bufmgr.h
+++ w/src/include/storage/bufmgr.h
@@ -223,6 +223,7 @@ extern void DropRelationBuffers(struct SMgrRelationData *smgr_reln,
 extern void DropRelationsAllBuffers(struct SMgrRelationData **smgr_reln,
 	int nlocators);
 extern void DropDatabaseBuffers(Oid dbid);
+extern void LimitAdditionalPins(uint32 *additional_pins);
 
 #define RelationGetNumberOfBlocks(reln) \
 	RelationGetNumberOfBlocksInFork(reln, MAIN_FORKNUM)
diff --git i/src/backend/access/transam/xloginsert.c w/src/backend/access/transam/xloginsert.c
index e2a5a3d13ba..2189fc9f71f 100644
--- i/src/backend/access/transam/xloginsert.c
+++ w/src/backend/access/transam/xloginsert.c
@@ -1268,8 +1268,8 @@ log_newpage_range(Relation rel, ForkNumber forknum,
 
 	/*
 	 * Iterate over all the pages in the range. They are collected into
-	 * batches of XLR_MAX_BLOCK_ID pages, and a single WAL-record is written
-	 * for each batch.
+	 * batches of up to XLR_MAX_BLOCK_ID pages, and a single WAL-record is
+	 * written for each batch.
 	 */
 	XLogEnsureRecordSpace(XLR_MAX_BLOCK_ID - 1, 0);
 
@@ -1278,14 +1278,18 @@ log_newpage_range(Relation rel, ForkNumber forknum,
 	{
 		Buffer		bufpack[XLR_MAX_BLOCK_ID];
 		XLogRecPtr	recptr;
-		int			nbufs;
+		uint32		limit = XLR_MAX_BLOCK_ID;
+		int		nbufs;
 		int			i;
 
 		CHECK_FOR_INTERRUPTS();
 
+		/* avoid running out pinnable buffers */
+		LimitAdditionalPins(&limit);
+
 		/* Collect a batch of blocks. */
 		nbufs = 0;
-		while (nbufs < XLR_MAX_BLOCK_ID && blkno < endblk)
+		while (nbufs < limit && blkno < endblk)
 		{
 			Buffer		buf = ReadBufferExtended(rel, forknum, blkno,
  RBM_NORMAL, NULL);
diff --git i/src/backend/storage/buffer/bufmgr.c w/src/backend/storage/buffer/bufmgr.c
index 7778dde3e57..31c75d6240e 100644
--- i/src/backend/storage/buffer/bufmgr.c
+++ w/src/backend/storage/buffer/bufmgr.c
@@ -1742,7 +1742,7 @@ again:
  * pessimistic, but outside of toy-sized shared_buffers it should allow
  * sufficient pins.
  */
-static void
+void
 LimitAdditionalPins(uint32 *additional_pins)
 {
 	uint32		max_backends;
diff --git i/src/test/modules/test_misc/t/004_io_direct.pl w/src/test/modules/test_misc/t/004_io_direct.pl
index f5bf0b11e4e..5791c2ab7bd 100644
--- i/src/test/modules/test_misc/t/004_io_direct.pl
+++ w/src/test/modules/test_misc/t/004_io_direct.pl
@@ -22,7 +22,7 @@ $node->init;
 $node->append_conf(
 	'postgresql.conf', qq{
 io_direct = 'data,wal,wal_init'
-shared_buffers = '256kB' # tiny to force I/O
+shared_buffers = '16' # tiny to force I/O
 });
 $node->start;
 


Re: Parallel Full Hash Join

2023-04-08 Thread Melanie Plageman
On Sat, Apr 8, 2023 at 1:30 PM Melanie Plageman
 wrote:
>
> On Sat, Apr 8, 2023 at 12:33 PM Tom Lane  wrote:
> >
> > Thomas Munro  writes:
> > > I committed the main patch.
> >
> > BTW, it was easy to miss in all the buildfarm noise from
> > last-possible-minute patches, but chimaera just showed something
> > that looks like a bug in this code [1]:
> >
> > 2023-04-08 12:25:28.709 UTC [18027:321] pg_regress/join_hash LOG:  
> > statement: savepoint settings;
> > 2023-04-08 12:25:28.709 UTC [18027:322] pg_regress/join_hash LOG:  
> > statement: set local max_parallel_workers_per_gather = 2;
> > 2023-04-08 12:25:28.710 UTC [18027:323] pg_regress/join_hash LOG:  
> > statement: explain (costs off)
> >  select  count(*) from simple r full outer join simple s on 
> > (r.id = 0 - s.id);
> > 2023-04-08 12:25:28.710 UTC [18027:324] pg_regress/join_hash LOG:  
> > statement: select  count(*) from simple r full outer join simple s on (r.id 
> > = 0 - s.id);
> > TRAP: failed Assert("BarrierParticipants(&batch->batch_barrier) == 1"), 
> > File: "nodeHash.c", Line: 2118, PID: 19147

So, after staring at this for awhile, I suspect this assertion is just
plain wrong. BarrierArriveAndDetachExceptLast() contains this code:

if (barrier->participants > 1)
{
--barrier->participants;
SpinLockRelease(&barrier->mutex);

return false;
}
Assert(barrier->participants == 1);

So in between this assertion and the one we tripped,

if (!BarrierArriveAndDetachExceptLast(&batch->batch_barrier))
{
...
return false;
}

/* Now we are alone with this batch. */
Assert(BarrierPhase(&batch->batch_barrier) == PHJ_BATCH_SCAN);
Assert(BarrierParticipants(&batch->batch_barrier) == 1);

Another worker attached to the batch barrier, saw that it was in
PHJ_BATCH_SCAN, marked it done and detached. This is fine.
BarrierArriveAndDetachExceptLast() is meant to ensure no one waits
(deadlock hazard) and that at least one worker stays to do the unmatched
scan. It doesn't hurt anything for another worker to join and find out
there is no work to do.

We should simply delete this assertion.

- Melanie




Re: longfin missing gssapi_ext.h

2023-04-08 Thread Tom Lane
I wrote:
> * NetBSD's package collection[1] includes both Heimdal and MIT Kerberos
> (mit-krb5).  Apparently what's installed on at least some of the buildfarm
> animals is the former.

Oh!  New data: the core NetBSD OS includes a copy of Heimdal (looks
to be 7.7.0 in the 10.0_BETA sources).  The installable package is
a slightly newer version, 7.8.0, but I think it's a very solid bet
that the relevant buildfarm animals are just using the core copy
and haven't installed the add-on package.  Even if they had, it
would take some fooling around with include and link paths to pull
in the packaged version rather than the built-in one.

The exact same thing applies to FreeBSD, except that their in-core
Heimdal is ancient (1.5.2).  Also, they do have MIT Kerberos
available as a package [1].  I'd been misled by the lack of a hit
on "kerberos", but "krb5" finds it.  Our code does compile against
that version of Heimdal, but src/test/kerberos/ refuses to try to
run.

I've not dug into OpenBSD any further.

regards, tom lane

[1] https://ports.freebsd.org/cgi/ports.cgi?query=krb5&stype=all&sektion=all




Re: Direct I/O

2023-04-08 Thread Andres Freund
Hi,

Given the frequency of failures on this in the buildfarm, I propose using the
temporary workaround of using wal_level=replica. That avoids the use of the
over-eager log_newpage_range().

Greetings,

Andres Freund




When to drop src/tools/msvc support

2023-04-08 Thread Andres Freund
Hi,

I'd planned to write this soon anyway, but it was just brought up in [1].

Originally we had planned to drop src/tools/msvc support shortly after meson
went in. Unfortunately, it took a bit longer than originally hoped for to
merge meson support and then longer than hoped to add buildfarm support. I
don't think there's been any buildfarm client release with meson support yet -
but we do have windows buildfarm animals using it.

Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in
17?

I do have a set of patches removing src/tools/msvc. There are a few loose ends
that I know of (my eyes glaze over every time I try to reconcile the
src/tools/perl.m4 comments with the referenced comments in Mkvcbuild.pm), and
probably more small references that grep terms didn't find.

Greetings,

Andres Freund

[1] https://postgr.es/m/3598664.1680976414%40sss.pgh.pa.us




Re: When to drop src/tools/msvc support

2023-04-08 Thread Jonathan S. Katz

On 4/8/23 3:10 PM, Andres Freund wrote:

Hi,

I'd planned to write this soon anyway, but it was just brought up in [1].

Originally we had planned to drop src/tools/msvc support shortly after meson
went in. Unfortunately, it took a bit longer than originally hoped for to
merge meson support and then longer than hoped to add buildfarm support. I
don't think there's been any buildfarm client release with meson support yet -
but we do have windows buildfarm animals using it.

Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in
17?

I do have a set of patches removing src/tools/msvc. There are a few loose ends
that I know of (my eyes glaze over every time I try to reconcile the
src/tools/perl.m4 comments with the referenced comments in Mkvcbuild.pm), and
probably more small references that grep terms didn't find.


(reads [1])

Can we treat this as an "open item" for completing the transition to 
meson for builds as part of v16?


With my personal hat on, it seems silly to wait until v17 to do this, 
and I don't see why we would want to wait. If there's limited risk in 
doing this and it'll make our builds both more stable + faster, it seems 
like we should just do it.


Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: When to drop src/tools/msvc support

2023-04-08 Thread Tom Lane
Andres Freund  writes:
> Do we want to drop src/tools/msvc support in 16 (i.e. now), or do it early in
> 17?

On the one hand, it feels like something we shouldn't do after
feature freeze.  On the other hand, continuing to maintain three
build systems is a real drag (although you could argue that there
shouldn't be much churn there until the tree opens for 17).

We clearly can't consider it in any case until the buildfarm
is prepared, with all the Windows animals updated to a compatible
client script.  I don't know what timeline Andrew has in mind
for that.

I guess I'd vote for pulling the trigger in v16 if we can get that
done by the end of April.  Once we're close to beta I think it
must wait for v17 to open.

regards, tom lane




Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 6:55 AM Andres Freund  wrote:
> Given the frequency of failures on this in the buildfarm, I propose using the
> temporary workaround of using wal_level=replica. That avoids the use of the
> over-eager log_newpage_range().

Will do.




Re: Direct I/O

2023-04-08 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Apr 9, 2023 at 6:55 AM Andres Freund  wrote:
>> Given the frequency of failures on this in the buildfarm, I propose using the
>> temporary workaround of using wal_level=replica. That avoids the use of the
>> over-eager log_newpage_range().

> Will do.

Now crake is doing this:

2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 
004_io_direct.pl LOG:  statement: select count(*) from t1
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR:  invalid 
page in block 56 of relation base/5/16384
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT:  
select count(*) from t1
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 
004_io_direct.pl ERROR:  invalid page in block 56 of relation base/5/16384
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 
004_io_direct.pl STATEMENT:  select count(*) from t1
2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG:  
background worker "parallel worker" (PID 3257646) exited with exit code 1

The fact that the error is happening in a parallel worker seems
interesting ...

(BTW, why are the log lines doubly timestamped?)

regards, tom lane




Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 9:10 AM Tom Lane  wrote:
> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 
> 004_io_direct.pl LOG:  statement: select count(*) from t1
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR:  
> invalid page in block 56 of relation base/5/16384

> The fact that the error is happening in a parallel worker seems
> interesting ...

That's because it's running with debug_parallel_query=regress.  I've
been trying to repro that but no luck...  A different kind of failure
also showed up, where it counted the wrong number of tuples:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2023-04-08%2015%3A52%3A03

A paranoid explanation would be that this system is failing to provide
basic I/O coherency, we're writing pages out and not reading them back
in.  Or of course there is a dumb bug... but why only here?  Can of
course be timing-sensitive and it's interesting that crake suffers
from the "no unpinned buffers available" thing (which should now be
gone) with higher frequency; I'm keen to see if the dodgy-read problem
continues with a similar frequency now.




Re: Direct I/O

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
> Thomas Munro  writes:
> Now crake is doing this:
> 
> 2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 
> 004_io_direct.pl LOG:  statement: select count(*) from t1
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR:  
> invalid page in block 56 of relation base/5/16384
> 2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT:  
> select count(*) from t1
> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 
> 004_io_direct.pl ERROR:  invalid page in block 56 of relation base/5/16384
> 2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 
> 004_io_direct.pl STATEMENT:  select count(*) from t1
> 2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG:  
> background worker "parallel worker" (PID 3257646) exited with exit code 1
> 
> The fact that the error is happening in a parallel worker seems
> interesting ...

There were a few prior instances of that error. One that I hadn't seen before
is this:

[11:35:07.190](0.001s) #   Failed test 'read back from shared'
#   at 
/home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl 
line 43.
[11:35:07.190](0.000s) #  got: '1'
# expected: '10098'

For one it points to the arguments to is() being switched around, but that's a
sideshow.


> (BTW, why are the log lines doubly timestamped?)

It's odd.

It's also odd that it's just crake having the issue. It's just a linux host,
afaics. Andrew, is there any chance you can run that test in isolation and see
whether it reproduces? If so, does the problem vanish, if you comment out the
io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
whether it's an independent issue exposed by the new test.


I wonder if we should make the test use data checksum - if we continue to see
the wrong query results, the corruption is more likely to be in memory.

Greetings,

Andres Freund




Re: Direct I/O

2023-04-08 Thread Tom Lane
Andres Freund  writes:
> On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
>> (BTW, why are the log lines doubly timestamped?)

> It's odd.

Oh, I guess that's intentional, because crake has

 'log_line_prefix = \'%m [%s %p:%l] %q%a \'',

> It's also odd that it's just crake having the issue. It's just a linux host,
> afaics.

Indeed.  I'm guessing from the compiler version that it's Fedora 37 now
(the lack of such basic information in the meson configuration output
is pretty annoying).  I've been trying to repro it here on an F37 box,
with no success, suggesting that it's very timing sensitive.  Or maybe
it's inside a VM and that matters?

regards, tom lane




Re: Direct I/O

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-08 17:31:02 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
> > It's also odd that it's just crake having the issue. It's just a linux host,
> > afaics.
> 
> Indeed.  I'm guessing from the compiler version that it's Fedora 37 now

The 15 branch says:

hostname = neoemma
uname -m = x86_64
uname -r = 6.2.8-100.fc36.x86_64
uname -s = Linux
uname -v = #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023

So at least the kernel claims to be 36...


> (the lack of such basic information in the meson configuration output
> is pretty annoying).

Yea, I was thinking yesterday that we should add uname output to meson's
configure (if available). I'm sure we can figure out a reasonably fast windows
command for the version, too.


> I've been trying to repro it here on an F37 box, with no success, suggesting
> that it's very timing sensitive.  Or maybe it's inside a VM and that
> matters?

Could also be filesystem specific?

Greetings,

Andres Freund




Re: Direct I/O

2023-04-08 Thread Andrew Dunstan


On 2023-04-08 Sa 17:42, Andres Freund wrote:

Hi,

On 2023-04-08 17:31:02 -0400, Tom Lane wrote:

Andres Freund  writes:

On 2023-04-08 17:10:19 -0400, Tom Lane wrote:
It's also odd that it's just crake having the issue. It's just a linux host,
afaics.

Indeed.  I'm guessing from the compiler version that it's Fedora 37 now

The 15 branch says:

hostname = neoemma
uname -m = x86_64
uname -r = 6.2.8-100.fc36.x86_64
uname -s = Linux
uname -v = #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023

So at least the kernel claims to be 36...



(the lack of such basic information in the meson configuration output
is pretty annoying).

Yea, I was thinking yesterday that we should add uname output to meson's
configure (if available). I'm sure we can figure out a reasonably fast windows
command for the version, too.



I've been trying to repro it here on an F37 box, with no success, suggesting
that it's very timing sensitive.  Or maybe it's inside a VM and that
matters?

Could also be filesystem specific?



I migrated it in February from a VM to a non-virtual instance. Almost 
nothing else runs on the machine. The personality info shown on the BF 
server is correct.


andrew@neoemma:~ $ cat /etc/fedora-release
Fedora release 36 (Thirty Six)
andrew@neoemma:~ $ uname -a
Linux neoemma 6.2.8-100.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 22 
19:14:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

andrew@neoemma:~ $ gcc --version
gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4)
andrew@neoemma:~ $ mount | grep home
/dev/mapper/luks-xxx on /home type btrfs 
(rw,relatime,seclabel,compress=zstd:1,ssd,discard=async,space_cache,subvolid=256,subvol=/home)



I guess it could be btrfs-specific. I'll be somewhat annoyed if I have 
to re-init the machine to use something else.



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 10:08 AM Andrew Dunstan  wrote:
> btrfs

Aha!




Re: Direct I/O

2023-04-08 Thread Andrew Dunstan


On 2023-04-08 Sa 17:23, Andres Freund wrote:

Hi,

On 2023-04-08 17:10:19 -0400, Tom Lane wrote:

Thomas Munro  writes:
Now crake is doing this:

2023-04-08 16:50:03.177 EDT [2023-04-08 16:50:03 EDT 3257645:3] 
004_io_direct.pl LOG:  statement: select count(*) from t1
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:1] ERROR:  invalid 
page in block 56 of relation base/5/16384
2023-04-08 16:50:03.316 EDT [2023-04-08 16:50:03 EDT 3257646:2] STATEMENT:  
select count(*) from t1
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:4] 
004_io_direct.pl ERROR:  invalid page in block 56 of relation base/5/16384
2023-04-08 16:50:03.317 EDT [2023-04-08 16:50:03 EDT 3257645:5] 
004_io_direct.pl STATEMENT:  select count(*) from t1
2023-04-08 16:50:03.319 EDT [2023-04-08 16:50:02 EDT 3257591:4] LOG:  background worker 
"parallel worker" (PID 3257646) exited with exit code 1

The fact that the error is happening in a parallel worker seems
interesting ...

There were a few prior instances of that error. One that I hadn't seen before
is this:

[11:35:07.190](0.001s) #   Failed test 'read back from shared'
#   at 
/home/andrew/bf/root/HEAD/pgsql/src/test/modules/test_misc/t/004_io_direct.pl 
line 43.
[11:35:07.190](0.000s) #  got: '1'
# expected: '10098'

For one it points to the arguments to is() being switched around, but that's a
sideshow.


It's also odd that it's just crake having the issue. It's just a linux host,
afaics. Andrew, is there any chance you can run that test in isolation and see
whether it reproduces? If so, does the problem vanish, if you comment out the
io_direct= in the test? Curious whether this is actually an O_DIRECT issue, or
whether it's an independent issue exposed by the new test.


I wonder if we should make the test use data checksum - if we continue to see
the wrong query results, the corruption is more likely to be in memory.



I can run the test in isolation, and it's get an error reliably.


cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: longfin missing gssapi_ext.h

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 6:40 AM Tom Lane  wrote:
> The exact same thing applies to FreeBSD, except that their in-core
> Heimdal is ancient (1.5.2).  Also, they do have MIT Kerberos
> available as a package [1].  I'd been misled by the lack of a hit
> on "kerberos", but "krb5" finds it.  Our code does compile against
> that version of Heimdal, but src/test/kerberos/ refuses to try to
> run.

FWIW my FBSD animal elver has krb5 installed.  Sorry it wasn't running
when the relevant commit landed.  Stupid network cable wriggled out.




Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 10:17 AM Andrew Dunstan  wrote:
> I can run the test in isolation, and it's get an error reliably.

Random idea: it looks like you have compression enabled.  What if you
turn it off in the directory where the test runs?  Something like
btrfs property set  compression ... according to the
intergoogles.  (I have never used btrfs before 6 minutes ago but I
can't seem to repro this with basic settings in a loopback btrfs
filesystems).




Re: Direct I/O

2023-04-08 Thread Tom Lane
Thomas Munro  writes:
> On Sun, Apr 9, 2023 at 10:08 AM Andrew Dunstan  wrote:
>> btrfs

> Aha!

Googling finds a lot of suggestions that O_DIRECT doesn't play nice
with btrfs, for example

https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg92824.html

It's not clear to me how much of that lore is still current,
but it's disturbing.

regards, tom lane




Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-04-08 Thread Greg Stark
On Wed, 22 Feb 2023 at 13:38, Kirk Wolak  wrote:
>
> I already requested ONLY the HH24 format.  8 characters of output.  no 
> options.  It's a waste of time.
> After all these years, sqlplus still has only one setting (show it, or not).  
> I am asking the same here.
> And I will gladly defend not changing it!  Ever!

Yeah, well, it's kind of beside the point that you're satisfied with
this one format. We tend to think about what all users would expect
and what a complete feature would look like.

I actually tend to think this would be a nice feature. It's telling
that log files and other tracing tools tend to produce exactly this
type of output with every line prefixed with either a relative or
absolute timestamp.

I'm not sure if the *prompt* is a sensible place for it though. The
place it seems like it would be most useful is reading the output of
script executions where there would be no prompts. Perhaps it's the
command tags and \echo statements that should be timestamped.

And I think experience shows that there are three reasonable formats
for dates, the default LC_TIME format, ISO8601, and a relative
"seconds (with milliseconds) since starting". I think having a feature
that doesn't support those three would feel incomplete and eventually
need to be finished.

-- 
greg




Re: Commitfest 2023-03 starting tomorrow!

2023-04-08 Thread Gregory Stark (as CFM)
On Sat, 8 Apr 2023 at 11:37, Greg Stark  wrote:
>
> c) Pick out the Bug Fixes, cleanup patches, and other non-feature
> patches that might be open issues for v16.

So on further examination it seems there are multiple category of
patches that are worth holding onto rather than shifting to the next
release:

* Bug Fixes
* Documentation patches
* Build system patches, especially meson-related patches
* Testing patches, especially if they're testing new features
* Patches that are altering new features that were committed in this release

Some of these, especially the last category, are challenging for me to
determine. If I move forward a patch of yours that you think makes
sense to treat as an open issue that should be resolved in this
release then feel free to say.

-- 
Gregory Stark
As Commitfest Manager




Re: Commitfest 2023-03 starting tomorrow!

2023-04-08 Thread Tom Lane
"Gregory Stark (as CFM)"  writes:
> Some of these, especially the last category, are challenging for me to
> determine. If I move forward a patch of yours that you think makes
> sense to treat as an open issue that should be resolved in this
> release then feel free to say.

I think that's largely independent.  We don't look back at closed-out CFs
as a kind of TODO list; anything that's left behind there is basically
never going to be seen again, until the author makes a new CF entry.

Anything that's to be treated as an open item for v16 should get added
to the wiki page at

https://wiki.postgresql.org/wiki/PostgreSQL_16_Open_Items

It's not necessary that a patch exist to do that.

regards, tom lane




Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-04-08 Thread Tom Lane
Greg Stark  writes:
> I'm not sure if the *prompt* is a sensible place for it though. The
> place it seems like it would be most useful is reading the output of
> script executions where there would be no prompts. Perhaps it's the
> command tags and \echo statements that should be timestamped.

Hmm, that is an interesting idea.  I kind of like it, not least because
it eliminates most of the tension between wanting a complete timestamp
and wanting a short prompt.  Command tags are short enough that there's
plenty of room.

> And I think experience shows that there are three reasonable formats
> for dates, the default LC_TIME format, ISO8601, and a relative
> "seconds (with milliseconds) since starting". I think having a feature
> that doesn't support those three would feel incomplete and eventually
> need to be finished.

Yeah, I don't believe that one timestamp format is going to satisfy
everyone.  But that was especially true when trying to wedge it
into the prompt, where the need for brevity adds more constraints.

regards, tom lane




Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 11:05 AM Tom Lane  wrote:
> Googling finds a lot of suggestions that O_DIRECT doesn't play nice
> with btrfs, for example
>
> https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg92824.html
>
> It's not clear to me how much of that lore is still current,
> but it's disturbing.

I think that particular thing might relate to modifications of the
user buffer while a write is in progress (breaking btrfs's internal
checksums).  I don't think we should ever do that ourselves (not least
because it'd break our own checksums).  We lock the page during the
write so no one can do that, and then we sleep in a synchronous
syscall.

Here's something recent.  I guess it's probably not relevant (a fault
on our buffer that we recently touched sounds pretty unlikely), but
who knows...  (developer lists for file systems are truly terrifying
places to drive through).

https://lore.kernel.org/linux-btrfs/20230315195231.gw10...@twin.jikos.cz/T/

It's odd, though, if it is their bug and not ours: I'd expect our
friends in other databases to have hit all that sort of thing years
ago, since many comparable systems have a direct I/O knob*.  What are
we doing differently?  Are our multiple processes a factor here,
breaking some coherency logic?  Unsurprisingly, having compression on
as Andrew does actually involves buffering anyway[1] despite our
O_DIRECT flag, but maybe that's saying writes are buffered but reads
are still direct (?), which sounds like the sort of initial conditions
that might produce a coherency bug.  I dunno.

I gather that btrfs is actually Fedora's default file system (or maybe
it's just "laptops and desktops"[2]?).  I wonder if any of the several
green Fedora systems in the 'farm are using btrfs.  I wonder if they
are using different mount options (thinking again of compression).

*Probably a good reason to add a more prominent warning that the
feature is developer-only, experimental and not for production use.
I'm thinking a warning at startup or something.

[1] https://btrfs.readthedocs.io/en/latest/Compression.html
[2] https://fedoraproject.org/wiki/Changes/BtrfsByDefault




RE: PGDOCS - function pg_get_publication_tables is not documented?

2023-04-08 Thread Yu Shi (Fujitsu)
On Fri, Mar 24, 2023 6:26 AM Tom Lane  wrote:
> 
> I do see a docs change that I think would be worth making: get
> rid of the explicit mention of it in create_subscription.sgml
> in favor of using that view.
> 

I agree and I tried to modify the query to use the view.
Please see the attached patch.

Regards,
Shi Yu


v1-0001-Avoid-using-pg_get_publication_tables-in-create_s.patch
Description:  v1-0001-Avoid-using-pg_get_publication_tables-in-create_s.patch


Re: Direct I/O

2023-04-08 Thread Tom Lane
Thomas Munro  writes:
> It's odd, though, if it is their bug and not ours: I'd expect our
> friends in other databases to have hit all that sort of thing years
> ago, since many comparable systems have a direct I/O knob*.

Yeah, it seems moderately likely that it's our own bug ... but this
code's all file-system-ignorant, so how?  Maybe we are breaking some
POSIX rule that btrfs exploits but others don't?

> I gather that btrfs is actually Fedora's default file system (or maybe
> it's just "laptops and desktops"[2]?).

I have a ton of Fedora images laying about, and I doubt that any of them
use btrfs, mainly because that's not the default in the "server spin"
which is what I usually install from.  It's hard to guess about the
buildfarm, but it wouldn't surprise me that most of them are on xfs.
(If we haven't figured this out pretty shortly, I'm probably going to
put together a btrfs-on-bare-metal machine to try to duplicate crake's
results.)

regards, tom lane




Re: Direct I/O

2023-04-08 Thread Andres Freund
Hi,

On 2023-04-09 13:55:33 +1200, Thomas Munro wrote:
> I think that particular thing might relate to modifications of the
> user buffer while a write is in progress (breaking btrfs's internal
> checksums).  I don't think we should ever do that ourselves (not least
> because it'd break our own checksums).  We lock the page during the
> write so no one can do that, and then we sleep in a synchronous
> syscall.

Oh, but we actually *do* modify pages while IO is going on. I wonder if you
hit the jack pot here. The content lock doesn't prevent hint bit
writes. That's why we copy the page to temporary memory when computing
checksums.

I think we should modify the test to enable checksums - if the problem goes
away, then it's likely to be related to modifying pages while an O_DIRECT
write is ongoing...

Greetings,

Andres Freund




Re: Direct I/O

2023-04-08 Thread Thomas Munro
On Sun, Apr 9, 2023 at 2:18 PM Andres Freund  wrote:
> On 2023-04-09 13:55:33 +1200, Thomas Munro wrote:
> > I think that particular thing might relate to modifications of the
> > user buffer while a write is in progress (breaking btrfs's internal
> > checksums).  I don't think we should ever do that ourselves (not least
> > because it'd break our own checksums).  We lock the page during the
> > write so no one can do that, and then we sleep in a synchronous
> > syscall.
>
> Oh, but we actually *do* modify pages while IO is going on. I wonder if you
> hit the jack pot here. The content lock doesn't prevent hint bit
> writes. That's why we copy the page to temporary memory when computing
> checksums.

More like the jackpot hit me.

Woo, I can now reproduce this locally on a loop filesystem.
Previously I had missed a step, the parallel worker seems to be
necessary.  More soon.




Re: Use fadvise in wal replay

2023-04-08 Thread Gregory Stark (as CFM)
On Thu, 19 Jan 2023 at 18:19, Andres Freund  wrote:
>
> On 2023-01-19 22:19:10 +0100, Tomas Vondra wrote:
>
> > So I'm a bit unsure about this patch. I doesn't seem like it can perform
> > better than read-ahead (although perhaps it does, on a different storage
> > system).
>
> I really don't see the point of the patch as-is.
...
> I don't disagree fundamentally. But that doesn't make this patch a useful
> starting point.

It sounds like this patch has gotten off on the wrong foot and is not
worth moving forward to the next commitfest. Hopefully a starting over
from a different approach might target i/o that is more amenable to
fadvise. I'll mark it RwF.

-- 
Gregory Stark
As Commitfest Manager




Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)

2023-04-08 Thread Pavel Stehule
ne 9. 4. 2023 v 3:54 odesílatel Tom Lane  napsal:

> Greg Stark  writes:
> > I'm not sure if the *prompt* is a sensible place for it though. The
> > place it seems like it would be most useful is reading the output of
> > script executions where there would be no prompts. Perhaps it's the
> > command tags and \echo statements that should be timestamped.
>
> Hmm, that is an interesting idea.  I kind of like it, not least because
> it eliminates most of the tension between wanting a complete timestamp
> and wanting a short prompt.  Command tags are short enough that there's
> plenty of room.
>

I don't agree so there is a common request for a short prompt. Usually I
use four terminals on screen, and still my terminal has a width of 124
characters (and I use relatively small display of my Lenovo T520). Last
years I use prompt like:

(2023-04-09 06:08:30) postgres=# select 1;
┌──┐
│ ?column? │
╞══╡
│1 │
└──┘
(1 row)

and it is working. Nice thing when I paste the timestamp in examples. I
have not any problems with prompt width

Regards

Pavel


Re: Direct I/O

2023-04-08 Thread Noah Misch
On Sat, Apr 08, 2023 at 11:08:16AM -0700, Andres Freund wrote:
> On 2023-04-07 23:04:08 -0700, Andres Freund wrote:
> > There were some failures in CI (e.g. [1] (and perhaps also bf, didn't yet
> > check), about "no unpinned buffers available".  I was worried for a moment
> > that this could actually be relation to the bulk extension patch.
> > 
> > But it looks like it's older - and not caused by direct_io support (except 
> > by
> > way of the test existing). I reproduced the issue locally by setting s_b 
> > even
> > lower, to 16 and made the ERROR a PANIC.
> >
> > [backtrace]

I get an ERROR, not a PANIC:

$ git rev-parse HEAD
2e57ffe12f6b5c1498f29cb7c0d9e17c797d9da6
$ git diff -U0
diff --git a/src/test/modules/test_misc/t/004_io_direct.pl 
b/src/test/modules/test_misc/t/004_io_direct.pl
index f5bf0b1..8f0241b 100644
--- a/src/test/modules/test_misc/t/004_io_direct.pl
+++ b/src/test/modules/test_misc/t/004_io_direct.pl
@@ -25 +25 @@ io_direct = 'data,wal,wal_init'
-shared_buffers = '256kB' # tiny to force I/O
+shared_buffers = 16
$ ./configure -C --enable-debug --enable-cassert --enable-depend 
--enable-tap-tests --with-tcl --with-python --with-perl
$ make -C src/test/modules/test_misc check PROVE_TESTS=t/004_io_direct.pl
# +++ tap check in src/test/modules/test_misc +++
t/004_io_direct.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
No subtests run 

Test Summary Report
---
t/004_io_direct.pl (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: No plan found in TAP output
Files=1, Tests=0,  1 wallclock secs ( 0.01 usr  0.00 sys +  0.41 cusr  0.14 
csys =  0.56 CPU)
Result: FAIL
make: *** [../../../../src/makefiles/pgxs.mk:460: check] Error 1
$ grep pinned src/test/modules/test_misc/tmp_check/log/*
src/test/modules/test_misc/tmp_check/log/004_io_direct_main.log:2023-04-08 
21:12:46.781 PDT [929628] 004_io_direct.pl ERROR:  no unpinned buffers available
src/test/modules/test_misc/tmp_check/log/regress_log_004_io_direct:error 
running SQL: 'psql::1: ERROR:  no unpinned buffers available'

No good reason to PANIC there, so the path to PANIC may be fixable.

> > If you look at log_newpage_range(), it's not surprising that we get this 
> > error
> > - it pins up to 32 buffers at once.
> > 
> > Afaics log_newpage_range() originates in 9155580fd5fc, but this caller is 
> > from
> > c6b92041d385.

> > Do we care about fixing this in the backbranches? Probably not, given there
> > haven't been user complaints?

I would not.  This is only going to come up where the user goes out of the way
to use near-minimum shared_buffers.

> Here's a quick prototype of this approach.

This looks fine.  I'm not enthusiastic about incurring post-startup cycles to
cater to allocating less than 512k*max_connections of shared buffers, but I
expect the cycles in question are negligible here.




Re: Direct I/O

2023-04-08 Thread Thomas Munro
Indeed, I can't reproduce this with (our) checksums on.  I also can't
reproduce it with O_DIRECT off.  I also can't reproduce it if I use
"mkdir pgdata && chattr +C pgdata && initdb -D pgdata" to have a
pgdata directory with copy-on-write and (their) checksums disabled.
But it reproduces quite easily with COW on (default behaviour) with
io_direct=data, debug_parallel_query=debug, create table as ...;
update ...; select count(*) ...; from that test.

Unfortunately my mental model of btrfs is extremely limited, basically
just "something a bit like ZFS".  FWIW I've been casually following
along with OpenZFS's ongoing O_DIRECT project, and I know that the
plan there is to make a temporary stable copy if checksums and other
features are on (a bit like PostgreSQL does for the same reason, as
you reminded us).  Time will tell how that works out but it *seems*
like all available modes would therefore work correctly for us, with
different tradeoffs (ie if you want the fastest zero-copy I/O, don't
use checksums, compression, etc).

Here, btrfs seems to be taking a different path that I can't quite
make out...  I see no warning/error about a checksum failure like [1],
and we apparently managed to read something other than a mix of the
old and new page contents (which, based on your hypothesis, should
just leave it indeterminate whether the hint bit changes were captured
or not, and the rest of the page should be stable, right).  It's like
the page time-travelled or got scrambled in some other way, but it
didn't tell us?  I'll try to dig further...

[1] 
https://archive.kernel.org/oldwiki/btrfs.wiki.kernel.org/index.php/Gotchas.html#Direct_IO_and_CRCs




Re: CREATE SUBSCRIPTION -- add missing tab-completes

2023-04-08 Thread Gregory Stark (as CFM)
On Fri, 7 Apr 2023 at 01:28, Amit Kapila  wrote:
>
> On Wed, Apr 5, 2023 at 5:58 AM Peter Smith  wrote:
>
> > PSA patches to add those tab completions.
>
> LGTM, so pushed.

I moved this to the next CF but actually I just noticed the thread
starts with the original patch being pushed. Maybe we should just
close the CF entry? Is this further change relevant?

-- 
Gregory Stark
As Commitfest Manager