Re: Test 002_pg_upgrade fails with olddump on Windows

2023-12-06 Thread Alexander Lakhin

06.12.2023 04:17, Michael Paquier wrote:

At the end, just applying the filtering all the time makes the most
sense to me, so I've applied a patch doing just that.


Thank you for the fix!

Now the test with the minimal dump passes fine, but when I tried to run
it with a complete dump borrowed from a normal test run:
set olddump=& set oldinstall=& set PG_TEST_NOCLEAN=1& meson test pg_upgrade/002_pg_upgrade
REM this test succeeded
copy testrun\pg_upgrade\002_pg_upgrade\data\tmp_test_*\dump1.sql
set olddump=c:\temp\dump1.sql& set oldinstall=%CD%/tmp_install/usr/local/pgsql& meson test pg_upgrade/002_pg_upgrade

I encountered another failure:
...
Creating dump of database schemas ok
Checking for presence of required libraries   fatal

Your installation references loadable libraries that are missing from the
new installation.  You can add these libraries to the new installation,
or remove the functions using them from the old installation.  A list of
problem libraries is in the file:
.../build/testrun/pg_upgrade/002_pg_upgrade/data/t_002_pg_upgrade_new_node_data/pgdata/pg_upgrade_output.d/20231205T223247.304/loadable_libraries.txt
Failure, exiting
[22:32:51.086](3.796s) not ok 11 - run of pg_upgrade for new instance
...

loadable_libraries.txt contains:
could not load library ".../src/test/regress/refint.dll": ERROR: could not access file ".../src/test/regress/refint.dll": No such file or directory
In database: regression

could not load library ".../src/test/regress/autoinc.dll": ERROR: could not access file ".../src/test/regress/autoinc.dll": No such file or directory
In database: regression

could not load library ".../src/test/regress/regress.dll": ERROR: could not access file ".../src/test/regress/regress.dll": No such file or directory
In database: regression

Really, I can see refint.dll in ...\build\src\test\regress and in
...\build\tmp_install\usr\local\pgsql\lib, but not in
.../src/test/regress/regress.dll

c:\temp\dump1.sql contains:
...
CREATE FUNCTION public.check_primary_key() RETURNS trigger
    LANGUAGE c
    AS '.../build/src/test/regress/refint.dll', 'check_primary_key';

while ...\build\testrun\pg_upgrade\002_pg_upgrade\data\tmp_test_T6jE\dump1.sql
(for the failed test):
...
CREATE FUNCTION public.check_primary_key() RETURNS trigger
    LANGUAGE c
    AS '.../src/test/regress/refint.dll', 'check_primary_key';

The same happens on Linux:
PG_TEST_NOCLEAN=1 meson test pg_upgrade/002_pg_upgrade
cp testrun/pg_upgrade/002_pg_upgrade/data/tmp_test_*/dump1.sql /tmp/
olddump=/tmp/dump1.sql oldinstall=`pwd`/tmp_install/usr/local/pgsql meson test pg_upgrade/002_pg_upgrade

So it looks like
    my $newregresssrc = "$srcdir/src/test/regress";
is incorrect for meson.
Maybe it should be:
    my $newregresssrc = dirname($ENV{REGRESS_SHLIB});
(With this line, the test passes for me on both Windows and Linux.)

Best regards,
Alexander




Re: tablecmds.c/MergeAttributes() cleanup

2023-12-06 Thread Peter Eisentraut

On 05.10.23 17:49, Peter Eisentraut wrote:

On 19.09.23 15:11, Peter Eisentraut wrote:
Here is an updated version of this patch set.  I resolved some 
conflicts and addressed this comment of yours.  I also dropped the one 
patch with some catalog header edits that people didn't seem to 
particularly like.


The patches that are now 0001 through 0004 were previously reviewed 
and just held for the not-null constraint patches, I think, so I'll 
commit them in a few days if there are no objections.


Patches 0005 through 0007 are also ready in my opinion, but they 
haven't really been reviewed, so this would be something for reviewers 
to focus on.  (0005 and 0007 are trivial, but they go to together with 
0006.)


The remaining 0008 and 0009 were still under discussion and 
contemplation.


I have committed through 0007, and I'll now close this patch set as 
"Committed", and I will (probably) bring back the rest (especially 0008) 
as part of a different patch set soon.


After playing with this for, well, 2 months, and considering various 
other approaches, I would like to bring back the remaining two patches 
in unchanged form.


Especially the (now) first patch "Refactor ATExecAddColumn() to use 
BuildDescForRelation()" would be very helpful for facilitating further 
refactoring in this area, because it avoids having two essentially 
duplicate pieces of code responsible for converting a ColumnDef node 
into internal form.


One of your (Álvaro's) comments about this earlier was

> Hmm, crazy.  I'm not sure I like this, because it seems much too clever.
> The number of lines that are deleted is alluring, though.
>
> Maybe it'd be better to create a separate routine that takes a single
> ColumnDef and returns the Form_pg_attribute element for it; then use
> that in both BuildDescForRelation and ATExecAddColumn.

which was also my thought at the beginning.  However, this wouldn't 
quite work that way, for several reasons.  For instance, 
BuildDescForRelation() also needs to keep track of the has_not_null 
property across all fields, and so if you split that function up, you 
would have to somehow make that an output argument and have the caller 
keep track of it.  Also, the output of BuildDescForRelation() in 
ATExecAddColumn() is passed into InsertPgAttributeTuples(), which 
requires a tuple descriptor anyway; a per-attribute function would 
therefore force ATExecAddColumn() to re-assemble a tuple descriptor 
itself, which wouldn't save anything.
Also note that ATExecAddColumn() could in theory be enhanced to add more 
than one column in one go, so having this code structure in place isn't 
inconsistent with that.


The main hackish thing, I suppose, is that we have to fix up the 
attribute number after returning from BuildDescForRelation().  I suppose 
we could address that by passing in a starting attribute number (or 
alternatively maximum existing attribute number) into 
BuildDescForRelation().  I think that would be okay; it would probably 
be about a wash in terms of code added versus saved.
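
To make the resulting shape concrete, here is a rough sketch of what 
ATExecAddColumn() ends up doing under this approach -- an illustration of 
the idea only, not the exact patch hunk; variable names follow the quoted 
diff:

    /* Build a one-column descriptor from the single ColumnDef ... */
    List       *cols = list_make1(colDef);
    TupleDesc   tupdesc = BuildDescForRelation(cols);
    Form_pg_attribute attribute = TupleDescAttr(tupdesc, 0);

    /* ... and fix up the attribute number afterwards (the "hackish" part). */
    attribute->attnum = newattnum;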



The (now) second patch is also still of interest to me, but I have since 
noticed that [0] should probably be fixed first; to make that fix 
simpler, I would like the first patch from here.


[0]: 
https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org
From 42615f4c523920c7ae916ba0b1819cc6a5e622b1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 5 Oct 2023 16:17:16 +0200
Subject: [PATCH v4 1/2] Refactor ATExecAddColumn() to use
 BuildDescForRelation()

BuildDescForRelation() has all the knowledge for converting a
ColumnDef into pg_attribute/tuple descriptor.  ATExecAddColumn() can
make use of that, instead of duplicating all that logic.  We just pass
a one-element list of ColumnDef and we get back exactly the data
structure we need.  Note that we don't even need to touch
BuildDescForRelation() to make this work.

Discussion: 
https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e...@eisentraut.org
---
 src/backend/commands/tablecmds.c | 89 
 1 file changed, 22 insertions(+), 67 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6b0a20010e..875cfeaa5a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6965,22 +6965,15 @@ ATExecAddColumn(List **wqueue, AlteredTableInfo *tab, 
Relation rel,
Relationpgclass,
attrdesc;
HeapTuple   reltup;
-   FormData_pg_attribute attribute;
+   Form_pg_attribute attribute;
int newattnum;
charrelkind;
-   HeapTuple   typeTuple;
-   Oid typeOid;
-   int32   typmod;
-   Oid collOid;
-   Form_pg_type tform;
Expr   *defval;
List   *children;
ListCell 

RE: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr. Haas, hackers.

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 5:03 AM
> Also, I want to make one other point here about security and reliability. 
> Right now, there is no way for a user to feed
> arbitrary data to a deserialization function. Since serialization and 
> deserialization functions are only used in the context of
> parallel query, we always know that the data fed to the deserialization 
> function must have come from the serialization
> function on the same machine. Nor can users call the deserialization function 
> directly with arbitrary data of their own
> choosing, because users cannot call functions that take or return internal. 
> But with this system, it becomes possible to
> feed arbitrary data to a deserialization function.
> The user could redefine the function on the remote side so that it produces 
> arbitrary data of their choosing, and the local
> deserialization function will ingest it.
> 
> That's potentially quite a significant problem. Consider for example that 
> numericvar_deserialize() does no validity
> checking on any of the weight, sign, or dscale, but not all values for those 
> fields are legal. Right now that doesn't matter,
> but if you can feed arbitrary data to that function, then it is. I don't know 
> exactly what the consequences are if you can get
> that function to spit out a NumericVar with values outside the normal legal 
> range. What can you do then?
> Store a bogus numeric on disk? Crash the server? Worst case, some problem 
> like this could be a security issue allowing for
> escalation to superuser; more likely, it would be a crash bug, corrupt your 
> database, or lead to unexpected and strange
> error messages.
> 
> Unfortunately, I have the unpleasant suspicion that most internal-type 
> aggregates will be affected by this problem.
> Consider, for example, string_agg_deserialize(). Generally, strings are one 
> of the least-constrained data types, so you
> might hope that this function would be OK. But it doesn't look very 
> promising. The first int4 in the serialized representation
> is the cursor, which would have to be bounds-checked, lest someone provide a 
> cursor that falls outside the bounds of the
> StringInfo and, maybe, cause a reference to an arbitrary memory location. 
> Then the rest of the representation is the actual
> data, which could be anything. This function is used for both bytea and for 
> text, and for bytea, letting the payload be
> anything is OK.
> But for text, the supplied data shouldn't contain an embedded zero byte, or 
> otherwise be invalid in the server encoding. If
> it were, that would provide a vector to inject invalidly encoded data into 
> the database. This feature can't be allowed to do
> that.
I completely overlooked this issue. I should have considered the risks of 
sending raw state values or serialized state
data directly from remote to local. I apologize.

> What could be a solution to this class of problems? One option is to just 
> give up on supporting this feature for internal-type
> aggregates for now. That's easy enough to do, and just means we have less 
> functionality, but it's sad because that's
> functionality we'd like to have. Another approach is to add necessary sanity 
> checks to the relevant deserialization
> functions, but that seems a little hard to get right, and it would slow down 
> parallel query cases which are probably going to
> be more common than the use of this feature. I think the slowdown might be 
> significant, too. A third option is to change
> those aggregates in some way, like giving them a transition function that 
> operates on some data type other than internal,
> but there again we have to be careful of slowdowns. A final option is to 
> rethink the infrastructure in some way, like having
> a way to serialize to something other than bytea, for which we already have 
> input functions with adequate checks. For
> instance, if string_agg_serialize() produced a record containing an integer 
> column and a text or bytea column, we could
> attempt to ingest that record on the other side and presumably the right 
> things would happen in the case of any invalid
> data. But I'm not quite sure what infrastructure would be required to make 
> this kind of idea work.
Thank you very much for providing a direction towards resolving this issue.
As you suggested with the last option, it seems that expanding the current
aggregation-function mechanism is the only choice. It may take some time, but
I will consider specific solutions.
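
(As an aside, for readers following along: the kind of per-field sanity check
Mr. Haas describes for the serialized cursor might look roughly like the
sketch below. The function name and surrounding structure are assumptions for
illustration, not the actual string_agg_deserialize() code.)

    #include "postgres.h"
    #include "libpq/pqformat.h"

    /* Illustrative only: reject a serialized state whose leading cursor
     * does not fall within the bounds of the accompanying data. */
    static void
    check_serialized_cursor(StringInfo buf)
    {
        int         cursor = (int) pq_getmsgint(buf, 4);

        if (cursor < 0 || cursor > buf->len)
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
                     errmsg("invalid serialized aggregate state")));
    }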

> From: Robert Haas 
> Sent: Tuesday, November 28, 2023 4:08 AM
> On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov  
> wrote:
> > Hi. HAVING is also a problem. Consider the following query
> >
> > SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> > foreign server as HAVING needs full aggregate result, but foreign
> > server don't know it.
> 
> I don't see it that way. What we would push to the foreig

Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Amit Kapila
On Tue, Dec 5, 2023 at 10:23 PM Tomas Vondra
 wrote:
>
> On 12/5/23 13:17, Amit Kapila wrote:
>
> > (b) for transactional
> > cases, we see overhead due to traversing all the top-level txns and
> > check the hash table for each one to find whether change is
> > transactional.
> >
>
> Not really, no. As I explained in my preceding e-mail, this check makes
> almost no difference - I did expect it to matter, but it doesn't. And I
> was a bit disappointed the global hash table didn't move the needle.
>
> Most of the time is spent in
>
> 78.81% 0.00%  postgres  postgres  [.] DecodeCommit (inlined)
>   |
>   ---DecodeCommit (inlined)
>  |
>  |--72.65%--SnapBuildCommitTxn
>  | |
>  |  --72.61%--SnapBuildBuildSnapshot
>  ||
>  | --72.09%--pg_qsort
>  ||
>  ||--66.24%--pg_qsort
>  ||  |
>
> And there's almost no difference between master and build with sequence
> decoding - see the attached diff-alter-sequence.perf, comparing the two
> branches (perf diff -c delta-abs).
>

I think in this case the commit processing predominates, which hides the
overhead. We didn't investigate in detail whether that can be improved, but
a similar test with aborts [1] does show the overhead of
ReorderBufferSequenceIsTransactional(). I understand that aborts won't be
frequent and it is a somewhat unrealistic test, but it still shows that
ReorderBufferSequenceIsTransactional() has overhead. Now, I am not sure we
can ignore that case, because theoretically the overhead can increase with
the number of top-level transactions.

[1]: 
https://www.postgresql.org/message-id/TY3PR01MB9889D457278B254CA87D1325F581A%40TY3PR01MB9889.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.




Re: SQL:2011 application time

2023-12-06 Thread Peter Eisentraut

On 02.12.23 19:41, Paul Jungwirth wrote:

So what do you think of this idea instead?:

We could add a new (optional) support function to GiST that translates 
"well-known" strategy numbers into the opclass's own strategy numbers. 
This would be support function 12. Then we can say 
translateStrategyNumber(RTEqualStrategyNumber) and look up the operator 
with the result.


There is not a performance hit, because we do this for the DDL command 
(create pk/uq/fk), then store the operator in the index/constraint.


If you don't provide this new support function, then creating the 
pk/uq/fk fails with a hint about what you can do to make it work.


This approach means we don't change the rules about GiST opclasses: you 
can still use the stranums how you like.


This function would also let me support non-range "temporal" foreign 
keys, where I'll need to build queries with && and maybe other operators.


I had some conversations about this behind the scenes.  I think this 
idea makes sense.


The other idea was that we create new strategy numbers, like 
TemporalEqualsStrategy / TemporalOverlapsStrategy.  But then you'd have 
the situation where some strategy numbers are reserved and others are 
not, so perhaps that is not so clean.  I think your idea is good.
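
To make the shape of the proposal concrete, a minimal sketch of what such a 
support function could look like for a hypothetical opclass is below. All 
names and the opclass-specific strategy values are illustrative assumptions, 
not an existing API:

    #include "postgres.h"
    #include "access/stratnum.h"
    #include "fmgr.h"

    PG_FUNCTION_INFO_V1(my_opclass_translate_strategy);

    /* Hypothetical GiST support function 12: map a "well-known" RT*
     * strategy number to this opclass's own strategy number. */
    Datum
    my_opclass_translate_strategy(PG_FUNCTION_ARGS)
    {
        StrategyNumber wellknown = PG_GETARG_UINT16(0);

        switch (wellknown)
        {
            case RTEqualStrategyNumber:
                PG_RETURN_UINT16(3);    /* this opclass's "equals" strategy */
            case RTOverlapStrategyNumber:
                PG_RETURN_UINT16(7);    /* this opclass's "overlaps" strategy */
            default:
                PG_RETURN_UINT16(InvalidStrategy);  /* not supported */
        }
    }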






Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Dilip Kumar
On Wed, Dec 6, 2023 at 11:12 AM Dilip Kumar  wrote:
>
> On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
>  wrote:
> >

I was also wondering what happens if the sequence changes are
transactional but the snap builder state changes to
SNAPBUILD_FULL_SNAPSHOT in between the processing of smgr_decode() and
seq_decode(), which means the RelFileLocator will not be added to the
hash table and during seq_decode() we will consider the change
non-transactional.  I haven't fully analyzed what the real problem is
in this case, but have we considered it?  What happens if a transaction
having both ALTER SEQUENCE and nextval() gets aborted, but the nextval()
has been considered non-transactional because the smgr_decode() changes
were not processed (the snap builder state was not yet
SNAPBUILD_FULL_SNAPSHOT)?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Peter Eisentraut

On 04.12.23 18:20, Tristan Partin wrote:

On Mon Dec 4, 2023 at 9:22 AM CST, Peter Eisentraut wrote:

On 01.12.23 23:10, Tristan Partin wrote:
> On Wed Jul 12, 2023 at 9:35 AM CDT, Tristan Partin wrote:
>> On Wed Jul 12, 2023 at 9:31 AM CDT, Peter Eisentraut wrote:
>> > On 12.07.23 16:23, Tristan Partin wrote:
>> > > It has come to my attention that STDOUT_FILENO might not be portable and
>> > > fileno(3) isn't marked as signal-safe, so I have just used the raw 1 for
>> > > stdout, which as far as I know is portable.
>> >
>> > We do use STDOUT_FILENO elsewhere in the code, and there are even
>> > workaround definitions for Windows, so it appears it is meant to be used.
>>
>> v3 is back to the original patch with newline being printed. Thanks.
>
> Peter, did you have anything more to say about patch 1 in this series?


I think that patch is correct.  However, I wonder whether we even need 
that signal handler.  We could just delete the file immediately after 
opening it; then we don't need to worry about deleting it later.  On 
Windows, we could use O_TEMPORARY instead.


I don't think that would work because the same file is opened and closed 
multiple times throughout the course of the program.


Ok, I have committed your 0001 patch.





Re: Synchronizing slots from primary to standby

2023-12-06 Thread Drouvot, Bertrand

Hi,

On 12/6/23 7:18 AM, shveta malik wrote:

On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila  wrote:


I feel that is indirectly relying on the fact that the primary won't
advance logical slots unless physical standby has consumed data.


Yes, that is the basis of this discussion.


Yes.


But now on rethinking, if
the user has not set 'standby_slot_names' on primary in the first place,
then even if walreceiver on standby is down, slots on primary will
keep on advancing


Oh right, good point.

and thus we need to sync. 


Yes, and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)"
in synchronize_slots() prevents us from doing so (as I think WalRcv->latestWalEnd
would be invalid for a walreceiver that has not been started).


We have no check currently
that mandates users to set standby_slot_names.



Yeah, and OTOH leaving standby_slot_names unset is currently the only
way for users to "force" failover slots to advance if they want to (in case,
say, the standby is down for a long time and they don't want to block logical
decoding on the primary), as we don't provide a way to alter the failover
property (other than connecting with replication, which sounds more like a hack).


Now,
it is possible that slot-sync worker lags behind and still needs to
sync more data for slots in which it makes sense for slot-sync worker
to be alive.


Right.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: RFI: Extending the TOAST Pointer

2023-12-06 Thread Nikita Malakhov
Hi,

Here's a PoC for a custom TOAST pointer. The main idea is that a custom
pointer provides data space for storing custom metadata (e.g. the TOAST
method, relation OIDs, advanced compression information, etc.) and can even
keep part of the data inline.

Any feedback would be greatly appreciated.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


0001_custom_toast_pointer_v1.patch
Description: Binary data


Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Amit Kapila
On Wed, Dec 6, 2023 at 11:12 AM Dilip Kumar  wrote:
>
> On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
>  wrote:
> >
>
> > Some time ago I floated the idea of maybe "queuing" the sequence changes
> > and only replay them on the next commit, somehow. But we did run into
> > problems with which snapshot to use, that I didn't know how to solve.
> > Maybe we should try again. The idea is we'd queue the non-transactional
> > changes somewhere (can't be in the transaction, because we must keep
> > them even if it aborts), and then "inject" them into the next commit.
> > That'd mean we wouldn't do the separate start/abort for each change.
>
> Why can't we use the same concept of
> SnapBuildDistributeNewCatalogSnapshot(), I mean we keep queuing the
> non-transactional changes (have some base snapshot before the first
> change), and whenever there is any catalog change, queue new snapshot
> change also in the queue of the non-transactional sequence change so
> that while sending it to downstream whenever it is necessary we will
> change the historic snapshot?
>

Oh, do you mean maintain different historic snapshots and then switch
based on the change we are processing? I guess the other thing we need
to consider is the order of processing the changes if we maintain
separate queues that need to be processed.

-- 
With Regards,
Amit Kapila.




Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Amit Kapila
On Sun, Dec 3, 2023 at 11:56 PM Tomas Vondra
 wrote:
>
> On 12/3/23 18:52, Tomas Vondra wrote:
> > ...
> >
>
> Another idea is that maybe we could somehow inform ReorderBuffer whether
> the output plugin even is interested in sequences. That'd help with
> cases where we don't even want/need to replicate sequences, e.g. because
> the publication does not specify (publish=sequence).
>
> What happens now in that case is we call ReorderBufferQueueSequence(),
> it does the whole dance with starting/aborting the transaction, calls
> rb->sequence() which just does "meh" and doesn't do anything. Maybe we
> could just short-circuit this by asking the output plugin somehow.
>
> In an extreme case the plugin may not even specify the sequence
> callbacks, and we're still doing all of this.
>

We could explore this, but I guess it won't solve the problem we are
facing in cases where all sequences are published and the plugin has
specified the sequence callbacks. I think it would also add the overhead
of this check in the positive cases, where we decide to send the changes
anyway.

-- 
With Regards,
Amit Kapila.




Re: remaining sql/json patches

2023-12-06 Thread Alvaro Herrera
On 2023-Dec-05, Amit Langote wrote:

> I've attempted to trim down the JSON_TABLE grammar (0004), but this is
> all I've managed so far.  Among other things, I couldn't refactor the
> grammar to do away with the following:
> 
> +%nonassoc  NESTED
> +%left  PATH

To recap, the reason we're arguing about this is that this creates two
new precedence classes, which are higher than everything else.  Judging
by the discussions in thread [1], this is not acceptable.  Without either
of those new classes or the two hacks I describe below, the grammar has the
following shift/reduce conflict:

State 6220

  2331 json_table_column_definition: NESTED . path_opt Sconst COLUMNS '(' 
json_table_column_definition_list ')'
  2332 | NESTED . path_opt Sconst AS name COLUMNS 
'(' json_table_column_definition_list ')'
  2636 unreserved_keyword: NESTED .

PATH  shift, and go to state 6286

SCONST    reduce using rule 2336 (path_opt)
PATH  [reduce using rule 2636 (unreserved_keyword)]
$default  reduce using rule 2636 (unreserved_keyword)

path_opt  go to state 6287



First, while the grammar uses "NESTED path_opt" in the relevant productions, I
noticed that there's no test that uses NESTED without PATH, so if we break that
case, we won't notice.  I propose we remove the PATH keyword from one of
the tests in jsonb_sqljson.sql in order to make sure the grammar
continues to work after whatever hacking we do:

diff --git a/src/test/regress/expected/jsonb_sqljson.out 
b/src/test/regress/expected/jsonb_sqljson.out
index 7e8ae6a696..8fd2385cdc 100644
--- a/src/test/regress/expected/jsonb_sqljson.out
+++ b/src/test/regress/expected/jsonb_sqljson.out
@@ -1548,7 +1548,7 @@ HINT:  JSON_TABLE column names must be distinct from one 
another.
 SELECT * FROM JSON_TABLE(
jsonb 'null', '$[*]' AS p0
COLUMNS (
-   NESTED PATH '$' AS p1 COLUMNS (
+   NESTED '$' AS p1 COLUMNS (
NESTED PATH '$' AS p11 COLUMNS ( foo int ),
NESTED PATH '$' AS p12 COLUMNS ( bar int )
),
diff --git a/src/test/regress/sql/jsonb_sqljson.sql 
b/src/test/regress/sql/jsonb_sqljson.sql
index ea5db88b40..ea9b4ff8b6 100644
--- a/src/test/regress/sql/jsonb_sqljson.sql
+++ b/src/test/regress/sql/jsonb_sqljson.sql
@@ -617,7 +617,7 @@ SELECT * FROM JSON_TABLE(
 SELECT * FROM JSON_TABLE(
jsonb 'null', '$[*]' AS p0
COLUMNS (
-   NESTED PATH '$' AS p1 COLUMNS (
+   NESTED '$' AS p1 COLUMNS (
NESTED PATH '$' AS p11 COLUMNS ( foo int ),
NESTED PATH '$' AS p12 COLUMNS ( bar int )
),


Having done that, AFAICS there are two possible fixes for the grammar.
One is to keep the idea of assigning precedence explicitly to these
keywords, but do something less hackish -- we can put NESTED together
with UNBOUNDED, and classify PATH in the IDENT group.  This requires no
further changes.  This would make NESTED PATH follow the same rationale
as UNBOUNDED FOLLOWING / UNBOUNDED PRECEDING.  Here is a preliminary
patch for that (the large comment above needs to be updated).

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c15fcf2eb2..1493ac7580 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -887,9 +887,9 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
  * json_predicate_type_constraint and json_key_uniqueness_constraint_opt
  * productions (see comments there).
  */
-%nonassoc  UNBOUNDED   /* ideally would have same precedence 
as IDENT */
+%nonassoc  UNBOUNDED NESTED/* ideally would have same 
precedence as IDENT */
 %nonassoc  IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE 
ROLLUP
-   SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
+   SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT PATH
 %left  Op OPERATOR /* multi-character ops and user-defined 
operators */
 %left  '+' '-'
 %left  '*' '/' '%'
@@ -911,8 +911,6 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
  */
 %left  JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL
 
-%nonassoc  NESTED
-%left  PATH
 %%
 
 /*


The other thing we can do is use the two-token lookahead trick, by
declaring
%token NESTED_LA
and using the parser.c code to replace NESTED with NESTED_LA when it is
followed by PATH.  This doesn't require assigning precedence to
anything.  We do need to expand the two rules that have "NESTED
opt_path Sconst" to each be two rules, one for "NESTED_LA PATH Sconst"
and another for "NESTED Sconst".  So the opt_path production goes away.
This preliminary patch does that. (I did not touch the ecpg grammar, but
it needs an update too.)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index c15fcf2eb2..8e4c1d4ebe 

Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-12-06 Thread jian he
On Tue, Dec 5, 2023 at 6:07 PM Alena Rybakina  wrote:
>
> Hi!
>
> Thank you for your contribution to this thread.
>
>
> I reviewed it and have a few questions.
>
> 1. I have seen that you delete a table before creating it, to which you want 
> to add errors due to a failed "copy from" operation. I think this is wrong 
> because this table can save useful data for the user.
> At a minimum, we should warn the user about this, but I think we can just add 
> some number at the end of the name, such as name_table1, name_table_2.

Sorry, I don't understand this part.
Currently, if the error table name already exists, then the COPY will
fail and an error will be reported.
I first create the table; if no error occurs, the error table will be dropped.
Can you demo the expected behavior?

> 2. I noticed that you are forming a table name using the type of errors that 
> prevent rows from being added during 'copy from' operation.
> I think it would be better to use the name of the source file that was used 
> while 'copy from' was running.
> In addition, there may be several such files, it is also worth considering.
>

Another column has been added.
Now it looks like:

SELECT * FROM save_error_csv_error;
 filename | lineno | line                       | field | source | err_message                                  | err_detail | errorcode
----------+--------+----------------------------+-------+--------+----------------------------------------------+------------+-----------
 STDIN    |      1 | 2002232 40  50  60  70  80 | NULL  | NULL   | extra data after last expected column        | NULL       | 22P04
 STDIN    |      1 | 2000230 23                 | d     | NULL   | missing data for column "d"                  | NULL       | 22P04
 STDIN    |      1 | z,,""                      | a     | z      | invalid input syntax for type integer: "z"   | NULL       | 22P02
 STDIN    |      2 | \0,,                       | a     | \0     | invalid input syntax for type integer: "\0"  | NULL       | 22P02


> 3. I found spelling:
>
> /* no err_nsp.error_rel table then crete one. for holding error. */
>

fixed.

> 4. Maybe rewrite this comment
>
> these info need, no error will drop err_nsp.error_rel table
> to:
> this information is necessary, no error will lead to the deletion of the 
> err_sp.error_rel table.
>

fixed.

> 5. Is this part of the comment needed? I think it duplicates the information 
> below when we form the query.
>
>  * . column list(order by attnum, begin from ctid) =
>  *{ctid, lineno,line,field,source,err_message,err_detail,errorcode}
>  * . data types (from attnum = -1) ={tid, int8,text,text,text,text,text,text}
>
> I'm not sure if we need to order the rows by number. It might be easier to 
> work with these lines in the order they appear.
>
Simplified the comment. "order by attnum" is to make sure that if
a table already exists with column names like X and data types like Y,
then we consider that table good for holding potential error info.

COPY FROM's main entry point is NextCopyFrom.
Now, for non-binary mode, if you specify save_error then it will not
fail in NextCopyFrom; all three of these errors will be tolerated:
extra data after the last expected column, missing data for a column,
and data type conversion errors.
From 990e1e0f5130431cf32069963bb980bb0692ce0b Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 6 Dec 2023 18:26:32 +0800
Subject: [PATCH v9 1/1] Make COPY FROM more error tolerant

Currently COPY FROM has 3 types of errors while processing the source file:
* extra data after last expected column
* missing data for column \"%s\"
* data type conversion error

Instead of throwing errors while copying, save_error will save the errors to a table automatically.
We check the table definition via column names and column data types:
if the table already exists and meets the criteria, errors will be saved to that table;
if the table does not exist, one is created.

Only works for COPY FROM, non-BINARY mode.

While copying, if no error ever happened, the error-saving table is dropped at the end of COPY FROM.
If the error-saving table already exists, meaning COPY FROM errors have happened at least once,
then all future errors will be saved to that table.
We save each error to the error-saving table using SPI: construct a query, then execute it.
---
 contrib/file_fdw/file_fdw.c  |   4 +-
 doc/src/sgml/ref/copy.sgml   |  93 
 src/backend/commands/copy.c  |  12 ++
 src/backend/commands/copyfrom.c  | 147 ++-
 src/backend/commands/copyfromparse.c | 171 +--
 src/backend/parser/gram.y|   8 +-
 src/bin/psql/tab-complete.c  |   3 +-
 src/include/commands/copy.h  |   3 +-
 src/include/commands/copyfrom_internal.h |   7 +
 src/include/parser/kwlist.h  |   1 +
 src/test/regress/expected/copy2.out  | 135 ++
 src/test/regress/sql/copy2.sql   | 108 +

Re: Synchronizing slots from primary to standby

2023-12-06 Thread shveta malik
On Wed, Dec 6, 2023 at 3:00 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 12/6/23 7:18 AM, shveta malik wrote:
> > On Wed, Dec 6, 2023 at 10:56 AM Amit Kapila  wrote:
> >>
> >> I feel that is indirectly relying on the fact that the primary won't
> >> advance logical slots unless physical standby has consumed data.
> >
> > Yes, that is the basis of this discussion.
>
> Yes.
>
> > But now on rethinking, if
> > the user has not set 'standby_slot_names' on primary in the first place,
> > then even if walreceiver on standby is down, slots on primary will
> > keep on advancing
>
> Oh right, good point.
>
> > and thus we need to sync.
>
> Yes, and I think our current check "XLogRecPtrIsInvalid(WalRcv->latestWalEnd)"
> in synchronize_slots() prevents us from doing so (as I think WalRcv->latestWalEnd
> would be invalid for a walreceiver that has not been started).
>

But I think we do not need to deal with the case where the walreceiver is
not started at all on the standby; it is always started. The walreceiver
not getting started, or being down for long, is a rare scenario. We have
other checks for 'latestWalEnd' in the slotsync worker too, and I think we
should retain those as is.

> > We have no check currently
> > that mandates users to set standby_slot_names.
> >
>
> Yeah, and OTOH leaving standby_slot_names unset is currently the only
> way for users to "force" failover slots to advance if they want to (in case,
> say, the standby is down for a long time and they don't want to block logical
> decoding on the primary), as we don't provide a way to alter the failover
> property (other than connecting with replication, which sounds more like a hack).
>

yes, right.

> >> Now,
> >> it is possible that slot-sync worker lags behind and still needs to
> >> sync more data for slots in which it makes sense for slot-sync worker
> >> to be alive.
>
> Right.
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com




Re: Row pattern recognition

2023-12-06 Thread Peter Eisentraut

On 04.12.23 12:40, Tatsuo Ishii wrote:

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d631ac89a9..5a77fca17f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
DefElem*defelt;
SortBy *sortby;
WindowDef  *windef;
+   RPCommonSyntax  *rpcom;
+   RPSubsetItem*rpsubset;
JoinExpr   *jexpr;
IndexElem  *ielem;
StatsElem  *selem;
@@ -278,6 +280,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
MergeWhenClause *mergewhen;
struct KeyActions *keyactions;
struct KeyAction *keyaction;
+   RPSkipTo    skipto;
  }
  
  %type <node>	stmt toplevel_stmt schema_stmt routine_body_stmt


It is usually not the style to add an entry for every node type to the 
%union.  Otherwise, we'd have hundreds of entries in there.



@@ -866,6 +878,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
  %nonassoc UNBOUNDED   /* ideally would have same precedence 
as IDENT */
  %nonassoc IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE 
ROLLUP
SET KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
+%nonassoc  MEASURES AFTER INITIAL SEEK PATTERN_P
  %left Op OPERATOR /* multi-character ops and user-defined 
operators */
  %left '+' '-'
  %left '*' '/' '%'


It was recently discussed that these %nonassoc should ideally all have 
the same precedence.  Did you consider that here?






Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Dilip Kumar
On Wed, Dec 6, 2023 at 3:36 PM Amit Kapila  wrote:
>
> > Why can't we use the same concept of
> > SnapBuildDistributeNewCatalogSnapshot(), I mean we keep queuing the
> > non-transactional changes (have some base snapshot before the first
> > change), and whenever there is any catalog change, queue new snapshot
> > change also in the queue of the non-transactional sequence change so
> > that while sending it to downstream whenever it is necessary we will
> > change the historic snapshot?
> >
>
> Oh, do you mean maintain different historic snapshots and then switch
> based on the change we are processing? I guess the other thing we need
> to consider is the order of processing the changes if we maintain
> separate queues that need to be processed.

I mean we would not specifically maintain historic snapshots; rather,
whenever there is a catalog change for which we push a snapshot to every
transaction's change queue, we would at the same time push that snapshot
into the queue of non-transactional sequence changes as well.  I am not
sure what the problem with the ordering is, because we would queue all
non-transactional sequence changes in a separate queue in the order they
arrive, and as soon as we process the next commit we would process all
the queued non-transactional changes at that time.  Do you see an issue
with that?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Peter Eisentraut

On 02.12.23 15:06, Bharath Rupireddy wrote:

I enabled this code by compiling with the WAL_DEBUG macro and setting
wal_debug GUC to on. Firstly, the compilation on Windows failed
because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
used.


This kind of thing could be mostly avoided if we didn't hide all the 
WAL_DEBUG code behind #ifdefs.  For example, in the attached patch, I instead 
changed it so that


if (XLOG_DEBUG)

resolves to

if (false)

in the normal case.  That way, we don't need to wrap that in #ifdef 
WAL_DEBUG, and the compiler can see the disabled code and make sure it 
continues to build.
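
For readers following along, the xlog.h arrangement being described would 
look roughly like this (a sketch of the idea; the corresponding hunk is not 
visible in the quoted patch):

    #ifdef WAL_DEBUG
    extern PGDLLIMPORT bool XLOG_DEBUG;
    #else
    #define XLOG_DEBUG false
    #endif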
From 93c74462c3e253f3ce9fd1beba1f09290988de8c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 6 Dec 2023 12:22:38 +0100
Subject: [PATCH] Make WAL_DEBUG code harder to break accidentally

---
 src/backend/access/transam/generic_xlog.c |  2 --
 src/backend/access/transam/xlog.c | 10 --
 src/backend/access/transam/xlogrecovery.c |  7 ---
 src/include/access/xlog.h |  2 ++
 4 files changed, 2 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/transam/generic_xlog.c 
b/src/backend/access/transam/generic_xlog.c
index abd9e1c749..5b910f1024 100644
--- a/src/backend/access/transam/generic_xlog.c
+++ b/src/backend/access/transam/generic_xlog.c
@@ -248,7 +248,6 @@ computeDelta(PageData *pageData, Page curpage, Page 
targetpage)
 * If xlog debug is enabled, then check produced delta.  Result of delta
 * application to curpage should be equivalent to targetpage.
 */
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
{
PGAlignedBlock tmp;
@@ -260,7 +259,6 @@ computeDelta(PageData *pageData, Page curpage, Page 
targetpage)
   BLCKSZ - targetUpper) != 0)
elog(ERROR, "result of generic xlog apply does not 
match");
}
-#endif
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 2d603d8dee..3a7a38a100 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -644,9 +644,7 @@ static bool updateMinRecoveryPoint = true;
 static int MyLockNo = 0;
 static bool holdingAllLocks = false;
 
-#ifdef WAL_DEBUG
 static MemoryContext walDebugCxt = NULL;
-#endif
 
 static void CleanupAfterArchiveRecovery(TimeLineID EndOfLogTLI,

XLogRecPtr EndOfLog,
@@ -997,7 +995,6 @@ XLogInsertRecord(XLogRecData *rdata,
}
}
 
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
{
static XLogReaderState *debug_reader = NULL;
@@ -1061,7 +1058,6 @@ XLogInsertRecord(XLogRecData *rdata,
pfree(recordBuf.data);
MemoryContextSwitchTo(oldCxt);
}
-#endif
 
/*
 * Update our global variables
@@ -1997,13 +1993,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, 
bool opportunistic)
}
LWLockRelease(WALBufMappingLock);
 
-#ifdef WAL_DEBUG
if (XLOG_DEBUG && npages > 0)
{
elog(DEBUG1, "initialized %d pages, up to %X/%X",
 npages, LSN_FORMAT_ARGS(NewPageEndPtr));
}
-#endif
 }
 
 /*
@@ -2641,13 +2635,11 @@ XLogFlush(XLogRecPtr record)
if (record <= LogwrtResult.Flush)
return;
 
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
elog(LOG, "xlog flush request %X/%X; write %X/%X; flush %X/%X",
 LSN_FORMAT_ARGS(record),
 LSN_FORMAT_ARGS(LogwrtResult.Write),
 LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
 
START_CRIT_SECTION();
 
@@ -2903,14 +2895,12 @@ XLogBackgroundFlush(void)
WriteRqst.Flush = 0;
}
 
-#ifdef WAL_DEBUG
if (XLOG_DEBUG)
elog(LOG, "xlog bg flush request write %X/%X; flush: %X/%X, 
current is write %X/%X; flush %X/%X",
 LSN_FORMAT_ARGS(WriteRqst.Write),
 LSN_FORMAT_ARGS(WriteRqst.Flush),
 LSN_FORMAT_ARGS(LogwrtResult.Write),
 LSN_FORMAT_ARGS(LogwrtResult.Flush));
-#endif
 
START_CRIT_SECTION();
 
diff --git a/src/backend/access/transam/xlogrecovery.c 
b/src/backend/access/transam/xlogrecovery.c
index c6156a..26ba990b71 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -396,9 +396,7 @@ static bool read_tablespace_map(List **tablespaces);
 static void xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI);
 static void CheckRecoveryConsistency(void);
 static void rm_redo_error_callback(void *arg);
-#ifdef WAL_DEBUG
 static void xlog_outrec(StringInfo buf, XLogReaderState *record);
-#endif
 static void xlog_block_info(StringInfo buf, XLogReaderState *record);
 static void checkTimeLineSwitch(XLogRecPtr lsn, TimeLineI

Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Daniel Verite
Sutou Kouhei wrote:

> * 2022-04: Apache Arrow [2]
> * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
> 
> (FYI: I want to add support for Apache Arrow.)
> 
> There were discussions how to add support for more formats. [3][4]
> In these discussions, we got a consensus about making COPY
> format extendable.


These formats all seem to be column-oriented, whereas COPY is row-oriented
at the protocol level [1].
With regard to the protocol, how would it work to support these formats?


[1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Andrew Dunstan



On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.


Oh I didn't address this -- I saw examples in the interwebs of MSSQL 
server I think [1] which had the non-array with commas import and 
export style. It was not that tough to support and the code as 
written already does it, so why not?


That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).



So your preference would be to not allow the non-array-with-commas 
case but if/when we implement COPY FROM we would accept that format? 
As in Postel's law ("be conservative in what you do, be liberal in 
what you accept from others")?



Yes, I think so.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Euler Taveira
On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
> On 02.12.23 15:06, Bharath Rupireddy wrote:
> > I enabled this code by compiling with the WAL_DEBUG macro and setting
> > wal_debug GUC to on. Firstly, the compilation on Windows failed
> > because XL_ROUTINE was passed inappropriately for XLogReaderAllocate()
> > used.
> 
> This kind of thing could be mostly avoided if we didn't hide all the 
> WAL_DEBUG behind #ifdefs.

AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
is a variable but seems like a constant surprises me. I would rename it to
XLogDebug or xlog_debug.

> in the normal case.  That way, we don't need to wrap that in #ifdef 
> WAL_DEBUG, and the compiler can see the disabled code and make sure it 
> continues to build.

I didn't check the LOCK_DEBUG code path to make sure it fits in the same
category as WAL_DEBUG. If it does, maybe it is worth applying the same logic
there.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Forbid the use of invalidated physical slots in streaming replication.

2023-12-06 Thread Zhijie Hou (Fujitsu)
Hi,

When testing streaming replication with a physical slot, I found unexpected
behavior: the walsender can use an invalidated physical slot for streaming.

This occurs when the primary slot is invalidated due to reaching
max_slot_wal_keep_size before streaming replication is initialized
(e.g. before creating the standby). Attached is a draft script
(test_invalidated_slot.sh) which can reproduce this.

Once the slot is invalidated, it can no longer protect the WAL and
rows, as invalidated slots are not considered in functions like
ReplicationSlotsComputeRequiredXmin().

Besides, the walsender could advance the restart_lsn of an invalidated slot,
and then the user won't be able to tell whether the slot is actually
invalidated or not, because the 'conflicting' column of the
pg_replication_slots view could be set back to null.

So I think it's a bug. One idea for a fix is to check the validity of the
physical slot in StartReplication() after acquiring the slot, like the
attachment does. What do you think?
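
For reference, a minimal sketch of the kind of check being described, placed
in StartReplication() after the slot is acquired (the exact condition, error
code and message in the attached patch may differ):

    if (MyReplicationSlot->data.invalidated != RS_INVAL_NONE)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("cannot use invalidated replication slot \"%s\" for streaming",
                        NameStr(MyReplicationSlot->data.name))));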

Best Regards,
Hou Zhijie



0001-Forbid-the-use-of-invalidated-slot-in-streaming-repl.patch
Description:  0001-Forbid-the-use-of-invalidated-slot-in-streaming-repl.patch


test_invalidated_slot.sh
Description: test_invalidated_slot.sh


Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

2023-12-06 Thread Nazir Bilal Yavuz
Hi,

There is an ongoing thread [1] for adding missing SQL error codes to
PANIC and FATAL error reports in xlogrecovery.c. I did the same
for the xlog.c and relcache.c files.

I couldn't find a suitable error code for the "cache lookup failed for
relation" error in relcache.c and this error comes up in many places.
Would it be reasonable to create a new error code specifically for
this?

Any kind of feedback would be appreciated.

[1] 
https://www.postgresql.org/message-id/CAPMWgZ8g17Myb5ZRE5aTNowUohafk4j48mZ_5_Zn9JnR5p2u0w%40mail.gmail.com

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


v1-0001-xlog.c-Add-missing-error-codes-to-PANIC-FATAL-err.patch
Description: Binary data


v1-0002-relcache.c-Add-missing-error-codes-to-PANIC-FATAL.patch
Description: Binary data


Re: Transaction timeout

2023-12-06 Thread Andrey M. Borodin


> On 30 Nov 2023, at 20:06, Andrey M. Borodin  wrote:
> 
> 
> Tomorrow I plan to fix raising of the timeout when the transaction is idle.
> Renaming transaction_timeout to something else (to avoid confusion with 
> prepared xacts) also seems correct to me.


Here's a v6 version of the feature. Changes:
1. transaction_timeout now breaks the connection with FATAL instead of hanging 
in "idle in transaction (aborted)".
2. It kills idle and active transactions alike.
3. The new isolation tests are slightly more complex: the isolation tester does 
not like it when the connection is forcibly killed, so there must be only one 
permutation with a killed connection.

TODO: as Yuhang pointed out, prepared transactions must not be killed, so the 
name "transaction_timeout" is not quite right. I think the name should be 
something like "session_transaction_timeout", but I'd like the opinion of 
someone more experienced in naming GUCs than me -- or, perhaps, a native speaker.


Best regards, Andrey Borodin.


v6-0001-Introduce-transaction_timeout.patch
Description: Binary data


Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Robert Haas
On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan  wrote:
> Just to be clear, you're raising a concern that seems to me to apply
> to "the other optimization" from the same commit, specifically -- the
> precheck optimization. Not the one I found a problem in. (They're
> closely related but distinct optimizations.)

It isn't very clear from the commit message that this commit is doing
two different things, and in fact I'm still unclear on what exactly
the other optimization is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Possible segfault when sending notification within a ProcessUtility hook

2023-12-06 Thread Anthonin Bonnefoy
> On Tue, Dec 5, 2023 at 9:03 PM Tom Lane  wrote:
> Why should we regard that as anything other than a bug in the
> ProcessUtility hook?  A failed transaction should not send any
> notifies.

Fair point. That was also my initial assumption, but I thought the
transaction state was not available from a hook, as I had missed
IsAbortedTransactionBlockState.

I will rely on IsAbortedTransactionBlockState to avoid this case;
thanks for the input.
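
For anyone else hitting this, the guard in the ProcessUtility hook ends up
looking roughly like the following (a sketch under the assumption that the
hook itself queues the notification; the channel name is made up):

    /* Skip queueing the NOTIFY if the (sub)transaction has already failed. */
    if (!IsAbortedTransactionBlockState())
        Async_Notify("my_hook_channel", NULL);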

Regards,
Anthonin.




Re: SQL:2011 application time

2023-12-06 Thread jian he
On Sun, Dec 3, 2023 at 2:11 AM Paul Jungwirth
 wrote:
>
> v19 patch series attached, rebased to a11c9c42ea.
>

This TODO:
 * TODO: It sounds like FOR PORTION OF might need to do something here too?
is based on the comments on ExprContext. I refactored a bit and resolved it.

I also tried to address the following TODO:
// TODO: Need to save context->mtstate->mt_transition_capture? (See
comment on ExecInsert)

but failed.
I attached that attempt as well, and also added a related test.

You can also use the test to check a FOR PORTION OF update with an insert
trigger in the "referencing old table as old_table new table as new_table"
situation.
From 8c97db9391900d766db99834c23c5e4b60cadf01 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Tue, 5 Dec 2023 17:59:43 +0800
Subject: [PATCH v1 1/1] set eval targetrange in per-output-tuple exprcontext

---
 src/backend/executor/nodeModifyTable.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 88eda012..fe57e592 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3780,7 +3780,6 @@ ExecModifyTable(PlanState *pstate)
 		 * Reset per-tuple memory context used for processing on conflict and
 		 * returning clauses, to free any expression evaluation storage
 		 * allocated in the previous cycle.
-		 * TODO: It sounds like FOR PORTION OF might need to do something here too?
 		 */
 		if (pstate->ps_ExprContext)
 			ResetExprContext(pstate->ps_ExprContext);
@@ -4472,9 +4471,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		forPortionOf = (ForPortionOfExpr *) node->forPortionOf;
 
 		/* Eval the FOR PORTION OF target */
-		if (mtstate->ps.ps_ExprContext == NULL)
-			ExecAssignExprContext(estate, &mtstate->ps);
-		econtext = mtstate->ps.ps_ExprContext;
+		econtext = GetPerTupleExprContext(estate);
 
 		exprState = ExecPrepareExpr((Expr *) forPortionOf->targetRange, estate);
 		targetRange = ExecEvalExpr(exprState, econtext, &isNull);
-- 
2.34.1

From cda2f8324ef920c807008e2763f97c6503c17a94 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Wed, 6 Dec 2023 20:58:08 +0800
Subject: [PATCH v1 1/1] trying to save mt_transition_capture while ExecInsert

---
 src/backend/executor/nodeModifyTable.c   | 21 -
 src/test/regress/expected/for_portion_of.out | 80 
 src/test/regress/sql/for_portion_of.sql  | 72 ++
 3 files changed, 171 insertions(+), 2 deletions(-)

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index fe57e592..c9d14dab 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -167,7 +167,8 @@ static bool ExecMergeMatched(ModifyTableContext *context,
 static void ExecMergeNotMatched(ModifyTableContext *context,
 ResultRelInfo *resultRelInfo,
 bool canSetTag);
-
+static void
+ExecSetupTransitionCaptureState(ModifyTableState *mtstate, EState *estate);

 /*
  * Verify that the tuples to be produced by INSERT match the
@@ -1259,7 +1260,7 @@ ExecForPortionOfLeftovers(ModifyTableContext *context,
 	TupleTableSlot *leftoverTuple2 = fpoState->fp_Leftover2;
 	HeapTuple oldtuple = NULL;
 	bool shouldFree = false;
-
+	int		i;	/* original cmd type */
 	/*
 	 * Get the range of the old pre-UPDATE/DELETE tuple,
 	 * so we can intersect it with the FOR PORTION OF target
@@ -1312,6 +1313,7 @@ ExecForPortionOfLeftovers(ModifyTableContext *context,
 	 * TODO: Really? What if you update the partition key?
 	 */

+	i	= context->mtstate->operation;
 	if (!RangeIsEmpty(leftoverRangeType1))
 	{
 		oldtuple = ExecFetchSlotHeapTuple(oldtupleSlot, false, &shouldFree);
@@ -1320,6 +1322,10 @@ ExecForPortionOfLeftovers(ModifyTableContext *context,
 		set_leftover_tuple_bounds(leftoverTuple1, forPortionOf, typcache, leftoverRangeType1);
 		ExecMaterializeSlot(leftoverTuple1);

+		context->mtstate->operation = CMD_INSERT;
+		context->mtstate->mt_transition_capture = NULL;
+		ExecSetupTransitionCaptureState(context->mtstate, estate);
+
 		// TODO: Need to save context->mtstate->mt_transition_capture? (See comment on ExecInsert)
 		ExecInsert(context, resultRelInfo, leftoverTuple1, node->canSetTag, NULL, NULL);
 	}
@@ -1340,10 +1346,21 @@ ExecForPortionOfLeftovers(ModifyTableContext *context,
 		set_leftover_tuple_bounds(leftoverTuple2, forPortionOf, typcache, leftoverRangeType2);
 		ExecMaterializeSlot(leftoverTuple2);

+		context->mtstate->operation = CMD_INSERT;
+		context->mtstate->mt_transition_capture = NULL;
+		ExecSetupTransitionCaptureState(context->mtstate, estate);
+
 		// TODO: Need to save context->mtstate->mt_transition_capture? (See comment on ExecInsert)
 		ExecInsert(context, resultRelInfo, leftoverTuple2, node->canSetTag, NULL, NULL);
 	}

+	if (!RangeIsEmpty(leftoverRangeType2) || !RangeIsEmpty(leftoverRangeType1))
+	{
+		/* if any of the above branch executed then */
+		context->mtstate->operation =

Re: Partial aggregates pushdown

2023-12-06 Thread Robert Haas
On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> Are you concerned about the hassle and potential human errors of manually 
> adding new partial
> aggregation functions, rather than the catalog becoming bloated?

I'm concerned about both.

> The process of creating partial aggregation functions from aggregation 
> functions can be automated,
> so I believe this issue can be resolved. However, automating it may increase 
> the size of the patch
> even more, so overall, approach#2 might be better.
> To implement approach #2, it would be necessary to investigate how much 
> additional code is required.

Yes. Unfortunately I fear that there is quite a lot of work left to do
here in order to produce a committable feature. To me it seems
necessary to conduct an investigation of approach #2. If the result of
that investigation is that nothing major stands in the way of approach
#2, then I think we should adopt it, which is more work. In addition,
the problems around transmitting serialized bytea blobs between
machines that can't be assumed to fully trust each other will need to
be addressed in some way, which seems like it will require a good deal
of design work, forming some kind of consensus, and then
implementation work to follow. In addition to that there may be some
small problems that need to be solved at a detail level, such as the
HAVING issue. I think the last category won't be too hard to sort out,
but that still leaves two really major areas to address.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Matthias van de Meent
On Wed, 6 Dec 2023 at 14:11, Robert Haas  wrote:
>
> On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan  wrote:
> > Just to be clear, you're raising a concern that seems to me to apply
> > to "the other optimization" from the same commit, specifically -- the
> > precheck optimization. Not the one I found a problem in. (They're
> > closely related but distinct optimizations.)
>
> It isn't very clear from the commit message that this commit is doing
> two different things, and in fact I'm still unclear on what exactly
> the other optimization is.

I feel that Peter refered to these two distinct optimizations:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

2. When scanning an index in ascending order using scankey a < 10 (one
that defines an endpoint of the scan), we can look ahead and check if
the last item on the page is consistent. If so, then all other items
on the page will also be consistent with that scankey.
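
To make the two cases concrete, here is a hypothetical pair of queries
(table and index names invented) for an ascending scan over an index on
column a:

    -- setup, purely illustrative
    CREATE TABLE t (a int);
    CREATE INDEX t_a_idx ON t (a);

    -- case 1: "a > 1" defines the start point of the forward scan; once
    -- the first consistent item is found, later items need no re-check
    -- against this key
    SELECT * FROM t WHERE a > 1 ORDER BY a;

    -- case 2: "a < 10" defines the endpoint; if the last item on a page
    -- passes this key, every item on the page does
    SELECT * FROM t WHERE a < 10 ORDER BY a;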

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Robert Haas
On Wed, Dec 6, 2023 at 8:27 AM Matthias van de Meent
 wrote:
> I feel that Peter referred to these two distinct optimizations:
>
> 1. When scanning an index in ascending order using scankey a > 1 (so,
> one that defines a start point of the scan), we don't need to check
> items for consistency with that scankey once we've found the first
> value that is consistent with the scankey, as all future values will
> also be consistent with the scankey (if we assume no concurrent page
> deletions).
>
> 2. When scanning an index in ascending order using scankey a < 10 (one
> that defines an endpoint of the scan), we can look ahead and check if
> the last item on the page is consistent. If so, then all other items
> on the page will also be consistent with that scankey.

Oh, interesting. Thanks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Tomas Vondra
On 12/6/23 10:05, Dilip Kumar wrote:
> On Wed, Dec 6, 2023 at 11:12 AM Dilip Kumar  wrote:
>>
>> On Sun, Dec 3, 2023 at 11:22 PM Tomas Vondra
>>  wrote:
>>>
> 
> I was also wondering what happens if the sequence changes are
> transactional but somehow the snap builder state changes to
> SNAPBUILD_FULL_SNAPSHOT in between processing of the smgr_decode() and
> the seq_decode() which means RelFileLocator will not be added to the
> hash table and during the seq_decode() we will consider the change as
> non-transactional.  I haven't fully analyzed what the real problem
> is in this case, but have we considered it? What happens if the
> transaction having both ALTER SEQUENCE and nextval() gets aborted,
> but the nextval() has been considered non-transactional because the
> smgr_decode() changes were not processed, since the snap builder
> state was not yet SNAPBUILD_FULL_SNAPSHOT?
> 

Yes, if something like this happens, that'd be a problem:

1) decoding starts, with

   SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT

2) transaction that creates a new refilenode gets decoded, but we skip
   it because we don't have the correct snapshot

3) snapshot changes to SNAPBUILD_FULL_SNAPSHOT

4) we decode sequence change from nextval() for the sequence

This would lead to us attempting to apply sequence change for a
relfilenode that's not visible yet (and may even get aborted).
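
To make the scenario concrete, a sketch of the kind of transaction being
discussed (sequence name invented), with the snapshot state crossing
SNAPBUILD_FULL_SNAPSHOT between decoding the two records:

    BEGIN;
    ALTER SEQUENCE s RESTART WITH 1000;  -- assigns a new relfilenode,
                                         -- i.e. a transactional change
    SELECT nextval('s');                 -- the sequence change from step 4
    ROLLBACK;                            -- and the whole thing may abort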

But can this even happen? Can we start decoding in the middle of a
transaction? How come this wouldn't affect e.g. XLOG_HEAP2_NEW_CID,
which is also skipped until SNAPBUILD_FULL_SNAPSHOT? Or logical
messages, where we also call the output plugin in non-transactional cases?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Tomas Vondra
On 12/6/23 12:05, Dilip Kumar wrote:
> On Wed, Dec 6, 2023 at 3:36 PM Amit Kapila  wrote:
>>
>>> Why can't we use the same concept of
>>> SnapBuildDistributeNewCatalogSnapshot(), I mean we keep queuing the
>>> non-transactional changes (have some base snapshot before the first
>>> change), and whenever there is any catalog change, queue new snapshot
>>> change also in the queue of the non-transactional sequence change so
>>> that while sending it to downstream whenever it is necessary we will
>>> change the historic snapshot?
>>>
>>
>> Oh, do you mean maintain different historic snapshots and then switch
>> based on the change we are processing? I guess the other thing we need
>> to consider is the order of processing the changes if we maintain
>> separate queues that need to be processed.
> 
> I mean we will not specifically maintain the historic changes, but if
> there is any catalog change where we are pushing the snapshot to all
> the transaction's change queue, at the same time we will push this
> snapshot onto the non-transactional sequence queue as well.  I am not
> sure what the problem with the ordering is, because we will be
> queueing all non-transactional sequence changes in a separate queue in
> the order they arrive, and as soon as we process the next commit we
> will process all the non-transactional changes at that time.  Do you
> see an issue with that?
> 

Isn't this (in principle) the idea of queuing the non-transactional
changes and then applying them on the next commit? Yes, I didn't get
very far with that, but I got stuck exactly on tracking which snapshot
to use, so if there's a way to do that, that'd fix my issue.

Also, would this mean we don't need to track the relfilenodes, if we're
able to query the catalog? Would we be able to check if the relfilenode
was created by the current xact?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 07:36, Andrew Dunstan wrote:


On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.


Oh I didn't address this -- I saw examples in the interwebs of MSSQL 
server I think [1] which had the non-array with commas import and 
export style. It was not that tough to support and the code as 
written already does it, so why not?


That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).


So your preference would be to not allow the non-array-with-commas 
case but if/when we implement COPY FROM we would accept that format? 
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?



Yes, I think so.


Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or 
objections.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Add json format mode to COPY TO

Add json format mode support to COPY TO, which includes two output
variations: 1) "json lines" which is each row as a json object delimited
by newlines (the default); and 2) "json array" which is the same as #1,
but with the addition of a leading "[", trailing "]", and comma row
delimiters, to form a valid json array.

Early versions: helpful hints/reviews provided by Nathan Bossart,
Tom Lane, and Maciek Sakrejda. Final versions: reviewed by Andrew Dunstan
and Davin Shearer.

Requested-by: Davin Shearer
Author: Joe Conway
Reviewed-by: Andrew Dunstan, Davin Shearer 
Discussion: https://postgr.es/m/flat/24e3ee88-ec1e-421b-89ae-8a47ee0d2df1%40joeconway.com#a5e6b8829f9a74dfc835f6f29f2e44c5
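
For reference, a usage sketch based on the options this patch adds (the
output shown is only what the description above implies, not a captured
run; FORCE_ARRAY follows the "FORCE_ARRAY [ boolean ]" syntax above):

    -- "json lines" (the default): one JSON object per row
    COPY (SELECT x AS a, x * 2 AS b FROM generate_series(1, 2) x)
      TO STDOUT WITH (FORMAT json);
    -- {"a":1,"b":2}
    -- {"a":2,"b":4}

    -- "json array": the same rows wrapped in [ ... ] with comma delimiters
    COPY (SELECT x AS a, x * 2 AS b FROM generate_series(1, 2) x)
      TO STDOUT WITH (FORMAT json, FORCE_ARRAY true);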

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 43,48 
--- 43,49 
  FORCE_QUOTE { ( column_name [, ...] ) | * }
  FORCE_NOT_NULL { ( column_name [, ...] ) | * }
  FORCE_NULL { ( column_name [, ...] ) | * }
+ FORCE_ARRAY [ boolean ]
  ENCODING 'encoding_name'
  
   
*** COPY { ta
*** 206,214 
--- 207,220 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
+   json (JavaScript Object Notation),
or binary.
The default is text.
   
+  
+   The json option is allowed only in
+   COPY TO.
+  
  
 
  
*** COPY { ta
*** 372,377 
--- 378,396 
   
  
 
+ 
+
+ FORCE_ARRAY
+ 
+  
+   Force output of square brackets as array decorations at the beginning
+   and end of output, and commas between the rows. It is allowed only in
+   COPY TO, and only when using
+   JSON format. The default is
+   false.
+  
+ 
+
  
 
  ENCODING
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** ProcessCopyOptions(ParseState *pstate,
*** 419,424 
--- 419,425 
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*** ProcessCopyOptions(ParseState *pstate,
*** 443,448 
--- 444,451 
   /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  opts_out->binary = true;
  			else
*** ProcessCopyOptions(ParseState *pstate,
*** 540,545 
--- 543,555 
  defel->defname),
  		 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*** ProcessCopyOptions(ParseState *pstate,
*** 598,603 
--- 608,625 
  (errcode(ERRCODE_SYNTAX_ERROR),
   errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 	 errmsg("cannot use JSON mode in C

Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Tomas Vondra
On 12/6/23 11:19, Amit Kapila wrote:
> On Sun, Dec 3, 2023 at 11:56 PM Tomas Vondra
>  wrote:
>>
>> On 12/3/23 18:52, Tomas Vondra wrote:
>>> ...
>>>
>>
>> Another idea is that maybe we could somehow inform ReorderBuffer whether
>> the output plugin even is interested in sequences. That'd help with
>> cases where we don't even want/need to replicate sequences, e.g. because
>> the publication does not specify (publish=sequence).
>>
>> What happens now in that case is we call ReorderBufferQueueSequence(),
>> it does the whole dance with starting/aborting the transaction, calls
>> rb->sequence() which just does "meh" and doesn't do anything. Maybe we
>> could just short-circuit this by asking the output plugin somehow.
>>
>> In an extreme case the plugin may not even specify the sequence
>> callbacks, and we're still doing all of this.
>>
> 
> We could explore this but I guess it won't solve the problem we are
> facing in cases where all sequences are published and plugin has
> specified the sequence callbacks. I think it would add some overhead
> of this check in positive cases where we decide to anyway do send the
> changes.

Well, the idea is the check would be very simple (essentially just a
boolean flag somewhere), so not really measurable.

And if the plugin requests decoding sequences, I guess it's natural it
may have a bit of overhead. It needs to do more things, after all. It
needs to be acceptable, ofc.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Junwang Zhao
On Wed, Dec 6, 2023 at 3:28 PM Sutou Kouhei  wrote:
>
> Hi,
>
> In 
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Wed, 6 Dec 2023 15:11:34 +0800,
>   Junwang Zhao  wrote:
>
> > For the extra curly braces, I mean the following code block in
> > CopyToFormatBinaryStart:
> >
> > + {<-- I thought this is useless?
> > + /* Generate header for a binary copy */
> > + int32 tmp;
> > +
> > + /* Signature */
> > + CopySendData(cstate, BinarySignature, 11);
> > + /* Flags field */
> > + tmp = 0;
> > + CopySendInt32(cstate, tmp);
> > + /* No header extension */
> > + tmp = 0;
> > + CopySendInt32(cstate, tmp);
> > + }
>
> Oh, I see. I've removed it and attached the v3 patch. In general,
> I don't change variable names and so on in this patch; I just
> move code around. But I also removed the "tmp"
> variable in this case because I think that the name isn't
> suitable for a larger scope. (I think that "tmp" is acceptable
> in a small scope like the above code.)
>
> New code:
>
> /* Generate header for a binary copy */
> /* Signature */
> CopySendData(cstate, BinarySignature, 11);
> /* Flags field */
> CopySendInt32(cstate, 0);
> /* No header extension */
> CopySendInt32(cstate, 0);
>
>
> Thanks,
> --
> kou

Hi Kou,

I read the thread [1] you posted, and I think Andres's suggestion sounds great.

Should we extract both *copy to* and *copy from* as the first step? In that
case we can add the pg_copy_handler catalog smoothly later.

Attached V4 adds 'extract copy from' and it passed the Cirrus CI;
please take a look.

I added a hook *copy_from_end*, but this might be removed later if not used.

[1]: 
https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de
-- 
Regards
Junwang Zhao


v4-0001-Extract-COPY-handlers.patch
Description: Binary data


Re: logical decoding and replication of sequences, take 2

2023-12-06 Thread Tomas Vondra
On 12/6/23 09:56, Amit Kapila wrote:
> On Tue, Dec 5, 2023 at 10:23 PM Tomas Vondra
>  wrote:
>>
>> On 12/5/23 13:17, Amit Kapila wrote:
>>
>>> (b) for transactional
>>> cases, we see overhead due to traversing all the top-level txns and
>>> check the hash table for each one to find whether change is
>>> transactional.
>>>
>>
>> Not really, no. As I explained in my preceding e-mail, this check makes
>> almost no difference - I did expect it to matter, but it doesn't. And I
>> was a bit disappointed the global hash table didn't move the needle.
>>
>> Most of the time is spent in
>>
>> 78.81% 0.00%  postgres  postgres  [.] DecodeCommit (inlined)
>>   |
>>   ---DecodeCommit (inlined)
>>  |
>>  |--72.65%--SnapBuildCommitTxn
>>  | |
>>  |  --72.61%--SnapBuildBuildSnapshot
>>  ||
>>  | --72.09%--pg_qsort
>>  ||
>>  ||--66.24%--pg_qsort
>>  ||  |
>>
>> And there's almost no difference between master and build with sequence
>> decoding - see the attached diff-alter-sequence.perf, comparing the two
>> branches (perf diff -c delta-abs).
>>
> 
> I think in this case the commit time predominates, which hides the overhead.
> We didn't investigate in detail whether that can be improved, but if we see
> a similar case of abort [1], it shows the overhead of
> ReorderBufferSequenceIsTransactional(). I understand that aborts won't
> be frequent and it is a sort of unrealistic test, but it still helps to show
> that there is overhead in ReorderBufferSequenceIsTransactional(). Now,
> I am not sure if we can ignore that case because theoretically, the
> overhead can increase based on the number of top-level transactions.
> 
> [1]: 
> https://www.postgresql.org/message-id/TY3PR01MB9889D457278B254CA87D1325F581A%40TY3PR01MB9889.jpnprd01.prod.outlook.com
> 

But those profiles were with the "old" patch, with one hash table per
top-level transaction. I see nothing like that with the patch [1] that
replaces that with a single global hash table. With that patch, the
ReorderBufferSequenceIsTransactional() took ~0.5% in any tests I did.

What did have bigger impact is this:

46.12%   1.47%  postgres [.] pg_logical_slot_get_changes_guts
  |
  |--45.12%--pg_logical_slot_get_changes_guts
  ||
  ||--42.34%--LogicalDecodingProcessRecord
  |||
  |||--12.82%--xact_decode
  ||||
  ||||--9.46%--DecodeAbort (inlined)
  ||||   |
  ||||   |--8.44%--ReorderBufferCleanupTXN
  ||||   |   |
  ||||   |   |--3.25%--ReorderBufferSequenceCleanup (in)
  ||||   |   |   |
  ||||   |   |   |--1.59%--hash_seq_search
  ||||   |   |   |
  ||||   |   |   |--0.80%--hash_search_with_hash_value
  ||||   |   |   |
  ||||   |   |--0.59%--hash_search
  ||||   |   |  hash_bytes

I guess that could be optimized, but it's also a direct consequence of
the huge number of aborts for transactions that create relfilenode. For
any other workload this will be negligible.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Junwang Zhao
On Wed, Dec 6, 2023 at 8:32 PM Daniel Verite  wrote:
>
> Sutou Kouhei wrote:
>
> > * 2022-04: Apache Arrow [2]
> > * 2018-02: Apache Avro, Apache Parquet and Apache ORC [3]
> >
> > (FYI: I want to add support for Apache Arrow.)
> >
> > There were discussions how to add support for more formats. [3][4]
> > In these discussions, we got a consensus about making COPY
> > format extendable.
>
>
> These formats seem all column-oriented whereas COPY is row-oriented
> at the protocol level [1].
> With regard to the procotol, how would it work to support these formats?
>

They have a kind of *RowGroup* concept: a bunch of rows goes into a RowBatch,
and the data of the same column is stored together.

I think they should fit the COPY semantics, and there are some FDWs out there
for these modern formats, like [1]. If we support these formats in COPY, it
will be easier to interact with them (without creating a
server/user mapping/foreign table).

[1]: https://github.com/adjust/parquet_fdw

>
> [1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY
>
>
> Best regards,
> --
> Daniel Vérité
> https://postgresql.verite.pro/
> Twitter: @DanielVerite
>
>


-- 
Regards
Junwang Zhao




Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Tom Lane
Peter Eisentraut  writes:
> This kind of thing could be mostly avoided if we didn't hide all the 
> WAL_DEBUG behind #ifdefs.  For example, in the attached patch, I instead 
> changed it so that
>  if (XLOG_DEBUG)
> resolves to
>  if (false)
> in the normal case.  That way, we don't need to wrap that in #ifdef 
> WAL_DEBUG, and the compiler can see the disabled code and make sure it 
> continues to build.

Hmm, maybe, but I'm not sure this would be an unalloyed good.
The main concern I have is compilers and static analyzers starting
to bleat about unreachable code (warnings like "variable set but
never used", or the like, seem plausible).  The dead code would
also decrease our code coverage statistics, not that those are
wonderful now.

regards, tom lane




Re: [RFC] Clang plugin for catching suspicious typedef casting

2023-12-06 Thread Dmitry Dolgov
> On Sun, Dec 03, 2023 at 07:02:55PM -0800, Peter Geoghegan wrote:
> On Sun, Dec 3, 2023 at 6:31 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > Only advantage I see to using libclang is that you can run programs built
> > > with libclang regardless of what your C compiler is. I typically use GCC.
> > >
> > > I think your idea of automating this kind of thing is great no matter how 
> > > it
> > > is implemented.
> >
> > Yeah, absolutely.
>
> What is the difference between a clang plugin (or whatever this is),
> and a custom clang-tidy check? Are they the same thing?

Good question. Obviously one difference is that a clang plugin is more
tightly integrated into the build process, whereas a custom clang-tidy
check could be used independently. IIUC they also use different APIs, AST
Consumer vs. AST Matchers, but so far I have no idea what consequences
that has.

The Clang tooling page has this to say about choosing the right interface
[1], where clang-tidy seems to fall into the LibTooling category:

Clang Plugins allow you to run additional actions on the AST as part
of a compilation. Plugins are dynamic libraries that are loaded at
runtime by the compiler, and they’re easy to integrate into your
build environment.

Canonical examples of when to use Clang Plugins:
* special lint-style warnings or errors for your project
* creating additional build artifacts from a single compile step

[...]

LibTooling is a C++ interface aimed at writing standalone tools, as
well as integrating into services that run clang tools. Canonical
examples of when to use LibTooling:

* a simple syntax checker
* refactoring tools

> I myself use clang-tidy through clangd. It has a huge number of
> checks, plus some custom checks that are only used by certain open
> source projects.
>
> An example of a check that seems like it would be interesting to
> Postgres hackers:
>
> https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/bugprone/signal-handler.html
>
> An example of a custom clang-tidy check, used for the Linux Kernel:
>
> https://releases.llvm.org/17.0.1/tools/clang/tools/extra/docs/clang-tidy/checks/linuxkernel/must-use-errs.html

Yeah, those look interesting, and might be even worth including
independently of the topic in this thread.

> Your idea of starting with a check that identifies when BlockNumber
> and Buffer variables were confused seems like the right general idea
> to me. It's just about impossible for this to be a false positive in practice,
> which is important. But, I wonder, is there a way of accomplishing the
> same thing without any custom code? Seems like the general idea of not
> confusing two types like BlockNumber and Buffer might be a generic
> one?

Agree, the example in this patch could be generalized.

> * A custom check that enforces coding standards within signal handlers
> -- the generic one that I linked to might need to be customized, in
> whatever way (don't use it myself).
>
> * A custom check that enforces a coding standard that applies within
> critical sections.
>
> We already have an assertion that catches memory allocations made
> within a critical section. It might make sense to have tooling that
> can detect other kinds of definitely-unsafe things in critical
> sections. For example, maybe it would be possible to detect when
> XLogRegisterBufData() has been passed a pointer to something on the
> stack that will go out of scope, in a way that leaves open the
> possibility of reading random stuff from the stack once inside
> XLogInsert(). AddressSanitizer's use-after-scope can detect that sort
> of thing, though not reliably.

Interesting ideas, thanks for sharing. I'm going to try implementing
some of those.

[1]: https://clang.llvm.org/docs/Tooling.html




Re: remaining sql/json patches

2023-12-06 Thread Alvaro Herrera
On 2023-Dec-06, Amit Langote wrote:

> I think I'm inclined toward adapting the LA-token fix (attached 0005),
> because we've done that before with SQL/JSON constructors patch.
> Also, if I understand the concerns that Tom mentioned at [1]
> correctly, maybe we'd be better off not assigning precedence to
> symbols as much as possible, so there's that too against the approach
> #1.

Sounds ok to me, but I'm happy for this decision to be overridden by
others with more experience in parser code.

> Also I've attached 0006 to add news tests under ECPG for the SQL/JSON
> query functions, which I haven't done so far but realized after you
> mentioned ECPG.  It also includes the ECPG variant of the LA-token
> fix.  I'll eventually merge it into 0003 and 0004 after expanding the
> test cases some more.  I do wonder what kinds of tests we normally add
> to ECPG suite but not others?

Well, I only added tests to the ecpg suite in the previous round of
SQL/JSON deeds because its grammar was being modified, so it seemed
possible that it'd break.  Because you're also going to modify its
parser.c, it seems reasonable to expect tests to be added.  I wouldn't
expect to have to do this for other patches, because it should behave
like straight SQL usage.


Looking at 0002 I noticed that populate_array_assign_ndims() is called
in some places and its return value is not checked, so we'd ultimately
return JSON_SUCCESS even though there's actually a soft error stored
somewhere.  I don't know if it's possible to hit this in practice, but
it seems odd.

Looking at get_json_object_as_hash(), I think its comment is not
explicit enough about its behavior when an error is stored in escontext,
so it's hard to judge whether its caller is doing the right thing (I
think it is).  OTOH, populate_record seems to have the same issue, but
callers of that definitely seem to be doing the wrong thing -- namely,
not checking whether an error was saved; particularly populate_composite
seems to rely on the returned tuple, even though an error might have
been reported.

(I didn't look at the subsequent patches in the series to see if these
things were fixed later.)

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Andrew Dunstan



On 2023-12-06 We 08:49, Joe Conway wrote:

On 12/6/23 07:36, Andrew Dunstan wrote:


On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.


Oh I didn't address this -- I saw examples in the interwebs of 
MSSQL server I think [1] which had the non-array with commas 
import and export style. It was not that tough to support and the 
code as written already does it, so why not?


That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).


So your preference would be to not allow the non-array-with-commas 
case but if/when we implement COPY FROM we would accept that format? 
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?



Yes, I think so.


Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or 
objections.



Sorry to bikeshed a little more, I'm a bit late looking at this.

I suspect that most users will actually want the table as a single JSON 
document, so it should probably be the default. In any case FORCE_ARRAY 
as an option has a slightly wrong feel to it. I'm having trouble coming 
up with a good name for the reverse of that, off the top of my head.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Tom Lane
Joe Conway  writes:
> I believe this is ready to commit unless there are further comments or 
> objections.

I thought we were still mostly at proof-of-concept stage?

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.

regards, tom lane




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 10:32, Andrew Dunstan wrote:


On 2023-12-06 We 08:49, Joe Conway wrote:

On 12/6/23 07:36, Andrew Dunstan wrote:


On 2023-12-05 Tu 16:46, Joe Conway wrote:

On 12/5/23 16:20, Andrew Dunstan wrote:

On 2023-12-05 Tu 16:09, Joe Conway wrote:

On 12/5/23 16:02, Joe Conway wrote:

On 12/5/23 15:55, Andrew Dunstan wrote:

and in any other case (e.g. LINES) I can't see why you
would have them.


Oh I didn't address this -- I saw examples in the interwebs of 
MSSQL server I think [1] which had the non-array with commas 
import and export style. It was not that tough to support and the 
code as written already does it, so why not?


That seems quite absurd, TBH. I know we've catered for some absurdity in
the CSV code (much of it down to me), so maybe we need to be liberal in
what we accept here too. IMNSHO, we should produce either a single JSON
document (the ARRAY case) or a series of JSON documents, one per row
(the LINES case).


So your preference would be to not allow the non-array-with-commas 
case but if/when we implement COPY FROM we would accept that format? 
As in Postel's law ("be conservative in what you do, be liberal in
what you accept from others")?



Yes, I think so.


Awesome. The attached does it that way. I also ran pgindent.

I believe this is ready to commit unless there are further comments or 
objections.


Sorry to bikeshed a little more, I'm a bit late looking at this.

I suspect that most users will actually want the table as a single JSON
document, so it should probably be the default. In any case FORCE_ARRAY
as an option has a slightly wrong feel to it.


Sure, I can make that happen, although I figured that for the
many-rows scenario the size of a single array might be an issue for
whatever you are importing into.



I'm having trouble coming up with a good name for the reverse of
that, off the top of my head.


Will think about it and propose something with the next patch revision.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 10:23:52AM +0100, Peter Eisentraut wrote:
> Ok, I have committed your 0001 patch.

My compiler is unhappy about this one:

../postgresql/src/bin/pg_test_fsync/pg_test_fsync.c:605:2: error: ignoring 
return value of ‘write’, declared with attribute warn_unused_result 
[-Werror=unused-result]
  605 |  write(STDOUT_FILENO, "\n", 1);
  |  ^

I think we need to do something like the following, which is similar to
what was done in aa90e148ca7, 27314d32a88, and 6c72a28e5ce.

diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c 
b/src/bin/pg_test_fsync/pg_test_fsync.c
index f109aa5717..0684f4bc54 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -598,11 +598,14 @@ test_non_sync(void)
 static void
 signal_cleanup(SIGNAL_ARGS)
 {
+int rc;
+
 /* Delete the file if it exists. Ignore errors */
 if (needs_unlink)
 unlink(filename);
 /* Finish incomplete line on stdout */
-write(STDOUT_FILENO, "\n", 1);
+rc = write(STDOUT_FILENO, "\n", 1);
+(void) rc;
 _exit(1);
 }

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 10:44, Tom Lane wrote:

Joe Conway  writes:
I believe this is ready to commit unless there are further comments or 
objections.


I thought we were still mostly at proof-of-concept stage?


The concept is narrowly scoped enough that I think we are homing in on 
the final patch.



In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.


I will devise some kind of test and report back. I suppose something 
with many rows and many narrow columns comparing time to COPY 
text/csv/json modes would do the trick?
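
Something along these lines, perhaps (the table shape is just a guess at
"many rows, many narrow columns", and FORMAT json assumes the proposed
patch is applied):

    CREATE UNLOGGED TABLE copy_bench AS
      SELECT x AS c1, x AS c2, x AS c3, x AS c4, x AS c5,
             x AS c6, x AS c7, x AS c8, x AS c9, x AS c10
      FROM generate_series(1, 10000000) AS x;

    \timing on
    COPY copy_bench TO '/tmp/bench.txt'  WITH (FORMAT text);
    COPY copy_bench TO '/tmp/bench.csv'  WITH (FORMAT csv);
    COPY copy_bench TO '/tmp/bench.json' WITH (FORMAT json);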


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Andrew Dunstan



On 2023-12-06 We 10:44, Tom Lane wrote:

Joe Conway  writes:

I believe this is ready to commit unless there are further comments or
objections.

I thought we were still mostly at proof-of-concept stage?

In particular, has anyone done any performance testing?
I'm concerned about that because composite_to_json() has
zero capability to cache any metadata across calls, meaning
there is going to be a large amount of duplicated work
per row.





Yeah, that's hard to deal with, too, as it can be called recursively.

OTOH I'd rather have a version of this that worked slowly than none at all.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Tom Lane
Joe Conway  writes:
> On 12/6/23 10:44, Tom Lane wrote:
>> In particular, has anyone done any performance testing?

> I will devise some kind of test and report back. I suppose something 
> with many rows and many narrow columns comparing time to COPY 
> text/csv/json modes would do the trick?

Yeah.  If it's at least in the same ballpark as the existing text/csv
formats then I'm okay with it.  I'm worried that it might be 10x worse,
in which case I think we'd need to do something.

regards, tom lane




Re: Remove MSVC scripts from the tree

2023-12-06 Thread Andrew Dunstan



On 2023-12-06 We 01:18, Peter Eisentraut wrote:

On 04.12.23 21:11, Andrew Dunstan wrote:
I just had a look at shifting bowerbird to use meson, and it got 
stymied at the c99 test, which apparently doesn't compile with 
anything less than VS2019.


If that is the case, then wouldn't that invalidate the documented 
claim that you can build with VS2015 or newer?



Indeed it would.

Here's what the Microsoft site says at 
:



You can invoke the Microsoft C compiler by using the /TC or /Tc 
compiler option. It's used by default for code that has a .c file 
extension, unless overridden by a /TP or /Tp option. The default C 
compiler (that is, the compiler when /std:c11 or /std:c17 isn't 
specified) implements ANSI C89, but includes several Microsoft 
extensions, some of which are part of ISO C99. Some Microsoft 
extensions to C89 can be disabled by using the /Za compiler option, 
but others remain in effect. It isn't possible to specify strict C89 
conformance. The compiler doesn't implement several required features 
of C99, so it isn't possible to specify C99 conformance, either.


But the VS2019 compiler implements enough of C99 to pass our meson test, 
unlike VS2017. Maybe the test is too strict. After all, we know we can 
in fact build with the earlier versions.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Sehrope Sarkuni
Big +1 to this overall feature.

This is something I've wanted for a long time as well. While it's possible
to use a COPY with text output for a trivial case, the double escaping
falls apart quickly for arbitrary data. It's really only usable when you
know exactly what you are querying and know it will not be a problem.
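
For example, a small sketch of the double escaping meant here:

    -- text format escapes the backslash of the JSON \n escape,
    -- so it comes out doubled
    COPY (SELECT row_to_json(t) FROM (VALUES (E'a\nb')) AS t(x)) TO STDOUT;
    -- {"x":"a\\nb"}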

Regarding the defaults for the output, I think JSON lines (rather than a
JSON array of objects) would be preferred. It's more natural to combine
them and generate that type of data on the fly rather than forcing
aggregation into a single object.

Couple more features / use cases come to mind as well. Even if they're not
part of a first round of this feature I think it'd be helpful to document
them now as it might give some ideas for what does make that first cut:

1. Outputting a top level JSON object without the additional column keys.
IIUC, the top level keys are always the column names. A common use case
would be a single json/jsonb column that is already formatted exactly as
the user would like for output. Rather than enveloping it in an object with
a dedicated key, it would be nice to be able to output it directly. This
would allow non-object results to be outputted as well (e.g., lines of JSON
arrays, numbers, or strings). Due to how JSON is structured, I think this
would play nice with the JSON lines v.s. array concept.

COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE)
{"foo":1}
{"foo":2}
{"foo":3}

2. An option to ignore null fields so they are excluded from the output.
This would not be a default but would allow shrinking the total size of the
output data in many situations. This would be recursive to allow nested
objects to be shrunk down (not just the top level). This might be
worthwhile as a standalone JSON function though handling it during output
would be more efficient as it'd only be read once.

COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
{}
{"foo":2}
{"foo":3}

3. Reverse of #2 when copying data in to allow defaulting missing fields to
NULL.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Tristan Partin

On Wed Dec 6, 2023 at 10:18 AM CST, Nathan Bossart wrote:

On Wed, Dec 06, 2023 at 10:23:52AM +0100, Peter Eisentraut wrote:
> Ok, I have committed your 0001 patch.

My compiler is unhappy about this one:

../postgresql/src/bin/pg_test_fsync/pg_test_fsync.c:605:2: error: ignoring 
return value of ‘write’, declared with attribute warn_unused_result 
[-Werror=unused-result]
  605 |  write(STDOUT_FILENO, "\n", 1);
  |  ^


Some glibc source:


/* If fortification mode, we warn about unused results of certain
   function calls which can lead to problems.  */
#if __GNUC_PREREQ (3,4) || __glibc_has_attribute (__warn_unused_result__)
# define __attribute_warn_unused_result__ \
   __attribute__ ((__warn_unused_result__))
# if defined __USE_FORTIFY_LEVEL && __USE_FORTIFY_LEVEL > 0
#  define __wur __attribute_warn_unused_result__
# endif
#else
# define __attribute_warn_unused_result__ /* empty */
#endif
#ifndef __wur
# define __wur /* Ignore */
#endif



extern ssize_t write (int __fd, const void *__buf, size_t __n) __wur
__attr_access ((__read_only__, 2, 3));


According to my setup, I am hitting the /* Ignore */ variant of __wur. 
I am guessing that Fedora doesn't add fortification to the default 
CFLAGS. What distro are you using? But yes, something like what you 
proposed sounds good to me. Sorry for leaving this out!


Makes me wonder if setting -D_FORTIFY_SOURCE=2 in debug builds at least 
would make sense, if not all builds. According to the OpenSSF[0], level 
2 is only supposed to impact runtime performance by 0.1%.


[0]: 
https://best.openssf.org/Compiler-Hardening-Guides/Compiler-Options-Hardening-Guide-for-C-and-C++.html#performance-implications

--
Tristan Partin
Neon (https://neon.tech)




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 2023-12-06 We 10:44, Tom Lane wrote:
>> In particular, has anyone done any performance testing?
>> I'm concerned about that because composite_to_json() has
>> zero capability to cache any metadata across calls, meaning
>> there is going to be a large amount of duplicated work
>> per row.

> Yeah, that's hard to deal with, too, as it can be called recursively.

Right.  On the plus side, if we did improve this it would presumably
also benefit other callers of composite_to_json[b].

> OTOH I'd rather have a version of this that worked slowly than none at all.

It might be acceptable to plan on improving the performance later,
depending on just how bad it is now.

regards, tom lane




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 11:28:59AM -0500, Tom Lane wrote:
> It might be acceptable to plan on improving the performance later,
> depending on just how bad it is now.

On 10M rows with 11 integers each, I'm seeing the following:

(format text)
Time: 10056.311 ms (00:10.056)
Time: 8789.331 ms (00:08.789)
Time: 8755.070 ms (00:08.755)

(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)

(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)

'perf top' tends to look a bit like this:

  13.31%  postgres  [.] appendStringInfoString
   7.57%  postgres  [.] datum_to_json_internal
   6.82%  postgres  [.] SearchCatCache1
   5.35%  [kernel]  [k] intel_gpio_irq
   3.57%  postgres  [.] composite_to_json
   3.31%  postgres  [.] IsValidJsonNumber

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 10:28:49AM -0600, Tristan Partin wrote:
> According to my setup, I am hitting the /* Ignore */ variant of __wur. I am
> guessing that Fedora doesn't add fortification to the default CFLAGS. What
> distro are you using? But yes, something like what you proposed sounds good
> to me. Sorry for leaving this out!

This was on an Ubuntu LTS.  I always build with -Werror during development,
too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:
>   (format csv)
>   Time: 12295.480 ms (00:12.295)
>   Time: 12311.059 ms (00:12.311)
>   Time: 12305.469 ms (00:12.305)
> 
>   (format json)
>   Time: 24568.621 ms (00:24.569)
>   Time: 23756.234 ms (00:23.756)
>   Time: 24265.730 ms (00:24.266)

I should also note that the json output is 85% larger than the csv output.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Peter Geoghegan
On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
 wrote:
> On Wed, 6 Dec 2023 at 14:11, Robert Haas  wrote:
> > It isn't very clear from the commit message that this commit is doing
> > two different things, and in fact I'm still unclear on what exactly
> > the other optimization is.
>
> I feel that Peter referred to these two distinct optimizations:

Right.

> 2. When scanning an index in ascending order using scankey a < 10 (one
> that defines an endpoint of the scan), we can look ahead and check if
> the last item on the page is consistent. If so, then all other items
> on the page will also be consistent with that scankey.

Also worth noting that it could be "scankey a = 10". That is, the
precheck optimization (i.e. the optimization that's not the target of
my test case) can deal with equalities and inequalities just as well
(any scan key that's required in the current scan direction is
supported). On the other hand the required-in-opposite-direction-only
optimization (i.e. the target of my test case) only applies to
inequality strategy scan keys.

It kinda makes sense to explain both concepts using an example that
involves both > and < strategy inequalities, since that makes the
symmetry between the two optimizations clearer.
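
For instance (names invented), one query exercising both at once in an
ascending scan:

    -- "a < 10" is required in the (forward) scan direction, which is
    -- what the precheck optimization looks at; "a > 1" is required only
    -- in the opposite direction, which is what my test case targets
    SELECT * FROM t WHERE a > 1 AND a < 10 ORDER BY a;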

-- 
Peter Geoghegan




Re: Change GUC hashtable to use simplehash?

2023-12-06 Thread Jeff Davis
On Wed, 2023-12-06 at 07:39 +0700, John Naylor wrote:
> "git grep cstring_hash" found nothing, so not sure what you're
> asking.

Sorry, I meant string_hash(). Your v5-0002 changes the way hashing
works for cstrings, and that means it's no longer equivalent to
hash_bytes with strlen. That's probably fine, but someone might assume
that they are equivalent.

> 
> In the abstract, I consider (b) to be a layering violation. As a
> consequence, the cleverness in (b) is not confined to one or two
> places, but is smeared over a whole bunch of places. I find it hard
> to
> follow.

OK. I am fine with (a).

Regards,
Jeff Davis





Re: Remove MSVC scripts from the tree

2023-12-06 Thread Peter Eisentraut

On 06.12.23 17:27, Andrew Dunstan wrote:
But the VS2019 compiler implements enough of C99 to pass our meson test, 
unlike VS2017. Maybe the test is too strict. After all, we know we can 
in fact build with the earlier versions.


I just realized that the C99 test is actually our own, not provided by 
meson.  (See "c99_test" in meson.build.)


Can you try disabling a few bits of that to see what makes it pass for 
you?  I suspect it's the structfunc() call.






Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Peter Eisentraut

On 06.12.23 17:18, Nathan Bossart wrote:

On Wed, Dec 06, 2023 at 10:23:52AM +0100, Peter Eisentraut wrote:

Ok, I have committed your 0001 patch.


My compiler is unhappy about this one:

../postgresql/src/bin/pg_test_fsync/pg_test_fsync.c:605:2: error: ignoring 
return value of ‘write’, declared with attribute warn_unused_result 
[-Werror=unused-result]
   605 |  write(STDOUT_FILENO, "\n", 1);
   |  ^

I think we need to do something like the following, which is similar to
what was done in aa90e148ca7, 27314d32a88, and 6c72a28e5ce.

diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c 
b/src/bin/pg_test_fsync/pg_test_fsync.c
index f109aa5717..0684f4bc54 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -598,11 +598,14 @@ test_non_sync(void)
  static void
  signal_cleanup(SIGNAL_ARGS)
  {
+int rc;
+
  /* Delete the file if it exists. Ignore errors */
  if (needs_unlink)
  unlink(filename);
  /* Finish incomplete line on stdout */
-write(STDOUT_FILENO, "\n", 1);
+rc = write(STDOUT_FILENO, "\n", 1);
+(void) rc;
  _exit(1);
  }


Makes sense.  Can you commit that?





Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 06:27:04PM +0100, Peter Eisentraut wrote:
> Makes sense.  Can you commit that?

Yes, I will do so shortly.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Peter Geoghegan
On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan  wrote:
> On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov  
> wrote:
> > Thank you for raising this issue.  Preprocessing of btree scan keys is
> > normally removing the redundant scan keys.  However, redundant scan
> > keys aren't removed when they have arguments of different types.
> > Please give me a bit of time to figure out how to workaround this.
>
> Couldn't you condition the use of the optimization on
> _bt_preprocess_keys being able to use cross-type operators when it
> checked for redundant or contradictory scan keys? The vast majority of
> index scans will be able to do that.

Some quick experimentation shows that my test case works as expected
once _bt_preprocess_keys is taught to remember that it has seen a
maybe-unsafe case, which it stashes in a special new field from the
scan's state for later. As I said, this field can be treated as a
condition of applying the required-in-opposite-direction-only
optimization in _bt_readpage().

This new field would be analogous to the existing
requiredMatchedByPrecheck state used by _bt_readpage() to determine
whether it can apply the required-in-same-direction optimization. The
new field works for the whole scan instead of just for one page, and
it works based on information from "behind the scan" instead of
information "just ahead of the scan". But the basic idea is the same.

_bt_preprocess_keys is rather complicated. It is perhaps tempting to
do this in a targeted way, that specifically limits itself to the exact
cases that we know to be unsafe. But it might be okay to just disable
the optimization in most or all cases where _bt_compare_scankey_args()
returns false. That's likely to be very rare in practice, anyway (who
really uses opfamilies like these?). Not really sure where to come
down on that.

--
Peter Geoghegan




Re: Building PosgresSQL with LLVM fails on Solaris 11.4

2023-12-06 Thread Andres Freund
Hi,

On 2023-12-01 23:06:59 +, Sacha Hottinger wrote:
> // I used the patch command to patch the src/backend/port/Makefile with your 
> attached file and tried again with the Sun Studio compiler. There is now a 
> different error at this stage:
> …
> /opt/developerstudio12.6/bin/cc -m64 -xarch=native -Xa -v -O 
> -I../../../src/include-c -o pg_shmem.o pg_shmem.c
> echo | /usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv 
> -O2  -I../../../src/include   -flto=thin -emit-llvm -c -xc -o tas.bc tas.s
> tas.s:1:1: error: expected identifier or '('
> !-
> ^
> 1 error generated.

That's me making a silly mistake...  I've attached an updated, but still
blindly written, diff.


> // Have attached the config.log, gmake all full log, and patched Makefile.

Could you attach config.log and the gmake log for the gcc-based build? Because
so far I have no idea what causes the linker issue there.

Greetings,

Andres Freund
diff --git i/src/backend/port/Makefile w/src/backend/port/Makefile
index 47338d99229..4b3bd9af229 100644
--- i/src/backend/port/Makefile
+++ w/src/backend/port/Makefile
@@ -33,6 +33,14 @@ endif
 
 include $(top_srcdir)/src/backend/common.mk
 
+# Hacky rule to allow clang to generate tas.bc. One reasonably would think
+# that we could just compile tas.s, but on solaris tas.s can't be parsed with
+# clang. It's not interesting from a JIT perspective anyway, so just generate
+# an empty file.
+
+tas.bc: tas.s
+	echo | $(COMPILE.c.bc) -xc -o $@ -
+
 tas.o: tas.s
 ifeq ($(SUN_STUDIO_CC), yes)
 # preprocess assembler file with cpp


Re: generic plans and "initial" pruning

2023-12-06 Thread Robert Haas
Reviewing 0001:

Perhaps ExecEndCteScan needs an adjustment. What if node->leader was never set?

Other than that, I think this is in good shape. Maybe there are other
things we'd want to adjust here, or maybe there aren't, but there
doesn't seem to be any good reason to bundle more changes into the
same patch.

Reviewing 0002 and beyond:

I think it's good that you have tried to divide up a big change into
little pieces, but I'm finding the result difficult to understand. It
doesn't really seem like each patch stands on its own. I keep flipping
between patches to try to understand why other patches are doing
things, which kind of defeats the purpose of splitting stuff up. For
example, 0002 adds a NodeTag field to QueryDesc, but it doesn't even
seem to initialize that field, let alone use it for anything. It adds
a CachedPlan pointer to QueryDesc too, and adapts CreateQueryDesc to
allow one as an argument, but none of the callers actually pass
anything. I suspect that the first change (adding a NodeTag field)
is a bug, and that the second one is intentional, but it's hard
to tell without flipping through all of the other patches to see how
they build on what 0002 does. And even when something isn't a bug,
it's also hard to tell whether it's the right design, again because
you can't consider each patch in isolation. Ideally, splitting a patch
set should bring related changes together in a single patch and push
unrelated changes apart into different patches, but I don't really see
this particular split having that effect.

There is a chicken and egg problem here, to be fair. If we add code
that can make plan initialization fail without teaching the planner to
cope with failures, then we have broken the server, and if we do the
reverse, then we have a bunch of dead code that we can't test. Neither
is very satisfactory. But I still hope there's some better division
possible than what you have here currently. For instance, I wonder if
it would be possible to add all the stuff to cope with plan
initialization failing and then have a test patch that makes
initialization randomly fail with some probability (or maybe you can
even cause failures at specific points). Then you could test that
infrastructure by running the regression tests in a loop with various
values of the relevant setting.

Another overall comment that I have is that it doesn't feel like
there's enough high-level explanation of the design. I don't know how
much of that should go in comments vs. commit messages vs. a README
that accompanies the patch set vs. whatever else, and I strongly
suspect that some of the stuff that seems confusing now is actually
stuff that at one point I understood and have just forgotten about.
But rediscovering it shouldn't be quite so hard. For example, consider
the question "why are we storing the CachedPlan in the QueryDesc?" I
eventually figured out that it's so that ExecPlanStillValid can call
CachedPlanStillValid which can then consult the cached plan's is_valid
flag. But is that the only access to the CachedPlan that we ever
expect to occur via the QueryDesc? If not, what else is allowable? If
so, why not just store a Boolean in the QueryDesc and arrange for the
plancache to be able to flip it when invalidating? I'm not saying
that's a better design -- I'm saying that it looks hard to understand
your thought process from the patch set. And also, you know, assuming
the current design is correct, could there be some way of dividing up
the patch set so that this one change, where we add the CachedPlan to
the QueryDesc, isn't so spread out across the whole series?

Some more detailed review comments below. This isn't really a full
review because I don't understand the patches well enough for that,
but it's some stuff I noticed.

In 0002:

+ * result-rel info, etc.  Also, we don't pass the parent't copy of the

Typo.

+/*
+ * All the necessary locks must already have been taken when
+ * initializing the parent's copy of subplanstate, so the CachedPlan,
+ * if any, should not have become invalid during ExecInitNode().
+ */
+Assert(ExecPlanStillValid(rcestate));

This -- and the other similar instance -- feel very uncomfortable.
There's a lot of action at a distance here. If this assertion ever
failed, how would anyone ever figure out what went wrong? You wouldn't
for example know which object got invalidated, presumably
corresponding to a lock that you failed to take. Unless the problem
were easily reproducible in a test environment, trying to guess what
happened might be pretty awful; imagine seeing this assertion failure
in a customer log file and trying to back-track to the find the
underlying bug. A further problem is that what would actually happen
is you *wouldn't* see this in the customer log file, because
assertions wouldn't be enabled, so you'd just see queries occasionally
returning wrong answers, I guess? Or crashing in some other random
part of the code

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Peter Geoghegan
On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
 wrote:
> 1. When scanning an index in ascending order using scankey a > 1 (so,
> one that defines a start point of the scan), we don't need to check
> items for consistency with that scankey once we've found the first
> value that is consistent with the scankey, as all future values will
> also be consistent with the scankey (if we assume no concurrent page
> deletions).

BTW, I don't think that page deletion is a concern for these
optimizations in the way that it is for the similar idea of "dynamic
prefix compression", which works against insertion-type scan keys
(used to descend the tree and to do an initial binary search of a leaf
page).

We already rely on the first call to _bt_readpage (the one that takes
place from within _bt_first rather than from _bt_next) passing a page
offset number that's exactly at the start of where matches begin --
this is crucial in the case of scans with required equality strategy
scan keys (most scans). If we just skipped the _bt_binsrch and passed
P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
would break lots of queries. So the _bt_binsrch step within _bt_first
isn't just an optimization -- it's crucial. This is nothing new.

Recall that _bt_readpage only deals with search-type scan keys,
meaning scan keys that use a simple operator (so it uses = operators
with the equality strategy, as opposed to using a 3-way ORDER
proc/support function 1 that can tell the difference between < and >).
In general _bt_readpage doesn't know how to tell the difference
between a tuple that's before the start of = matches, and a tuple
that's at (or after) the end of any = matches. If it is ever allowed
to conflate these two cases, then we'll overlook matching tuples,
which is of course wrong (it'll terminate the scan before it even
starts). It is up to the caller (really just _bt_first) to never call
_bt_readpage in a way that allows this confusion to take place --
which is what makes the _bt_binsrch step crucial.

A race condition with page deletion might allow the key space covered
by a leaf page to "widen" after we've left its parent, but before we
arrive on the leaf page. But the _bt_binsrch step within _bt_first
happens *after* we land on and lock that leaf page, in any case. So
there is no risk of the scan ever doing anything with
concurrently-inserted index tuples. In general we only have to worry
about such race conditions when descending the tree -- they're not a
problem after the scan has reached the leaf level and established an
initial page offset number. (The way that _bt_readpage processes whole
pages in one atomic step helps with this sort of thing, too. We can
almost pretend that the B-Tree structure is immutable, even though
that's obviously not really true at all. We know that we'll always
make forward progress through the key space by remembering the next
page to visit when processing each page.)

My test case broke the required-in-opposite-direction-only
optimization by finding a way in which
required-in-opposite-direction-only inequalities were not quite the
same as required equalities with respect to this business about the
precise leaf page offset number that the scan begins at. They make
*almost* the same set of guarantees (note in particular that both will
be transformed into insertion scan key/3-way ORDER proc scan keys by
_bt_first's initial positioning code), but there is at least one
special case that applies only to inequalities. I had to play games
with weird incomplete opfamilies to actually break the optimization --
that was required to tickle the special case in just the right/wrong
way.

-- 
Peter Geoghegan




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Daniel Verite
Andrew Dunstan wrote:

> IMNSHO, we should produce either a single JSON 
> document (the ARRAY case) or a series of JSON documents, one per row 
> (the LINES case).

"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
   by zero or more CopyData messages (always one per row), followed by
   CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
Type: CopyOut response
Length: 13
Format: Text (0)
Columns: 3
Format: Text (0)
PostgreSQL
Type: Copy data
Length: 6
Copy data: 5b0a
PostgreSQL
Type: Copy data
Length: 76
Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?

At least the json non-ARRAY case ("json lines") doesn't have
this issue, since every CopyData message corresponds effectively
to a row in the table.
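
For illustration, here is a minimal libpq-side sketch (the conninfo and the
query are placeholders, and it assumes the patch as posted): in the ARRAY
case the client has to treat the CopyData payloads as fragments of one
concatenated document, not as self-contained rows.

/* sketch only: consume COPY TO ... (FORMAT JSON, FORCE_ARRAY) via libpq */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* placeholder conninfo */
    PGresult   *res;
    char       *buf;
    int         len;

    res = PQexec(conn,
                 "COPY (SELECT 1 AS i) TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY)");
    if (PQresultStatus(res) != PGRES_COPY_OUT)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        exit(1);
    }
    PQclear(res);

    /* each CopyData message arrives as one buffer: "[\n", the rows, "]\n" */
    while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
    {
        fwrite(buf, 1, len, stdout);        /* concatenation is always safe */
        PQfreemem(buf);
    }

    res = PQgetResult(conn);                /* collect the final command status */
    PQclear(res);
    PQfinish(conn);
    return 0;
}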


[1] https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite




Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Matthias van de Meent
On Wed, 6 Dec 2023 at 19:55, Peter Geoghegan  wrote:
>
> On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
>  wrote:
> > 1. When scanning an index in ascending order using scankey a > 1 (so,
> > one that defines a start point of the scan), we don't need to check
> > items for consistency with that scankey once we've found the first
> > value that is consistent with the scankey, as all future values will
> > also be consistent with the scankey (if we assume no concurrent page
> > deletions).
>
> BTW, I don't think that page deletion is a concern for these
> optimizations in the way that it is for the similar idea of "dynamic
> prefix compression", which works against insertion-type scan keys
> (used to descend the tree and to do an initial binary search of a leaf
> page).
>
> We already rely on the first call to _bt_readpage (the one that takes
> place from within _bt_first rather than from _bt_next) passing a page
> offset number that's exactly at the start of where matches begin --
> this is crucial in the case of scans with required equality strategy
> scan keys (most scans). If we just skipped the _bt_binsrch and passed
> P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
> would break lots of queries. So the _bt_binsrch step within _bt_first
> isn't just an optimization -- it's crucial. This is nothing new.

I was thinking more along the lines of page splits+deletions while
we're doing _bt_stepright(), but forgot to consider that we first lock
the right sibling, and only then release the left sibling for splits,
so we should be fine here.

Kind regards,

Matthias van de Meent




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON 
document (the ARRAY case) or a series of JSON documents, one per row 
(the LINES case).


"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
 Type: CopyOut response
 Length: 13
 Format: Text (0)
 Columns: 3
Format: Text (0)
PostgreSQL
 Type: Copy data
 Length: 6
 Copy data: 5b0a
PostgreSQL
 Type: Copy data
 Length: 76
 Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?



Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to 
assemble the data correctly despite the extra CopyData messages.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 11:44, Nathan Bossart wrote:

On Wed, Dec 06, 2023 at 10:33:49AM -0600, Nathan Bossart wrote:

(format csv)
Time: 12295.480 ms (00:12.295)
Time: 12311.059 ms (00:12.311)
Time: 12305.469 ms (00:12.305)

(format json)
Time: 24568.621 ms (00:24.569)
Time: 23756.234 ms (00:23.756)
Time: 24265.730 ms (00:24.266)


I should also note that the json output is 85% larger than the csv output.


I'll see if I can add some caching to composite_to_json(), but based on 
the relative data size it does not sound like there is much performance 
left on the table to go after, no?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-06 Thread Peter Geoghegan
On Wed, Dec 6, 2023 at 11:14 AM Matthias van de Meent
 wrote:
> I was thinking more along the lines of page splits+deletions while
> we're doing _bt_stepright(), but forgot to consider that we first lock
> the right sibling, and only then release the left sibling for splits,
> so we should be fine here.

In general the simplest (and possibly most convincing) arguments for
the correctness of optimizations like the ones that Alexander added
rely on seeing that the only way that the optimization can be wrong is
if some more fundamental and long established thing was also wrong. We
could try to prove that the new optimization is correct (or wrong),
but it is often more helpful to "prove" that some much more
fundamental thing is correct instead, if that provides us with a
useful corollary about the new thing also being correct.

Take the _bt_readpage precheck optimization, for example. Rather than
thinking about the key space and transitive rules, it might be more
helpful to focus on what must have been true in earlier Postgres
versions, and what we can expect to still hold now. The only way that
that optimization could be wrong is if the same old _bt_checkkeys
logic that decides when to terminate the scan (by setting
continuescan=false) always had some propensity to "change its mind"
about ending the scan, at least when it somehow got the opportunity to
see tuples after the first tuple that it indicated should end the
scan. That's not quite bulletproof, of course (it's not like older
Postgres versions actually provided _bt_checkkeys with opportunities
to "change its mind" in this sense), but it's a useful starting point
IME. It helps to build intuition.

-- 
Peter Geoghegan




automating RangeTblEntry node support

2023-12-06 Thread Peter Eisentraut
I have been looking into what it would take to get rid of the 
custom_read_write and custom_query_jumble for the RangeTblEntry node 
type.  This is one of the larger and more complex exceptions left.


(Similar considerations would also apply to the Constraint node type.)

Allegedly, only certain fields of RangeTblEntry are valid based on 
rtekind.  But exactly which ones seems to be documented and handled 
inconsistently.  It seems that over time, new RTE kinds have "borrowed" 
fields that notionally belong to other RTE kinds, which is technically 
not a problem but creates a bit of a mess when trying to understand all 
this.


I have some WIP patches to accompany this discussion.

Let's start with the jumble function.  I suppose that this was just 
carried over from the pg_stat_statements-specific code without any 
detailed review.  For example, the "inh" field is documented to be valid 
in all RTEs, but it's only jumbled for RTE_RELATION.  The "lateral" 
field isn't looked at at all.  I wouldn't be surprised if there are more 
cases like this.


In the first attached patch, I remove _jumbleRangeTblEntry() and instead 
add per-field query_jumble_ignore annotations to approximately match the 
behavior of the previous custom code.  The pg_stat_statements test suite 
has some coverage of this.  I get rid of the switch on rtekind; this should
be technically correct, since we do the equal and copy functions like 
this also.  So for example the "inh" field is now considered in each 
case.  But I left "lateral" alone.  I suspect several of these new 
query_jumble_ignore should actually be dropped because the code was 
wrong before.


In the second patch, I'm removing the switch on rtekind from 
range_table_mutator_impl().  This should be fine because all the 
subroutines can handle unset/NULL fields.  And it removes one more place 
that needs to track knowledge about which fields are valid when.


In the third patch, I'm removing the custom read/write functions for 
RangeTblEntry.  Those functions wanted to have a few fields at the front 
to make the dump more legible; I'm doing that now by moving the fields 
up in the actual struct.


Not done here, but something we should do is restructure the 
documentation of RangeTblEntry itself.  I'm still thinking about the 
best way to structure this, but I'm thinking more like noting for each 
field when it's used, instead by block like it is now, which makes it 
awkward if a new RTE wants to borrow some fields.


Now one could probably rightfully complain that having all these unused 
fields dumped would make the RangeTblEntry serialization bigger.  I'm 
not sure how big of a problem that actually is, considering how many 
often-unset fields other node types have.  But it deserves some 
consideration.  I think the best way to work around that would be to 
have a mode that omits fields that have their default value (zero). 
This would be more generally useful; for example Query also has a bunch 
of fields that are not often set.  I think this would be pretty easy to 
implement, for example like


#define WRITE_INT_FIELD(fldname) \
    if (full_mode || node->fldname) \
        appendStringInfo(str, " :" CppAsString(fldname) " %d", node->fldname)


There is also the discussion over at [0] about larger redesigns of the 
node serialization format.  I'm also interested in that, but here I'm 
mainly trying to remove more special cases to make that kind of work 
easier in the future.


Any thoughts about the direction?


[0]: 
https://www.postgresql.org/message-id/flat/CACxu%3DvL_SD%3DWJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA%40mail.gmail.com

From ef08061a6d8f01e7db350b74eaa2d403df6079b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 1 Dec 2023 11:45:50 +0100
Subject: [PATCH v1 1/3] Remove custom _jumbleRangeTblEntry()

---
 src/backend/nodes/queryjumblefuncs.c | 48 
 src/include/nodes/parsenodes.h   | 42 
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/src/backend/nodes/queryjumblefuncs.c 
b/src/backend/nodes/queryjumblefuncs.c
index 281907a4d8..9094ea02d8 100644
--- a/src/backend/nodes/queryjumblefuncs.c
+++ b/src/backend/nodes/queryjumblefuncs.c
@@ -347,51 +347,3 @@ _jumbleA_Const(JumbleState *jstate, Node *node)
}
}
 }
-
-static void
-_jumbleRangeTblEntry(JumbleState *jstate, Node *node)
-{
-   RangeTblEntry *expr = (RangeTblEntry *) node;
-
-   JUMBLE_FIELD(rtekind);
-   switch (expr->rtekind)
-   {
-   case RTE_RELATION:
-   JUMBLE_FIELD(relid);
-   JUMBLE_NODE(tablesample);
-   JUMBLE_FIELD(inh);
-   break;
-   case RTE_SUBQUERY:
-   JUMBLE_NODE(subquery);
-   break;
-   case RTE_JOIN:
-   JUMBLE_FIELD(jointype);
-   break;
-   case RTE_FUNCTION

Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Tom Lane
Joe Conway  writes:
> I'll see if I can add some caching to composite_to_json(), but based on 
> the relative data size it does not sound like there is much performance 
> left on the table to go after, no?

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching.  The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?

regards, tom lane




Re: gai_strerror() is not thread-safe on Windows

2023-12-06 Thread Thomas Munro
On Tue, Dec 5, 2023 at 3:43 PM Kyotaro Horiguchi
 wrote:
> At Tue, 5 Dec 2023 08:26:54 +1300, Thomas Munro  
> wrote in
> > On second thoughts, I guess it would make more sense to use the exact
> > messages Windows' own implementation would return instead of whatever
> > we had in the past (probably cribbed from some other OS or just made
> > up?).  I asked CI to spit those out[1].  Updated patch attached.  Will
> > add to CF.
> >
> > [1] https://cirrus-ci.com/task/5816802207334400?logs=main#L15
>
> Windows' gai_strerror outputs messages that correspond to the language
> environment. Similarly, I think that the messages that the messages
> returned by our version should be translatable.

Hmm, that is a good point.  Wow, POSIX has given us a terrible
interface here, in terms of resource management.  Let's see what glibc
does:

https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror.c
https://github.com/lattera/glibc/blob/master/sysdeps/posix/gai_strerror-strs.h

It doesn't look like it knows about locales at all.  And a test
program seems to confirm:

#include <locale.h>
#include <netdb.h>
#include <stdio.h>
int main()
{
setlocale(LC_MESSAGES, "ja_JP.UTF-8");
printf("%s\n", gai_strerror(EAI_MEMORY));
}

That prints:

Memory allocation failure

FreeBSD tries harder, and prints:

メモリ割り当て失敗

We can see that it has a thread-local variable that holds a copy of
that localised string until the next call to gai_strerror() in the
same thread:

https://github.com/freebsd/freebsd-src/blob/main/lib/libc/net/gai_strerror.c
https://github.com/freebsd/freebsd-src/blob/main/lib/libc/nls/ja_JP.UTF-8.msg

FreeBSD's message catalogues would provide a ready-made source of
translations, but... hmm, if glibc doesn't bother and the POSIX
interface is unhelpful and Windows' own implementation is so willfully
unusable, I don't really feel inclined to build a whole thread-local
cache thing on our side just to support this mess.

So I think we should just hard-code the error messages in English and
move on.  However, English is my language so perhaps I should abstain
and leave it to others to decide how important that is.
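
To make that concrete, a rough sketch of the sort of thing I mean (the
function name is made up, the wording is roughly glibc's, and on Windows the
EAI_* codes come from ws2tcpip.h):

/* sketch only: trivially thread-safe, English-only messages */
const char *
pg_gai_strerror(int errcode)
{
    switch (errcode)
    {
        case EAI_AGAIN:
            return "Temporary failure in name resolution";
        case EAI_FAIL:
            return "Non-recoverable failure in name resolution";
        case EAI_MEMORY:
            return "Memory allocation failure";
        case EAI_NONAME:
            return "Name or service not known";
        default:
            return "Unknown resolver error";
    }
}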

> These messages may add extra line-end periods to the parent (or
> containing) messages when appended. This looks as follows.
>
> (auth.c:517 : errdetail_log() : sub (detail) message)
> > Could not translate client host name "hoge" to IP address: An address 
> > incompatible with the requested protocol was used..
>
> (hba.c:1562 : errmsg() : main message)
> > invalid IP address "192.0.2.1": This is usually a temporary error during 
> > hostname resolution and means that the local server did not receive a 
> > response from an authoritative server.
>
> When I first saw the first version, I thought it would be better to
> use Windows' own messages, just like you did. However, considering the
> content of the message above, wouldn't it be better to adhere to
> Linux-style messages overall?

Yeah, I agree that either the glibc or the FreeBSD messages would be
better than those now that I've seen them.  They are short and sweet.

> A slightly subtler point is that the second example seems to have a
> misalignment between the descriptions before and after the colon, but
> do you think it's not something to be concerned about to this extent?

I didn't understand what you meant here.




Re: Remove MSVC scripts from the tree

2023-12-06 Thread Andrew Dunstan



On 2023-12-06 We 12:24, Peter Eisentraut wrote:

On 06.12.23 17:27, Andrew Dunstan wrote:
But the VS2019 compiler implements enough of C99 to pass our meson 
test, unlike VS2017. Maybe the test is too strict. After all, we know 
we can in fact build with the earlier versions.


I just realized that the C99 test is actually our own, not provided by 
meson.  (See "c99_test" in meson.build.)


Can you try disabling a few bits of that to see what makes it pass for 
you?  I suspect it's the structfunc() call.



Yes, if I comment out the call to structfunc() the test passes on VS2017 
(compiler version 19.15.26726)
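
For the archives: presumably what trips the older compiler is the
compound-literal struct argument that the structfunc() call exercises, i.e.
something of this shape (an illustration only, not the exact snippet from
meson.build):

struct two_ints
{
    int     a;
    int     b;
};

extern void structfunc(struct two_ints);

static void
call_it(void)
{
    structfunc((struct two_ints) {1, 0});   /* C99 compound literal argument */
}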



cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Andrew Dunstan



On 2023-12-06 We 15:20, Tom Lane wrote:

Joe Conway  writes:

I'll see if I can add some caching to composite_to_json(), but based on
the relative data size it does not sound like there is much performance
left on the table to go after, no?

If Nathan's perf results hold up elsewhere, it seems like some
micro-optimization around the text-pushing (appendStringInfoString)
might be more useful than caching.  The 7% spent in cache lookups
could be worth going after later, but it's not the top of the list.

The output size difference does say that maybe we should pay some
attention to the nearby request to not always label every field.
Perhaps there should be an option for each row to transform to
a JSON array rather than an object?





I doubt it. People who want this are likely to want pretty much what 
this patch is providing, not something they would have to transform in 
order to get it. If they want space-efficient data they won't really be 
wanting JSON. Maybe they want Protocol Buffers or something in that vein.


I see there's a nearby proposal to make this area pluggable at 




cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Reducing output size of nodeToString

2023-12-06 Thread Matthias van de Meent
Hi,

PFA a patch that reduces the output size of nodeToString by 50%+ in
most cases (measured on pg_rewrite), which on my system reduces the
total size of pg_rewrite by 33% to 472KiB. This does keep the textual
pg_node_tree format alive, but reduces its size significantly.

The basic techniques used are
 - Don't emit scalar fields when they contain a default value, and
make the reading code aware of this.
 - Reasonable defaults are set for most datatypes, and overrides can
be added with new pg_node_attr() attributes. No introspection into
non-null Node/Array/etc. is being done though.
 - Reset more fields to their default values before storing the values.
 - Don't write trailing 0s in outDatum calls for by-ref types. This
saves many bytes for Name fields, but also some other pre-existing
entry points.
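
To illustrate the first point, here is a simplified sketch of the idea (not
the code in the attached patch; the macro name is made up):

/* write side: skip fields that hold their default value */
#define WRITE_INT_FIELD_DEFAULT(fldname, defval) \
    do { \
        if (node->fldname != (defval)) \
            appendStringInfo(str, " :" CppAsString(fldname) " %d", \
                             node->fldname); \
    } while (0)

/*
 * read side: the ":fldname value" pair has to be treated as optional.  A
 * field that is absent from the text keeps whatever the reader initialized
 * it to (zero from makeNode(), or an explicitly assigned default); the
 * reader only consumes the pair when the next token really is that label.
 */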

Future work will probably have to be on a significantly different
storage format, as the textual format is about to hit its entropy
limits.

See also [0], [1] and [2], where complaints about the verbosity of
nodeToString were vocalized.

Kind regards,

Matthias van de Meent

[0] 
https://www.postgresql.org/message-id/CAEze2WgGexDM63dOvndLdAWwA6uSmSsc97jmrCuNmrF1JEDK7w%40mail.gmail.com
[1] 
https://www.postgresql.org/message-id/flat/CACxu%3DvL_SD%3DWJiFSJyyBuZAp_2v_XBqb1x9JBiqz52a_g9z3jA%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/4b27fc50-8cd6-46f5-ab20-88dbaadca645%40eisentraut.org


v0-0001-Reduce-the-size-of-serialized-nodes-in-nodeToStri.patch
Description: Binary data


Re: automating RangeTblEntry node support

2023-12-06 Thread Matthias van de Meent
On Wed, 6 Dec 2023 at 21:02, Peter Eisentraut  wrote:
>
> I have been looking into what it would take to get rid of the
> custom_read_write and custom_query_jumble for the RangeTblEntry node
> type.  This is one of the larger and more complex exceptions left.
> [...]
> Now one could probably rightfully complain that having all these unused
> fields dumped would make the RangeTblEntry serialization bigger.  I'm
> not sure how big of a problem that actually is, considering how many
> often-unset fields other node types have.  But it deserves some
> consideration.  I think the best way to work around that would be to
> have a mode that omits fields that have their default value (zero).
> This would be more generally useful; for example Query also has a bunch
> of fields that are not often set.  I think this would be pretty easy to
> implement, for example like

Actually, I've worked on this last weekend, and got some good results.
It did need some fine-tuning and field annotations, but got raw
nodeToString sizes down 50%+ for the pg_rewrite table's ev_action
column, and compressed-with-pglz size of pg_rewrite total down 30%+.

> #define WRITE_INT_FIELD(fldname) \
>     if (full_mode || node->fldname) \
>         appendStringInfo(str, " :" CppAsString(fldname) " %d", node->fldname)
>
> There is also the discussion over at [0] about larger redesigns of the
> node serialization format.  I'm also interested in that, but here I'm
> mainly trying to remove more special cases to make that kind of work
> easier in the future.
>
> Any thoughts about the direction?

I've created a new thread [0] with my patch. It actually didn't need
_that_ many manual changes - most of it was just updating the
gen_node_support.pl code generation, and making the macros do a good
job.

In general I'm all for reducing special cases, so +1 on the idea. I'll
have to check the specifics of the patches at a later point in time.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 11:28, Sehrope Sarkuni wrote:

Big +1 to this overall feature.


cool!

Regarding the defaults for the output, I think JSON lines (rather than a 
JSON array of objects) would be preferred. It's more natural to combine 
them and generate that type of data on the fly rather than forcing 
aggregation into a single object.


So that is +2 (Sehrope and me) for the status quo (JSON lines), and +2 
(Andrew and Davin) for defaulting to json arrays. Anyone else want to 
weigh in on that issue?


Couple more features / use cases come to mind as well. Even if they're 
not part of a first round of this feature I think it'd be helpful to 
document them now as it might give some ideas for what does make that 
first cut:


1. Outputting a top level JSON object without the additional column 
keys. IIUC, the top level keys are always the column names. A common use 
case would be a single json/jsonb column that is already formatted 
exactly as the user would like for output. Rather than enveloping it in 
an object with a dedicated key, it would be nice to be able to output it 
directly. This would allow non-object results to be outputted as well 
(e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is 
structured, I think this would play nice with the JSON lines v.s. array 
concept.


COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM 
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON, 
SOME_OPTION_TO_NOT_ENVELOPE)

{"foo":1}
{"foo":2}
{"foo":3}


Your example does not match what you describe, or do I misunderstand? I 
thought your goal was to eliminate the repeated "foo" from each row...


2. An option to ignore null fields so they are excluded from the output. 
This would not be a default but would allow shrinking the total size of 
the output data in many situations. This would be recursive to allow 
nested objects to be shrunk down (not just the top level). This might be 
worthwhile as a standalone JSON function though handling it during 
output would be more efficient as it'd only be read once.


COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM 
generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON, 
SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)

{}
{"foo":2}
{"foo":3}


clear enough I think

3. Reverse of #2 when copying data in to allow defaulting missing fields 
to NULL.


good to record the ask, but applies to a different feature (COPY FROM 
instead of COPY TO).


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Sehrope Sarkuni
On Wed, Dec 6, 2023 at 4:03 PM Andrew Dunstan  wrote:

> > The output size difference does say that maybe we should pay some
> > attention to the nearby request to not always label every field.
> > Perhaps there should be an option for each row to transform to
> > a JSON array rather than an object?
>
> I doubt it. People who want this are likely to want pretty much what
> this patch is providing, not something they would have to transform in
> order to get it. If they want space-efficient data they won't really be
> wanting JSON. Maybe they want Protocol Buffers or something in that vein.
>

For arrays v.s. objects, it's not just about data size. There are plenty of
situations where a JSON array is superior to an object (e.g. duplicate
column names). Lines of JSON arrays of strings is pretty much CSV with JSON
escaping rules and a pair of wrapping brackets. It's common for tabular
data in node.js environments as you don't need a separate CSV parser.

Each one has its place and a default of the row_to_json(...) representation
of the row still makes sense. But if the user has the option of outputting
a single json/jsonb field for each row without an object or array wrapper,
then it's possible to support all of these use cases as the user can
explicitly pick whatever envelope makes sense:

-- Lines of JSON arrays:
COPY (SELECT json_build_array('test-' || a, b) FROM generate_series(1, 3)
a, generate_series(5,6) b) TO STDOUT WITH (FORMAT JSON,
SOME_OPTION_TO_DISABLE_ENVELOPE);
["test-1", 5]
["test-2", 5]
["test-3", 5]
["test-1", 6]
["test-2", 6]
["test-3", 6]

-- Lines of JSON strings:
COPY (SELECT to_json('test-' || x) FROM generate_series(1, 5) x) TO STDOUT
WITH (FORMAT JSON, SOME_OPTION_TO_DISABLE_ENVELOPE);
"test-1"
"test-2"
"test-3"
"test-4"
"test-5"

I'm not sure how I feel about the behavior being automatic if it's a single
top level json / jsonb field rather than requiring the explicit option.
It's probably what a user would want but it also feels odd to change the
output wrapper automatically based on the fields in the response. If it is
automatic and the user wants the additional envelope, the option always
exists to wrap it further in another: json_build_object('some_field",
my_field_i_want_wrapped)

The duplicate field names would be a good test case too. I haven't gone
through this patch but I'm guessing it doesn't filter out duplicates so the
behavior would match up with row_to_json(...), i.e. duplicates are
preserved:

=> SELECT row_to_json(t.*) FROM (SELECT 1 AS a, 2 AS a) t;
  row_to_json
---
 {"a":1,"a":2}

If so, that's a good test case to add as however that's handled should be
deterministic.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 03:20:46PM -0500, Tom Lane wrote:
> If Nathan's perf results hold up elsewhere, it seems like some
> micro-optimization around the text-pushing (appendStringInfoString)
> might be more useful than caching.  The 7% spent in cache lookups
> could be worth going after later, but it's not the top of the list.

Agreed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Sehrope Sarkuni
On Wed, Dec 6, 2023 at 4:29 PM Joe Conway  wrote:

> > 1. Outputting a top level JSON object without the additional column
> > keys. IIUC, the top level keys are always the column names. A common use
> > case would be a single json/jsonb column that is already formatted
> > exactly as the user would like for output. Rather than enveloping it in
> > an object with a dedicated key, it would be nice to be able to output it
> > directly. This would allow non-object results to be outputted as well
> > (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
> > structured, I think this would play nice with the JSON lines v.s. array
> > concept.
> >
> > COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> > generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> > SOME_OPTION_TO_NOT_ENVELOPE)
> > {"foo":1}
> > {"foo":2}
> > {"foo":3}
>
> Your example does not match what you describe, or do I misunderstand? I
> thought your goal was to eliminate the repeated "foo" from each row...
>

The "foo" in this case is explicit as I'm adding it when building the
object. What I was trying to show was not adding an additional object
wrapper / envelope.

So each row is:

{"foo":1}

Rather than:

{"json_build_object":{"foo":1}}

If each row has exactly one json / jsonb field, then the user has already
indicated the format for each row.

That same mechanism can be used to remove the "foo" entirely via a
json/jsonb array.

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 16:42, Sehrope Sarkuni wrote:
On Wed, Dec 6, 2023 at 4:29 PM Joe Conway > wrote:


 > 1. Outputting a top level JSON object without the additional column
 > keys. IIUC, the top level keys are always the column names. A
common use
 > case would be a single json/jsonb column that is already formatted
 > exactly as the user would like for output. Rather than enveloping
it in
 > an object with a dedicated key, it would be nice to be able to
output it
 > directly. This would allow non-object results to be outputted as
well
 > (e.g., lines of JSON arrays, numbers, or strings). Due to how
JSON is
 > structured, I think this would play nice with the JSON lines v.s.
array
 > concept.
 >
 > COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
 > generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
 > SOME_OPTION_TO_NOT_ENVELOPE)
 > {"foo":1}
 > {"foo":2}
 > {"foo":3}

Your example does not match what you describe, or do I misunderstand? I
thought your goal was to eliminate the repeated "foo" from each row...


The "foo" in this case is explicit as I'm adding it when building the 
object. What I was trying to show was not adding an additional object 
wrapper / envelope.


So each row is:

{"foo":1}

Rather than:

{"json_build_object":{"foo":1}}


I am still getting confused ;-)

Let's focus on the current proposed patch with a "minimum required 
feature set".


Right now the default behavior is "JSON lines":
8<---
COPY (SELECT x.i, 'val' || x.i as v FROM
  generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON);
{"i":1,"v":"val1"}
{"i":2,"v":"val2"}
{"i":3,"v":"val3"}
8<---

and the other, non-default option is "JSON array":
8<---
COPY (SELECT x.i, 'val' || x.i as v FROM
  generate_series(1, 3) x(i))
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
 {"i":1,"v":"val1"}
,{"i":2,"v":"val2"}
,{"i":3,"v":"val3"}
]
8<---

So the questions are:
1. Do those two formats work for the initial implementation?
2. Is the default correct or should it be switched
   e.g. rather than specifying FORCE_ARRAY to get an
   array, something like FORCE_NO_ARRAY to get JSON lines
   and the JSON array is default?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 3:38 PM Joe Conway  wrote:

> So the questions are:
> 1. Do those two formats work for the initial implementation?
>

Yes.  We provide a stream-oriented format and one atomic-import format.

2. Is the default correct or should it be switched
> e.g. rather than specifying FORCE_ARRAY to get an
> array, something like FORCE_NO_ARRAY to get JSON lines
> and the JSON array is default?
>
>
No default?

Require explicit of a sub-format when the main format is JSON.

JSON_OBJECT_ROWS
JSON_ARRAY_OF_OBJECTS

For a future compact array-structured-composites sub-format:
JSON_ARRAY_OF_ARRAYS
JSON_ARRAY_ROWS

David J.


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON 
document (the ARRAY case) or a series of JSON documents, one per row 
(the LINES case).


"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
 Type: CopyOut response
 Length: 13
 Format: Text (0)
 Columns: 3
Format: Text (0)
PostgreSQL
 Type: Copy data
 Length: 6
 Copy data: 5b0a
PostgreSQL
 Type: Copy data
 Length: 76
 Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?



Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.


Hmm, maybe the real problem here is that Columns do not equal "3" for 
the json mode case -- that should really say "1" I think, because the 
row is not represented as 3 columns but rather 1 json object.


Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more 
"rows" than actual output rows (the "[" and the "]"), but maybe those 
are less likely to cause some hazard?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Clean up some signal usage mainly related to Windows

2023-12-06 Thread Nathan Bossart
On Wed, Dec 06, 2023 at 11:30:02AM -0600, Nathan Bossart wrote:
> On Wed, Dec 06, 2023 at 06:27:04PM +0100, Peter Eisentraut wrote:
>> Makes sense.  Can you commit that?
> 
> Yes, I will do so shortly.

Committed.  Apologies for the delay.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 4:09 PM Joe Conway  wrote:

> On 12/6/23 14:47, Joe Conway wrote:
> > On 12/6/23 13:59, Daniel Verite wrote:
> >>  Andrew Dunstan wrote:
> >>
> >>> IMNSHO, we should produce either a single JSON
> >>> document (the ARRAY case) or a series of JSON documents, one per row
> >>> (the LINES case).
> >>
> >> "COPY Operations" in the doc says:
> >>
> >> " The backend sends a CopyOutResponse message to the frontend, followed
> >> by zero or more CopyData messages (always one per row), followed by
> >> CopyDone".
> >>
> >> In the ARRAY case, the first messages with the copyjsontest
> >> regression test look like this (tshark output):
> >>
> >> PostgreSQL
> >>  Type: CopyOut response
> >>  Length: 13
> >>  Format: Text (0)
> >>  Columns: 3
> >>  Format: Text (0)
>
> > Anything receiving this and looking for a json array should know how to
> > assemble the data correctly despite the extra CopyData messages.
>
> Hmm, maybe the real problem here is that Columns do not equal "3" for
> the json mode case -- that should really say "1" I think, because the
> row is not represented as 3 columns but rather 1 json object.
>
> Does that sound correct?
>
> Assuming yes, there is still maybe an issue that there are two more
> "rows" than actual output rows (the "[" and the "]"), but maybe those
> are less likely to cause some hazard?
>
>
What is the limitation, if any, of introducing new type codes for these?  n
= 2..N for the different variants?  Or even -1 for "raw text"?  And
document that columns and structural rows need to be determined
out-of-band.  Continuing to use 1 (text) for this non-csv data seems like a
hack even if we can technically make it function.  The semantics,
especially for the array case, are completely discarded or wrong.

David J.


Re: PATCH: Add REINDEX tag to event triggers

2023-12-06 Thread Michael Paquier
On Wed, Dec 06, 2023 at 10:00:01AM +0300, Alexander Lakhin wrote:
> I agree with it. I had worried a bit about ReindexRelationConcurrently()
> becoming twofold for callers (it can leave the snapshot or pop it), but I
> couldn't find a way to hide this twofoldness inside without adding more
> complexity. On the other hand, ReindexRelationConcurrently() now satisfies
> EnsurePortalSnapshotExists() in all cases.

Thanks, applied that with a few more tests, covering a bit more than
the code path you've reported with a failure.

I was wondering if this should be backpatched, actually, but could not
make a case for it as we've never needed a snapshot after a reindex
until now, AFAIK.
--
Michael


signature.asc
Description: PGP signature


Re: RFC: Logging plan of the running query

2023-12-06 Thread Rafael Thofehrn Castro
Hello hackers,

Last Saturday I submitted a patch to the pgsql-hackers list with the title
"Proposal: In-flight explain logging", proposing a feature very
similar to the one being worked on in this thread. I should have done a
better search in the commitfest before implementing something from scratch.

So, as recommended by Ashutosh, I am sending an incremental patch containing
an additional feature I personally think we should include: logging the plan
with instrumentation details if enabled.

When targeting a query with instrumentation PG should log the complete
EXPLAIN ANALYZE plan with current row count and, if enabled, timing for each
node. This gives the user not only the ability to see what the plan is
but also what was executed so far, which is super useful when
troubleshooting queries that never finish.

Considering that the query is in progress the output will include the
statement (never executed) for nodes that weren't touched yet (or may
never be). This feature is already present in the current ExplainNode
implementation.

I added a new statement (in progress) for nodes currently being executed,
i.e., InstrStartNode was called and the clock is ticking there.

Back-reading this thread I saw the discussion about extending
pg_log_query_plan
to parallel workers or not. I added it in my incremental patch as the new
capability of logging instrumentation makes it useful to be able to see
what parallel workers are currently doing.

# DEMONSTRATION:

postgres=# select pid, backend_type,pg_log_query_plan(pid)
postgres=# from pg_stat_activity
postgres=# where (backend_type = 'client backend' and pid !=
pg_backend_pid())
postgres=#or backend_type = 'parallel worker';
  pid  |  backend_type   | pg_log_query_plan
---+-+---
 33833 | client backend  | t
 47202 | parallel worker | t
 47203 | parallel worker | t
(3 rows)

2023-12-06 23:14:41.756 UTC [33833] LOG:  query plan running on backend
with PID 33833 is:
Query Text: explain (analyze, buffers) select *
from t2 a
inner join t1 b on a.c1=b.c1
inner join t1 c on a.c1=c.c1
inner join t1 d on a.c1=d.c1
inner join t1 e on a.c1=e.c1;
Gather  (cost=70894.63..202643.27 rows=100 width=20) (never executed)
 Output: a.c1, b.c1, c.c1, d.c1, e.c1
 Workers Planned: 2
 Workers Launched: 2
 ->  Parallel Hash Join  (cost=69894.63..101643.27 rows=416667 width=20)
(never executed)
   Output: a.c1, b.c1, c.c1, d.c1, e.c1
   Hash Cond: (a.c1 = e.c1)
   ->  Parallel Hash Join  (cost=54466.62..77218.65 rows=416667
width=16) (never executed)
 Output: a.c1, b.c1, c.c1, d.c1
 Hash Cond: (a.c1 = c.c1)
 ->  Parallel Hash Join  (cost=15428.00..29997.42 rows=416667
width=8) (actual time=2500.914..2625.922 rows=250412 loops=1) (in progress)
   Output: a.c1, b.c1
   Inner Unique: true
   Hash Cond: (b.c1 = a.c1)
   Buffers: shared hit=755 read=2175, temp read=1473
written=1860
   ->  Parallel Seq Scan on public.t1 b
 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..20.904
rows=331492 loops=1)
 Output: b.c1
 Buffers: shared hit=602 read=865
   ->  Parallel Hash  (cost=8591.67..8591.67 rows=416667
width=4) (actual time=1745.107..1745.107 rows=330638 loops=1)
 Output: a.c1
 Buffers: shared hit=153 read=1310, temp written=868
 ->  Parallel Seq Scan on public.t2 a
 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.042..27.695
rows=330638 loops=1)
   Output: a.c1
   Buffers: shared hit=153 read=1310
 ->  Parallel Hash  (cost=32202.28..32202.28 rows=416667
width=8) (actual time=2450.489..2450.489 rows=407941 loops=1)
   Output: c.c1, d.c1
   Buffers: shared hit=1141 read=1833, temp read=1938
written=2836
   ->  Parallel Hash Join  (cost=15428.00..32202.28
rows=416667 width=8) (actual time=1323.422..1575.245 rows=407941 loops=1)
 Output: c.c1, d.c1
 Hash Cond: (c.c1 = d.c1)
 Buffers: shared hit=1141 read=1833, temp read=1938
written=1864
 ->  Parallel Seq Scan on public.t1 c
 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.026..22.223
rows=336238 loops=1)
   Output: c.c1
   Buffers: shared hit=590 read=898
 ->  Parallel Hash  (cost=8591.67..8591.67
rows=416667 width=4) (actual time=653.306..653.306 rows=335836 loops=1)
   Output: d.c1
   Buffers: shared hit=551 read=935, temp
written=872
   ->  Parallel Seq Scan on public.t1 d
 (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.022..23.12

Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 4:28 PM David G. Johnston 
wrote:

> On Wed, Dec 6, 2023 at 4:09 PM Joe Conway  wrote:
>
>> On 12/6/23 14:47, Joe Conway wrote:
>> > On 12/6/23 13:59, Daniel Verite wrote:
>> >>  Andrew Dunstan wrote:
>> >>
>> >>> IMNSHO, we should produce either a single JSON
>> >>> document (the ARRAY case) or a series of JSON documents, one per row
>> >>> (the LINES case).
>> >>
>> >> "COPY Operations" in the doc says:
>> >>
>> >> " The backend sends a CopyOutResponse message to the frontend, followed
>> >> by zero or more CopyData messages (always one per row), followed by
>> >> CopyDone".
>> >>
>> >> In the ARRAY case, the first messages with the copyjsontest
>> >> regression test look like this (tshark output):
>> >>
>> >> PostgreSQL
>> >>  Type: CopyOut response
>> >>  Length: 13
>> >>  Format: Text (0)
>> >>  Columns: 3
>> >>  Format: Text (0)
>>
>> > Anything receiving this and looking for a json array should know how to
>> > assemble the data correctly despite the extra CopyData messages.
>>
>> Hmm, maybe the real problem here is that Columns do not equal "3" for
>> the json mode case -- that should really say "1" I think, because the
>> row is not represented as 3 columns but rather 1 json object.
>>
>> Does that sound correct?
>>
>> Assuming yes, there is still maybe an issue that there are two more
>> "rows" than actual output rows (the "[" and the "]"), but maybe those
>> are less likely to cause some hazard?
>>
>>
> What is the limitation, if any, of introducing new type codes for these.
> n = 2..N for the different variants?  Or even -1 for "raw text"?  And
> document that columns and structural rows need to be determined
> out-of-band.  Continuing to use 1 (text) for this non-csv data seems like a
> hack even if we can technically make it function.  The semantics,
> especially for the array case, are completely discarded or wrong.
>
>
Also, it seems like this answer would be easier to make if we implement
COPY FROM now since how is the server supposed to deal with decomposing
this data into tables without accurate type information?  I don't see
implementing only half of the feature being a good idea.  I've had much
more desire for FROM compared to TO personally.

David J.


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 18:28, David G. Johnston wrote:
On Wed, Dec 6, 2023 at 4:09 PM Joe Conway > wrote:


On 12/6/23 14:47, Joe Conway wrote:
 > On 12/6/23 13:59, Daniel Verite wrote:
 >>      Andrew Dunstan wrote:
 >>
 >>> IMNSHO, we should produce either a single JSON
 >>> document (the ARRAY case) or a series of JSON documents, one
per row
 >>> (the LINES case).
 >>
 >> "COPY Operations" in the doc says:
 >>
 >> " The backend sends a CopyOutResponse message to the frontend,
followed
 >>     by zero or more CopyData messages (always one per row),
followed by
 >>     CopyDone".
 >>
 >> In the ARRAY case, the first messages with the copyjsontest
 >> regression test look like this (tshark output):
 >>
 >> PostgreSQL
 >>      Type: CopyOut response
 >>      Length: 13
 >>      Format: Text (0)
 >>      Columns: 3
 >>      Format: Text (0)

 > Anything receiving this and looking for a json array should know
how to
 > assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?


What is the limitation, if any, of introducing new type codes for 
these.  n = 2..N for the different variants?  Or even -1 for "raw 
text"?  And document that columns and structural rows need to be 
determined out-of-band.  Continuing to use 1 (text) for this non-csv 
data seems like a hack even if we can technically make it function.  The 
semantics, especially for the array case, are completely discarded or wrong.


I am not following you here. SendCopyBegin looks like this currently:

8<
SendCopyBegin(CopyToState cstate)
{
StringInfoData buf;
int natts = list_length(cstate->attnumlist);
int16   format = (cstate->opts.binary ? 1 : 0);
int i;

pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format);  /* overall format */
pq_sendint16(&buf, natts);
for (i = 0; i < natts; i++)
pq_sendint16(&buf, format); /* per-column formats */
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
}
8<

The "1" says whether we are in binary mode or not. JSON mode will never be 
sending in binary in the current implementation at least. And it always 
aggregates all the columns as one json object. So the correct answer is 
(I think):

8<
*** SendCopyBegin(CopyToState cstate)
*** 146,154 

pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format);  /* overall format */
!   pq_sendint16(&buf, natts);
!   for (i = 0; i < natts; i++)
!   pq_sendint16(&buf, format); /* per-column formats */
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
  }
--- 150,169 

pq_beginmessage(&buf, PqMsg_CopyOutResponse);
pq_sendbyte(&buf, format);  /* overall format */
!   if (!cstate->opts.json_mode)
!   {
!   pq_sendint16(&buf, natts);
!   for (i = 0; i < natts; i++)
!   pq_sendint16(&buf, format); /* per-column formats */
!   }
!   else
!   {
!   /*
!* JSON mode is always one non-binary column
!*/
!   pq_sendint16(&buf, 1);
!   pq_sendint16(&buf, 0);
!   }
pq_endmessage(&buf);
cstate->copy_dest = COPY_FRONTEND;
  }
8<

That still leaves the need to fix the documentation:

" The backend sends a CopyOutResponse message to the frontend, followed
   by zero or more CopyData messages (always one per row), followed by
   CopyDone"

probably "always one per row" would be changed to note that json array 
format outputs two extra rows for the start/end bracket.


In fact, as written the patch does this:
8<
COPY (SELECT x.i, 'val' || x.i as v FROM
  generate_series(1, 3) x(i) WHERE false)
TO STDOUT WITH (FORMAT JSON, FORCE_ARRAY);
[
]
8<

Not sure if that is a problem or not.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 18:38, David G. Johnston wrote:
On Wed, Dec 6, 2023 at 4:28 PM David G. Johnston 
mailto:david.g.johns...@gmail.com>> wrote:


On Wed, Dec 6, 2023 at 4:09 PM Joe Conway mailto:m...@joeconway.com>> wrote:

On 12/6/23 14:47, Joe Conway wrote:
 > On 12/6/23 13:59, Daniel Verite wrote:
 >>      Andrew Dunstan wrote:
 >>
 >>> IMNSHO, we should produce either a single JSON
 >>> document (the ARRAY case) or a series of JSON documents,
one per row
 >>> (the LINES case).
 >>
 >> "COPY Operations" in the doc says:
 >>
 >> " The backend sends a CopyOutResponse message to the
frontend, followed
 >>     by zero or more CopyData messages (always one per row),
followed by
 >>     CopyDone".
 >>
 >> In the ARRAY case, the first messages with the copyjsontest
 >> regression test look like this (tshark output):
 >>
 >> PostgreSQL
 >>      Type: CopyOut response
 >>      Length: 13
 >>      Format: Text (0)
 >>      Columns: 3
 >>      Format: Text (0)

 > Anything receiving this and looking for a json array should
know how to
 > assemble the data correctly despite the extra CopyData messages.

Hmm, maybe the real problem here is that Columns do not equal
"3" for
the json mode case -- that should really say "1" I think,
because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" than actual output rows (the "[" and the "]"), but maybe
those
are less likely to cause some hazard?


What is the limitation, if any, of introducing new type codes for
these.  n = 2..N for the different variants?  Or even -1 for "raw
text"?  And document that columns and structural rows need to be
determined out-of-band.  Continuing to use 1 (text) for this non-csv
data seems like a hack even if we can technically make it function. 
The semantics, especially for the array case, are completely

discarded or wrong.

Also, it seems like this answer would be easier to make if we implement 
COPY FROM now since how is the server supposed to deal with decomposing 
this data into tables without accurate type information?  I don't see 
implementing only half of the feature being a good idea.  I've had much 
more desire for FROM compared to TO personally.


Several people have weighed in on the side of getting COPY TO done by 
itself first. Given how long this discussion has already become for a 
relatively small and simple feature, I am a big fan of not expanding the 
scope now.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





pg16 && GSSAPI && Heimdal/Macos

2023-12-06 Thread kovert
Earlier this year, there was a thread about GSSAPI for delegated
credentials on various operating systems; ultimately, the conclusion was
that Heimdal had atrophied enough that you were comfortable not supporting
it anymore as a GSSAPI library.

Thread:
https://www.postgresql.org/message-id/flat/ZDFTailRZzyGdbXl%40tamriel.snowman.net#7b4b7354bc3ea060fb26d51565f0ad67

In https://www.postgresql.org/message-id/3598083.1680976022%40sss.pgh.pa.us,
Tom Lane said:

 > I share your feeling that we could probably blow off Apple's built-in
 > GSSAPI.  MacPorts offers both Heimdal and kerberos5, and I imagine
 > Homebrew has at least one of them, so Mac people could easily get
 > hold of newer implementations.

I wanted to follow up on the decision to blow off Apple's built-in
GSSAPI.  Years back, for reasons I never found, Apple switched from MIT
to Heimdal and have been maintaining their own version of it.  I'm not
clear how well they maintain it but they have enhanced it.

One of the things that Apple put in was a different centralized
credentials cache system (caches are named in the form "API:uuid").  This
isn't in Heimdal nor in MIT, so typical kerberos tickets issued by the
Apple-provided Kerberos libraries are not accessible via other kerberos
versions provided by homebrew/macports/etc. (netbsd pkgsrc on macos can
be told to use the system libraries, which is what I do).  Installing a
parallel version makes the client experience awful, since it means having
to manage two sets of tickets and ticket caches, and which one gets used
varies depending on which libraries a program was linked against.

As you may have surmised, I use a mac as a client and use gssapi pretty
heavily to interact with numerous postgresql databases.  This has stopped
me from upgrading my client side to 16.  I'm wondering if there'd be any
willingness to reconsider heimdal support under some circumstances?

thanks,
-Todd




RE: [CAUTION!! freemail] Re: Partial aggregates pushdown

2023-12-06 Thread fujii.y...@df.mitsubishielectric.co.jp
Hi Mr.Haas.

> -Original Message-
> From: Robert Haas 
> Sent: Wednesday, December 6, 2023 10:25 PM
> On Wed, Dec 6, 2023 at 3:41 AM fujii.y...@df.mitsubishielectric.co.jp
>  wrote:
> > Are you concerned about the hassle and potential human errors of
> > manually adding new partial aggregation functions, rather than the catalog 
> > becoming bloated?
> 
> I'm concerned about both.
Understood. Thank you for your response.

> > The process of creating partial aggregation functions from aggregation
> > functions can be automated, so I believe this issue can be resolved.
> > However, automating it may increase the size of the patch even more, so 
> > overall, approach #2 might be better.
> > To implement approach #2, it would be necessary to investigate how much 
> > additional code is required.
> 
> Yes. Unfortunately I fear that there is quite a lot of work left to do here 
> in order to produce a committable feature. To me it
> seems necessary to conduct an investigation of approach #2. If the result of 
> that investigation is that nothing major
> stands in the way of approach #2, then I think we should adopt it, which is 
> more work. In addition, the problems around
> transmitting serialized bytea blobs between machines that can't be assumed to 
> fully trust each other will need to be
> addressed in some way, which seems like it will require a good deal of design 
> work, forming some kind of consensus, and
> then implementation work to follow. In addition to that there may be some 
> small problems that need to be solved at a
> detail level, such as the HAVING issue. I think the last category won't be 
> too hard to sort out, but that still leaves two really
> major areas to address.
Yes, I agree with you. It is clear that further investigation and discussion 
are still needed. 
I would be grateful if we can resolve this issue gradually. I would also like 
to continue the discussion if possible in the future.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation


Re: reindexing an invalid index should not use ERRCODE_INDEX_CORRUPTED

2023-12-06 Thread Noah Misch
On Wed, Dec 06, 2023 at 03:17:12PM +0900, Michael Paquier wrote:
> On Sat, Nov 18, 2023 at 04:32:36PM -0800, Noah Misch wrote:
> > On Sat, Nov 18, 2023 at 03:09:58PM -0800, Andres Freund wrote:
> >> Unfortunately, there is a case of such an sqlstate that's not at all 
> >> indicating
> >> corruption, namely REINDEX CONCURRENTLY when the index is invalid:
> >> 
> >> if (!indexRelation->rd_index->indisvalid)
> >> ereport(WARNING,
> >> (errcode(ERRCODE_INDEX_CORRUPTED),
> >>  errmsg("cannot reindex invalid index 
> >> \"%s.%s\" concurrently, skipping",
> >> 
> >> get_namespace_name(get_rel_namespace(cellOid)),
> >> get_rel_name(cellOid;
> >> 
> >> The only thing required to get to this is an interrupted CREATE INDEX
> >> CONCURRENTLY, which I don't think can be fairly characterized as 
> >> "corruption".
> >> 
> >> ISTM something like ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE would be more
> >> appropriate?
> > 
> > +1, that's a clear improvement.
> 
> The same thing can be said a couple of lines above where the code uses
> ERRCODE_FEATURE_NOT_SUPPORTED but your suggestion of
> ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE would be better.
> 
> Would the attached be OK for you?

Okay.

> > The "cannot" part of the message is also inaccurate, and it's not clear to 
> > me
> > why we have this specific restriction at all.  REINDEX INDEX CONCURRENTLY
> > accepts such indexes, so I doubt it's an implementation gap.
> 
> If you would reword that, what would you change?

I'd do "skipping reindex of invalid index \"%s.%s\"".  If one wanted more,
errhint("Use DROP INDEX or REINDEX INDEX.") would fit.




Re: Make COPY format extendable: Extract COPY TO format implementations

2023-12-06 Thread Michael Paquier
On Wed, Dec 06, 2023 at 10:07:51PM +0800, Junwang Zhao wrote:
> I read the thread[1] you posted and I think Andres's suggestion sounds great.
> 
> Should we extract both *copy to* and *copy from* for the first step, in that
> case we can add the pg_copy_handler catalog smoothly later.
> 
> Attached V4 adds 'extract copy from' and it passed the cirrus ci,
> please take a look.
> 
> I added a hook *copy_from_end* but this might be removed later if not used.
> 
> [1]: 
> https://www.postgresql.org/message-id/20180211211235.5x3jywe5z3lkgcsr%40alap3.anarazel.de

I was looking at the differences between v3 posted by Sutou-san and
v4 from you, seeing that:

+/* Routines for a COPY HANDLER implementation. */
+typedef struct CopyHandlerOps
 {
     /* Called when COPY TO is started. This will send a header. */
-    void        (*start) (CopyToState cstate, TupleDesc tupDesc);
+    void        (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
 
     /* Copy one row for COPY TO. */
-    void        (*one_row) (CopyToState cstate, TupleTableSlot *slot);
+    void        (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
 
     /* Called when COPY TO is ended. This will send a trailer. */
-    void        (*end) (CopyToState cstate);
-} CopyToFormatOps;
+    void        (*copy_to_end) (CopyToState cstate);
+
+    void        (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
+    bool        (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
+                                   Datum *values, bool *nulls);
+    void        (*copy_from_error_callback) (CopyFromState cstate);
+    void        (*copy_from_end) (CopyFromState cstate);
+} CopyHandlerOps;

And we've spent a good deal of time refactoring the copy code so that
the logic behind TO and FROM is split.  Having a set of routines that
groups both does not look like a step in the right direction to me,
and v4 is an attempt at solving two problems, while v3 aims to improve
one case.  It seems to me that each callback portion should be focused
on staying in its own area of the code, aka copyfrom*.c or copyto*.c.
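
Concretely, I'd rather see two independent sets of callbacks, something
shaped like this (the struct names here are invented just to sketch the
idea, reusing the signatures quoted above):

    /* Sketch only: callbacks used by COPY TO, living with copyto.c */
    typedef struct CopyToRoutine
    {
        void        (*copy_to_start) (CopyToState cstate, TupleDesc tupDesc);
        void        (*copy_to_one_row) (CopyToState cstate, TupleTableSlot *slot);
        void        (*copy_to_end) (CopyToState cstate);
    } CopyToRoutine;

    /* Sketch only: callbacks used by COPY FROM, living with copyfrom.c */
    typedef struct CopyFromRoutine
    {
        void        (*copy_from_start) (CopyFromState cstate, TupleDesc tupDesc);
        bool        (*copy_from_next) (CopyFromState cstate, ExprContext *econtext,
                                       Datum *values, bool *nulls);
        void        (*copy_from_error_callback) (CopyFromState cstate);
        void        (*copy_from_end) (CopyFromState cstate);
    } CopyFromRoutine;
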
--
Michael




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 4:45 PM Joe Conway  wrote:

>
> " The backend sends a CopyOutResponse message to the frontend, followed
> by zero or more CopyData messages (always one per row), followed by
> CopyDone"
>
> probably "always one per row" would be changed to note that json array
> format outputs two extra rows for the start/end bracket.
>

Fair, I was ascribing much more semantic meaning to this than it wants.

I don't see any real requirement, given the lack of semantics, to mention
JSON at all.  It is one CopyData per row, regardless of the contents.  We
don't delineate between the header and non-header data in CSV.  It isn't a
protocol concern.
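
(For what it's worth, a receiving client doesn't have to care either way; a
minimal libpq loop -- conn and the table name are assumed, error handling
omitted -- just hands every CopyData chunk to the application, whatever the
format option was:)

    PGresult   *res = PQexec(conn, "COPY mytable TO STDOUT");
    char       *buf;
    int         len;

    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            fwrite(buf, 1, len, stdout);    /* one opaque CopyData chunk */
            PQfreemem(buf);
        }
        PQclear(res);                /* done with the COPY_OUT result */
        res = PQgetResult(conn);     /* final command status after CopyDone */
    }
    PQclear(res);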

But I still cannot shake the belief that using a format code of 1 - which
really could be interpreted as meaning "textual csv" in practice - for this
JSON output is unwise and we should introduce a new integer value for the
new fundamental output format.

David J.


Re: Is WAL_DEBUG related code still relevant today?

2023-12-06 Thread Michael Paquier
On Wed, Dec 06, 2023 at 09:46:09AM -0300, Euler Taveira wrote:
> On Wed, Dec 6, 2023, at 8:27 AM, Peter Eisentraut wrote:
>> This kind of thing could be mostly avoided if we didn't hide all the 
>> WAL_DEBUG behind #ifdefs.
> 
> AFAICS LOCK_DEBUG also hides its GUCs behind #ifdefs. The fact that XLOG_DEBUG
> is a variable but seems like a constant surprises me. I would rename it to
> XLogDebug or xlog_debug.

+1.  Or just wal_debug for greppability.

>> in the normal case.  That way, we don't need to wrap that in #ifdef 
>> WAL_DEBUG, and the compiler can see the disabled code and make sure it 
>> continues to build.
> 
> I didn't check the LOCK_DEBUG code path to make sure it fits in the same
> category as WAL_DEBUG. If it does, maybe it is worth applying the same logic
> there.
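
To make the pattern concrete, what's being suggested is roughly this
(sketch only; wal_debug and EndPos are assumed names here):

    #ifdef WAL_DEBUG
    extern bool wal_debug;          /* real, settable GUC in debug builds */
    #else
    #define wal_debug false         /* block below still compiles, then gets
                                     * optimized away */
    #endif

        if (wal_debug)
            elog(LOG, "INSERT @ %X/%X", LSN_FORMAT_ARGS(EndPos));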

PerformWalRecovery() with its log for RM_XACT_ID is something that
stresses me a bit though because this is in the main redo loop which
is never free.  The same can be said about GenericXLogFinish() because
the extra computation happens while holding a buffer and marking it
dirty.  The ones in xlog.c are free of charge as they are called
outside any critical portions.

This makes me wonder how much we need to care about
trace_recovery_messages, actually, and I've never used it.
--
Michael




Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 19:39, David G. Johnston wrote:
On Wed, Dec 6, 2023 at 4:45 PM Joe Conway wrote:



" The backend sends a CopyOutResponse message to the frontend, followed
     by zero or more CopyData messages (always one per row), followed by
     CopyDone"

probably "always one per row" would be changed to note that json array
format outputs two extra rows for the start/end bracket.


Fair, I was ascribing much more semantic meaning to this than it wants.

I don't see any real requirement, given the lack of semantics, to 
mention JSON at all.  It is one CopyData per row, regardless of the 
contents.  We don't delineate between the header and non-header data in 
CSV.  It isn't a protocol concern.


good point

But I still cannot shake the belief that using a format code of 1 - 
which really could be interpreted as meaning "textual csv" in practice - 
for this JSON output is unwise and we should introduce a new integer 
value for the new fundamental output format.


No, I am pretty sure you still have that wrong. The "1" means binary 
mode. As in

8<--
FORMAT

Selects the data format to be read or written: text, csv (Comma 
Separated Values), or binary. The default is text.

8<--

That is completely separate from text and csv. It literally means to use 
the binary output functions instead of the usual ones:


8<--
if (cstate->opts.binary)
    getTypeBinaryOutputInfo(attr->atttypid,
                            &out_func_oid,
                            &isvarlena);
else
    getTypeOutputInfo(attr->atttypid,
                      &out_func_oid,
                      &isvarlena);
8<--

Both "text" and "csv" mode use are non-binary output formats. I believe 
the JSON output format is also non-binary.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 5:57 PM Joe Conway  wrote:

> On 12/6/23 19:39, David G. Johnston wrote:
> > On Wed, Dec 6, 2023 at 4:45 PM Joe Conway wrote:
>
> > But I still cannot shake the belief that using a format code of 1 -
> > which really could be interpreted as meaning "textual csv" in practice -
> > for this JSON output is unwise and we should introduce a new integer
> > value for the new fundamental output format.
>
> No, I am pretty sure you still have that wrong. The "1" means binary
> mode


Ok.  I made the same typo twice, I did mean to write 0 instead of 1.  But
the point that we should introduce a 2 still stands.  The new code would
mean: use text output functions, but there is no inherent tabular
structure in the underlying contents.  Instead, the copy format is JSON,
the output layout depends on the json options in the COPY command, and
there really shouldn't be any attempt to turn the contents directly into
a tabular data structure the way you presently do with the CSV data under
format 0.  Ignore the column count and column formats, as they are fixed
or non-existent.
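
For reference, the fields of CopyOutResponse (as I read the protocol
documentation) are:

    Byte1('H')   message type (CopyOutResponse)
    Int32        message length
    Int8         overall copy format: 0 = text, 1 = binary
    Int16        number of columns
    Int16[]      per-column format codes (all 0 when the overall format is 0)

so what I'm proposing amounts to allowing a new value in that overall
format field.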

David J.


Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 18:09, Joe Conway wrote:

On 12/6/23 14:47, Joe Conway wrote:

On 12/6/23 13:59, Daniel Verite wrote:

Andrew Dunstan wrote:

IMNSHO, we should produce either a single JSON 
document (the ARRAY case) or a series of JSON documents, one per row 
(the LINES case).


"COPY Operations" in the doc says:

" The backend sends a CopyOutResponse message to the frontend, followed
by zero or more CopyData messages (always one per row), followed by
CopyDone".

In the ARRAY case, the first messages with the copyjsontest
regression test look like this (tshark output):

PostgreSQL
 Type: CopyOut response
 Length: 13
 Format: Text (0)
 Columns: 3
Format: Text (0)
PostgreSQL
 Type: Copy data
 Length: 6
 Copy data: 5b0a
PostgreSQL
 Type: Copy data
 Length: 76
 Copy data:
207b226964223a312c226631223a226c696e652077697468205c2220696e2069743a2031…

The first Copy data message with contents "5b0a" does not qualify
as a row of data with 3 columns as advertised in the CopyOut
message. Isn't that a problem?



Is it a real problem, or just a bit of documentation change that I missed?

Anything receiving this and looking for a json array should know how to
assemble the data correctly despite the extra CopyData messages.


Hmm, maybe the real problem here is that Columns do not equal "3" for
the json mode case -- that should really say "1" I think, because the
row is not represented as 3 columns but rather 1 json object.

Does that sound correct?

Assuming yes, there is still maybe an issue that there are two more
"rows" that actual output rows (the "[" and the "]"), but maybe those
are less likely to cause some hazard?



The attached should fix the CopyOut response to say one column. I.e. it 
ought to look something like:


PostgreSQL
 Type: CopyOut response
 Length: 13
 Format: Text (0)
 Columns: 1
 Format: Text (0)
PostgreSQL
 Type: Copy data
 Length: 6
 Copy data: 5b0a
PostgreSQL
 Type: Copy data
 Length: 76
 Copy data: [...]


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18ecc69..8915fb3 100644
*** a/doc/src/sgml/ref/copy.sgml
--- b/doc/src/sgml/ref/copy.sgml
*** COPY { ta
*** 43,48 
--- 43,49 
  FORCE_QUOTE { ( column_name [, ...] ) | * }
  FORCE_NOT_NULL { ( column_name [, ...] ) | * }
  FORCE_NULL { ( column_name [, ...] ) | * }
+ FORCE_ARRAY [ boolean ]
  ENCODING 'encoding_name'
  
   
*** COPY { ta
*** 206,214 
--- 207,220 
Selects the data format to be read or written:
text,
csv (Comma Separated Values),
+   json (JavaScript Object Notation),
or binary.
The default is text.
   
+  
+   The json option is allowed only in
+   COPY TO.
+  
  
 
  
*** COPY { ta
*** 372,377 
--- 378,396 
   
  
 
+ 
+
+ FORCE_ARRAY
+ 
+  
+   Force output of square brackets as array decorations at the beginning
+   and end of output, and commas between the rows. It is allowed only in
+   COPY TO, and only when using
+   JSON format. The default is
+   false.
+  
+ 
+
  
 
  ENCODING
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfad47b..23b570f 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*** ProcessCopyOptions(ParseState *pstate,
*** 419,424 
--- 419,425 
  	bool		format_specified = false;
  	bool		freeze_specified = false;
  	bool		header_specified = false;
+ 	bool		force_array_specified = false;
  	ListCell   *option;
  
  	/* Support external use for option sanity checking */
*** ProcessCopyOptions(ParseState *pstate,
*** 443,448 
--- 444,451 
   /* default format */ ;
  			else if (strcmp(fmt, "csv") == 0)
  opts_out->csv_mode = true;
+ 			else if (strcmp(fmt, "json") == 0)
+ opts_out->json_mode = true;
  			else if (strcmp(fmt, "binary") == 0)
  opts_out->binary = true;
  			else
*** ProcessCopyOptions(ParseState *pstate,
*** 540,545 
--- 543,555 
  defel->defname),
  		 parser_errposition(pstate, defel->location)));
  		}
+ 		else if (strcmp(defel->defname, "force_array") == 0)
+ 		{
+ 			if (force_array_specified)
+ errorConflictingDefElem(defel, pstate);
+ 			force_array_specified = true;
+ 			opts_out->force_array = defGetBoolean(defel);
+ 		}
  		else if (strcmp(defel->defname, "convert_selectively") == 0)
  		{
  			/*
*** ProcessCopyOptions(ParseState *pstate,
*** 598,603 
--- 608,625 
  (errcode(ERRCODE_SYNTAX_ERROR),
   errmsg("cannot specify DEFAULT in BINARY mode")));
  
+ 	if (opts_out->json_mode)
+ 	{
+ 		if (is_from)
+ 			ereport(ERROR,
+ 	(errcode(ERRCODE_FEATURE_NOT

Re: Emitting JSON to file using COPY TO

2023-12-06 Thread Joe Conway

On 12/6/23 20:09, David G. Johnston wrote:
On Wed, Dec 6, 2023 at 5:57 PM Joe Conway wrote:


On 12/6/23 19:39, David G. Johnston wrote:
 > On Wed, Dec 6, 2023 at 4:45 PM Joe Conway wrote:

 > But I still cannot shake the belief that using a format code of 1 -
 > which really could be interpreted as meaning "textual csv" in
practice -
 > for this JSON output is unwise and we should introduce a new integer
 > value for the new fundamental output format.

No, I am pretty sure you still have that wrong. The "1" means binary
mode


Ok.  I made the same typo twice, I did mean to write 0 instead of 1.


Fair enough.

But the point that we should introduce a 2 still stands.  The new code 
would mean: use text output functions, but there is no inherent tabular 
structure in the underlying contents.  Instead, the copy format is JSON, 
the output layout depends on the json options in the COPY command, and 
there really shouldn't be any attempt to turn the contents directly into 
a tabular data structure the way you presently do with the CSV data under 
format 0.  Ignore the column count and column formats, as they are fixed 
or non-existent.


I think that amounts to a protocol change, which we tend to avoid at all 
costs.


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Emitting JSON to file using COPY TO

2023-12-06 Thread David G. Johnston
On Wed, Dec 6, 2023 at 6:14 PM Joe Conway  wrote:

>
> > But the point that we should introduce a 2 still stands.  The new code
> > would mean: use text output functions but that there is no inherent
> > tabular structure in the underlying contents.  Instead the copy format
> > was JSON and the output layout is dependent upon the json options in the
> > copy command and that there really shouldn't be any attempt to turn the
> > contents directly into a tabular data structure like you presently do
> > with the CSV data under format 0.  Ignore the column count and column
> > formats as they are fixed or non-existent.
>
> I think that amounts to a protocol change, which we tend to avoid at all
> costs.
>
>
I wasn't sure on that point but figured it might be the case.  It is a
value change, not a structural one, which seems like the kind of
modification any living system might allow and be expected to accommodate.  But I
also don't see any known problem with the current change of content
semantics without the format identification change.  Most of the relevant
context ends up out-of-band in the copy command itself.

David J.


Re: reindexing an invalid index should not use ERRCODE_INDEX_CORRUPTED

2023-12-06 Thread Michael Paquier
On Wed, Dec 06, 2023 at 04:33:33PM -0800, Noah Misch wrote:
> On Wed, Dec 06, 2023 at 03:17:12PM +0900, Michael Paquier wrote:
>>> The "cannot" part of the message is also inaccurate, and it's not clear to 
>>> me
>>> why we have this specific restriction at all.  REINDEX INDEX CONCURRENTLY
>>> accepts such indexes, so I doubt it's an implementation gap.
>> 
>> If you would reword that, what would you change?
> 
> I'd do "skipping reindex of invalid index \"%s.%s\"".  If one wanted more,

In line with vacuum.c, that sounds like a good idea in the end.

> errhint("Use DROP INDEX or REINDEX INDEX.") would fit.

I'm OK with this suggestion as well.
--
Michael
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4ee498d985..e56205abd8 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3526,10 +3526,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 	if (!indexRelation->rd_index->indisvalid)
 		ereport(WARNING,
-(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("skipping reindex of invalid index \"%s.%s\"",
 		get_namespace_name(get_rel_namespace(cellOid)),
-		get_rel_name(cellOid))));
+		get_rel_name(cellOid)),
+ errhint("Use DROP INDEX or REINDEX INDEX.")));
 	else if (indexRelation->rd_index->indisexclusion)
 		ereport(WARNING,
 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -3578,10 +3579,11 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 
 		if (!indexRelation->rd_index->indisvalid)
 			ereport(WARNING,
-	(errcode(ERRCODE_INDEX_CORRUPTED),
-	 errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("skipping reindex of invalid index \"%s.%s\"",
 			get_namespace_name(get_rel_namespace(cellOid)),
-			get_rel_name(cellOid))));
+			get_rel_name(cellOid)),
+	 errhint("Use DROP INDEX or REINDEX INDEX.")));
 		else
 		{
 			ReindexIndexInfo *idx;
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index acfd9d1f4f..446cfa678b 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2581,7 +2581,8 @@ DROP INDEX concur_reindex_ind5_ccnew;
 DELETE FROM concur_reindex_tab4 WHERE c1 = 1;
 -- The invalid index is not processed when running REINDEX TABLE.
 REINDEX TABLE CONCURRENTLY concur_reindex_tab4;
-WARNING:  cannot reindex invalid index "public.concur_reindex_ind5" concurrently, skipping
+WARNING:  skipping reindex of invalid index "public.concur_reindex_ind5"
+HINT:  Use DROP INDEX or REINDEX INDEX.
 NOTICE:  table "concur_reindex_tab4" has no indexes that can be reindexed concurrently
 \d concur_reindex_tab4
 Table "public.concur_reindex_tab4"



