Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2024-01-16 Thread Anthonin Bonnefoy
Hi,

Thanks for the review and comments.

> One thing that I think should be added for completeness though is the
> ability to deallocate the prepared statement using
> PQsendClosePrepared. Other than that the changes look great.
Good point, I've added the \close command.

> Also a tiny nitpick: stmt_name should be replaced with STMT_NAME in
> this line of the help message.
Fixed


On Sat, Jan 13, 2024 at 3:37 PM Jelte Fennema-Nio  wrote:
>
> On Thu, 2 Nov 2023 at 10:52, Anthonin Bonnefoy
>  wrote:
> > The main goal is to provide more ways to test extended protocol in
> > regression tests
> > similarly to what \bind is doing.
>
> I think this is a great addition. One thing that I think should be
> added for completeness though is the ability to deallocate the
> prepared statement using PQsendClosePrepared. Other than that the
> changes look great.
>
> Also a tiny nitpick: stmt_name should be replaced with STMT_NAME in
> this line of the help message.
>
> > +   HELP0("  \\bindx stmt_name [PARAM]...\n"


v2-0001-psql-Add-support-for-prepared-stmt-with-extended-.patch
Description: Binary data


Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2024-01-17 Thread Anthonin Bonnefoy
> I do realize the same is true for plain \bind, but it seems
> like a bug there too.

The unscanned \bind parameters are discarded later in the
HandleSlashCmds function. So adding the ignore_slash_options() for
inactive branches scans and discards them earlier. I will add it to
match what's done in the other commands, but I don't think it's
testable as the behaviour is the same, unless I'm missing something.

I did add the \bind, \bindx, \close and \parse to the inactive branch
tests to complete the list.

> One more usability thing. I think \parse and \close should not require
> a \g to send the message. You can do that by returning PSQL_CMD_SEND
> instead of PSQL_CMD_SKIP_LINE.

Changed.

> I think the examples for \bindx and \close
> should use \parse instead of PREPARE

Done. I had to rely on manual PREPARE for my first tests and it leaked
into the docs.


v3-0001-psql-Add-support-for-prepared-stmt-with-extended-.patch
Description: Binary data


Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2024-01-18 Thread Anthonin Bonnefoy
> Hmm.  So it does not lead to any user-visible changes, right?

From what I can tell, there's no change in the behaviour. All paths
would eventually go through HandleSlashCmds's cleaning logic. This is
also mentioned in ignore_slash_options's comment.

* Read and discard "normal" slash command options.
*
* This should be used for inactive-branch processing of any slash command
* that eats one or more OT_NORMAL, OT_SQLID, or OT_SQLIDHACK parameters.
* We don't need to worry about exactly how many it would eat, since the
* cleanup logic in HandleSlashCmds would silently discard any extras anyway.

> Could you split the bits for \bind into a separate patch, please?
> This requires a separate evaluation, especially if this had better be
> backpatched.

Done. Patch 1 adds ignore_slash_options to bind. Patch 2 adds the new
\bindx, \close and \parse commands.


v4-0002-psql-Add-support-for-prepared-stmt-with-extended-.patch
Description: Binary data


v4-0001-psql-Add-ignore_slash_options-in-bind-s-inactive-.patch
Description: Binary data


Add \syncpipeline command to pgbench

2024-01-18 Thread Anthonin Bonnefoy
Hi,

PQsendSyncMessage() was added in
https://commitfest.postgresql.org/46/4262/. It allows users to add a
Sync message without flushing the buffer.

As a follow-up, this change adds an additional meta-command to
pgbench, \syncpipeline, which will call PQsendSyncMessage(). This will
make it possible to benchmark the impact and improvements of using
PQsendSyncMessage() through pgbench.
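
For illustration, a minimal pgbench script using the new meta-command could
look like this (a sketch; it adds a Sync between the two queries while
keeping everything in the same pipeline):

\startpipeline
SELECT 1;
\syncpipeline
SELECT 2;
\endpipeline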

Regards,
Anthonin


v1-0001-pgbench-Add-syncpipeline-to-pgbench.patch
Description: Binary data


Re: Add \syncpipeline command to pgbench

2024-01-18 Thread Anthonin Bonnefoy
On Fri, Jan 19, 2024 at 5:08 AM Michael Paquier  wrote:
> The logic looks sound, but I have a
> comment about the docs: could it be better to group \syncpipeline with
> \startpipeline and \endpipeline?  \syncpipeline requires a pipeline to
> work.

I've updated the doc to group the commands. It does look better and
more consistent with similar command groups like \if.

Regards,
Anthonin


v2-0001-pgbench-Add-syncpipeline-to-pgbench.patch
Description: Binary data


Re: Add \syncpipeline command to pgbench

2024-01-22 Thread Anthonin Bonnefoy
That looks like a bug with how open pipelines are not caught at the
end of script processing. \startpipeline seems to have similar
issues.

$ cat only_startpipeline.sql
\startpipeline
SELECT 1;

With only 1 transaction, pgbench will consider this a success despite
not sending anything since the pipeline was not flushed:
pgbench -t1 -Mextended -f only_startpipeline.sql
[...]
number of transactions per client: 1
number of transactions actually processed: 1/1

With 2 transactions, the error will happen when \startpipeline is
called a second time:
pgbench -t2 -Mextended -f only_startpipeline.sql
[...]
pgbench: error: client 0 aborted in command 0 (startpipeline) of
script 0; already in pipeline mode
number of transactions per client: 2
number of transactions actually processed: 1/2

I've split the changes into two patches.
0001 introduces a new error when the end of a pgbench script is
reached while there's still an ongoing pipeline.
0002 adds the \syncpipeline command (original patch with an additional
test case).

Regards,
Anthonin

On Mon, Jan 22, 2024 at 7:16 AM Michael Paquier  wrote:
>
> On Fri, Jan 19, 2024 at 08:55:31AM +0100, Anthonin Bonnefoy wrote:
> > I've updated the doc to group the commands. It does look better and
> > more consistent with similar command groups like \if.
>
> I was playing with a few meta command scenarios while looking at this
> patch, and this sequence generates an error that should never happen:
> $ cat /tmp/test.sql
> \startpipeline
> \syncpipeline
> $ pgbench -n -f /tmp/test.sql -M extended
> [...]
> pgbench: error: unexpected transaction status 1
> pgbench: error: client 0 aborted while receiving the transaction status
>
> It looks to me that we need to be much smarter than that for the error
> handling we'd need when a sync request is optionally sent when a
> transaction stops at the end of pgbench.  Could you look at it?
> --
> Michael


v3-0001-Abort-pgbench-if-script-end-is-reached-with-an-op.patch
Description: Binary data


v3-0002-pgbench-Add-syncpipeline-to-pgbench.patch
Description: Binary data


Re: A performance issue with Memoize

2024-01-31 Thread Anthonin Bonnefoy
Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.46..1495.10 rows=5 width=256) (actual time=0.860..111.013 rows=5 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
         Recheck Cond: (hundred < 5)
         Heap Blocks: exact=263
         ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
               Index Cond: (hundred < 5)
   ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
         Cache Key: t1.hundred
         Cache Mode: logical
         Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
         ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
               Index Cond: (hundred = t1.hundred)
 Planning Time: 0.661 ms
 Execution Time: 113.076 ms
(14 rows)

The Memoize node's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different, and the cache is purged
on each rescan.

Regards,
Anthonin

On Sat, Jan 27, 2024 at 5:00 AM Alexander Lakhin  wrote:
>
> Hello,
>
> 27.01.2024 00:09, Tom Lane wrote:
> > David Rowley  writes:
> >> On Sat, 27 Jan 2024 at 09:41, Tom Lane  wrote:
> >>> drongo and fairywren are consistently failing the test case added
> >>> by this commit.  I'm not quite sure why the behavior of Memoize
> >>> would be platform-specific when we're dealing with integers,
> >>> but ...
> >> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
> >> rather than "2"?
> >> Looking at the code in fmtfloat(), we fallback on the built-in snprintf.
> > Maybe ... I don't have a better theory.
>
> FWIW, I've found where this behaviour is documented:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170
>
> (I've remembered a case with test/sql/partition_prune from 2020, where
> sprintf on Windows worked the other way.)
>
>
> Best regards,
> Alexander
>
>




Re: POC: Extension for adding distributed tracing - pg_tracing

2024-02-06 Thread Anthonin Bonnefoy
Hi!

On Fri, Jan 26, 2024 at 1:43 PM Nikita Malakhov  wrote:
> It's a good idea to split a big patch into several smaller ones.
> But you've already implemented these features - you could provide them
> as sequential small patches (i.e. v13-0002-guc-context-propagation.patch and 
> so on)

Keeping a consistent set of patches is going to be hard as any change
in the first patch would require rebases that are likely going to
conflict. I haven't discarded the features, but keeping them separate
could help with the discussion.

On Fri, Feb 2, 2024 at 1:41 PM Heikki Linnakangas  wrote:
>
> Hi Anthonin,
>
> I'm only now catching up on this thread. Very exciting feature!
Thanks! And thanks for the comments.

> - You're relying on various existing hooks, but it might be better to
> have the 'begin_span'/'end_span' calls directly in PostgreSQL source
> code in the right places. There are more places where spans might be
> nice, like when performing a large sort in tuplesort.c to name one
> example. Or even across every I/O; not sure how much overhead the spans
> incur and how much we'd be willing to accept. 'begin_span'/'end_span'
> could be new hooks that normally do nothing, but your extension would
> implement them.

Having information that would only be available to tracing doesn't
sound great. It probably should be added to the core instrumentation,
which could then be leveraged by tracing and other systems.
One of the features I initially implemented was to track parse start
and end. However, since only the post-parse hook is available, I could
only infer when the parse started. Changing the core to track parse
start would make this easier and it could be used by EXPLAIN and
pg_stat_statements.
In the end, I've tried to match what's done in pg_stat_statements
since the core idea is the same, except pg_tracing outputs spans
instead of statistics.

> - How to pass the tracing context from the client? You went with
> SQLCommenter and others proposed a GUC. A protocol extension would make
> sense too. I can see merits to all of those. It probably would be good
> to support multiple mechanisms, but some might need core changes. It
> might be good to implement the mechanism for accepting trace context in
> core. Without the extension, we could still include the trace ID in the
> logs, for example.

I went with SQLCommenter for the initial implementation since it has
the benefit of already being implemented and working out of the box.
Jelte has an ongoing patch to add the possibility to modify GUC values
through a protocol message [1]. I've tested it and it worked well
without adding too much complexity to pg_tracing. I imagine a
dedicated traceparent GUC variable could also be used in core to
propagate the trace context to the logs and other places.

> - Include EXPLAIN plans in the traces.
This was already implemented in the first versions of the patch (or
rather, the planstate was translated to spans). I've removed it in
the latest patch as it was contributing 50% of the code and also
required a core change, namely a change in instrument.c to track the
first start of a node. Though if you were thinking of the raw EXPLAIN
ANALYZE text dumped as span metadata, that could be doable.

> # Comments on the implementation
>
> There was discussion already on push vs pull model. Currently, you
> collect the traces in memory / files, and require a client to poll them.
> A push model is more common in applications that support tracing. If
> Postgres could connect directly to the OTEL collector, you'd need one
> fewer running component. You could keep the traces in backend-private
> memory, no need to deal with shared memory and spilling to files.

PostgreSQL metrics are currently exposed through views like pg_stat_*
and users likely have collectors pulling those metrics (like the OTEL
PostgreSQL Receiver [2]). Having this collector also pull traces
would keep the logic in a single place and would avoid a situation
where both pull and push are used.

> 2. Start a new pgsql-hackers thread on in-core changes that would
> benefit the extension. Write patches for them. I'm thinking of the
> things I listed in the Core changes section above, but maybe there are
> others.
I already have an ongoing patch (more support for extended protocol in
psql [3] needed for tests) and plan to do others (the parse changes
I've mentioned). Though I also don't want to spread myself too thin.


[1]: https://commitfest.postgresql.org/46/4736/
[2]: 
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/postgresqlreceiver/README.md
[3]: https://commitfest.postgresql.org/46/4650/

On Fri, Feb 2, 2024 at 5:42 PM Jan Katins  wrote:
>
> Hi!
>
> On Fri, 2 Feb 2024 at 13:41, Heikki Linnakangas  wrote:
>>
>> PS. Does any other DBMS implement this? Any lessons to be learned from them?
>
>
> Mysql 8.3 has open telemetrie component, so very fresh:
>
> https://dev.mysql.com/doc/refman/8.3/en/telemetry-trace-install.html
> https:

Fix parallel vacuum buffer usage reporting

2024-02-09 Thread Anthonin Bonnefoy
Hi,

With a database set up with pgbench, we add an additional index:
CREATE INDEX ON pgbench_accounts(abalance);

Then trigger several updates and a vacuum to reach a stable amount of
dirtied pages:
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid=1; CHECKPOINT;
VACUUM (VERBOSE, INDEX_CLEANUP ON) pgbench_accounts;

The vacuum will report the following:
INFO:  vacuuming "postgres.public.pgbench_accounts"
INFO:  launched 1 parallel vacuum worker for index vacuuming (planned: 1)
INFO:  finished vacuuming "postgres.public.pgbench_accounts": index scans: 1
...
buffer usage: 122 hits, 165 misses, 4 dirtied

4 pages were reported dirtied. However, we have 5 dirtied blocks at
the end of the vacuum when looking at pg_buffercache:

SELECT c.relname, b.relfilenode
  FROM pg_buffercache b
  LEFT JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
 WHERE isdirty=true;

            relname            | relfilenode
-------------------------------+-------------
 pg_class                      |        1259
 pgbench_accounts              |       16400
 pgbench_accounts              |       16400
 pgbench_accounts_pkey         |       16408
 pgbench_accounts_abalance_idx |       16480

The missing dirty block comes from the parallel worker vacuuming the
abalance index. Running vacuum with parallel disabled will give the
correct result.
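
For instance, the non-parallel path can be forced with the PARALLEL option:

VACUUM (VERBOSE, INDEX_CLEANUP ON, PARALLEL 0) pgbench_accounts;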

Vacuum uses dedicated VacuumPage{Hit,Miss,Dirty} globals to track
buffer usage. However, those values are not collected at the end of
parallel vacuum workers, leading to an incorrect buffer count.

Those vacuum-specific globals are redundant with the existing
pgBufferUsage and are only used in the verbose output. This patch removes
them and replaces them with pgBufferUsage, which is already correctly
collected at the end of parallel workers, fixing the buffer count.

Regards,
Anthonin


v01-0001-Fix-parallel-vacuum-buffer-usage-reporting.patch
Description: Binary data


Re: Flush SLRU counters in checkpointer process

2023-03-03 Thread Anthonin Bonnefoy
Here's the patch rebased with Andres' suggestions.
Happy to update it if there's any additional change required.


On Wed, Mar 1, 2023 at 8:46 PM Gregory Stark (as CFM) 
wrote:

> On Thu, 12 Jan 2023 at 03:46, Anthonin Bonnefoy
>  wrote:
> >
> >
> > That would make sense. I've created a new patch with everything moved in
> pgstat_report_checkpointer().
> > I did split the checkpointer flush in a pgstat_flush_checkpointer()
> function as it seemed more readable. Thought?
>
> This patch appears to need a rebase. Is there really any feedback
> needed or is it ready for committer once it's rebased?
>
>
>
> --
> Gregory Stark
> As Commitfest Manager
>


flush-slru-counters-v3.patch
Description: Binary data


Re: Fix parallel vacuum buffer usage reporting

2024-03-28 Thread Anthonin Bonnefoy
Hi,

Thanks for the review.

On Thu, Mar 28, 2024 at 4:07 AM Masahiko Sawada 
wrote:

> As for the proposed patch, the following part should handle the temp
> tables too:
>

True, I've missed the local blocks. I've updated the patch:
- read_rate and write_rate now include local block usage
- I've added a specific output for reporting local blocks instead of
combining them in the same output.

Regards,
Anthonin


v2-0001-Fix-parallel-vacuum-buffer-usage-reporting.patch
Description: Binary data


Fix expecteddir argument in pg_regress

2024-03-11 Thread Anthonin Bonnefoy
Hi all!

pg_regress accepts the expecteddir argument. However, it is never used
and outputdir is used instead to get the expected file paths.

This patch fixes this and uses the expecteddir argument as expected.

Regards,
Anthonin


v01-0001-pg_regress-Use-expecteddir-for-the-expected-file.patch
Description: Binary data


Re: POC: Extension for adding distributed tracing - pg_tracing

2024-03-12 Thread Anthonin Bonnefoy
Hi all!

Thanks for the reviews and comments.

> - pg_tracing.c should include postgres.h as the first thing
Will do.

> afaict none of the use of volatile is required, spinlocks have been barriers
> for a long time now
Got it, I will remove them. I've been mimicking what was done in
pg_stat_statements, where all spinlock accesses are done on volatile
variables [1].

> - I don't think it's a good idea to do memory allocations in the middle of a
> PG_CATCH. If the error was due to out-of-memory, you'll throw another error.
Good point. I was wondering what the risks were of generating spans
for errors. I will try to find a better way to handle this.

To give a quick update: following Heikki's suggestion, I'm currently
working on getting pg_tracing as a separate extension on github. I
will send an update when it's ready.

[1]: 
https://github.com/postgres/postgres/blob/master/contrib/pg_stat_statements/pg_stat_statements.c#L2000

On Fri, Feb 9, 2024 at 7:50 PM Andres Freund  wrote:
>
> Hi,
>
> On 2024-02-09 19:46:58 +0300, Nikita Malakhov wrote:
> > I agree with Heikki on most topics and especially the one he recommended
> > to publish your extension on GitHub, this tool is very promising for highly
> > loaded
> > systems, you will get a lot of feedback very soon.
> >
> > I'm curious about SpinLock - it is recommended for very short operations,
> > taking up to several instructions, and docs say that for longer ones it
> > will be
> > too expensive, and recommends using LWLock. Why have you chosen SpinLock?
> > Does it have some benefits here?
>
> Indeed - e.g. end_tracing() looks to hold the spinlock for far too long for
> spinlocks to be appropriate. Use an lwlock.
>
> Random stuff I noticed while skimming:
> - pg_tracing.c should include postgres.h as the first thing
>
> - afaict none of the use of volatile is required, spinlocks have been barriers
>   for a long time now
>
> - you acquire the spinlock for single increments of n_writers, perhaps that
>   could become an atomic, to reduce contention?
>
> - I don't think it's a good idea to do memory allocations in the middle of a
>   PG_CATCH. If the error was due to out-of-memory, you'll throw another error.
>
>
> Greetings,
>
> Andres Freund




Flush SLRU counters in checkpointer process

2023-01-11 Thread Anthonin Bonnefoy
Hello hackers,

Currently, the Checkpointer process only reports SLRU statistics at server
shutdown, leading to delayed statistics for SLRU flushes. This patch adds a
flush of SLRU stats to the end of checkpoints.
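
As an illustration, with the patch the SLRU counters reported by the
checkpointer should become visible right after a manual checkpoint instead of
only at server shutdown. A quick (hypothetical) check:

CHECKPOINT;
SELECT name, blks_written, flushes FROM pg_stat_slru ORDER BY name;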

Best regards,
Anthonin


flush-slru-counters.patch
Description: Binary data


Re: Flush SLRU counters in checkpointer process

2023-01-12 Thread Anthonin Bonnefoy
On Wed, Jan 11, 2023 at 5:33 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-11 10:29:06 +0100, Anthonin Bonnefoy wrote:
> > Currently, the Checkpointer process only reports SLRU statistics at
> server
> > shutdown, leading to delayed statistics for SLRU flushes. This patch
> adds a
> > flush of SLRU stats to the end of checkpoints.
>
> Hm. I wonder if we should do this even earlier, by the
> pgstat_report_checkpointer() calls in CheckpointWriteDelay().
>
> I'm inclined to move the pgstat_report_wal() and pgstat_report_slru() calls
> into pgstat_report_checkpointer() to avoid needing to care about all the
> individual places.
>
That would make sense. I've created a new patch with everything moved into
pgstat_report_checkpointer().
I did split the checkpointer flush into a pgstat_flush_checkpointer()
function as it seemed more readable. Thoughts?


> > @@ -505,6 +505,7 @@ CheckpointerMain(void)
> >   /* Report pending statistics to the cumulative stats
> system */
> >   pgstat_report_checkpointer();
> >   pgstat_report_wal(true);
> > + pgstat_report_slru(true);
>
> Why do we need a force parameter if all callers use it?

Good point. I've used the same signature as pgstat_report_wal, but there's
no need for the nowait parameter.


flush-slru-counters-v2.patch
Description: Binary data


[PATCH] Add statement_timeout in pg_stat_activity

2023-04-04 Thread Anthonin Bonnefoy
Hello hackers.

This patch adds the backend's statement_timeout value to pg_stat_activity.

This would provide some insight into clients that are disabling the default
statement timeout or overriding it through a pgbouncer, messing with other
sessions.

pg_stat_activity seemed like the best place to have this information.
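
For example, a possible query against the new column to spot client backends
running with the timeout disabled (0 means disabled):

SELECT pid, usename, application_name, statement_timeout
FROM pg_stat_activity
WHERE backend_type = 'client backend' AND statement_timeout = 0;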

Regards,
Anthonin
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d5a45f996d..2109595c40 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -861,6 +861,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  
+   statement_timeout integer
+  
+  
+   The current backend's statement_timeout.
+  
+ 
+
  
   
wait_event_type text
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 574cbc2e44..7c24ef92bf 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -860,6 +860,7 @@ CREATE VIEW pg_stat_activity AS
 S.xact_start,
 S.query_start,
 S.state_change,
+S.statement_timeout,
 S.wait_event_type,
 S.wait_event,
 S.state,
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index f0f2e07655..0a2ba26070 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -1054,6 +1054,16 @@ assign_application_name(const char *newval, void *extra)
 	pgstat_report_appname(newval);
 }
 
+/*
+ * GUC assign_hook for statement_timeout
+ */
+void
+assign_statement_timeout(int newval, void *extra)
+{
+	/* Update the pg_stat_activity view */
+	pgstat_report_statement_timeout(newval);
+}
+
 /*
  * GUC check_hook for cluster_name
  */
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 608d01ea0d..5d94cb0ba6 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -336,6 +336,7 @@ pgstat_bestart(void)
 	lbeentry.st_activity_start_timestamp = 0;
 	lbeentry.st_state_start_timestamp = 0;
 	lbeentry.st_xact_start_timestamp = 0;
+	lbeentry.st_statement_timeout = 0;
 	lbeentry.st_databaseid = MyDatabaseId;
 
 	/* We have userid for client-backends, wal-sender and bgworker processes */
@@ -446,9 +447,11 @@ pgstat_bestart(void)
 
 	PGSTAT_END_WRITE_ACTIVITY(vbeentry);
 
-	/* Update app name to current GUC setting */
-	if (application_name)
+	/* Update app name and statement timeout to current GUC setting */
+	if (application_name) {
 		pgstat_report_appname(application_name);
+		pgstat_report_statement_timeout(StatementTimeout);
+	}
 }
 
 /*
@@ -692,6 +695,33 @@ pgstat_report_appname(const char *appname)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/* --
+ * pgstat_report_statement_timeout() -
+ *
+ *	Called to update statement_timeout.
+ * --
+ */
+void
+pgstat_report_statement_timeout(int statement_timeout)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	/*
+	 * Update my status entry, following the protocol of bumping
+	 * st_changecount before and after.  We use a volatile pointer here to
+	 * ensure the compiler doesn't try to get cute.
+	 */
+	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+
+	beentry->st_statement_timeout = statement_timeout;
+
+	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
+
 /*
  * Report current transaction start timestamp as the specified value.
  * Zero means there is no active transaction.
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index eec9f3cf9b..de72c2c1d0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -303,7 +303,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_activity(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_ACTIVITY_COLS	30
+#define PG_STAT_GET_ACTIVITY_COLS	31
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	int			pid = PG_ARGISNULL(0) ? -1 : PG_GETARG_INT32(0);
@@ -612,6 +612,12 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 nulls[29] = true;
 			else
 values[29] = UInt64GetDatum(beentry->st_query_id);
+			/* Only report statement_timeout for client backend */
+			if (beentry->st_backendType == B_BACKEND) {
+values[30] = Int32GetDatum(beentry->st_statement_timeout);
+			} else {
+nulls[30] = true;
+			}
 		}
 		else
 		{
@@ -640,6 +646,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 			nulls[27] = true;
 			nulls[28] = true;
 			nulls[29] = true;
+			nulls[30] = true;
 		}
 
 		tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls);
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8062589efd..5dd33f583f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2478,7 +2478,7 @@ struct config_int ConfigureNamesInt[] =
 		

[PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2023-11-02 Thread Anthonin Bonnefoy
Hi all!

Currently, only unnamed prepared statements are supported by psql with the
\bind command and it's not possible to create or use named prepared statements
through the extended protocol.

This patch introduces 2 additional commands: \parse and \bindx.
\parse allows us to issue a Parse message to create a named prepared statement
through the extended protocol.
\bindx allows us to bind and execute a named prepared statement through the
extended protocol.

The main goal is to provide more ways to test the extended protocol in
regression tests, similarly to what \bind is doing.
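
As an illustration, a test could exercise the new commands like this (a
sketch based on the proposed syntax, assuming \parse names and parses the
current query buffer the same way \bind uses it, and that \bindx is followed
by \g to execute):

SELECT $1 AS one \parse stmt1
\bindx stmt1 42 \g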

Regards,
Anthonin


0001-psql-Add-support-for-prepared-stmt-with-extended-pro.patch
Description: Binary data


Re: Add PQsendSyncMessage() to libpq

2023-11-13 Thread Anthonin Bonnefoy
Hi,

I've played a bit with the patch on my side. One thing that would be
great would be to make this available in pgbench through a \syncpipeline
meta-command. That would make it easier for users to test whether there's
a positive impact with their queries or not.

I've written a patch to add it to pgbench (I don't want to mess with the
thread's attachment so here's a GH link:
https://github.com/bonnefoa/postgres/commit/047b5b05169e36361fe29fef9f430da045ef012d).
Here are some quick results:

echo "\set aid1 random(1, 10 * :scale)
\set aid2 random(1, 10 * :scale)
\startpipeline
select 1;
select * from pgbench_accounts where aid=:aid1;
select 2;
\syncpipeline
select 1;
select * from pgbench_accounts where aid=:aid2;
select 2;
\endpipeline" > /tmp/pipeline_without_flush.sql
pgbench -T30 -Mextended -f /tmp/pipeline_without_flush.sql -h127.0.0.1
latency average = 0.383 ms
initial connection time = 2.810 ms
tps = 2607.587877 (without initial connection time)

echo "\set aid1 random(1, 10 * :scale)
\set aid2 random(1, 10 * :scale)
\startpipeline
select 1;
select * from pgbench_accounts where aid=:aid1;
select 2;
\endpipeline
\startpipeline
select 1;
select * from pgbench_accounts where aid=:aid2;
select 2;
\endpipeline" > /tmp/pipeline_with_flush.sql
pgbench -T30 -Mextended -f /tmp/pipeline_with_flush.sql -h127.0.0.1
latency average = 0.437 ms
initial connection time = 2.602 ms
tps = 2290.527462 (without initial connection time)

I took some perf profiles and the main change is that the server spends less
time in ReadCommand, which makes sense since the commands are sent in a single
TCP frame with the \syncpipeline version.

Regards,
Anthonin

On Sun, Nov 12, 2023 at 2:37 PM Anton Kirilov  wrote:
>
> Hello,
>
> Thanks for the feedback!
>
> On 07/11/2023 09:23, Jelte Fennema-Nio wrote:
>  > But I think it's looking at the situation from the wrong direction.
> [...] we should look at it as an addition to our current list of PQsend
> functions for a new packet type. And none of those PQsend functions ever
> needed a flag. Which makes sense, because they are the lowest level
> building blocks that make sense from a user perspective: They send a
> message type over the socket and don't do anything else.
>
> Yes, I think that this is quite close to my thinking when I created the
> original version of the patch. Also, the protocol specification states
> that the Sync message lacks parameters.
>
> Since there haven't been any comments from the other people who have
> chimed in on this e-mail thread, I will assume that there is consensus
> (we are doing a U-turn with the implementation approach after all), so
> here is the updated version of the patch.
>
> Best wishes,
> Anton Kirilov




Possible segfault when sending notification within a ProcessUtility hook

2023-12-05 Thread Anthonin Bonnefoy
Hi,

I've encountered the following segfault:

#0: 0x000104e821a8 postgres`list_head(l=0x7f7f7f7f7f7f7f7f) at
pg_list.h:130:17
#1: 0x000104e81c9c postgres`PreCommit_Notify at async.c:932:16
#2: 0x000104dd02f8 postgres`CommitTransaction at xact.c:2236:2
#3: 0x000104dcfc24 postgres`CommitTransactionCommand at xact.c:3061:4
#4: 0x00010528a880 postgres`finish_xact_command at postgres.c:2777:3
#5: 0x0001052883ac postgres`exec_simple_query(query_string="notify
test;") at postgres.c:1298:4

This happens when a transaction block fails and a ProcessUtility hook
sends a notification during the rollback command.

When a transaction block fails, it will enter a TBLOCK_ABORT state,
waiting for a rollback. Calling rollback will switch to a
TBLOCK_ABORT_END state and will only go through CleanupTransaction.
If a hook sends a notification during the rollback command, a
notification will be queued but its content will be wiped when the
TopTransactionContext is destroyed.
Trying to send a notification immediately after will segfault in
PreCommit_Notify as pendingNotifies->events will be invalid.

There's a test_notify_rollback test module attached to the patch that reproduces
the issue.

Moving notification cleanup from AbortTransaction to CleanupTransaction fixes
the issue as it will clear pendingActions in the same function that destroys
the TopTransactionContext.

Regards,
Anthonin


v1-0001-Fix-segfault-when-notifying-in-a-ProcessUtility-h.patch
Description: Binary data


Re: Possible segfault when sending notification within a ProcessUtility hook

2023-12-06 Thread Anthonin Bonnefoy
> On Tue, Dec 5, 2023 at 9:03 PM Tom Lane  wrote:
> Why should we regard that as anything other than a bug in the
> ProcessUtility hook?  A failed transaction should not send any
> notifies.

Fair point. That was also my initial assumption, but I thought that the
transaction state was not available from a hook as I had missed
IsAbortedTransactionBlockState.

I will rely on IsAbortedTransactionBlockState to avoid this case,
thanks for the input.

Regards,
Anthonin.




Re: POC: Extension for adding distributed tracing - pg_tracing

2024-01-04 Thread Anthonin Bonnefoy
Hi,

> This approach avoids the need to rewrite SQL or give special meaning to SQL 
> comments.

SQLCommenter already has a good amount of support and is referenced in
OpenTelemetry: https://github.com/open-telemetry/opentelemetry-sqlcommenter
So the goal was more to leverage the existing trace propagation
systems as much as possible.
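
For illustration, a SQLCommenter-annotated query carrying a W3C traceparent
value looks roughly like this (the comment is normally appended by the client
library; the identifiers below are the W3C example values):

SELECT * FROM pgbench_accounts WHERE aid = 1
/*traceparent='00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'*/;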

> I used GUCs to set the `opentelemtery_trace_id` and `opentelemetry_span_id`.
> These can be sent as protocol-level messages by the client driver if the
> client driver has native trace integration enabled, in which case the user
> doesn't need to see or care. And for other drivers, the application can use
> `SET` or `SET LOCAL` to assign them.

Emitting `SET` and `SET LOCAL` for every traced statement sounds more
disruptive and expensive than relying on SQLCommenter. When not using
`SET LOCAL`, the variables also need to be cleared.
This will also introduce a lot of headaches as the `SET` statements themselves
could be traced and generate spans, when tracing utility statements is
already tricky enough.

> But IIRC the current BDR/PGD folks are now using an OpenTelemetry
> sidecar process instead. I think they send it UDP packets to emit
> metrics and events.

I would be curious to hear more about this. I didn't want to be tied
to a specific protocol (I've only seen gRPC and http support on latest
opentelemetry collectors) and I fear that relying on Postgres sending
traces would lower the chance of having this supported on managed
Postgres solutions (like RDS and CloudSQL).

> By queries you mean particular queries, not transactions? And since
> they have an assigned ID it means that the query is already executing
> and we want to enable the tracking in another session, right?

I think that was the idea. The query identifier generated for a
specific statement is stable, so we could have a GUC variable with a
list of query ids and only queries matching the provided query ids
would be sampled. This would make it easier to generate traces for a
specific query within a session.


On Tue, Jan 2, 2024 at 1:14 PM Aleksander Alekseev
 wrote:
>
> Hi,
>
> > Overall solution looks good for me except SQL Commenter - query 
> > instrumentation
> > with SQL comments is often not possible on production systems. Instead
> > the very often requested feature is to enable tracing for a given single 
> > query ID,
> > or several (very limited number of) queries by IDs. It could be done by 
> > adding
> > SQL function to add and remove query ID into a list (even array would do)
> > stored in top tracing context.
>
> Not 100% sure if I follow.
>
> By queries you mean particular queries, not transactions? And since
> they have an assigned ID it means that the query is already executing
> and we want to enable the tracking in another session, right? If this
> is the case I would argue that enabling/disabling tracing for an
> _already_ running query (or transaction) would be way too complicated.
>
> I wouldn't mind enabling/disabling tracing for the current session
> and/or a given session ID. In the latter case it can have an effect
> only for the new transactions. This however could be implemented
> separately from the proposed patchset. I suggest keeping the scope
> small.
>
> --
> Best regards,
> Aleksander Alekseev




Re: Remove dependency on VacuumPage(Hit/Miss/Dirty) counters in do_analyze_rel

2024-06-06 Thread Anthonin Bonnefoy
Hi,

I sent a similar patch for this in
https://www.postgresql.org/message-id/flat/cao6_xqr__kttclkftqs0qscm-j7_xbrg3ge2rwhucxqjmjh...@mail.gmail.com

Regards,
Anthonin

On Thu, Jun 6, 2024 at 11:10 AM Dilip Kumar  wrote:

> As part of commit 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d, the
> dependency on global counters such as VacuumPage(Hit/Miss/Dirty) was
> removed from the vacuum. However, do_analyze_rel() was still using
> these counters, necessitating the tracking of global counters
> alongside BufferUsage counters.
>
> The attached patch addresses the issue by eliminating the need to
> track VacuumPage(Hit/Miss/Dirty) counters in do_analyze_rel(), making
> the global counters obsolete. This simplifies the code and improves
> consistency.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>


Fix possible overflow of pg_stat DSA's refcnt

2024-06-25 Thread Anthonin Bonnefoy
Hi,

During backend initialisation, pgStat DSA is attached using
dsa_attach_in_place with a NULL segment. The NULL segment means that
there's no callback to release the DSA when the process exits.
pgstat_detach_shmem only calls dsa_detach which, as mentioned in the
function's comment, doesn't include releasing and doesn't decrement the
reference count of pgStat DSA.

Thus, every time a backend is created, pgStat DSA's refcnt is incremented
but never decremented when the backend shuts down. It will eventually
overflow and reach 0, triggering the "could not attach to dynamic shared
area" error on all newly created backends. When this state is reached, the
only way to recover is to restart the database to reset the counter.

The issue can be made visible by calling dsa_dump in pgstat_detach_shmem and
checking that refcnt's value continuously increases as new backends are
created. It is also possible to reach the state where all connections are
refused by editing the refcnt manually with lldb/gdb (the alternative of
creating enough backends to reach 0 exists but can take some time). Setting
it to -10 and then opening 10 connections will eventually generate the
"could not attach" error.

This patch fixes this issue by releasing pgStat DSA with
dsa_release_in_place during pgStat shutdown to correctly decrement the
refcnt.

Regards,
Anthonin


v1-0001-Fix-possible-overflow-of-pg_stat-DSA-s-refcnt.patch
Description: Binary data


Re: Fix possible overflow of pg_stat DSA's refcnt

2024-06-25 Thread Anthonin Bonnefoy
On Wed, Jun 26, 2024 at 7:40 AM Michael Paquier  wrote:
>
> Very good catch!  It looks like you have seen that in the field, then.
> Sad face.

Yeah, this happened last week on one of our replicas (version 15.5)
that had 134 days of uptime. We are doing a lot of parallel
queries on this cluster so the combination of high uptime plus
parallel worker creation eventually triggered the issue.

> Will fix and backpatch, thanks for the report!

Thanks for handling this and for the quick answer!

Regards,
Anthonin




Re: Use pgBufferUsage for block reporting in analyze

2024-07-08 Thread Anthonin Bonnefoy
Hi,

Thanks for the review, I've updated the patches with the suggestions:
- moved renaming of misses to reads to the first patch
- added intermediate variables for total blks usage

I've also done some additional tests using the provided
vacuum_analyze_buffer_usage.sql script. It relies on
pg_stat_statements to check the results (only pgss gives information
on dirtied buffers). It gives the following output:

 psql:vacuum_analyze_buffer_usage.sql:21: INFO:  vacuuming "postgres.pg_temp_7.vacuum_blks_stat_test"
 ...
 buffer usage: 105 hits, 3 reads, 6 dirtied
 ...
         query        | sum_hit | sum_read | sum_dirtied
 ---------------------+---------+----------+-------------
  VACUUM (VERBOSE...  |     105 |        3 |           6

For vacuum, we have the same results with SKIP_DATABASE_STATS. Without
this setting, we would have block usage generated by
vac_update_datfrozenxid outside of vacuum_rel and therefore not
tracked by the verbose output. For the second test, the second patch
is needed to have ANALYZE (VERBOSE) output the block usage. It will
output the following:

 psql:vacuum_analyze_buffer_usage.sql:29: INFO:  analyzing "pg_temp_7.vacuum_blks_stat_test"
 ...
 buffer usage: 84 hits, 33 reads, 2 dirtied
 ...
         query        | sum_hit | sum_read | sum_dirtied
 ---------------------+---------+----------+-------------
  ANALYZE (VERBOSE... |      91 |       38 |           2

There are additional buffer hits/reads reported by pgss; those are from
analyze_rel opening the relations in try_relation_open and are not
tracked by the ANALYZE VERBOSE output.


v3-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v3-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
Description: Binary data


vacuum_analyze_buffer_usage.sql
Description: Binary data


Re: Flush SLRU counters in checkpointer process

2023-07-18 Thread Anthonin Bonnefoy
I think I've managed to reproduce the issue. The test I've added to check
slru flush was the one failing in the regression suite.

SELECT SUM(flushes) > :slru_flushes_before FROM pg_stat_slru;
 ?column?
--
 t

The origin seems to be a race condition on have_slrustats (
https://github.com/postgres/postgres/blob/c8e1ba736b2b9e8c98d37a5b77c4ed31baf94147/src/backend/utils/activity/pgstat_slru.c#L161-L162
).
I will try to get a new patch with improved test stability.


On Mon, Jul 3, 2023 at 3:18 PM Daniel Gustafsson  wrote:

> > On 3 Mar 2023, at 09:06, Anthonin Bonnefoy <
> anthonin.bonne...@datadoghq.com> wrote:
> >
> > Here's the patch rebased with Andres' suggestions.
> > Happy to update it if there's any additionalj change required.
>
> This patch crashes 031_recovery_conflict with a SIGInvalid on Windows, can
> you
> please investigate and see what might be going on there?  The test passed
> about
> 4 days ago on Windows so unless it's the CI being flaky it should be due
> to a
> recent change.
>
> If you don't have access to a Windows environment you can run your own
> instrumented builds in your Github account with the CI files in the
> postgres
> repo.
>
> --
> Daniel Gustafsson
>
>


Re: POC: Extension for adding distributed tracing - pg_tracing

2023-07-26 Thread Anthonin Bonnefoy
I initially thought of sending the spans from PostgreSQL since this is
the usual behavior of tracing libraries.

However, this created a lot of potential issues:

- Protocol support and differences between trace collectors. OpenTelemetry
seems to use gRPC, others are using HTTP, and those will require additional
libraries (plus gRPC support in C doesn't look good) and any change in the
protobuf definition would require updating the extension.

- Do we send the spans within the query hooks? This means that we could
block the process if the trace collector is slow to answer or we can't
connect. Sending spans from a background process sounded rather complex and
resource heavy.

Moving to a pull model fixed those issues and felt more natural as this is
the way PostgreSQL exposes its metrics.


On Wed, Jul 26, 2023 at 4:11 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Nikita,
>
> > This patch looks very interesting, I'm working on the same subject too.
> But I've used
> > another approach - I'm using C wrapper for C++ API library from
> OpenTelemetry, and
> > handle span storage and output to this library. There are some nuances
> though, but it
> > is possible. Have you tried to use OpenTelemetry APIs instead of
> implementing all
> > functionality around spans?
>
> I don't think that PostgreSQL accepts such kind of C++ code, not to
> mention the fact that the PostgreSQL license is not necessarily
> compatible with Apache 2.0 (I'm not a lawyer; this is not a legal
> advice). Such a design decision will probably require using separate
> compile flags since the user doesn't necessarily have a corresponding
> dependency installed. Similarly to how we do with LLVM, OpenSSL, etc.
>
> So -1 to the OpenTelemetry C++ library and +1 to the properly licensed
> C implementation without 3rd party dependencies from me. Especially
> considering the fact that the implementation seems to be rather
> simple.
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: POC: Extension for adding distributed tracing - pg_tracing

2023-07-27 Thread Anthonin Bonnefoy
> Agree, something goes wrong when using Autotools (but not Meson) on
> both Linux and MacOS. I didn't investigate the issue though.
I was only using meson and forgot to keep the Automake files up to date when
I split pg_tracing.c into multiple files (span.c, explain.c...).

On Fri, Jul 28, 2023 at 8:10 AM Nikita Malakhov  wrote:

> Hi,
>
> I've fixed the Autotools build, please check patch below (v2).
>
> On Thu, Jul 27, 2023 at 6:39 PM Aleksander Alekseev <
> aleksan...@timescale.com> wrote:
>
>> Hi,
>>
>> > Also FYI, there are build warnings because functions
>> > const char * get_span_name(const Span * span, const char *qbuffer)
>> > and
>> > const char * get_operation_name(const Span * span, const char *qbuffer)
>> > do not have default inside switch and no return outside of switch.
>>
>> You are right, there are a few warnings:
>>
>> ```
>> [1566/1887] Compiling C object contrib/pg_tracing/pg_tracing.so.p/span.c.o
>> ../contrib/pg_tracing/span.c: In function ‘get_span_name’:
>> ../contrib/pg_tracing/span.c:210:1: warning: control reaches end of
>> non-void function [-Wreturn-type]
>>   210 | }
>>   | ^
>> ../contrib/pg_tracing/span.c: In function ‘get_operation_name’:
>> ../contrib/pg_tracing/span.c:249:1: warning: control reaches end of
>> non-void function [-Wreturn-type]
>>   249 | }
>>   | ^
>> ```
>>
>> Here is the patch v2 with a quick fix.
>>
>> > but got errors calling make check and cannot install the extension
>>
>> Agree, something goes wrong when using Autotools (but not Meson) on
>> both Linux and MacOS. I didn't investigate the issue though.
>>
>> --
>> Best regards,
>> Aleksander Alekseev
>>
>
>
> --
> Regards,
>
> --
> Nikita Malakhov
> Postgres Professional
> The Russian Postgres Company
> https://postgrespro.ru/
>


Re: POC: Extension for adding distributed tracing - pg_tracing

2023-07-28 Thread Anthonin Bonnefoy
Hi,

> I'd keep Autotools build up to date
Definitely. The patch is still in a rough state and updating Autotools fell
through the cracks.

> I'm currently playing with the patch and
> reviewing sources, if you need any cooperation - please let us know.
The goal for me was to get validation on the design and to check if there
was anything that would be a blocker before committing more time to the
project.

There are already several things I was planning on improving:
- The parse span is something I've added recently. I've realised that I
could rely on the stmtStartTimestamp to get the time spent in parsing in
the post parse hook so the code around this is still rough.
- There are probably race conditions on the reads and writes of the query
file that need to be fixed
- I'm using the same Span structure for everything while Executor spans
only need a small subset which is a bit wasteful. I've tried to use two
different shared buffers (base spans and spans with counters) but it was
making things more confusing.
- Finish commenting code and start writing the doc

So any comments, missing features and reviews on the current state of the
project are welcome.

Regards,
Anthonin

On Fri, Jul 28, 2023 at 9:35 AM Nikita Malakhov  wrote:

> Hi,
>
> I'd keep Autotools build up to date, because Meson is very limited in
> terms of not very
> up-to-date OS versions. Anyway, it's OK now. I'm currently playing with
> the patch and
> reviewing sources, if you need any cooperation - please let us know.
> I'm +1 for committing
> this patch after review.
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> The Russian Postgres Company
> https://postgrespro.ru/
>


Re: POC: Extension for adding distributed tracing - pg_tracing

2023-07-28 Thread Anthonin Bonnefoy
> What do you think about using INSTR_TIME_SET_CURRENT, INSTR_TIME_SUBTRACT
> and INSTR_TIME_GET_MILLISEC
> macros for timing calculations?
If you're talking about the two instances where I'm modifying the instr_time's
ticks, it's because I can't use the macros there.
The first case is for the parse span. I only have the start timestamp
from GetCurrentStatementStartTimestamp and don't
have access to the start instr_time, so I need to build the duration from 2
timestamps.
The second case is when building node spans from the planstate. I directly
have the duration from Instrumentation.

I guess one fix would be to use an int64 for the span duration to directly
store nanoseconds instead of an instr_time,
but I do use the instrumentation macros outside of those two cases to get
the duration of other spans.

> Also, have you thought about a way to trace existing (running) queries
> without directly instrumenting them?

That's a good point. I was focusing on leaving the sampling decision to the
caller through the sampled flag and only recently added the
pg_tracing_sample_rate parameter to give more control. It should be
straightforward to add an option to create standalone traces based on the
sample rate alone. This way, setting the sample rate to 1 would force the
queries running in the session to be traced.
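
A sketch of what that could look like from a session, assuming the parameter
is exposed as a pg_tracing.sample_rate GUC:

SET pg_tracing.sample_rate = 1.0;  -- trace every query in this session
SELECT count(*) FROM pgbench_accounts;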


On Fri, Jul 28, 2023 at 3:02 PM Nikita Malakhov  wrote:

> Hi!
>
> What do you think about using INSTR_TIME_SET_CURRENT, INSTR_TIME_SUBTRACT
> and INSTR_TIME_GET_MILLISEC
> macros for timing calculations?
>
> Also, have you thought about a way to trace existing (running) queries
> without directly instrumenting them? It would
> be much more usable for maintenance and support personnel, because in
> production environments you rarely could
> change query text directly. For the current state the most simple solution
> is switch tracing on and off by calling SQL
> function, and possibly switch tracing for prepared statement the same way,
> but not for any random query.
>
> I'll check the patch for the race conditions.
>
> --
> Regards,
> Nikita Malakhov
> Postgres Professional
> The Russian Postgres Company
> https://postgrespro.ru/
>


Re: Fix parallel vacuum buffer usage reporting

2024-04-22 Thread Anthonin Bonnefoy
On Sat, Apr 20, 2024 at 2:00 PM Alena Rybakina 
wrote:

> Hi, thank you for your work with this subject.
>
> While I was reviewing your code, I noticed that your patch conflicts with
> another patch [0] that been committed.
>
> [0]
> https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com
>

I've rebased the patch and also split the changes:
1: Use pgBufferUsage in Vacuum and Analyze block reporting
2: Stop tracking buffer usage in the now unused Vacuum{Hit,Miss,Dirty}
variables
3: Remove declarations of Vacuum{Hit,Miss,Dirty}


v3-0003-Remove-declaration-of-Vacuum-block-usage-tracking.patch
Description: Binary data


v3-0001-Fix-parallel-vacuum-buffer-usage-reporting.patch
Description: Binary data


v3-0002-Remove-unused-tracking-of-vacuum-buffer-usage.patch
Description: Binary data


Re: Fix parallel vacuum buffer usage reporting

2024-04-24 Thread Anthonin Bonnefoy
Thanks for the review!


> I think that if the anayze command doesn't have the same issue, we
> don't need to change it.

Good point. I wrongly assumed that analyze was also impacted, but there's
no parallel analyze so the block count is correct.


> (a) make lazy vacuum use BufferUsage instead of
> VacuumPage{Hit,Miss,Dirty}. (backpatched down to pg13).
> (b) make analyze use BufferUsage and remove VacuumPage{Hit,Miss,Dirty}
> variables for consistency and simplicity (only for HEAD, if we agree).
>
I've isolated the fix in the first patch. I've kept the analyze change and
the removal of the VacuumPage* variables split for clarity's sake.

> BTW I realized that VACUUM VERBOSE running on a temp table always
> shows the number of dirtied buffers being 0, which seems to be a bug.
> The patch (a) will resolve it as well.
>
Indeed, it is visible with the following test:

SET temp_buffers='1024';
CREATE TEMPORARY TABLE vacuum_fix (aid int, bid int);
INSERT INTO vacuum_fix SELECT *, * FROM generate_series(1, 100);
VACUUM vacuum_fix;
UPDATE vacuum_fix SET bid = bid+1;
VACUUM (VERBOSE, INDEX_CLEANUP ON) vacuum_fix;

Pre-patch:
  avg read rate: 254.751 MB/s, avg write rate: 0.029 MB/s
  buffer usage: 8853 hits, 8856 misses, 1 dirtied
  WAL usage: 1 records, 1 full page images, 3049 bytes
The dirtied page is from pg_class (so not a local buffer)

With the fix:
  avg read rate: 250.230 MB/s, avg write rate: 250.230 MB/s
  buffer usage: 8853 hits, 8856 misses, 8856 dirtied
  WAL usage: 1 records, 1 full page images, 3049 bytes


v4-0002-Use-pgBufferUsage-for-analyze-block-reporting.patch
Description: Binary data


v4-0001-Fix-parallel-and-temp-vacuum-buffer-usage-reporti.patch
Description: Binary data


v4-0003-Remove-unused-variables-for-vacuum-buffer-usage.patch
Description: Binary data


Re: Fix parallel vacuum buffer usage reporting

2024-04-25 Thread Anthonin Bonnefoy
On Wed, Apr 24, 2024 at 4:01 PM Alena Rybakina 
wrote:

> I tested the main postgres branch with and without your fix using a script
> that was written by me. It consists of five scenarios and I made a
> comparison in the logs between the original version of the master branch
> and the master branch with your patch:
>
 Hi! Thanks for the tests.

> I have attached a test file (vacuum_check_logs.sql)
>
Looking at the script, you won't trigger the problem. The reporting issue
will only happen with a parallel index vacuum, and that will only happen
if there are at least 2 indexes [0]. You will need to create an additional
index.

> The same script was run, but using vacuum verbose analyze, and I saw the
> difference again in the fifth step:
> with your patch: buffer usage: 32312 hits, 607 misses, 1566 dirtied
> master: buffer usage: 32346 hits, 573 misses, 1360 dirtied
>
Isn't there a chance for the checkpointer to run during this time? That
could make the conditions between the two runs slightly different and
explain the change in buffer report.

[0]
https://github.com/postgres/postgres/blob/8a1b31e6e59631807a08a4e9465134c343bbdf5e/src/backend/access/heap/vacuumlazy.c#L2826-L2831


Re: Fix parallel vacuum buffer usage reporting

2024-04-29 Thread Anthonin Bonnefoy
I've done some additional tests to validate the reported numbers. Using
pg_statio, it's possible to get the minimum number of block hits (Full
script attached).

-- Save block hits before vacuum
SELECT pg_stat_force_next_flush();
SELECT heap_blks_hit, idx_blks_hit FROM pg_statio_all_tables where relname='vestat' \gset
vacuum (verbose, index_cleanup on) vestat;
-- Check the difference
SELECT pg_stat_force_next_flush();
SELECT heap_blks_hit - :heap_blks_hit as delta_heap_hit,
       idx_blks_hit - :idx_blks_hit as delta_idx_hit,
       heap_blks_hit - :heap_blks_hit + idx_blks_hit - :idx_blks_hit as sum
FROM pg_statio_all_tables where relname='vestat';

Output:
...
buffer usage: 14676 hits, 0 misses, 667 dirtied
buffer usage (new): 16081 hits, 0 misses, 667 dirtied
...
-[ RECORD 1 ]--+------
delta_heap_hit | 9747
delta_idx_hit  | 6325
sum            | 16072

From pg_statio, we had 16072 block hits for the relation + indexes.
Pre-patch, we were under-reporting with 14676.
Post-patch, we have 16081. The 9 additional block hits come from vacuum
accessing catalog tables like pg_class or pg_class_oid_index.


vacuum_block_with_pgstatio.sql
Description: Binary data


Re: Fix parallel vacuum buffer usage reporting

2024-05-02 Thread Anthonin Bonnefoy
On Wed, May 1, 2024 at 5:37 AM Masahiko Sawada 
wrote:

> Thank you for further testing! I've pushed the patch.

Thanks!

Here is the rebased version for the follow-up patch removing VacuumPage
variables. Though I'm not sure if I should create a dedicated mail thread
since the bug was fixed and the follow-up is more of a refactoring. What
do you think?


v1-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


Use pgBufferUsage for block reporting in analyze

2024-05-10 Thread Anthonin Bonnefoy
Hi,

The analyze logging within autovacuum uses the dedicated
VacuumPage{Hit,Miss,Dirty} variables to track buffer usage counts. However,
pgBufferUsage already provides block usage tracking and handles more cases
(temporary tables, parallel workers...).

Those variables were only used in two places: block usage reporting in
verbose vacuum and analyze. 5cd72cc0c5017a9d4de8b5d465a75946da5abd1d
removed their usage in the vacuum command as part of a bugfix.

This patch replaces those vacuum-specific variables with pgBufferUsage
in analyze. This makes VacuumPage{Hit,Miss,Dirty} unused and removable.
This commit removes both their calls in bufmgr and their declarations.

Regards,
Anthonin


v1-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


Possible incorrect row estimation for Gather paths

2024-05-24 Thread Anthonin Bonnefoy
Hi,

While experimenting on an explain option to display all plan candidates
(very rough prototype here [1]), I've noticed some discrepancies in some
generated plans.

EXPLAIN (ALL_CANDIDATES) SELECT * FROM pgbench_accounts order by aid;
 Plan 1
->  Gather Merge  (cost=11108.32..22505.38 rows=100000 width=97)
 Workers Planned: 1
 ->  Sort  (cost=10108.31..10255.37 rows=58824 width=97)
   Sort Key: aid
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..2228.24
rows=58824 width=97)
 Plan 2
->  Gather Merge  (cost=11108.32..17873.08 rows=58824 width=97)
 Workers Planned: 1
 ->  Sort  (cost=10108.31..10255.37 rows=58824 width=97)
   Sort Key: aid
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..2228.24
rows=58824 width=97)

The 2 plans are similar except one Gather Merge has a lower 58K estimated
rows.

The first plan is created with generate_useful_gather_paths with
override_rows=false then create_gather_merge_path and will use rel->rows as
the row count (so the 100K rows of pgbench_accounts).
#0: create_gather_merge_path(...) at pathnode.c:1885:30
#1: generate_useful_gather_paths(... override_rows=false) at
allpaths.c:3286:11
#2: apply_scanjoin_target_to_paths(...) at planner.c:7744:3
#3: grouping_planner(...) at planner.c:1638:3

The second plan is created through create_ordered_paths then
create_gather_merge_path and the number of rows is estimated to
(worker_rows * parallel_workers). Since we only have 1 parallel worker,
this yields 58K rows.
#0: create_gather_merge_path(...) at pathnode.c:1885:30
#1: create_ordered_paths(...) at planner.c:5287:5
#2: grouping_planner(...) at planner.c:1717:17

The 58K row estimate looks possibly incorrect. A worker row count is
estimated using total_rows/parallel_divisor. The parallel_divisor will
include the possible leader participation and will be 1.7 for 1 worker thus
the 58K rows (100K/1.7=58K)
However, the gather merge will only do 58K*1, dropping the leader
participation component.

I have a tentative patch split in two changes:
1: This is a related refactoring to remove an unnecessary and confusing
assignment of rows in create_gather_merge_path. This value is never used
and immediately overwritten in cost_gather_merge
2: This changes the row estimation of gather path to use
worker_rows*parallel_divisor to get a more accurate estimation.
Additionally, when creating a gather merge path in create_ordered_paths, we
try to use the source's rel rows when available.

The patch triggered a small change in the hash_join regression test. Pre
patch, we had the following candidates.
Plan 4
->  Aggregate  (cost=511.01..511.02 rows=1 width=8)
 ->  Gather  (cost=167.02..511.00 rows=2 width=0)
   Workers Planned: 1
   ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
 Hash Cond: (r.id = s.id)
 ->  Parallel Seq Scan on simple r  (cost=0.00..299.65
rows=11765 width=4)
 ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
   ->  Parallel Seq Scan on extremely_skewed s
 (cost=0.00..167.01 rows=1 width=4)
Plan 5
->  Finalize Aggregate  (cost=510.92..510.93 rows=1 width=8)
 ->  Gather  (cost=510.80..510.91 rows=1 width=8)
   Workers Planned: 1
   ->  Partial Aggregate  (cost=510.80..510.81 rows=1 width=8)
 ->  Parallel Hash Join  (cost=167.02..510.80 rows=1
width=0)
   Hash Cond: (r.id = s.id)
   ->  Parallel Seq Scan on simple r
 (cost=0.00..299.65 rows=11765 width=4)
   ->  Parallel Hash  (cost=167.01..167.01 rows=1
width=4)
 ->  Parallel Seq Scan on extremely_skewed s
 (cost=0.00..167.01 rows=1 width=4)

Post patch, the plan candidates became:
Plan 4
->  Finalize Aggregate  (cost=511.02..511.03 rows=1 width=8)
 ->  Gather  (cost=510.80..511.01 rows=2 width=8)
   Workers Planned: 1
   ->  Partial Aggregate  (cost=510.80..510.81 rows=1 width=8)
 ->  Parallel Hash Join  (cost=167.02..510.80 rows=1
width=0)
   Hash Cond: (r.id = s.id)
   ->  Parallel Seq Scan on simple r
 (cost=0.00..299.65 rows=11765 width=4)
   ->  Parallel Hash  (cost=167.01..167.01 rows=1
width=4)
 ->  Parallel Seq Scan on extremely_skewed s
 (cost=0.00..167.01 rows=1 width=4)
Plan 5
->  Aggregate  (cost=511.01..511.02 rows=1 width=8)
 ->  Gather  (cost=167.02..511.00 rows=2 width=0)
   Workers Planned: 1
   ->  Parallel Hash Join  (cost=167.02..510.80 rows=1 width=0)
 Hash Cond: (r.id = s.id)
 ->  Parallel Seq Scan on simple r  (cost=0.00..299.65
rows=11765 width=4)
 ->  Parallel Hash  (cost=167.01..167.01 rows=1 width=4)
   ->  Parallel Seq Scan on extremely_skewed s
 (cost=0.00..167.01 rows=1 width=4)

The FinalizeAggreg

Re: Use pgBufferUsage for block reporting in analyze

2024-05-28 Thread Anthonin Bonnefoy
Thanks for having a look.

On Fri, May 10, 2024 at 12:40 PM Michael Paquier 
wrote:

> This needs some runtime check to make sure that the calculations
> are consistent before and after the fact (cannot do that now).
>
Yeah, testing this is also a bit painful as buffer usage of analyze is only
displayed in the logs during autoanalyze. While looking at this, I've
thought of additional changes that could make testing easier and improve
consistency with VACUUM VERBOSE:
- Have ANALYZE VERBOSE output the buffer usage stats
- Add WAL usage to ANALYZE VERBOSE

analyze verbose output would look like:
postgres=# analyze (verbose) pgbench_accounts ;
INFO:  analyzing "public.pgbench_accounts"
INFO:  "pgbench_accounts": scanned 1640 of 1640 pages, containing 10
live rows and 0 dead rows; 3 rows in sample, 10 estimated total rows
INFO:  analyze of table "postgres.public.pgbench_accounts"
avg read rate: 124.120 MB/s, avg write rate: 0.110 MB/s
buffer usage: 533 hits, 1128 reads, 1 dirtied
WAL usage: 12 records, 1 full page images, 5729 bytes
system usage: CPU: user: 0.06 s, system: 0.00 s, elapsed: 0.07 s

Perhaps this should say "read" rather than "miss" in the logs as the
> two read variables for the shared and local blocks are used?  For
> consistency, at least.
>
Sounds good.

That's not material for v17, only for v18.
>
 Definitely

I've split the patch in two parts
1: Removal of the vacuum specific variables, this is the same as the
initial patch.
2: Add buffer and wal usage to analyze verbose output + rename miss to
reads

Regards,
Anthonin


v2-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v2-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
Description: Binary data


Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind

2024-08-21 Thread Anthonin Bonnefoy
On Wed, Aug 21, 2024 at 12:00 AM Jelte Fennema-Nio  wrote:
> @Anthonin are you planning to update the patch accordingly?

Here's the patch with \bindx renamed to \bind_named.

I've also made a small change to Michael's refactoring in 0002 by
initialising success to false in ExecQueryAndProcessResults. There was
a compiler warning about success possibly used uninitialized[1].

[1] https://cirrus-ci.com/task/6207675187331072
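
For reference, a round trip with the renamed command would look roughly like
this (a sketch based on the patch's syntax, not necessarily its final form):

SELECT $1 AS a, $2 AS b \parse stmt1
\bind_named stmt1 1 'foo' \g
\close stmt1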


v7-0001-psql-Add-support-for-prepared-stmt-with-extended-.patch
Description: Binary data


v7-0002-psql-Refactor-status-of-extended-protocol-command.patch
Description: Binary data


Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-22 Thread Anthonin Bonnefoy
er,
when a new section is allocated, LLVM will request a new memory block
from the memory allocator[4]. The MemGroup.Near is passed as the start
hint of mmap but that's only a hint and the kernel doesn't provide any
guarantee that the new allocated block will be near. With the impacted
query, there are more than 10GB of gap between the .text section and
the .rodata section, making it impossible for the code in the .text
section to correctly fetch data from the .rodata section as the
address in ADRP is limited to a +/-4GB range.

The ABI mentions that the GOT section should be within 4GB of the
text section[5]. Though in this case, there's no GOT section as the
offsets are stored in the .rodata section but
the constraint is going to be similar. This is a known LLVM issue[6]
that impacted Impala, Numba and Julia. There's an open PR[7] to fix
the issue by allocating all sections as a single memory block,
avoiding the gaps between sections. There's also a related discussion
on this on llvm-rtdyld discourse[8].

A possible mitigation is to switch from RuntimeDyld to JITLink but
this requires at least LLVM15 as LLVM14 doesn't have any significant
relocation support for aarch64[9]. I did test using JITLink on my
impacted db and it seems to fix the issue. JITLink has no exposed C
interface though so it requires additional wrapping.

I don't necessarily have a good answer for this issue. I've tried to
tweak relocation settings or the jit code to avoid relocation without
too much success. Ideally, the llvm fix will be merged and backported
in llvm but the PR has been open for some time now. I've seen multiple
segfault reports that look similar to this issue (example: [10], [11])
but I don't think it was linked to the LLVM bug so I figured I would
at least share my findings.

[1] 
https://github.com/postgres/postgres/blob/REL_14_STABLE/src/backend/jit/llvm/llvmjit_deform.c#L364-L382
[2] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp#L501-L513
[3] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/SectionMemoryManager.cpp#L41C32-L41C47
[4] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/SectionMemoryManager.cpp#L94-L110
[5] 
https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#7code-models
[6] https://github.com/llvm/llvm-project/issues/71963
[7] https://github.com/llvm/llvm-project/pull/71968
[8] 
https://discourse.llvm.org/t/llvm-rtdyld-aarch64-abi-relocation-restrictions/74616
[9] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/JITLink/ELF_aarch64.cpp#L75-L84
[10] 
https://www.postgresql.org/message-id/flat/CABa%2BnRvwZy_5t1QF9NJNGwAf03tv_PO_Sg1FsN1%2B-3Odb1XgBA%40mail.gmail.com
[11] 
https://www.postgresql.org/message-id/flat/CADAf1kavcN-kY%3DvEm3MYxhUa%2BrtGFs7tym5d7Ee6Ni2cwwxGqQ%40mail.gmail.com

Regards,
Anthonin Bonnefoy




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-23 Thread Anthonin Bonnefoy
On Thu, Aug 22, 2024 at 12:33 PM Thomas Munro  wrote:
> I fear that back-porting, for the LLVM project, would mean "we fix it
> in main/20.x, and also back-port it to 19.x".  Do distros back-port
> further?

That's also my fear, I'm not familiar with distros back-port policy
but eyeballing ubuntu package changelog[1], it seems to be mostly
build fixes.

Given that there's no visible way to fix the relocation issue, I
wonder if jit shouldn't be disabled for arm64 until either the
RuntimeDyld fix is merged or the switch to JITLink is done. Disabling
jit tuple deforming may be enough but I'm not confident the issue
won't happen in a different part.

[1] https://launchpad.net/ubuntu/+source/llvm-toolchain-16/+changelog




Re: Set query_id for query contained in utility statement

2024-08-26 Thread Anthonin Bonnefoy
On Mon, Aug 26, 2024 at 5:26 AM jian he  wrote:
> queryid              | left                                                 | plans | calls | rows
> ---------------------+------------------------------------------------------+-------+-------+------
>  2800308901962295548 | EXPLAIN (verbose) create table test_t as select 1;   |     2 |     2 |    0
>  2093602470903273926 | EXPLAIN (verbose) create table test_t as select $1   |     0 |     2 |    0
>
> "EXPLAIN (verbose) create table test_t as select 1;" called twice,
> is that what we expect?

pg_stat_statements reports nested queries and toplevel queries
separately. You can check with the toplevel column.
2800308901962295548 is the nested ctas part while 2093602470903273926
is the top explain utility statement (which explains why there are 0
plans since it's a utility statement). Since the explain ctas query
was called twice, it will be reported as 2 toplevel queries and 2
nested queries.
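
A quick way to see both entries side by side (a sketch, assuming the
extension is installed and pg_stat_statements.track = 'all'):

SET pg_stat_statements.track = 'all';
SELECT pg_stat_statements_reset();
EXPLAIN (verbose) CREATE TABLE test_t AS SELECT 1;
SELECT queryid, toplevel, calls, left(query, 50) AS query
  FROM pg_stat_statements
 WHERE query LIKE '%test_t%'
 ORDER BY toplevel DESC;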

> should first row, the normalized query becomes
> EXPLAIN (verbose) create table test_t as select $1;

Good point, the issue in this case was that the nested query was stored by
pg_stat_statements during the ExecutorEnd hook. This hook doesn't have
the jstate and parseState at that point so pg_stat_statements can't
normalize the query.

I've modified the patch to fix this. Instead of just jumbling the
query in ExplainQuery, I've moved jumbling in ExplainOneUtility which
already has specific code to handle ctas and dcs. Calling the post
parse hook here allows pg_stat_statements to store the normalized
version of the query for this queryid and nesting level.


v3-0001-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-26 Thread Anthonin Bonnefoy
On Mon, Aug 26, 2024 at 4:33 AM Thomas Munro  wrote:
> IIUC this one is a random and rare crash depending on malloc() and
> perhaps also the working size of your virtual memory dart board.
> (Annoyingly, I had tried to reproduce this quite a few times on small ARM
> systems when earlier reports came in, d'oh!).

allocateMappedMemory used when creating sections will eventually call
mmap[1], not malloc. So the amount of shared memory configured may be
a factor in triggering the issue.

My first attempts to reproduce the issue from scratch weren't
successful either. However, trying again with different values of
shared_buffers, I've managed to trigger the issue somewhat reliably.

On a clean Ubuntu jammy, I've compiled the current PostgreSQL
REL_14_STABLE (6bc2bfc3) with the following options:
CLANG=clang-14 ../configure --enable-cassert --enable-debug --prefix
~/.local/ --with-llvm

Set "shared_buffers = '4GB'" in the configuration. More may be needed
but 4GB was enough for me.

Create a table with multiple partitions with pgbench. The goal is to
have a jit module big enough to trigger the issue.
pgbench -i --partitions=64

Then run the following query with jit forcefully enabled:
psql options=-cjit_above_cost=0 -c 'SELECT count(bid) from pgbench_accounts;'

If the issue was successfully triggered, it should segfault or be
stuck in an infinite loop.

> Ultimately, if it doesn't work, and doesn't get fixed, it's hard for
> us to do much about it.  But hmm, this is probably madness... I wonder
> if it would be feasible to detect address span overflow ourselves at a
> useful time, as a kind of band-aid defence...

There's a possible alternative, but it's definitely in the same
category as the hot-patching idea. llvmjit uses
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager to
create the ObjectLinkingLayer and it will be created with the default
SectionMemoryManager[2]. It should be possible to provide a modified
SectionMemoryManager with the change to allocate sections in a single
block and it could be restricted to arm64 architecture. A part of me
tells me this is probably a bad idea but on the other hand, LLVM
provides this way to plug a custom allocator and it would fix the
issue...

[1] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/Support/Unix/Memory.inc#L115-L117
[2] 
https://github.com/llvm/llvm-project/blob/release/14.x/llvm/lib/ExecutionEngine/Orc/OrcV2CBindings.cpp#L967-L973




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-27 Thread Anthonin Bonnefoy
On Tue, Aug 27, 2024 at 1:33 AM Thomas Munro  wrote:
> I am sure this requires changes for various LLVM versions.  I tested
> it with LLVM 14 on a Mac where I've never managed to reproduce the
> original complaint, but ... h, this might be exacerbated by ASLR,
> and macOS only has a small ALSR slide window (16M or 256M apparently,
> according to me in another thread), so I'd probably have to interpose
> my own mmap() to choose some more interesting addresses, or run some
> other OS, but that's quite enough rabbit holes for one morning.

I've tested the patch. I had to make sure the issue was triggered on
master first. The issue didn't happen with 4GB shared_buffers and 64
partitions. However, increasing to 6GB and 128 partitions triggered
the issue.

The architecture check in the patch was incorrect (__arch64__ instead
of __aarch64__, glad to see I'm not the only one being confused with
aarch64 and arm64 :)) but once fixed, it worked and avoided the
segfault.

I've run some additional tests to try to test different parameters:
- I've tried disabling randomize_va_space, the issue still happened
even with ASLR disabled.
- I've tested different PG versions. With 14 and 15, 4GB and 64
partitions were enough. Starting PG 16, I had to increase
shared_buffers to 6GB and partitions to 128. I've been able to trigger
the issue on all versions from 14 to master (which was expected but I
wanted confirmation)
- I haven't been able to reproduce this on a macOS either. I've tried
to remove MemGroup.Near hint so mmap addresses would be more random
and played with different shared_buffers and partition values without
success

I've modified the patch with 3 changes:
- meson.build was using SectionMemoryManager.cpp file name, I've
replaced with SafeSectionMemoryManager.cpp
- Use __aarch64__ instead of __arch64__
- Moved the architecture switch to llvm_create_object_layer and go
through the normal
LLVMOrcCreateRTDyldObjectLinkingLayerWithSectionMemoryManager on non
arm64 architectures. There's no need to use the custom memory manager
for non arm64 so it looked better to avoid it entirely if there's no
need for the reserve allocation.


v3-0001-XXX-LLVM-ARM-relocation-bug-mitigation.patch
Description: Binary data


Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-27 Thread Anthonin Bonnefoy
On Tue, Aug 27, 2024 at 12:01 PM Thomas Munro  wrote:
>
> Thanks!  And that's great news.  Do you want to report this experience
> to the PR, in support of committing it?  That'd make it seem easier to
> consider shipping a back-ported copy...

Yes, I will do that.




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-29 Thread Anthonin Bonnefoy
On Wed, Aug 28, 2024 at 12:24 AM Thomas Munro  wrote:
> 2.  I tested against LLVM 10-18, and found that 10 and 11 lack some
> needed symbols.  So I just hid this code from them.  Even though our
> stable branches support those and even older versions, I am not sure
> if it's worth trying to do something about that for EOL'd distros that
> no one has ever complained about.  I am willing to try harder if
> someone thinks that's important...

I would also assume that people using arm64 are more likely to use
recent versions than not.

I've done some additional tests on different LLVM versions with both
the unpatched version (to make sure the crash was triggered) and the
patched version. I'm joining the test scripts I've used as reference.
They target a kubernetes pod since it was the easiest way for me to
get a test ubuntu Jammy:
- setup_pod.sh: Install necessary packages, get multiple llvm
versions, fetch and compile master and patched version of postgres on
different LLVM version
- run_test.sh: go through all LLVM versions for both unpatched and
patched postgres to run the test_script.sh
- test_script.sh: ran inside the pod to setup the db with the
necessary tables and check if the crash happens

This generated the following output:
Test unpatched version on LLVM 19, : Crash triggered
Test unpatched version on LLVM 18, libLLVM-18.so.18.1: Crash triggered
Test unpatched version on LLVM 17, libLLVM-17.so.1: Crash triggered
Test unpatched version on LLVM 16, libLLVM-16.so.1: Crash triggered
Test unpatched version on LLVM 15, libLLVM-15.so.1: Crash triggered
Test unpatched version on LLVM 14, libLLVM-14.so.1: Crash triggered
Test unpatched version on LLVM 13, libLLVM-13.so.1: Crash triggered

Test patched version on LLVM 19, : Query ran successfully
Test patched version on LLVM 18, libLLVM-18.so.18.1: Query ran successfully
Test patched version on LLVM 17, libLLVM-17.so.1: Query ran successfully
Test patched version on LLVM 16, libLLVM-16.so.1: Query ran successfully
Test patched version on LLVM 15, libLLVM-15.so.1: Query ran successfully
Test patched version on LLVM 14, libLLVM-14.so.1: Query ran successfully
Test patched version on LLVM 13, libLLVM-13.so.1: Query ran successfully

I try to print the libLLVM linked to llvm.jit in the output to double
check whether I test on the correct version. The LLVM 19 package only
provides static libraries (probably because it's still a release
candidate?) so it shows as empty in the output. There was no LLVM 12
available when using the llvm.sh script so I couldn't test it. As for
the result, prepatch PG all crashed as expected while the patched
version was able to run the query successfully.

> Next, I think we should wait to see if the LLVM project commits that
> PR, this so that we can sync with their 19.x stable branch, instead of
> using code from a PR.  Our next minor release is in November, so we
> have some time.  If they don't commit it, we can consider it anyway: I
> mean, it's crashing all over the place in production, and we see that
> other projects are shipping this code already.

The PR[1] just received an approval and it sounds like they are ok to
eventually merge it.

[1] https://github.com/llvm/llvm-project/pull/71968
#!/bin/bash

# Don't want coredumps
ulimit -c 0

# Stop running PG
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data stop &>> /dev/null

# Clean db_data
rm -rf /var/lib/postgresql/db_data

# Create new dbdata
/var/lib/postgresql/.local/bin/initdb -D /var/lib/postgresql/db_data -Apeer > /dev/null
echo "port = 5432
shared_buffers='4GB'
" > /var/lib/postgresql/db_data/postgresql.conf

# Start and setup test partitioned table
/var/lib/postgresql/.local/bin/pg_ctl -D /var/lib/postgresql/db_data start -l /tmp/pg.log > /dev/null
/var/lib/postgresql/.local/bin/pgbench -i --partitions=128 2> /dev/null

# Run the query to trigger the issue
/var/lib/postgresql/.local/bin/psql options=-cjit_above_cost=0 -c 'SELECT count(bid) from pgbench_accounts;' &>> /dev/null
exit_code=$?

if (( $exit_code == 2 )); then
echo "Crash triggered"
elif (( $exit_code == 0 )); then
echo "Query ran successfully"
else
echo "Other issue while running the query"
fi
#!/bin/bash
set -eu

TEST_POD=$(kubectl get pod -l app="test_pg" -nsre -o=jsonpath='{.items[*].metadata.name}')
LLVM_VERSIONS=(19 18 17 16 15 14 13)

get_llvm_version()
{
kubectl exec "$pod" -- bash -c "ldd /var/lib/postgresql/.local/lib/llvmjit.so | grep libLLVM | awk '{print \$1}'"
}

run_test_on_pod()
{
local pod="$1"

kubectl cp test_script.sh "$pod":/var/lib/postgresql/

for version in ${LLVM_VERSIONS[@]}; do
build_dir="~/postgres_build/build_${version}"
kubectl exec "$pod" -- bash -c "su postgres -c \" cd $build_dir; make -j19 install &>> /dev/null \" "

echo -n "Test unpatched version on LLVM $version, $(get_llvm_version): "
kubectl exec "$pod" -- bash -c "su postgres -c \" cd; bash test_script.sh \" "
done

for version in 

Re: Set query_id for query contained in utility statement

2024-08-30 Thread Anthonin Bonnefoy
On Tue, Aug 27, 2024 at 11:14 AM jian he  wrote:
> also it's ok to use passed (ParseState *pstate) for
> {
> estate = CreateExecutorState();
> estate->es_param_list_info = params;
> paramLI = EvaluateParams(pstate, entry, execstmt->params, estate);
> }
> ?
> I really don't know.
>
> some of the change is refactoring, maybe you can put it into a separate patch.

Thanks for the review. I think the parser state is mostly used for the
error callbacks and parser_errposition but I'm not 100% sure. Either
way, you're right and it probably shouldn't be in the patch. I've
modified the patch to restrict the changes to only add the necessary
query jumble and post parse hook calls.


v4-0001-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-08-30 Thread Anthonin Bonnefoy
I created a commitfest entry[1] to have the CI test the patch. There
was a failure in headerscheck and cpluspluscheck when the include of
SectionMemoryManager.h is checked[2]

In file included from /usr/include/llvm/ADT/SmallVector.h:18,
from /tmp/cirrus-ci-build/src/include/jit/SectionMemoryManager.h:23,
from /tmp/headerscheck.4b1i5C/test.c:2:
/usr/include/llvm/Support/type_traits.h:17:10: fatal error:
type_traits: No such file or directory
   17 | #include <type_traits>

Since SmallVector.h includes type_traits, this file can't be
compiled with a C compiler, so I've just excluded it from headerscheck.

Loosely related to headerscheck, running it locally was failing as it
couldn't find the llvm include files. This is because headerscheck
expects llvm include files to be in /usr/include and doesn't rely on
llvm-config. I created a second patch to use the LLVM_CPPFLAGS as
extra flags when testing the src/include/jit/* files.

Lastly, I've used a www.github.com link instead of github.com to stop
spamming the llvm-project's PR with references to the commit every time
it is pushed somewhere (which seems to be the unofficial hack[3]).

[1] https://commitfest.postgresql.org/49/5220/
[2] 
https://cirrus-ci.com/task/4646639124611072?logs=headers_headerscheck#L42-L46
[3] 
https://github.com/orgs/community/discussions/23123#discussioncomment-3239240


v6-0002-Add-LLVM_CPPFLAGS-in-headerscheck-to-llvm-jit-fil.patch
Description: Binary data


v6-0001-Backport-of-LLVM-code-to-fix-ARM-relocation-bug.patch
Description: Binary data


Re: Possible incorrect row estimation for Gather paths

2024-07-16 Thread Anthonin Bonnefoy
Hi Rafia,

Thanks for reviewing.

On Wed, Jul 10, 2024 at 4:54 PM Rafia Sabih  wrote:
> But I don't quite understood the purpose of this,
> + total_groups = input_rel->rows;
> +
> + /*
> + * If the number of rows is unknown, fallback to gather rows
> + * estimation
> + */
> + if (total_groups == 0)
> + total_groups = gather_rows_estimate(input_path);
>
> why not just use total_groups = gather_rows_estimate(input_path), what
> is the importance of having total_groups = input_rel->rows?

The initial goal was to use the source tuples if available and avoid
possible rounding errors, though I realise that the difference would
be minimal. For example, 200K tuples and 3 workers would yield
int(int(200000 / 2.4) * 2.4)=199999. That is probably not worth the
additional complexity, so I've updated the patch to just use
gather_rows_estimate.

> With respect to the change introduced by the patch in the regression
> test, I wonder if we should test it on the tables of a larger scale
> and check for performance issues.
> Imagine the case when Parallel Seq Scan on extremely_skewed s
> (cost=0.00..167.01 rows=1 width=4) returns 1000 rows instead of 1 ...
> I wonder which plan would perform better then or will there be a
> totally different plan.
For the extremely_skewed test, having the parallel seq scan return
more rows won't impact the Gather since the Hash Join will reduce the
number of rows to 1. I've found an example where we can see the plan
changes with the default settings:

CREATE TABLE simple (id SERIAL PRIMARY KEY, other bigint);
insert into simple select generate_series(1,500000), ceil(random()*100);
analyze simple;
EXPLAIN SELECT * FROM simple where other < 10 order by id;

Unpatched:
 Gather Merge  (cost=9377.85..12498.60 rows=27137 width=12)
   Workers Planned: 1
   ->  Sort  (cost=8377.84..8445.68 rows=27137 width=12)
 Sort Key: id
 ->  Parallel Seq Scan on simple  (cost=0.00..6379.47
rows=27137 width=12)
   Filter: (other < 10)

Patched:
 Sort  (cost=12381.73..12492.77 rows=44417 width=12)
   Sort Key: id
   ->  Seq Scan on simple  (cost=0.00..8953.00 rows=44417 width=12)
 Filter: (other < 10)

Looking at the candidates, the previous Gather Merge now has an
estimated number of rows of 44418. The 1 difference compared to the
other Gather Merge plan is due to rounding (26128 * 1.7 = 44417.6).

 Plan 3
   ->  Gather Merge  (cost=9296.40..14358.75 rows=44418 width=12)
 Workers Planned: 1
 ->  Sort  (cost=8296.39..8361.71 rows=26128 width=12)
   Sort Key: id
   ->  Parallel Seq Scan on simple  (cost=0.00..6379.47
rows=26128 width=12)
 Filter: (other < 10)
 Plan 4
   ->  Gather Merge  (cost=9296.40..14358.63 rows=44417 width=12)
 Workers Planned: 1
 ->  Sort  (cost=8296.39..8361.71 rows=26128 width=12)
   Sort Key: id
   ->  Parallel Seq Scan on simple  (cost=0.00..6379.47
rows=26128 width=12)
 Filter: (other < 10)
 Plan 5
   ->  Sort  (cost=12381.73..12492.77 rows=44417 width=12)
 Sort Key: id
 ->  Seq Scan on simple  (cost=0.00..8953.00 rows=44417 width=12)
   Filter: (other < 10)

The Sort plan is slightly slower than the Gather Merge plan: 100ms
average against 83ms but the Gather Merge comes at the additional cost
of creating and using a parallel worker. The unpatched row estimation
makes the parallel plan look cheaper and running a parallel query for
a 17ms improvement doesn't seem like a good trade.
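
One way to compare the two plans by hand is to toggle parallelism on an
unpatched build (a sketch, reusing the simple table from above):

SET max_parallel_workers_per_gather = 0;   -- forces the serial Sort plan
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM simple WHERE other < 10 ORDER BY id;
SET max_parallel_workers_per_gather = 1;   -- the unpatched planner picks Gather Merge here
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM simple WHERE other < 10 ORDER BY id;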

I've also realised from the comments in optimizer.h that
nodes/pathnodes.h should not be included there and fixed it.

Regards,
Anthonin


v2-0001-Remove-unnecessary-assignment-of-path-rows-in-gat.patch
Description: Binary data


v2-0002-Fix-row-estimation-in-gather-paths.patch
Description: Binary data


Re: query_id, pg_stat_activity, extended query protocol

2024-07-17 Thread Anthonin Bonnefoy
Hi,

Wouldn't it be enough to call pgstat_report_query_id in ExecutorRun
and ProcessUtility? With those changes [1], both normal statements and
utility statements called through extended protocol will correctly
report the query_id.

-- Test utility statement with extended protocol
show all \bind \g

-- Check reported query_id
select query, query_id from pg_stat_activity where
application_name ='psql' and pid!=pg_backend_pid();
   query   |  query_id
---+-
 show all  | -866221123969716490

[1] 
https://github.com/bonnefoa/postgres/commit/bf4b332d7b481549c6d9cfa70db51e39a305b9b2

Regards,
Anthonin




Correctly propagate queryId for utility stmt in function

2024-07-18 Thread Anthonin Bonnefoy
Hi,

For utility statements defined within a function, the queryTree is
copied to a plannedStmt as utility commands don't require planning.
However, the queryId is not propagated to the plannedStmt. This leads
to plugins relying on queryId like pg_stat_statements to not be able
to track utility statements within function calls.

This patch fixes the issue by correctly propagating queryId from the
cached queryTree to the plannedStmt.

Regards,
Anthonin
From b70163d1bd1279ca3f9b29166431cc64d25ca586 Mon Sep 17 00:00:00 2001
From: Anthonin Bonnefoy 
Date: Thu, 18 Jul 2024 11:50:37 +0200
Subject: Correctly propagate queryId for utility stmt in function

For utility statements defined within a function, the queryTree is
copied to a plannedStmt as utility commands don't require planning.
However, the queryId is not propagated to the plannedStmt. This leads to
plugins relying on queryId like pg_stat_statements to not be able to
track utility statements within function calls.

This patch fixes the issue by correctly propagating queryId from the
cached queryTree to the plannedStmt.
---
 .../expected/level_tracking.out   | 28 +++
 .../pg_stat_statements/sql/level_tracking.sql | 12 
 src/backend/executor/functions.c  |  1 +
 3 files changed, 41 insertions(+)

diff --git a/contrib/pg_stat_statements/expected/level_tracking.out b/contrib/pg_stat_statements/expected/level_tracking.out
index fe01dd79af..f7b2ffceb3 100644
--- a/contrib/pg_stat_statements/expected/level_tracking.out
+++ b/contrib/pg_stat_statements/expected/level_tracking.out
@@ -67,6 +67,34 @@ SELECT toplevel, calls, query FROM pg_stat_statements
  t| 1 | SET pg_stat_statements.track = 'all'
 (7 rows)
 
+-- Procedure with multiple utility statements - all level tracking.
+CREATE OR REPLACE PROCEDURE proc_with_utility_stmt()
+LANGUAGE SQL
+AS $$
+SHOW pg_stat_statements.track;
+SHOW pg_stat_statements.track_utility;
+$$;
+CALL proc_with_utility_stmt();
+SELECT toplevel, calls, query FROM pg_stat_statements
+  WHERE query LIKE '%SHOW%' ORDER BY query COLLATE "C", toplevel;
+ toplevel | calls |query 
+--+---+--
+ t| 1 | CREATE OR REPLACE PROCEDURE proc_with_utility_stmt()+
+  |   | LANGUAGE SQL+
+  |   | AS $$   +
+  |   | SHOW pg_stat_statements.track;  +
+  |   | SHOW pg_stat_statements.track_utility;  +
+  |   | $$
+ f| 1 | SHOW pg_stat_statements.track
+ f| 1 | SHOW pg_stat_statements.track_utility
+(3 rows)
+
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+ t 
+---
+ t
+(1 row)
+
 -- DO block - top-level tracking without utility.
 SET pg_stat_statements.track = 'top';
 SET pg_stat_statements.track_utility = FALSE;
diff --git a/contrib/pg_stat_statements/sql/level_tracking.sql b/contrib/pg_stat_statements/sql/level_tracking.sql
index aa37408d52..115ff392a0 100644
--- a/contrib/pg_stat_statements/sql/level_tracking.sql
+++ b/contrib/pg_stat_statements/sql/level_tracking.sql
@@ -33,6 +33,18 @@ END; $$;
 SELECT toplevel, calls, query FROM pg_stat_statements
   ORDER BY query COLLATE "C", toplevel;
 
+-- Procedure with multiple utility statements - all level tracking.
+CREATE OR REPLACE PROCEDURE proc_with_utility_stmt()
+LANGUAGE SQL
+AS $$
+SHOW pg_stat_statements.track;
+SHOW pg_stat_statements.track_utility;
+$$;
+CALL proc_with_utility_stmt();
+SELECT toplevel, calls, query FROM pg_stat_statements
+  WHERE query LIKE '%SHOW%' ORDER BY query COLLATE "C", toplevel;
+SELECT pg_stat_statements_reset() IS NOT NULL AS t;
+
 -- DO block - top-level tracking without utility.
 SET pg_stat_statements.track = 'top';
 SET pg_stat_statements.track_utility = FALSE;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 539cd0a999..692854e2b3 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -492,6 +492,7 @@ init_execution_state(List *queryTree_list,
 stmt->utilityStmt = queryTree->utilityStmt;
 stmt->stmt_location = queryTree->stmt_location;
 stmt->stmt_len = queryTree->stmt_len;
+stmt->queryId = queryTree->queryId;
 			}
 			else
 stmt = pg_plan_query(queryTree,
-- 
2.39.3 (Apple Git-146)



Re: Possible incorrect row estimation for Gather paths

2024-07-22 Thread Anthonin Bonnefoy
On Wed, Jul 17, 2024 at 3:59 AM Richard Guo  wrote:
>
> I can reproduce this problem with the query below.
>
> explain (costs on) select * from tenk1 order by twenty;
>QUERY PLAN
> -
>  Gather Merge  (cost=772.11..830.93 rows=5882 width=244)
>Workers Planned: 1
>->  Sort  (cost=772.10..786.80 rows=5882 width=244)
>  Sort Key: twenty
>  ->  Parallel Seq Scan on tenk1  (cost=0.00..403.82 rows=5882 
> width=244)
> (5 rows)
I was looking for a test to add in the regress checks that wouldn't
rely on explain cost since it is disabled. However, I've realised I
could do something similar to what's done in stats_ext with the
check_estimated_rows function. I've added the get_estimated_rows test
function that extracts the estimated rows from the top node and uses
it to check the gather nodes' estimates. get_estimated_rows uses a
simple explain compared to check_estimated_rows which relies on an
explain analyze.

> I wonder if the changes in create_ordered_paths should also be reduced
> to 'total_groups = gather_rows_estimate(path);'.

It should already be the case with the patch v2. It does create
rounding errors that are visible in the tests but they shouldn't
exceed +1 or -1.

> I think perhaps it's better to declare gather_rows_estimate() in
> cost.h rather than optimizer.h.
> (BTW, I wonder if compute_gather_rows() would be a better name?)

Good point, I've moved the declaration and renamed it.

> I noticed another issue in generate_useful_gather_paths() -- *rowsp
> would have a random value if override_rows is true and we use
> incremental sort for gather merge.  I think we should fix this too.

That seems to be the case. I've tried to find a query that could
trigger this codepath without success. All grouping and distinct paths
I've tried where fully sorted and didn't trigger an incremental sort.
I will need a bit more time to check this.

In the meantime, I've updated the patches with the review comments.

Regards,
Anthonin


v3-0001-Remove-unnecessary-assignment-of-path-rows-in-gat.patch
Description: Binary data


v3-0002-Fix-row-estimation-in-gather-paths.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-24 Thread Anthonin Bonnefoy
On Mon, Jul 22, 2024 at 10:59 PM Masahiko Sawada  wrote:
> The first line would vary depending on whether an autovacuum worker or
> not. And the above suggestion includes a change of term "row" to
> "tuple" for better consistency with VACUUM VERBOSE outputs. I think it
> would be great if autoanalyze also writes logs in the same format.
> IIUC with the patch, autoanalyze logs don't include the page and tuple
> statistics.

One issue is that the number of scanned pages, live tuples and dead
tuples is only available in acquire_sample_rows which is where the log
containing those stats is emitted. I've tried to implement the
following in 0003:
- Sampling functions now accept an AcquireSampleStats struct to store
pages and tuples stats
- Log is removed from sampling functions
- do_analyze_rel now outputs scanned and tuples statistics when
relevant. sampling from fdw doesn't provide those statistics so they
are not displayed in those cases.

This ensures that analyze logs are only emitted in do_analyze_rel,
allowing to display the same output for both autoanalyze and ANALYZE
VERBOSE. With those changes, we have the following output for analyze
verbose of a table:

analyze (verbose) pgbench_accounts ;
INFO:  analyzing "public.pgbench_accounts"
INFO:  analyze of table "postgres.public.pgbench_accounts"
pages: 1640 of 1640 scanned
tuples: 100000 live tuples, 0 are dead; 30000 tuples in samples,
100000 estimated total tuples
avg read rate: 174.395 MB/s, avg write rate: 0.000 MB/s
buffer usage: 285 hits, 1384 reads, 0 dirtied
WAL usage: 14 records, 0 full page images, 1343 bytes
system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s

For a file_fdw, the output will look like:

analyze (verbose) pglog;
INFO:  analyzing "public.pglog"
INFO:  analyze of table "postgres.public.pglog"
tuples: 30000 tuples in samples, 60042 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 182 hits, 0 reads, 0 dirtied
WAL usage: 118 records, 0 full page images, 13086 bytes
system usage: CPU: user: 0.40 s, system: 0.00 s, elapsed: 0.41 s
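
In case someone wants to reproduce the fdw case, a quick setup could look
like this (a sketch with made-up file and table names; writing the csv
server-side requires a superuser or a pg_write_server_files member):

CREATE EXTENSION file_fdw;
CREATE SERVER csv_srv FOREIGN DATA WRAPPER file_fdw;
COPY (SELECT g AS id, md5(g::text) AS val FROM generate_series(1, 60000) g)
  TO '/tmp/pglog.csv' WITH (FORMAT csv);
CREATE FOREIGN TABLE pglog (id int, val text)
  SERVER csv_srv OPTIONS (filename '/tmp/pglog.csv', format 'csv');
ANALYZE (VERBOSE) pglog;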

I've also slightly modified 0002 to display "automatic analyze" when
we're inside an autovacuum worker, similar to what's done with vacuum
output.

Regards,
Anthonin


v3-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
Description: Binary data


v3-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v3-0003-Output-pages-and-tuples-stats-in-ANALYZE-VERBOSE.patch
Description: Binary data


Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind

2024-07-24 Thread Anthonin Bonnefoy
On Wed, Jul 24, 2024 at 05:33:07PM +0200, Peter Eisentraut wrote:
> This commit message confused me, because I don't think this is what the
> \bindx command actually does.  AFAICT, it only binds, it does not execute.
> At least that is what the documentation in the content of the patch appears
> to indicate.

Unless I misunderstand the remark, \bindx will call
PQsendQueryPrepared which will bind then execute the query, similar to
what \bind is doing (except \bind also parses the query).

> I'm not sure \bindx is such a great name.  The "x" stands for "I ran out of
> ideas". ;-)

That's definitely what happened :). \bind would have been a better fit
but it was already used.

On Thu, Jul 25, 2024 at 4:19 AM Michael Paquier  wrote:
> Not sure that I like much the additional option embedded in the
> existing command; I'd rather keep a separate command for each libpq
> call, that seems cleaner.  So I would be OK with your suggested
> \bind_named.  Fine by me to be outvoted, of course.

+1 keeping this as a separate command and using \bind_named. \bind has
a different behaviour as it also parses the query so keeping them as
separate commands would probably avoid some confusion.




Re: query_id, pg_stat_activity, extended query protocol

2024-07-26 Thread Anthonin Bonnefoy
On Thu, Jul 18, 2024 at 10:56 AM Michael Paquier  wrote:
> Please attach things to your emails, if your repository disappears for
> a reason or another we would lose knowledge in the archives of the
> community lists.

Noted and thanks for the reminder, I'm still learning about mailing
list etiquette.

> I have not looked at that entirely in details, and I'd need to check
> if it is possible to use what's here for more predictible tests:
> https://www.postgresql.org/message-id/ZqCMCS4HUshUYjGc%40paquier.xyz

For the tests, there are limited possibilities to check whether a
query_id has been set correctly.
- Checking pg_stat_activity is not possible in the regress tests as
you need a second session to check the reported query_id.
- pg_stat_statements can be used indirectly but you're limited to how
pgss uses query_id. For example, it doesn't rely on queryId in
ExecutorRun.

A possible solution I've been thinking of is to use a test module. The
module will assert on whether the queryId is set or not in parse, plan
and executor hooks. It will also check if the queryId reported in
pgstat matches the queryId at the root level.

This allows us to check that the queryId is correctly set with the
extended protocol. I've also found some queries which will trigger a
failure (ctas and cursor usage) though this is probably a different
issue from the extended protocol issue.

Regards,
Anthonin


0001-Create-module-for-query_id-test.patch
Description: Binary data


Add ALL_CANDIDATES option to EXPLAIN

2024-07-26 Thread Anthonin Bonnefoy
Hi,

I have a prototype for an ALL_CANDIDATES option for EXPLAIN. The goal
of this option is to print all plan candidates instead of only the
cheapest plan. It will output the plans from the most expensive at the
top to the cheapest. Here's an example:

explain (all_candidates) select * from pgbench_accounts where aid=1;
 QUERY PLAN
-
 Plan 1
   ->  Gather  (cost=1000.00..3375.39 rows=1 width=97)
 Workers Planned: 1
 ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..2375.29 rows=1 width=97)
   Filter: (aid = 1)
 Plan 2
   ->  Seq Scan on pgbench_accounts  (cost=0.00..2890.00 rows=1 width=97)
 Filter: (aid = 1)
 Plan 3
   ->  Bitmap Heap Scan on pgbench_accounts  (cost=4.30..8.31 rows=1 width=97)
 Recheck Cond: (aid = 1)
 ->  Bitmap Index Scan on pgbench_accounts_pkey
(cost=0.00..4.30 rows=1 width=0)
   Index Cond: (aid = 1)
 Plan 4
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts
(cost=0.29..8.31 rows=1 width=97)
 Index Cond: (aid = 1)

This can provide very useful insight on the planner's decisions like
whether it tried to use a specific index and how much cost difference
there is with the top plan. Additionally, it makes it possible to spot
discrepancies in generated plans like incorrect row estimation [1].

The plan list is generated from the upper_rel's pathlist. However, due
to how planning mutates the PlannerGlobal state, we can't directly
iterate the path list generated by the subquery_planner and create a
planned statement for them. Instead, we need to plan from scratch for
every path in the pathlist to generate the list of PlannedStmt.

The patch is split in two mostly to ease the review:
001: Propagate the PlannerInfo root to add_path. This is needed to prevent
add_path from discarding paths when all_candidates is enabled (the flag is
stored in PlannerGlobal).
002: Add the planner_all_candidates function and print the candidate
list in explain

[1] 
https://www.postgresql.org/message-id/flat/CAO6_Xqr9+51NxgO=XospEkUeAg-p=ejawmtpdczwjrggkj5...@mail.gmail.com

Regards,
Anthonin


v1-0002-Add-ALL_CANDIDATES-explain-option.patch
Description: Binary data


v1-0001-Propagate-root-PlannerInfo-to-add_path.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-29 Thread Anthonin Bonnefoy
On Sat, Jul 27, 2024 at 12:35 AM Masahiko Sawada  wrote:
> An alternative idea would
> be to pass StringInfo to AcquireSampleRowsFunc() so that callback can
> write its messages there. This is somewhat similar to what we do in
> the EXPLAIN command (cf, ExplainPropertyText() etc). It could be too
> much but I think it could be better than writing logs in the single
> format.

I've tested this approach, it definitely looks better. I've added a
logbuf StringInfo to AcquireSampleRowsFunc which will receive the
logs. elevel was removed as it is not used anymore. Since everything
is in the same log line, I've removed the relation name in the acquire
sample functions.

For partitioned tables, I've also added the processed partition table
being sampled. The output will look like:

INFO:  analyze of table "postgres.public.test_partition"
Sampling rows from child "public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
Sampling rows from child "public.test_partition_2"
pages: 5 of 5 scanned
tuples: 1000 live tuples, 0 are dead; 1000 tuples in sample, 1000
estimated total tuples
avg read rate: 2.604 MB/s, avg write rate: 0.000 MB/s
...

For a file_fdw, the output will be:

INFO:  analyze of table "postgres.public.pglog"
tuples: 60043 tuples; 30000 tuples in sample
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...

Regards,
Anthonin


v4-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
Description: Binary data


v4-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v4-0003-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-30 Thread Anthonin Bonnefoy
Hi,

On Tue, Jul 30, 2024 at 1:13 AM Masahiko Sawada  wrote:
> I have one comment on 0001 patch:
> The comment should also be updated or removed.

Ok, I've removed the comment.

> However, as the above comment says, delay_in_ms can have a duration up
> to 25 days. I guess it would not be a problem for analyze cases but
> could be in vacuum cases as vacuum could sometimes be running for a
> very long time. I've seen vacuums running even for almost 1 month. So
> I think we should keep this part.

Good point, I've reverted to using TimestampDifference for vacuum.

>  /* measure elapsed time iff autovacuum logging requires it */
> -if (AmAutoVacuumWorkerProcess() && params->log_min_duration >= 0)
> +if (instrument)
>
> The comment should also be updated.

Updated.

> Could you split the 0002 patch into two patches? One is to have
> ANALYZE command (with VERBOSE option) write the buffer usage, and
> second one is to add WAL usage to both ANALYZE command and
> autoanalyze. I think adding WAL usage to ANALYZE could be
> controversial as it should not be WAL-intensive operation, so I'd like
> to keep them separate.

I've split the patch, 0002 makes verbose outputs the same as
autoanalyze logs with buffer/io/system while 0003 adds WAL usage
output.

> It seems to me that the current style is more concise and readable (3
> rows per table vs. 1 row per table). We might need to consider a
> better output format for partitioned tables as the number of
> partitions could be high. I don't have a good idea now, though.

A possible change would be to pass an inh flag when an acquirefunc is
called from acquire_inherited_sample_rows. The acquirefunc could then
use an alternative log format to append to logbuf. This way, we could
have a more compact format for partitioned tables.

Regards,
Anthonin


v5-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patch
Description: Binary data


v5-0002-Output-analyze-details-on-analyze-verbose.patch
Description: Binary data


v5-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v5-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-31 Thread Anthonin Bonnefoy
On Tue, Jul 30, 2024 at 9:21 AM Anthonin Bonnefoy
 wrote:
> A possible change would be to pass an inh flag when an acquirefunc is
> called from acquire_inherited_sample_rows. The acquirefunc could then
> use an alternative log format to append to logbuf. This way, we could
> have a more compact format for partitioned tables.

I've just tested this, the result isn't great as it creates an
inconsistent output

INFO:  analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 rows in sample, 999 estimated total rows
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 rows in sample, 1000 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
...
INFO:  analyze of table "postgres.public.test_partition_1"
pages: 5 of 5 scanned
tuples: 999 live tuples, 0 are dead; 999 tuples in sample, 999
estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

Maybe the best approach is to always use the compact form?

INFO:  analyze of table "postgres.public.test_partition"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
"test_partition_2": scanned 5 of 5 pages, containing 1000 live tuples
and 0 dead tuples; 1000 tuples in sample, 1000 estimated total tuples
avg read rate: 1.953 MB/s, avg write rate: 0.000 MB/s
...
INFO:  analyze of table "postgres.public.test_partition_1"
"test_partition_1": scanned 5 of 5 pages, containing 999 live tuples
and 0 dead tuples; 999 tuples in sample, 999 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s

I've updated the patchset with those changes. 0004 introduces the
StringInfo logbuf so we can output logs as a single log and during
ANALYZE VERBOSE while using the compact form.

Regards,
Anthonin


v6-0004-Pass-StringInfo-logbuf-to-AcquireSampleFunc.patch
Description: Binary data


v6-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v6-0002-Output-analyze-details-on-analyze-verbose.patch
Description: Binary data


v6-0003-Output-wal-usage-of-analyze-in-verbose-output-and.patch
Description: Binary data


Re: Use pgBufferUsage for block reporting in analyze

2024-07-31 Thread Anthonin Bonnefoy
On Wed, Jul 31, 2024 at 9:36 PM Masahiko Sawada  wrote:
> Meanwhile, I think we can push 0001 and 0002 patches since they are in
> good shape. I've updated commit messages to them and slightly changed
> 0002 patch to write "finished analyzing of table \"%s.%s.%s\" instead
> of  "analyze of table \"%s.%s.%s\".

Wouldn't it make sense to do the same for autoanalyze and write
"finished automatic analyze of table \"%s.%s.%s\"\n" instead of
"automatic analyze of table \"%s.%s.%s\"\n"?

> Also, regarding 0003 patch, what is the main reason why we want to add
> WAL usage to analyze reports? I think that analyze normally does not
> write WAL records much so I'm not sure it's going to provide a good
> insight for users.

There was no strong reason except for consistency with VACUUM VERBOSE
output. But as you said, it's not really providing valuable
information so it's probably better to keep the noise down and drop
it.

Regards,
Anthonin Bonnefoy




Set query_id for query contained in utility statement

2024-08-02 Thread Anthonin Bonnefoy
Hi all,

Some utility statements like Explain, CreateTableAs and DeclareCursor
contain a query which will be planned and executed. During post parse,
only the top utility statement is jumbled, leaving the contained query
without a query_id set. Explain does jumble the contained query in
ExplainQuery, but CreateTableAs and DeclareCursor were left with an
unset query_id.

This prevents extensions relying on query_id, like pg_stat_statements,
from tracking those nested queries as their query_id was 0.
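
A way to see the difference from pg_stat_statements' point of view (a
sketch, assuming the extension is installed and track = 'all'):

SET pg_stat_statements.track = 'all';
SELECT pg_stat_statements_reset();
EXPLAIN (COSTS OFF) SELECT 1;
CREATE TABLE ctas_t AS SELECT 1 AS a;
SELECT queryid, toplevel, query FROM pg_stat_statements ORDER BY query;
-- without the patch, EXPLAIN's inner SELECT shows up as a nested entry
-- while the SELECT behind the CTAS is missing since its query_id stayed 0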

This patch fixes this by recursively jumbling queries contained in
those utility statements during post_parse, setting the query_id for
those contained queries and removing the need for ExplainQuery to do
it for the Explain statements.

Regards,
Anthonin Bonnefoy


v1-0001-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-08-05 Thread Anthonin Bonnefoy
I've realised my initial approach was wrong: calling the post parse
hook for all nested queries in analyze.c prevents extensions like pgss
from correctly tracking the query's nesting level.

I've changed the approach to replicate what's done in ExplainQuery to
both CreateTableAs and DeclareCursor: Jumble the query contained by
the utility statement and call the post parse hook before it is
planned or executed. Additionally, explain's nested query can itself
be a CreateTableAs or DeclareCursor which also needs to be jumbled.
The issue is visible when explaining a CreateTableAs or DeclareCursor
query: the queryId is missing despite the verbose option.

EXPLAIN (verbose) create table test_t as select 1;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1

Post patch, the query id is correctly displayed.

EXPLAIN (verbose) create table test_t as select 1;
QUERY PLAN
--
 Result  (cost=0.00..0.01 rows=1 width=4)
   Output: 1
 Query Identifier: 2800308901962295548

Regards,
Anthonin Bonnefoy


v2-0001-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-10-08 Thread Anthonin Bonnefoy
On Mon, Oct 7, 2024 at 7:39 AM Michael Paquier  wrote:
> GOod point, this is confusing.  The point is that having only
> stmt_location is not enough to detect where the element in the query
> you want to track is because it only points at its start location in
> the full query string.  In an ideal world, what we should have is its
> start and end, pass it down to pgss_store(), and store only this
> subquery between the start and end positions in the stats entry.
> Making that right through the parser may be challenging, though.

One of the issues is that we don't track the length in the parser,
only location[1]. The only place we can have some information about
the statement length (or at least, the location of the ';') is for
multi statement query.

On Mon, Oct 7, 2024 at 6:17 PM jian he  wrote:
> turns out UPDATE/DELETE/MERGE and other utilities stmt cannot have
> arbitrary parenthesis with EXPLAIN.

Yes, it is also possible to get the length of the Select statement
within parenthesis through the parser by using the location of ')' for
the select_no_parens.

> the main gotcha is to add location information for the statement that
> is being explained.

I've found that there are other possible issues with not having the
statement length and including the opening parenthesis won't be
enough. On HEAD, we have the following:

explain(verbose) SELECT 1, 2, 3\; explain SELECT 1, 2, 3, 4;
SELECT toplevel, query FROM pg_stat_statements
ORDER BY toplevel desc, query;
 toplevel |  query
--+-
 t| SELECT pg_stat_statements_reset() IS NOT NULL AS t
 t| explain SELECT $1, $2, $3, $4
 t| explain(verbose) SELECT $1, $2, $3
 f| explain(verbose) SELECT $1, $2, $3; explain SELECT 1, 2, 3, 4;
 f| explain(verbose) SELECT 1, 2, 3; explain SELECT $1, $2, $3, $4;

The nested statement will have the whole query string. To fix this, we
need to propagate the statement length from the RawStmt (probably
using the ParserState?) and adjust the nested query's location and
length when the statement is transformed. I'm still working on the
details and edge cases on this.

[1]: 
https://github.com/postgres/postgres/blob/REL_17_STABLE/src/backend/parser/gram.y#L69-L79




Re: Set query_id for query contained in utility statement

2024-10-04 Thread Anthonin Bonnefoy
On Wed, Oct 2, 2024 at 6:39 AM Michael Paquier  wrote:
> FWIW, I've always found this case with EXPLAIN with two entries
> confusing, so what's the advantage in trying to apply this rule for
> the rest?  We know that EXPLAIN, DECLARE and CTAS run a query attached
> to their DDL, hence isn't it sufficient to register a single entry for
> the top-level query, then nothing for the internal one.  The
> documentation tells about inner queries when pg_stat_statements.track
> = all, like the ones in PL functions, DO blocks, because we have a
> clear view of the query string, creating a unique one-one mapping
> between a query string and its ID.  This patch maps the same query
> string to more than one query ID, spreading that.
>
> So it seems that there are arguments for not doing what this patch
> proposes, but also make sure that EXPLAIN logs a single entry, not
> two currently when using pg_stat_statements.track = all.

I agree that tracking 2 identical statements with different queryIds
and nesting levels is very confusing. On the other hand, from an
extension developer point of view (not necessarily limited to
pg_stat_statements), I would like to have the queryId available and
the post_parse hook called so the query can be normalised and tracked
in a hashmap.

However, the repeated statements did bug me a lot, so I took a stab
at finding a possible solution. I made an attempt in 0001 by tracking
the locations of explainable statements (Select, Insert, Update,
Merge, Delete...) during parsing and propagating them to the generated
Query during transform. With the change, we now have the
following result:

SET pg_stat_statements.track = 'all';
explain (costs off) select 1;
select 1;
select queryid, calls, query, toplevel from pg_stat_statements
 where query ~'select \$1';
       queryid       | calls |             query             | toplevel
---------------------+-------+-------------------------------+----------
 2800308901962295548 |     1 | select $1                     | t
 2800308901962295548 |     1 | select $1;                    | f
 3797185112479763582 |     1 | explain (costs off) select $1 | t

The top-level and nested select statements now share the same
queryId and query string. The additional ';' for the nested query is
due to not having the statement length, so the whole statement
(including the trailing ';') is taken.

> Side note.  It looks like the patch is forgetting about CREATE VIEW
> and CREATE MATERIALIZED VIEW, creating only a top-level entry when
> running these utilities.

I've updated 0002 to handle CREATE MATERIALIZED VIEW; CREATE VIEW
doesn't generate nested statements. I've also realised that REFRESH
MATERIALIZED VIEW has a similar problem to EXPLAIN. The first refresh,
called when the matview is created, will have the correct select
substring. Subsequent refresh calls will use the refresh utility
statement as the query string:

-- Create the view
CREATE MATERIALIZED VIEW test AS SELECT * FROM generate_series(1, 1000) as id;
-- Reset pgss and refresh
select * from pg_stat_statements_reset();
REFRESH MATERIALIZED VIEW test;
select queryid, calls, query, toplevel from pg_stat_statements;
        queryid        | calls |                   query                   | toplevel
-----------------------+-------+-------------------------------------------+----------
   8227191975572355654 |     1 | REFRESH MATERIALIZED VIEW test            | t
  -2908407163726309935 |     1 | REFRESH MATERIALIZED VIEW test;           | f
  -1361326859153559975 |     1 | select * from pg_stat_statements_reset()  | t

I've tried to improve this behaviour in 0003, where the view's
definition is used as the query string instead of the refresh utility
statement.


v5-0001-Track-location-to-extract-relevant-part-in-nested.patch
Description: Binary data


v5-0003-Use-view-s-definition-as-query-string-on-a-materi.patch
Description: Binary data


v5-0002-Set-query_id-for-queries-contained-in-utility-sta.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-10-16 Thread Anthonin Bonnefoy
On Tue, Oct 15, 2024 at 3:23 PM jian he  wrote:
> explain(verbose) SELECT 1, 2, 3\;  explain SELECT 1, 2, 3, 4;
> will transformed to
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
>
> it seems to me your patch care about following position.
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
>   ^
> but this patch [1] at another thread will get the top level statement
> (passed the raw parse, no syntax error) beginning more effortless.
> explain(verbose) SELECT 1, 2, 3;  explain SELECT 1, 2, 3, 4;
> ^  ^
>
> can you try to looking at [1]. it may help to resolve this patch problem.
>
> [1] https://www.postgresql.org/message-id/2245576.1728418678%40sss.pgh.pa.us

The top-level statement's beginning doesn't have an impact on this
patch. The patch's focus is on the nested statement, and RawStmt's
location and length are only used to get the nested statement length.
I will need the nested statement's location and length no matter what,
to handle cases like "COPY (UPDATE ...) to stdout;" and only report
the statement within parentheses.

The linked patch will allow a simplification: the "if (@$ < 0) @$ =
@2;" I've added won't be needed anymore. But I think that's the only
possible simplification? I've run the tests on top of the linked patch
and there was no change in the regression output.




Re: Set query_id for query contained in utility statement

2024-10-17 Thread Anthonin Bonnefoy
Hi Jian,

On Thu, Oct 17, 2024 at 5:59 AM jian he  wrote:
> in [3] I mentioned adding "ParseLoc location" to ExplainStmt, then you
> found some problems at [4] with multi statements,
> now I found a way to resolve it.
> I also add "ParseLoc location;" to typedef struct CopyStmt.
> copy (select 1) to stdout;
> I tested my change can tracking
> beginning location and  length of the nested query ("select 1")
>
> I didn't touch other nested queries cases yet, so far only ExplainStmt
> and CopyStmt1
> IMHO, it's more neat than your patches.
> Can you give it a try?

I'm not sure about this approach. It moves the responsibility of
tracking the location and length from the nested statement to the top
level statement.
- The location you added in ExplainStmt and CopyStmt has a different
meaning from all others and tracks the nested location and not the
location of the statement itself. This creates some inconsistencies.
- The work will need to be done for all statements with nested
queries: Prepare, Create table as, Declare Cursor, Materialised View.
In contrast, by tracking the location of PreparableStatements, there's
no need for additional logic. For example, v8 0002 fixes the existing
behaviour for Prepare statements thanks to SelectStmt's modifications.

I feel like while it looks simpler, it will need more work to handle
all cases while introducing an outlier in how locations are tracked.




Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-10-17 Thread Anthonin Bonnefoy
I've run some additional tests, mostly pgbench with
options=-cjit_above_cost=0 for an extended duration on an instance
that was impacted. I haven't seen any issues nor performance
regressions compared to the unpatched version.

I will switch the commitfest entry to Ready for Committer if there's
no objection.




Re: Converting tab-complete.c's else-if chain to a switch

2024-10-10 Thread Anthonin Bonnefoy
Hi,

bd1276a3c9 seems to have introduced a segfault when trying to complete
a word that doesn't have any match. For example, 'postgres=# z\t' will
yield the following backtrace:

#0: psql`pg_strcasecmp(s1="", s2="ACCESS METHOD") at pgstrcasecmp.c:40:39
#1: psql`psql_completion(text=":pgss-", start=0, end=6) at
tab-complete.in.c:2033:8

The matching code in tab-complete.in.c:
    if (matches == NULL)
    {
        const pgsql_thing_t *wac;

        for (wac = words_after_create; wac->name != NULL; wac++)
        {
            if (pg_strcasecmp(prev_wd, wac->name) == 0)
            {

psql tries to read the previous word when matches is NULL. However,
there's no previous word set here, leading to the segfault.

I've attached a patch that checks that the previous word exists
before trying to use it, along with an additional test for this
codepath.


v1-0001-Fix-psql-segfault-on-word-completion-without-matc.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-10-23 Thread Anthonin Bonnefoy
On Wed, Oct 23, 2024 at 8:10 AM Michael Paquier  wrote:
>
> I have some more minor comments.
>
> -if (@$ < 0) /* see comments for YYLLOC_DEFAULT */
> -@$ = @2;
>
> With 14e5680eee19 now in the tree (interesting timing as this did not
> exist until yesterday), it looks like we don't need these ones
> anymore, no?

Yes, 14e5680eee19 makes the if(@$<0) checks unnecessary. I saw this
yesterday and planned to remove them but you beat me to it.

> +   ParseLoc    p_stmt_location;    /* start location, or -1 if unknown */
> +   ParseLoc    p_stmt_len;         /* length in bytes; 0 means "rest of string" */
>
> So, the reason why these two fields are added to the ParseState is
> that the lower layers in charge of the query transforms don't have to
> RawStmt so as the location and lengths can be adjusted when queries
> are between parenthesis.  I was first wondering if we should push
> RawStmt deeper into the argument stack, but based on the stmt_location
> assignments for the DMLs and WITH, storing that in the ParseState
> looks neater.  The patch is lacking a description of these two fields
> at the top of the ParseState structure in parse_node.h.  This stuff
> needs to be explained, aka we need them to be able to adjust the
> locations and lengths depending on inner clauses of the queries we are
> dealing with, or something like that.

True, added comments for both fields.

On Wed, Oct 23, 2024 at 10:36 AM jian he  wrote:
> /*
>  * transformTopLevelStmt -
>  *  transform a Parse tree into a Query tree.
>  * This function is just responsible for transferring statement location data
>  * from the RawStmt into the finished Query.
>  */
> Query *
> transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
> {
> Query   *result;
> /* Store RawStmt's length and location in pstate */
> pstate->p_stmt_len = parseTree->stmt_len;
> pstate->p_stmt_location = parseTree->stmt_location;
> /* We're at top level, so allow SELECT INTO */
> result = transformOptionalSelectInto(pstate, parseTree->stmt);
> return result;
> }
> do we need to change the comments?
> since it's transformOptionalSelectInto inner function 
> setQueryLocationAndLength
> transfer location data to the finished query.

Yes, transformTopLevelStmt's comment was outdated. I've updated it.

An issue I've realised with calling setQueryLocationAndLength in
transformStmt: it was possible for pstate's location and length to be
0, leading to a Query with a negative size. This wouldn't be visible
in tests since the only time a Query's locations are used (AFAIK) is
during post_parse_hook, which always has pstate's location
information. However, this is definitely something to avoid. I've
added an additional Assert(qry->stmt_len >= 0); to catch that. The fix
is to not do anything when pstate doesn't have location information.

This also answers another issue I was wondering about: should the
child's ParseState inherit the location information when
make_parsestate is called? That would be incorrect since the child is
used for a sub-statement and should reflect the size of that
sub-statement, not of the whole statement. However, since this is
unused, it is fine to leave the child parser with unset location data,
which would in turn leave the statement's location unset in
setQueryLocationAndLength.

The patch includes Michael's changes. I've left out 0002 for now to focus on 0001.


v13-0001-Track-location-to-extract-relevant-part-in-neste.patch
Description: Binary data


Re: Set query_id for query contained in utility statement

2024-10-22 Thread Anthonin Bonnefoy
On Tue, Oct 22, 2024 at 7:06 AM Michael Paquier  wrote:
> -Query *
> -transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree)
>
> What's the advantage of removing this routine?  Is that because you're
> pushing the initialization of stmt_location and stmt_len into
> transformOptionalSelectInto(), making it mostly moot?

Yeah, the removal of the stmt_location and stmt_len initialization
left the function with only one thing, the call to
transformOptionalSelectInto.

> Something that
> worries me a bit is that this changes makes the code less clean, by
> having a SELECT-INTO specific routine called in the parse-analyze
> paths, while adding three individual paths in charge of setting
> pstate->p_stmt_len and p_stmt_location.

I've moved pstate's p_stmt_len and p_stmt_location assignment to
transformTopLevelStmt (and also restored transformTopLevelStmt). This
will remove the multiple assignment paths.

> +   n->stmt_len = @3 - @2;
>
> This specific case deserves a comment.

I think I went over this 3 times thinking "maybe I should add a
comment here". Added.

> I don't have the best
> understanding of this area yet (need more analysis), but With the
> existing updateRawStmtEnd() and RawStmt also tracking this
> information, I am wondering if we could be smarter with less paths
> manipulating the start locations and lengths.  And your patch adds a
> new setQueryStmtLen() that does the same kind of job.  Hmm.

This is doable. I've moved the query's location and length assignment
to the end of transformStmt which will call setQueryLocationAndLength.
The logic of manipulating locations and lengths will only happen in a
single place. That creates an additional switch on the node's type as
a small trade off.

> FWIW, I don't feel strongly about 0004 that tries to make the REFRESH
> handling smarter.  I am not sure that it even makes sense as-is by
> hacking into a wrapper of pg_get_viewdef_worker() to get the query
> string, leading it to not be normalized.  This has also a small
> footprint in 0003.  I think that the bits in ExecRefreshMatView()
> should be discarded from 0003, as a result.

Good point on the query not being normalised. I'm fine with dropping
the materialised view part.

Also, there was an unnecessary cast in analyze.c "result->utilityStmt
= (Node *) parseTree;" as parseTree is already a Node. I removed it in
0001.


v11-0001-Track-location-to-extract-relevant-part-in-neste.patch
Description: Binary data


v11-0002-Set-query_id-for-queries-contained-in-utility-st.patch
Description: Binary data


Consider pipeline implicit transaction as a transaction block

2024-10-30 Thread Anthonin Bonnefoy
Hi,

When using pipelining with an implicit transaction, a transaction will
start with the first command and be committed with the Sync message.
Functions like IsInTransactionBlock and PreventInTransactionBlock
already treat this implicit transaction as a transaction block,
relying on the XACT_FLAGS_PIPELINING flag. However, this is not the
case in CheckTransactionBlock. This function is used for things like
warning when a local GUC is set outside of a transaction block.

The attached patch adds the detection of implicit transactions started
by a pipeline in CheckTransactionBlock, avoiding warnings when
commands like `set local` are called within a pipeline, and making the
detection of transaction blocks consistent with what's done in
IsInTransactionBlock and PreventInTransactionBlock.

One thing that's still not fixed by the patch is that the first
command won't be seen as part of a transaction block. For example,
with pgbench:

\startpipeline
SET LOCAL statement_timeout='200ms';
SELECT * FROM pgbench_accounts where aid=1;
\endpipeline

XACT_FLAGS_PIPELINING will only be set after the first command, so
the warning about `set local` happening outside of a transaction block
will still be generated. However, I'm not sure if it's something
fixable (or worth fixing?). This would require knowing beforehand that
there are multiple Execute messages before the Sync message, which
doesn't seem doable.

Regards,
Anthonin


v01-0001-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data


Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-11-05 Thread Anthonin Bonnefoy
On Tue, Nov 5, 2024 at 9:00 AM Thomas Munro  wrote:
> Reasoning:  Old LLVM required C++11.  LLVM 9 switched to C++14.  LLVM
> 14 switched C++17.  Pretty soon they'll flip to C++20 or C++23, they
> don't mess around.  The corresponding -std=c++XX flag finishes up in
> our compile lines, because llvm-config --cxxflags spits it out, to
> match the features they're using in headers that we include (easy to
> spot examples being std::make_unique (C++14) and std::string_view
> (C++17)), so you might say that PostgreSQL indirectly chases C++
> standards much faster than it chases C standards.  This particular
> code is a special case because it's guarded for LLVM 12+ only, so it's
> OK to use C++14  in that limited context even in back branches.  We
> have to be careful that it doesn't contain C++17 code since it came
> from recent LLVM, but it doesn't seem to by inspection, and you can
> check on a machine with CXX=g++ and LLVM 14 on Linux, which uses
> -std=c++14 and fails if you add a use of  and
> std::string_view.  (Warning: the system C++ standard library on Macs
> and other Clang-based systems doesn't have enough version guards so it
> won't complain, but GCC and its standard library will explicitly tell
> you not to use C++17 features in a C++14 program.)

I think the switch to C++14 happened with LLVM 10[0] while the C++17
switch happened with LLVM 16[1]. Double-checking on an Ubuntu Jammy
(LLVM versions earlier than 12 can't be installed on it):

VERSIONS=(20 19 18 17 16 15 14 13 12)
for version in ${VERSIONS[@]}; do
llvm-config-$version --cxxflags
done

-I/usr/lib/llvm-20/include -std=c++17   -fno-exceptions
-funwind-tables -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-19/include -std=c++17   -fno-exceptions
-funwind-tables -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-18/include -std=c++17   -fno-exceptions
-funwind-tables -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-17/include -std=c++17   -fno-exceptions
-funwind-tables -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS
-D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-16/include -std=c++17   -fno-exceptions -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-15/include -std=c++14   -fno-exceptions -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-14/include -std=c++14   -fno-exceptions -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-13/include -std=c++14   -fno-exceptions -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
-I/usr/lib/llvm-12/include -std=c++14   -fno-exceptions -D_GNU_SOURCE
-D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS

Which is still fine since, as you said, the code only applies for
LLVM 12+, which will always have at least C++14. I've tried compiling
and running against all those LLVM versions and the associated stdc++
versions earlier in the thread[2] and had no issues.

> If there are no further comments, I'm going to push this to all
> branches tomorrow morning.  For master only, I will remove the #if
> condition and comment about LLVM 12+, as we now require 14+.

Patch looks good to me.

[0]: 
https://github.com/llvm/llvm-project/blob/release/10.x/llvm/CMakeLists.txt#L53
[1]: 
https://github.com/llvm/llvm-project/blob/release/16.x/llvm/CMakeLists.txt#L70
[2]: 
https://www.postgresql.org/message-id/CAO6_XqqxEQ%3DJY%2BtYO-KQn3_pKQ3O-mPojcwG54L5eptiu1cSJQ%40mail.gmail.com




Re: Consider pipeline implicit transaction as a transaction block

2024-11-04 Thread Anthonin Bonnefoy
On Sat, Nov 2, 2024 at 4:11 AM Michael Paquier  wrote:
> Now, here is a fancy case: SAVEPOINT and its two brothers.  An error
> would be reported on HEAD if attempting a SAVEPOINT, complaining that
> we are not in a transaction block.  The patch causes a different, more
> confusing, failure:
> FATAL:  DefineSavepoint: unexpected state STARTED
>
> This is a bit user-unfriendly.  I am not sure to see the point of
> supporting savepoints in this context, so perhaps we should just issue
> a cleaner error when we are under a XACT_FLAGS_PIPELINING?  Reporting
> that we are not in a transaction block, while, well, we are in an
> implicit transaction block because of the use of pipelines is
> confusing.  The new error is actually worse.

There's a possible alternative approach. Instead of checking the
XACT_FLAGS_PIPELINING flag in
CheckTransactionBlock/IsInTransactionBlock/PreventInTransactionBlock,
it is possible to switch the transaction state to
TBLOCK_IMPLICIT_INPROGRESS by reusing BeginImplicitTransactionBlock.
This transaction state is used to represent a transaction block
created by a multi-statement query which is executed within the same
transaction, which is very similar to what's done with pipelining.

This allows the removal of the XACT_FLAGS_PIPELINING check in
IsInTransactionBlock and PreventInTransactionBlock since the
transaction state will correctly reflect the ongoing implicit block.
Additionally, it will reuse the same behaviour with regard to
SAVEPOINT and disallow them with a "SAVEPOINT can only be used in
transaction blocks" error.


v02-0001-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data


Re: Segfault in jit tuple deforming on arm64 due to LLVM issue

2024-10-31 Thread Anthonin Bonnefoy
On Thu, Oct 31, 2024 at 6:49 AM Thomas Munro  wrote:
> There are a couple of cases of dual-licensed code in our tree where we
> explicitly used the Boost alternative instead of Apache 2.  I plead
> complete ignorance of this topic and defer to those who know about
> such things: can we actually do this?  I guess at a minimum a copy of
> the licence would need to appear somewhere -- perhaps under
> src/backend/jit/llvm?

I'm also not super knowledgeable about the licensing intricacies but I
read it the same way - a license file has to be provided due to the 4a
clause. llvmlite did this when they added the patched memory
manager[1]

> 4d says that if you modified the code you have
> to say so prominently, but I did that at the top (and the changes are
> completely trivial, just some #ifdef swizzling to massage some
> function prototypes to suit older LLVMs).  Otherwise I understand it
> to be generally "BSD-like" (sans advert clause) but there is also some
> stuff about patents, which surely aren't relevant to this in
> practice... but I know that some projects object to it on principle
> and because it smells like contract law, or something not an area
> I am well informed about.  Who should I be asking? (Naively, I
> wondered: could there be some kind of fair use concept for
> back-patching fixes to broken libraries that you're merely a user of
> where you can be excused from the burdens of a distributor?  Yeah
> wishful thinking I'm sure.)

You mean 4b, right? LLVM doesn't seem to have any NOTICE files, so the
4d clause shouldn't apply. The top comment looks fine for noting the
source of the modified file and how it was changed. But again, I don't
have much experience in this so I can't be sure.

[1]: 
https://github.com/numba/llvmlite/pull/1009/files#diff-80b149f35cebd583e21dfc49c0007a7fab89c3c6d07c028e4a87de0848aa2ed8




Re: Set query_id for query contained in utility statement

2024-10-01 Thread Anthonin Bonnefoy
On Mon, Sep 30, 2024 at 5:14 PM jian he  wrote:
> Is it possible to normalize top level utilities explain query, make
> these two have the same queryid?
> explain(verbose) EXECUTE test_prepare_pgss1(1, 2);
> explain(verbose) EXECUTE test_prepare_pgss1(1, 3);
>
> I guess this is a corner case.

This seems to be a known issue. test_prepare_pgss1's parameters are
A_Const nodes. Those nodes have a custom query jumble which doesn't
record the location[1] and thus can't be normalised by pgss.

That could be fixed by replacing the switch on nodeTag by
JUMBLE_LOCATION(location) but this will impact a lot of DDL queries
and the result doesn't look great (for example, "BEGIN TRANSACTION NOT
DEFERRABLE, READ ONLY, READ WRITE, DEFERRABLE" would be normalised as
"BEGIN TRANSACTION $1 DEFERRABLE, $2 ONLY, $3 WRITE, $4")

Looking at the commit for the A_Const's jumble, this is mentioned by Michael[2]:
> (FWIW, I'd like to think that there is an argument to normalize the
> A_Const nodes for a portion of the DDL queries, by ignoring their
> values in the query jumbling and mark a location, which would be
> really useful for some workloads, but that's a separate discussion I
> am keeping for later.)

I haven't found any recent discussion but this should live in a
different thread as this is a separate issue.

[1]: 
https://github.com/postgres/postgres/blob/cf4401fe6cf56811343edcad29c96086c2c66481/src/backend/nodes/queryjumblefuncs.c#L323-L355
[2]: https://www.postgresql.org/message-id/y9+huyslmap6y...@paquier.xyz




Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind

2024-09-18 Thread Anthonin Bonnefoy
On Tue, Sep 17, 2024 at 5:00 PM Alexander Lakhin  wrote:
>
> Please look at an assertion failure, caused by \bind_named:
> regression=# SELECT $1 \parse s
> \bind_named s
>
> regression=# \bind_named
> \bind_named: missing required argument
> regression=# 1 \g
> psql: common.c:1501: ExecQueryAndProcessResults: Assertion `pset.stmtName != 
> ((void *)0)' failed.

Thanks for the report.

Looking at the failure, it seems like the issue was already present
with \bind, though there was no assertion failure: repeatedly calling
\bind would allocate new stmtName/bind_params and leak them at the
start of exec_command_bind.

I've attached a patch that cleans the psql extended state at the
start of every extended-protocol backslash command, freeing the
allocated variables and resetting the send_mode. Another possible
approach would be to return an error when there's already an existing
state instead of overwriting it.


v1-0001-Clean-psql-extended-state-on-extended-command.patch
Description: Binary data


Re: Add Pipelining support in psql

2024-11-28 Thread Anthonin Bonnefoy
On Wed, Nov 27, 2024 at 11:46 AM Kirill Reshke  wrote:
> I'm very doubtful about the \syncpipeline . Maybe we should instead
> support \sync meta-command in psql? This will be a useful contribution
> itself.

\syncpipeline is useful to cover regression tests involving implicit
transaction blocks within a pipeline, where a sync acts as an implicit
COMMIT. What would be the use of sending a \sync outside of a
pipeline? All extended-query meta-commands sent by psql already send
a sync if you're not in a pipeline, so the only use of a \sync
meta-command would be to send a single sync (which could be achieved
with \startpipeline followed by a \endpipeline).
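
To make that concrete, here is a rough sketch of the kind of test this
enables (illustrative queries, using the meta-commands from this
patch): the \syncpipeline acts as an implicit COMMIT, so the SET LOCAL
is no longer in effect for the commands after it.

\startpipeline
SELECT 1 \bind \g
SET LOCAL statement_timeout = '200ms' \bind \g
\syncpipeline
SELECT 1 \bind \g
\endpipeline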

On Wed, Nov 27, 2024 at 1:54 PM Jelte Fennema-Nio  wrote:
> This behaviour is expected, it also happens if you send "SELECT 1"
> instead of "begin" in the parse message:
> db1=# \startpipeline
> db1=# SELECT 1 \parse p1
> db1=*#
>
> The reason this happens is that the first command in a pipeline sent
> to the server opens an implicit transaction. See the "implicit COMMIT"
> wording here[1], or look at this code in exec_parse_message:

I don't think that's the case here. Nothing was sent to the server so
there's no active transaction yet. From prompt.c, psql will show a '*'
when PQtransactionStatus is either PQTRANS_ACTIVE or PQTRANS_INTRANS.

Looking at the state of the PGConn, after a \startpipeline, we have:
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->asyncStatus: (PGAsyncStatusType) PGASYNC_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

After the first command is sent to the pipeline:
db->asyncStatus: (PGAsyncStatusType) PGASYNC_BUSY
db->xactStatus: (PGTransactionStatusType) PQTRANS_IDLE
db->pipelineStatus: (PGpipelineStatus) PQ_PIPELINE_ON

The xactStatus is idle; however, since asyncStatus reports busy, we
fall into the "if (conn->asyncStatus != PGASYNC_IDLE) return
PQTRANS_ACTIVE;" branch and psql displays '*' in the prompt. This
might be a bit misleading since there's no ongoing transaction block
on the server, so maybe it's worth having a new state? I've updated
the patch to display a tentative '|' when there's an ongoing pipeline.
On the other hand, a pipeline will start an implicit transaction block
even without BEGIN, so leaving the '*' may be a good way to reflect
that?

On Wed, Nov 27, 2024 at 1:45 PM Jelte Fennema-Nio  wrote:
> I played around quickly with this patch and it works quite well. A few
> things that would be nice improvements I think. Feel free to change
> the command names:
> 1. Add a \flush command that calls PQflush
> 2. Add a \flushrequest command that calls PQsendFlushRequest
> 3. Add a \getresult command so you can get a result from a query
> without having to close the pipeline
>
> To be clear, not having those additional commands isn't a blocker for
> this patch imho, but I'd definitely miss them if they weren't there
> when I would be using it.

I will need a bit of time to check those and how they can be included
in the regression tests.


v02-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Consider pipeline implicit transaction as a transaction block

2024-11-27 Thread Anthonin Bonnefoy
On Thu, Nov 28, 2024 at 12:26 AM Michael Paquier  wrote:
> I don't mind being more careful here based on your concerns, so I'll
> go remove that in the stable branches.

Sorry about that. I didn't have a strong need for this to be
backpatched and should have made this clearer.




Re: Consider pipeline implicit transaction as a transaction block

2024-11-20 Thread Anthonin Bonnefoy
On Tue, Nov 5, 2024 at 6:50 AM Michael Paquier  wrote:
> It would be nice to document all these behaviors with regression
> tests in pgbench as it is the only place where we can control that
> with error pattern checks.
It's not the first time I've wanted to be able to do pipelining in
psql, as relying on pgbench and TAP tests is not the most flexible, so
I took a stab at adding it.

0001: This is a small bug I've stumbled upon. The query buffer is not
cleared on a backslash error. For example, "SELECT 1 \parse" would
fail due to a missing statement name but would leave "SELECT 1\n" in
the query buffer which would impact the next command.
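
A rough illustration of the problem (session sketch, exact error text
aside): after the failing \parse below, "SELECT 1" is still in the
query buffer, so the next statement is effectively sent as
"SELECT 1 SELECT 2;" and fails with a syntax error.

SELECT 1 \parse
SELECT 2;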

0002: This adds pipeline support to psql by adding the same
meta-commands as pgbench: \startpipeline, \endpipeline and
\syncpipeline. Once a pipeline is started, all extended protocol
queries are piped until \endpipeline is called. To keep things simple,
meta-commands like \gx, \gdesc and \gexec are forbidden inside a
pipeline. The tests cover the existing pipelining behaviour with the
SET LOCAL, SAVEPOINT, REINDEX CONCURRENTLY and LOCK commands,
depending on whether it's the first command or not. The
\syncpipeline allows us to see that this "first command" behaviour
also happens after a synchronisation point within a pipeline.

0003: The initial change to use an implicit transaction block when
pipelining is detected. The tests reflect the change in behaviour,
like the LOCK command working if it's the second command in the
pipeline. I've updated the pipeline documentation to provide more
details about the use of implicit transaction blocks with pipelines.


v03-0001-Reset-query-buffer-on-a-backslash-error.patch
Description: Binary data


v03-0003-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data


v03-0002-Add-pipeline-support-in-psql.patch
Description: Binary data


Re: Consider pipeline implicit transaction as a transaction block

2024-11-26 Thread Anthonin Bonnefoy
On Wed, Nov 27, 2024 at 1:42 AM Michael Paquier  wrote:
> I've edited the whole, added this extra test based on \syncpipeline in
> 17~, kept the remaining tests in 14~ where pgbench is able to handle
> them, and backpatched that down to 13.  Let's see now what we can do
> with the psql bits.

Thanks!

> Anthonin, now that the original problem is solved, could you create a
> new thread with your new proposal for psql?  That would attract a
> better audience for reviews.

Yes, I will rebase and start the dedicated thread for the pipeline
support in psql.




Add Pipelining support in psql

2024-11-27 Thread Anthonin Bonnefoy
Hi,

With \bind, \parse, \bind_named and \close, it is possible to issue
queries from psql using the extended protocol. However, it wasn't
possible to send those queries using pipelining, and the only way to
test pipelined queries was through pgbench's TAP tests.

The attached patch adds pipelining support to psql with 3 new
meta-commands, mirroring what's already done in pgbench:
- \startpipeline starts a new pipeline. All extended queries will be
queued until the end of the pipeline is reached.
- \endpipeline ends an ongoing pipeline. All queued commands will be
sent to the server and all responses will be processed by psql.
- \syncpipeline queues a synchronisation point without flushing the
commands to the server.

Those meta-commands will allow testing pipelined query behaviour using
psql regression tests.
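
As a rough sketch of the intended usage (illustrative queries; results
are only displayed once the pipeline is ended):

\startpipeline
SELECT $1 \bind 'val1' \g
SELECT 1 \bind \g
\syncpipeline
SELECT 2 \bind \g
\endpipeline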

Regards,
Anthonin


v01-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Consider pipeline implicit transaction as a transaction block

2024-11-25 Thread Anthonin Bonnefoy
Thanks for the review!

On Mon, Nov 25, 2024 at 7:39 AM Michael Paquier  wrote:
>
> This breaks an existing property of psql.  One example: \edit where we
> should keep the existing query buffer rather than discard it if a
> failure happens.  This patch forcibly removes the contents of the
> query buffer, and a failure could happen because of an incorrect
> option given to this meta-command.

Good point, I've removed the change.

> This relies on 0001 for the case where \gx fails and you want to make
> sure that the follow-up command is able to work in the same pipeline.
> I'd suggest to add a \reset after the \gx in this test sequence,
> without 0001.  At this point we have IMO more advantages in
> maintaining the existing properties rather than enforce it, especially
> as we have \reset.

Definitely, I've updated the test to use \reset.

> Why is stuff specific to psql being mentioned on the libpq page?  That
> does not seem adapted to me here.  Documenting the expectations and
> the assumptions one would rely on in the context of a pipeline and
> what the backend thinks of it as an internal transaction is a good
> idea.

I've realised the behavior was already documented in the protocol-flow
doc[1], with an existing mention of the difference between the first
and following commands of a pipeline.

> It is sad to not have at least 2~3 tests that we could use for the
> back-branches for the ERROR cases and not the first command cases with
> minimal sequences in pgbench?  I'm OK with your proposal with psql on
> HEAD, but backpatching without a few tests is going to make the
> maintenance of stable branches more complicated in the long run as we
> would navigate blindly.  So I'd try to fix the problems first, then
> work on any new proposal once we are in a clean operational state.
> And I'd recommend to propose the new feature on a new, separate,
> thread to attract a better audience for the reviews this would
> require.

I've definitely sidetracked myself with the pipeline support in psql.
I've reworked and reordered the patch set to focus on the issue and
backporting.

0001: Use an implicit transaction block for the implicit pipeline
transaction. I've added tests in pgbench that cover the same checks
I did in psql. I've avoided using \syncpipeline since it was
introduced in 17. I've also slightly modified the protocol-flow
pipelining doc to provide more details on the implicit transaction
block and how the first message after a sync can be different. I've
added VACUUM as the first and second command of a pipeline in the
tests since it was mentioned in the doc.

0002: Pipeline support in psql. I'm still attaching the patch here as
it makes testing pipelining behavior much easier but this is more as a
reference. I will create a dedicated thread and commitfest entry for
it.

[1]: 
https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-FLOW-PIPELINING


v05-0002-Add-pipeline-support-in-psql.patch
Description: Binary data


v05-0001-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data


Re: Add Pipelining support in psql

2024-12-04 Thread Anthonin Bonnefoy
An updated version of the patch, still a bit rough around the edges.

I've added \flushrequest, \flush and \getresults meta-commands.
\flushrequest and \getresults look interesting as they add an
additional protocol message to test, but they also introduce the
possibility of being in the middle of an aborted pipeline, which adds
some complexity to the patch. I'm a bit skeptical of \flush as
\getresults already flushes automatically through PQgetResult, and I
haven't thought of interesting tests that could rely on \flush.

I've added a new prompt interpolation instead of trying to hijack the
existing %x: %P reports the pipeline status with either
'|num_syncs,num_queries,pending_results|' for an ongoing pipeline or
'+num_syncs,num_queries,pending_results+' for an aborted pipeline.
Those are definitely open to debate, but I found this information
useful while debugging.

With the new \flushrequest and \getresults, it's visible that %x
doesn't reflect the transaction status when pipelines are involved. If
I do:

postgres=# \startpipeline
postgres=|0,0,0|# BEGIN \bind \g
postgres=|0,1,0|*# \flushrequest
postgres=|0,1,1|*# \getresults
BEGIN
postgres=|0,0,0|#

There's an ongoing transaction block running on the server but %x
will be empty as there are no queued commands and the server results
are consumed. I'm tempted to change %x to not display anything when
within a pipeline.

Calling PQgetResult while a command is piped and not followed by a
flush request or a sync would block since the asyncStatus is busy. So
\getresults has to keep track of the number of pending results to
avoid calling PQgetResult if there are no pending results. This is to
cover cases like:

postgres=# \startpipeline
postgres=|0,0,0|# select 1 \bind \g
postgres=|0,1,0|*# \getresults
No pending results to get
postgres=|0,1,0|*#

There are some interesting behaviors with pipeline + copy. Running the
following:

\startpipeline
SELECT $1 \bind 'val1' \g
COPY psql_pipeline FROM STDIN \bind \g
\flushrequest
\getresults
3   test3
\.
\endpipeline

Will complain with a "message type 0x5a arrived from server while
idle" (but still works). It's also possible to trigger a protocol
error with a query error after a piped copy:

truncate pgbench_accounts;
\startpipeline
copy pgbench_accounts FROM stdin \bind \g
SELECT 1 \bind \g
\endpipeline
1 2 3 asd
\.

Copy will fail with "ERROR:  unexpected message type 0x50 during COPY
from stdin" and the connection will be terminated.


v03-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Add Pipelining support in psql

2024-12-10 Thread Anthonin Bonnefoy
An improved version with simplifications and refinements.

num_queries (2nd element in the pipeline status prompt) is now used to
track queued queries that were not flushed (with a flush request or
sync) to the server. It used to count both unflushed queries and
flushed queries.

Code in ExecQueryAndProcessResults should be simpler now.
- The DiscardAbortedPipelineResults function handles both discarding
results until a synchronisation point is reached and discarding
results until there are no more pending results.
- The logic to process the pipeline's results and get the next results
fits better with the existing flow.
- Tests didn't cover chunked results, so I've added additional tests to
cover the use of pipelining + FETCH_COUNT.
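
For reference, a sketch of the kind of script the new FETCH_COUNT
tests cover (values are illustrative):

\set FETCH_COUNT 2
\startpipeline
SELECT generate_series(1, 5) \bind \g
\endpipeline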


v04-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Add Pipelining support in psql

2024-12-12 Thread Anthonin Bonnefoy
Thanks for the review!

On Thu, Dec 12, 2024 at 12:53 AM Jelte Fennema-Nio  wrote:
> I think that new prompt is super useful, so useful in fact that I'd
> suggest linking to it from the \startpipeline docs.

Good point, I've added a paragraph with the link to the %P prompt.

> I do think that
> the wording in the docs could be a bit more precise:
> 1. The columns are not necessarily queries, they are messages or
> commands. i.e. \parse and \bind_named both count as 1, even though
> they form one query together.

Yeah, I wasn't super happy with the num_queries and wording. I've
renamed the prompt columns to piped_syncs, piped_commands and
pending_results and added more precision on what counts as a command.

> 2. messages not followed by \sync and \flushrequest, could very well
> already "be sent to the server" (if the client buffer got full, or in
> case of manual \flush). The main thing that \sync and \flushrequest do
> is make sure that the server actually sends its own result back, even
> if its buffer is not full yet.

I had the wrong assumptions on what was happening. I've changed the
definition of pending_results to: "pending_results is the number of
commands that were followed by either a \flushrequest or a
\syncpipeline, forcing the server to send the results that can be
retrieved with \getresults."

> The main feedback I have after playing around with this version is
> that I'd like to have a \getresult (without the s), to only get a
> single result. So that you can get results one by one, possibly
> interleaved with some other queries again.

\getresults was easier to implement since it was more or less the
current behaviour :). I didn't want to add a new meta-command just for
that (I feel like I'm already adding a lot of them) so I've added a
num_results parameter to \getresults. You should be able to use
\getresults 1 to get a single result. A sync response counts as a
result.
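
For example, something like the following sketch should allow
consuming results one at a time (illustrative queries, against this
v05 behaviour):

\startpipeline
SELECT 1 \bind \g
SELECT 2 \bind \g
\flushrequest
\getresults 1
\getresults
\endpipeline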

> One thing I'm wondering is how useful the num_syncs count is in the
> pipeline prompt, since you never really wait for a sync.

With the addition of the num_results parameter for \getresults,
knowing the number of syncs becomes more useful when results are
consumed one by one.

> Regarding the usefulness of \flush. I agree it's not as useful as I
> thought, because indeed \getresults already flushes everything. But
> it's not completely useless either. The main way I was able to use it
> interactively in a somewhat interesting way was to send it after a
> long running query, and then while that's processing type up the next
> query after it.

I feel there's a large overlap between \flush and \flushrequest. On
the other hand, if people want to test the behaviour of pushing data
with and without a flush request message, then \flush can be useful.


v05-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Consider pipeline implicit transaction as a transaction block

2024-11-22 Thread Anthonin Bonnefoy
Some minor changes: I forgot to add the new pipeline meta-commands to
psql's help output; this is now added in 0002. I've also reworded the
doc a bit.


v04-0001-Reset-query-buffer-on-a-backslash-error.patch
Description: Binary data


v04-0002-Add-pipeline-support-in-psql.patch
Description: Binary data


v04-0003-Consider-pipeline-implicit-transaction-as-a-tran.patch
Description: Binary data


Re: Accept recovery conflict interrupt on blocked writing

2025-01-21 Thread Anthonin Bonnefoy
Hi,

Thanks for the detailed explanations, I've definitely misinterpreted
how interrupts and errors were handled.

On Fri, Jan 17, 2025 at 7:03 PM Andres Freund  wrote:
>
> Might be worth using it it in src/test/postmaster/t/002_start_stop.pl? That
> has e.g. code to send a startup message.

I've changed pgproto to provide the necessary helpers to be used in
001_connection_limits.pl and 002_start_stop.pl.

> > Playing around with this it's unfortunately worse - we very commonly get to
> > ProcessClientWriteInterrupt() with InterruptHoldoffCount > 0. Including just
> > after processing recovery conflict interrupts. The reason for that is that 
> > if
> > we do throw an error the first thing we do is to hold interrupts
> > (c.f. sigsetjmp() in PostgresMain()) and then we call EmitErrorReport().
> >
> > Unfortunately I suspect that this means any protection we'd get from this
> > version of the patch is rather incomplete - if the recovery conflict is
> > triggered while not blocked, we'll ERROR out and report the message with
> > interrupts held. That cycle of recovery conflict signalling wouldn't ever be
> > resolved in such a case, I think. Of course we do re-signal recovery 
> > conflict
> > interrupts rather aggressively - but I'm not sure that's going to help
> > reliably.
>
> This unfortunately indeed is true.  If I, instead of the generate_series(),
> add the following to the pgproto query:
> DO \$\$BEGIN RAISE 'endless scream: %', repeat('a', 
> 100); END;\$\$;
>
> the conflict doesn't get handled.

A possible approach to handle this case could be to stop retrying and
return a write error if there's a recovery conflict on a blocked
write, with ProcessClientWriteInterrupt deciding whether a write is
retryable or not. Within ProcessClientWriteInterrupt, if we conflict
with recovery on a blocked write and interrupts can be processed,
then we can call ProcessRecoveryConflictInterrupts with
ExitOnAnyError. Otherwise, if the interrupts can't be processed, we
make the write not retryable, leading to a write error which will
close the socket with a "connection to client lost".

This requires a new CheckBlockedWriteConflictInterrupts function that
checks whether the process conflicts or not without processing the
conflict, as we don't want to abort a write if the conflict could be
ignored.

The code is a bit rough as there are duplicated logic checks between
CheckBlockedWriteConflictInterrupts and
ProcessRecoveryConflictInterrupt that could very probably be
simplified, but I wanted to validate the approach first. It would also
probably be worth adding more tests to cover all conflicts while
there's a blocked write, though I did add the RAISE case in the tests.

Regards,
Anthonin Bonnefoy


v04-0002-Accept-recovery-conflict-interrupt-on-blocked-wr.patch
Description: Binary data


v04-0001-Add-PgProto-test-module-to-send-message-on-a-raw.patch
Description: Binary data


Accept recovery conflict interrupt on blocked writing

2025-01-13 Thread Anthonin Bonnefoy
Hi,

I have replicas that have regular transient bursts of replay lag (>5
minutes). The events have the following symptoms:
- Replicas are using a physical replication slot and hot_standby_feedback
- The lag recovers by itself after at most 15 minutes
- During the same timeframe, there's a query stuck in ClientWrite on
the replica for ~15 minutes (despite the 30s statement_timeout). The
matching client was terminated early on, when the server started
sending results, and when the server process died the replay lag
started to recover.

The 15-minute timeout matches the default Linux TCP retransmission
timeout[1]. A similar situation can be triggered by using the psql
pipelining patch[2].

# On the primary, generate constant updates:
echo "\set aid random(1, 10 * :scale)
UPDATE pgbench_accounts SET bid=bid+1 WHERE aid=:aid;" > update.sql
pgbench -f update.sql -T900

-- On a replica:
\startpipeline
-- At least 2 selects are needed to completely saturate the socket buffers
select * from pgbench_accounts \bind \g
select * from pgbench_accounts \bind \g
-- Flush the commands to the server
\flush

After that, the queries are sent to the server but the client doesn't
consume the results and the backend process should be stuck in a
ClientWrite state. Eventually, usually after ~10s, WAL replay becomes
completely blocked (sometimes I need to redo the pipeline query if
nothing happens). Looking at the backtrace, the recovery process is
blocked on ResolveRecoveryConflictWithBufferPin. The conflicting
buffer pin is held by the pipelined query, currently blocked trying to
write the result to the socket which is completely saturated. During
this time, all interrupts are ignored and WAL replay won't be able to
recover until the socket becomes either writable or ends with an
error.

The same situation happened on my instances where a backend process
was sending results to a client that died without sending a FIN or an
RST. The only way for this process to die is to reach the TCP
retransmission timeout after 15 minutes. During this time, it can
conflict with recovery as it holds a buffer pin, possibly blocking
recovery for the whole duration.

To avoid blocking recovery for an extended period of time, this patch
changes client write interrupts to handle recovery conflict interrupts
instead of ignoring them. Since the interrupt happens while we're
likely to have partially written results on the socket, there's no
easy way to keep the protocol in sync, so the session needs to be
terminated.

Setting tcp_user_timeout is also a possible mitigation for users, but
my assumption about how recovery conflicts are handled is that it is
not desirable to block recovery for an extended period of time and it
is fine to be aggressive when the standby delay is exceeded.

[1]: https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
[2]: https://commitfest.postgresql.org/51/5407/

Regards,
Anthonin Bonnefoy


v01-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-01-14 Thread Anthonin Bonnefoy
During my tests, I noticed I didn't handle query cancellation; this
is now fixed. I've also added additional comments related to
available_results to make it clearer that it depends on what the
server has flushed to the client.


v07-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Accept recovery conflict interrupt on blocked writing

2025-01-16 Thread Anthonin Bonnefoy
Bonjour Thomas,

On Wed, Jan 15, 2025 at 6:21 AM Thomas Munro  wrote:
> Right.  Before commit 0da096d7 in v17, the recovery conflict code
> running in a signal handler would have set ProcDiePending, so this
> looks like an unintended regression due to that commit.

The issue is happening on instances running 14.13, though I haven't
tried to reproduce it on a version older than the current HEAD yet.

> In your patch you have CHECK_FOR_INTERRUPTS(), but I think that means
> it could also service other interrupts that we probably don't want to
> introduce without more study, no?  For example if
> ProcessRecoveryConflictInterrupts() returns without throwing, and
> another pending interrupt flag is also set.
>
> Thinking of other proc signals/interrupts, of course it's not really
> acceptable that we don't process at least also global barriers here,
> ie for a time controllable by a client, but let's call that an
> independent problem.  I think the minimal thing for this specific
> problem might be to use ProcessRecoveryConflictInterrupts() directly.

I think that's definitely happening: my first version used a
pg_unreachable after the CHECK_FOR_INTERRUPTS(), which was triggered.
I've replaced it with a direct call to
ProcessRecoveryConflictInterrupts().

> The read from the socket also seems to have a variant of the problem,
> unless I'm missing something, except in that case I'm not sure it's
> new.  For sending you're proposing that our only choice is to kill the
> session, which makes sense, but the equivalent read side stay-in-sync
> strategy is to keep deferring until the client gets around to sending
> a complete message, if I read that right.

I'm not sure the read path has the same issue. That would require a
replica to do a non-command read, but I don't think that's possible? A
recovering instance can't run COPY, and is there another situation
where a replica would do a blocking read without the DoingCommandRead
flag? Also, the read side kills the session as well, since
DoingCommandRead will be true most of the time, falling through to the
FATAL case.

> Hmm, I wonder if we could write tests for both directions in
> 031_recovery_conflict.pl, using a Perl to do raw pq protocol in TCP,
> as seen in some other recent proposals...  I might have a crack at
> that soon unless you want to.

I've tried adding a test for this. I've implemented some utility
functions to start a session, read a payload and send a simple query
using raw TCP. The implementation is definitely a bit rough as I'm far
from fluent in Perl. The socket utility functions should probably be
moved to a dedicated file and the Windows build is currently failing.
However, the test does trigger the issue, with recovery blocked by the
session stuck in ClientWrite.


v02-0001-Accept-recovery-conflict-interrupt-on-blocked-wr.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-01-10 Thread Anthonin Bonnefoy
> I feel there's a large overlap between \flush and \flushrequest. On
> the other hand, if people want to test the behaviour of pushing data
> with and without a flush request message, then \flush can be useful.

I've been looking into some issues related to a backend process stuck
in a ClientWrite state blocking WAL replay, and it turns out that
pipelining + \flush was extremely useful to reproduce the issue. It
makes it possible to saturate the server->client socket buffer since
psql doesn't consume the results, easily triggering the state where
the process is stuck in ClientWrite. So I'm amending my position on
the usefulness of \flush and definitely see interesting uses for it.

I've also found out that I didn't correctly handle connection resets.
I've fixed this in v6 by resetting the pipeline counters on connection
reset and only calling discardAbortedPipelineResults if we're inside a
pipeline.


v06-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


Re: Accept recovery conflict interrupt on blocked writing

2025-01-17 Thread Anthonin Bonnefoy
I've tested on PG16; the issue is indeed triggered, with replication
blocked while the conflicting query is stuck in ClientWrite.

I've cleaned up the tests: I've created a dedicated PgProto
(definitely open to suggestions for a better name...) module
containing all the helpers to send and receive messages on a raw
socket in 0001.

0002 contains the change with the test using the new PgProto module.
The failure I had on Windows was due to using the wrong username: I
was using the $USER environment variable and I'm now getting the user
from an existing psql session instead. Also, I think the
DoingBlockingWrite variable was unnecessary? Calling
ProcessRecoveryConflictInterrupts should be enough as I don't think
QueryCancelHoldoffCount can be >0 when writing on a socket, so it
should always be able to cancel the statement.


v03-0001-Add-PgProto-test-module-to-send-message-on-a-raw.patch
Description: Binary data


v03-0002-Accept-recovery-conflict-interrupt-on-blocked-wr.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-03-17 Thread Anthonin Bonnefoy
Here is a new patch set:

0001: This introduces the \sendpipeline meta-command and forbids \g in
a pipeline. This is to address the formatting options of \g, which are
not supported in a pipeline.

0002: Allows ';' to send a query using the extended protocol when
within a pipeline, by using PQsendQueryParams with 0 parameters. It is
not possible to send parameters with the extended protocol this way,
and everything will be propagated through the query string, similar to
a simple query.
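
A small sketch combining both changes (illustrative: a \bind query
sent with \sendpipeline, and a plain query terminated by ';' sent
through PQsendQueryParams with no parameters):

\startpipeline
SELECT $1 \bind 'val1' \sendpipeline
SELECT 1;
\endpipeline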


v02-0001-psql-Create-new-sendpipeline-meta-command.patch
Description: Binary data


v02-0002-psql-Allow-to-add-queries-in-an-ongoing-pipeline.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-03-18 Thread Anthonin Bonnefoy
On Tue, Mar 18, 2025 at 1:50 AM Michael Paquier  wrote:
> - count
> 
> - 4
> -(1 row)
>
> This removal done in the regression tests was not intentional.

Yes, thanks for fixing that.

> How about adding a check on PIPELINE_COMMAND_COUNT when sending a
> query through this path?  Should we check for more scenarios with
> syncs and flushes as well when sending these queries?

I've added additional tests when piping queries with ';':
- I've reused the same scenarios as with \sendpipeline: single query,
multiple queries, flushes, syncs, using COPY...
- Using ';' will replace the unnamed prepared statement. It's a bit
different from what one might expect, as a simple query would delete
the unnamed prepared statement.
- Sending an extended query prepared with \bind using a ';' on a
newline, though this is not specific to pipelining. The scanned
semicolon triggers the call to SendQuery, processing the buffered
extended query. It's a bit unusual but that's the current behaviour.


v03-0001-psql-Allow-to-add-queries-in-an-ongoing-pipeline.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-04-22 Thread Anthonin Bonnefoy
On Tue, Apr 22, 2025 at 2:06 AM Michael Paquier  wrote:
> I am wondering if we could not be smarter with the handling of
> the counters, but I really doubt that there is much more we can do
> under a PGRES_FATAL_ERROR.

One thing that bothers me is that the reported error is silently
discarded within discardAbortedPipelineResults.

psql -f bug_assertion.sql
psql:bug_assertion.sql:7: ERROR:  unexpected message type 0x50 during
COPY from stdin
CONTEXT:  COPY psql_pipeline, line 1
psql:bug_assertion.sql:7: Pipeline aborted, command did not run

This should ideally report the "FATAL:  terminating connection because
protocol synchronization was lost" sent by the backend process.

Also, we can still trigger an assertion failure with the following:
CREATE TABLE psql_pipeline(a text);
\startpipeline
COPY psql_pipeline FROM STDIN;
SELECT 'val1';
\syncpipeline
\endpipeline
...
Assertion failed: (pset.piped_syncs == 0), function
ExecQueryAndProcessResults, file common.c, line 2153.

A possible alternative would be to bail out of
discardAbortedPipelineResults when we encounter a non-NULL result with
a FATAL error and let the outer loop handle it. As you said, the
pipeline flow is borked, so there's not much to salvage. The outer loop
would read and print all error messages until the closed connection is
detected. Then CheckConnection will reset the connection, which will
also reset the pipeline state.

While testing this change, I was initially looking for the "FATAL:
terminating connection because protocol synchronization was lost"
message in the tests. However, this was failing on Windows[1] as the
FATAL message wasn't reported on stderr. I'm not sure why yet.

[1]: https://cirrus-ci.com/task/5051031505076224?logs=check_world#L240-L246


v02-0001-PATCH-psql-Fix-assertion-failure-with-pipeline-m.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-02-18 Thread Anthonin Bonnefoy
On Tue, Feb 18, 2025 at 8:23 AM Michael Paquier  wrote:
> This is a pretty cool patch.  I like the structure you have used for
> the integration with the tracking of the number of commands, the
> number of syncs (like pgbench) put in a pipeline, the number of
> results requested and the number of results available.  That makes the
> whole easier to look at with a state in pset.

Thanks!

> +   PSQL_SEND_PIPELINE_SYNC,
> +   PSQL_START_PIPELINE_MODE,
> +   PSQL_END_PIPELINE_MODE,
> +   PSQL_FLUSH,
> +   PSQL_SEND_FLUSH_REQUEST,
> +   PSQL_GET_RESULTS,
>
> These new values are inconsistent, let's use some more PSQL_SEND_*
> here.  That makes the whole set of values more greppable.

Changed.

> The tests in psql.sql are becoming really long.  Perhaps it would be
> better to split that into its own file, say psql_pipeline.sql?  The
> input file is already 2k lines, you are adding 15% more lines to that.

Agreed; I wasn't sure whether this was enough to warrant a dedicated
test file. This is now split out into psql_pipeline.sql.

> + * otherwise, calling PQgetResults will block.
>
> Likely PQgetResults => PQgetResult().

Indeed, this is fixed.

> Set of nits with the style of the code, but I'd suggest to use
> braces {} here, to outline that the comment is in a block.  There's a
> second, larger one in discardAbortedPipelineResults().
>
> +   if (strcmp(cmd, "gx") == 0 && PQpipelineStatus(pset.db) != 
> PQ_PIPELINE_OFF)
> +   {
> +   pg_log_error("\\gx not allowed in pipeline mode");
> +   clean_extended_state();
> +   return PSQL_CMD_ERROR;
> +   }

Changed.

> What is the reasoning here behind this restriction?  \gx is a wrapper
> of \g with expanded mode on, but it is also possible to call \g with
> expanded=on, bypassing this restriction.

The issue is that \gx enables expanded mode for the duration of the
query and immediately resets it in sendquery_cleanup. With pipelining,
the command is piped and the display is done by either \endpipeline or
\getresults, so the flag change has no effect. Forbidding it was a way
to make it clearer that it won't have the expected effect. If we wanted
a similar feature, this would need to be done with something like
\endpipelinex or \getresultsx.

> Let's split the prompt patch with the support of %P into its own
> patch.
>
> -#define DEFAULT_PROMPT1 "%/%R%x%# "
> -#define DEFAULT_PROMPT2 "%/%R%x%# "
> +#define DEFAULT_PROMPT1 "%/%R%P%x%# "
> +#define DEFAULT_PROMPT2 "%/%R%P%x%# "
>  #define DEFAULT_PROMPT3 ">> "
>
> I don't think that changing this default is a good idea.  Everybody
> can do that in their own .psqlrc (spoiler: I do).
>
> The idea to use three fields with a hardcoded format does not look
> like a good idea to me.  I think that this should be done in a
> different and more extensible way:
> - Use %P to track if we are in pipeline mode on, off or abort.
> - Define three new variables that behave like ROW_COUNT, but for the
> fields you want to track here.  These could then be passed down to a
> PROMPT with %:name:, grammar already supported.
>
> That would make the whole much more flexible.  At it seems to me that
> we could also add requested_results to this set?  These could be named
> with the same prefix, like PIPELINE_SYNC_COUNT, etc.

I've split the patch and created the 3 special variables:
PIPELINE_SYNC_COUNT, PIPELINE_COMMAND_COUNT, PIPELINE_RESULT_COUNT.
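
For example, they can be interpolated into a prompt through the
existing %:name: syntax, along these lines (a sketch building on the
default prompt, not part of the patch):

\set PROMPT1 '%/%R [%:PIPELINE_COMMAND_COUNT:/%:PIPELINE_SYNC_COUNT:/%:PIPELINE_RESULT_COUNT:]%x%# '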

For requested_results, I don't think there's value in exposing it
since it is used as an exit condition and thus will always be 0
outside of ExecQueryAndProcessResults.


v08-0001-Add-pipelining-support-in-psql.patch
Description: Binary data


v08-0002-Add-prompt-interpolation-and-variables-for-psql-.patch
Description: Binary data


Memory context can be its own parent and child in replication command

2025-03-07 Thread Anthonin Bonnefoy
Hi,

While running some tests with logical replication, I've run into a
situation where a walsender process was stuck in an infinite loop with
the following backtrace:

#0  in MemoryContextDelete (...) at ../src/backend/utils/mmgr/mcxt.c:474
#1  in exec_replication_command (cmd_string=... "BEGIN") at
../src/backend/replication/walsender.c:2005

Which matches the following while loop:
while (curr->firstchild != NULL)
curr = curr->firstchild;

Inspecting the memory context, I have the following:

$9 = (MemoryContext) 0xafb741c35360
(gdb) p *curr
$10 = {type = T_AllocSetContext, isReset = false, allowInCritSection =
false, mem_allocated = 8192, methods = 0xafb7278a82d8
, parent = 0xafb741c35360, firstchild =
0xafb741c35360, prevchild = 0x0, nextchild = 0x0, name =
0xafb7275f68a8 "Replication command context", ident = 0x0, reset_cbs =
0x0}

So the memory context is 0xafb741c35360, which is also the same value
for parent and firstchild. This explains the infinite loop as
MemoryContextDelete tries to find the leaf with no children.

I was able to get an rr recording of the issue and trace how this
happened. It can be reproduced by triggering 2 replication commands,
with the first one doing a snapshot export:

CREATE_REPLICATION_SLOT "test_slot" LOGICAL "test_decoding" ( SNAPSHOT
"export");
DROP_REPLICATION_SLOT "test_slot";

- CreateReplicationSlot will start a new transaction to handle the
snapshot export.
- This transaction will save the replication command context (let's
call it ctx1) in its state.
- ctx1 is deleted at the end of exec_replication_command
- During the next replication command, the transaction is aborted in
SnapBuildClearExportedSnapshot
- The transaction restores ctx1 as the CurrentMemoryContext
- AllocSetContextCreate is called with ctx1 as a parent, and it will
pull ctx1 from the freelist
- ctx1's parent and child will be set to ctx1 and returned
- During ctx1 deletion, it will be stuck in an infinite loop

I've added a TAP test to reproduce the issue, along with an assertion
during context creation to check that the parent and the returned
context are not the same, so the test aborts immediately instead of
staying stuck.

To fix this, saving the memory context and restoring it after the call
to AbortCurrentTransaction seems like the best approach. It is similar
to what's done with CurrentResourceOwner. I've thought of switching to
TopMemoryContext before exporting the snapshot so that aborting the
transaction would switch back to TopMemoryContext, but this would still
require restoring the memory context after the transaction is aborted.

Regards,
Anthonin Bonnefoy


v01-0001-Avoid-using-deleted-context-with-replication-com.patch
Description: Binary data


Re: Add Pipelining support in psql

2025-03-07 Thread Anthonin Bonnefoy
On Thu, Mar 6, 2025 at 5:20 AM Michael Paquier  wrote:
> That was not a test case we had in mind originally here, but if it is
> possible to keep the implementation simple while supporting your
> demand, well, let's do it.  If it's not that straight-forward, let's
> use the new meta-command, forbidding \g and \gx based on your
> arguments from upthread.

I think the new meta-command is a separate issue from allowing ';' to
push queries into a pipeline. Any time a new formatting option is added
to \g, it would need to be forbidden for pipelining. The \sendpipeline
meta-command will help keep those exceptions low, since \g as a whole
is forbidden.

Another possible option would be to allow both \g and \gx but emit a
warning like "printing options within a pipeline will be ignored" when
those options are used, similar to the "SET LOCAL" warning issued
outside of a transaction block. That would have the benefit of keeping
existing scripts using \g and \gx compatible.

For using ';' to push commands in a pipeline, I think it should be
fairly straightforward. I can try to work on that next week (I'm
currently chasing a weird memory context bug that I need to finish
first).

On Fri, Mar 7, 2025 at 1:05 AM Jelte Fennema-Nio  wrote:
> One thing I've noticed is that \startpipeline throws warnings when
> copy pasting multiple lines. It seems to still execute everything as
> expected though. As an example you can copy paste this tiny script:
>
> \startpipeline
> select pg_sleep(5) \bind \g
> \endpipeline
>
> And then it will show these "extra argument ... ignored" warnings
>
> \startpipeline: extra argument "select" ignored
> \startpipeline: extra argument "pg_sleep(5)" ignored

It looks like an issue with libreadline. At least, I've been able to
reproduce the warnings, and 'readline(prompt);' returns everything as a
single line, with the \n inside the string. This explains why whatever
follows \startpipeline is processed as arguments. This can also be
reproduced with:
select 1 \bind \g
select 2 \bind \g
And somehow, I couldn't reproduce the issue anymore once I had
compiled and installed libreadline with debug symbols.




Re: Add Pipelining support in psql

2025-03-04 Thread Anthonin Bonnefoy
On Tue, Mar 4, 2025 at 1:32 PM Daniel Verite  wrote:
> But if the code triggered the use of the extended query protocol
> if \bind is in effect *or* a pipeline is active, then the first sequence
> would  just push "select 1" into the pipeline.
>
> This would have the advantage that, to submit into a pipeline
> a pre-existing file with SQL commands separated with ";"  you don't have
> to pre-process it to inject metacommands. Adding a \startpipeline at
> the beginning and an \endpipeline at the end would be sufficient in the
> cases that the user does not need the results before the end.
>
> The \sendpipeline is not mandatory when ";" can be used to terminate
> the queries. But it makes it clearer that the script wants
> specifically to push into a pipeline, and it might accept specific
> options in the future, whereas obviously ";" cannot.

So if I understand correctly, you want to automatically convert a
simple query into an extended query when we're within a pipeline. That
would be doable with:

--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -1668,7 +1668,16 @@ ExecQueryAndProcessResults(const char *query,
             }
             break;
         case PSQL_SEND_QUERY:
-            success = PQsendQuery(pset.db, query);
+            if (PQpipelineStatus(pset.db) != PQ_PIPELINE_OFF) {
+                success = PQsendQueryParams(pset.db, query,
+                                            pset.bind_nparams, NULL,
+                                            (const char *const *) pset.bind_params,
+                                            NULL, NULL, 0);
+                if (success)
+                    pset.piped_commands++;
+            }
+            else
+                success = PQsendQuery(pset.db, query);
             break;
         }

I do see the appeal of making it easier to convert existing scripts to
pipelining. The main focus of the initial implementation was more on
protocol regression tests with psql, so that's not something I had in
mind. I have some reservations since it pushes all parameters into the
query string, which may not be the desired behaviour. On the other
hand, if the goal is to convert existing psql scripts, everything was
already being sent as simple queries anyway. Plus, this is similar to
what pgbench does with -Mextended or -Mprepared.




Re: Add Pipelining support in psql

2025-03-04 Thread Anthonin Bonnefoy
On Tue, Mar 4, 2025 at 1:57 AM Michael Paquier  wrote:
> Anyway, can we really say that the set of printQueryOpt saved at
> execution is the correct set to use?  It is possible to have the
> opposite argument and say that we should just apply the printQueryOpt
> at the moment where \getresult is run.  A benefit of that it to keep
> the loop retrieving results simpler in ExecQueryAndProcessResults(),
> because we pass down to this call *one* printQueryOpt as "opt".
>
> There's of course the simplicity argument that I do like a lot here,
> but applying the printing options at the time of \getresult feels also
> more natural to me.

Saving the printQueryOpt when a command is pushed was an option I had
in mind, if it was straightforward to implement. However, even with
savePsetInfo, you would need to save values like gfname and gset_prefix
since they impact the output (it may make sense to move those into
printQueryOpt). This would also need to be saved for all commands, like
\close or \parse, since we don't track whether a piped command
generates output or not. So that definitely looks like it would add a
lot of complexity for limited benefit.

> FWIW, I agree that HEAD is unbalanced with its handling of \gx, so we
> could do one of the following two things:
> 1) Ignore any formatting options given to \g, but also allow \gx to
> run, documenting that during a pipeline the formatting options are
> ignored, and that the set of options defined when doing a \getresult
> is the only thing that matters.
> 2) Invent a new meta-command (suggested by Daniel Verite off-list, and
> not published to the lists because I don't know how to do a
> reply-all), like a \send, a \push, forbiding entirely \gx and \g when
> in a pipeline.  If we were to integrate options into this new
> meta-command, a split with \g would make an integration easier to
> think about.  One name suggestion I can come up is \sendpipeline.
>
> I'd consider option 2, based on Daniel's concerns.  One line I'm going
> to draw here is that we should not go down to manipulations of
> printQueryOpt while retrieving batch of results depending on the style
> of output that was defined when sending a query in a pipeline.

Another possible option would be to send the command directly, without
requiring an additional meta-command, as in "SELECT 1 \bind". However,
this would make it more painful to introduce new parameters, and it
would make \bind and \bind_named inconsistent, as they normally require
a separate meta-command to send the result.

I like the \sendpipeline option. It makes it clearer that formatting
options are not applicable within a pipeline (at least, in the current
implementation) and I think it would make more sense to have those
formatting options in \getresults or \endpipeline.

I took a stab at creating the \sendpipeline meta-command. I've also
realised there's a small leak where fname is currently not freed on
queries like 'select ... \bind \gx filename' when within a pipeline,
which is fixed in patch 0001.


v01-0001-Fix-fname-leak-on-gx-within-a-pipeline.patch
Description: Binary data


v01-0002-psql-Create-new-sendpipeline-meta-command.patch
Description: Binary data

