On 12.02.2025 19:00, Alvaro Herrera wrote:
On 2025-Feb-12, Julien Rouhaud wrote:

On Wed, Feb 12, 2025 at 01:57:47PM +0100, Alvaro Herrera wrote:
Anyway, I think that's different.  We do support compute_query_id=off as
a way for a custom module to compute completely different query IDs
using their own algorithm, which I think is what you're referring to.
However, the ability to affect the way the in-core algorithm works is a
different thing: you still want in-core code to compute the query ID.
I don't think that's the actual behavior, or at least not what it was
supposed to be.

What we should have is the ability to compute queryid, which can be
either in core or done by an external module, but one only one can /
should be done.
Yes, that's what I tried to say, but I don't understand why you say I
said something different.

Right now, the proposal in the other thread is that if you want to
affect that algorithm in order to merge arrays to be considered a single
query element regardless of its length, you set the GUC for that.
Initially the GUC was in the core code.  Then, based on review, the GUC
was moved to the extension, _BUT_ the implementation was still in the
core code: in order to activate it, the extension calls a function that
modifies core code behavior.  So there are more moving parts than
before, and if you for whatever reason want that behavior but not the
extension, then you need to write a C function.  To me this is absurd.
So what I suggest we do is return to having the GUC in the core code.
I agree, although that probably breaks the queryid extensibility.
It does?

I haven't read the patch but IIUC if you want the feature to work you
need to both change the queryid calculation but also the way the
constants are recorded and the query text is normalized, and I don't
know if extensions have access to it.
Hmm.  As for the query text: with Andrey's feature with the GUC in core,
a query like this
SELECT foo FROM tab WHERE col1 IN (1,2,3,4)
will have in pg_stat_activity an identical query_id to a query like this
SELECT foo WHERE tab WHERE col1 IN (1,2,3,4,5)
even though the query texts differ (in the number of elements in the
array).  I don't think this is a problem.  This means that the query_id
for two different queries can be identical, but that should be no
surprise, precisely because the GUC that controls it is documented to do
that.

If pg_stat_statements is enabled with Andrey's patch, then the same
query_id will have a single entry (which has stats for both execution of
those queries) with that query_id, with a normalized query text that is
going to be different from those two above; without Andrey's feature,
the text would be
SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4);
SELECT foo WHERE tab WHERE col1 IN ($1,$2,$3,$4,$5);
(that is, pg_stat_statements transformed the values into placeholders,
but using exactly the same number of items in the array as the original
queries).  With Andrey's feature, it will be
SELECT foo WHERE tab WHERE col1 IN (...);
that is, the query text has been modified and no longer matches exactly
any of the queries in pg_stat_activity.  But note that the query text
already does not match what's in pg_stat_activity, even before Andrey's
patch.

I don't understand what you mean with "the way the constants are
recorded".  What constants are you talking about?  pg_stat_statements
purposefully discards any constants used in the query (obviously).

If they have access and fail to do what the GUC asked then of course
that's just a bug in that extension.
I don't understand what bug are you thinking that such hypothetical
extension would have.  (pg_stat_statements does of course have access to
the query text and to the location of all constants).

Now I admit I'm not sure what the solution would be for the problem
discussed in this subthread.  Apparently the problem is related to temp
tables and their changing OIDs.  I'm not sure what exactly the proposal
for a GUC is.
I'm not proposing anything, just explaining why pg_stat_statements is
generally useless if you use temp tables as someone asked.
Ah, okay.  Well, where you see a deficiency, I see an opportunity for
improvement :-)


Hi everyone,

I support the idea of computing the planid  for temporary tables using 'pg_temp.rel_name'. Moreover, we have already started using this approach for computing queryid [0]. It seems reasonable to apply the same logic to the planid calculation as well.

[0]: https://www.postgresql.org/message-id/flat/Z9mkqplmUpQ4xG52%40msg.df7cb.de#efb20f01bec32aeafd58e5d4ab0dfc16

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.



Reply via email to