Hi Dmitry,

On Sun, May 4, 2025 at 6:19 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote:
> > > On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote:
> > >
> > > I agree that the current solution we have in the tree feels incomplete
> > > because we are not taking into account the most common cases that
> > > users would care about.  Now, allowing PARAM_EXTERN means that we
> > > allow any expression to be detected as a squashable thing, and this
> > > kinds of breaks the assumption of IsSquashableConst() where we want
> > > only constants to be allowed because EXECUTE parameters can be any
> > > kind of Expr nodes.  At least that's the intention of the code on
> > > HEAD.
> > >
> > > Now, I am not actually objecting about PARAM_EXTERN included or not if
> > > there's a consensus behind it and my arguments are considered as not
> > > relevant.  The patch is written so as it claims that a PARAM_EXTERN
> > > implies the expression to be a Const, but it may not be so depending
> > > on what the execution path is given for the parameter.  Or at least
> > > the patch could be clearer and rename the parts about the "Const"
> > > squashable APIs around queryjumblefuncs.c.
> > >
> > > [...]
> > >
> > > The PARAM_EXTERN part has been mentioned a couple of weeks ago here,
> > > btw:
> > > https://www.postgresql.org/message-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=v...@mail.gmail.com
> >
> > In fact, this has been discussed much earlier in the thread above, as
> > essentially the same implementation with T_Params, which is submitted
> > here, was part of the original patch. The concern was always whether or
> > not it will break any assumption about query identification, because
> > this way much broader scope of expressions will be considered equivalent
> > for query id computation purposes.
> >
> > At the same time after thinking about this concern more, I presume it
> > already happens at a smaller scale -- when two queries happen to have
> > the same number of parameters, they will be indistinguishable even if
> > parameters are different in some way.
>
> Returning to the topic of whether to squash list of Params.
>
> Originally squashing of Params wasn't included into the squashing patch due to
> concerns from reviewers about treating quite different queries as the same for
> the purposes of query identification. E.g. there is some assumption somewhere,
> which will be broken if we treat query with a list of integer parameters same
> as a query with a list of float parameters. For the sake of making progress
> I've decided to postpone answering this question and concentrate on more 
> simple
> scenario. Now, as the patch was applied, I think it's a good moment to reflect
> on those concerns. It's not enough to say that we don't see any problems with
> squashing of Param, some more sound argumentation is needed. So, what will
> happen if parameters are squashed as constants?
>
> 1. One obvious impact is that more queries, that were considered distinct
> before, will have the same normalized query and hence the entry in
> pg_stat_statements. Since a Param could be pretty much anything, this can lead
> to a situation when two queries with quiet different performance profiles 
> (e.g.
> one contains a parameter value, which is a heavy function, another one 
> doesn't)
> are matched to one entry, making it less useful.
>
> But at the same time this already can happen if those two queries have the 
> same
> number of parameters, since query parametrizing is intrinsically lossy in this
> sense. The only thing we do by squashing such queries is we loose information
> about the number of parameters, not properties of the parameters themselves.
>
> 2. Another tricky scenario is when queryId is used by some extension, which in
> turn makes assumption about it that are going to be rendered incorrect by
> squashing. The only type of assumptions I can imagine falling into this
> category is anything about equivalence of queries. For example, an extension
> can capture two queries, which have the same normalized entry in pgss, and
> assume all properties of those queries are the same.
>
> It's worth noting that normalized query is not transitive, i.e. if a query1 
> has
> the normalized version query_norm, and a query2 has the same normalized 
> version
> query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g.
> they could have list of parameter values with different type and the same
> size). That means that such assumptions are already faulty, and could work 
> most
> of the time only because it takes queries with a list of the same size to 
> break
> the assumption. Squashing such queries will make them wrong more often.
>
> One can argue that we might want to be friendly to such extensions, and do not
> "break" them even further. But I don't think it's worth it, as number of such
> extensions is most likely low, if any. One more extreme case would be when an
> extension assumes that queries with the same entry in pgss have the same 
> number
> of parameters, but I don't see how such assumption could be useful.
>
> 3. More annoying is the consequence that parameters are going to be treated as
> constants in pg_stat_statements. While mostly harmless, that would mean 
> they're
> going to be replaced in the same way as constants. This means that the
> parameter order is going to be lost, e.g.:
>
> SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4
> -- output
> SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
>
> SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4)
>     AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8
> -- output
> SELECT * FROM test_squash WHERE data IN ($1 /*, ... */)
>     AND id IN ($2 /*, ... */)
>
> This representation could be confusing of course. It could be either explained
> in the documentation, or LocationLen has to be extended to carry information
> about whether it's a constant or a parameter, and do not replace the latter. 
> In
> any case, anything more than the first parameter number will be lost, but it's
> probably not so dramatic.
>
> At the end of the day, I think the value of squashing for parameters outweighs
> the problems described above. As long as there is an agreement about that, 
> it's
> fine by me. I've attached the more complete version of the patch (but without
> modifying LocationLen to not replace Param yet) in case if such agreemeng will
> be achieved.

Would it make sense to rename `RecordConstLocation` to something like
`RecordExpressionLocation` instead?

- /* Array of locations of constants that should be removed */
+ /* Array of locations of constants that should be removed and parameters */
  LocationLen *clocations;

should be

+ /* Array of locations of constants and parameters that should be removed */

You could also consider renaming `clocations` to `elocations`, this
may introduce
some additional churn though.

-- 
Regards
Junwang Zhao


Reply via email to