Hi Dmitry, On Sun, May 4, 2025 at 6:19 PM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > > On Thu, May 01, 2025 at 09:55:47PM GMT, Dmitry Dolgov wrote: > > > On Thu, May 01, 2025 at 09:29:13AM GMT, Michael Paquier wrote: > > > > > > I agree that the current solution we have in the tree feels incomplete > > > because we are not taking into account the most common cases that > > > users would care about. Now, allowing PARAM_EXTERN means that we > > > allow any expression to be detected as a squashable thing, and this > > > kinds of breaks the assumption of IsSquashableConst() where we want > > > only constants to be allowed because EXECUTE parameters can be any > > > kind of Expr nodes. At least that's the intention of the code on > > > HEAD. > > > > > > Now, I am not actually objecting about PARAM_EXTERN included or not if > > > there's a consensus behind it and my arguments are considered as not > > > relevant. The patch is written so as it claims that a PARAM_EXTERN > > > implies the expression to be a Const, but it may not be so depending > > > on what the execution path is given for the parameter. Or at least > > > the patch could be clearer and rename the parts about the "Const" > > > squashable APIs around queryjumblefuncs.c. > > > > > > [...] > > > > > > The PARAM_EXTERN part has been mentioned a couple of weeks ago here, > > > btw: > > > https://www.postgresql.org/message-id/CAA5RZ0tu6_KRiYJCFptf4_--wjFSu9cZMj1XNmOCqTNxu=v...@mail.gmail.com > > > > In fact, this has been discussed much earlier in the thread above, as > > essentially the same implementation with T_Params, which is submitted > > here, was part of the original patch. The concern was always whether or > > not it will break any assumption about query identification, because > > this way much broader scope of expressions will be considered equivalent > > for query id computation purposes. > > > > At the same time after thinking about this concern more, I presume it > > already happens at a smaller scale -- when two queries happen to have > > the same number of parameters, they will be indistinguishable even if > > parameters are different in some way. > > Returning to the topic of whether to squash list of Params. > > Originally squashing of Params wasn't included into the squashing patch due to > concerns from reviewers about treating quite different queries as the same for > the purposes of query identification. E.g. there is some assumption somewhere, > which will be broken if we treat query with a list of integer parameters same > as a query with a list of float parameters. For the sake of making progress > I've decided to postpone answering this question and concentrate on more > simple > scenario. Now, as the patch was applied, I think it's a good moment to reflect > on those concerns. It's not enough to say that we don't see any problems with > squashing of Param, some more sound argumentation is needed. So, what will > happen if parameters are squashed as constants? > > 1. One obvious impact is that more queries, that were considered distinct > before, will have the same normalized query and hence the entry in > pg_stat_statements. Since a Param could be pretty much anything, this can lead > to a situation when two queries with quiet different performance profiles > (e.g. > one contains a parameter value, which is a heavy function, another one > doesn't) > are matched to one entry, making it less useful. > > But at the same time this already can happen if those two queries have the > same > number of parameters, since query parametrizing is intrinsically lossy in this > sense. The only thing we do by squashing such queries is we loose information > about the number of parameters, not properties of the parameters themselves. > > 2. Another tricky scenario is when queryId is used by some extension, which in > turn makes assumption about it that are going to be rendered incorrect by > squashing. The only type of assumptions I can imagine falling into this > category is anything about equivalence of queries. For example, an extension > can capture two queries, which have the same normalized entry in pgss, and > assume all properties of those queries are the same. > > It's worth noting that normalized query is not transitive, i.e. if a query1 > has > the normalized version query_norm, and a query2 has the same normalized > version > query_norm, it doesn't mean query1 is equivalent query2 in all senses (e.g. > they could have list of parameter values with different type and the same > size). That means that such assumptions are already faulty, and could work > most > of the time only because it takes queries with a list of the same size to > break > the assumption. Squashing such queries will make them wrong more often. > > One can argue that we might want to be friendly to such extensions, and do not > "break" them even further. But I don't think it's worth it, as number of such > extensions is most likely low, if any. One more extreme case would be when an > extension assumes that queries with the same entry in pgss have the same > number > of parameters, but I don't see how such assumption could be useful. > > 3. More annoying is the consequence that parameters are going to be treated as > constants in pg_stat_statements. While mostly harmless, that would mean > they're > going to be replaced in the same way as constants. This means that the > parameter order is going to be lost, e.g.: > > SELECT * FROM test_squash WHERE data IN ($4, $3, $2, $1) \bind 1 2 3 4 > -- output > SELECT * FROM test_squash WHERE data IN ($1 /*, ... */) > > SELECT * FROM test_squash WHERE data IN ($1, $2, $3, $4) > AND id IN ($5, $6, $7, $8) \bind 1 2 3 4 5 6 7 8 > -- output > SELECT * FROM test_squash WHERE data IN ($1 /*, ... */) > AND id IN ($2 /*, ... */) > > This representation could be confusing of course. It could be either explained > in the documentation, or LocationLen has to be extended to carry information > about whether it's a constant or a parameter, and do not replace the latter. > In > any case, anything more than the first parameter number will be lost, but it's > probably not so dramatic. > > At the end of the day, I think the value of squashing for parameters outweighs > the problems described above. As long as there is an agreement about that, > it's > fine by me. I've attached the more complete version of the patch (but without > modifying LocationLen to not replace Param yet) in case if such agreemeng will > be achieved.
Would it make sense to rename `RecordConstLocation` to something like `RecordExpressionLocation` instead? - /* Array of locations of constants that should be removed */ + /* Array of locations of constants that should be removed and parameters */ LocationLen *clocations; should be + /* Array of locations of constants and parameters that should be removed */ You could also consider renaming `clocations` to `elocations`, this may introduce some additional churn though. -- Regards Junwang Zhao