On Thu, Mar 06, 2025 at 06:44:27PM -0600, Sami Imseih wrote: > Regarding the issue itself, query jumbling behavior is often subjective, > making it difficult to classify as a bug. I'm not entirely sure this > qualifies as a bug either, but I do believe it should be addressed.
I would call that a bug and something that we should fix, but not something that we can really backpatch as this has unfortunately an impact on monitoring tools. Stability takes priority in this area in already released branches. > By rearranging them as done in variant A, the position where the expression > is appended in the jumble changes, leading to a different hash. I just > don't like > this solution because it requires one to carefully construct a struct, > but it maybe the best out of the other options. I prefer something like variant A. It would not be the first time we are manipulating the structure of the parse nodes for the sake of making the query jumbling more transparent. An advantage of depending on the structure definition is that we can document the expectation in the header, and not hide it in the middle of the jumble code. > Variant B is not acceptable IMO as it adds a whole bunch of null-terminators > unnecessarily. For example, in a simple "select 1", (expr == NULL) is > true 19 times, > so that is an extra 19 bytes. Variant B is not acceptable here. > I think a third option is to add a new pg_node_attr called "query_jumble_null" > and be very selective on which fields should append a null-terminator to the > jumble when the expression is null > > The queryjumblefuncs.c could look like the below. JUMBLE_NULL will > be responsible for appending the null terminator. Not much a fan of that. For the sake of this thread, I'd still go with the simplicity of A. And please, let's add a couple of queries in pgss to track that these lead to two different entries. Another option that I was thinking about and could be slightly cleaner is the addition of an extra field in Query that is set when we go through a offset_clause in the parser. It could just be a boolean, false by default. We have been using this practice in in DeallocateStmt, for example. And there are only a few places where limitOffset is set. However, I'd rather discard this idea as transformSelectStmt() has also the idea to transform a LIMIT clause into an OFFSET clause, changing a Query representation. And note that we calculate the query jumbling after applying the query transformation. For these reasons, variant A where we put the LimitOption between the two int8 expression nodes feels like the "okay" approach here. But we must document this expectation in the structure, and check for more grammar variants of LIMIT and OFFSET clauses in pgss. Another concept would be to add into the jumble the offset of a Node in the upper structure we are jumbling, but this requires keeping a track of all the nested things, which would make the query jumbling code much more complicated and more expensive, for sure. I'd also recommend to be careful and double-check all these transformation assumptions for LIMIT and OFFSET. We should shape things so as an OFFSET never maps to a LIMIT variant when we differentiate all these flavors at query string level. -- Michael
signature.asc
Description: PGP signature