Hey, let me know if there's any way I can help. I would argue that even a small improvement here would benefit many users.
On Tue, Jul 21, 2020 at 11:59 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Thu, Oct 3, 2019 at 3:33 AM Pavel Trukhanov <pavel.trukha...@gmail.com> wrote:
> >
> >> On Wed, Jun 26, 2019 at 11:10 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >>
> >> Greg Stark <st...@mit.edu> writes:
> >> > Actually thinking about this for two more seconds, the question is what it
> >> > would do with a query like
> >> >     WHERE col = ANY '1,2,3'::integer[]
> >> > or
> >> >     WHERE col = ANY ARRAY[1,2,3]
> >> > whichever the language binding that is failing to do parameterized queries
> >> > is doing to emulate them.
> >>
> >> Yeah, one interesting question is whether this is actually modeling
> >> what happens with real-world applications --- are they sending Consts,
> >> or Params?
> >>
> >> I resolutely dislike the idea of marking arrays that came from IN
> >> differently from other ones; that's just a hack and it's going to give
> >> rise to unexplainable behavioral differences for logically-equivalent
> >> queries.
> >
> > Thanks for your input.
> >
> > As for real-world applications: being a founder of a server monitoring
> > SaaS (okmeter), I have access to stats on hundreds of Postgres
> > installations.
> >
> > It shows that IN with a variable number of params is used ~7 times more
> > often than ANY(array).
>
> Hi,
>
> I would like to do some archaeology and inquire about this thread, since
> unfortunately no patch was presented, as far as I can see.
>
> IIUC, the ideas suggested in this thread mostly revolve around modifying
> the parser:
>
> > On Fri, Jun 14, 2019 at 2:46 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > I do not think you need new expression infrastructure. IMO what's going on
> > here is that we're indulging in premature optimization in the parser.
> > It would be better from a structural standpoint if the output of parse
> > analysis were closer to what the user wrote, and then the business of
> > separating Vars from Consts and reducing the Consts to an array were
> > handled in the planner's expression preprocessing phase.
> >
> > So maybe what you should be thinking about is a preliminary patch that's
> > mostly in the nature of refactoring, to move that processing to where it
> > should be.
> >
> > Of course, life is never quite that simple; there are at least two
> > issues you'd have to think about.
> >
> > * The parsing phase is responsible for determining the semantics of
> > the query, in particular resolving the data types of the IN-list items
> > and choosing the comparison operators that will be used. The planner
> > is not allowed to rethink that. What I'm not clear about offhand is
> > whether the existing coding in parse analysis might lead to different
> > choices of data types/operators than a more straightforward approach
> > does. If so, we'd have to decide whether that's okay.
> >
> > * Postponing this work might make things slower overall, which wouldn't
> > matter too much for short IN-lists, but you can bet that people who
> > throw ten-thousand-entry IN-lists at us will notice. So you'd need to
> > keep an eye on efficiency and make sure you don't end up repeating
> > similar processing over and over.
>
> This puzzles me, since the original issue sounds like a "representation"
> problem: we want to calculate the jumble hash in a way that obviously
> repeating parameters or constants are hashed into one value. I see the
> point in ideas like this:
>
> >> One idea that comes to me after looking at the code involved is that
> >> it might be an improvement across the board if transformAExprIn were to
> >> reduce the generated ArrayExpr to an array Const immediately, in cases
> >> where all the inputs are Consts.
> >> That is going to happen anyway come plan time, so it'd have zero impact
> >> on semantics or query performance. Doing it earlier would cost nothing,
> >> and could even be a net win, by reducing per-parse-node overhead in
> >> places like the rewriter.
> >>
> >> The advantage for the problem at hand is that a Const that's an array
> >> of 2 elements is going to look the same as a Const that's any other
> >> number of elements so far as pg_stat_statements is concerned.
> >>
> >> This doesn't help of course in cases where the values aren't all
> >> Consts. Since we eliminated Vars already, the main practical case
> >> would be that they're Params, leading us back to the previous
> >> question of whether apps are binding queries with different numbers
> >> of parameter markers in an IN, and how hard pg_stat_statements should
> >> try to fuzz that if they are.
> >>
> >> Then, to Greg's point, there's a question of whether transformArrayExpr
> >> should do likewise, ie try to produce an array Const immediately.
> >> I'm a bit less excited about that, but consistency suggests that
> >> we should have it act the same as the IN case.
>
> Interestingly enough, something similar was already mentioned in [1]. But
> no one jumped into this, probably due to its relative complexity, a lack of
> personal time, or the absence of a clear way to handle Params (I'm actually
> not sure about the statistics for Consts vs Params myself and need to check
> this, but can easily imagine that both could often be a problem).
>
> Another idea was also mentioned in [1]:
>
> > I wonder whether we could improve this by arranging things so that both
> > Consts and Params contribute zero to the jumble hash, and a list of these
> > things also contributes zero, regardless of the length of the list.
>
> Taking everything into account, is there anything particularly wrong with
> the approach of squashing down lists of constants/parameters in
> pg_stat_statements itself?
> This sounds simpler, and judging from my experiments, even preventing the
> jumbling of ArrayExpr and RTE VALUES constants of the same type with a
> position index above some threshold already helps a lot in many of the
> cases that I observe.
>
> [1]: https://www.postgresql.org/message-id/flat/CAM3SWZSpdPB3uErnXWMt3q74y0r%2B84ZsOt2U3HKKes_V7O%2B0Qg%40mail.gmail.com
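For anyone skimming the thread, the "squashing" idea can be illustrated with a small sketch. To be clear, this is not PostgreSQL's actual jumbling code: pg_stat_statements fingerprints the post-parse-analysis tree, not the query text, and the regex-based normalization below is a deliberately simplified stand-in. It only shows the desired end effect — if a variable-length list of constants collapses to a single token before hashing, queries differing only in IN-list length map to one fingerprint:

```python
import hashlib
import re

def fingerprint(query: str) -> str:
    """Toy query fingerprint: collapse IN-lists of literals to a single
    placeholder before hashing, so queries that differ only in the number
    of IN-list elements share one fingerprint.  (Real jumbling walks the
    parse tree rather than using regexes on the query text.)"""
    # Match one numeric or single-quoted string literal, with optional spaces.
    literal = r"\s*(?:\d+|'[^']*')\s*"
    # Replace "IN (lit, lit, ...)" of any length with a fixed placeholder.
    squashed = re.sub(
        rf"\bIN\s*\(({literal})(?:,{literal})*\)",
        "IN (...)",
        query,
        flags=re.IGNORECASE,
    )
    return hashlib.sha256(squashed.encode()).hexdigest()[:16]

q1 = fingerprint("SELECT * FROM t WHERE col IN (1, 2)")
q2 = fingerprint("SELECT * FROM t WHERE col IN (1, 2, 3, 4, 5)")
q3 = fingerprint("SELECT * FROM t WHERE col = 7")
print(q1 == q2)  # True: list length no longer matters
print(q1 == q3)  # False: structurally different queries still differ
```

The same collapsing could in principle happen on a list of Params as easily as on a list of Consts, which is why doing it at jumble time sidesteps the Consts-vs-Params question raised earlier in the thread.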