On Tue, Dec 10, 2013 at 1:30 AM, Peter Geoghegan <p...@heroku.com> wrote: > pg_stat_statements' fingerprinting logic considers the following two > statements as distinct: > > select 1 in (1, 2, 3); > select 1 in (1, 2, 3, 4); > > This is because the ArrayExpr jumble case jumbles any ArrayExpr's list > of elements recursively. In this case it's a list of Const nodes, and > the fingerprinting logic jumbles those nodes indifferently. > > Somebody told me that they think that pg_stat_statements should not do > that. This person felt that it would be preferable for such > expressions to be normalized without regard to the number of distinct > Const elements. I suppose that that would work by determing if the > ArrayExpr elements list was a list of Const nodes and only const > nodes. Iff that turned out to be the case, something else would be > jumbled (something other than the list) that would essentially be a > representation of "some list of zero or more (or maybe one or more) > Const nodes with consttype of, in this example, 23". I think that this > would make at least one person happy, because of course the two > statements above would have their costs aggregated within a single > pg_stat_statements entry.
Baron Schwartz recently gave a talk at PGConf Silicon Valley about the proprietary query instrumentation tool, VividCortex. The slides are available from: http://info.citusdata.com/rs/235-CNE-301/images/Analyzing_PostgreSQL_Network_Traffic_with_vc-pgsql-sniffer_-_Baron_Schwartz.pdf One specific justification he gave for not using pg_stat_statements was: "Doesn’t merge bind vars in IN()" (See slide #11) His theory is that you should allow a proprietary binary to run with root permissions, a binary that sniffs the wire protocol, mostly because pg_stat_statements has this limitation (all other pg_stat_statements limitations listed are pretty unconvincing, IMV). That doesn't seem like a great plan to me, but I think he has a point about pg_stat_statements. It's about time that we fixed this -- it isn't realistic to imagine that people are going to know to use an array constant like "= any ('{1,2,3}')" -- even a major contributor to Django that I talked to about this issue a couple of years ago didn't know about that. It also isn't portable across database systems. I wonder: * How do other people feel about this? Personally, I've seen enough problems of this kind in the field that "slippery slope" arguments against this don't seem very compelling. * How might altering the jumbling logic to make it recognize a variable number of constants as equivalent work in practice? Perhaps we should do something to flatten the representation based on which powers of two the number of constants is between. There are still some details to work out there, but that's my first idea. That seems like a good compromise between the current behavior, and completely disregarding the number of constants. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers