On 11 April 2012 01:16, Tom Lane <t...@sss.pgh.pa.us> wrote: > Peter Geoghegan <pe...@2ndquadrant.com> writes: >> On 11 April 2012 00:35, Robert Haas <robertmh...@gmail.com> wrote: >>> If people need something like that, couldn't they create it by hashing >>> the normalized query text with an arbitrary algorithm? > >> That supposes that the normalised query text is perfectly stable. It >> may well not be, particularly for things like ad-hoc queries or >> queries generated by ORMs, across database clusters and over long >> periods of time - > > Indeed, but the hash value isn't stable either given those sorts of > assumptions, so I'm not convinced that there's any advantage there.
Isn't it? The hash captures the true meaning of the query, while having the database server's platform as a usually irrelevant artefact. Another thing that I forgot to mention is client encoding - it may well be fairly inconvenient to have to use the same algorithm to hash the query string across applications. You also have to hash the query string yourself again and again, which is expensive to do from Python or something, and is often inconvenient - differences beyond track_activity_query_size bytes (default:1024) are not recognised. Using an SQL code beautifier where a single byte varies now breaks everything, which developers don't expect at all (we've trained them not to), so in many ways you're back to the same limitations as classic pg_stat_statements if you attempt to aggregate queries over time and across machines, which is a very real use case. It's probably pretty annoying to have to get your Python app to use the same hash function as your Java app or whatever I, unless you want to use something heavyweight like a cryptographic hash function. By doing it within Postgres, you avoid those headaches. I'm not asking you to very loudly proclaim that it should be used like this - just expose it, accurately document it, and I'm quite confident that it will be widely used and relied upon by those that are reasonably well informed, and understand its limitations, which are really quite straightforward. > What I think people would actually like to know, if they're in a > situation where distinct query texts are getting hashed to the same > thing, is *which* different texts got hashed to the same thing. > But there's no good way to expose that given the pg_stat_statements > infrastructure, and exposing the hash value doesn't help. Apart from detecting the case where we get a straightforward collision, I don't expect that that would be useful. The whole point is that the user doesn't care about the difference, and I think we've specified a practical, widely useful standard for when queries should be considered equivalent. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers