On Wed, Dec 27, 2023 at 01:08:47PM +0100, Tomas Vondra wrote: > On 12/26/23 20:19, Tom Lane wrote: > > Bruce Momjian <br...@momjian.us> writes: > >> I think we need a robust API to handle two cases: > > > >> * changes in how we store statistics > >> * changes in how how data type values are represented in the statistics > > > >> We have had such changes in the past, and I think these two issues are > >> what have prevented import/export of statistics up to this point. > >> Developing an API that doesn't cleanly handle these will cause long-term > >> pain. > > > > Agreed. > > > > I agree the format is important - we don't want to end up with a format > that's cumbersome or inconvenient to use. But I don't think the proposed > format is somewhat bad in those respects - it mostly reflects how we > store statistics and if I was designing a format for humans, it might > look a bit differently. But that's not the goal here, IMHO. > > I don't quite understand the two cases above. Why should this affect how > we store statistics? Surely, making the statistics easy to use for the > optimizer is much more important than occasional export/import.
The two items above were to focus on getting a solution that can easily handle future statistics storage changes. I figured we would want to manipulate the data as a table internally so I am confused why we would export JSON instead of a COPY format. I didn't think we were changing how we internall store or use the statistics. > >> In summary, I think we need an SQL-level command for this. > > > > I think a SQL command is an actively bad idea. It'll just add development > > and maintenance overhead that we don't need. When I worked on this topic > > years ago at Salesforce, I had things set up with simple functions, which > > pg_dump would invoke by writing more or less > > > > SELECT pg_catalog.load_statistics(....); > > > > This has a number of advantages, not least of which is that an extension > > could plausibly add compatible functions to older versions. The trick, > > as you say, is to figure out what the argument lists ought to be. > > Unfortunately I recall few details of what I wrote for Salesforce, > > but I think I had it broken down in a way where there was a separate > > function call occurring for each pg_statistic "slot", thus roughly > > > > load_statistics(table regclass, attname text, stakind int, stavalue ...); > > > > I might have had a separate load_statistics_xxx function for each > > stakind, which would ease the issue of deciding what the datatype > > of "stavalue" is. As mentioned already, we'd also need some sort of > > version identifier, and we'd expect the load_statistics() functions > > to be able to transform the data if the old version used a different > > representation. I agree with the idea that an explicit representation > > of the source table attribute's type would be wise, too. > > > > Yeah, this is pretty much what I meant by "functional" interface. But if > I said maybe the format implemented by the patch is maybe too close to > how we store the statistics, then this has exactly the same issue. And > it has other issues too, I think - it breaks down the stats into > multiple function calls, so ensuring the sanity/correctness of whole > sets of statistics gets much harder, I think. I was suggesting an SQL command because this feature is going to need a lot of options and do a lot of different things, I am afraid, and a single function might be too complex to manage. > I'm not sure about the extension idea. Yes, we could have an extension > providing such functions, but do we have any precedent of making > pg_upgrade dependent on an external extension? I'd much rather have > something built-in that just works, especially if we intend to make it > the default behavior (which I think should be our aim here). Uh, an extension seems nice to allow people in back branches to install it, but not for normal usage. -- Bruce Momjian <br...@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.