Still looking at 0002. pg_ndistinct_in disallows input, claiming that pg_node_tree does the same thing. But pg_node_tree does it for security reasons: you could crash the backend if you supplied a malicious value. I don't think that applies to pg_ndistinct_in. Perhaps it will be useful to inject fake stats at some point, so why not allow it? It shouldn't be complicated (though it does require writing some additional code, so perhaps that's one reason we don't want to allow input of these values).
The comment on top of pg_ndistinct_out is missing the "_out"; also it talks about histograms, which is not what this is about. In the same function, a trivial point you don't need to pstrdup() the .data out of a stringinfo; it's already palloc'ed in the right context -- just PG_RETURN_CSTRING(str.data) and forget about "ret". Saves you one line. Nearby, some auxiliary functions such as n_choose_k and num_combinations are not documented. What it is that they do? I'd move these at the end of the file, keeping the important entry points at the top of the file. I see this patch has a estimate_ndistinct() which claims to be a re- implementation of code already in analyze.c, but it is actually a lot simpler than what analyze.c does. I've been wondering if it'd be a good idea to use some of this code so that some routines are moved out of analyze.c; good implementations of statistics-related functions would live in src/backend/statistics/ where they can be used both by analyze.c and your new mvstats stuff. (More generally I am beginning to wonder if the new directory should be just src/backend/statistics.) common.h does not belong in src/backend/utils/mvstats; IMO it should be called src/include/utils/mvstat.h. Also, it must not include postgres.h, and it probably doesn't need most of the #includes it has; those are better put into whatever include it. It definitely needs a guarding #ifdef MVSTATS_H around its whole content too. An include file is not just a way to avoid #includes in other files; it is supposed to be a minimally invasive way of exporting the structs and functions implemented in some file into other files. So it must be kept minimal. psql/tab-complete.c compares the wrong version number (9.6 instead of 10). Is it important to have a cast from pg_ndistinct to bytea? I think it's odd that outputting it as bytea yields something completely different than as text. (The bytea is not human readable and cannot be used for future input, so what is the point?) In another subthread you seem to have surrendered to the opinion that the new catalog should be called pg_statistics_ext, just in case in the future we come up with additional things to put on it. However, given its schema, with a "starelid / stakeys", is it sensible to think that we're going to get anything other than something that involves multiple variables? Maybe it should just be "pg_statistics_multivar" and if something else comes along we create another catalog with an appropriate schema. Heck, how does this catalog serve the purpose of cross-table statistics in the first place, given that it has room to record a single relid only? Are you thinking that in the future you'd change starelid into an oidvector column? The comment in gram.y about the CREATE STATISTICS is at odds with what is actually allowed by the grammar. I think the name of a statistics is only useful to DROP/ALTER it, right? I wonder why it's useful that statistics belongs in a schema. Perhaps it should be a global object? I suppose the name collisions would become bothersome if you have many mvstats. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers