Re: Multivariate MCV stats can leak data to unprivileged users

Tomas Vondra Mon, 20 May 2019 07:46:21 -0700

On Mon, May 20, 2019 at 09:32:24AM -0400, Tom Lane wrote:

Dean Rasheed <dean.a.rash...@gmail.com> writes:

On Sun, 19 May 2019 at 23:45, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:

Oh, right. It still has the disadvantage that it obfuscates the actual
data stored in the pg_stats_ext_data (or whatever would it be called),
so e.g. functions would have to do additional checks to make sure it
actually is the right statistic type. For example pg_mcv_list_items()
could not rely on receiving pg_mcv_list values, as per the signature,
but would have to check the value.

Yes. In fact, since the user-accessible view would want to expose
datatypes specific to the stats kinds rather than bytea or cstring
values, we would need SQL-callable conversion functions for each kind:


It seems like people are willfully misunderstanding my suggestion.
You'd only need *one* conversion function, which would look at the
embedded ID field and then emit the appropriate text representation.
I don't see a reason why we'd have the separate pg_ndistinct etc. types
any more at all.


That would however require having input functions, which we currently
don't have. Otherwise people could not process the statistic values using
functions like pg_mcv_list_items(). Which I think is useful.

Of course, we could add input functions, but there was a reasoning for not

having them (similarly to pg_node_tree).

Also this model presupposes that all future stats kinds are most
conveniently represented in a single column, but maybe that won't be
the case. It's conceivable that a future stats kind would benefit from
splitting its data across multiple columns.


Hm, that's possible I suppose, but it seems a little far-fetched.
You could equally well argue that pg_ndistinct etc. should have been
broken down into smaller types, but we didn't.


True. I can't rule out adding such "split" statistic type, but don't think
it's very likely. The extended statistic values tend to be complex and
easier to represent in a single value.

Yes, I think it is an EAV model. I think EAV models do have their
place, but I think that's largely where adding new columns is a common
operation and involves adding little to no extra code. I don't think
either of those is true for extended stats. What we've seen over the
last couple of years is that adding each new stats kind is a large
undertaking, involving lots of new code. That alone is going to limit
just how many ever get added, and compared to that effort, adding new
columns to the catalog is small fry.


I can't argue with that --- the make-work is just a small part of the
total.  But it's still make-work.

Anyway, it was just a suggestion, and if people don't like it that's
fine.  But I don't want it to be rejected on the basis of false
arguments.


Sure.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Multivariate MCV stats can leak data to unprivileged users

Reply via email to