Not sure if you saw my earlier message ? I think it ought to be possible to configure this feature such that an auto-analyze on any child partition would trigger analyze of the parent. I think that would be important for maintaining accurate stats of the partition key column for many cases involving RANGE-partitioned tables, which are likely to rely on histogram rather than MCVs.
On Wed, Mar 18, 2020 at 11:30:39AM -0500, Justin Pryzby wrote: > Regarding this patch: > > + * the ANALYZE message as it resets the partition's changes_since_analze > => analyze > > + * If the relation is a partitioned table, we must add up children's > childrens' > > The approach in general: > > I see an issue for timeseries data, where only the most recent partition is > being inserted into, and the histogram endpoint is being continuously extended > (this is our use-case). The most recent partition will be analyzed pretty > often, and I think it'll be problematic if its parent doesn't get similar > treatment. Let's say there are 12 historic, monthly children with 1e6 tuples > each, and the 13th child has 2e5 tuples (6 days into the month). It's > analyzed > when it grows by 20% (1.2 days), but at that point the parent has only grown > by > 12x less (~2%) and won't be analyzed until 12x further into the future (14 > days). Its histogram is 12x longer (geometrically), but the histogram changed > by just as much (arithmetically). That's an issue for a query over "the last > few days"; if that's past the end of the histogram bound, the query planner > will estimate about ~0 tuples, and tend to give cascades of nested loops. I'm > biased, but I'm guessing that's too common a use case to answer that the > proper > fix is to set the parent's analyze_scale_factor=0.0005. I think that suggests > that the parent might sometimes need to be analyzed every time any of its > children are. In other cases (like probably any hash partitioning), that'd be > excessive, and maybe the default settings shouldn't do that, but I think that > behavior ought to be possible, and I think this patch doesn't allow that. > > In the past, I think there's was talk that maybe someone would invent a clever > way to dynamically combine all the partitions' statistics, so analyzing the > parent wasn't needed. I think that's easy enough for reltuples, MCV, and I > think histogram, but ISTM that ndistinct is simultaneously important to get > right and hard to do so. It depends on whether it's the partition key, which > now can be an arbitrary expression. Extended stats further complicates it, > even if we didn't aim to dynamically compute extended stats for a parent. > > While writing this, it occured to me that we could use "CREATE STATISTICS" as > a > way to mark a partitioned table (or certain columns) as needing to be handled > by analyze. I understand "CREATE STATs" was intended to (eventually) allow > implementing stats on expressions without using "create index" as a hack. So > if it's excessive to automatically analyze a parent table when any of its > children are analyzed, maybe it's less excessive to only do that for parents > with a stats object, and only on the given colums. I realize this patch is > alot less useful if it requires to do anything extra/nondefault, and it's > desirable to work without creating a stats object at all. Also, using CREATE > STATs would reduces the CPU cost of re-analyzing the entire heirarchy, but > doesn't help to reduce the I/O cost, which is significant. > > -- > Justin -- Justin Pryzby System Administrator Telsasoft +1-952-707-8581