Hello, On Sat, Apr 18, 2020 at 2:08 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote: > > On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pry...@telsasoft.com> wrote: > > > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote: > > > I don't think that adequately allows what's needed. > ...(paragraph with my typos elided)... > > > For example, say a new customer has bunch of partitioned tables which each > > > currently have only one partition (for the current month), and that's > > > expected > > > to grow to at least 20+ partitions (2+ years of history). How does one > > > set the > > > partitioned table's auto-analyze parameters to analyze whenever any child > > > is > > > analyzed ? I don't think it should be needed to update it every month > > > after > > > computing sum(child tuples). > > > > > > Possibly you could allow that behavior for some special values of the > > > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the > > > parent > > > whenever any of its children are analyzed. > > > > > > I think that use case and that need would be common, but I'd like to hear > > > what > > > others think. > > > > Having to constantly pay attention to whether a parent's > > analyze_threshold/scale_factor is working as intended would surely be > > an annoyance, so I tend to agree that we might need more than just the > > ability to set analyze_threshold/scale_factor on parent tables. > > However, I think we can at least start with being able to do > > *something* here. :) Maybe others think that this shouldn't be > > considered committable until we figure out a good analyze threshold > > calculation formula to apply to parent tables. > > > > Considering that, how about having, say, a > > autovacuum_analyze_partition_parent_frequency, with string values > > 'default', 'partition'? -- 'default' assumes the same formula as > > regular tables, whereas with 'partition', parent is analyzed as soon > > as a partition is. > > I assume you mean a reloption to be applied only to partitioned tables, > > Your "partition" setting would mean that the scale/threshold values would have > no effect, which seems kind of unfortunate. > > I think it should be called something else, and done differently, like maybe: > autovacuum_analyze_mode = {off,sum,max,...} > The above reloption you suggested will be applied all tables? Users might not use it for partitions, so I think we should add "parent" to reloption's name, like Amit's suggestion.
> The threshold would be threshold + scale*tuples, as always, but would be > compared with f(changes) as determined by the relopt. > > sum(changes) would do what you called "default", comparing the sum(changes) > across all partitions to the threshold, which is itself computed using > sum(reltuples) AS reltuples. > > max(changes) would compute max(changes) compared to the threshold, and the > threshold would be computed separately for each partition's reltuples: > threshold_N = parent_threshold + parent_scale * part_N_tuples. If *any* > partition exceeds that threshold, the partition itself is analyzed. This > allows what I want for time-series. Maybe this would have an alias called > "any". > I may be wrong but I think the fomula, > threshold_N = parent_threshold + parent_scale * part_N_tuples would use orginary table's threshold, not parent's. If it use parent_threshold, parent might not be analyzed even if its any partition is analyzed when parent_threshold is larger than normal threshold. I'm worried that this case meets requirements for time-series. > I'm not sure if there's any other useful modes, like avg(changes)? I guess we > can add them later if someone thinks of a good use case. > > Also, for me, the v7 patch warns: > |src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be > used uninitialized in this function [-Wmaybe-uninitialized] > | vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * > reltuples; > ..which seems to be a false positive, but easily avoided. > Thank you for testing the patch. I got it. I'll update the patch soon. > > This patch includes partitioned tables in pg_stat_*_tables, which is great; I > complained awhile ago that they were missing [0]. It might be useful if that > part was split out into a separate 0001 patch (?). > If partitioned table's statistics is used for other purposes, I think it would be better to split the patch. Does anyone have any opinion? --- Best regards, Yuzuko Hosoya NTT Open Source Software Center