On Tue, Mar 23, 2021 at 4:21 AM Greg Stark <st...@mit.edu> wrote:
>
> On Sun, 21 Mar 2021 at 18:16, Stephen Frost <sfr...@snowman.net> wrote:
> >
> > Greetings,
> >
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > I also believe that the snapshotting behavior has advantages in terms
> > > of being able to perform multiple successive queries and get consistent
> > > results from them. Only the most trivial sorts of analysis don't need
> > > that.
> > >
> > > In short, what you are proposing sounds absolutely disastrous for
> > > usability of the stats views, and I for one will not sign off on it
> > > being acceptable.
> > >
> > > I do think we could relax the consistency guarantees a little bit,
> > > perhaps along the lines of only caching view rows that have already
> > > been read, rather than grabbing everything up front. But we can't
> > > just toss the snapshot concept out the window. It'd be like deciding
> > > that nobody needs MVCC, or even any sort of repeatable read.
> >
> > This isn't the same use-case as traditional tables or relational
> > concepts in general- there aren't any foreign keys for the fields that
> > would actually be changing across these accesses to the shared memory
> > stats- we're talking about gross stats numbers like the number of
> > inserts into a table, not an employee_id column. In short, I don't
> > agree that this is a fair comparison.
>
> I use these stats quite a bit and do lots of slicing and dicing with
> them. I don't think it's as bad as Tom says, but I also don't think we
> can be quite as loosey-goosey as I think Andres or Stephen might be
> proposing either (though I note they haven't said they don't want any
> consistency at all).
>
> The case where the consistency really matters for me is when I'm doing
> math involving more than one statistic.
>
> Typically that's ratios. E.g. with pg_stat_*_tables I routinely divide
> seq_tup_read by seq_scan, or idx_tup_* by idx_scans.
> I also often look
> at the ratio between n_tup_upd and n_tup_hot_upd.
>
> And no, it doesn't help that these are often large numbers after a
> long time, because I'm actually working with the first derivative of
> these numbers using snapshots or a time series database. So if you
> have seq_tup_read incremented but not seq_scan incremented, you
> could get a wildly incorrect calculation of "tup read per seq scan",
> which actually matters.
>
> I don't think I've ever done math across stats for different objects.
> I mean, I've plotted them together and looked at which was higher, but
> I don't think that's affected by some plots having peaks slightly out
> of sync with the others. I suppose you could look at the ratio of
> access patterns between two tables, knowing that they're only ever
> accessed by a single code path at the same time, and therefore the
> ratios would be meaningful. But I don't think users would be surprised
> to find they're not consistent that way either.
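To make that failure mode concrete, here is a small illustrative sketch (hypothetical numbers, plain Python standing in for the stats counters, not actual PostgreSQL code) of how a torn read of two cumulative counters skews a first-derivative ratio like "tuples read per seq scan":

```python
# Illustrative sketch (hypothetical numbers): why reading related
# counters non-atomically skews a delta-based ratio.

def rate_ratio(snap_old, snap_new):
    """Ratio of counter deltas between two snapshots."""
    d_tup = snap_new["seq_tup_read"] - snap_old["seq_tup_read"]
    d_scan = snap_new["seq_scan"] - snap_old["seq_scan"]
    return d_tup / d_scan if d_scan else float("inf")

# Consistent snapshots: each scan in the interval read ~1000 tuples.
t0 = {"seq_scan": 100, "seq_tup_read": 100_000}
t1 = {"seq_scan": 110, "seq_tup_read": 110_000}
print(rate_ratio(t0, t1))  # 1000.0

# Torn snapshot at t1: seq_tup_read was already bumped by an in-flight
# scan, but the matching seq_scan increment hadn't landed yet.
t1_torn = {"seq_scan": 109, "seq_tup_read": 110_000}
print(rate_ratio(t0, t1_torn))  # ~1111.1, off by more than 10%
```

The absolute counter values barely differ; it is the delta in the denominator that is small, so a one-count skew between the two fields moves the derived ratio substantially.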
Yeah, it's important to differentiate whether things can be inconsistent
within a single object, or just between objects. And I agree that in a
lot of cases, just having per-object consistent data is probably enough.
Normally when you graph things, for example, your peaks will show up
across more than one sample point anyway, and in that case it doesn't
much matter, does it?

But if we said we try to offer per-object consistency only, then for
example the idx_scans value in the tables view may see changes to some
but not all indexes on that table. Would that be acceptable?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/