On Sun, Mar 21, 2021 at 12:14 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> If I understand what you are proposing, all stats views would become
> completely volatile, without even within-query consistency. That really
> is not gonna work. As an example, you could get not-even-self-consistent
> results from a join to a stats view if the planner decides to implement
> it as a nestloop with the view on the inside.
>
> I also believe that the snapshotting behavior has advantages in terms
> of being able to perform multiple successive queries and get consistent
> results from them. Only the most trivial sorts of analysis don't need
> that.
>
> In short, what you are proposing sounds absolutely disastrous for
> usability of the stats views, and I for one will not sign off on it
> being acceptable.
>
> I do think we could relax the consistency guarantees a little bit,
> perhaps along the lines of only caching view rows that have already
> been read, rather than grabbing everything up front. But we can't
> just toss the snapshot concept out the window. It'd be like deciding
> that nobody needs MVCC, or even any sort of repeatable read.
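For concreteness, I take the scenario you're describing to be a query shaped roughly like the sketch below (the schema name is just illustrative). If the planner puts the stats view on the inner side of a nestloop, the view gets rescanned once per outer row, and without a snapshot each rescan could in principle see different counter values:

    -- Example join against a stats view; the planner might choose a nestloop
    -- with pg_stat_user_tables on the inside, rescanning it per outer row.
    SELECT c.relname, s.seq_scan, s.idx_scan
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relnamespace = 'public'::regnamespace;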
So, just as a data point, the output of pg_locks is not stable within a transaction. In fact, the pg_locks output could technically include the same exact lock more than once, if it's being moved from the fast-path table to the main table just as you are reading all the data. In theory, that could produce the same kinds of problems that you're concerned about here, and I suspect sometimes it does. But I haven't seen a lot of people yelling and screaming about it. The situation isn't ideal, but it's not disastrous either.

I think it's really hard for us as developers to predict what kinds of effects these kinds of decisions will have in real-world deployments. All of us have probably had the experience of making some behavioral change that we thought would not be too big a deal, only to have it piss off a bunch of users who were relying on the old behavior. I know I have. Conversely, I've reluctantly made changes that seemed rather dangerous to me and heard nary a peep. If somebody takes the position that changing this behavior is scary because we don't know how many users will be inconvenienced or how badly, I can only agree. But saying that it's tantamount to deciding that nobody needs MVCC is completely over the top. This is statistical data, not user data, and there are good reasons to think that people don't have the same expectations in both cases, starting with the fact that we have some stuff that works like that already.

More than that, there's a huge problem with the way this works today that can't be fixed without making some compromises. In the test case Andres mentioned upthread, the stats snapshot burned through 170MB of RAM. Now, you might dismiss that as not much memory in 2021, but if you have a lot of backends accessing the stats, that value could be multiplied by a two-digit or even three-digit number, and that is *definitely* a lot of memory, even in 2021. But even if it's not multiplied by anything, we're shipping with a default work_mem of just 4MB. So, the position we're implicitly taking today is: if you join two 5MB tables, it's too risky to put the entire contents of one of them into a single in-memory hash table, because we might run the machine out of RAM. But if you have millions of objects in your database and touch the statistics for one of those objects, once, it's absolutely OK to slurp tens or even hundreds of megabytes of data into backend-private memory, just to avoid the possibility that you might later access another one of those counters and expect snapshot semantics.

To be honest, I don't find either of those positions very believable. I do not think it likely that the typical user really wants a 5MB hash join to be done in batches to save memory, and I think it equally unlikely that everybody wants to read and cache tens or hundreds of megabytes of data to get MVCC semantics for volatile statistics. I think there are probably some cases where having that information be stable across a transaction lifetime is really useful, so if we can provide that as an optional behavior, I think that would be a pretty good idea. But I don't think it's reasonable for that to be the only behavior, and I'm doubtful about whether it should even be the default. I bet there are a lot of cases where somebody just wants to take a quick glance at some of the values as they exist right now, and has no intention of running any more queries that might examine the same data again later.
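To make today's behavior concrete, here's a rough sketch of how the snapshot works now (the table name is just illustrative): the first stats access in a transaction populates the backend-local snapshot, and pg_stat_clear_snapshot() is how you get fresh numbers without starting a new transaction:

    BEGIN;
    -- First access to any stats view materializes the backend-local snapshot.
    SELECT n_tup_ins, n_tup_upd FROM pg_stat_user_tables WHERE relname = 'some_table';
    -- Re-running the query returns the same numbers, even if other sessions
    -- have been modifying the table, because we read from the cached snapshot.
    SELECT n_tup_ins, n_tup_upd FROM pg_stat_user_tables WHERE relname = 'some_table';
    -- Discard the snapshot so the next access fetches current data.
    SELECT pg_stat_clear_snapshot();
    SELECT n_tup_ins, n_tup_upd FROM pg_stat_user_tables WHERE relname = 'some_table';
    COMMIT;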
Not only does caching all the data use a lot of memory, but having to read all the data in order to cache it is potentially a lot slower than just reading the data actually requested. I'm unwilling to dismiss that as a negligible problem.

In short, I agree that there's stuff to worry about here, but I don't agree that a zero-consistency model is a crazy idea, even though I also think it would be nice if we could make stronger consistency available upon request.

--
Robert Haas
EDB: http://www.enterprisedb.com