On Fri, Jul 19, 2019 at 10:58 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> One other thing that seems worth noting is that we have to consider
> what happens after a restart. After a crash, and depending on exactly
> how we design it perhaps also after a non-crash restart, we won't
> immediately know how many outstanding transactions need undo; we'll
> have to grovel through the undo logs to find out. If we've got a hard
> cap, we can't allow new undo-using transactions to start until we
> finish that work. It's possible that, at the moment of the crash, the
> maximum number of items had already been pushed into the background,
> and every foreground session was busy trying to undo an abort as well.
> If so, we're already up against the limit. We'll have to scan through
> all of the undo logs and examine each transaction to get a count on
> how many transactions are already in a needs-undo-work state; only
> once we have that value do we know whether it's OK to admit new
> transactions to using the undo machinery, and how many we can admit.
> In typical cases, that won't take long at all, because there won't be
> any pending undo work, or not much, and we'll very quickly read the
> handful of transaction headers that we need to consult and away we go.
> However, if the hard limit is pretty big, and we're pretty close to
> it, counting might take a long time. It seems bothersome to have this
> interval between when we start accepting transactions and when we can
> accept transactions that use undo. Instead of throwing an ERROR, we
> can probably just teach the system to wait for the background process
> to finish doing the counting; that's what Amit's patch does currently.
>
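The admission rule quoted above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the patch's code: it uses a single process-local counter guarded by a pthread mutex and condition variable (PostgreSQL itself would use shared memory and its own locking and wait primitives), and every name here, including the hard-limit knob and the timeout parameter, is hypothetical. A "slot" stands for one transaction that may need undo: either pending undo work found by the post-restart scan or an admitted undo-using transaction. New transactions wait for the scan to finish, give up after a timeout, and are admitted only while fewer slots than the hard limit are in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical outcomes of asking to use the undo machinery. */
typedef enum { UNDO_ADMITTED, UNDO_LIMIT_REACHED, UNDO_COUNT_TIMEOUT } UndoAdmitResult;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t counted;   /* broadcast once the post-restart scan is done */
    bool counting_done;       /* false until the undo logs have been scanned */
    int used_slots;           /* pending undo work + admitted transactions */
    int hard_limit;           /* hypothetical configurable hard cap */
} UndoAdmission;

void undo_admission_init(UndoAdmission *ua, int hard_limit) {
    pthread_mutex_init(&ua->lock, NULL);
    pthread_cond_init(&ua->counted, NULL);
    ua->counting_done = false;
    ua->used_slots = 0;
    ua->hard_limit = hard_limit;
}

/* Called by the background process after counting needs-undo transactions. */
void undo_admission_finish_counting(UndoAdmission *ua, int pending) {
    pthread_mutex_lock(&ua->lock);
    ua->used_slots = pending;
    ua->counting_done = true;
    pthread_cond_broadcast(&ua->counted);   /* wake every waiting session */
    pthread_mutex_unlock(&ua->lock);
}

/* Wait up to timeout_ms for counting, then try to take one slot. */
UndoAdmitResult undo_admission_acquire(UndoAdmission *ua, long timeout_ms) {
    struct timespec deadline;   /* absolute CLOCK_REALTIME deadline */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ua->lock);
    while (!ua->counting_done) {   /* loop also absorbs spurious wakeups */
        int rc = pthread_cond_timedwait(&ua->counted, &ua->lock, &deadline);
        if (rc == ETIMEDOUT && !ua->counting_done) {
            pthread_mutex_unlock(&ua->lock);
            return UNDO_COUNT_TIMEOUT;   /* caller reports an error; user retries */
        }
    }
    UndoAdmitResult result = UNDO_LIMIT_REACHED;
    if (ua->used_slots < ua->hard_limit) {
        ua->used_slots++;
        result = UNDO_ADMITTED;
    }
    pthread_mutex_unlock(&ua->lock);
    return result;
}

/* Free a slot when the transaction commits or its undo has been applied. */
void undo_admission_release(UndoAdmission *ua) {
    pthread_mutex_lock(&ua->lock);
    assert(ua->used_slots > 0);
    ua->used_slots--;
    pthread_mutex_unlock(&ua->lock);
}
```

Raising the hypothetical `hard_limit` makes `UNDO_LIMIT_REACHED` rarer, but it also means the post-restart scan may have more pending work to count before anyone is admitted, which is the trade-off discussed in this thread.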
Yeah. However, at present we wait for a threshold period (one minute) for the counting to finish and then error out. We could wait until the counting is finished, but I am not sure that is a good idea, since the user can try again after some time anyway.

> Or, we could not even open for connections until the counting has been
> completed.
>
> When I first thought about this, I was really concerned about the idea
> of a hard limit, but the more I think about it the less problematic it
> seems. I think in the end it boils down to a question of: when things
> break, what behavior would users prefer?
>

One minor thing I would like to add here is that we are providing knobs so that systems with a larger number of rollbacks can configure a much higher hard limit, high enough that they won't hit it. I know it is not always easy to find the right value, but I guess users can learn from the observed behavior and then raise it to avoid the same problem in the future.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com