On Mon, Jul 8, 2019 at 6:57 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
> I didn't have other use cases in mind, and I think to some extent this
> argument holds true for the existing binaryheap_allocate. If we want
> to make it more generic, then shouldn't we try to change the existing
> binaryheap_allocate to use this new model, so that the binary heap
> allocation API will be more generic?
No. binaryheap_allocate is fine for simple cases, and there's no
reason that I can see to change it.

> You are right that it won't be the nth smallest element from the queue,
> and we don't even care for that here. The peeking logic is not to find
> the next prioritized element but to check whether we can find some
> element for the same database in the next few entries, to avoid
> frequent undo worker restarts.

You *should* care about that here. The whole purpose of a binary heap
is to help us choose which task we should do first and which ones
should be done later. I don't see why it's OK to decide that we only
care about doing the tasks in priority order sometimes, and that other
times it's OK to just pick semi-randomly.

> This is a good test scenario, but I think it has not been considered
> that there are multiple queues and we peek into each one.

I think that makes very little difference, so I don't see why it
should be considered. It's true that this will sometimes mask the
problem, but so what? An algorithm that works 90% of the time is not
much better than one that works 80% of the time, and neither is the
caliber of work we expect to see in PostgreSQL.

> I think we should once try with an actual program which can test such
> a scenario on the undo patches before reaching any conclusion. I or
> one of my colleagues will work on this and report back the results.

There is certainly a place for empirical testing of a patch like this
(perhaps even before posting it), but it does not substitute for a good
theoretical explanation of why the algorithm is correct, and I don't
think this algorithm is correct.

> > You can construct way worse examples than
> > this one, too: imagine that there are two databases, each with a
> > worker, and one has 99% of the requests and the other one has 1% of
> > the requests. It's really unlikely that there's going to be an entry
> > for the second database within the lookahead window.
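The objection to "peeking" can be made concrete: the backing array of a
binary heap is only partially ordered, so scanning its first few slots
(a lookahead window) is not at all the same thing as taking the
highest-priority entries. A minimal sketch using Python's heapq, with
invented contents and window size, purely for illustration (the actual
patch is C against PostgreSQL's binaryheap):

```python
import heapq

# A valid min-heap whose backing array is only partially ordered.
# (Contents and window size are invented for illustration.)
h = [1, 100, 2, 101, 102, 3, 4]
heapq.heapify(h)           # already satisfies the heap invariant; unchanged

window = h[:4]             # "peek" at the first few slots, as a lookahead does
smallest4 = sorted(h)[:4]  # the four genuinely highest-priority entries

print(window)     # [1, 100, 2, 101] -- misses 3 and 4 entirely
print(smallest4)  # [1, 2, 3, 4]
```

The heap invariant only guarantees that each parent sorts before its
children, so an arbitrarily high-priority entry can sit outside any
fixed-size prefix of the array.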
> I am not sure if that is the case, because as soon as the request from
> the other database gets prioritized (say because its XID becomes older)
> and comes up as the first request in one of the queues, the undo worker
> will exit (provided it has worked for some threshold time (10s) in that
> database) and allow the request from the other database to be processed.

I don't see how this responds to what I wrote. Neither worker needs to
exit in this scenario, but the worker from the less-popular database
is likely to exit anyway, which seems like it's probably not the right
thing.

> > And note that
> > increasing the window doesn't really help either: you just need more
> > databases than the size of the lookahead window, or even almost as
> > many as the lookahead window, and things are going to stop working
> > properly.
> >
> > On the other hand, suppose that you have 10 databases and one undo
> > worker. One database is pretty active and generates a continuous
> > stream of undo requests at exactly the same speed we can process them.
> > The others all have 1 pending undo request. Now, what's going to
> > happen is that you'll always find the undo request for the current
> > database within the lookahead window. So, you'll never exit.

> Following the logic given above, I think here also the worker will exit
> as soon as the request from the other database gets prioritized.

OK.

> > But
> > that means the undo requests in the other 9 databases will just sit
> > there for all eternity, because there's no other worker to process
> > them. On the other hand, if you had 11 databases, there's a good
> > chance it would work fine, because the new request for the active
> > database would likely be outside the lookahead window, and so you'd
> > find no work to do and exit, allowing a worker to be started up in
> > some other database.

> As explained above, I think it will work the same way both for 10 and
> 11 databases. Note that we don't always try to look ahead.
> We look ahead when we have not worked on the current database for some
> threshold amount of time.

That's interesting, and it means that some of the scenarios that I
mentioned are not problems. However, I don't believe it means that
your code is actually correct; it just means that it's wrong in
different ways. The point is that, with the way you've implemented
this, whenever you do lookahead, you will, basically randomly,
sometimes find the next entry for the current database within the
lookahead window, and sometimes you won't. And sometimes it will be
the next-highest-priority request, and sometimes it won't. That just
cannot possibly be the right thing to do.

Would you propose to commit a patch that implemented the following
pseudocode?

find-next-thing-to-do:
  see if the highest-priority task in any database is for our database.
  if it is, do it and stop here.
  if it is not, and if we haven't worked on the current database for at
    least 10 seconds, look for an item in the current database.
    ...but don't look very hard, so that we'll sometimes, semi-randomly,
       find nothing even when there is something we could do.
    ...and also, sometimes find a lower-priority item that we can do,
       possibly a much lower-priority one, instead of the highest-priority
       thing we can do.

Because that's what your patch is doing. In contrast, the algorithm
that I proposed would work like this:

find-next-thing-to-do:
  find the highest-priority item for the current database.
  do it.

I venture to propose that the second one is the superior algorithm
here.

One problem with the second algorithm, which I pointed out in my
previous email, is that sometimes we might want the worker to exit
even though there is work to do in the current database. My algorithm
makes no provision for that, and yours does. However, yours does that
in a way that's totally unprincipled: it just sometimes fails to find
any work that it could do even though there is work that it could do.
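For what it's worth, the two-line algorithm can be realized by keeping
one priority queue per database, so the worker always takes its own
database's highest-priority request. This is a hypothetical Python
sketch with invented names, not the patch's actual structure (which is
C inside PostgreSQL), just to show the shape of the idea:

```python
import heapq
from collections import defaultdict

# Hypothetical sketch: one min-heap of undo requests per database,
# ordered by priority (e.g. XID age), so find-next-thing-to-do is
# simply "pop the highest-priority item for the current database".
class UndoRequestQueues:
    def __init__(self):
        self.by_db = defaultdict(list)  # dboid -> heap of (priority, request)

    def add(self, dboid, priority, request):
        heapq.heappush(self.by_db[dboid], (priority, request))

    def next_for_db(self, dboid):
        """Highest-priority request for this database, or None if empty."""
        heap = self.by_db[dboid]
        return heapq.heappop(heap)[1] if heap else None

q = UndoRequestQueues()
q.add(1, 10, "undo-a")
q.add(1, 5, "undo-b")
q.add(2, 1, "undo-c")
print(q.next_for_db(1))  # undo-b: db 1's oldest request, regardless of db 2
```

With this shape there is no lookahead window at all, so the "did we
find anything?" question is always answered deterministically; the
decision to exit has to be made by a separate, explicit rule.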
No amount of testing or argumentation is going to convince me that
this is a good approach. The decision about when a worker should exit
to allow a new one to be launched needs to be based on clear,
understandable rules, not on something that happens semi-randomly when
a haphazard search for the next entry fails, as if by chance, to find
it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company