On Tue, Nov 8, 2011 at 2:26 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On Tue, Nov 8, 2011 at 2:54 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> It would still be nice to fix the case where we need to freeze a tuple
>> that is on a page someone else has pinned, but I don't have any good
>> ideas for how to do that.
>
> I think we need to avoid long pin hold times generally.

In the case of a suspended sequential scan, which is the case where this has most recently bitten me on a production system, it actually seems rather unnecessary to hold the pin for a long period of time. If we release the buffer pin, then someone could vacuum the buffer. I haven't looked in detail at the issues, but in theory that doesn't seem like a huge problem: just remember which TIDs you've already looked at and, when you re-acquire the buffer, pick up where you left off. Any tuples that have been vacuumed away meanwhile weren't going to be visible to your scan anyway.

But there's an efficiency argument against doing it that way. First, if we release the pin then we'll have to reacquire the buffer, which means taking and releasing a BufMappingLock, the buffer header spinlock, and the buffer content lock. Second, instead of returning a pointer to the data in the page, we'll have to copy the data out of the buffer before releasing the pin.

The situation is similar (perhaps even simpler) for index-only scans. We could easily release the heap buffer pin after returning a tuple, but it will make things much slower if the next heap fetch hits the same page.

I wonder if we could arrange for a vacuum that's waiting for a cleanup lock to signal the backends that could possibly be holding a conflicting pin. Sort of like what the startup process does during Hot Standby, except that instead of killing the people holding the pins, we'd send them a signal that says "if at all possible, could you please release those buffer pins right away?", and then the backends would try to comply. Actually making that work seems a bit tricky, though, and getting it wrong would mean very, very rare, nearly unreproducible bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company