On Fri, 2007-02-09 at 13:16 +0530, Pavan Deolasee wrote: > The second problem of concurrent index scans seems a bit more complex. > We need a mechanism so that no tuples are missed or tuples are > not returned twice. Since CHILLing of a tuple adds a new access path > to the tuple from the index, a concurrent index scan may return a > tuple twice. > > How about grabbing a AccessExclusiveLock during CHILLing > operation ? This would prevent any concurrent index scans. Since > CHILLing of a large table can take a long time, the operation can be > spread across time with periodic acquire/release of the lock. This > would prevent starvation of other backends. Since CHILLing is required > only for CREATE INDEX and stub-cleanup, I am assuming that its ok for > it to be lazy in nature.
We've just spoken about this, so just wanted to add those thoughts here. A pointer-swing operation will begin when we see a tuple that is both status of HEAPTUPLE_DEAD and is marked HEAP_UPDATE_ROOT. Perhaps that requires a new status from HeapTupleSatisfiesVacuum()? We chill all tuples in the chain, up to the new root. We mark those tuples, so that HeapTupleSatisfiesVacuum() will be describe them as HEAPTUPLE_RECENTLY_DEAD. So the current Vacuum won't immediately remove them, but they'll go away in the future as part of an on-demand block vacuum or Vacuum. That's similar to the way we handle HALF_DEAD index pages. The index scans are page-at-a-time, so when we pointer-swing from the root tuple to one of the HOT tuples we'll be OK. We're switching a specific index tuple, so there's no multi-page locking on the index to consider. Right now, I wouldn't want to assume that the way tuples are marked prior to pointer-swinging is exactly the same as the chilling required by CREATE INDEX: CHILL_IN_PROGRESS. It may well be, but I'm wary that we assume they are exactly the same and introduce a subtle bug. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org