On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <j...@benchling.com> wrote:
> Hi all,
>
> I was curious about why CONCURRENTLY needs two scans to complete - from
> the documentation on HOT (access/heap/README.HOT), it looks like the
> process is:
>
> 1) insert pg_index entry, wait for relevant in-progress txns to finish
> (before marking index open for inserts, so HOT updates won't write
> incorrect index entries)
> 2) build index in 1st snapshot, mark index open for inserts
> 3) in 2nd snapshot, validate index and insert missing tuples since first
> snapshot, mark index valid for searches
>
> Why are two scans necessary? What would break if it did something like the
> following?
>
> 1) insert pg_index entry, wait for relevant txns to finish, mark index
> open for inserts
> 2) build index in a single snapshot, mark index valid for searches
>
> Wouldn't new inserts update the index correctly? Between the snapshot and
> index-updating txns afterwards, wouldn't all updates be covered?
>

When an index is built with index_build, only the tuples seen at the
start of the first scan are included in the index. A second scan is
needed to add index entries for the tuples that were inserted into the
table during the build phase.
--
Michael
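
To make the point about the second scan concrete, here is a toy sketch in Python. It is not PostgreSQL code: the list `table`, the set `index`, and the function `build_index_concurrently` are hypothetical stand-ins, and it assumes, as in the three-step process quoted above, that the index is not yet open for inserts while the first scan runs.

```python
# Toy model only: a Python list stands in for the heap and a set for the index.
# Assumption: rows inserted while the first scan runs get no index entry,
# because the index is not yet open for inserts at that point.

def build_index_concurrently(table, concurrent_inserts):
    # First scan: only the rows present when the build starts are seen.
    snapshot = list(table)

    # Other sessions insert rows while the build is in progress.
    table.extend(concurrent_inserts)

    # The build only covers the rows from the first snapshot.
    index = set(snapshot)
    missed = [row for row in table if row not in index]

    # Second scan (validation): add entries for rows the first scan missed.
    for row in table:
        if row not in index:
            index.add(row)

    return index, missed


table = [1, 2, 3]
index, missed = build_index_concurrently(table, [4, 5])
print(missed)         # [4, 5]
print(sorted(index))  # [1, 2, 3, 4, 5]
```

Without the second loop, rows 4 and 5 would be in the table but absent from the index, which is the gap the second scan is there to close.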