Re: [GENERAL] Why does CREATE INDEX CONCURRENTLY need two scans?

Joshua Ma Tue, 31 Mar 2015 21:08:09 -0700

Ah, that's exactly what I was looking for. Thanks everyone for the
responses!


- Josh

On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Michael Paquier <michael.paqu...@gmail.com> writes:
> > On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <j...@benchling.com> wrote:
> >> Why are two scans necessary? What would break if it did something like
> >> the following?
> >>
> >> 1) insert pg_index entry, wait for relevant txns to finish, mark index
> >> open for inserts
> >>
> >> 2) build index in a single snapshot, mark index valid for searches
>
> >> Wouldn't new inserts update the index correctly? Between the snapshot
> >> and index-updating txns afterwards, wouldn't all updates be covered?
>
> > When an index is built with index_build, only the tuples seen at the
> > start of the first scan are included in the index. A second scan is
> > needed to add index entries for the tuples that were inserted into the
> > table during the build phase.
>
> More to the point: Joshua's design supposes that retail insertions into
> an index can happen in parallel with index build.  Or in other words,
> that index build consists of instantaneously creating an empty-but-valid
> index file and then doing a lot of ordinary inserts into it.  That's a
> possible design, but it's not very efficient, and most of our index AMs
> don't do it that way.  btree, for instance, starts by sorting all the
> entries and creating the leaf-level pages.  Then it builds the upper tree
> levels.  It doesn't have a complete tree that could support retail
> insertions until the very end.  Moreover, most of the work is done in
> storage that's local to the backend running CREATE INDEX, and isn't
> accessible to other processes at all.
>
>                         regards, tom lane
>
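
For anyone reading the archive later, here is a toy model of the point Michael makes above: rows that arrive while the first scan is running are not in its snapshot, so a second scan has to pick them up. This is only a sketch under simplifying assumptions. The names `table`, `index`, `index_ready`, and `insert_row` are hypothetical stand-ins, not PostgreSQL APIs, and transactions, locking, and visibility rules are ignored.

```python
# Toy model only: plain Python containers stand in for the heap and the index.
# The names (table, index, index_ready) are hypothetical, not PostgreSQL APIs.

table = []           # heap: row ids in insertion order
index = set()        # row ids that currently have an index entry
index_ready = False  # stand-in for "index is open for ordinary inserts"


def insert_row(row_id):
    """An ordinary insert; it maintains the index only once it is ready."""
    table.append(row_id)
    if index_ready:
        index.add(row_id)


# Rows that exist before the build starts.
for row_id in (1, 2, 3):
    insert_row(row_id)

# First scan: build from a snapshot of the table taken at the start.
snapshot_1 = list(table)
insert_row(4)             # arrives mid-build, while the index is not ready
index.update(snapshot_1)  # the bulk build finishes with the snapshot's rows
missed_after_first_scan = set(table) - index

# The finished index now accepts ordinary inserts.
index_ready = True
insert_row(5)             # maintained by the insert itself

# Second scan: take a fresh snapshot and add whatever the index still lacks.
snapshot_2 = list(table)
index.update(r for r in snapshot_2 if r not in index)

print(missed_after_first_scan)  # rows the first scan could not see
print(set(table) == index)      # whether the index now covers the table
```

In this model row 4 is the one that falls between the first snapshot and the moment the index starts accepting inserts, which is the gap the second scan closes.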

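
The bottom-up btree build Tom describes (sort all entries, create the leaf pages, then build the upper levels) can be sketched roughly as follows. This is an illustration only and not PostgreSQL's nbtree code: `PAGE_CAPACITY`, `chunk`, and `bulk_build` are made-up names, and pages are modeled as plain Python lists.

```python
# Sketch only: a toy bottom-up B-tree bulk build, not PostgreSQL's nbtree code.
# PAGE_CAPACITY and the list-based "pages" are assumptions for illustration.

PAGE_CAPACITY = 4  # hypothetical number of entries per page


def chunk(items, size):
    """Split a list into consecutive pages of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_build(keys):
    """Return a list of levels, leaf level first and root level last."""
    # Step 1: sort every entry up front, then pack the leaf pages.
    leaves = chunk(sorted(keys), PAGE_CAPACITY)
    levels = [leaves]
    # Step 2: build each upper level from the first key of every page below.
    # Until this loop finishes there is no root, so nothing could descend the
    # tree to perform an ordinary single-entry ("retail") insertion.
    while len(levels[-1]) > 1:
        separators = [page[0] for page in levels[-1]]
        levels.append(chunk(separators, PAGE_CAPACITY))
    return levels


levels = bulk_build([17, 3, 42, 8, 23, 15, 4, 16, 1, 99])
for depth, level in enumerate(levels):
    print(depth, level)
```

The whole structure here lives in local variables until `bulk_build` returns, which loosely mirrors the remark that most of the work happens in storage private to the backend running CREATE INDEX.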