Re: [HACKERS] Batch update of indexes on data loading

2008-04-24 Thread Simon Riggs
On Tue, 2008-02-26 at 09:08 +, Simon Riggs wrote:
> I very much like the idea of index merging, or put another way: batch
> index inserts. How big do the batch of index inserts have to be for us
> to gain benefit from this technique? Would it be possible to just buffer
> the index inserts insi

Re: [HACKERS] Batch update of indexes on data loading

2008-03-05 Thread Bruce Momjian
Added to TODO:
  o Allow COPY FROM to create index entries in bulk
    http://archives.postgresql.org/pgsql-hackers/2008-02/msg00811.php
---
ITAGAKI Takahiro wrote:
> This is a proposal of fast data loading us

Re: [HACKERS] Batch update of indexes on data loading

2008-02-28 Thread Tom Lane
ITAGAKI Takahiro writes:
> Tom Lane wrote:
>>> Can we do REINDEX
>>> holding only shared lock on the index?
>>
>> No. When you commit the reindex, the old copy of the index will
>> instantaneously disappear; it will not do for someone to be actively
>> scan

Re: [HACKERS] Batch update of indexes on data loading

2008-02-28 Thread ITAGAKI Takahiro
Tom Lane wrote:
> > Can we do REINDEX
> > holding only shared lock on the index?
>
> No. When you commit the reindex, the old copy of the index will
> instantaneously disappear; it will not do for someone to be actively
> scanning that copy.

Hmm... Is it ok if the index wil

Re: [HACKERS] Batch update of indexes on data loading

2008-02-28 Thread Tom Lane
"Markus Bertheau" <[EMAIL PROTECTED]> writes: > 2008/2/29, Tom Lane <[EMAIL PROTECTED]>: >> No. When you commit the reindex, the old copy of the index will >> instantaneously disappear; it will not do for someone to be actively >> scanning that copy. > Can a shared lock be taken at first, and whe

Re: [HACKERS] Batch update of indexes on data loading

2008-02-28 Thread Markus Bertheau
2008/2/29, Tom Lane:
> ITAGAKI Takahiro writes:
> > BTW, why REINDEX requires access exclusive lock? Read-only queries
> > are forbidden during the operation now, but I feel they are ok
> > because REINDEX only reads existing tuples. Can we do REINDEX
> >

Re: [HACKERS] Batch update of indexes on data loading

2008-02-28 Thread Tom Lane
ITAGAKI Takahiro writes:
> BTW, why REINDEX requires access exclusive lock? Read-only queries
> are forbidden during the operation now, but I feel they are ok
> because REINDEX only reads existing tuples. Can we do REINDEX
> holding only shared lock on the index?

No. When you

Re: [HACKERS] Batch update of indexes on data loading

2008-02-27 Thread ITAGAKI Takahiro
Simon Riggs wrote:
> The LOCK is only required because we defer the inserts into unique
> indexes, yes?

No, not as far as the present pg_bulkload is concerned. It creates a new relfilenode like REINDEX; therefore, an access exclusive lock is needed. When there are violations of unique constraints, al

Re: [HACKERS] Batch update of indexes on data loading

2008-02-26 Thread Simon Riggs
On Tue, 2008-02-26 at 15:19 +0900, ITAGAKI Takahiro wrote:
> Simon Riggs wrote:
>
> > One of the reasons why I hadn't wanted to pursue earlier ideas to use
> > LOCK was that applying a lock will prevent running in parallel, which
> > ultimately may prevent further performance g

Re: [HACKERS] Batch update of indexes on data loading

2008-02-25 Thread ITAGAKI Takahiro
Simon Riggs wrote:
> One of the reasons why I hadn't wanted to pursue earlier ideas to use
> LOCK was that applying a lock will prevent running in parallel, which
> ultimately may prevent further performance gains.
>
> Is there a way of doing this that will allow multiple conc

Re: [HACKERS] Batch update of indexes on data loading

2008-02-24 Thread Simon Riggs
On Thu, 2008-02-21 at 13:26 +0900, ITAGAKI Takahiro wrote:
> This is a proposal of fast data loading using batch update of indexes for 8.4.
> It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and
> I'd like to integrate it in order to cooperate with other parts of postgres.
>

Re: [HACKERS] Batch update of indexes on data loading

2008-02-21 Thread Josh Berkus
Itagaki-san,

> Alvaro Herrera wrote:
> > > The basic concept is spooling new coming data, and merge the spool and
> > > the existing indexes into a new index at the end of data loading. It is
> > > 5-10 times faster than index insertion per-row, that is the way in 8.3.

Thanks

Re: [HACKERS] Batch update of indexes on data loading

2008-02-21 Thread ITAGAKI Takahiro
Alvaro Herrera wrote:
> > The basic concept is spooling new coming data, and merge the spool and
> > the existing indexes into a new index at the end of data loading. It is
> > 5-10 times faster than index insertion per-row, that is the way in 8.3.
>
> Please see
> http://th

Re: [HACKERS] Batch update of indexes on data loading

2008-02-21 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
> The basic concept is spooling new coming data, and merge the spool and
> the existing indexes into a new index at the end of data loading. It is
> 5-10 times faster than index insertion per-row, that is the way in 8.3.

Please see
http://thread.gmane.org/gmane.comp.db.p

[HACKERS] Batch update of indexes on data loading

2008-02-20 Thread ITAGAKI Takahiro
This is a proposal of fast data loading using batch update of indexes for 8.4. It is a part of pg_bulkload (http://pgbulkload.projects.postgresql.org/) and I'd like to integrate it in order to cooperate with other parts of postgres. The basic concept is spooling new coming data, and merge the spoo
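
For readers skimming the thread, the spool-and-merge idea quoted above can be illustrated with a small sketch. This is not pg_bulkload's code or any PostgreSQL API: the function name `batch_merge_index` and the `(key, tid)` tuple layout are hypothetical, and a plain sorted Python list stands in for a B-tree index. It only shows the general shape of the technique: sort the spooled entries once, then merge them with the existing sorted entries in a single pass to build a new index, instead of inserting row by row.

```python
import heapq

def batch_merge_index(existing_index, spool):
    """Build a new sorted index from an existing sorted index plus a spool.

    existing_index: list of (key, tid) tuples, already sorted (stand-in for an index).
    spool: list of (key, tid) tuples collected during the load, in arrival order.
    Both names and the tuple layout are assumptions made for illustration.
    """
    # Sort the whole batch once instead of searching the index per row.
    sorted_spool = sorted(spool)
    # heapq.merge streams two sorted inputs into one sorted output in a single pass.
    return list(heapq.merge(existing_index, sorted_spool))

# Entries already in the index before the load.
existing = [(10, "t1"), (20, "t2"), (40, "t3")]
# Entries spooled while loading new rows, in arrival order.
spool = [(30, "t5"), (5, "t4"), (50, "t6")]

new_index = batch_merge_index(existing, spool)
print(new_index)
```

Because the result is a freshly built structure rather than an in-place update, the sketch also hints at why the thread turns to locking: the old copy is replaced wholesale by the new one when the load finishes.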