Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-09 Thread Pierre C
Within the data to import most rows have 20 to 50 duplicates; sometimes much more, sometimes less. In that case (the source data has lots of redundancy), after importing the data chunks in parallel, you can run a first pass of de-duplication on the chunks, also in parallel, something like: …
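The SQL itself is cut off in this preview; a minimal sketch of such a per-chunk de-duplication pass, assuming hypothetical staging tables loader1, loader2, ... that each hold one imported chunk and share a candidate-key column named key (none of these names come from the original message), could look like:

-- Keep one row per key within a single chunk; run the same statement
-- against loader2, loader3, ... in separate sessions to parallelize.
CREATE TEMPORARY TABLE loader1_dedup AS
SELECT DISTINCT ON (key) *
FROM loader1;

-- Or de-duplicate in place, keeping one physical row per key:
DELETE FROM loader1 a
USING loader1 b
WHERE a.key = b.key
  AND a.ctid < b.ctid;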

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-09 Thread Torsten Zühlsdorff
Pierre C wrote: Within the data to import most rows have 20 to 50 duplicates; sometimes much more, sometimes less. In that case (the source data has lots of redundancy), after importing the data chunks in parallel, you can run a first pass of de-duplication on the chunks, also in parallel, something like: …

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-07 Thread Pierre C
Within the data to import most rows have 20 to 50 duplicates; sometimes much more, sometimes less. In that case (the source data has lots of redundancy), after importing the data chunks in parallel, you can run a first pass of de-duplication on the chunks, also in parallel, something like: …

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-07 Thread Torsten Zühlsdorff
Pierre C wrote: Since you have lots of data you can use parallel loading. Split your data into several files and then do: CREATE TEMPORARY TABLE loader1 ( ... ) COPY loader1 FROM ... Use a TEMPORARY TABLE for this: you don't need crash recovery, since if something blows up you can COPY it again…

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Pierre C
Since you have lots of data you can use parallel loading. Split your data into several files and then do: CREATE TEMPORARY TABLE loader1 ( ... ) COPY loader1 FROM ... Use a TEMPORARY TABLE for this: you don't need crash recovery, since if something blows up you can COPY it again... and it will…
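The preview only shows the shape of the approach; a minimal sketch of one such loader session, with placeholder table, column, and file names that do not appear in the original message, might be:

-- One parallel loader session: each session gets its own input file and
-- its own TEMPORARY table. If the session dies, nothing needs recovery;
-- just re-run the COPY from the source file.
CREATE TEMPORARY TABLE loader1 (key text, payload text);

-- Server-side COPY needs superuser rights; from psql, \copy reads the
-- file on the client instead.
COPY loader1 FROM '/path/to/chunk1.dat';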

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Scott Marlowe
On Sun, Jun 6, 2010 at 6:02 AM, Torsten Zühlsdorff wrote: > Scott Marlowe wrote: > Thank you very much for your example. Now I've got it :) > > I've tested your example on a small set of my rows. While testing I've > stumbled over a difference in SQL formulation. Using EXCEPT seems to be a > little…
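The two formulations being compared are not visible in these previews; a plausible reconstruction, using a placeholder target table main_table, staging table loader1, and key column key, contrasts an EXCEPT-based insert with a NOT EXISTS-based one:

-- EXCEPT formulation: insert staged keys that are not already stored.
INSERT INTO main_table (key)
SELECT key FROM loader1
EXCEPT
SELECT key FROM main_table;

-- Equivalent NOT EXISTS formulation for comparison:
INSERT INTO main_table (key)
SELECT DISTINCT l.key
FROM loader1 l
WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.key = l.key);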

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Torsten Zühlsdorff
Scott Marlowe wrote: I have a set of unique data of about 150,000,000 rows. Regularly I get a list of data which contains many times more rows than the already stored set, often around 2,000,000,000 rows. Within these rows are many duplicates, and often the set of already stored data. I want to…

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Torsten Zühlsdorff
Cédric Villemain wrote: I think you need to have a look at pgloader. It does COPY with error handling. Very effective. Thanks for this advice, I will have a look at it. Greetings from Germany, Torsten

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Andy Colson
On 06/01/2010 10:03 AM, Torsten Zühlsdorff wrote: Hello, I have a set of unique data of about 150,000,000 rows. Regularly I get a list of data which contains many times more rows than the already stored set, often around 2,000,000,000 rows. Within these rows are many duplicates and often the set of already stored data…

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-03 Thread Cédric Villemain
2010/6/1 Torsten Zühlsdorff: > Hello, > > I have a set of unique data of about 150,000,000 rows. Regularly I get a > list of data which contains many times more rows than the already stored > set, often around 2,000,000,000 rows. Within these rows are many duplicates > and often the set of already stored data…

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-03 Thread Scott Marlowe
On Thu, Jun 3, 2010 at 11:19 AM, Torsten Zühlsdorff wrote: > Scott Marlowe wrote: >> On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff wrote: >>> Hello, >>> I have a set of unique data of about 150,000,000 rows. Regularly I get a >>> list of data which contains many times more…

Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-02 Thread Scott Marlowe
On Tue, Jun 1, 2010 at 9:03 AM, Torsten Zühlsdorff wrote: > Hello, > > I have a set of unique data of about 150,000,000 rows. Regularly I get a > list of data which contains many times more rows than the already stored > set, often around 2,000,000,000 rows. Within these rows are many duplicates…

[PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-02 Thread Torsten Zühlsdorff
Hello, I have a set of unique data of about 150,000,000 rows. Regularly I get a list of data which contains many times more rows than the already stored set, often around 2,000,000,000 rows. Within these rows are many duplicates and often the set of already stored data. I want to store just…
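Taken together, the suggestions in this thread amount to a load-then-merge pipeline. A compact end-to-end sketch, with hypothetical names (unique_data as the 150-million-row table, a single key column, and a placeholder file path, none of which come from the original post), would be:

BEGIN;

-- 1. Bulk-load the raw feed, duplicates and all, into a staging table.
CREATE TEMPORARY TABLE staging (key text);
COPY staging FROM '/path/to/feed.dat';

-- 2. Merge only the genuinely new keys into the unique set.
INSERT INTO unique_data (key)
SELECT DISTINCT s.key
FROM staging s
WHERE NOT EXISTS (SELECT 1 FROM unique_data u WHERE u.key = s.key);

COMMIT;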