Does your table have only the one pkey field, and is that SQL the exact SQL you
run?

If you insert more than one field, DISTINCT ensures the whole set of row values
is distinct, not just the one pkey value, so duplicate pkeys can still slip
through when the other columns differ. That may be the reason for your
situation.
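For example, here is a minimal sketch (the two-column table temp(pkey, val) is
made up for illustration):

    -- DISTINCT dedups over the whole select list, so two rows with the
    -- same pkey but different val both survive:
    SELECT DISTINCT pkey, val FROM temp;

    -- DISTINCT ON (pkey) keeps exactly one row per pkey; the ORDER BY
    -- decides which row wins:
    SELECT DISTINCT ON (pkey) pkey, val
    FROM temp
    ORDER BY pkey, val;

So if the insert carries more than one column, DISTINCT ON (pkey) may be what
you want rather than plain DISTINCT.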




------------------ Original ------------------
From:  "Steven Schlansker";<ste...@likeness.com>;
Date:  Thu, Sep 26, 2013 01:50 AM
To:  "Merlin Moncure"<mmonc...@gmail.com>; 
Cc:  "François Beausolei"<franc...@teksol.info>; "Forums 
postgresql"<pgsql-general@postgresql.org>; 
Subject:  Re: [GENERAL] Deduplication and transaction isolation level




On Sep 25, 2013, at 6:04 AM, Merlin Moncure <mmonc...@gmail.com> wrote:

> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
> <franc...@teksol.info> wrote:
>> Hi all!
>> 
>> I import many, many rows of data into a table, from three or more computers, 
>> 4 times per hour. I have a primary key, and the query I use to import the 
>> data is supposed to dedup before inserting, but I still get primary key 
>> violations.
>> 
>> The import process is:
>> 
>> * Load CSV data into temp table
>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT 
>> EXISTS(temp.pkey = dest.pkey)
>> 
>> I assumed (erroneously) that this would guarantee no duplicate data could 
>> make it into the database. The primary key violations are proving me wrong.
> 
> Right.  Transaction A and B are interleaved: they both run the same
> check against the same id at the same time.  Both checks pass because
> neither transaction is committed.  This problem is not solvable by
> adjusting the isolation level.

Are you sure that this is the case?  It is my understanding that since 9.1 with 
SSI (https://wiki.postgresql.org/wiki/SSI) if you set the transaction isolation 
level to SERIALIZABLE, this problem is solved, as the insert will take a 
"predicate lock" and the other insert cannot succeed.

We use this to detect / resolve concurrent inserts that violate primary keys 
and it works great.

However in this case it probably doesn't help the OP because the cost of 
restarting the entire import is likely too high.
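
For reference, a minimal sketch of that approach (the table and column names
are assumptions, not the OP's actual schema):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO dest
    SELECT DISTINCT ON (pkey) *
    FROM temp
    WHERE NOT EXISTS (SELECT 1 FROM dest WHERE dest.pkey = temp.pkey)
    ORDER BY pkey;
    COMMIT;
    -- On SQLSTATE 40001 (serialization_failure) the client must
    -- retry the whole transaction from BEGIN.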

> 
> Typical solutions might be to:
> A. Lock the table while inserting
> B. Retry the transaction following an error.
> C. Import the records to a staging table, then do the deduplication
> check when moving from the staging table
> 
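
A minimal sketch of option A, with assumed table names:

    BEGIN;
    -- SHARE ROW EXCLUSIVE conflicts with itself and with plain INSERTs,
    -- so concurrent importers queue up instead of racing:
    LOCK TABLE dest IN SHARE ROW EXCLUSIVE MODE;
    INSERT INTO dest
    SELECT DISTINCT ON (pkey) *
    FROM temp
    WHERE NOT EXISTS (SELECT 1 FROM dest WHERE dest.pkey = temp.pkey)
    ORDER BY pkey;
    COMMIT;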


