On Wed, Sep 25, 2013 at 12:50 PM, Steven Schlansker <ste...@likeness.com> wrote:
>
> On Sep 25, 2013, at 6:04 AM, Merlin Moncure <mmonc...@gmail.com> wrote:
>
>> On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
>> <franc...@teksol.info> wrote:
>>> Hi all!
>>>
>>> I import many, many rows of data into a table, from three or more
>>> computers, 4 times per hour. I have a primary key, and the query I use to
>>> import the data is supposed to dedup before inserting, but I still get
>>> primary key violations.
>>>
>>> The import process is:
>>>
>>> * Load CSV data into temp table
>>> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT
>>>   EXISTS(temp.pkey = dest.pkey)
>>>
>>> I assumed (erroneously) that this would guarantee no duplicate data could
>>> make it into the database. The primary key violations are proving me wrong.
>>
>> Right. Transaction A and B are interleaved: they both run the same
>> check against the same id at the same time. Both checks pass because
>> neither transaction is committed. This problem is not solvable by
>> adjusting the isolation level.
>
> Are you sure that this is the case? It is my understanding that since 9.1
> with SSI (https://wiki.postgresql.org/wiki/SSI), if you set the transaction
> isolation level to SERIALIZABLE, this problem is solved, as the insert will
> take a "predicate lock" and the other insert cannot succeed.
>
> We use this to detect / resolve concurrent inserts that violate primary keys
> and it works great.
>
> However, in this case it probably doesn't help the OP because the cost of
> restarting the entire import is likely too high.
ah, you're right!

merlin
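
Roughly, the serializable variant Steven describes would look like the sketch
below. The table and column names (dest, temp, pkey) come from François's
post; everything else is assumed for illustration, and the client still has
to catch serialization failures and rerun the whole import transaction.

    -- one import batch, run at SERIALIZABLE so the NOT EXISTS check takes
    -- SSI predicate locks on dest
    BEGIN ISOLATION LEVEL SERIALIZABLE;

    INSERT INTO dest (pkey)    -- a real import would list the other columns too
    SELECT DISTINCT t.pkey
    FROM temp t
    WHERE NOT EXISTS (
        SELECT 1 FROM dest d WHERE d.pkey = t.pkey
    );

    COMMIT;

    -- If a concurrent batch inserts the same key, one of the two transactions
    -- is aborted (typically SQLSTATE 40001, serialization_failure) and must be
    -- rerun from the start: the restart cost mentioned above.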