I had a similar issue once and was able to recover from it. If this affects only some rows and you are able to identify them, this is fixable:

-- force a table scan so the corrupt index is not used
SET enable_seqscan = on;
SET enable_indexscan = off;
SET enable_bitmapscan = off;

SELECT email, count(*)
FROM users
GROUP BY email
HAVING count(*) > 1;

Then, if the rows are simply duplicates with no other differences, add a new serial column (or, to prevent blocking, add a bigint column and update it with sequential values). Using the emails from the query above, delete the copies with the higher (or lower) sequence number, making sure you are still skipping indexes. Once the table is clean, drop the sequence column again and re-index.
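A minimal sketch of that cleanup, using the table and constraint names from Jonas's schema; the dedup_seq column and sequence names are mine, and the single DELETE assumes you have already checked that the duplicate rows really differ in nothing else:

-- temporary ordering column; avoids the table rewrite a serial column causes
ALTER TABLE users ADD COLUMN dedup_seq bigint;
CREATE SEQUENCE users_dedup_seq;
UPDATE users SET dedup_seq = nextval('users_dedup_seq');

-- with enable_indexscan/enable_bitmapscan still off, keep only the
-- lowest-numbered row for each duplicated email
DELETE FROM users a
USING users b
WHERE a.email = b.email
  AND a.dedup_seq > b.dedup_seq;

ALTER TABLE users DROP COLUMN dedup_seq;
DROP SEQUENCE users_dedup_seq;
REINDEX INDEX users_email_key;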
Hope this helps,
Daniel

On Fri, Sep 2, 2016 at 11:06 PM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Jonas Tehler wrote:
> > We're running Postgresql 9.4.5 on Amazon RDS. One of our tables looks something like this:
> >
> > CREATE TABLE users
> > (
> >   ...
> >   email character varying(128) NOT NULL,
> >   ...
> >   CONSTRAINT users_email_key UNIQUE (email)
> > )
> >
> > Despite this we have rows with very similar email values. I discovered the problem when I
> > tried to add a column and got the following error:
> >
> > ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: could not create unique index
> > "users_email_key"
> > DETAIL: Key (email)=(x...@yyy.com) is duplicated.
> > : ALTER TABLE "users" ADD COLUMN "client_branch" character varying(255) DEFAULT 'beta'
> >
> > Now look at this:
> >
> > => select email from users where email = 'x...@yyy.com';
> >            email
> > ---------------------------
> >  x...@yyy.com
> > (1 row)
> >
> > => select email from users where email LIKE 'x...@yyy.com';
> >            email
> > ---------------------------
> >  x...@yyy.com
> >  x...@yyy.com
> > (2 rows)
> >
> > I have tried to compare the binary data in various ways (email::bytea, md5(email),
> > encode(email::bytea, 'hex'), char_length(email)) and it all looks the same for both rows.
> >
> > Any suggestions how I can discover the difference between the values and how they could
> > have been added without triggering the constraint? I know that the values were added after
> > the constraint was added.
> >
> > The data was added from a Ruby on Rails app that also has unique constraints on the email
> > field and validation on the email format.
>
> That looks very much like data corruption.
>
> I guess there is an index on "users" that is used for one query but not the other.
> Can you verify with EXPLAIN?
>
> Assuming that it is a 'text_ops' or 'varchar_ops' index, I'd say it gets used for the first
> query, but not for the second. That would mean that there is an extra entry in the table that
> is not in the index.
>
> Did you have any crashes, standby promotion, restore with PITR or other unusual occurrences
> recently?
>
> Make sure you have a physical backup; there may be other things corrupted.
>
> This is a possible path to proceed:
>
> Once you have made sure that you have a physical backup, try to add the "ctid" column to
> both queries.
>
> Then delete the extra row from the second query with "DELETE FROM users WHERE ctid = ...".
>
> Then, to make sure there is no other corruption lurking, make a logical backup with
> pg_dumpall, create a new database cluster with "initdb" and restore the data.
>
> Yours,
> Laurenz Albe
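To expand on Laurenz's ctid suggestion, it would look roughly like this; the '(0,2)' value is only a placeholder for whatever ctid the SELECT actually returns:

-- in a fresh session (without the enable_* overrides), verify his guess:
-- the equality query should show an index scan on users_email_key,
-- the LIKE query a seq scan
EXPLAIN SELECT email FROM users WHERE email = 'x...@yyy.com';
EXPLAIN SELECT email FROM users WHERE email LIKE 'x...@yyy.com';

-- then locate the physical rows via a forced table scan
SET enable_indexscan = off;
SET enable_bitmapscan = off;
SELECT ctid, email FROM users WHERE email LIKE 'x...@yyy.com';

-- and delete the row the unique index does not know about,
-- e.g. if its ctid came back as (0,2):
DELETE FROM users WHERE ctid = '(0,2)';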
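And for the final dump-and-restore step, a rough outline with placeholder paths; note that on RDS you cannot run initdb yourself, so the equivalent there is dumping with pg_dumpall and restoring into a freshly provisioned instance:

pg_dumpall -f full_backup.sql
initdb -D /path/to/new/cluster
pg_ctl -D /path/to/new/cluster -l /path/to/logfile start
psql -d postgres -f full_backup.sql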