The following bug has been logged online: Bug reference: 4420 Logged by: Legistrate Email address: [EMAIL PROTECTED] PostgreSQL version: 8.3.3 Operating system: Windows XP Description: Problem Finding unique text entries Details:
SELECT count(*) FROM trigrams UNION SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigrams) s; 393262 393285 SELECT DISTINCT trigram INTO trigram2 FROM trigrams; SELECT count(*) FROM trigram2 UNION SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigram2) s; 393261 393262 SELECT DISTINCT trigram INTO trigram3 FROM trigram2; SELECT count(*) FROM trigram3 UNION SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigram3) s; 393261 393261 trigrams:http://www.2shared.com/file/3943447/273b8527/MusicData.html trigram2:http://www.2shared.com/file/3943480/15ea5f88/MusicData2.html trigram3:http://www.2shared.com/file/3943484/12879b91/MusicData3.html When I recently installed 8.3.3 to fix similar bug in 8.3.0 I selected UTF-8 for server and client and the database that trigrams, etc are in is UTF-8. However me and RhodiumToad discovered that show lc_collate; returns "English_United States.1252". I dont recall seeing an option to change this in the windows installer, but I could have missed it. Not sure if this is affecting the results or not. Note that: select distinct trigram from trigrams where trigram=(select trigram from trigrams where trigram_id=384519); select distinct trigram from trigrams where trigram_id IN (384519,392701); SELECT * FROM trigram2 GROUP BY trigram HAVING count(trigram)>1; all return the same row that seems to be the duplicate that is not detected. Also, the original trigrams table was created by inserting the result of trigrams2 into trigrams to use the SERIAL field as a primary key from the following: SELECT DISTINCT trigram INTO trigrams2 FROM (SELECT DISTINCT trigram FROM artist_trigrams UNION SELECT DISTINCT trigram FROM album_trigrams UNION SELECT DISTINCT trigram FROM track_trigrams) s; The content of those tables is much more extensive, but if you need it, let me know and I will rapidshare it or something. I was able to create a unique constraint on 'trigram' for both trigram2 and trigram3, not on trigrams. For RodiumToad, he couldnt create a unique constraint on trigram2 and SELECT count(*) FROM (SELECT DISTINCT trigram FROM trigrams) s; returned 393261 on freebsd with lc_collate=C -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs