On 09/16/2011 04:42 PM, Rich Shepard wrote:
On Thu, 15 Sep 2011, Andy Colson wrote:

First you need to trim the \n and spaces:

andy=# insert into junk values (E'GW-22');
INSERT 0 1
andy=# insert into junk values (E'GW-22 \n');
INSERT 0 1
andy=# insert into junk values (E'GW-22 \n');

Andy,

Here's what worked for me:

nevada=# \i junk.sql
CREATE TABLE
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22');
INSERT 0 803
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22 \n');
INSERT 0 0
nevada=# insert into junk select * from chemistry where site_id = (E'GW-22\n');
INSERT 0 1409
nevada=# select '['|| rtrim(trim(trailing E'\n' from site_id)) || ']' from junk;

 ?column?
----------
[GW-22]
[GW-22]

and so on for 2212 rows.

Trim it up:

andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;

If you have a unique index you'll wanna drop it first. Once you get that done, 
we can remove the dups.

No index on junk; I can remove it from chemistry prior to reinserting the
cleaned rows.

Also, where can I read about the select syntax you use? I find nothing
about it in Rick van der Lans' 4th edition, the most comprehensive language
reference I've read.

Thanks,

Rich


The fine online manual:

http://www.postgresql.org/docs/current/interactive/index.html

Especially the string ops:

http://www.postgresql.org/docs/current/interactive/functions-string.html
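
For a quick feel of the trimming functions on that page, here is a minimal sketch you can paste into psql. It uses only literal strings, so it touches no tables; the sample values are made up for illustration.

```sql
-- SQL-standard form: strip trailing newlines only.
select '[' || trim(trailing E'\n' from E'GW-22\n') || ']';

-- rtrim with a character set: strip any mix of trailing spaces and newlines.
select '[' || rtrim(E'GW-22 \n', E' \n') || ']';

-- btrim does the same on both ends of the string.
select '[' || btrim(E' GW-22 \n', E' \n') || ']';
```

Each of these should print [GW-22]. The brackets are only there to make stray whitespace visible.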

Trim it up:
andy=# select '['|| rtrim(trim(trailing E'\n' from a)) || ']' from junk;

Andy,

Scrolling through the table with rows ordered by date and chemical I find
no duplicates ... so far. However, what I do find is that the above did not
work:


No, it wasn't supposed to.  A select statement builds a new result set and 
returns it to you; it won't update a table.  That select statement was meant as 
an example for writing an update statement.
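
A select is still handy as a preview before changing anything. A rough sketch, assuming site_id is a text or varchar column, that counts the rows whose site_id would be changed by trimming:

```sql
-- Read-only preview: which stored site_id values carry trailing
-- spaces or newlines, and how many rows each one affects.
select '[' || site_id || ']' as raw_site_id,
       count(*)              as n_rows
  from chemistry
 where site_id <> rtrim(site_id, E' \n')
 group by site_id
 order by site_id;
```

Nothing in the table changes when you run this.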

Like:

update chemistry set site_id = rtrim(trim(trailing E'\n' from site_id));

If there was a unique index on chemistry(site_id), the above would throw an 
error, so I was warning you to drop it.

Once the site_id was trimmed, you could then delete the dups, with:

delete from chemistry where site_id = 'GW-22' and ctid <> (select min(ctid) 
from chemistry where site_id = 'GW-22');

Those 11 steps you had... I was thinking two steps.  The update and the delete 
above.
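
Put together, the two steps might look like the sketch below. It is not tested against your data and makes some assumptions: sample_date and param are hypothetical column names standing in for whatever identifies a distinct measurement in chemistry, and rtrim with a character set is used as a variation on the trim expression above so that any mix of trailing spaces and newlines is removed. The delete here only removes rows that match another row on all of those columns, which is narrower than deleting by site_id alone.

```sql
begin;

-- Step 1: trim trailing spaces and newlines in place,
-- touching only the rows that actually need it.
update chemistry
   set site_id = rtrim(site_id, E' \n')
 where site_id <> rtrim(site_id, E' \n');

-- Step 2: remove duplicates, keeping the row with the lowest ctid
-- in each group.  sample_date and param are hypothetical names.
delete from chemistry a
 using chemistry b
 where a.site_id     = b.site_id
   and a.sample_date = b.sample_date
   and a.param       = b.param
   and a.ctid > b.ctid;

-- Check the reported row counts, then commit,
-- or issue rollback instead if they look wrong.
commit;
```

Running it inside a transaction means a surprising row count from either statement can be undone before anything is made permanent.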

Sorry, I should have been a little clearer, but at least you got things 
cleaned up.  PG has a huge number of data manipulation functions.  If you have 
to export data out of a database in order to massage it, then that's a failure 
of the database.  PG (and SQL) were meant for just this kind of job.


-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
