If you're good at Chinese ,just visit here. http://blog.chinaunix.net/u/29134/showart_375303.html
On Thu, Aug 7, 2008 at 10:34 PM, Magnus Smith < [EMAIL PROTECTED]> wrote: > I think got it in the end by doing a union and a join. > > delete AA, PA from ACCOUNTACTION AA, ACCOUNTPAYMENTACTION PA > where AA.ID = PA.ID and AA.ID in > (select D.ID from > > (select A1.ID from ACCOUNTACTION A1 > > left join > > ( > select * from ACCOUNTACTION A2 > where A2.ACTIONDATE like '2008-08-01 02:00%' > group by A2.ACCOUNT_ID > having count(A2.ACCOUNT_ID) > 1 > > union > > select * from ACCOUNTACTION A3 > where A3.ACTIONDATE like '2008-08-01 02:00%' > group by A3.ACCOUNT_ID having count(A3.ACCOUNT_ID) = 1 > ) as U1 > > on A1.ID = U1.ID > where A1.ACTIONDATE like '2008-08-01 02:00%' > and U1.ID is NULL > ) as D > ); > > Thanks for the pointers ;-) > > > > > -----Original Message----- > From: Magnus Smith [mailto:[EMAIL PROTECTED] > Sent: 07 August 2008 10:35 > To: Ananda Kumar > Cc: mysql@lists.mysql.com > Subject: RE: removing duplicate entries > > Yes I can see you are correct. I tried setting up a little test case > myself. > > CREATE TABLE ACCOUNTACTION ( > ID INT NOT NULL PRIMARY KEY, > ACTIONDATE DATETIME, > ACCOUNT_ID INT NOT NULL > ); > > CREATE TABLE ACCOUNTPAYMENTACTION ( > ID INT NOT NULL PRIMARY KEY, > AMOUNT INT > ); > > INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID) > VALUES('001', '2008-08-01 02:00:00', '101'), > ('002', '2008-08-01 02:00:00', '101'), > ('003', '2008-08-01 02:00:00', '101'), > ('004', '2008-08-01 02:00:00', '102'), > ('005', '2008-08-01 02:00:00', '103'), > ('006', '2008-08-01 02:00:00', '104'), > ('007', '2008-08-01 02:00:00', '104'), > ('008', '2008-08-01 02:00:00', '105'), > ('009', '2008-08-01 03:00:00', '104'), > ('010', '2008-08-01 03:00:00', '105'), > ('011', '2008-08-01 02:00:00', '106'); > > INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT) > VALUES('001', '1000'), > ('002', '1000'), > ('003', '1000'), > ('004', '1000'), > ('005', '1000'), > ('006', '1000'), > ('007', '1000'), > ('008', '1000'), > ('009', '1000'), > ('010', '1000'), > ('011', '1000'); > > > I got the following query that seems to work on my test case. > > I create a union of everything that is not a duplicate and then take the > ones that are not in this to be the duplicates > > > select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID) > not in > (select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having count(ACCOUNTACTION.ACCOUNT_ID) > 1 > union > select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID) > = 1); > > > The problem is that when I try to apply this to my real database tables > that are quite large then the query does not return. > > I am thinking that there must be a more efficient way to write the > query? > > I would be pleased to hear any suggestions - thanks > > > > > ________________________________ > > From: Ananda Kumar [mailto:[EMAIL PROTECTED] > Sent: 06 August 2008 13:51 > To: Magnus Smith > Cc: mysql@lists.mysql.com > Subject: Re: removing duplicate entries > > > I just did a test case here > > > select * from amc_25; > +------+ > | id | > +------+ > | 2 | > | 14 | > | 1 | > | 2 | > +------+ > 4 rows in set (0.01 sec) > > select id from amc_25 group by id having count(id) >1 and id!=min(id); > > Empty set (0.00 sec) > > > > It does not give me any rows. > > R u sure the rows returned, are the one you want to keep are indeed > duplicates > > > > > On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: > > When I try the first suggestion (i) then I get all the 1682 > duplicate rows. The thing is that I need to keep the originals which > are the ones with the lowest ACCOUNTACTION.ID <http://accountaction.id/> > value. > > The second suggestion (ii) gives me 563 rows that are the > duplicates with the lowest ACCOUNTACTION.ID <http://accountaction.id/> > which are the ones I wish to keep > > So the ones I want to delete are the ones in (i) and not (ii) > > When I use > > > select ACCOUNTACTION.ID <http://accountaction.id/> from > ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 > and ACCOUNTACTION.ID <http://accountaction.id/> != > min(ACCOUNTACTION.ID <http://accountaction.id/> )); > > then I get 1119 rows which is all the duplicates in (i) less the > originals in (ii) > > > The problem I'm having is using this in a delete statement. > > > ________________________________ > > From: Ananda Kumar [mailto:[EMAIL PROTECTED] > Sent: 06 August 2008 10:11 > To: Magnus Smith > Cc: mysql@lists.mysql.com > Subject: Re: removing duplicate entries > > > > I doubt the belwo sql will give you duplcates > > select ACCOUNTACTION.ID <http://accountaction.id/> from > ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID > <http://accountaction.id/> != > min(ACCOUNTACTION.ID <http://accountaction.id/> )); > > The reason being, for duplicates records accountaction.id > <http://accountaction.id/> will always equal to min(accountaction.id > <http://accountaction.id/> ). > > > try this > select ACCOUNTACTION.ID <http://accountaction.id/> from > ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 > > or > > select ACCOUNTACTION.ID <http://accountaction.id/> from > ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID > <http://accountaction.id/> =min(accountaction.id > <http://accountaction.id/> ); > > I would use the first select statement. > > > > On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote: > > I have the following two tables > > ACCOUNTACTION > > +-------------------+--------------+------+-----+---------+-------+ > | Field | Type | Null | Key | > Default | Extra | > > +-------------------+--------------+------+-----+---------+-------+ > | ID | bigint(20) | NO | PRI | > | | > | AccountActionType | varchar(31) | YES | | NULL > | | > | DESCRIPTION | varchar(255) | YES | | NULL > | | > | ACTIONDATE | datetime | YES | | NULL > | | > | ACCOUNT_ID | bigint(20) | YES | MUL | NULL > | | > > +-------------------+--------------+------+-----+---------+-------+ > > and > > ACCOUNTPAYMENTACTION > > +---------------+------------+------+-----+---------+-------+ > | Field | Type | Null | Key | Default | > Extra | > > +---------------+------------+------+-----+---------+-------+ > | ID | bigint(20) | NO | PRI | | > | > | AMOUNTINPENCE | bigint(20) | YES | | NULL | > | > > +---------------+------------+------+-----+---------+-------+ > > ACCOUNTPAYMENTACTION shares the primary key with > ACCOUNTACTION > > I need to remove duplicate entries that occured at a > specific time in > ACCOUNTACTION I then plan to remove the rows in > ACCOUNTPAYMENTACTION > that are no longer referenced in ACCOUNTACTION by using > an outer join > > I can select the duplicate records in ACCOUNTACTION > using > > select ACCOUNTACTION.ID <http://accountaction.id/> from > ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and > ACCOUNTACTION.ID <http://accountaction.id/> != > min(ACCOUNTACTION.ID <http://accountaction.id/> )); > > I am trying to delete these records but am having > trouble with the sql > delete > > I tried the following but nothing happened > > delete ACCOUNTACTION where ACCOUNTACTION.ID > <http://accountaction.id/> in > (select ACCOUNTACTION.ID <http://accountaction.id/> > from ACCOUNTACTION > where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%' > group by ACCOUNTACTION.ACCOUNT_ID > having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and > ACCOUNTACTION.ID <http://accountaction.id/> != > min(ACCOUNTACTION.ID <http://accountaction.id/> ))); > > Can anyone help me? > > > > > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: > http://lists.mysql.com/[EMAIL PROTECTED] > > -- I'm a MySQL DBA in china. More about me just visit here: http://yueliangdao0608.cublog.cn