Re: removing duplicate entries

Moon's Father Mon, 11 Aug 2008 22:28:29 -0700

If you're good at Chinese ,just visit here.
http://blog.chinaunix.net/u/29134/showart_375303.html


On Thu, Aug 7, 2008 at 10:34 PM, Magnus Smith <
[EMAIL PROTECTED]> wrote:

> I think got it in the end by doing a union and a join.
>
> delete AA, PA from ACCOUNTACTION AA, ACCOUNTPAYMENTACTION PA
> where AA.ID = PA.ID and AA.ID in
>    (select D.ID from
>
>        (select A1.ID from ACCOUNTACTION A1
>
>        left join
>
>        (
>        select * from ACCOUNTACTION A2
>        where A2.ACTIONDATE like '2008-08-01 02:00%'
>        group by A2.ACCOUNT_ID
>        having count(A2.ACCOUNT_ID) > 1
>
>        union
>
>        select * from ACCOUNTACTION A3
>        where A3.ACTIONDATE like '2008-08-01 02:00%'
>        group by A3.ACCOUNT_ID having count(A3.ACCOUNT_ID) = 1
>        ) as U1
>
>        on A1.ID = U1.ID
>        where A1.ACTIONDATE like '2008-08-01 02:00%'
>        and U1.ID is NULL
>    ) as D
> );
>
> Thanks for the pointers ;-)
>
>
>
>
> -----Original Message-----
> From: Magnus Smith [mailto:[EMAIL PROTECTED]
> Sent: 07 August 2008 10:35
> To: Ananda Kumar
> Cc: mysql@lists.mysql.com
> Subject: RE: removing duplicate entries
>
> Yes I can see you are correct. I tried setting up a little test case
> myself.
>
> CREATE TABLE ACCOUNTACTION (
>         ID INT NOT NULL PRIMARY KEY,
>         ACTIONDATE DATETIME,
>         ACCOUNT_ID INT NOT NULL
>       );
>
> CREATE TABLE ACCOUNTPAYMENTACTION (
>        ID INT NOT NULL PRIMARY KEY,
>        AMOUNT INT
>        );
>
> INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID)
> VALUES('001', '2008-08-01 02:00:00', '101'),
> ('002', '2008-08-01 02:00:00', '101'),
> ('003', '2008-08-01 02:00:00', '101'),
> ('004', '2008-08-01 02:00:00', '102'),
> ('005', '2008-08-01 02:00:00', '103'),
> ('006', '2008-08-01 02:00:00', '104'),
> ('007', '2008-08-01 02:00:00', '104'),
> ('008', '2008-08-01 02:00:00', '105'),
> ('009', '2008-08-01 03:00:00', '104'),
> ('010', '2008-08-01 03:00:00', '105'),
> ('011', '2008-08-01 02:00:00', '106');
>
> INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT)
> VALUES('001', '1000'),
> ('002', '1000'),
> ('003', '1000'),
> ('004', '1000'),
> ('005', '1000'),
> ('006', '1000'),
> ('007', '1000'),
> ('008', '1000'),
> ('009', '1000'),
> ('010', '1000'),
> ('011', '1000');
>
>
> I got the following query that seems to work on my test case.
>
> I create a union of everything that is not a duplicate and then take the
> ones that are not in this to be the duplicates
>
>
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID)
> not in
> (select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having count(ACCOUNTACTION.ACCOUNT_ID) > 1
> union
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID)
> = 1);
>
>
> The problem is that when I try to apply this to my real database tables
> that are quite large then the query does not return.
>
> I am thinking that there must be a more efficient way to write the
> query?
>
> I would be pleased to hear any suggestions - thanks
>
>
>
>
> ________________________________
>
> From: Ananda Kumar [mailto:[EMAIL PROTECTED]
> Sent: 06 August 2008 13:51
> To: Magnus Smith
> Cc: mysql@lists.mysql.com
> Subject: Re: removing duplicate entries
>
>
> I just did a test case here
>
>
> select * from amc_25;
> +------+
> | id   |
> +------+
> |    2 |
> |   14 |
> |    1 |
> |    2 |
> +------+
> 4 rows in set (0.01 sec)
>
> select id from amc_25 group by id having count(id) >1 and id!=min(id);
>
> Empty set (0.00 sec)
>
>
>
> It does not give me any rows.
>
> R u sure the rows returned, are the one you want to keep are indeed
> duplicates
>
>
>
>
> On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote:
>
>        When I try the first suggestion (i) then I get all the 1682
> duplicate rows.  The thing is that I need to keep the originals which
> are the ones with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
> value.
>
>        The second suggestion (ii) gives me 563 rows that are the
> duplicates with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
> which are the ones I wish to keep
>
>        So the ones I want to delete are the ones in (i) and not (ii)
>
>        When I use
>
>
>        select ACCOUNTACTION.ID <http://accountaction.id/>  from
> ACCOUNTACTION
>        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>        group by ACCOUNTACTION.ACCOUNT_ID
>        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1
>        and ACCOUNTACTION.ID <http://accountaction.id/>  !=
> min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
>        then I get 1119 rows which is all the duplicates in (i) less the
> originals in (ii)
>
>
>        The problem I'm having is using this in a delete statement.
>
>
> ________________________________
>
>        From: Ananda Kumar [mailto:[EMAIL PROTECTED]
>        Sent: 06 August 2008 10:11
>        To: Magnus Smith
>        Cc: mysql@lists.mysql.com
>        Subject: Re: removing duplicate entries
>
>
>
>        I doubt the belwo sql will give you duplcates
>
>        select ACCOUNTACTION.ID <http://accountaction.id/>  from
> ACCOUNTACTION
>        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>        group by ACCOUNTACTION.ACCOUNT_ID
>        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
> <http://accountaction.id/>  !=
>        min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
>        The reason being, for duplicates records accountaction.id
> <http://accountaction.id/>  will always equal to min(accountaction.id
> <http://accountaction.id/> ).
>
>
>        try this
>        select ACCOUNTACTION.ID <http://accountaction.id/>  from
> ACCOUNTACTION
>        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>        group by ACCOUNTACTION.ACCOUNT_ID
>        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1
>
>        or
>
>        select ACCOUNTACTION.ID <http://accountaction.id/>  from
> ACCOUNTACTION
>        where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>        group by ACCOUNTACTION.ACCOUNT_ID
>        having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
> <http://accountaction.id/> =min(accountaction.id
> <http://accountaction.id/> );
>
>        I would use the first select statement.
>
>
>
>        On 8/6/08, Magnus Smith <[EMAIL PROTECTED]> wrote:
>
>                I have the following two tables
>
>                ACCOUNTACTION
>
> +-------------------+--------------+------+-----+---------+-------+
>                | Field             | Type         | Null | Key |
> Default | Extra |
>
> +-------------------+--------------+------+-----+---------+-------+
>                | ID                | bigint(20)   | NO   | PRI |
> |       |
>                | AccountActionType | varchar(31)  | YES  |     | NULL
> |       |
>                | DESCRIPTION       | varchar(255) | YES  |     | NULL
> |       |
>                | ACTIONDATE        | datetime     | YES  |     | NULL
> |       |
>                | ACCOUNT_ID        | bigint(20)   | YES  | MUL | NULL
> |       |
>
> +-------------------+--------------+------+-----+---------+-------+
>
>                and
>
>                ACCOUNTPAYMENTACTION
>
> +---------------+------------+------+-----+---------+-------+
>                | Field         | Type       | Null | Key | Default |
> Extra |
>
> +---------------+------------+------+-----+---------+-------+
>                | ID            | bigint(20) | NO   | PRI |         |
> |
>                | AMOUNTINPENCE | bigint(20) | YES  |     | NULL    |
> |
>
> +---------------+------------+------+-----+---------+-------+
>
>                ACCOUNTPAYMENTACTION shares the primary key with
> ACCOUNTACTION
>
>                I need to remove duplicate entries that occured at a
> specific time in
>                ACCOUNTACTION I then plan to remove the rows in
> ACCOUNTPAYMENTACTION
>                that are no longer referenced in ACCOUNTACTION by using
> an outer join
>
>                I can select the duplicate records in ACCOUNTACTION
> using
>
>                select ACCOUNTACTION.ID <http://accountaction.id/>  from
> ACCOUNTACTION
>                where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>                group by ACCOUNTACTION.ACCOUNT_ID
>                having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
> ACCOUNTACTION.ID <http://accountaction.id/>  !=
>                min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
>                I am trying to delete these records but am having
> trouble with the sql
>                delete
>
>                I tried the following but nothing happened
>
>                delete ACCOUNTACTION where ACCOUNTACTION.ID
> <http://accountaction.id/>  in
>                (select ACCOUNTACTION.ID <http://accountaction.id/>
> from ACCOUNTACTION
>                where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
>                group by ACCOUNTACTION.ACCOUNT_ID
>                having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
> ACCOUNTACTION.ID <http://accountaction.id/>  !=
>                min(ACCOUNTACTION.ID <http://accountaction.id/> )));
>
>                Can anyone help me?
>
>
>
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
>
>


-- 
I'm a MySQL DBA in china.
More about me just visit here:
http://yueliangdao0608.cublog.cn

Re: removing duplicate entries

Reply via email to