Re: Bug in finding Duplicates with Group By?

Paul DuBois Thu, 18 Sep 2003 16:05:34 -0700

At 4:56 PM -0500 9/18/03, Allen wrote:

DISTINCT key_field is fine. I don't think it will make a difference. My example was using only one field.

Your example was using three fields by my count.

DISTINCT field1 is likely to yield a different result than
DISTINCT field1, field2, field3. What form of DISTINCT did you use?

Paul DuBois wrote:
At 4:22 PM -0500 9/18/03, Allen wrote:
Ok. I looked through google and saw a couple examples of how to find the duplicates in a table that looked like this...
SELECT date, time, id, count(*) as n from session GROUP BY date,time,id
HAVING n > 1;
Ok. So I tried it. I have a table of 13,128,178 rows (not a small example). I ran a DISTINCT on the key field and the result was 12,787,768 so that tells
Do you mean "key field" (singular) or do you really mean
DISTINCT date, time, id ?
me there are 340,410 duplicate rows.

Ok. I run the above command. I get 272,626 as an answer. What? That doesn't match. For grins I run the query again with.

HAVING n > 2;
This time I get 67,756.
Again HAVING n > 3 I get 30.
Add them all up I get 340,413. Now they match! What seems to be happening is that the HAVING n > 1 is really doing n == 2 not GREATER than 1 and so on.

I would say this looks like a bug or I am crazy either of which could be true.

- Allen

--
Paul DuBois, Senior Technical Writer
Madison, Wisconsin, USA
MySQL AB, www.mysql.com

Are you MySQL certified? http://www.mysql.com/certification/


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Bug in finding Duplicates with Group By?

Reply via email to