Chris Anderson wrote:
I have a large dataset that contains duplicate records. How do I identify and
remove duplicate records?
Here's one way:
> aq <- airquality[sample(NROW(airquality), replace=TRUE),]
> any(duplicated(aq))
[1] TRUE
> which(duplicated(aq))
[1] 2 15 34 44 45 47 49 5
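The calls above only identify the duplicates. To remove them as well, a minimal sketch along the same lines follows. The seed and the names dup and aq_unique are assumptions added for illustration, not part of the original post.

```r
# Assumption: a fixed seed so the resample is reproducible (not in the original post)
set.seed(1)
aq <- airquality[sample(NROW(airquality), replace = TRUE), ]

# duplicated() marks the second and later copies of a row as TRUE
dup <- duplicated(aq)

# Keep only the first occurrence of each row
aq_unique <- aq[!dup, ]

# Rows before, duplicates found, rows after
c(before = nrow(aq), duplicates = sum(dup), after = nrow(aq_unique))
```

For a data frame, aq[!duplicated(aq), ] keeps the same rows as unique(aq), so either form should work here.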
Chris,
How large is large? How many columns?
"Duplicate" across all columns or just some?
Henrique gave you a simple R answer. Perhaps doing it in SQL would be more efficient?
e.g.
SELECT DISTINCT <columns>
FROM <table>;
HTH,
Jim Porzak
TGN.com
San Francisco, CA
www.linkedin.com/in/jimporzak
use R! Group SF
Try this:
d <- data.frame(a = c(1, 1, 2, 3), b = c(10, 10, 9, 8))
unique(d)
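On the question of whether "duplicate" means all columns or just some, here is a rough sketch of both cases. The second data frame d2 and the choice of column a as the key are hypothetical, made up only to show the difference.

```r
d <- data.frame(a = c(1, 1, 2, 3), b = c(10, 10, 9, 8))

# Duplicates across all columns: row 2 repeats row 1
dup_all <- duplicated(d)

# Hypothetical data where rows 1 and 2 share 'a' but differ in 'b'
d2 <- data.frame(a = c(1, 1, 2, 3), b = c(10, 11, 9, 8))

# unique(d2) keeps all four rows, since no row is repeated in full
n_full <- nrow(unique(d2))

# Duplicates on column 'a' only: keep the first row for each value of 'a'
dup_a <- duplicated(d2[, "a", drop = FALSE])
d2_first <- d2[!dup_a, ]
```

Note that which row survives depends on row order, so sort the data first if a particular record should be kept.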
On Fri, Jun 5, 2009 at 1:38 PM, Chris Anderson wrote:
> I have a large dataset that contains duplicate records. How do I identify
> and remove duplicate records?
>
>
> Chris Anderson
> 707.315.8486
> www.sassydeals4u.co
I have a large dataset that contains duplicate records. How do I identify and
remove duplicate records?
Chris Anderson
707.315.8486
www.sassydeals4u.com