Re: [R] Compare two dataframes

Mark Na Fri, 17 Dec 2010 13:41:21 -0800

Hi Petr,

Many thanks for your help. I like your solution because (and I did not
know this) the unique function works on ALL the data at once (i.e.,
across all of the columns) which means I don't have to make a unique
ID field by pasting together all of the columns or run through all of the
columns iteratively (say, by using a loop).
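
To make that concrete, here is a small sketch using a made-up data frame
(`df` and its values are hypothetical, not from my real data). It only
illustrates that unique() compares whole rows:

```r
# Hypothetical toy data frame; rows 1 and 3 agree in every column
df <- data.frame(x = c(1, 2, 1, 1),
                 y = c("a", "b", "a", "b"),
                 stringsAsFactors = FALSE)

# unique() compares entire rows, so no pasted ID column is needed
df_unique <- unique(df)

nrow(df_unique)      # 3: the second copy of (1, "a") is dropped
rownames(df_unique)  # "1" "2" "4": first occurrences are kept, in order
```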


However, if the dataframe contains non-unique rows (two rows with
exactly the same values in each column) then the unique function will
delete one of them and that may not be desirable. So, caution is
required.
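
One way to check for this before relying on unique() is duplicated(),
sketched here on a made-up data frame (again, `df` is hypothetical):

```r
# Hypothetical toy data: the row (1, "a") legitimately occurs twice
df <- data.frame(x = c(1, 2, 1),
                 y = c("a", "b", "a"),
                 stringsAsFactors = FALSE)

# duplicated() flags every row that repeats an earlier row
dup <- duplicated(df)

any(dup)          # TRUE: unique() would silently drop a row here
nrow(df)          # 3
nrow(unique(df))  # 2
```

If any(dup) is TRUE, the unique(rbind(...)) approach will report each
repeated row at most once, so row counts need to be handled separately.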

Thanks again for the time you took to help me better understand the
unique function. Much appreciated. Děkuji!

Mark



On Fri, Dec 17, 2010 at 2:27 AM, Petr Savicky <savi...@cs.cas.cz> wrote:
> On Thu, Dec 16, 2010 at 01:02:29PM -0600, Mark Na wrote:
>> Hello,
>>
>> I have two dataframes DF1 and DF2 that should be identical but are not
>> (DF1 has some rows that aren't in DF2, and vice versa). I would like
>> to produce a new dataframe DF3 containing rows in DF1 that aren't in
>> DF2 (and similarly DF4 would contain rows in DF2 that aren't in DF1).
>
> The function unique(DF) removes duplicated rows of DF and keeps the unique
> rows in the order of their first occurrence. So, if DF1 does not contain
> duplicated rows, then unique(rbind(DF1, DF2)) contains first the rows of
> DF1 and then any rows that occur only in DF2. The order of the rows in
> the result follows the order of the original data frames, and if DF2
> contains several instances of a row that is not in DF1, only the first
> instance of this row appears in the difference.
>
>  #MAKE SOME DATA
>  # create unique ID field by pasting all columns together
>  # (sep="_" avoids ambiguous IDs, e.g. 1 and 23 vs. 12 and 3)
>  cars$id <- paste(cars$speed, cars$dist, sep="_")
>  cars1 <- cars[1:35, ]
>  cars2 <- cars[16:50, ]
>
>  #EXTRACT UNIQUE ROWS
>  # rows unique to cars1 (i.e., not in cars2)
>  cars1_unique <- cars1[cars1$id %in% setdiff(cars1$id, cars2$id), ]
>  # rows unique to cars2 (i.e., not in cars1)
>  cars2_unique <- cars2[cars2$id %in% setdiff(cars2$id, cars1$id), ]
>
>  cars1_set <- unique(cars1)
>  cars2_set <- unique(cars2)
>
>  cars1_plus <- unique(rbind(cars1_set, cars2_set))
>  cars2_plus <- unique(rbind(cars2_set, cars1_set))
>
>  cars1_diff <- cars2_plus[ - seq(nrow(cars2_set)), ]
>  cars2_diff <- cars1_plus[ - seq(nrow(cars1_set)), ]
>
>  all(cars1_unique == cars1_diff) # [1] TRUE
>  all(cars2_unique == cars2_diff) # [1] TRUE
>
> Petr Savicky.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

