Jay,
Thanks much for the reply. I think you are right that it is the prob package. Unfortunately, I was not able to find the old emails I had discussing the more powerful setdiff, which keeps the behavior of the base R setdiff but extends it to work with data.frames instead of just simple vectors of values. Love this functionality.

However, consider the following example:

Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"))
Test1_DF<-data.frame(HouseSize=c(1:100), LandLocation=c("Here"), Price = c("Low"))
Test2_DF<-rbind(Test1_DF, Test1_DF)

> setdiff(Test1_DF, Test2_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)
> setdiff(Test2_DF, Test1_DF)
[1] HouseSize    LandLocation Price
<0 rows> (or 0-length row.names)

I was hoping that, for this example, one of the setdiff calls would have returned essentially Test1_DF, since its rows are duplicated in Test2_DF and that is what differs between the two data.frames.

So I guess I am trying to figure out a way to truly diff the data.frames, i.e. determine when two data.frames differ from one another and then get the differing rows as output.

Does this capability exist as a function in a current R package, or is there a commonly used pattern for building it?

Thanks again for any feedback you can provide.

Also, I tried to determine my session info and the packages I have loaded, but I received the following:

> sessionInfo()
Error in x$Priority : $ operator is invalid for atomic vectors
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'prob' is missing or broken
2: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'ggplot2' is missing or broken
3: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'reshape' is missing or broken
4: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'RColorBrewer' is missing or broken
5: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'proto' is missing or broken
6: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'plyr' is missing or broken
7: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'nortest' is missing or broken
8: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'fBasics' is missing or broken
9: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'timeSeries' is missing or broken
10: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'timeDate' is missing or broken
11: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'vcd' is missing or broken
12: In FUN(c("prob", "ggplot2", "reshape", "RColorBrewer", ... :
  DESCRIPTION file of package 'colorspace' is missing or broken

However, I typically load the following ones:

library(colorspace, lib.loc=RLibraryPathLocation)
library(vcd, lib.loc=RLibraryPathLocation)
library(timeDate, lib.loc=RLibraryPathLocation)
library(timeSeries, lib.loc=RLibraryPathLocation)
library(fBasics, lib.loc=RLibraryPathLocation)
library(nortest, lib.loc=RLibraryPathLocation)
library(plyr, lib.loc=RLibraryPathLocation)
library(proto, lib.loc=RLibraryPathLocation)
library(RColorBrewer, lib.loc=RLibraryPathLocation)
library(reshape, lib.loc=RLibraryPathLocation)
library(ggplot2, lib.loc=RLibraryPathLocation)
library(prob, lib.loc=RLibraryPathLocation)

--- On Fri, 5/29/09, G. Jay Kerns <gke...@ysu.edu> wrote:

> From: G. Jay Kerns <gke...@ysu.edu>
> Subject: Re: [R] Odd Behavior Out of setdiff(...)
>   - addition of duplicate entries is not identified
> To: "Jason Rupert" <jasonkrup...@yahoo.com>
> Cc: R-help@r-project.org
> Date: Friday, May 29, 2009, 3:21 PM
>
> Dear Jason,
>
> On Fri, May 29, 2009 at 2:48 PM, Jason Rupert <jasonkrup...@yahoo.com> wrote:
> >
> > I think I am using the improved version of setdiff(...) that handles data.frames, so I think some odd behavior was expected but this one is escaping me.
> >
> > It appears that the addition of duplicate entries is not caught by the setdiff(...). Is this expected behavior?
>
> [snip]
>
> > Thanks in advance for any feedback.
> >
> > Test1_DF<-data.frame(HouseSize=c(1:100))
> > Test2_DF<-rbind(Test1_DF, Test1_DF)
> > setdiff(Test1_DF, Test2_DF)
> > integer(0)
> > setdiff(Test2_DF, Test1_DF)
> > integer(0)
> >
> > However,
> > Test3_DF<-data.frame(HouseSize=c(1:25))
> > setdiff(Test1_DF, Test3_DF)
> >  [1]  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41
> > [17]  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57
> > [33]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73
> > [49]  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
> > [65]  90  91  92  93  94  95  96  97  98  99 100
> >
> > setdiff(Test3_DF, Test1_DF)
> > integer(0)
>
> You didn't explicitly say which "improved version" of setdiff() that
> you are using, so I can only presume that you are using the
> setdiff.data.frame in the prob package.
>
> The behaviour you are observing is expected and matches the
> base:::setdiff behaviour in the case of vectors; cf.
>
> x1 <- c(1:100)
> x2 <- c(x1,x1)
>
> setdiff(x1, x2) # integer(0)
> setdiff(x2, x1) # integer(0)
>
> x3 <- c(1:25)
> setdiff(x1, x3) # 26:100
> setdiff(x3, x1) # integer(0)
>
> > If so, is there another method or approach that should be used to identify duplicate row entries between two different data frames?
>
> The R-help archives are chock full of every possible variant of
> questions (and answers) about this, and you haven't said _exactly_
> what you are looking for. In the absence of an already posted
> solution, please specify exactly what you want and I'll wager an R
> Ninja could dispatch it in moments.
>
> Regards,
> Jay
>
> ***************************************************
> G. Jay Kerns, Ph.D.
> Associate Professor
> Department of Mathematics & Statistics
> Youngstown State University
> Youngstown, OH 44555-0002 USA
> Office: 1035 Cushwa Hall
> Phone: (330) 941-3310 Office (voice mail)
>        -3302 Department
>        -3170 FAX
> E-mail: gke...@ysu.edu
> http://www.cc.ysu.edu/~gjkerns/
>

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
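
For the question raised in this thread (a diff of two data.frames that also counts duplicated rows), one possible base R pattern is sketched below. It is only a sketch under stated assumptions, not a function from base R or the prob package: the helper name multiset_diff is hypothetical, both data.frames are assumed to have identical column names, and no value is assumed to contain a carriage return, which is used as the key separator.

```r
# Hypothetical helper (not part of base R or the prob package):
# returns the rows of `a` that are "left over" after matching each row of `b`
# against at most one identical row of `a`, so duplicates are counted.
multiset_diff <- function(a, b) {
  stopifnot(identical(names(a), names(b)))

  # Collapse each row into a single string key.
  # Assumption: "\r" never occurs inside the data.
  key_a <- do.call(paste, c(a, sep = "\r"))
  key_b <- do.call(paste, c(b, sep = "\r"))

  # Number the repeats of each key: 1 for the first copy, 2 for the second, ...
  occ_a <- ave(seq_along(key_a), key_a, FUN = seq_along)
  occ_b <- ave(seq_along(key_b), key_b, FUN = seq_along)

  # The k-th copy of a row in `a` is kept only if `b` has no k-th copy of it.
  keep <- !(paste(key_a, occ_a, sep = "\r") %in% paste(key_b, occ_b, sep = "\r"))
  a[keep, , drop = FALSE]
}

# The example from this thread
Test1_DF <- data.frame(HouseSize = 1:100, LandLocation = "Here", Price = "Low")
Test2_DF <- rbind(Test1_DF, Test1_DF)

extra <- multiset_diff(Test2_DF, Test1_DF)  # the second copy of every row
none  <- multiset_diff(Test1_DF, Test2_DF)  # nothing left over
nrow(extra)
nrow(none)
```

Here the first call returns the 100 rows that Test2_DF holds in excess of Test1_DF, and the second returns a zero-row data.frame. Unlike setdiff, which follows set semantics and ignores multiplicity, this treats each data.frame as a multiset of rows.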