Jay, Thanks again for all your help.
I have ended up with something similar that appears to work and does provide the difference of two data frames, including all the duplicate rows that may be removed by filtering. This will be very helpful to me going forward, as the data I receive often has duplicate rows that I filter out, and I want to double-check that they were in fact filtered out.

# Read the raw entries and drop exact duplicate rows
Entry_DF <- read.csv("RSetDiffEntry.csv", header = TRUE)
EntryFiltered_DF <- subset(Entry_DF, !duplicated(Entry_DF))

# Keep only rows with a cost strictly between 0 and 300
EntryFiltered_DF <- subset(EntryFiltered_DF, !(EntryFiltered_DF$CostPerSquareFoot == 0))
EntryFiltered_DF <- subset(EntryFiltered_DF, EntryFiltered_DF$CostPerSquareFoot > 0)
EntryFiltered_DF <- subset(EntryFiltered_DF, EntryFiltered_DF$CostPerSquareFoot < 300)

# With "prob" loaded, setdiff() is applied to the two data frames
library("prob")
setDiff_DF <- setdiff(Entry_DF, EntryFiltered_DF)

# Add back the duplicate rows that were dropped in the first step
DuplicateRows_DF <- subset(Entry_DF, duplicated(Entry_DF))
DesiredDFDiff_DF <- rbind(DuplicateRows_DF, setDiff_DF)
DesiredDFDiff_DF

--- On Sat, 5/30/09, G. Jay Kerns <gke...@ysu.edu> wrote:

> From: G. Jay Kerns <gke...@ysu.edu>
> Subject: Re: setdiff bizarre (was: odd behavior out of setdiff)
> To: "Jason Rupert" <jasonkrup...@yahoo.com>
> Cc: "David Winsemius" <dwinsem...@comcast.net>, "r-help@r-project.org" <r-help@r-project.org>
> Date: Saturday, May 30, 2009, 5:19 PM
>
> Jason,
>
> (moved back to R-help)
>
> On Sat, May 30, 2009 at 3:30 PM, Jason Rupert <jasonkrup...@yahoo.com> wrote:
> >
> > Jay,
> >
> > I really appreciate all your help.
> >
> > I posted to Nabble an R file and input CSV files more accurately demonstrating what I am seeing and the output I desire to achieve when I difference two dataframes.
> > http://n2.nabble.com/Support-SetDiff-Discussion-Items...-td2999739.html
> >
> > It may be that "setdiff" as intended in the base R functionality and "prob" was never intended to provide the type of result I desire. If that is the case then I will need to ask the "Ninjas" for help to produce the outcome I seek.
> >
> > That is, when I difference the data within RSetDiffEntry.csv and RSetDuplicatesRemoved.csv, I desire to get the result shown in RDesired.csv.
> >
> > Note that it would not be enough to just remove duplicate "CostPerSquareFoot" values, since that variable is tied to "EntryDate" and "HouseNumber".
> >
> > Any further help and insights are much appreciated.
> >
> > Thanks again,
> > Jason
> >
>
> From your description, something like the following should work:
>
> Let A = your RSetDiffEntry
> Let B = your RSetDuplicatesRemoved...
>
> library(prob)
> C <- setdiff(A,B)
> D <- rbind(A,C)
> E <- D[duplicated(D),]
>
> The E should = your RDesired.
>
> Hope this helps,
> Jay
>
> P.S. I notice your row number 7 in "RSetDuplicatesRemoved" is duplicated by the following row. That's a typo, yes? If so, then E should have one more row than your "RDesired."
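
For anyone reading this thread later, here is a minimal base-R sketch of the same check. It is not the approach used above, only one possible alternative: if the whole filter is built as a single logical vector, the removed rows (duplicates included) are the rows where that vector is FALSE, so no set difference is needed. The toy data frame is made up for illustration and stands in for RSetDiffEntry.csv; `keep` and `Removed_DF` are hypothetical names, and the sketch assumes CostPerSquareFoot has no missing values.

```r
# Made-up toy data standing in for RSetDiffEntry.csv (assumption, not the real file)
Entry_DF <- data.frame(
  EntryDate         = c("2009-05-01", "2009-05-01", "2009-05-02", "2009-05-03", "2009-05-04"),
  HouseNumber       = c(1, 1, 2, 3, 4),
  CostPerSquareFoot = c(100, 100, 0, 350, 120),
  stringsAsFactors  = FALSE
)

# One logical vector describing which rows survive the filter:
# not an exact duplicate of an earlier row, and cost strictly between 0 and 300
keep <- !duplicated(Entry_DF) &
  Entry_DF$CostPerSquareFoot > 0 &
  Entry_DF$CostPerSquareFoot < 300

EntryFiltered_DF <- Entry_DF[keep, ]   # rows that pass the filter
Removed_DF       <- Entry_DF[!keep, ]  # everything filtered out, duplicates included

Removed_DF
```

Because every original row lands in exactly one of the two data frames, nrow(EntryFiltered_DF) + nrow(Removed_DF) should equal nrow(Entry_DF), which gives a quick sanity check on the filtering.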