Thanks Gray,

This helps; I'd completely forgotten about the subset command. However, it doesn't quite get me where I need to be. Perhaps an example will help. I will simplify my data frame to the three important variables:

ESR_ref   ESR_ref_edit   Loaded
1.1       1.1            Y
1.1.1     1.1            NC
1.1.2     1.1            Y
2.1       2.1            N
2.1.1     2.1            Y
2.1.2     2.1            PU
2.1.3     2.1            Y
3.1       3.1            Y
4.1       4.1            N
4.1.1     4.1            PU

I created the "edit" variable so I can test for duplicates (i.e. samples with more than one sub-sample), because the sub-samples are not of interest at this point; I just want one sub-sample per sample. However, consider 2.1: removing duplicates would leave a subset containing its first line, which has an "N". If at least one of the sub-samples has a "Y", then I want that line rather than a sub-sample with another code. 1.1 in this example works by default because the first sub-sample is a "Y", so that information is retained.

Thanks
Gareth


Gray Calhoun-2 wrote:
>
> Hi,
> Try:
>
> subset(Samps, !duplicated(Samps$ESR_ref_edit) | Samps$Loaded == "Y")
>
> I'd need specific code to be sure that this is exactly what you want
> (ie you specify input and desired output), but indexing with a logical
> vector is probably going to be the solution.
>
> Best,
> Gray
>
> On Wed, Dec 16, 2009 at 7:55 PM, gcam <gcam...@gmail.com> wrote:
>>
>> Hi all.
>>
>> So I have a data frame with multiple columns/variables. The first
>> variable is a major sample name for which there are some sub-samples.
>> Currently I have used the following command to remove the duplicates:
>>
>> Samps_working<-Samps[-c(which(duplicated(Samps$ESR_Ref_edit))),]
>>
>> This removes all of the duplicated sample rows.
>>
>> However, I just realised that, of course, this removes the first
>> observation of each duplicated set. However, I wish to retain any that
>> have the code "Y" in another variable Samps$Loaded. I'm at a bit of a
>> loss as to how best to approach this problem.
>>
>> Just to reiterate. I want to remove all duplicate lines based on sample
>> name, but, I want the lines to be removed with a preference given to
>> those that do not include a "Y" in the Loaded variable column.
>> --
>> View this message in context:
>> http://n4.nabble.com/Remove-duplicates-from-a-data-frame-but-with-some-special-requirements-tp965745p965745.html
>> Sent from the R help mailing list archive at Nabble.com.
>
> --
> Gray Calhoun
>
> Assistant Professor of Economics
> Iowa State University

--
View this message in context: http://n4.nabble.com/Remove-duplicates-from-a-data-frame-but-with-some-special-requirements-tp965745p974312.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
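
For anyone reading the archive, here is a minimal sketch of one way the requirement described in this thread (one row per sample, preferring a row with Loaded == "Y") might be met. It is not a solution anyone in the thread posted. The data frame is rebuilt from the example table in the message, and the names `sample_rank`, `ord` and `Samps_sorted` are made up for illustration. The idea is to reorder the rows so that any "Y" row comes first within its sample, and only then drop duplicates. Note that the suggested `!duplicated(...) | Loaded == "Y"` condition keeps every "Y" row as well as the first row, so sample 2.1 would still come back with three rows.

```r
# Example data rebuilt from the table in the message (assumed to be
# representative of the real Samps data frame).
Samps <- data.frame(
  ESR_ref      = c("1.1", "1.1.1", "1.1.2", "2.1", "2.1.1",
                   "2.1.2", "2.1.3", "3.1", "4.1", "4.1.1"),
  ESR_ref_edit = c("1.1", "1.1", "1.1", "2.1", "2.1",
                   "2.1", "2.1", "3.1", "4.1", "4.1"),
  Loaded       = c("Y", "NC", "Y", "N", "Y", "PU", "Y", "Y", "N", "PU"),
  stringsAsFactors = FALSE
)

# Rank each sample by first appearance so the original sample order is kept.
sample_rank <- match(Samps$ESR_ref_edit, unique(Samps$ESR_ref_edit))

# Within each sample, put rows with Loaded == "Y" first (FALSE sorts before
# TRUE). order() leaves remaining ties in their original order.
ord <- order(sample_rank, Samps$Loaded != "Y")
Samps_sorted <- Samps[ord, ]

# Keep only the first row of each sample: a "Y" row if one exists,
# otherwise the first sub-sample.
Samps_working <- Samps_sorted[!duplicated(Samps_sorted$ESR_ref_edit), ]
print(Samps_working)
```

With the example data this keeps 1.1 (Y), 2.1.1 (Y), 3.1 (Y) and 4.1 (N). Where a sample has several "Y" rows, the first one in the original order is the one retained; a different tie-break would need an extra key in `order()`.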