And yes, I ignored Genotype, but for the example data none of the stand_ID values are present in more than one Genotype, so it doesn't matter. If that's not true in general, then constructing the grp variable is a little more complex, but the principle is the same.
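For the more general case, where the same stand_ID can occur under more than one Genotype, one possible approach is to build a combined Genotype/stand_ID key and assign groups on that key instead. This is only a minimal sketch: the toy data frame, the `key` variable, and the choice of training stands in `training_keys` are all made up for illustration and are not taken from the original data.

```r
# Toy data (hypothetical): stand_ID 7 deliberately appears under two Genotypes.
mydata <- data.frame(
  Genotype = c("H13", "H13", "H15", "H15", "U21", "U21"),
  stand_ID = c(7, 18, 7, 20, 3, 67),
  stemC    = c(1940.1, 2726.4, 3570.1, 3016.9, 3660.4, 2812.8)
)

# Combined key, so that H13/7 and H15/7 are treated as different stands.
key <- paste(mydata$Genotype, mydata$stand_ID, sep = "_")

# Assumed (illustrative) choice of training stands, written as Genotype_stand_ID.
training_keys <- c("H13_7", "H15_20", "U21_67")

# Same principle as before: build grp, then split on it.
grp <- ifelse(key %in% training_keys, "A-training", "B-testing")
parts <- split(mydata, grp)
```

Here `parts` is a list with one data frame per group, and all rows of a given Genotype/stand_ID combination end up in the same group.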
--
Don MacQueen
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062
Lab cell 925-724-7509

On 8/27/18, 4:10 PM, "R-help on behalf of MacQueen, Don via R-help" <r-help-boun...@r-project.org on behalf of r-help@r-project.org> wrote:

You could start with split()

    grp <- rep('', nrow(mydata))
    grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
    grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'

    split(mydata, grp)

or perhaps

    grp <- ifelse(mydata$stand_ID %in% c(7,9,67), 'A-training', 'B-testing')

    split(mydata, grp)

-Don

On 8/27/18, 3:54 PM, "R-help on behalf of Ahmed Attia" <r-help-boun...@r-project.org on behalf of ahmedati...@gmail.com> wrote:

I would like to partition the following dataset (dataGenotype) based on two variables, Genotype and stand_ID. For example, for Genotype H13, stand_ID number 7 may go to training and stand_ID numbers 18 and 21 may go to testing.

    Genotype  stand_ID  Inventory_date  stemC       mheight
    H13       7         5/18/2006       1940.1075   11.33995
    H13       7         11/1/2008       10898.9597  23.20395
    H13       7         4/14/2009       12830.1284  23.77395
    H13       18        11/3/2005       2726.42     13.4432
    H13       18        6/30/2008       12226.1554  24.091967
    H13       18        4/14/2009       14141.68    25.0922
    H13       21        5/18/2006       4981.7158   15.7173
    H13       21        4/14/2009       20327.0667  27.9155
    H15       9         3/31/2006       3570.06     14.7898
    H15       9         11/1/2008       15138.8383  26.2088
    H15       9         4/14/2009       17035.4688  26.8778
    H15       20        1/18/2005       3016.881    14.1886
    H15       20        10/4/2006       8330.4688   20.19425
    H15       20        6/30/2008       13576.5     25.4774
    H15       32        2/1/2006        3426.2525   14.31815
    U21       3         1/9/2006        3660.416    15.09925
    U21       3         6/30/2008       13236.29    24.27634
    U21       3         4/14/2009       16124.192   25.79562
    U21       67        11/4/2005       2812.8425   13.60485
    U21       67        4/14/2009       13468.455   24.6203

And the desired output is the following:

A-training

    Genotype  stand_ID  Inventory_date  stemC       mheight
    H13       7         5/18/2006       1940.1075   11.33995
    H13       7         11/1/2008       10898.9597  23.20395
    H13       7         4/14/2009       12830.1284  23.77395
    H15       9         3/31/2006       3570.06     14.7898
    H15       9         11/1/2008       15138.8383  26.2088
    H15       9         4/14/2009       17035.4688  26.8778
    U21       67        11/4/2005       2812.8425   13.60485
    U21       67        4/14/2009       13468.455   24.6203

B-testing

    Genotype  stand_ID  Inventory_date  stemC       mheight
    H13       18        11/3/2005       2726.42     13.4432
    H13       18        6/30/2008       12226.1554  24.091967
    H13       18        4/14/2009       14141.68    25.0922
    H13       21        5/18/2006       4981.7158   15.7173
    H13       21        4/14/2009       20327.0667  27.9155
    H15       20        1/18/2005       3016.881    14.1886
    H15       20        10/4/2006       8330.4688   20.19425
    H15       20        6/30/2008       13576.5     25.4774
    H15       32        2/1/2006        3426.2525   14.31815
    U21       3         1/9/2006        3660.416    15.09925
    U21       3         6/30/2008       13236.29    24.27634
    U21       3         4/14/2009       16124.192   25.79562

I tried the following code:

    library(caret)
    dataPartitioning <- createDataPartition(dataGenotype$stand_ID,1,list=F,p=0.2)
    train = dataGenotype[dataPartitioning,]
    test = dataGenotype[-dataPartitioning,]

Also tried

    createDataPartition(unique(dataGenotype$stand_ID),1,list=F,p=0.2)

It did not produce the desired output; the data are partitioned within the stand_ID.
For example, one row of stand_ID 7 goes to training and two rows of stand_ID 7 go to testing. How can I partition the data by Genotype and stand_ID together?

Ahmed Attia

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.