Just partition the unique stand_IDs and select on them using %in%, say:

id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst ## logical vector
test <- dataGenotype[wh,]
train <- dataGenotype[!wh,]
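Here is one such variation, offered as a sketch under stated assumptions. The question asks for the split to happen within each Genotype (for H13, stand 7 to training and stands 18 and 21 to testing), so the same %in% idea is applied once per Genotype. The helper name split_by_stand and its p_train argument are hypothetical. Only the Genotype and stand_ID columns from the question are reproduced. A combined Genotype/stand_ID key is used in case the same stand number occurs in more than one genotype.

```r
# Genotype and stand_ID columns copied from the example data in the question
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = c(7, 7, 7, 18, 18, 18, 21, 21,
               9, 9, 9, 20, 20, 20, 32,
               3, 3, 3, 67, 67)
)

# Hypothetical helper: within each Genotype, send a share of its stands
# (never individual rows) to training; the remaining stands form the test set.
split_by_stand <- function(df, p_train = 1/3) {
  # Combined key keeps equal stand numbers in different genotypes distinct
  key <- paste(df$Genotype, df$stand_ID, sep = "_")
  # One entry per stand, grouped by Genotype (drop = TRUE skips empty factor levels)
  stands <- unique(data.frame(Genotype = df$Genotype, key = key))
  per_geno <- split(stands$key, stands$Genotype, drop = TRUE)
  train_keys <- unlist(lapply(per_geno, function(k) {
    n_train <- max(1, round(length(k) * p_train))  # at least one stand per genotype
    k[sample.int(length(k), n_train)]  # index-based, so a single stand is handled safely
  }), use.names = FALSE)
  wh <- key %in% train_keys  ## logical vector, as in the %in% approach above
  list(train = df[wh, ], test = df[!wh, ])
}

set.seed(1)  # arbitrary seed, only for reproducibility
parts <- split_by_stand(dataGenotype)
train <- parts$train
test  <- parts$test
```

With p_train = 1/3, each genotype in this data sends exactly one stand to training. That matches the shape of the desired output in the question, although which stand is chosen depends on the random draw. A genotype with only one stand would go entirely to training under this rule. Indexing with sample.int() avoids the familiar sample() pitfall where a length-one numeric vector is treated as 1:n.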
There are a million variations on this theme, I'm sure.

-- Bert

Bert Gunter

On Mon, Aug 27, 2018 at 3:54 PM Ahmed Attia <ahmedati...@gmail.com> wrote:

> I would like to partition the following dataset (dataGenotype) based
> on two variables, Genotype and stand_ID. For example, for Genotype
> H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to
> testing.
>
> Genotype  stand_ID  Inventory_date  stemC       mheight
> H13       7         5/18/2006       1940.1075   11.33995
> H13       7         11/1/2008       10898.9597  23.20395
> H13       7         4/14/2009       12830.1284  23.77395
> H13       18        11/3/2005       2726.42     13.4432
> H13       18        6/30/2008       12226.1554  24.091967
> H13       18        4/14/2009       14141.68    25.0922
> H13       21        5/18/2006       4981.7158   15.7173
> H13       21        4/14/2009       20327.0667  27.9155
> H15       9         3/31/2006       3570.06     14.7898
> H15       9         11/1/2008       15138.8383  26.2088
> H15       9         4/14/2009       17035.4688  26.8778
> H15       20        1/18/2005       3016.881    14.1886
> H15       20        10/4/2006       8330.4688   20.19425
> H15       20        6/30/2008       13576.5     25.4774
> H15       32        2/1/2006        3426.2525   14.31815
> U21       3         1/9/2006        3660.416    15.09925
> U21       3         6/30/2008       13236.29    24.27634
> U21       3         4/14/2009       16124.192   25.79562
> U21       67        11/4/2005       2812.8425   13.60485
> U21       67        4/14/2009       13468.455   24.6203
>
> The desired output is the following:
>
> A) Training
>
> Genotype  stand_ID  Inventory_date  stemC       mheight
> H13       7         5/18/2006       1940.1075   11.33995
> H13       7         11/1/2008       10898.9597  23.20395
> H13       7         4/14/2009       12830.1284  23.77395
> H15       9         3/31/2006       3570.06     14.7898
> H15       9         11/1/2008       15138.8383  26.2088
> H15       9         4/14/2009       17035.4688  26.8778
> U21       67        11/4/2005       2812.8425   13.60485
> U21       67        4/14/2009       13468.455   24.6203
>
> B) Testing
>
> Genotype  stand_ID  Inventory_date  stemC       mheight
> H13       18        11/3/2005       2726.42     13.4432
> H13       18        6/30/2008       12226.1554  24.091967
> H13       18        4/14/2009       14141.68    25.0922
> H13       21        5/18/2006       4981.7158   15.7173
> H13       21        4/14/2009       20327.0667  27.9155
> H15       20        1/18/2005       3016.881    14.1886
> H15       20        10/4/2006       8330.4688   20.19425
> H15       20        6/30/2008       13576.5     25.4774
> H15       32        2/1/2006        3426.2525   14.31815
> U21       3         1/9/2006        3660.416    15.09925
> U21       3         6/30/2008       13236.29    24.27634
> U21       3         4/14/2009       16124.192   25.79562
>
> I tried the following code:
>
> library(caret)
> dataPartitioning <-
>   createDataPartition(dataGenotype$stand_ID, 1, list = F, p = 0.2)
> train = dataGenotype[dataPartitioning,]
> test = dataGenotype[-dataPartitioning,]
>
> I also tried
>
>   createDataPartition(unique(dataGenotype$stand_ID), 1, list = F, p = 0.2)
>
> Neither produced the desired output: the data are partitioned within
> each stand_ID. For example, one row of stand_ID 7 goes to training and
> two rows of stand_ID 7 go to testing. How can I partition the data by
> Genotype and stand_ID together?
>
> Ahmed Attia
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
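The problem described in the question, rows of one stand ending up on both sides of the split, can be checked for directly. The sketch below is only illustrative. shared_stands is a hypothetical helper, and the eight-row data frame is a subset of the question's data. The hand-picked row indices stand in for a row-level partition and are not actual createDataPartition output.

```r
# Hypothetical helper: Genotype/stand_ID pairs that appear in both sets.
# An empty result means no stand is split between training and testing.
shared_stands <- function(train, test) {
  k_train <- unique(paste(train$Genotype, train$stand_ID, sep = "_"))
  k_test  <- unique(paste(test$Genotype,  test$stand_ID,  sep = "_"))
  intersect(k_train, k_test)
}

# Subset of the question's data (grouping columns only)
df <- data.frame(
  Genotype = rep(c("H13", "U21"), times = c(5, 3)),
  stand_ID = c(7, 7, 7, 18, 18, 3, 3, 67)
)

# Row-level split: individual rows are picked, so stands get cut in two
rows <- c(1, 4, 6)
leaky <- shared_stands(df[rows, ], df[-rows, ])
leaky   # -> "H13_7" "H13_18" "U21_3"

# Stand-level split with %in%: whole stands move together
wh <- df$stand_ID %in% c(7, 67)
clean <- shared_stands(df[wh, ], df[!wh, ])
clean   # -> character(0)
```

Running a check like this after any partitioning step makes it easy to see whether the split respected the stand grouping.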