Thanks Bert, worked nicely. Yes, genotypes with only one ID will be
eliminated before partitioning the data.
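One way that filtering step might look, as a rough sketch only. The toy data and the names n_ids and keep are made up for illustration and are not from the thread:

```r
# Toy data (assumed values); column names follow the thread.
dataGenotype <- data.frame(
  Genotype = c("H13", "H13", "H13", "H15", "H15", "H20"),
  stand_ID = c(7, 18, 21, 3, 3, 9)
)

# Count the distinct stand_IDs within each genotype.
n_ids <- tapply(dataGenotype$stand_ID, dataGenotype$Genotype,
                function(s) length(unique(s)))

# Keep only genotypes that have more than one stand_ID,
# since a single stand cannot be split between training and testing.
keep <- names(n_ids)[n_ids > 1]
dataGenotype <- dataGenotype[dataGenotype$Genotype %in% keep, ]
```

Here H15 and H20 each have a single stand_ID, so only the H13 rows remain.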
Best regards
Ahmed Attia
On Mon, Aug 27, 2018 at 8:09 PM, Bert Gunter wrote:
> Just partition the unique stand_ID's and select on them using %in% , say:
>
> id <- unique(dataGenoty
Sorry, my bad -- careless reading: you need to do the partitioning within
genotype.
Something like:
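by(dataGenotype, dataGenotype$Genotype, function(x){
## unique stand IDs within this genotype
u <- unique(x$stand_ID)
## rows whose stand_ID falls in a random half of those IDs
tst <- x$stand_ID %in% sample(u, floor(length(u)/2))
list(test = x[tst,], train = x[!tst,])
})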
This will give a
And yes, I ignored Genotype, but for the example data none of the stand_ID
values are present in more than one Genotype, so it doesn't matter. If that's
not true in general, then constructing the grp variable is a little more
complex, but the principle is the same.
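For the more general case, a sketch of building grp within each genotype might look like the following. The toy data, the helper pick_test, and the choice to send roughly half of each genotype's stands to testing are all assumptions for illustration:

```r
# Toy data (assumed values); column names follow the thread.
set.seed(1)
mydata <- data.frame(
  Genotype = rep(c("H13", "H15"), each = 6),
  stand_ID = c(7, 7, 18, 18, 21, 21, 3, 3, 9, 9, 20, 20),
  stemC    = round(runif(12, 1, 10), 1)
)

# Hypothetical helper: flag rows whose stand_ID is in a random half
# of the unique stand_IDs it is given. Indexing with sample.int()
# avoids the sample(n) surprise when only one ID is present.
pick_test <- function(ids) {
  u <- unique(ids)
  ids %in% u[sample.int(length(u), floor(length(u) / 2))]
}

# ave() applies pick_test separately within each genotype; it returns
# 0/1 because stand_ID is numeric, so convert back to logical.
is_test <- as.logical(ave(mydata$stand_ID, mydata$Genotype, FUN = pick_test))

grp <- ifelse(is_test, "B-testing", "A-training")
parts <- split(mydata, grp)
```

Each genotype here has three stands, so one stand per genotype ends up in parts[["B-testing"]] and the other two in parts[["A-training"]].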
--
Don MacQueen
Lawrence Live
You could start with split()
grp <- rep('', nrow(mydata) )
grp[mydata$stand_ID %in% c(7,9,67)] <- 'A-training'
grp[mydata$stand_ID %in% c(3,18,20,21,32)] <- 'B-testing'
split(mydata, grp)
or perhaps
grp <- ifelse( mydata$stand_ID %in% c(7,9,67) , 'A-training', 'B-testing' )
split(mydata, grp)
Just partition the unique stand_ID's and select on them using %in% , say:
id <- unique(dataGenotype$stand_ID)
tst <- sample(id, floor(length(id)/2))
wh <- dataGenotype$stand_ID %in% tst ## logical vector
test <- dataGenotype[wh,]
train <- dataGenotype[!wh,]
There are a million variations on this t
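As a self-contained illustration of the stand-level split above, here is a sketch on toy data. The values and the fixed seed are assumptions; the final check simply confirms that no stand_ID appears in both sets:

```r
# Toy data (assumed values); column names follow the thread.
set.seed(42)  # assumed seed, only so the split is reproducible
dataGenotype <- data.frame(
  Genotype = rep("H13", 8),
  stand_ID = c(7, 7, 18, 18, 21, 21, 3, 3),
  mheight  = c(5.1, 5.3, 6.0, 6.2, 4.8, 4.9, 5.5, 5.6)
)

id  <- unique(dataGenotype$stand_ID)
# Pick half of the unique IDs by position rather than by value.
tst <- id[sample.int(length(id), floor(length(id) / 2))]
wh  <- dataGenotype$stand_ID %in% tst   # logical vector

test  <- dataGenotype[wh, ]
train <- dataGenotype[!wh, ]

# Every stand lands wholly in one set or the other.
stopifnot(length(intersect(test$stand_ID, train$stand_ID)) == 0)
```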
I would like to partition the following dataset (dataGenotype) based
on two variables, Genotype and stand_ID. For example, for Genotype
H13, stand_ID 7 may go to training and stand_IDs 18 and 21 may go to
testing.
Genotype stand_ID Inventory_date stemC mheight
H13