Hi Val,

There are probably more elegant ways to do it, but the following is fairly transparent:

# input data arranged as an array:
indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),
  c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
indat

# output data has same number of rows and half as many columns
outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
for (i in 1:dim(outdat)[2]){
  # each column of output = sum(two columns of input);
  # (i-1)*2+1:2 picks columns 2i-1 and 2i because ':' binds tighter than '+'
  outdat[,i]<-apply(indat[,(i-1)*2+1:2],FUN=sum,MARGIN=1)
}
outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
# allele pairs that sum to 3 are genotype 3, so no need to change anything with them
outdat

# faster but a little more difficult to see what is going on:
# multiply by a 0/1 matrix whose column j has ones in rows 2j-1 and 2j,
# so each output column is the sum of one pair of input columns
outdat<-indat %*% array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),
  dim=c(dim(indat)[2],dim(indat)[2]/2))
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat

-Dan

On Thu, Feb 11, 2016 at 2:52 PM, Val <valkr...@gmail.com> wrote:

> Hi all,
>
> I have SNP data set: the first column is the ID and the
> subsequent pair of columns are the alleles for each
> SNP1, SNP2 and so on. Each SNP has two columns. Based on the alleles
> I want to make phenotype
>
> if the alleles are 1 1 then genotype is 0
>                    2 2 then genotype is 1
> and if it is 1 2 or 2 1 then genotype is 3
>
> This is a sample data set but the actual has 13,000 SNP (26,000 columns)
>
>
> Geno data
> AB95 1 1 2 2 2 2 2 2 1 1
> AB82 2 2 2 2 2 2 2 2 2 2
> AB95 2 1 2 2 2 2 2 2 1 1
> AB59 1 1 2 2 1 2 1 2 1 2
> AB32 2 1 2 2 2 2 2 2 1 2
> AB46 2 1 2 2 1 2 1 1 2 2
> AB61 1 1 2 2 1 2 1 2 1 2
> AB32 2 2 1 2 2 2 2 2 1 2
> AB35 2 2 1 2 2 2 2 2 2 2
> AB43 2 2 1 2 2 2 2 2 2 2
>
> Desired output
> AB95 0 1 1 1 0
> AB82 1 1 1 1 1
> AB95 3 1 1 1 0
> AB59 0 1 3 3 3
> AB32 3 1 1 1 3
> AB46 3 1 3 0 1
> AB61 0 1 3 3 3
> AB32 1 3 1 1 3
> AB35 1 3 1 1 1
> AB43 1 3 1 1 1
>
> I would appreciate if you help me out here.
> Thank you in advance
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Dan Dalthorp, PhD
USGS Forest and Rangeland Ecosystem Science Center
Forest Sciences Lab, Rm 189
3200 SW Jefferson Way
Corvallis, OR 97331
ph: 541-750-0953
ddalth...@usgs.gov