Thank you very much, Dan! I want to go with the second one, because the data are very large (>25,000 columns and >3,000 rows). The data set is loaded as "testdat".
Can you help me fit the following code to my data, please?

# faster but a little more difficult to see what is going on:
outdat<-indat %*% array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
outdat[outdat==2]<-0
outdat[outdat==4]<-1
outdat

Thank you!

On Thu, Feb 11, 2016 at 5:58 PM, Dalthorp, Daniel <ddalth...@usgs.gov> wrote:
> Hi Val,
> There are probably more elegant ways to do it, but the following is fairly
> transparent:
>
> # input data arranged as an array:
>
> indat<-cbind(c(1,2,2,1),c(1,2,1,1),c(2,2,2,2),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(2,2,2,1),c(2,2,2,2),c(1,2,1,1),c(1,2,1,2))
> indat
>
> # output data has the same number of rows and half as many columns
> outdat<-array(dim=c(dim(indat)[1],dim(indat)[2]/2))
> for (i in 1:dim(outdat)[2]){
>   # each column of output = sum(two columns of input)
>   outdat[,i]<-apply(indat[,(i-1)*2+1:2],MARGIN=1,FUN=sum)
> }
> outdat[outdat==2]<-0 # allele pairs that sum to 2 are genotype 0
> outdat[outdat==4]<-1 # allele pairs that sum to 4 are genotype 1
> # allele pairs that sum to 3 are genotype 3, so there is no need to change them
> outdat
>
> # faster but a little more difficult to see what is going on:
> outdat<-indat %*%
> array(c(rep(c(rep(1,2),rep(0,dim(indat)[2])),dim(indat)[2]/2),1,1),dim=c(dim(indat)[2],dim(indat)[2]/2))
> outdat[outdat==2]<-0
> outdat[outdat==4]<-1
> outdat
>
> -Dan
>
>
> On Thu, Feb 11, 2016 at 2:52 PM, Val <valkr...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a SNP data set: the first column is the ID and the
>> subsequent pairs of columns are the alleles for
>> SNP1, SNP2 and so on. Each SNP has two columns.
>> Based on the alleles, I want to make a genotype:
>>
>> if the alleles are 1 1, the genotype is 0;
>> if they are 2 2, the genotype is 1;
>> and if they are 1 2 or 2 1, the genotype is 3.
>>
>> This is a sample data set, but the actual one has 13,000 SNPs (26,000 columns).
>>
>>
>> Geno data
>> AB95 1 1 2 2 2 2 2 2 1 1
>> AB82 2 2 2 2 2 2 2 2 2 2
>> AB95 2 1 2 2 2 2 2 2 1 1
>> AB59 1 1 2 2 1 2 1 2 1 2
>> AB32 2 1 2 2 2 2 2 2 1 2
>> AB46 2 1 2 2 1 2 1 1 2 2
>> AB61 1 1 2 2 1 2 1 2 1 2
>> AB32 2 2 1 2 2 2 2 2 1 2
>> AB35 2 2 1 2 2 2 2 2 2 2
>> AB43 2 2 1 2 2 2 2 2 2 2
>>
>> Desired output
>> AB95 0 1 1 1 0
>> AB82 1 1 1 1 1
>> AB95 3 1 1 1 0
>> AB59 0 1 3 3 3
>> AB32 3 1 1 1 3
>> AB46 3 1 3 0 1
>> AB61 0 1 3 3 3
>> AB32 1 3 1 1 3
>> AB35 1 3 1 1 1
>> AB43 1 3 1 1 1
>>
>> I would appreciate it if you could help me out here.
>> Thank you in advance.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Dan Dalthorp, PhD
> USGS Forest and Rangeland Ecosystem Science Center
> Forest Sciences Lab, Rm 189
> 3200 SW Jefferson Way
> Corvallis, OR 97331
> ph: 541-750-0953
> ddalth...@usgs.gov
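A minimal sketch of how the code above might be fitted to "testdat", assuming (as described in the original post) that its first column holds the IDs and each following pair of columns holds the two alleles of one SNP. The small `testdat` built here from the sample rows is only a stand-in for the real data, and the names `recode_sums`, `genotype_matmul` and `genotype_pairsum` are hypothetical. `genotype_matmul` wraps Dan's matrix-multiplication version; `genotype_pairsum` is not from the thread and simply adds the odd-numbered allele columns to the even-numbered ones. That avoids the weight matrix, which for 26,000 allele columns would have 26,000 x 13,000 = 338 million entries (roughly 2.7 GB as doubles).

```r
# Hypothetical stand-in for "testdat": column 1 holds the IDs, then two
# allele columns per SNP (rows copied from the sample data in the post).
testdat <- data.frame(
  ID = c("AB95", "AB82", "AB95", "AB59", "AB32",
         "AB46", "AB61", "AB32", "AB35", "AB43"),
  matrix(c(1,1, 2,2, 2,2, 2,2, 1,1,
           2,2, 2,2, 2,2, 2,2, 2,2,
           2,1, 2,2, 2,2, 2,2, 1,1,
           1,1, 2,2, 1,2, 1,2, 1,2,
           2,1, 2,2, 2,2, 2,2, 1,2,
           2,1, 2,2, 1,2, 1,1, 2,2,
           1,1, 2,2, 1,2, 1,2, 1,2,
           2,2, 1,2, 2,2, 2,2, 1,2,
           2,2, 1,2, 2,2, 2,2, 2,2,
           2,2, 1,2, 2,2, 2,2, 2,2),
         nrow = 10, byrow = TRUE),
  stringsAsFactors = FALSE
)

# Recode summed allele pairs: 2 (1 1) -> 0, 4 (2 2) -> 1; 3 (1 2 / 2 1) stays 3.
recode_sums <- function(s) {
  s[s == 2] <- 0
  s[s == 4] <- 1
  s
}

# Dan's matrix-multiplication version, applied after dropping the ID column.
# The weight matrix w is n x n/2, with 1s in rows 2k-1 and 2k of column k.
genotype_matmul <- function(dat) {
  indat <- as.matrix(dat[, -1])          # allele columns only
  n <- ncol(indat)
  w <- array(c(rep(c(rep(1, 2), rep(0, n)), n / 2), 1, 1), dim = c(n, n / 2))
  outdat <- recode_sums(indat %*% w)
  colnames(outdat) <- paste0("SNP", seq_len(ncol(outdat)))
  data.frame(ID = dat[[1]], outdat, stringsAsFactors = FALSE)
}

# Lighter alternative (an assumption, not from the thread): sum each
# odd-numbered allele column with the even-numbered column next to it.
genotype_pairsum <- function(dat) {
  indat <- as.matrix(dat[, -1])          # allele columns only
  stopifnot(ncol(indat) %% 2 == 0)       # two allele columns per SNP
  odd <- seq(1, ncol(indat), by = 2)
  outdat <- recode_sums(indat[, odd, drop = FALSE] + indat[, odd + 1, drop = FALSE])
  colnames(outdat) <- paste0("SNP", seq_len(ncol(outdat)))
  data.frame(ID = dat[[1]], outdat, stringsAsFactors = FALSE)
}

res <- genotype_pairsum(testdat)
print(res)
# Both versions should agree on the sample data.
stopifnot(isTRUE(all.equal(res, genotype_matmul(testdat))))
```

With the real data you would skip the stand-in and call `genotype_pairsum(testdat)` (or `genotype_matmul(testdat)` if memory allows); either returns the IDs in the first column followed by one genotype column per SNP.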