[R] Transform a list of matrices into the data.frame I want
Dear all:

I have a list like this, which is standard str_locate_all() output (stringr package):

$K
     start end

$GSEGTCSCSSK
     start end
[1,]     6   6
[2,]     8   8

$GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK
     start end
[1,]     6   6

$LVECIGQELIFLLPNK
     start end
[1,]     4   4

$NFK
     start end

$HR
     start end

$AYASLFR
     start end

I want to transform this list into a data.frame like this:

ID                                      start.1 start.2
K                                            NA      NA
GSEGTCSCSSK                                   6       8
GFSTTCPAHVDDLTPEQVLDGDVNELMDVVLHHVPEAK        6      NA
LVECIGQELIFLLPNK                              4      NA
NFK                                          NA      NA
HR                                           NA      NA
AYASLFR                                      NA      NA

I have already tried t() and lapply(), but I find it hard to handle the NA values and the different number of rows in each matrix.

Thanks in advance

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
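One possible way to do this in base R is to pad every "start" column with NA up to the longest match count and then bind the rows. The sketch below is only an illustration: the list `loc` is a hand-built stand-in for the str_locate_all() output (an assumption, not the poster's real object), and the names `n_max`, `rows`, `starts` and `res` are made up for the example.

```r
# Hand-built stand-in for part of the str_locate_all() output (assumed shape:
# one integer matrix per sequence, with columns "start" and "end")
cn <- list(NULL, c("start", "end"))
loc <- list(
  K                = matrix(integer(0), ncol = 2, dimnames = cn),
  GSEGTCSCSSK      = matrix(c(6L, 8L, 6L, 8L), ncol = 2, dimnames = cn),
  LVECIGQELIFLLPNK = matrix(c(4L, 4L), ncol = 2, dimnames = cn),
  NFK              = matrix(integer(0), ncol = 2, dimnames = cn)
)

# Largest number of matches found in any sequence (at least one column is kept)
n_max <- max(1L, vapply(loc, nrow, integer(1)))

# Indexing past the end of a vector yields NA, which pads the short rows
rows <- lapply(loc, function(m) as.integer(m[, "start"])[seq_len(n_max)])
starts <- do.call(rbind, rows)
colnames(starts) <- paste0("start.", seq_len(n_max))

res <- data.frame(ID = names(loc), starts,
                  row.names = NULL, stringsAsFactors = FALSE)
print(res)
```

Sequences without any match come out as rows of NA, so no special case is needed for the empty matrices.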
[R] Is there an efficient way to find the overlapping, upstream and downstream ranges for a bunch of ranges?
I have a bunch of genes (~50000) from the whole genome, read in as genomic ranges.

A range (gene) can be seen as an observation with three columns (chromosome, start and end), like this:

      seqnames start end width strand
gene1     chr1     1   5     5      +
gene2     chr1    10  15     6      +
gene3     chr1    12  17     6      +
gene4     chr1    20  25     6      +
gene5     chr1    30  40    11      +

I am wondering whether there is an efficient way to find the *overlapped, upstream and downstream genes for each gene in the granges*.

For example, assuming all_genes_gr is a genomic range of ~50000 genes, the result I want looks like this:

gene_name upstream_gene downstream_gene overlapped_gene
gene1     NA            gene2           NA
gene2     gene1         gene4           gene3
gene3     gene1         gene4           gene2
gene4     gene3         gene5           NA

Currently, the strategy I use is this:

library(GenomicRanges)
find_overlapped_gene <- function(idx, all_genes_gr) {
  #cat(idx, "\n")
  curr_gene <- all_genes_gr[idx]
  other_genes <- all_genes_gr[-idx]
  n <- countOverlaps(curr_gene, other_genes)
  gene <- subsetByOverlaps(curr_gene, other_genes)
  return(list(n, gene))
}

system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))

However, for 100 genes it takes nearly ~8s according to system.time(). That means that with ~50000 genes it would take nearly one hour just to find the overlapped genes.

I am wondering whether there is any algorithm or strategy to do this efficiently, perhaps ~50000 genes in ~10 min or even less.

Yao He
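The per-gene loop is probably what makes this slow, since every iteration subsets the whole object again. A vectorised alternative can be sketched in base R with sorted coordinates and findInterval(). This is only a sketch under stated assumptions: all genes are on one chromosome, strand is ignored, coordinates are integers, and the data.frame `genes` is a toy copy of the five genes in the post. GenomicRanges also provides vectorised findOverlaps(), precede() and follow(), which are likely the more idiomatic route on a real GRanges object.

```r
# Toy copy of the example genes: one chromosome, strand ignored (assumption)
genes <- data.frame(
  name  = paste0("gene", 1:5),
  start = c(1, 10, 12, 20, 30),
  end   = c(5, 15, 17, 25, 40),
  stringsAsFactors = FALSE
)

by_start <- order(genes$start)
by_end   <- order(genes$end)
sorted_starts <- genes$start[by_start]
sorted_ends   <- genes$end[by_end]

# Number of genes that start at or before each gene's end
n_started <- findInterval(genes$end, sorted_starts)
# Number of genes that end strictly before each gene's start
n_ended <- findInterval(genes$start - 1, sorted_ends)

# Downstream: the first gene starting after this gene's end (NA if none)
down_idx <- n_started + 1
down_idx[down_idx > nrow(genes)] <- NA
# Upstream: the gene with the largest end before this gene's start (NA if none)
up_idx <- n_ended
up_idx[up_idx == 0] <- NA

result <- data.frame(
  gene_name       = genes$name,
  upstream_gene   = genes$name[by_end][up_idx],
  downstream_gene = genes$name[by_start][down_idx],
  n_overlapping   = n_started - n_ended - 1,  # minus 1 removes the gene itself
  stringsAsFactors = FALSE
)
print(result)
```

Everything here is a sort plus a few binary searches, so the cost grows roughly as n log n instead of one full scan per gene. The sketch reports only how many genes overlap each gene; listing their names would need an extra step, and real data would need this applied per chromosome.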
[R] how to read different files into different objects in one time?
Dear All

I have a lot of files in a directory, as follows:
"02-03.txt" "03-04.txt" "04-05.txt" "05-06.txt" "06-07.txt" "07-08.txt" "08-09.txt" "09-10.txt" "G0.txt" "G1.txt" "raw_ped.txt" ..

I want to read them into different objects according to their filenames, such as:

02-03<-read.table("02-03.txt",header=T)
03-04<-read.table("03-04.txt",header=T)

I don't want to type hundreds of read.table() calls, so how can I read them all at once?

I think the core problem is that I can't create different object names inside a loop or sapply(), but there may be a better way to do what I want.

Thanks a lot

Yao He
--
Master candidate in 2rd year
Department of Animal genetics & breeding
Room 436,College of Animial Science&Technology,
China Agriculture University,Beijing,100193
E-mail: yao.h.1...@gmail.com
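A common approach is to keep the tables in one named list instead of creating hundreds of separate objects. Names such as 02-03 are not syntactically valid R object names anyway, but they work fine as list names. The sketch below writes two small made-up files into a temporary directory so that it runs on its own; the directory name, file contents and the object `tables` are assumptions for illustration.

```r
# Create two small example files in a temporary directory (made-up data)
demo_dir <- file.path(tempdir(), "read_many_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(id = 1:2, w = c(47.5, 48.9)),
            file.path(demo_dir, "02-03.txt"), row.names = FALSE)
write.table(data.frame(id = 3:4, w = c(50.1, 46.2)),
            file.path(demo_dir, "G0.txt"), row.names = FALSE)

# Read every .txt file and name each list element after its file
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
tables <- lapply(files, read.table, header = TRUE)
names(tables) <- tools::file_path_sans_ext(basename(files))

names(tables)
tables[["02-03"]]
```

Each table is then available as tables[["02-03"]], tables[["G0"]] and so on, and the whole set can be processed with lapply().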
[R] how to aggregate T-test result in an elegant way?
Dear all:

Plan 1:
I want to run several t-tests of means for different variables in a loop, so I want to add all the results to an object and then dump() them to a text file. But I don't know how to append a t-test result to the object.

I have already plotted the barplot, and I want to know an elegant way to report the raw results.
Can anybody give me some advice?

Yao He
Re: [R] how to aggregate T-test result in an elegant way?
Thank you, it is really helpful every time.

I didn't provide any example data because I thought it was just a question of how to report the t.test() result in R. However, as you say, it is better to show more details in order to find an elegant way.

In fact I generated a 3-dimensional array like this:

str(a)
 num [1:2, 1:245, 1:3] 47.5 NA 48.9 NA 47.5 ...
 - attr(*, "dimnames")=List of 3
  ..$ : chr [1:2] "13%" "21%"
  ..$ : chr [1:245] "TWF2H101" "TWF2H105" "TWF2H106" "TWF2H110" ...
  ..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"

I want to do a two-sample t-test of means between 13% and 21% for each variable "EW.INCU" "EW.17.5" "EMW".

So I tried this code:

variable<-dimnames(a)[[3]]
O2<-dimnames(a)[[1]]
for (i in variable) {
  print(i)
  print(O2[1])
  print(O2[2])
  print(t.test(a[O2[1],,i],a[O2[2],,i],na.rm=T))
}

I don't think this is an elegant way, and I am inexperienced at reporting raw results.
Could you give me more help?

Yao He

2013/1/7 arun :
> Hi,
> You didn't provide any example data. So, I am not sure whether this helps.
>
> set.seed(15)
> dat1<-data.frame(A=sample(10:20,5,replace=TRUE),B=sample(18:28,5,replace=TRUE),C=sample(25:35,5,replace=TRUE),D=sample(20:30,5,replace=TRUE))
> res<-lapply(lapply(seq_len(ncol(dat2)),function(i)
> t.test(dat2[,i],dat1[,1],paired=TRUE)),function(x)
> data.frame(meanDiff=x$estimate,p.value=x$p.value))# paired
> names(res)<-paste("A",LETTERS[2:4],sep="")
> res<- do.call(rbind,res)
> res
> #   meanDiff     p.value
> #AB      9.4 0.021389577
> #AC     15.0 0.002570261
> #AD     10.6 0.003971604
>
>
> #or
> res1<-lapply(lapply(seq_len(ncol(dat2)),function(i)
> t.test(dat2[,i],dat1[,1],paired=FALSE)),function(x)
> data.frame(mean=x$estimate,p.value=x$p.value))
> names(res1)<-paste("A",LETTERS[2:4],sep="")
> res1<-do.call(rbind,res1)
> row.names(res1)[grep("mean of y",row.names(res1))]<-gsub("(.*\\.).*","\\1A",row.names(res1)[grep("mean of y",row.names(res1))])
> row.names(res1)[grep("mean of x",row.names(res1))]<-gsub("(\\w)(\\w)(\\.).*","\\1\\2\\3\\2",row.names(res1)[grep("mean of x",row.names(res1))])
> res1
> #     mean      p.value
> #AB.B 25.2 1.299192e-03
> #AB.A 15.8 1.299192e-03
> #AC.C 30.8 5.145519e-05
> #AC.A 15.8 5.145519e-05
> #AD.D 26.4 1.381339e-03
> #AD.A 15.8 1.381339e-03
>
>
> A.K.
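For the 3-dimensional array described in this message, one way to collect the tests is to store each t.test() result in a list and then pull the pieces of interest into a data.frame that can be written out with write.csv(). The sketch below uses a small simulated array with the same dimnames layout; the simulated values and the names `tests` and `res` are assumptions for illustration. As far as I know t.test() drops missing values itself, so the na.rm=T argument should not be needed.

```r
# Simulated stand-in for the array in the post (made-up values)
set.seed(1)
a <- array(rnorm(2 * 10 * 3, mean = 45, sd = 3), dim = c(2, 10, 3),
           dimnames = list(c("13%", "21%"),
                           paste0("ID", 1:10),
                           c("EW.INCU", "EW.17.5", "EMW")))
a["21%", 1, "EMW"] <- NA  # one missing value, as in the real data

# Keep the full htest objects in a named list, one per variable
vars <- dimnames(a)[[3]]
tests <- lapply(vars, function(v) t.test(a["13%", , v], a["21%", , v]))
names(tests) <- vars

# Extract one row of summary numbers per test
res <- do.call(rbind, lapply(vars, function(v) {
  tt <- tests[[v]]
  data.frame(variable = v,
             mean13   = unname(tt$estimate[1]),
             mean21   = unname(tt$estimate[2]),
             t        = unname(tt$statistic),
             df       = unname(tt$parameter),
             p.value  = tt$p.value,
             stringsAsFactors = FALSE)
}))
print(res)
```

Keeping the htest objects in `tests` means the full printed report is still available with print(tests[["EMW"]]), while `res` holds the compact table.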
Re: [R] how to aggregate T-test result in an elegant way?
Hi, arun

I'm sorry that my message wasn't helpful.
One question is that I don't know how to subset a small part of the data, as it is a 3-dimensional array, so I just showed its structure.
I tried dput() to a file; what should I do next to subset it?

Another question is:
My raw data is a "melt" data.frame like this:

    IID      O2  variable value
1   TWF2H5   13% EW.INCU  49.38
2   TWF2H6   13% EW.INCU  48.02
3   TWF2H19  13% EW.INCU  51.44
280 TWF2H101 13% EW.17.5  42.26
281 TWF2H105 13% EW.17.5  43.52
282 TWF2H106 13% EW.17.5  42.83
472 TWF2N102 21% EW.17.5  45.97
473 TWF2N104 21% EW.17.5  43.32
474 TWF2N106 21% EW.17.5  48.63
689 TWF2N2   21% EMW      19.57
690 TWF2N6   21% EMW      18.07
691 TWF2N10  21% EMW      15.4
491 TWF2H5   13% EMW      15.61
492 TWF2H6   13% EMW      13.41
493 TWF2H19  13% EMW      14.03
199 TWF2N2   21% EW.INCU  48.69
200 TWF2N6   21% EW.INCU  50.52
201 TWF2N10  21% EW.INCU  42.04

If you met a t-test task like the one I described, would generating a high-dimensional array be a good approach?
Thank you!

Yao He

2013/1/7 arun :
> HI,
> I tried to create an example dataset (as you didn't provide the data).
> set.seed(25)
> a<-array(sample(1:50,60,replace=TRUE),dim=c(2,10,3))
> dimnames(a)[[1]]<-c("13%","21%")
> dimnames(a)[[2]]<-paste("TWF2H",101:110,sep="")
> dimnames(a)[[3]]<-c("EW.INCU","EW.17.5","EMW")
>
>
> str(a)
> # int [1:2, 1:10, 1:3] 21 35 8 45 7 50 32 17 4 15 ...
> #- attr(*, "dimnames")=List of 3
> #..$ : chr [1:2] "13%" "21%"
> #..$ : chr [1:10] "TWF2H101" "TWF2H102" "TWF2H103" "TWF2H104" ...
> #..$ : chr [1:3] "EW.INCU" "EW.17.5" "EMW"
>
> res<-lapply(lapply(seq_len(dim(a)[3]),function(i)
> t.test(a[dimnames(a)[[1]][1],,i],a[dimnames(a)[[1]][2],,i])),function(x)
> data.frame(mean=x$estimate,p.value=x$p.value))
> res1<-do.call(rbind,res)
> row.names(res1)[grep("mean of x",row.names(res1))]<-gsub("(.*\\.).*$","\\113%",row.names(res1)[grep("mean of x",row.names(res1))])
> row.names(res1)[grep("mean of y",row.names(res1))]<-gsub("(.*\\.).*$","\\121%",row.names(res1)[grep("mean of y",row.names(res1))])
> res1
> #            mean   p.value
> #EW.INCU.13% 22.3 0.2754842
> #EW.INCU.21% 29.3 0.2754842
> #EW.17.5.13% 20.5 0.4705772
> #EW.17.5.21% 16.0 0.4705772
> #EMW.13%     23.9 0.9638679
> #EMW.21%     24.2 0.9638679
> A.K.
Re: [R] how to aggregate T-test result in an elegant way?
Hi, arun

Yes, I just want to do the t.test.
I think it may not be necessary to generate a 3D array from the raw data.frame with acast() first.

Thanks a lot

2013/1/7 arun :
> Hi Yao,
>
> It's okay.
>
> How did you generate the 3 D array?
> Using ?acast()
>
> I am not sure I understand your question "
>
> if you meet a t-test task as I described , is that generate a
> high-dimension array a good way ?"
>
> Do you want to do the t-test in the melt dataset?
>
> b<- read.table(text="
> ID O2 variable value
> 1 TWF2H5 13% EW.INCU 49.38
> 2 TWF2H6 13% EW.INCU 48.02
> 3 TWF2H19 13% EW.INCU 51.44
> 280 TWF2H101 13% EW.17.5 42.26
> 281 TWF2H105 13% EW.17.5 43.52
> 282 TWF2H106 13% EW.17.5 42.83
> 472 TWF2N102 21% EW.17.5 45.97
> 473 TWF2N104 21% EW.17.5 43.32
> 474 TWF2N106 21% EW.17.5 48.63
> 689 TWF2N2 21% EMW 19.57
> 690 TWF2N6 21% EMW 18.07
> 691 TWF2N10 21% EMW 15.4
> 491 TWF2H5 13% EMW 15.61
> 492 TWF2H6 13% EMW 13.41
> 493 TWF2H19 13% EMW 14.03
> 199 TWF2N2 21% EW.INCU 48.69
> 200 TWF2N6 21% EW.INCU 50.52
> 201 TWF2N10 21% EW.INCU 42.04
> ",sep="",header=TRUE,stringsAsFactors=FALSE)
> res<-lapply(lapply(split(b,b$variable),function(x)
> t.test(x$value[x$O2=="13%"],x$value[x$O2=="21%"])),function(x)
> data.frame(mean=x$estimate,p.value=x$p.value))
> res1<-do.call(rbind,res)
> row.names(res1)[grep("mean of x",row.names(res1))]<-gsub("(.*\\.).*$","\\113%",row.names(res1)[grep("mean of x",row.names(res1))])
> row.names(res1)[grep("mean of y",row.names(res1))]<-gsub("(.*\\.).*$","\\121%",row.names(res1)[grep("mean of y",row.names(res1))])
> res1
> #                mean    p.value
> #EMW.13%     14.35000 0.09355374
> #EMW.21%     17.68000 0.09355374
> #EW.17.5.13% 42.87000 0.17464018
> #EW.17.5.21% 45.97333 0.17464018
> #EW.INCU.13% 49.61333 0.43689727
> #EW.INCU.21% 47.08333 0.43689727
>
> A.K.
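Working directly on the long ("melt") data.frame, as discussed above, can also be written with the formula interface of t.test(), which avoids building the array at all. The following is a sketch rather than the thread's own solution: it reuses the 18 rows shown in the thread (row numbers dropped), and the names `by_var` and `res` are made up for the example.

```r
# The 18 example rows from the thread, without the original row numbers
b <- read.table(text = "
IID O2 variable value
TWF2H5 13% EW.INCU 49.38
TWF2H6 13% EW.INCU 48.02
TWF2H19 13% EW.INCU 51.44
TWF2H101 13% EW.17.5 42.26
TWF2H105 13% EW.17.5 43.52
TWF2H106 13% EW.17.5 42.83
TWF2N102 21% EW.17.5 45.97
TWF2N104 21% EW.17.5 43.32
TWF2N106 21% EW.17.5 48.63
TWF2N2 21% EMW 19.57
TWF2N6 21% EMW 18.07
TWF2N10 21% EMW 15.4
TWF2H5 13% EMW 15.61
TWF2H6 13% EMW 13.41
TWF2H19 13% EMW 14.03
TWF2N2 21% EW.INCU 48.69
TWF2N6 21% EW.INCU 50.52
TWF2N10 21% EW.INCU 42.04
", header = TRUE, stringsAsFactors = FALSE)

# One Welch t-test per variable, comparing the two O2 groups
by_var <- split(b, b$variable)
res <- do.call(rbind, lapply(names(by_var), function(v) {
  tt <- t.test(value ~ O2, data = by_var[[v]])
  data.frame(variable = v,
             mean13   = unname(tt$estimate[1]),  # groups sort as "13%", "21%"
             mean21   = unname(tt$estimate[2]),
             p.value  = tt$p.value,
             stringsAsFactors = FALSE)
}))
print(res)
```

The group order inside each test follows the sorted values of O2, which is why the first estimate is taken as the 13% mean here.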
Re: [R] how to aggregate T-test result in an elegant way?
Yes, thanks a lot for your help!

Regards

2013/1/8 arun :
> Hi Yao,
>
> You could also have the results in a wide format:
> res<-do.call(rbind,lapply(lapply(split(b,b$variable),function(x)
> t.test(x$value[x$O2=="13%"],x$value[x$O2=="21%"])),function(x)
> data.frame(mean13=x$estimate[1],mean21=x$estimate[2],p.value=x$p.value,CILow=x$conf.int[1],CIHigh=x$conf.int[2])))
> res
> #          mean13   mean21    p.value     CILow    CIHigh
> #EMW     14.35000 17.68000 0.09355374 -7.682686  1.022686
> #EW.17.5 42.87000 45.97333 0.17464018 -9.265622  3.058955
> #EW.INCU 49.61333 47.08333 0.43689727 -7.119234 12.179234
> A.K.
Re: [R] ggplot not showing all the years on the x-axis
Hi, this is a question about how to set the scale. Try adding scale_x_continuous() with explicit breaks, like this:

plot <- tmpplot + geom_line()+scale_x_continuous(breaks=ii)

Yao He

2013/1/8 Francesco Sarracino :
> Dear R helpers,
>
> I am currently having hard time fixing the values on the x-axis of a plot
> with ggplot: even though I have 12 years, ggplot plots only 3 of them.
> Here is my example:
>
> library(ggplot2)
> ii <- 2000:2011
> ss <- rnorm(12,0,1)
> pm <- data.frame(ii,ss)
> tmpplot <- ggplot(pm, aes(x = ii, y = ss))
> plot <- tmpplot + geom_line()
> plot
>
> In my case, ggplot reports on the year 2000, 2004 and 2008 on the x-axis,
> but I'd like to have all the years from 2000 to 2011. I know how to fix
> this with the standard plot in R, but for consistency I'd like to use
> ggplot.
> Can anyone help?
> thanks in advance,
> f.
Re: [R] how to count "A", "C", "T", "G" in each row in a big data.frame?
t;, "GG", > "GA", "GG", "TT", "CC", "GA", "CT", "AA", "AA", "AG"), X2570 = c("AA", > "CT", "TT", "CC", "CT", "CC", "CC", "TT", "CC", "GG", "GG", > "GG", "GG", "TT", "TC", "GG", "CC", "AA", "AA", "GG"), X2476 = c("AA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", > "GG", "GG", "GT", "TC", "AG", "CC", "AA", "AA", "AG"), X2534 = c("GA", > "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GA", > "AG", "GG", "TG", "CC", "AG", "TC", "AA", "AA", "AA"), X2280 = c("AA", > "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "AG", > "AG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AG"), X2316 = c("AA", > "CC", "TT", "CC", "CC", "CC", "CC", "TT", "CC", "AG", "AA", > "AA", "AG", "TT", "TC", "GG", "CT", "AA", "GG", "GG"), X2339 = c("AA", > "CC", "TT", "CC", "CC", "CC", "CC", "TT", "CC", "GA", "AA", > "GG", "GG", "GT", "CT", "GG", "TT", "AA", "AA", "AG"), X2331 = c("AA", > "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", > "GG", "GG", "TT", "CC", "GG", "CC", "AA", "AA", "AG"), X2343 = c("AA", > "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "GG", > "GG", "GG", "TT", "CT", "GG", "CC", "AA", "AA", "GA"), X2352 = c("AA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", > "GG", "GG", "TT", "CC", "GG", "CC", "AA", "GA", "AG"), X2293 = c("GA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA", > "AA", "GG", "TT", "TC", "AA", "CT", "AA", "AA", "AA"), X2338 = c("GA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG", > "AG", "GG", "TT", "TC", "AG", "TC", "AA", "AA", "GA"), X2449 = c("AA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AG", > "AA", "GG", "TT", "CC", "AA", "TC", "AA", "AA", "GA"), X2296 = c("GA", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GA", "GG", > "AG", "GG", "TG", "TC", "AG", "CC", "AA", "AA", "AA"), X2453 = c("AG", > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "AG", "GG", > "GA", "GG", "GT", "CT", "GA", &
Re: [R] how to count "A", "C", "T", "G" in each row in a big data.frame?
Thanks a lot.
The problem is that I don't know how to handle the output list, as I want to calculate the frequency of A, G, T or C by row.

Yao He

2013/1/10 Jessica Streicher :
> Sorry, you wanted rows, i wrote for columns
>
> #rows would be:
> test2<-apply(test[,-c(1:4)],1,function(x){table(t(x))})
>
> #find single values in a row
> sapply(test2,function(row){
> allVars<-paste(names(row),collapse="")
> u <- unique(strsplit(allVars,"")[[1]])
> parts<-sapply(names(row),function(x){u%in%strsplit(x,"")[[1]]})
> mat<-parts%*%row
> rownames(mat)<-u
> mat
> })
>
> though i guess lists aren't ideal, but theres another answer as well i see.
>
> On 09.01.2013, at 15:23, Yao He wrote:
>
>> Dear All
>>
>> I have a data.frame like that:
>> structure(list(name = c("Gga_rs10722041", "Gga_rs10722249", "Gga_rs10722565",
>> "Gga_rs10723082", "Gga_rs10723993", "Gga_rs10724555", "Gga_rs10726238",
>> "Gga_rs10726461", "Gga_rs10726774", "Gga_rs10726967", "Gga_rs10727581",
>> "Gga_rs10728004", "Gga_rs10728156", "Gga_rs10728177", "Gga_rs10728373",
>> "Gga_rs10728585", "Gga_rs10729598", "Gga_rs10729643", "Gga_rs10729685",
>> "Gga_rs10729827"), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
>> 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L,
>> 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L,
>> 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L,
>> 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L,
>> 3619538L), strand = c("+", "+", "+", "+", "+", "+", "+", "+",
>> "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+"),
>>    X2353 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT",
>>    "CC", "GG", "AG", "AG", "AG", "TT", "CC", "AG", "CC", "AA",
>>    "GG", "GG"), X2409 = c("AA", "CT", "TT", "CC", "CT", "CC",
>>    "CC", "TT", "CC", "GG", "GG", "AG", "AG", "TT", "CC", "AG",
>>    "CC", "AA", "AG", "GA"), X2500 = c("GA", "TT", "TT", "CC",
>>    "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT",
>>    "CT", "GG", "CC", "AA", "AA", "AA"), X2598 = c("AA", "TT",
>>    "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "AG",
>>    "GG", "TT", "CC", "AG", "TC", "AA", "AA", "AG"), X2610 = c("AA",
>>    "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA",
>>    "GA", "GG", "TT", "CC", "GA", "CC", "AA", "AA", "GA"), X2300 = c("GA",
>>    "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GA",
>>    "AA", "AG", "TT", "TC", "AA", "TC", "AA", "AG", "AA"), X2507 = c("AG",
>>    "TT", "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "GG",
>>    "GA", "GG", "TT", "TC", "GG", "CC", "AA", "GA", "AG"), X2530 = c("AG",
>>    "TC", "TT", "CC", "TC", "CC", "CC", "TT", "CC", "GG", "AA",
>>    "GG", "GG", "TT", "CC
Re: [R] how to count "A", "C", "T", "G" in each row in a big data.frame?
It is really good output. Maybe I can go on from this output.
Every time, I understand R a little further thanks to your help.

The first four cols are irrelevant. That was an oversight on my part.

2013/1/10 William Dunlap :
> Can you get what you need from the following, where 'd' is your data.frame,
> the first four columns of which are irrelevant to this problem?
> > dd <- d[,-(1:4)] ; table(rownames(dd)[row(dd)], unlist(dd))
>
>         AA AG CC CT GA GG GT TC TG TT
>   27412 29 10  0  0 13  1  0  0  0  0
>   27413  0  0  4  9  0  0  0 12  0 28
>   27414  0  0  0  0  0  0  0  0  0 53
>   27415  0  0 53  0  0  0  0  0  0  0
>   ...
>   27430 46  3  0  0  2  2  0  0  0  0
>   27431 19 15  0  0 15  4  0  0  0  0
> table() is pretty quick.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
>> Of Yao He
>> Sent: Wednesday, January 09, 2013 4:04 PM
>> To: jim holtman
>> Cc: R help
>> Subject: Re: [R] how to count "A", "C", "T", "G" in each row in a big data.frame?
>>
>> In fact I want to calculate the gene frequency of each SNP.
>>
>> The key problems are that:
>> 1. my data.frame is large ,about 50,000 rows. So it is so slow to
>> split() it by row
>>
>> 2 .The allele in each SNP (each row) are different.Some are A/G, some
>> are G/C. It is a little bit embarrassed for me to handle it.
>>
>> Thank you for your help
>>
>> 2013/1/9 jim holtman :
>> > forgot the data.
this will count the characters; you can add logic >> > with 'table' to count groups >> > >> > >> > x <- >> > structure(list(name = c("Gga_rs10722041", "Gga_rs10722249", >> > "Gga_rs10722565", >> > "Gga_rs10723082", "Gga_rs10723993", "Gga_rs10724555", "Gga_rs10726238", >> > "Gga_rs10726461", "Gga_rs10726774", "Gga_rs10726967", "Gga_rs10727581", >> > "Gga_rs10728004", "Gga_rs10728156", "Gga_rs10728177", "Gga_rs10728373", >> > "Gga_rs10728585", "Gga_rs10729598", "Gga_rs10729643", "Gga_rs10729685", >> > "Gga_rs10729827"), chr = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, >> > 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L), pos = c(11248993L, >> > 20038370L, 16164457L, 38050527L, 20307106L, 13707090L, 12230458L, >> > 36732967L, 2790856L, 1305785L, 29631963L, 13606593L, 13656397L, >> > 2261611L, 32096703L, 13733153L, 16524147L, 558735L, 12514023L, >> > 3619538L), strand = c("+", "+", "+", "+", "+", "+", "+", "+", >> > "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+"), >> > X2353 = c("AA", "TT", "TT", "CC", "TT", "CC", "CC", "TT", >> > "CC", "GG", "AG", "AG", "AG", "TT", "CC", "AG", "CC", "AA", >> > "GG", "GG"), X2409 = c("AA", "CT", "TT", "CC", "CT", "CC", >> > "CC", "TT", "CC", "GG", "GG", "AG", "AG", "TT", "CC", "AG", >> > "CC", "AA", "AG", "GA"), X2500 = c("GA", "TT", "TT", "CC", >> > "TT", "CC", "CC", "TT", "CC", "GG", "GG", "GG", "GG", "GT", >> > "CT", "GG", "CC", "AA", "AA", "AA"), X2598 = c("AA", "TT", >> > "TT", "CC", "TT", "CC", "CC", "TT", "CC", "GG", "AA", "AG", >> > "GG", "TT", "CC", "AG", "TC", "AA", "AA", "AG"), X2610 = c("AA", >> > "TT", "TT", "CC", "TT", "CC", "CC", "TT", "C
Re: [R] how to count "A", "C", "T", "G" in each row in a big data.frame?
Hi arun

Then how could I split them and get a table of letter counts from the pair counts (id AA AG CC CT GA GG GT TC TG TT), such as:

      id  A  T  C  G
#1 27412 81  0  0 25
#2 27413  0 77 29  0

Thanks

2013/1/10 arun :
> Hi Yao,
> You could also use:
> library(reshape2)
> dd<-dat1[,-(1:4)]
> res<-dcast(melt(within(dd,{id=row.names(dd)}),id.var="id"),id~value,length)
> head(res)
> #     id AA AG CC CT GA GG GT TC TG TT
> #1 27412 29 10  0  0 13  1  0  0  0  0
> #2 27413  0  0  4  9  0  0  0 12  0 28
> #3 27414  0  0  0  0  0  0  0  0  0 53
> #4 27415  0  0 53  0  0  0  0  0  0  0
> #5 27416  0  0  3  9  0  0  0 12  0 29
> #6 27417  0  0 53  0  0  0  0  0  0  0
>
> #Just for comparison:
> dat2<- dat1[rep(row.names(dat1),2000),]
> nrow(dat2)
> #[1] 40000
> row.names(dat2)<-1:40000
> dd <- dat2[,-(1:4)]
> system.time(res1<- table(rownames(dd)[row(dd)], unlist(dd)))
> #   user  system elapsed
> #  5.840   0.104   5.954
> system.time(res2 <-
> dcast(melt(within(dd,{id=row.names(dd)}),id.var="id"),id~value,length))
> #   user  system elapsed
> #  3.100   0.064   3.167
> head(res1,3)
>
> #     AA AG CC CT GA GG GT TC TG TT
> # 1   29 10  0  0 13  1  0  0  0  0
> # 10   0  4  0  0  6 43  0  0  0  0
> # 100 19 15  0  0 15  4  0  0  0  0
> head(res2,3)
> #   id AA AG CC CT GA GG GT TC TG TT
> #1   1 29 10  0  0 13  1  0  0  0  0
> #2  10  0  4  0  0  6 43  0  0  0  0
> #3 100 19 15  0  0 15  4  0  0  0  0
>
> A.K.
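One way to get from the genotype-pair counts to single-letter counts is to multiply the per-row pair table by a small matrix that records how many times each base occurs in each pair. The sketch below is an illustration on a made-up three-SNP data.frame `geno` (an assumption, standing in for the genotype columns after dropping the first four); `pair_counts`, `per_pair` and `counts` are names invented for the example.

```r
# Made-up genotype columns standing in for dat1[, -(1:4)]
geno <- data.frame(X1 = c("AA", "CT", "TT"),
                   X2 = c("AG", "TC", "TT"),
                   X3 = c("GA", "TT", "TT"),
                   stringsAsFactors = FALSE)
rownames(geno) <- c("snp1", "snp2", "snp3")

bases <- c("A", "C", "G", "T")
m <- as.matrix(geno)

# Rows = SNPs (kept in their original order), columns = genotype pairs
pair_counts <- table(factor(rownames(m)[row(m)], levels = rownames(m)),
                     as.vector(m))
pair_names <- colnames(pair_counts)

# For every pair, how many times each base occurs in it (0, 1 or 2)
per_pair <- t(vapply(strsplit(pair_names, "", fixed = TRUE),
                     function(p) tabulate(match(p, bases), nbins = length(bases)),
                     integer(length(bases))))

# Pair counts times per-pair base content gives base counts per SNP
counts <- unclass(pair_counts) %*% per_pair
colnames(counts) <- bases
print(counts)
```

Dividing by rowSums(counts) would then give the allele frequency of each base per SNP. The table() step is the same idea as in the replies above, so it should scale in a similar way.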
Re: [R] how to generate a matrix by an my data.frame
Thanks a lot, it works!

2013/1/11 Rui Barradas :
> Hello,
>
> Here are two ways.
>
> dat <- read.table(text = "
> id1 id2 value
> 2353 2353 0.096313
> 2353 2409 0.301773
> [...etc...]
> 2356 2356 0
> 2356 2611 0
> 2611 2611 0
> ", header = TRUE)
>
> mat1 <- matrix(nrow = 53, ncol = 53) # initialize with NA's
> mat1[upper.tri(mat1, diag = TRUE)] <- dat$value
>
> mat2 <- matrix(0, nrow = 53, ncol = 53) # initialize with zeros
> mat2[upper.tri(mat2, diag = TRUE)] <- dat$value
>
> Hope this helps,
>
> Rui Barradas
>
> On 10-01-2013 15:21, Yao He wrote:
>
> Dear All
>
> It is a little hard to give a good small example of my question, so I
> will show the full data at the bottom and in the attachment. Maybe
> someone could tell me an appropriate way to show it. I'm sorry for the
> inconvenience.
>
> Q: How to generate a 53*53 diagonal matrix from my data
> Some problems that confuse me:
> 1. Since it is a diagonal matrix, I have tried to transform col1 and
> col2 into row and column indices, but I don't know how to generate a
> matrix from the indices of its values.
> 2. As you see, the number of values for 2353 paired with other ids in
> col2 is 53; however, the number for 2409 is 52, for 2500 it is 51, and
> so on, so it is hard to use matrix() to generate it.
>
> id1 id2 value
> 2353 2353 0.096313
> 2353 2409 0.301773
> 2353 2500 0.169518
> 2353 2598 0.11274
> 2353 2610 0.107414
> 2353 2300 0.034492
> 2353 2507 0.037521
> 2353 2530 0.064125
> 2353 2327 0.029259
> 2353 2389 0.036423
> 2353 2408 0.029259
> 2353 2463 0.036423
> 2353 2420 0.04409
> 2353 2563 0.055038
> 2353 2462 0.046478
> 2353 2292 0.036369
> 2353 2405 0.036369
> 2353 2543 0.053413
> 2353 2557 0.058151
> 2353 2583 0.081512
> 2353 2322 0.044373
> 2353 2535 0.04847
> 2353 2536 0.035538
> 2353 2581 0.035538
> 2353 2570 0.07711
> 2353 2476 0.047081
> 2353 2534 0.047081
> 2353 2280 0.088264
> 2353 2316 0.073608
> 2353 2339 0.067307
> 2353 2331 0.061172
> 2353 2343 0.060425
> 2353 2352 0.041153
> 2353 2293 0.040764
> 2353 2338 0.045128
> 2353 2449 0.040764
> 2353 2296 0.061333
> 2353 2453 0.046074
> 2353 2460 0.060387
> 2353 2474 0.060387
> 2353 2603 0.060387
> 2353 2282 0.048065
> 2353 2313 0.05584
> 2353 2538 0.050873
> 2353 2522 0.065727
> 2353 2489 0.041023
> 2353 2564 0.039696
> 2353 2594 0.056946
> 2353 2274 0.060875
> 2353 2451 0.037468
> 2353 2321 0
> 2353 2356 0
> 2353 2611 0
> 2409 2409 0.096313
> 2409 2500 0.169518
> 2409 2598 0.11274
> 2409 2610 0.107414
> 2409 2300 0.034492
> 2409 2507 0.037521
> 2409 2530 0.064125
> 2409 2327 0.029259
> 2409 2389 0.036423
> 2409 2408 0.029259
> 2409 2463 0.036423
> 2409 2420 0.04409
> 2409 2563 0.055038
> 2409 2462 0.046478
> 2409 2292 0.036369
> 2409 2405 0.036369
> 2409 2543 0.053413
> 2409 2557 0.058151
> 2409 2583 0.081512
> 2409 2322 0.044373
> 2409 2535 0.04847
> 2409 2536 0.035538
> 2409 2581 0.035538
> 2409 2570 0.07711
> 2409 2476 0.047081
> 2409 2534 0.047081
> 2409 2280 0.088264
> 2409 2316 0.073608
> 2409 2339 0.067307
> 2409 2331 0.061172
> 2409 2343 0.060425
> 2409 2352 0.041153
> 2409 2293 0.040764
> 2409 2338 0.045128
> 2409 2449 0.040764
> 2409 2296 0.061333
> 2409 2453 0.046074
> 2409 2460 0.060387
> 2409 2474 0.060387
> 2409 2603 0.060387
> 2409 2282 0.048065
> 2409 2313 0.05584
> 2409 2538 0.050873
> 2409 2522 0.065727
> 2409 2489 0.041023
> 2409 2564 0.039696
> 2409 2594 0.056946
> 2409 2274 0.060875
> 2409 2451 0.037468
> 2409 2321 0
> 2409 2356 0
> 2409 2611 0
> 2500 2500 0.048615
> 2500 2598 0.051979
> 2500 2610 0.041031
> 2500 2300 0.032974
> 2500 2507 0.052788
> 2500 2530 0.041435
> 2500 2327 0.038071
> 2500 2389 0.051659
> 2500 2408 0.038071
> 2500 2463 0.051659
> 2500 2420 0.052635
> 2500 2563 0.07872
> 2500 2462 0.048615
> 2500 2292 0.044365
> 2500 2405 0.044365
> 2500 2543 0.04277
> 2500 2557 0.051109
> 2500 2583 0.047409
> 2500 2322 0.054512
>
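A note on element order, since it is easy to trip over: R assigns into a logical index such as upper.tri() in column-major order, while the listing in the question runs row by row (all pairs for 2353, then for 2409, and so on). If the values really are in that row-wise order, one option is to fill the lower triangle and transpose. The sketch below is only an illustration on a toy 3 x 3 case; `n` and `vals` are made-up stand-ins for 53 and dat$value.

```r
# Toy stand-in for dat$value: the upper triangle of a 3 x 3 matrix,
# listed row by row (hypothetical numbers, not taken from the thread)
n    <- 3
vals <- c(11, 12, 13,   # row 1: (1,1) (1,2) (1,3)
          22, 23,       # row 2: (2,2) (2,3)
          33)           # row 3: (3,3)

# Assignment runs down the columns. The columns of the lower triangle have
# lengths n, n-1, ..., 1, which matches the row-wise listing, so fill the
# lower triangle first and then transpose.
m <- matrix(NA_real_, n, n)
m[lower.tri(m, diag = TRUE)] <- vals
m <- t(m)

# For comparison: direct assignment into the upper triangle
direct <- matrix(NA_real_, n, n)
direct[upper.tri(direct, diag = TRUE)] <- vals

m
direct
```

In this toy case the two results differ (for instance in cell [2, 2]), so it may be worth spot-checking a few known pairs against the original table before relying on either form.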
[R] how to read a df like that and transform it?
Dear all

I have a data.frame like this:

father  mother  num_daughter  daughter
291     3906    0             NULL
275     4219    0             NULL
273     4236    1             49410
281     4163    1             49408
274     4226    1             49406
295     3869    2             49403 49404
287     4113    0             NULL
295     3871    1             49401
292     3895    4             49396 49397 49398 49399
291     3900    3             49392

How can I read it into R and transform it like this:

father  mother  num_daughter  daughter1  daughter2  daughter3  daughter4
291     3906    0             NULL
275     4219    0             NULL
273     4236    1             49410
281     4163    1             49408
274     4226    1             49406
295     3869    2             49403      49404
287     4113    0             NULL
295     3871    1             49401
292     3895    4             49396      49397      49398      49399
291     3900    3             49392

Solutions using plyr, reshape2, or other good packages are fine with me.

Thanks a lot!

Yao He
--
Master candidate in 2nd year
Department of Animal Genetics & Breeding
Room 436, College of Animal Science & Technology,
China Agricultural University, Beijing, 100193
E-mail: yao.h.1...@gmail.com
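One possible way to read such a ragged file is read.table() with fill = TRUE. This is a minimal sketch, assuming the file is whitespace-separated and that no family has more than four daughters; the sample rows, the object name `dau`, and the daughter1..daughter4 column names are illustrative assumptions.

```r
# A few rows in the shape shown in the post (the column split is assumed)
txt <- "father mother num_daughter daughter
291 3906 0 NULL
273 4236 1 49410
295 3869 2 49403 49404
292 3895 4 49396 49397 49398 49399"

# fill = TRUE pads short rows with NA. Passing col.names fixes the width at
# 7 columns; otherwise read.table() infers it from the first five lines only.
# na.strings = "NULL" turns the literal NULL entries into NA.
dau <- read.table(text = txt, skip = 1, fill = TRUE, na.strings = "NULL",
                  col.names = c("father", "mother", "num_daughter",
                                paste0("daughter", 1:4)))
dau
```

For a real file, `text = txt` would be replaced by the file path, keeping the other arguments.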
Re: [R] How to transpose it in a fast way?
Thanks for everybody's help! I learned a lot from this discussion!

2013/3/10 jim holtman :
> Did you check out the 'colbycol' package?
>
> On Fri, Mar 8, 2013 at 5:46 PM, Martin Morgan wrote:
>
>> On 03/08/2013 06:01 AM, Jan van der Laan wrote:
>>
>>> You could use the fact that scan reads the data rowwise, and the fact that
>>> arrays are stored columnwise:
>>>
>>> # generate a small example dataset
>>> exampl <- array(letters[1:25], dim=c(5,5))
>>> write.table(exampl, file="example.dat", row.names=FALSE, col.names=FALSE,
>>>     sep="\t", quote=FALSE)
>>>
>>> # and read...
>>> d <- scan("example.dat", what=character())
>>> d <- array(d, dim=c(5,5))
>>>
>>> t(exampl) == d
>>>
>>> Although this is probably faster, it doesn't help with the large size.
>>> You could use the n option of scan to read chunks/blocks and feed those to,
>>> for example, an ff array (which you ideally have preallocated).
>>
>> I think it's worth asking what the overall goal is; all we get from this
>> exercise is another large file that we can't easily manipulate in R!
>>
>> But nothing like a little challenge. The idea I think would be to
>> transpose in chunks of rows by scanning in some number of rows and writing
>> to a temporary file
>>
>> tpose1 <- function(fin, nrowPerChunk, ncol) {
>>     v <- scan(fin, character(), nmax=ncol * nrowPerChunk)
>>     m <- matrix(v, ncol=ncol, byrow=TRUE)
>>     fout <- tempfile()
>>     write(m, fout, nrow(m), append=TRUE)
>>     fout
>> }
>>
>> Apparently the data is 60k x 60k, so we could maybe easily read 60k x 10k
>> at a time from some file fl <- "big.txt"
>>
>> ncol <- 60000L
>> nrowPerChunk <- 10000L
>> nChunks <- ncol / nrowPerChunk
>>
>> fin <- file(fl); open(fin)
>> fls <- replicate(nChunks, tpose1(fin, nrowPerChunk, ncol))
>> close(fin)
>>
>> 'fls' is now a vector of file paths, each containing a transposed slice of
>> the matrix. The next task is to splice these together. We could do this by
>> taking a slice of rows from each file, cbind'ing them together, and writing
>> to an output
>>
>> splice <- function(fout, cons, nrowPerChunk, ncol) {
>>     slices <- lapply(cons, function(con) {
>>         v <- scan(con, character(), nmax=nrowPerChunk * ncol)
>>         matrix(v, nrowPerChunk, byrow=TRUE)
>>     })
>>     m <- do.call(cbind, slices)
>>     write(t(m), fout, ncol(m), append=TRUE)
>> }
>>
>> We'd need to use open connections as inputs and output
>>
>> cons <- lapply(fls, file); for (con in cons) open(con)
>> fout <- file("big_transposed.txt"); open(fout, "w")
>> xx <- replicate(nChunks, splice(fout, cons, nrowPerChunk,
>>     nrowPerChunk))
>> for (con in cons) close(con)
>> close(fout)
>>
>> As another approach, it looks like the data are from genotypes. If they
>> really only consist of pairs of A, C, G, T, then two pairs e.g., 'AA' 'CT'
>> could be encoded as a single byte
>>
>> alf <- c("A", "C", "G", "T")
>> nms <- outer(alf, alf, paste0)
>> map <- outer(setNames(as.raw(0:15), nms),
>>     setNames(as.raw(bitwShiftL(0:15, 4)), nms),
>>     "|")
>>
>> with e.g.,
>>
>> > map[matrix(c("AA", "CT"), ncol=2)]
>> [1] d0
>>
>> This translates the problem of representing the 60k x 60k array as a 3.6
>> billion element vector of 60k * 60k * 8 bytes (approx. 30 Gbytes) to one of
>> 60k x 30k = 1.8 billion elements (fits in R-2.15 vectors) of approx 1.8
>> Gbyte (probably usable in an 8 Gbyte laptop).
>>
>> Personally, I would probably put this data in a netcdf / rdf5 file.
>> Perhaps I'd use snpStats or GWASTools in Bioconductor
>> http://bioconductor.org.
>>
>> Martin
>>
>>> HTH,
>>>
>>> Jan
>>>
>>> peter dalgaard wrote:
>>>
>>> On Mar 7, 2013, at 01:18 , Yao He wrote:
>>>>
>>>> Dear all:
>>>>>
>>>>> I
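The chunked scan() idea from the quoted replies can be tried on a small file first. The following is a minimal sketch, not the posters' code: it uses a 6 x 4 toy matrix, and a plain in-memory matrix `tr` stands in for the preallocated on-disk target (such as an ff array) that a 60k x 60k data set would need. All object names are illustrative.

```r
# Small stand-in for the big file: 6 rows x 4 columns of genotype strings
set.seed(1)
geno <- matrix(sample(c("AA", "AG", "GG"), 24, replace = TRUE), nrow = 6)
fl <- tempfile()
write.table(geno, fl, row.names = FALSE, col.names = FALSE,
            sep = "\t", quote = FALSE)

nr <- 6L; nc <- 4L; rowsPerChunk <- 2L

# Preallocated target holding the transpose (nc rows, nr columns)
tr <- matrix(NA_character_, nrow = nc, ncol = nr)

con <- file(fl, open = "r")
for (start in seq(1L, nr, by = rowsPerChunk)) {
  # scan() on an open connection continues where the previous call stopped
  v <- scan(con, what = character(), nmax = nc * rowsPerChunk, quiet = TRUE)
  # Values arrive row by row, so filling an nc-row matrix column-wise
  # gives the transposed chunk directly
  tr[, start:(start + rowsPerChunk - 1L)] <- matrix(v, nrow = nc)
}
close(con)

identical(tr, t(geno))
```

The loop assumes the number of rows is a multiple of rowsPerChunk; a ragged last chunk would need its own handling.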
[R] Do association study based on mixed linear model
Dear All

I want to do an association study based on a mixed linear model. My model not only includes several fixed effects and random effects but also incorporates some covariates such as "birth weight". The data comprise about 180 individuals and 12 variables, with 6 fixed-effect estimates.

As asreml-R is not free, are there any packages suitable for my study? I have heard of nlme and lme4, but I'm not sure whether they can incorporate covariates, or how computationally efficient they are.

Thanks for your recommendations

Yao He
[R] how to do association study based on mixed linear model
Dear All:

I want to do an association study based on a mixed linear model. My model not only includes several fixed effects and random effects but also incorporates some covariates such as "birth weight". The data comprise about 180 individuals and 12 variables, with 6 fixed-effect estimates.

As asreml-R is not free, are there any packages suitable for my study? I have heard of nlme and lme4, but I'm not sure whether they can incorporate covariates, or how computationally efficient they are.

Thanks for your recommendations

Yao He
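On the covariate question: in nlme (and likewise in lme4) a covariate is simply a numeric term in the fixed-effects part of the model formula. The sketch below uses simulated data only; the variable names (family, snp, sex, birth_weight, trait) and the effect sizes are invented for illustration and are not the poster's data or model.

```r
library(nlme)  # ships with R as a recommended package

# Simulated stand-in data: 180 animals in 20 families (all names hypothetical)
set.seed(42)
d <- data.frame(
  family       = factor(rep(1:20, each = 9)),
  snp          = factor(sample(c("AA", "AG", "GG"), 180, replace = TRUE)),
  sex          = factor(sample(c("F", "M"), 180, replace = TRUE)),
  birth_weight = rnorm(180, mean = 40, sd = 4)
)
fam_eff <- rnorm(20, sd = 2)
d$trait <- 100 + 0.5 * d$birth_weight + 3 * (d$snp == "GG") +
  fam_eff[as.integer(d$family)] + rnorm(180, sd = 3)

# Fixed effects: SNP genotype and sex (factors) plus birth weight (a numeric
# covariate); random effect: one intercept per family
fit <- lme(trait ~ snp + sex + birth_weight, random = ~ 1 | family, data = d)
summary(fit)$tTable
```

With lme4 the same model would be written as lmer(trait ~ snp + sex + birth_weight + (1 | family), data = d). A data set of about 180 records is small for either package, so run time is unlikely to be a concern for a single model.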
[R] How to select a subset data to do a barplot in ggplot2
Hi, everybody

I have a data frame like this:

FID  IID   STATUS
1    4621  live
1    4628  dead
2    4631  live
2    4632  live
2    4633  live
2    4634  live
6    4675  live
6    4679  dead
10   4716  dead
10   4719  live
10   4721  dead
11   4726  live
11   4728  nosperm
11   4730  nosperm
12   4732  live
17   4783  live
17   4783  live
17   4784  live

I just want a barplot that counts "live" and "dead" in every "FID", with the bars filled in different colours. I tried this code:

p<-ggplot(data,aes(x=FID)); p+geom_bar(aes(x=factor(FID),y=..count..,fill=STATUS))

But how can I exclude "nosperm" or other levels using only ggplot2, without generating another data frame?

Thanks a lot

Yao He
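One way to leave out unwanted levels without storing a second data frame is to filter inline in the ggplot() call. This is a minimal sketch on a shortened toy copy of the table; `dat` and `wanted` are illustrative names, and the plotting part runs only if ggplot2 is installed.

```r
# Toy data in the shape shown in the post (shortened)
dat <- data.frame(
  FID    = c(1, 1, 2, 2, 11, 11, 11),
  IID    = c(4621, 4628, 4631, 4632, 4726, 4728, 4730),
  STATUS = c("live", "dead", "live", "live", "live", "nosperm", "nosperm")
)

wanted <- c("live", "dead")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # subset() is evaluated inside the call, so no extra object is kept around;
  # geom_bar() counts the rows per FID and STATUS by default
  p <- ggplot(subset(dat, STATUS %in% wanted),
              aes(x = factor(FID), fill = STATUS)) +
    geom_bar()
  print(p)
}
```

Because the filtering happens before plotting, the legend then lists only the levels that remain in the subset.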
[R] how to handle NA values in aggregate()
Dear All:

I am trying to calculate the means of four columns in a data frame like this:

FID  MID   IID      EW_INCU  EW_17.5  EMW    EEratio
1    4621  TWF2H5   45.26    NA       15.61  NA
1    4621  TWF2H6   48.02    44.09    13.41  0.3041506
2    4630  TWF2H19  51.44    47.81    NA     NA
2    4631  TWF2H21  NA       52.72    16.70  0.3167678
2    4632  TWF2H22  55.70    50.45    16.48  0.3266601
2    4633  TWF2H23  44.42    40.89    12.96  0.3169479

I tried this code:

> aggregate(df[,4:7],df[,1],mean)

But I couldn't set the argument na.rm=T in the mean() function, so the results are all NAs. Please tell me how to handle NA values when using aggregate().

Thanks a lot

Yao He
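aggregate() passes any extra arguments on to FUN, so na.rm = TRUE can be given after the function, and the grouping argument `by` has to be a list. A minimal sketch on two of the columns from the post (the object names `df` and `res` are illustrative):

```r
# Two of the measurement columns from the post, plus the grouping column
df <- data.frame(
  FID     = c(1, 1, 2, 2, 2, 2),
  EW_INCU = c(45.26, 48.02, 51.44, NA, 55.70, 44.42),
  EW_17.5 = c(NA, 44.09, 47.81, 52.72, 50.45, 40.89)
)

# 'by' must be a list; arguments after FUN (here na.rm) are handed to mean()
res <- aggregate(df[, c("EW_INCU", "EW_17.5")],
                 by = list(FID = df$FID),
                 FUN = mean, na.rm = TRUE)
res
```

Each column is then averaged over its own non-missing values within a group. The formula interface of aggregate() behaves differently by default: its na.action drops every row that contains an NA in any of the variables used.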