Hi, dear R community,
I have a data set like the one shown below, and I want to calculate the
cumulative coverage. The following code works for a small data set (100
rows), but when I feed it the whole data set it is still running after 24 hours.
Can someone suggest a faster approach for long vectors?
id reads
Contig79:1 4
Contig79:2 8
Contig79:3 13
Contig79:4 14
Contig79:5 17
Contig79:6 20
Contig79:7 25
Contig79:8 27
Contig79:9 32
Contig79:10 33
Contig79:11 34
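In case it helps to reproduce the problem without the input file, the sample rows above can be built directly. This is only a sketch; the name `sample_matt` is made up for illustration and is not part of my script:

```r
# Build the 11 sample rows shown above as a data frame.
# `sample_matt` is an illustrative name, not an object from the original script.
sample_matt <- data.frame(
  id    = paste0("Contig79:", 1:11),
  reads = c(4, 8, 13, 14, 17, 20, 25, 27, 32, 33, 34),
  stringsAsFactors = FALSE
)
str(sample_matt)
```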
matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
sep="\t", skip=0, header=FALSE, fill=TRUE)
dim(matt)
[1] 3384766 2
matt_plot<-function(matt, outputfile) {
  names(matt)<-c("id","reads")
  cover<-matt$reads
  # Calculate the cumulative coverage: for each depth value, the percentage
  # of positions whose read depth is at least that value.
  cover_per<-function (data) {
    output<-numeric(0)
    for (i in data) {
      x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
      output<-c(output, x)
    }
    return(output)
  }
  result<-cover_per(cover)
  # (remainder of the function not shown)
}
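As far as I can tell, the loop compares every element against the whole vector, so the work grows with the square of the number of rows, and growing `output` with `c()` adds more copying on top. One possible way around this, as a sketch only, is to get the same counts from ranks. The name `cover_per_fast` is hypothetical, the code assumes there are no NA values in the reads column, and I have not run it on the full file:

```r
# Original loop version from the post, kept only for comparison.
cover_per <- function(data) {
  output <- numeric(0)
  for (i in data) {
    x <- 100 * sum(ifelse(data >= i, 1, 0)) / length(data)
    output <- c(output, x)
  }
  output
}

# Hypothetical vectorized version (illustrative name; assumes no NA values).
# rank(..., ties.method = "min") is 1 + the number of strictly smaller values,
# so n - rank + 1 is the number of values >= each element.
cover_per_fast <- function(data) {
  n <- length(data)
  100 * (n - rank(data, ties.method = "min") + 1) / n
}

# Check on the sample rows plus two repeated depths to exercise ties.
reads <- c(4, 8, 13, 14, 17, 20, 25, 27, 32, 33, 34, 4, 20)
stopifnot(isTRUE(all.equal(cover_per(reads), cover_per_fast(reads))))
```

With this, the call inside the function would become `result<-cover_per_fast(cover)`. Since `rank()` sorts internally, the cost should scale roughly as n log n rather than n squared.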
Thanks so much!
--
Sincerely,
Changbin
______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.