HI, Henrique,

Thanks for the great help!

I compared the output from your codes:
> te<-rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
> te
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84
83
 [19]  82  81  80  79  78  77  76  75  74  73  72  71  70  69  68  67  66
65
 [37]  64  63  62  61  60  59  58  57  56  55  54  53  52  51  50  49  48
47
 [55]  46  45  44  43  42  41  40  39  38  37  36  35  34  33  32  31  30
29
 [73]  28  27  26  25  24  23  22  21  20  19  18  17  16  15  14  13  12
11
 [91]  10   9   8   7   6   5   4   3   2   1

 the output from my code,
> result
  [1] 100  99  98  97  96  95  94  93  92  91  90  89  88  87  86  85  84
83
 [19]  82  81  80  79  79  77  77  77  74  73  72  71  70  70  68  67  67
65
 [37]  64  64  62  62  60  59  58  57  56  56  54  53  52  51  51  49  48
47
 [55]  46  45  45  43  42  41  40  39  38  37  36  35  34  33  32  31  30
29
 [73]  28  27  27  27  24  24  22  21  20  19  19  19  19  15  14  14  12
11
 [91]  10   9   8   7   7   5   4   3   2   1

There is no tie in your output. Look at the data set: There are ties in the
data set. Your codes work fast, but I think the results is not accurate.
Thanks so much for the great help!

> matt[c(1:35), ]
            id reads
1   Contig79:1     4
2   Contig79:2     8
;
;
22 Contig79:22    64
23 Contig79:23    64
24 Contig79:24    68
25 Contig79:25    68
26 Contig79:26    68

I also attached the testing file with this email. Thanks!



On Thu, Nov 4, 2010 at 9:12 AM, Henrique Dallazuanna <www...@gmail.com>wrote:

> Try this:
>
> rev(100 * cumsum(matt$reads > 1) / length(matt$reads) )
>
> On Thu, Nov 4, 2010 at 1:46 PM, Changbin Du <changb...@gmail.com> wrote:
>
>> HI, Dear R community,
>>
>> I have one data set like this,  What I want to do is to calculate the
>> cumulative coverage. The following codes works for small data set (#rows =
>> 100), but when feed the whole data set,  it still running after 24 hours.
>> Can someone give some suggestions for long vector?
>>
>> id                reads
>> Contig79:1    4
>> Contig79:2    8
>> Contig79:3    13
>> Contig79:4    14
>> Contig79:5    17
>> Contig79:6    20
>> Contig79:7    25
>> Contig79:8    27
>> Contig79:9    32
>> Contig79:10    33
>> Contig79:11    34
>>
>>
>> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
>> sep="\t", skip=0, header=F,fill=T) #
>> dim(matt)
>> [1] 3384766       2
>>
>> matt_plot<-function(matt, outputfile) {
>> names(matt)<-c("id","reads")
>>
>>  cover<-matt$reads
>>
>>
>> #calculate the cumulative coverage.
>> + cover_per<-function (data) {
>> + output<-numeric(0)
>> + for (i in data) {
>> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
>> +           output<-c(output, x)
>> +                 }
>> + return(output)
>> + }
>>
>>
>>  result<-cover_per(cover)
>>
>>
>> Thanks so much!
>>
>>
>> --
>> Sincerely,
>> Changbin
>> --
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>



-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 94598
Phone: 925-927-2856
Contig79:1      4
Contig79:2      8
Contig79:3      13
Contig79:4      14
Contig79:5      17
Contig79:6      20
Contig79:7      25
Contig79:8      27
Contig79:9      32
Contig79:10     33
Contig79:11     34
Contig79:12     36
Contig79:13     39
Contig79:14     40
Contig79:15     44
Contig79:16     49
Contig79:17     55
Contig79:18     56
Contig79:19     59
Contig79:20     60
Contig79:21     62
Contig79:22     64
Contig79:23     64
Contig79:24     68
Contig79:25     68
Contig79:26     68
Contig79:27     70
Contig79:28     73
Contig79:29     76
Contig79:30     77
Contig79:31     78
Contig79:32     78
Contig79:33     79
Contig79:34     80
Contig79:35     80
Contig79:36     84
Contig79:37     87
Contig79:38     87
Contig79:39     88
Contig79:40     88
Contig79:41     89
Contig79:42     93
Contig79:43     94
Contig79:44     98
Contig79:45     99
Contig79:46     99
Contig79:47     102
Contig79:48     103
Contig79:49     108
Contig79:50     112
Contig79:51     112
Contig79:52     113
Contig79:53     116
Contig79:54     118
Contig79:55     120
Contig79:56     124
Contig79:57     124
Contig79:58     126
Contig79:59     128
Contig79:60     130
Contig79:61     133
Contig79:62     134
Contig79:63     136
Contig79:64     139
Contig79:65     144
Contig79:66     145
Contig79:67     146
Contig79:68     148
Contig79:69     149
Contig79:70     151
Contig79:71     156
Contig79:72     157
Contig79:73     158
Contig79:74     159
Contig79:75     159
Contig79:76     159
Contig79:77     160
Contig79:78     160
Contig79:79     161
Contig79:80     163
Contig79:81     164
Contig79:82     165
Contig79:83     165
Contig79:84     165
Contig79:85     165
Contig79:86     166
Contig79:87     170
Contig79:88     170
Contig79:89     172
Contig79:90     174
Contig79:91     178
Contig79:92     180
Contig79:93     181
Contig79:94     184
Contig79:95     184
Contig79:96     187
Contig79:97     190
Contig79:98     192
Contig79:99     194
Contig79:100    199
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to