Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread juliosarmientota
On Jun 30, 2017 5:32 AM, Bernat Gel wrote: > Ok, so it seems more like a bug somewhere than something I failed to understand, then. One of the surprises for me is that shuffling the data so the misses do not happen one after the other seems t

Re: [Bioc-devel] Bioconductor stats

2017-06-30 Thread Lluís Revilla
Hi Hervé, I wasn't aware of the discrepancy between the monthly number of IPs and the yearly number of IPs. I didn't realize that my own package showed this distinction between monthly and yearly number of IPs. Thanks for pointing it out. Yes, usually the effect of a package being in several categori

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Bernat Gel
Ok, that makes sense. In my current use case I think I'll be able to first filter out the elements that will miss, so this behaviour is not triggered. But it's good to know this happens so I can try to avoid it in the future. Thanks. Bernat *Bernat Gel Moreno* Bioinformatician Hereditary C

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Michael Lawrence
The reason it's faster when shuffled vs. all at the end is that when a miss happens R compares the string to all strings before it in the subscript. So it's a lot worse to have a miss towards the end. As Martin wrote, there are basically two possible improvements that are somewhat complementary: 1)
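
The cost pattern described in this message can be illustrated with a toy model. The sketch below is written in Python and is not R's actual implementation: it only assumes, as the message states, that a miss at a given position in the subscript is compared against every string before it, and that hits cost nothing extra. The function name `count_miss_comparisons` and the data are hypothetical, chosen for illustration.

```python
import random


def count_miss_comparisons(names, subscript):
    """Toy cost model (an assumption, not R's code): a miss at position i
    in the subscript is compared with the i strings that precede it."""
    known = set(names)
    comparisons = 0
    for i, s in enumerate(subscript):
        if s not in known:
            comparisons += i  # one comparison per earlier subscript element
    return comparisons


# Hypothetical data: 1000 names that exist and 1000 that do not.
names = [f"n{i}" for i in range(1000)]
misses = [f"m{i}" for i in range(1000)]

misses_last = names + misses    # all misses at the end of the subscript
misses_first = misses + names   # all misses at the start
shuffled = misses_last[:]
random.Random(0).shuffle(shuffled)  # misses spread throughout

cost_last = count_miss_comparisons(names, misses_last)
cost_first = count_miss_comparisons(names, misses_first)
cost_shuffled = count_miss_comparisons(names, shuffled)
print(cost_first, cost_shuffled, cost_last)
```

Under this model the shuffled subscript costs less than the one with all misses at the end, because each miss then has fewer earlier elements to be compared with on average. This is consistent with the observation in the thread, though the real timings depend on details the model ignores.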

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Bernat Gel
Ok, so it seems more like a bug somewhere than something I failed to understand, then. One of the surprises for me is that shuffling the data so the misses do not happen one after the other seems to solve the issue... Thanks, Bernat *Bernat Gel Moreno* Bioinformatician Hereditary Cancer Pr

Re: [Bioc-devel] Bioconductor stats

2017-06-30 Thread Hervé Pagès
Hi Lluís, As Sean already said, mirrors are not included in the stats. The monthly number of distinct IPs is reset every month and the yearly number of distinct IPs is reset every year. Some packages are indeed in two categories. Category assignment is based on the download URL only. For some mysteriou

Re: [Bioc-devel] Help understanding an R performance issue

2017-06-30 Thread Hervé Pagès
Hi Bernat, Michael, FWIW I reported this issue on R-devel a couple of times. Last time was in 2013: https://stat.ethz.ch/pipermail/r-devel/2013-May/066616.html Cheers, H. On 06/29/2017 11:58 PM, Bernat Gel wrote: Yes, that would explain part of the situation. But example cc5 shows that hash