Re: [Bioc-devel] poor performance of snpsByOverlaps()

2016-06-21 Thread Robert Castelo
Vince, thanks a lot for the example of streaming dbSNP over the internet, which shows that this is even faster than accessing the data locally. To me, this just confirms that the current performance of the SNPlocs.Hsapiens.dbSNP144.GRCh37 annotation package can be improved. Hervé will look at it and hopef

Re: [Bioc-devel] poor performance of snpsByOverlaps()

2016-06-21 Thread Hervé Pagès
Hi Robert, Thanks for reporting this. I'll look into it. H. On 06/17/2016 09:53 AM, Robert Castelo wrote: hi, the performance of snpsByOverlaps() in terms of time and memory consumption is quite poor and i wonder whether there is some bug in the code. here's one example: library(GenomicRanges)

Re: [Bioc-devel] poor performance of snpsByOverlaps()

2016-06-17 Thread Vincent Carey
I think you can get relevant information rapidly from the dbSNP VCF. You would acquire ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/00-common_all.vcf.gz.tbi and wrap in a TabixFile > tf class: TabixFile
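
For readers of the archive, the approach described in this message (a tabix-indexed VCF wrapped in a TabixFile and queried by genomic range) might look roughly like the sketch below. It is not the code from the original message, which is truncated here. The helper name query_common_snps, the local file name and the example region are assumptions made for illustration, and the sketch assumes the Rsamtools, GenomicRanges and IRanges packages are installed and that both files above have been downloaded into the same directory.

```r
# Minimal sketch, not the original poster's code.
# Assumption: Rsamtools is installed and the .tbi index sits next to the
# .vcf.gz file, which is where TabixFile() looks for it by default.
# query_common_snps is a hypothetical helper name.
query_common_snps <- function(vcf_path, region) {
  stopifnot(requireNamespace("Rsamtools", quietly = TRUE))
  # Wrap the bgzip-compressed, tabix-indexed VCF.
  tf <- Rsamtools::TabixFile(vcf_path)
  # scanTabix() reads only the records overlapping `region` (a GRanges)
  # and returns a list with one character vector of raw lines per range.
  Rsamtools::scanTabix(tf, param = region)
}

# Hypothetical usage (region chosen arbitrarily; the sequence name must
# match the naming used in the VCF, assumed here to be "1" rather than "chr1"):
# region <- GenomicRanges::GRanges("1", IRanges::IRanges(1000000, 1100000))
# hits <- query_common_snps("00-common_all.vcf.gz", region)
# lengths(hits)  # number of VCF records per queried range
```

Because the tabix index lets the reader seek directly to the requested range, only the overlapping records are read and parsed, which is presumably why this route can be fast compared with loading a whole chromosome of SNP positions.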

[Bioc-devel] poor performance of snpsByOverlaps()

2016-06-17 Thread Robert Castelo
hi, the performance of snpsByOverlaps() in terms of time and memory consumption is quite poor and i wonder whether there is some bug in the code. here's one example: library(GenomicRanges) library(SNPlocs.Hsapiens.dbSNP144.GRCh37) snps <- SNPlocs.Hsapiens.dbSNP144.GRCh37 gr <- GRanges(seqna