On 09/27/2016 06:00 PM, Dario Strbenac wrote:
Good day,
file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                           package = "VariantAnnotation")
aSet <- readVcf(file, "hg19")
system.time(commonMutations <- readVcf(anotherFile, "hg19", rowRanges(aSet)))
Dario's computer is faster than mine:

> system.time(commonMutations <- readVcf(anotherFile, "hg19", rowRanges(aSet)))
   user  system elapsed
426.271  57.296 483.766
The disk infrastructure is a determinant of throughput. Most VCF queries
are decomposable and can be parallelized. After
chunki
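
As a rough illustration of the chunk-and-parallelize idea, here is a minimal sketch, not a tested recipe. The helper name readVcfInChunks and the choice of four chunks are my own assumptions, not anything from VariantAnnotation. It splits the query ranges into chunks, reads each chunk with BiocParallel, and row-binds the results. To keep it quick it only uses the first 100 ranges. Whether this helps at all will depend on the disk, as noted above.

```r
library(VariantAnnotation)
library(BiocParallel)

file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                           package = "VariantAnnotation")
aSet <- readVcf(file, "hg19")

## Hypothetical helper (not part of any package): query 'path' once per
## chunk of 'ranges', in parallel, then combine the pieces by row.
readVcfInChunks <- function(path, ranges, nChunks = 4, BPPARAM = bpparam()) {
    ## Assign each range to one of nChunks consecutive groups.
    chunkId <- ceiling(seq_along(ranges) * nChunks / length(ranges))
    chunks <- as.list(split(ranges, chunkId))
    pieces <- bplapply(chunks, function(rng, path) {
        ## Fully qualified so it also works on workers that start empty.
        VariantAnnotation::readVcf(path, "hg19", param = rng)
    }, path = path, BPPARAM = BPPARAM)
    do.call(rbind, unname(pieces))
}

## Drop the metadata columns; only the coordinates are needed for the query.
wanted <- granges(rowRanges(aSet))

## Small subset only, to keep the illustration fast.
subsetVcf <- readVcfInChunks(anotherFile, head(wanted, 100))
```

Each worker still issues one tabix query per range, so this only spreads the cost across processes; it does not reduce the number of queries.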
I think the basic problem is that each range requires a separate query
through tabix. BAM and tabix are designed to be fast for single
queries, like what a genome browser might generate, but not for
querying thousands of regions at once. At least that's the way it
seems to me. The index is only at
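
If the per-range query really is the bottleneck, one thing to try is a single coarse query followed by filtering in memory. The sketch below is only that, and it assumes the region spanned by the ranges is small enough to load in one go. The variable names span, everything and common are just for illustration.

```r
library(VariantAnnotation)

file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                           package = "VariantAnnotation")
aSet <- readVcf(file, "hg19")

## Coordinates of the variants in the first file, without metadata columns.
wanted <- granges(rowRanges(aSet))

## One range per sequence covering all positions of interest, so tabix is
## queried once per sequence instead of once per variant.
span <- range(wanted)

## Single coarse read of the second file.
everything <- readVcf(anotherFile, "hg19", param = span)

## Keep only the records that overlap a position of interest. This is an
## overlap filter done in memory; it may not reproduce the per-range query
## exactly (for example, records hit by several ranges are kept once here).
common <- subsetByOverlaps(everything, wanted)
```

The trade-off is memory for time: everything in the spanned region is loaded before filtering, which is fine for these small example files but may not be for a whole-genome VCF.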
Good day,
library(VariantAnnotation)
file <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
anotherFile <- system.file("extdata", "hapmap_exome_chr22.vcf.gz",
                           package = "VariantAnnotation")
aSet <- readVcf(file, "hg19")
system.time(commonMutations <- readVcf(anotherFile, "hg19", rowRanges(aSet)))