Hi Vince, Kasper,
cc'ing Herve and Marc.
I think we have a couple of things going on so I wanted to clarify. The
'genome' argument to readVcf() is assigned to the GRanges in rowData
with the "genome<-" setter. This is where .normargGenome() gets called.
setReplaceMethod("genome", "Seqinfo",
function(x, value)
{
x@genome <- .normargGenome(value, seqnames(x))
x
}
)
If the 'genome' replacement value is named, the name(s) must match the
seqnames, not the build. So we aren't talking about matching compatible
builds,
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
vcf <- readVcf(fl, c("b37"="hg19")) ## this is wrong
vcf <- readVcf(fl, c("hg19"="hg19")) ## also wrong
Instead the name must be the seqname, the value is the build,
vcf <- readVcf(fl, c("20"="hg19")) ## correct
vcf <- readVcf(fl, "hg19") ## also correct
This requirement for 'genome' is not well documented on ?readVcf or
?Seqinfo. We can fix that.
The second thing is the issue of a flexible mapping between seqinfo
metadata for different institutions. Herve and Marc have worked on this
in AnnotationDbi. They can explain more about the 'SeqnameStyle' and how
it might be used more widely.
Val
On 04/25/2013 06:54 AM, Kasper Daniel Hansen wrote:
An "official" comment on this
http://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
with some more info in this discussion
https://groups.google.com/a/soe.ucsc.edu/forum/?fromgroups=#!topic/genome/hFp-dGG9gBs
Essentially it seems the b37 has been "patched" and this patched release is
not reflected in hg19 but may be (I don't know) reflected in the b37
download from NCBI
Kasper
On Thu, Apr 25, 2013 at 9:49 AM, Kasper Daniel Hansen <
kasperdanielhan...@gmail.com> wrote:
I agree with Vincent. I have seen code from Herve in a package with some
standardization of chromosome names, and this code could perhaps be used
more widely so we don't have all the problems with chr1 vs chr01 vs 1.
However, in this particular case, if Ulrich is actually interested in the
mitochondrial genome, he has a problem.
hg19, which is the genome version from UCSC is consider equal to NCBIs
b37. However, as far as I understand, UCSC screwed up with the
mitochondrial genome and used an old version for their hg19. So the error
message is in many ways right here: the two genomes are slightly different
because they have different mitochondrial genomes.
Kasper
[[alternative HTML version deleted]]
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel