Files with a .vcf extension often get scrubbed from email because they
are interpreted as 'vCard' files. (At least this has been my
experience.) Changing the file extension to something other than '.vcf'
usually solves the problem.
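A minimal base R sketch of that workaround, for anyone who wants to script it. The file name is hypothetical, and a temporary stand-in file is created only so the lines run anywhere:

```r
# Hypothetical file name; a stand-in file is created in tempdir().
src <- file.path(tempdir(), "test.vcf")
writeLines("##fileformat=VCFv4.1", src)

# Append ".txt" so the attachment no longer ends in ".vcf".
renamed <- paste0(src, ".txt")
copied <- file.copy(src, renamed, overwrite = TRUE)
```

The recipient can rename the attachment back to '.vcf' before reading it.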
I was able to reproduce the error with the file you pasted in the
message. The bug was some old code looking for ":" in rownames. This was
a legacy check and is no longer necessary (I should have removed it some
time ago). Now fixed in release (1.8.7) and devel (1.19.16).
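To illustrate the kind of check involved (this is a sketch in base R, not the actual VariantAnnotation source; legacy_id() and fixed_id() are hypothetical names), a test for ":" treats every structural-variant ID in the test file as if it were an auto-generated rowname:

```r
# IDs from the test file in this thread, plus one rsID and one missing ID (".").
ids <- c("DEL:561590:0:1:0:0:0", "BND:81424:0:1:1:1", "rs6054257", ".")

# Hypothetical legacy-style check: any name containing ":" is assumed to be
# an auto-generated "chr:pos_ref/alt" rowname and is written out as ".".
legacy_id <- function(x) ifelse(grepl(":", x, fixed = TRUE), ".", x)

# Behaviour described in this thread: "." only when the ID is missing.
# Modelling a missing ID as "." or NA is an assumption of this sketch.
fixed_id <- function(x) ifelse(is.na(x) | x == ".", ".", x)

legacy_out <- legacy_id(ids)  # the two SV IDs are lost
fixed_out  <- fixed_id(ids)   # all IDs are kept
```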
Thanks for persevering and reporting this bug.
Valerie
On 11/28/2013 02:13 AM, Becq, Jennifer wrote:
Hi Valerie,
The VCF that is causing the problem was at the bottom of my email; I can
copy-paste it here again:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr20 14855644 DEL:561590:0:1:0:0:0 C <DEL> . PASS
.
chr20 29627290 BND:81424:0:1:1:1 G [chr2:114173319[G
. MaxDepth .
chr20 35365307 BND:54200:0:1:0:1 T ]chr1:230941520]T
. PASS .
chr20 60520225 DEL:572151:1:1:6:4:0
AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG
A .
PASS .
chr20 60520443 DEL:572151:1:1:6:4:1
GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCA
CCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCG
CCCCTTCTCACCGATGACGAGGAGCACTGCGA GC . PASS .
chr20 60520937 DEL:572151:1:1:11:0:0 C <DEL> . PASS
.
chr20 61766068 DEL:572433:0:0:5:2:0
CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCA
C CGG . PASS .
chr20 62063686 DUP:TANDEM:572544:0:0:8:0:0 T <DUP:TANDEM>
. PASS .
Thanks
Jennifer
Jennifer Becq
Senior Bioinformatics Scientist
Illumina Cambridge Ltd
Tel: +44 (0) 1799 532300
email: jb...@illumina.com
-----Original Message-----
From: Valerie Obenchain [mailto:voben...@fhcrc.org]
Sent: 27 November 2013 21:17
To: Becq, Jennifer; bioc-devel@r-project.org
Subject: Re: VariantAnnotation writeVcf problem
Hi,
I can't reproduce this error. Here is a read/write example using a file from
VariantAnnotation where the results are as expected.
fl <- system.file("extdata", "ex2.vcf", package="VariantAnnotation")
dest <- tempfile()
vcf1 <- readVcf(fl, "hg19")
> rownames(vcf1)
[1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
[5] "microsat1"
writeVcf(vcf1, dest)
vcf2 <- readVcf(dest, "hg19")
> rownames(vcf2)
[1] "rs6054257" "20:17330_T/A" "rs6040355" "20:1230237_T/."
[5] "microsat1"
I need a reproducible example in order to help. Is the vcf you're working with
publicly available?
Valerie
On 11/27/2013 03:37 AM, Becq, Jennifer wrote:
Hi Valerie,
Thank you for cc'ing my message.
The "ID" values are removed when reading a VCF through readVcf() and re-writing
it with writeVcf():
V = readVcf("test.vcf", "hg19")
rownames(V)
[1] "DEL:561590:0:1:0:0:0" "BND:81424:0:1:1:1"
[3] "BND:54200:0:1:0:1" "DEL:572151:1:1:6:4:0"
[5] "DEL:572151:1:1:6:4:1" "DEL:572151:1:1:11:0:0"
[7] "DEL:572433:0:0:5:2:0" "DUP:TANDEM:572544:0:0:8:0:0"
writeVcf(V, "writeTest.vcf")
V2 = readVcf("writeTest.vcf", "hg19")
rownames(V2)
[1] "chr20:14855644_C/<DEL>"
[2] "chr20:29627290_G/[chr2:114173319[G"
[3] "chr20:35365307_T/]chr1:230941520]T"
[4]
"chr20:60520225_AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG/A"
[5]
"chr20:60520443_GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGCACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGCCGCCCCTTCTCACCGATGACGAGGAGCACTGCGA/GC"
[6] "chr20:60520937_C/<DEL>"
[7]
"chr20:61766068_CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCAC/CGG"
[8] "chr20:62063686_T/<DUP:TANDEM>"
sessionInfo()
R version 3.0.2 Patched (2013-10-27 r64116)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] VariantAnnotation_1.8.6 Rsamtools_1.14.2 Biostrings_2.30.1
[4] GenomicRanges_1.14.3 XVector_0.2.0 IRanges_1.20.5
[7] BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 Biobase_2.22.0 biomaRt_2.18.0
[4] bitops_1.0-6 BSgenome_1.30.0 DBI_0.2-7
[7] GenomicFeatures_1.14.2 RCurl_1.95-4.1 RSQLite_0.11.4
[10] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2
[13] XML_3.98-1.1 zlibbioc_1.8.0
***** With the following VCF test.vcf:
##fileformat=VCFv4.1
#CHROM POS ID REF ALT QUAL FILTER INFO
chr20 14855644 DEL:561590:0:1:0:0:0 C <DEL> . PASS
.
chr20 29627290 BND:81424:0:1:1:1 G [chr2:114173319[G
. MaxDepth .
chr20 35365307 BND:54200:0:1:0:1 T ]chr1:230941520]T
. PASS .
chr20 60520225 DEL:572151:1:1:6:4:0
AACGATGAGGAGCATCGCGGCTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCACGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACTGTGGCCGCCCCTTCTCACCG
A
. PASS .
chr20 60520443 DEL:572151:1:1:6:4:1
GACTGTCTGCACCGTGGCCGCCCCTTCTCACTGACGATGAGGAGCACTGCGACTGTCTGCACCGTGGCCGCCCTTTCTGACTGATGATAAGGAACATTGCGACTGTCTGCACCGTGGCTGCCCCTTCTCACCAACGCTGAGGAGCACTGCAACCATCTGC
ACCGTGGCCGCCCCTTCTCACCGATGATGAGGAACATTGAGACTGTCTGCCCCGTGGCTGCCCCTTCTCACCGATGCTGAGGAGCACTGTGACTGTCTGCACCATGGGAGCCCCTTCTCACTGACAATGAGGAGCATTCAGAGTGTCTACACCGTGGCCGCGCCTTCTCACCGATGCTGAGGAGCACCGAGACTGTCTGCACCGTGGC
CGCCCCTTCTCACCGATGACGAGGAGCACTGCGA GC . PASS .
chr20 60520937 DEL:572151:1:1:11:0:0 C <DEL> . PASS
.
chr20 61766068 DEL:572433:0:0:5:2:0
CAGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCCACAGGGGAGGCAGGGCCCAGAGAGGAGGCGGGGCCACAGGGGAGGCGGGTCCCGGAGGGGAGGCGGGTCCCGGAGGGGAGGCAGGGCC
AC CGG . PASS .
chr20 62063686 DUP:TANDEM:572544:0:0:8:0:0 T <DUP:TANDEM>
. PASS .
Thanks
Jennifer
Jennifer Becq
Bioinformatics Scientist
Illumina Cambridge Ltd
Tel: +44 (0) 1799 532300
email: jb...@illumina.com
-----Original Message-----
From: Valerie Obenchain [mailto:voben...@fhcrc.org]
Sent: 20 November 2013 17:28
To: Becq, Jennifer; bioc-devel@r-project.org
Subject: Re: VariantAnnotation writeVcf problem
Hi Jennifer,
I've cc'd your message to the Bioconductor mailing list. We have two
lists, one for general questions and the other for bug reports/feature
requests. Please post future questions to one of these lists instead
of sending them to a single person. The lists reach a wider audience
and others can chime in with their responses/experience. You can find
info about the mailing lists here,
http://www.bioconductor.org/help/mailing-list/
writeVcf() should only write out '.' for ID if the ID is missing. There is no
restriction on the format of the ID. Can you provide a small sample of the vcf
file you're having trouble with (just a few lines is enough)? Also include the
output of your sessionInfo().
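One quick way to see what ended up in the ID column of a written file, without going through readVcf(), is to read the third tab-delimited field directly. This is a base R sketch; vcf_ids() is a hypothetical helper, and the two records below are a shortened stand-in for a real file:

```r
# Return the ID column (third field) from the data lines of a VCF file.
vcf_ids <- function(path) {
  lines <- readLines(path)
  records <- lines[!startsWith(lines, "#")]  # drop meta and header lines
  records <- records[nzchar(records)]        # drop empty lines
  vapply(strsplit(records, "\t", fixed = TRUE), `[`, character(1), 3)
}

# Tiny tab-delimited stand-in: one record with an ID, one with a missing ID.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "##fileformat=VCFv4.1",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr20\t14855644\tDEL:561590:0:1:0:0:0\tC\t<DEL>\t.\tPASS\t.",
  "chr20\t60520937\t.\tC\t<DEL>\t.\tPASS\t."
), tmp)

ids <- vcf_ids(tmp)
```

Running vcf_ids() on the file before and after writeVcf() shows whether any non-missing ID was replaced by '.'.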
Valerie
On 11/15/2013 08:56 AM, Becq, Jennifer wrote:
Hi Valerie,
I've been using VariantAnnotation for quite a while now and it's been great!
However I've just encountered a problem:
If I read in a VCF and re-write it directly, the ID column is lost:
every ID becomes "." instead of the original value (e.g.
"DEL:9586:0:1:0:0:0"), even though the rownames of my VCF object are
correctly populated with the original ID column.
> library(VariantAnnotation)
> in1 = readVcf("my.vcf.gz", "hg19")
> writeVcf(in1, "test.vcf")
I was wondering if that was because ID only accepts a specific format
(rsID or chr:pos)?
Thank you for your help
Jennifer
*Jennifer Becq*
*Bioinformatics Scientist*
*Illumina Cambridge Ltd*
Tel: +44 (0) 1799 532300
email: jb...@illumina.com