I guess it depends on what the strand should mean. Would having a negative strand indicate that the ref/alt should be complemented? I'm not sure it's a good idea to conflate the strand of the variant itself with the strand of an overlapping feature.
On Wed, Jun 10, 2015 at 1:17 PM, Robert Castelo <robert.cast...@upf.edu> wrote: > my understanding was that VRanges is a container for variants and variant > annotations and strand is just one annotation more. when we use > locateVariants() a variant can be annotated to multiple transcripts where in > one overlaps an exon, in another an intron and so on. In all those > transcripts annotations there is a strand annotation, the strand of the > transcript. if the transcript is annotated in the negative strand of the > reference chromosome then the annotation of a transcript region to a variant > is going to be also on the negative strand. > > both locateVariants() and predictCoding() return GRanges objects with strand > annotations according to the transcripts being annotated. I thought it made > sense in VariantFiltering to use VRanges objects as a container for > variants and annotations and, for this reason, I would like to carry on the > strand annotation into the VRanges object. Is there a strong reason for a > VRanges object, derived from GRanges, not to have strand? > > > On 6/10/15 6:26 PM, Michael Lawrence wrote: >> >> VRanges is supposed to enforce strand. The goal is to use "*" always, >> for simplicity and consistency with the result of readVcf(). Is there >> a use case for negative strand variants? >> >> On Wed, Jun 10, 2015 at 5:54 AM, Robert Castelo <robert.cast...@upf.edu> >> wrote: >>> >>> Michael, >>> >>> regarding our email exchange three weeks ago, I found a couple of places >>> in >>> VariantAnnotation that IMO need to be updated to avoid enforcing strand >>> on >>> VRanges. >>> >>> these places occur when constructing and validating VRanges objects, >>> concretely at: >>> >>> 1. file R/methods-VRanges-class.R at the VRanges class constructor: >>> >>> VRanges <- >>> function(seqnames = Rle(), ranges = IRanges(), >>> ref = character(), alt = NA_character_, >>> totalDepth = NA_integer_, refDepth = NA_integer_, >>> altDepth = NA_integer_, ..., sampleNames = NA_character_, >>> softFilterMatrix = FilterMatrix(matrix(nrow = length(gr), >>> ncol = >>> 0L), >>> FilterRules()), >>> hardFilters = FilterRules()) >>> { >>> gr <- GRanges(seqnames, ranges, >>> strand = .rleRecycleVector("*", length(ranges)), ...) >>> [...] >>> >>> that precludes setting the strand at construction time: >>> >>> library(VariantAnnotation) >>> VRanges(seqnames="chr1", ranges=IRanges(1, 5), ref="T", alt="C", >>> strand="-") >>> Error in GRanges(seqnames, ranges, strand = .rleRecycleVector("*", >>> length(ranges)), : >>> formal argument "strand" matched by multiple actual arguments >>> >>> >>> 2. R/AllClasses.R at the VRanges class validity function >>> .valid.VRanges(): >>> >>> .valid.VRanges.strand <- function(object) { >>> if (any(object@strand == "-")) >>> paste("'strand' must always be '+' or '*'") >>> } >>> >>> [...] >>> >>> .valid.VRanges <- function(object) >>> { >>> c(.valid.VRanges.ref(object), >>> .valid.VRanges.alt(object), >>> .valid.VRanges.strand(object), >>> .valid.VRanges.depth(object)) >>> } >>> >>> that prompts an error when variants annotated on the negative strand are >>> detected: >>> >>> library(VariantAnnotation) >>> example(VRanges) >>> strand(vr) <- "-" >>> c(vr) >>> Error in validObject(.Object) : >>> invalid class “VRanges” object: 'strand' must always be '+' or '*' >>> >>> >>> cheers, >>> >>> robert. >>> >>> On 05/22/2015 09:49 PM, Michael Lawrence wrote: >>>> >>>> This changed recently. VariantAnnotation in devel no longer enforces a >>>> strand on VRanges, or at least it allows the "*" case. >>>> >>>> >>>> On Fri, May 22, 2015 at 11:33 AM, Robert Castelo <robert.cast...@upf.edu >>>> <mailto:robert.cast...@upf.edu>> wrote: >>>> >>>> Hi, >>>> >>>> I have encountered myself in a strange situation when using the >>>> function locateVariants() from VariantAnnotation with an input >>>> VRanges object. The problem is that some of the expected coding >>>> annotations are not showing up when using locateVariants() with >>>> default parameters. >>>> >>>> After investigating this situation I think I found the reason, >>>> which >>>> does not look like a bug but I would like that you give me some >>>> clarification about the logic behind using locateVariants() with >>>> VRanges objects. >>>> >>>> The documentation of the VRanges-class says that in this class of >>>> objects "The strand is always constrained to be positive (+).". I >>>> guess there may be a good reason for this but I could not find it >>>> in >>>> the documentation or googling about it. >>>> >>>> This means that when you coerce a CollapsedVCF object (obtained, >>>> for >>>> example, from a VCF file via readVcf()) to a VRanges object, even >>>> though variants in the VCF may have no strand, they get a positive >>>> strand in the VRanges object. >>>> >>>> The problem arises then, when you call locateVariants() with this >>>> VRanges object, because features on the negative strand are never >>>> going to overlap with the variants since, by default, the argument >>>> ignore.strand=FALSE. >>>> >>>> Let me illustrate this with a toy example. Consider the SNP >>>> rs1129038 >>>> (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1129038) >>>> at >>>> chr15:28356859 with allels A/G. It is located on the 3' UTR of the >>>> gene HERC2 coded on the negative strand of the human reference >>>> genome. Let's build a toy VRanges object having this variant: >>>> >>>> library(VariantAnnotation) >>>> vr <- VRanges(seqnames="chr15", >>>> ranges=IRanges(28356859, 28356859), >>>> ref="A", alt="G", >>>> refDepth=5, altDepth=7, >>>> totalDepth=12, sampleNames="A") >>>> strand(vr) >>>> factor-Rle of length 1 with 1 run >>>> Lengths: 1 >>>> Values : + >>>> Levels(3): + - * >>>> >>>> Let's build now its CollapsedVCF counterpart by using the >>>> corresponding coercion method and set the strand to "*": >>>> >>>> vcf <- asVCF(vr) >>>> strand(vcf) <- "*" >>>> >>>> Now run locateVariants() on both objects with UCSC annotations: >>>> >>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene) >>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene >>>> >>>> locateVariants(vcf, txdb, region=AllVariants()) >>>> GRanges object with 2 ranges and 9 metadata columns: >>>> seqnames ranges strand | LOCATION LOCSTART >>>> LOCEND QUERYID TXID CDSID >>>> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer> >>>> <character> <IntegerList> >>>> [1] chr15 [28356859, 28356859] * | threeUTR 50 50 >>>> 1 55386 >>>> [2] chr15 [28356859, 28356859] * | threeUTR 50 50 >>>> 1 55387 >>>> GENEID PRECEDEID FOLLOWID >>>> <character> <CharacterList> <CharacterList> >>>> [1] 8924 >>>> [2] 8924 >>>> ------- >>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths >>>> >>>> locateVariants(vr, txdb, region=AllVariants()) >>>> GRanges object with 1 range and 9 metadata columns: >>>> seqnames ranges strand | LOCATION LOCSTART >>>> LOCEND QUERYID TXID CDSID >>>> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer> >>>> <integer> <IntegerList> >>>> [1] chr15 [28356859, 28356859] + | intergenic <NA> <NA> >>>> 1 <NA> >>>> GENEID PRECEDEID >>>> FOLLOWID >>>> <character> <CharacterList> <CharacterList> >>>> [1] <NA> 100132565,100289656,100616223,... 2567 >>>> ------- >>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths >>>> >>>> Note that while we get the 3' UTR annotation from the strandless >>>> VCF >>>> object we do not get it from the VRanges object with the positive >>>> strand. To make my point clear: this positive strand shows up when >>>> you coerce a strandless VCF object to a VRanges one, because >>>> positive strandness seems to be the convention for VRanges objects: >>>> >>>> as(vcf, VRanges) >>>> VRanges object with 1 range and 1 metadata column: >>>> seqnames ranges strand ref >>>> alt totalDepth refDepth altDepth >>>> <Rle> <IRanges> <Rle> <character> <characterOrRle> <integerOrRle> >>>> <integerOrRle> <integerOrRle> >>>> [1] chr15 [28356859, 28356859] + A >>>> G 12 5 7 >>>> sampleNames softFilterMatrix | QUAL >>>> <factorOrRle> <matrix> | <numeric> >>>> [1] A | <NA> >>>> ------- >>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths >>>> hardFilters: NULL >>>> >>>> >>>> Of course, if I run locateVariants() with the argument >>>> ignore.strand=TRUE, then I get the expected annotation: >>>> >>>> locateVariants(vr, txdb, region=AllVariants(), ignore.strand=TRUE) >>>> GRanges object with 2 ranges and 9 metadata columns: >>>> seqnames ranges strand | LOCATION LOCSTART >>>> LOCEND QUERYID TXID CDSID >>>> <Rle> <IRanges> <Rle> | <factor> <integer> <integer> <integer> >>>> <character> <IntegerList> >>>> [1] chr15 [28356859, 28356859] + | threeUTR 677 >>>> 677 1 55386 >>>> [2] chr15 [28356859, 28356859] + | threeUTR 677 >>>> 677 1 55387 >>>> GENEID PRECEDEID FOLLOWID >>>> <character> <CharacterList> <CharacterList> >>>> [1] 8924 >>>> [2] 8924 >>>> ------- >>>> seqinfo: 1 sequence from an unspecified genome; no seqlengths >>>> >>>> >>>> So, my question is, given that VRanges objects are enforced to have >>>> a positive strand, would not be better to have ignore.strand=TRUE >>>> as >>>> default in locateVariants? >>>> >>>> Alternatively, I would recommend that locateVariants() issues a >>>> warning, or maybe an error, when the input object is VRanges and >>>> ignore.strand=FALSE. >>>> >>>> Finally, out of curiosity, why a VRanges object enforces the >>>> positive strand in all its genomic ranges? Would not be better just >>>> taking the strand of the CollapsedVCF object when coercing the >>>> CollapsedVCF object to VRanges? >>>> >>>> >>>> thanks!! >>>> >>>> >>>> robert. >>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing >>>> list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>> >>>> >>> -- >>> Robert Castelo, PhD >>> Associate Professor >>> Dept. of Experimental and Health Sciences >>> Universitat Pompeu Fabra (UPF) >>> Barcelona Biomedical Research Park (PRBB) >>> Dr Aiguader 88 >>> E-08003 Barcelona, Spain >>> telf: +34.933.160.514 >>> fax: +34.933.160.550 >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel > > _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel