Hi Pariksheet,

This is the expected behavior.

Some background: BSgenomeViews are list-like objects where the *list
elements* (i.e. the elements one extracts with [[) are the DNA
sequences from the views:

  > views
  BSgenomeViews object with 2 views and 0 metadata columns:
        seqnames         ranges strand                       dna
           <Rle>      <IRanges>  <Rle>            <DNAStringSet>
    [1]     chr1 [25001, 28000]      * [GCTTCAGCCT...TTATTTATTG]
    [2]     chr2 [30001, 31000]      * [GACCCTCCTG...AGCAGGTGGT]
    -------
    seqinfo: 93 sequences (1 circular) from hg19 genome

  > views[[1]]
    3000-letter "DNAString" instance
seq: GCTTCAGCCTGCACAGATAGGGGAGTAGGGGACAGA...CTGTCGTCTGAATTCCAAGCTTTTTGTTATTTATTG
  > views[[2]]
    1000-letter "DNAString" instance
seq: GACCCTCCTGTTGGCTTTTTTACAAGACTGCTATGT...TGCTGGGACAGTCATGGGCAAACCAGAGCAGGTGGT

When subsetting with [ I get:

  > views[1]
  BSgenomeViews object with 1 view and 0 metadata columns:
        seqnames         ranges strand                       dna
           <Rle>      <IRanges>  <Rle>            <DNAStringSet>
    [1]     chr1 [25001, 28000]      * [GCTTCAGCCT...TTATTTATTG]
    -------
    seqinfo: 93 sequences (1 circular) from hg19 genome

  > views[2]
  BSgenomeViews object with 1 view and 0 metadata columns:
        seqnames         ranges strand                       dna
           <Rle>      <IRanges>  <Rle>            <DNAStringSet>
    [1]     chr2 [30001, 31000]      * [GACCCTCCTG...AGCAGGTGGT]
    -------
    seqinfo: 93 sequences (1 circular) from hg19 genome

The important difference is that with [[ I get a DNAString object
(the content of the view) and with [ I get a BSgenomeViews object
of length 1.

The semantic of lapply() is to always use [[ internally to extract
elements. This is why 'lapply(views, class)' returns "DNAString".
If you want to operate on the length-one Views objects, you need to
do:

  lapply(seq_along(views), function(i) { do something on views[i] })

Hope this helps,
H.


On 04/05/2017 08:13 PM, Pariksheet Nanda wrote:
Hi bioconductor devs,

The BSgenomeViews class has been very useful in efficiently propagating
metadata for running Biostring operations.  I noticed something unexpected
when iterating over views - it seems to return the Biostrings object
instead of a single length Views object, and thus loses the associated view
metadata.  Is this intentional?  Below is some example code, the output and
sessionInfo().  Yes, I also confirmed this happens in the development
version of R / bioconductor 3.5.

On a side note, for unit testing it's been difficult to mock a BSgenome
object due to the link to physical files, and as a workaround I use a
small, arbitrary BSgenome package.  Can one construct a BSgenome from its
package bundled extdata?  The man page examples use packaged genomes.

Code to reproduce the issue:

----------------------------------------------------------------------
library(BSgenome)
genome <- getBSgenome("BSgenome.Hsapiens.UCSC.hg19")
gr <- GRanges(c("chr1:25001-28000", "chr2:30001-31000"))
views <- Views(genome, gr)
views
lapply(views, class)
----------------------------------------------------------------------

Result:

----------------------------------------------------------------------
views
BSgenomeViews object with 2 views and 0 metadata columns:
      seqnames         ranges strand                       dna
         <Rle>      <IRanges>  <Rle>            <DNAStringSet>
  [1]     chr1 [25001, 28000]      * [GCTTCAGCCT...TTATTTATTG]
  [2]     chr2 [30001, 31000]      * [GACCCTCCTG...AGCAGGTGGT]
  -------
  seqinfo: 93 sequences (1 circular) from hg19 genome
lapply(views, class)
[[1]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"

[[2]]
[1] "DNAString"
attr(,"package")
[1] "Biostrings"


----------------------------------------------------------------------

Tested against these configurations:
1) R 3.3.2 + BSgenome 1.42.0 (stable bioconductor 3.4)
2) R 2017-04-05 installed via llnl/spack + BSgenome 1.43.7 (devel
bioconductor 3.5)

sessionInfo for configuration #2 above:
----------------------------------------------------------------------
sessionInfo()
R Under development (unstable) (2017-04-05 r72488)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS:
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRblas.so
LAPACK:
/share/apps/spack/opt/spack/linux-ubuntu16-x86_64/gcc-5.4.0/r-2017-04-05-4tkzhsu6sdpwmlvnv275jf6x766gwnpy/rlib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.43.7
 [3] rtracklayer_1.35.10               Biostrings_2.43.7
 [5] XVector_0.15.2                    GenomicRanges_1.27.23
 [7] GenomeInfoDb_1.11.10              IRanges_2.9.19
 [9] S4Vectors_0.13.15                 BiocGenerics_0.21.3

loaded via a namespace (and not attached):
 [1] zlibbioc_1.21.0            GenomicAlignments_1.11.12
 [3] BiocParallel_1.9.5         lattice_0.20-35
 [5] tools_3.5.0                SummarizedExperiment_1.5.7
 [7] grid_3.5.0                 Biobase_2.35.1
 [9] matrixStats_0.52.1         Matrix_1.2-9
[11] GenomeInfoDbData_0.99.0    bitops_1.0-6
[13] RCurl_1.95-4.8             DelayedArray_0.1.7
[15] compiler_3.5.0             Rsamtools_1.27.15
[17] XML_3.98-1.6
BiocInstaller::biocValid()
[1] TRUE


---
Pariksheet Nanda
PhD Candidate in Genetics and Genomics
System Administrator, Storrs HPC Cluster
University of Connecticut

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=4f5sS-9Z9WKeGLCHw3KM82mea8ioHJa_FrqleAw5Otc&s=urNRGmIvz9AzopqsBNZqNtWkPlNZnm5w3mZ_QML6aes&e=


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to