Hi Jialin,

Thanks for the excellent report. These "show" methods, like
many others in Bioconductor, rely on the low-level helper showAsCell(),
which was not working properly on data-frame-like or array-like
objects with a single column, or on SplitDataFrameList objects.

This should now be addressed. The fix is in S4Vectors 0.14.5
(release) and 0.15.10 (devel). Both should become available
via biocLite() in about 24 hours.
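
For anyone following along, a rough way to check whether an installed
S4Vectors already carries the fix is to compare its version against the
two numbers above. The sketch below uses only base R. has_show_fix() is
a hypothetical helper name, not something S4Vectors provides, and it
assumes that any version below 0.15.0 belongs to the release line.

```r
# Hypothetical helper (not part of S4Vectors): TRUE if the given
# version is at least 0.14.5 (release line) or 0.15.10 (devel line).
has_show_fix <- function(version) {
  v <- package_version(version)
  if (v < "0.15.0") {
    v >= "0.14.5"   # release line
  } else {
    v >= "0.15.10"  # devel line
  }
}

has_show_fix("0.15.8")  # FALSE: the devel version from the report

# Check the installed copy, if there is one.
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  has_show_fix(as.character(packageVersion("S4Vectors")))
}
```

Comparing through package_version() rather than as plain strings
matters here, since "0.15.8" sorts after "0.15.10" as text.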

Let us know if you still see "show" problems after you update.

Thanks,
H.

On 09/28/2017 01:19 AM, Jialin Ma wrote:
Dear all,

I have a package under review at
https://github.com/Bioconductor/Contributions/issues/487, in which I
would like to use a GRanges with nested data.frame or DataFrameList to
represent the track data internally.

However, the default show method does not seem to work well with such
structures.

I have an example for GRanges in which one meta-column is a one-column
data frame:

     gr <- GRanges("chr21", IRanges(1:5, width = 1))
     gr$df <- data.frame(x = 1:5)
     show(gr)

     GRanges object with 5 ranges and 1 metadata column:
     Error in .Method(..., deparse.level = deparse.level) :
       number of rows of matrices must match (see arg 3)

However, if the nested data frame has two columns, it can be printed
out correctly:

     gr <- GRanges("chr21", IRanges(1:5, width = 1))
     gr$df <- data.frame(x = 1:5, y = 11:15)
     show(gr)

     GRanges object with 5 ranges and 1 metadata column:
           seqnames    ranges strand |           df
              <Rle> <IRanges>  <Rle> | <data.frame>
       [1]    chr21    [1, 1]      * |         1:11
       [2]    chr21    [2, 2]      * |         2:12
       [3]    chr21    [3, 3]      * |         3:13
       [4]    chr21    [4, 4]      * |         4:14
       [5]    chr21    [5, 5]      * |         5:15
       -------
       seqinfo: 1 sequence from an unspecified genome; no seqlengths

In some cases, it is printed with a warning message, but the format
is wrong:

     gr <- GRanges("chr21", IRanges(1:5, width = 1), emm = 6:10)
     gr$df <- data.frame(x = 1:5)
     show(gr)

     # The nested df is not printed in the correct format; it has only
     # one column.

     GRanges object with 5 ranges and 2 metadata columns:
           seqnames    ranges strand |       emm           df
              <Rle> <IRanges>  <Rle> | <integer> <data.frame>
       [1]    chr21    [1, 1]      * |         6    1,2,3,...
       [2]    chr21    [2, 2]      * |         7    1,2,3,...
       [3]    chr21    [3, 3]      * |         8    1,2,3,...
       [4]    chr21    [4, 4]      * |         9    1,2,3,...
       [5]    chr21    [5, 5]      * |        10    1,2,3,...
       -------
       seqinfo: 1 sequence from an unspecified genome; no seqlengths
     Warning message:
     In (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
       row names were found from a short variable and have been discarded

A nested DataFrameList cannot be printed:

     DF <- DataFrame(x = 1:2)
     DF$split = split(DataFrame(aa = 1:4), c(1,1,2,2))
     show(DF)

     DataFrame with 2 rows and 2 columns
     Error in dim(object) <- c(nrow(object), prod(tail(dim(object), -1))) :
       invalid first argument

     class(DF$split)

     [1] "CompressedSplitDataFrameList"
     attr(,"package")
     [1] "IRanges"

In the case above, I understand that it is hard to create a short
string representation of the nested structure, but I think printing
dimensions of the nested element may be sufficient.

Any comments?

Best,
Jialin

     -----------
     Session Info:

     R version 3.4.1 (2017-06-30)
     Platform: x86_64-suse-linux-gnu (64-bit)
     Running under: openSUSE Tumbleweed

     Matrix products: default
     BLAS: /usr/lib64/R/lib/libRblas.so
     LAPACK: /usr/lib64/R/lib/libRlapack.so

     locale:
      [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
      [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
      [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
      [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
      [9] LC_ADDRESS=C               LC_TELEPHONE=C
     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

     attached base packages:
     [1] stats4    parallel  stats     graphics  grDevices utils     datasets
     [8] methods   base

     other attached packages:
     [1] Biobase_2.37.2        GenomicRanges_1.29.14 GenomeInfoDb_1.13.4
     [4] IRanges_2.11.17       S4Vectors_0.15.8      BiocGenerics_0.23.1
     [7] magrittr_1.5

     loaded via a namespace (and not attached):
     [1] zlibbioc_1.23.0         compiler_3.4.1          XVector_0.17.1
     [4] tools_3.4.1             GenomeInfoDbData_0.99.1 RCurl_1.95-4.8
     [7] ulimit_0.0-3            bitops_1.0-6

     r$> DF$split <- DF$split %>% as.list %>% lapply(as.data.frame)

     r$> DF

     DataFrame with 2 rows and 2 columns
               x  split
       <integer> <list>
     1         1    1,2
     2         2    3,4

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319