Related to the storage of a list inside a DataFrame (as a column),
I found 2 issues:

  df <- DataFrame(A=I(list(a=1:3, b="BB")))

1. The column name is not the one specified (X instead of A):

    > df
    DataFrame with 2 rows and 1 column
             X
        <list>
    1 ########
    2 ########

2. rbind() doesn't work as expected:

    > rbind(df, df)
    DataFrame with 3 rows and 4 columns
            X.a         X.b     X.a.1       X.b.1
      <integer> <character> <integer> <character>
    1         1          BB         1          BB
    2         2          BB         2          BB
    3         3          BB         3          BB

  or it can break:

    > df <- DataFrame(A=I(list(a=1:3, b=character(0))))
    > rbind(df, df)
    Error in DataFrame(cols) : cannot coerce class "list" to a DataFrame

This last issue will break c() on GRangesList objects that have mcols
of the kind I showed previously.

Cheers,
H.


> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.13.39 XVector_0.1.0         IRanges_1.19.28
[4] BiocGenerics_0.7.4

loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1

On 09/03/2013 02:40 PM, Hervé Pagès wrote:
Hi Julian, Michael,

Alternatively a trick is to use the outer mcols of the GRangesList
object. If the experimental metadata of each GRanges has the same
structure/fields, and those fields contain single values:

   library(GenomicRanges)
   gr1 <- GRanges()
   metadata(gr1) = list(a="1", b="hello")
   gr2 <- GRanges()
   metadata(gr2) = list(a="2", b="world")

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
                           b=c(metadata(gr1)$b, metadata(gr2)$b))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 2 columns
               a           b
     <character> <character>
   1           1       hello
   2           2       world
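
With more than a couple of GRanges objects, the per-field vectors can be built in a loop rather than typed out. Below is a minimal base R sketch under the same assumption as above (every metadata list has the same fields, each holding a single character value). The names `meta`, `fields`, `cols` and `tab` are only for illustration, and `meta` stands in for something like lapply(list(gr1, gr2), metadata):

```r
# Stand-in for lapply(list(gr1, gr2), metadata): one metadata list per GRanges.
meta <- list(list(a = "1", b = "hello"),
             list(a = "2", b = "world"))

# Assumption: all elements share the fields of the first one.
fields <- names(meta[[1]])

# One character vector per field, with one entry per GRanges.
cols <- lapply(setNames(fields, fields), function(f) {
  vapply(meta, function(m) m[[f]], character(1))
})

# Plain data.frame here; the same list of columns could presumably be
# handed to DataFrame() before assigning it to mcols(grl).
tab <- as.data.frame(cols, stringsAsFactors = FALSE)
```

vapply() with character(1) will stop with an error if a field is missing or holds more than one value, which is the situation where the list-column approach becomes necessary.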

If the experimental metadata fields are going to be completely
arbitrary:

   metadata(gr1) = list(a="1", b="hello")
   metadata(gr2) = list(a=c("2", "3"), z="foo", y=letters[1:3])

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 1 column
     metadata
       <list>
   1 ########
   2 ########

'mcols(grl)$metadata' is a list of lists:

   > mcols(grl)$metadata
   [[1]]
   [[1]]$a
   [1] "1"

   [[1]]$b
   [1] "hello"


   [[2]]
   [[2]]$a
   [1] "2" "3"

   [[2]]$z
   [1] "foo"

   [[2]]$y
   [1] "a" "b" "c"

Cheers,
H.


On 09/03/2013 06:47 AM, Julian Gehring wrote:
Hi Michael,

Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially
solves my issues.  Could you please add a small note to the
documentation that mentions the different behaviors for the two classes?

Best wishes
Julian


On 09/03/2013 03:34 PM, Michael Lawrence wrote:
If the number of GRanges is small (not thousands), and you don't need
the semantics of treating each GRanges as a "compound range", then use
GenomicRangesList(). It's a SimpleList, so metadata should be preserved.
It's the data structure for storing per-sample GRanges.

Michael


On Tue, Sep 3, 2013 at 2:39 AM, Julian Gehring <julian.gehr...@embl.de> wrote:

Hi Michael,

The use case is storing experimental metadata together with a GRanges
object, where the metadata does not fit the tabular structure of a
GRanges, and at a later stage storing several of these annotated
GRanges objects together as a list/GRangesList.

Best wishes
Julian



This second case is exactly what happens to the individual GRanges that
constitute the list. They are concatenated to form a single GRanges,
which is stored alongside a partitioning that defines the individual
elements. There are no longer two separate GRanges objects, so there is
no easy way to keep the metadata around. It's unfortunate that an
implementation detail is exposed in this way, but it would take some
effort to support this feature. This is a property of all CompressedList
derivatives. What's the use case?
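
To make the concatenate-plus-partitioning layout concrete, here is a minimal base R sketch. It is not the IRanges/GenomicRanges implementation: plain integer vectors stand in for GRanges objects, an attribute called `note` stands in for metadata(), and the names `unlisted`, `part_end` and `get_element` are hypothetical.

```r
# Two stand-ins for GRanges objects, each carrying its own "metadata".
x1 <- structure(1:3, note = "sample 1")
x2 <- structure(4:5, note = "sample 2")

# "Compressed" storage: one concatenated vector plus the end position of
# each original element. c() drops the per-element attributes.
unlisted <- c(x1, x2)
part_end <- cumsum(c(length(x1), length(x2)))

# Recover element i by slicing the concatenated vector.
get_element <- function(i) {
  start <- if (i == 1L) 1L else part_end[i - 1L] + 1L
  unlisted[start:part_end[i]]
}

get_element(2L)                # the values of x2
attr(get_element(2L), "note")  # NULL: the per-element attribute is gone
```

The values survive the round trip, but anything attached to the individual elements would have to be stored separately, at the level of the list.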






_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
