Thanks for the comment, Nadhir.

I had considered using a sparse matrix class. I didn't implement it originally because truly sparse interaction data would be better represented by simply working with the pairwise format in the InteractionSet. You need the row/column indices to pass to the sparseMatrix constructor anyway, so a memory-efficient algorithm for, say, compartment identification could just use those indices directly.
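
For illustration, here is a minimal sketch of working from the pairwise format directly. It is written in Python purely as a language-neutral example and is not InteractionSet code. It assumes interactions are held as three parallel arrays, hypothetically named `anchor1`, `anchor2` and `counts`, with each unordered bin pair stored once. It computes the product of the implied symmetric contact matrix with a vector without ever building that matrix. A matrix-vector product like this is the kind of primitive an iterative method (e.g. power iteration for a leading eigenvector) could be built on, and its memory use scales with the number of stored pairs rather than with the square of the number of bins.

```python
from typing import List, Sequence


def pairwise_matvec(anchor1: Sequence[int], anchor2: Sequence[int],
                    counts: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for the symmetric contact matrix A implied by the pairs.

    Each (anchor1[k], anchor2[k], counts[k]) is one bin pair. Only one
    orientation is stored, so an off-diagonal pair contributes to both y[i]
    and y[j]; a diagonal pair (i == j) contributes once.
    """
    y = [0.0] * len(x)
    for i, j, c in zip(anchor1, anchor2, counts):
        y[i] += c * x[j]
        if i != j:
            # Mirror the entry: A[j, i] == A[i, j] for a symmetric contact matrix.
            y[j] += c * x[i]
    return y
```

With `x` set to all ones, the result is each bin's total contact count. For example, `pairwise_matvec([0, 1, 2], [0, 0, 1], [2, 3, 1], [1, 1, 1])` gives `[5.0, 4.0, 1.0]`.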

Most existing algorithms for this (e.g., k-means or hierarchical clustering) won't operate natively on a sparseMatrix, and I suspect they'll just call as.matrix() to convert it to a full matrix. Obviously, this would defeat the purpose of using a sparse matrix. So, if you have to rewrite the algorithms anyway, you might as well rewrite them in a manner that avoids needing a sparseMatrix as a middleman.

Nonetheless, it's a good point about memory usage, and I'll have a think about it. A sparseMatrix would help a bit, but as coverage increases for these experiments, the matrix will probably become fairly dense (even if some bin pairs only have counts of 1). Even now, compartment detection involves bins large enough that sparseness usually isn't observed. Perhaps a big.matrix (from the bigmemory package) would be a better choice.
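
The density argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is in Python for illustration only. It assumes 8-byte double counts and a triplet layout that stores two 4-byte integer indices per non-zero bin pair, and it ignores symmetry, object overhead and any package-specific layout. Under those assumptions, sparse storage costs 16 bytes per stored pair against 8 bytes per dense entry, so it only saves memory while fewer than half of the bin pairs are non-zero.

```python
def dense_bytes(n_bins: int, value_bytes: int = 8) -> int:
    """Memory for a full n_bins x n_bins matrix of counts."""
    return n_bins * n_bins * value_bytes


def triplet_bytes(n_nonzero: int, index_bytes: int = 4, value_bytes: int = 8) -> int:
    """Memory for (row index, column index, value) triplets of the non-zero pairs."""
    return n_nonzero * (2 * index_bytes + value_bytes)


def break_even_density(index_bytes: int = 4, value_bytes: int = 8) -> float:
    """Fraction of non-zero entries at which triplet and dense storage cost the same.

    Solving N^2 * value_bytes == d * N^2 * (2 * index_bytes + value_bytes) for d.
    """
    return value_bytes / (2 * index_bytes + value_bytes)


if __name__ == "__main__":
    n = 10_000
    nnz = n * n * 3 // 10  # hypothetical 30% of bin pairs non-zero
    print(dense_bytes(n) / 1e6, "MB dense")
    print(triplet_bytes(nnz) / 1e6, "MB as triplets")
    print("break-even density:", break_even_density())
```

For example, 10,000 bins give an 800 MB dense matrix. If (hypothetically) 30% of bin pairs were non-zero, the triplet form would still need 480 MB. A compressed-column layout stores one row index per non-zero entry plus one pointer per column, which pushes the break-even density higher (roughly two-thirds under the same assumptions, ignoring the column pointers). Even so, the saving still shrinks quickly as coverage grows.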

Cheers,

Aaron


On 16/11/15 09:58, DJEKIDEL MOHAMED NADHIR wrote:
Hi Aaron,

Sounds like a great initiative.
I just have some comments about the ContactMatrix class.

I think that, with increasing Hi-C resolution, using the matrix class
will consume a lot of memory.
Using sparseMatrix from the Matrix package might have a smaller footprint.

It can also be manipulated in C++ using RcppEigen if, for example, you
plan to add functionality such as A/B domains or insulation scores.

Regards,

- Nadhir

On Mon, Nov 16, 2015 at 5:33 PM, Aaron Lun <a...@wehi.edu.au> wrote:

    Hello all,

    I thought I might give an update on the state of affairs for the
    InteractionSet package. Currently, there are three classes:

    - the GInteractions class, inheriting from Vector and intended to
    represent pairwise interactions between genomic regions (based on
    suggestions from Malcolm Perry and Liz Ing-Simmons).

    - the InteractionSet class, inheriting from SummarizedExperiment0
    and containing a GInteractions object, intended to store
    experimental data about pairwise interactions (one interaction per row).

    - the ContactMatrix class, inheriting from Annotated and storing
    data in matrix form (where rows/columns represent genomic regions).

    Getters, setters, conversion methods between classes, distance
    calculation methods and overlap methods have been implemented. Man
    pages and "testthat" scripts have also been written. A vignette is
    still missing, though it should be easy enough to write one.
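
As a rough illustration of what converting between the pairwise and matrix representations involves, the Python sketch below fills a dense matrix from pairwise anchor indices and counts. Its rows and columns are indexed by chosen subsets of the shared regions. It is not the package's actual conversion method, and the function name and arguments are hypothetical. It assumes interactions are unordered, so each pair is placed in whichever orientation(s) fall inside the requested rows and columns. Pairs that fit neither orientation are dropped.

```python
from typing import List, Sequence


def pairs_to_matrix(anchor1: Sequence[int], anchor2: Sequence[int],
                    counts: Sequence[float], row_regions: Sequence[int],
                    col_regions: Sequence[int], fill: float = 0.0) -> List[List[float]]:
    """Build a dense row-by-column matrix from pairwise interactions."""
    # Map each region index to its position in the matrix rows/columns.
    row_pos = {r: k for k, r in enumerate(row_regions)}
    col_pos = {c: k for k, c in enumerate(col_regions)}
    mat = [[fill] * len(col_regions) for _ in row_regions]
    for a1, a2, count in zip(anchor1, anchor2, counts):
        # Treat each interaction as unordered: try both orientations.
        for r, c in ((a1, a2), (a2, a1)):
            if r in row_pos and c in col_pos:
                mat[row_pos[r]][col_pos[c]] = count
    return mat
```

For example, `pairs_to_matrix([0, 1, 2, 3], [0, 0, 1, 2], [2, 3, 1, 5], [0, 1], [0, 1, 2])` gives a 2 x 3 matrix with rows `[2, 3, 0]` and `[3, 0, 1]`. The pair between regions 3 and 2 is dropped because neither orientation falls inside the requested rows and columns.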

    All in all, I think it's a solid first draft. Any comments would be
    appreciated.

    Cheers,

    Aaron

    On 08/11/15 19:31, Aaron Lun wrote:

        Okay, some meat and bones are on GitHub now:

        https://github.com/LTLA/InteractionSet

        The idea is to represent genomic interactions as pairs of genomic
        regions, using indices to point to a common GRanges object (a la
        Hits, though I haven't used that explicitly due to the presence of
        additional constraints on the indices). Data for each interaction
        is stored using a SummarizedExperiment framework (one row per
        interaction).
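
Here is a minimal sketch of the index-based idea, in Python for illustration only. A plain list of `(chromosome, start, end)` tuples stands in for the shared GRanges object, and each interaction is just a pair of integer indices into it. The class name is hypothetical. The index constraint shown (storing the larger index first, so that `(i, j)` and `(j, i)` describe the same interaction) is an assumed example; the message does not spell out which constraints InteractionSet actually imposes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (chromosome, start, end): a stand-in for one entry of a GRanges object.
Region = Tuple[str, int, int]


@dataclass
class PairedIndices:
    """Interactions stored as index pairs into one shared set of regions."""
    regions: List[Region]
    anchor1: List[int] = field(default_factory=list)
    anchor2: List[int] = field(default_factory=list)

    def add(self, i: int, j: int) -> None:
        n = len(self.regions)
        if not (0 <= i < n and 0 <= j < n):
            raise IndexError("anchor index outside the shared region set")
        # Hypothetical constraint: store the larger index first so that
        # (i, j) and (j, i) map to the same interaction.
        self.anchor1.append(max(i, j))
        self.anchor2.append(min(i, j))

    def pair(self, k: int) -> Tuple[Region, Region]:
        """Resolve the k-th interaction back to its two regions."""
        return self.regions[self.anchor1[k]], self.regions[self.anchor2[k]]
```

Storing each region once avoids duplicating regions that take part in many interactions.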

        With regard to the methods, most of the low-hanging fruit has
        been implemented, courtesy of inheriting from SummarizedExperiment0.
        I'll add proper unit tests over the coming week. It currently passes
        R CMD check okay, except for a warning about ":::" in the
        cbind/rbind definitions (callNextMethod() didn't seem to work inside
        those methods, and I didn't want to rewrite the SE0 binding methods).

        Any thoughts appreciated.

        - Aaron

        On 07/11/15 19:33, Morgan, Martin wrote:

            Just to say that this is a great idea. If this starts as a
            GitHub package (or in svn; we can create a location for you if
            you'd like), I and others would, I am sure, be happy to try to
            provide any guidance / insight. The main design principles are
            probably to reuse as much as possible from existing classes,
            especially the S4Vectors / GRanges world, and to integrate
            metadata as appropriate (like SummarizedExperiment, for
            instance).

            Martin
            ________________________________________
            From: Bioc-devel [bioc-devel-boun...@r-project.org] on behalf of
            Aaron Lun [a...@wehi.edu.au]
            Sent: Thursday, November 05, 2015 12:27 PM
            To: bioc-devel@r-project.org
            Subject: Re: [Bioc-devel] Base class for interaction data -
            expressions of interest

            There's a growing number of Bioconductor packages dealing with
            interaction data: diffHic, GenomicInteractions and HiTC, to name
            a few (and probably more in the future). Each of these packages
            defines its own class to store interaction data: DIList for
            diffHic, GenomicInteractions for GenomicInteractions, and HTClist
            for HiTC.

            These classes seem to share a lot of features, which suggests
            that they can be (easily?) replaced with a common class. This
            would have two advantages: first, developers of new and existing
            packages don't have to continually write and maintain new
            classes; and second, users get a consistent experience across
            the relevant packages.

            My question is, does anybody have anything in the pipeline with
            respect to a base package for an interaction class? If not, I'm
            planning to put something together for the next BioC release. To
            this end, I'd welcome any ideas/input/code; the aim is to make a
            drop-in replacement (insofar as that's possible) for the existing
            classes in each package.

            Cheers,

            Aaron

            _______________________________________________
            Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
            mailing list
            https://stat.ethz.ch/mailman/listinfo/bioc-devel



