Re: [Bioc-devel] plotPCA for BiocGenerics

Ryan C. Thompson Fri, 31 Oct 2014 15:55:00 -0700

I'd just like to chime in that regardless of what approach is chosen, Idefinitely would appreciate a way to get the plot data without actuallymaking the plot. I often end up reimplementing plots in ggplot so thatI can easily customize some aspect of them, so in such cases I need away to just get the plot data/coordinates.

For example, if I have an edgeR DGEList and I want to get the X and Ycoordinates for the MDS plot, I need to do something like:


dev.new()
mds.coords <- plotMDS(dge)
dev.off()

which is kind of unfortunate.

So I guess this is more a reminder to people implementing plots to alsoimplement a way to get the plot data.


-Ryan

On Fri 31 Oct 2014 03:43:04 PM PDT, Steve Lianoglou wrote:

Hi,

On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
<thomas...@gmail.com> wrote:

With regards to abstraction - I would personally much rather read and write 
code that contained plotScores() and plotScree() etc. where the intend of the 
code is clearly communicated, instead of relying on a plot() function whose 
result is only known from experience. Trying to squeeze every kind of visual 
output into the same plot generic seems artificial and constrained to me. I 
totally agree on the plotPCA critique on the other hand...


If we've bought a ticket to ride on Kevin's and Michael's (and whoever
else) train of thought, wouldn't plot(pca(x), type='scree') or
plot(pca(x), type='scores') be the preferred way to go ... for some
definition of "preferable"?

-steve


Thomas

On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> wrote:

I strongly agree with Kevin's position. plotPCA() represents two separate 
concerns in its very name: the computation and the rendering. Those need to be 
separated, at least behind the scenes. The syntax of plot(pca(x)) is preferable 
to plotPCA, because the structure of the operation is represented by in the 
expression itself, not just in a non-computable function name.

With regard to how a plot,PCA should behave: there is always a tension between 
high-level and low-level APIs. In the end, we need multiple levels of 
abstraction.  While high-level APIs sacrifice flexibility, we need them because 
they communicate the high-level *intent* of the user in the code itself 
(self-documenting code), and they enable reusability, which not only reduces 
redudant effort but also ensures consistency. Once our brains no longer need to 
parse low-level code, we can focus our mental power on correctness and 
efficiency. To design a high-level API, one needs to carefully analyze user 
requirements, i.e., the use cases. To choose the default behavior, one needs to 
rate the use cases by their prevalance, and by how closely they match the 
intuition-based expectations of the user.

The fact that at least 9 packages are performing such a similar task seems to 
indicate that a common abstraction is warranted, but I am not sure if 
BiocGenerics is the appropriate place.

Michael

On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen <thomas...@gmail.com 
<mailto:thomas...@gmail.com>> wrote:
While I tend to agree with you that PCA is too big an operation to be hidden 
within a plotting function (MDS is an edge-case I would say), I can't see how 
we can ever reach a point where there is only one generic plot function. In the 
case of PCA there is a number of different plot-types that can all lay claim to 
the plot function of a PCA class, for instance scoreplot, scatterplot matrix of 
all scores, biplot, screeplot, accumulated R^2 barplot, leverage vs. 
distance-to-model... (you get the idea). So while having some very well-thought 
out classes for very common result types such as PCA, this class would still 
need a lot of different plot methods such as plotScores, plotScree etc (or 
plot(..., type='score'), but I don't find that very appealing). Expanding 
beyond PCA only muddles the water even more - there are very few interesting 
data structures that only have one visual representation to-rule-them-all...

just my 2c

best
Thomas

Date: Mon, 20 Oct 2014 18:50:48 -0400
From: Kevin Coombes <kevin.r.coom...@gmail.com 
<mailto:kevin.r.coom...@gmail.com>>

Well. I have two responses to that.

First, I think it would be a lot better/easier for users if (most)
developers could make use of the same plot function for "basic" classes
like PCA.

Second, if you think the basic PCA plotting routine needs enhancements,
you still have two options.  On the one hand, you could (as you said)
try to convince the maintainer of PCA to add what you want.  If it's
generally valuable, then he'd probably do it --- and other classes that
use it would benefit.  On the other hand, if it really is a special
enhancement that only makes sense for your class, then you can derive a
class from the basic PCA class
     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
  and implement your own version of the "plot" generic for this class.
And you could tweak the "as.PCA" function so it returns an object of the
mySpecialPCA class. And the user could still just "plot" the result
without hacving to care what's happening behind the scenes.

On 10/20/2014 5:59 PM, Michael Love wrote:

Ah, I see now. Personally, I don't think Bioconductor developers
should have to agree on single plotting functions for basic classes
like 'PCA' (because this logic applies equally to the situation of all
Bioconductor developers agreeing on single MA-plot, a single
variance-mean plot, etc). I think letting developers define their
plotPCA makes contributions easier (I don't have to ask the owner of
plot.PCA to incorporate something), even though it means we have a
growing list of generics.

Still you have a good point about splitting computation and plotting.
In practice, we subset the rows so PCA is not laborious.


On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
<kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com> 
<mailto:kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>>> wrote:

    Hi,

    I don't see how it needs more functions (as long as you can get
    developers to agree).  Suppose that someone can define a reusable
    PCA class.  This will contain a single "plot" generic function,
    defined once and reused by other classes. The existing "plotPCA"
    interface can also be implemented just once, in this class, as

        plotPCA <- function(object, ...) plot(as.PCA(object), ...)

    This can be exposed to users of your class through namespaces.
    Then the only thing a developer needs to implement in his own
    class is the single "as.PCA" function.  And he/she would have
    already been rquired to implement this as part of the old
    "plotPCA" function.  So it can be extracted from that, and the
    developer doesn't have to reimplement the visualization code from
    the PCA class.

    Best,
      Kevin


    On 10/20/2014 5:15 PM, davide risso wrote:

    Hi Kevin,

    I see your points and I agree (especially for the specific case
    of plotPCA that involves some non trivial computations).

    On the other hand, having a wrapper function that starting from
    the "raw" data gives you a pretty picture (with virtually zero
    effort by the user) using a sensible choice of parameters that
    are more or less OK for RNA-seq data is useful for practitioners
    that just want to look for patterns in the data.

    I guess it would be the same to have a PCA method for each of the
    objects and then using the plot method on those new objects, but
    that would just create a lot more objects and functions than the
    current approach (like Mike was saying).

    Your "as.pca" or "performPCA" approach would be definitely better
    if all the different methods would create objects of the *same*
    PCA class, but since we are talking about different packages, I
    don't know how easy it would be to coordinate. But perhaps this
    is the way we should go.

    Best,
    davide



    On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
    <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com> 
<mailto:kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>>> wrote:

        Hi,

        It depends.

        The "traditional" R approach to these matters is that you (a)
        first perform some sort of an analysis and save the results
        as an object and then (b) show or plot what you got.  It is
        part (b) that tends to be really generic, and (in my opinion)
        should have really generic names -- like "show" or "plot" or
        "hist" or "image".

        With PCA in particular, you usually have to perform a bunch
        of computations in order to get the principal components from
        some part of the data.  As I understand it now, these
        computations are performed along the way as part of the
        various "plotPCA" functions.  The "R way" to do this would be
        something like
            pca <- performPCA(mySpecialObject)  # or
        as.PCA(mySpecialObject)
            plot(pca) # to get the scatter plot
        This apporach has the user-friendly advantage that you can
        tweak the plot (in terms of colors, symbols, ranges, titles,
        etc) without having to recompute the principal components
        every time. (I often find myself re-plotting the same PCA
        several times, with different colors or symbols for different
        factrors associated with the samples.) In addition, you could
        then also do something like
            screeplot(pca)
        to get a plot of the percentages of variance explained.

        My own feeling is that if the object doesn't know what to do
        when you tell it to "plot" itself, then you haven't got the
        right abstraction.

        You may still end up needing generics for each kind of
        computation you want to perform (PCA, RLE, MA, etc), which is
        why I suggested an "as.PCA" function.  After all, "as" is
        already pretty generic.  In the long run, l this would herlp
        BioConductor developers, since they wouldn't all have to
        reimplement the visualization code; they would just have to
        figure out how to convert their own object into a PCA or RLE
        or MA object.

        And I know that this "plotWhatever" approach is used
        elsewhere in BioConductor, and it has always bothered me. It
        just seemed that a post suggesting a new generic function
        provided a reasonable opportunity to point out that there
        might be a better way.

        Best,
          Kevin

        PS: My own "ClassDicsovery" package, which is available from
        RForge via
        **|install.packages("ClassDiscovery",
        repos="http://R-Forge.R-project.org <http://r-forge.r-project.org/>"
        <http://R-Forge.R-project.org <http://r-forge.r-project.org/>>)|**
        includes a "SamplePCA" class that does something roughly
        similar to this for microarrays.

        PPS (off-topic): The worst offender in base R -- because it
        doesn't use this "typical" approch -- is the "heatmap"
        function.  Having tried to teach this function in several
        different classes, I have come to the conclusion that it is
        basically unusable by mortals.  And I think the problem is
        that it tries to combine too many steps -- clustering rows,
        clustering columns, scaling, visualization -- all in a single
        fiunction


        On 10/20/2014 3:47 PM, davide risso wrote:

        Hi Kevin,

        I don't agree. In the case of EDASeq (as I suppose it is the
        case for DESeq/DESeq2) plotting the principal components of
        the count matrix is only one of possible exploratory plots
        (RLE plots, MA plots, etc.).
        So, in my opinion, it makes more sense from an object
        oriented point of view to have multiple plotting methods for
        a single "RNA-seq experiment" object.

        In addition, this is the same strategy adopted elsewhere in
        Bioconductor, e.g., for the plotMA method.

        Just my two cents.

        Best,
        davide

        On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
        <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>
        <mailto:kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>>> 
wrote:

            I understand that breaking code is a problem, and that
            is admittedly the main reason not to immediately adopt
            my suggestion.

            But as a purely logical exercise, creating a "PCA"
            object X or something similar and using either
                plot(X)
            or
            plot(as.PCA(mySpecialObject))
            is a much more sensible use of object-oriented
            programming/design. This requires no new generics (to
            write or to learn).

            And you could use it to transition away from the current
            system by convincing the various package maintainers to
            re-implement plotPCA as follows:

            plotPCA <- function(object, ...) {
              plot(as.PCA(object), ...)
            }

            This would be relatively easy to eventually deprecate
            and teach users to switch to the alternative.


            On 10/20/2014 1:07 PM, Michael Love wrote:

            hi Kevin,

            that would imply there is only one way to plot an
            object of a given class. Additionally, it would break a
            lot of code.?

            best,

            Mike

            On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
            <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>
            <mailto:kevin.r.coom...@gmail.com 
<mailto:kevin.r.coom...@gmail.com>>> wrote:

                But shouldn't they all really just be named "plot"
                for the appropriate objects?  In which case, there
                would already be a perfectly good generic....

                On Oct 20, 2014 10:27 AM, "Michael Love"
                <michaelisaiahl...@gmail.com 
<mailto:michaelisaiahl...@gmail.com>
                <mailto:michaelisaiahl...@gmail.com 
<mailto:michaelisaiahl...@gmail.com>>> wrote:

                    I noticed that 'plotPCA' functions are defined
                    in EDASeq, DESeq2, DESeq,
                    affycoretools, Rcade, facopy, CopyNumber450k,
                    netresponse, MAIT (maybe
                    more).

                    Sounds like a case for BiocGenerics.

                    best,

                    Mike

                    [[alternative HTML version deleted]]

                    _______________________________________________
                    Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
                    <mailto:Bioc-devel@r-project.org 
<mailto:Bioc-devel@r-project.org>> mailing list
                    https://stat.ethz.ch/mailman/listinfo/bioc-devel 
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>




            
------------------------------------------------------------------------
            <http://www.avast.com/ <http://www.avast.com/>>

            This email is free from viruses and malware because
            avast! Antivirus <http://www.avast.com/ <http://www.avast.com/>> 
protection is
            active.





        --
        Davide Risso, PhD
        Post Doctoral Scholar
        Division of Biostatistics
        School of Public Health
        University of California, Berkeley
        344 Li Ka Shing Center, #3370
        Berkeley, CA 94720-3370
        E-mail: davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu>
        <mailto:davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu>>




        ------------------------------------------------------------------------
        <http://www.avast.com/ <http://www.avast.com/>>

        This email is free from viruses and malware because avast!
        Antivirus <http://www.avast.com/ <http://www.avast.com/>> protection is 
active.





    --
    Davide Risso, PhD
    Post Doctoral Scholar
    Division of Biostatistics
    School of Public Health
    University of California, Berkeley
    344 Li Ka Shing Center, #3370
    Berkeley, CA 94720-3370
    E-mail: davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu> 
<mailto:davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu>>




    ------------------------------------------------------------------------
    <http://www.avast.com/ <http://www.avast.com/>>

    This email is free from viruses and malware because avast!
    Antivirus <http://www.avast.com/ <http://www.avast.com/>> protection is 
active.




---
This email is free from viruses and malware because avast! Antivirus protection 
is active.


       [[alternative HTML version deleted]]



------------------------------

_______________________________________________
Bioc-devel mailing list
Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioc-devel 
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>


End of Bioc-devel Digest, Vol 127, Issue 43
*******************************************


_______________________________________________
Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel 
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>



         [[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] plotPCA for BiocGenerics

Reply via email to