Re: [Bioc-devel] plotPCA for BiocGenerics

Tim Triche, Jr. Fri, 31 Oct 2014 16:20:07 -0700

A drag, since hist(foo, plot=FALSE) acts as expected. I emulate this behavior 
for most of my code nowadays, not least since I often want to return something 
useful, whether plotted or not.


--t

> On Oct 31, 2014, at 3:53 PM, "Ryan C. Thompson" <r...@thompsonclan.org> wrote:
> 
> I'd just like to chime in that regardless of what approach is chosen, I 
> definitely would appreciate a way to get the plot data without actually 
> making the plot. I often end up reimplementing plots in ggplot so that I can 
> easily customize some aspect of them, so in such cases I need a way to just 
> get the plot data/coordinates.
> 
> For example, if I have an edgeR DGEList and I want to get the X and Y 
> coordinates for the MDS plot, I need to do something like:
> 
> dev.new()
> mds.coords <- plotMDS(dge)
> dev.off()
> 
> which is kind of unfortunate.
> 
> So I guess this is more a reminder to people implementing plots to also 
> implement a way to get the plot data.
> 
> -Ryan
> 
>> On Fri 31 Oct 2014 03:43:04 PM PDT, Steve Lianoglou wrote:
>> Hi,
>> 
>> On Fri, Oct 31, 2014 at 2:35 PM, Thomas Lin Pedersen
>> <thomas...@gmail.com> wrote:
>>> With regards to abstraction - I would personally much rather read and write 
>>> code that contained plotScores() and plotScree() etc. where the intend of 
>>> the code is clearly communicated, instead of relying on a plot() function 
>>> whose result is only known from experience. Trying to squeeze every kind of 
>>> visual output into the same plot generic seems artificial and constrained 
>>> to me. I totally agree on the plotPCA critique on the other hand...
>> 
>> If we've bought a ticket to ride on Kevin's and Michael's (and whoever
>> else) train of thought, wouldn't plot(pca(x), type='scree') or
>> plot(pca(x), type='scores') be the preferred way to go ... for some
>> definition of "preferable"?
>> 
>> -steve
>> 
>>> 
>>> Thomas
>>> 
>>> 
>>>> On 31 Oct 2014, at 22:09, Michael Lawrence <lawrence.mich...@gene.com> 
>>>> wrote:
>>>> 
>>>> I strongly agree with Kevin's position. plotPCA() represents two separate 
>>>> concerns in its very name: the computation and the rendering. Those need 
>>>> to be separated, at least behind the scenes. The syntax of plot(pca(x)) is 
>>>> preferable to plotPCA, because the structure of the operation is 
>>>> represented by in the expression itself, not just in a non-computable 
>>>> function name.
>>>> 
>>>> With regard to how a plot,PCA should behave: there is always a tension 
>>>> between high-level and low-level APIs. In the end, we need multiple levels 
>>>> of abstraction.  While high-level APIs sacrifice flexibility, we need them 
>>>> because they communicate the high-level *intent* of the user in the code 
>>>> itself (self-documenting code), and they enable reusability, which not 
>>>> only reduces redudant effort but also ensures consistency. Once our brains 
>>>> no longer need to parse low-level code, we can focus our mental power on 
>>>> correctness and efficiency. To design a high-level API, one needs to 
>>>> carefully analyze user requirements, i.e., the use cases. To choose the 
>>>> default behavior, one needs to rate the use cases by their prevalance, and 
>>>> by how closely they match the intuition-based expectations of the user.
>>>> 
>>>> The fact that at least 9 packages are performing such a similar task seems 
>>>> to indicate that a common abstraction is warranted, but I am not sure if 
>>>> BiocGenerics is the appropriate place.
>>>> 
>>>> Michael
>>>> 
>>>> On Tue, Oct 21, 2014 at 12:54 AM, Thomas Dybdal Pedersen 
>>>> <thomas...@gmail.com <mailto:thomas...@gmail.com>> wrote:
>>>> While I tend to agree with you that PCA is too big an operation to be 
>>>> hidden within a plotting function (MDS is an edge-case I would say), I 
>>>> can't see how we can ever reach a point where there is only one generic 
>>>> plot function. In the case of PCA there is a number of different 
>>>> plot-types that can all lay claim to the plot function of a PCA class, for 
>>>> instance scoreplot, scatterplot matrix of all scores, biplot, screeplot, 
>>>> accumulated R^2 barplot, leverage vs. distance-to-model... (you get the 
>>>> idea). So while having some very well-thought out classes for very common 
>>>> result types such as PCA, this class would still need a lot of different 
>>>> plot methods such as plotScores, plotScree etc (or plot(..., 
>>>> type='score'), but I don't find that very appealing). Expanding beyond PCA 
>>>> only muddles the water even more - there are very few interesting data 
>>>> structures that only have one visual representation to-rule-them-all...
>>>> 
>>>> just my 2c
>>>> 
>>>> best
>>>> Thomas
>>>> 
>>>> 
>>>>> Date: Mon, 20 Oct 2014 18:50:48 -0400
>>>>> From: Kevin Coombes <kevin.r.coom...@gmail.com 
>>>>> <mailto:kevin.r.coom...@gmail.com>>
>>>>> 
>>>>> Well. I have two responses to that.
>>>>> 
>>>>> First, I think it would be a lot better/easier for users if (most)
>>>>> developers could make use of the same plot function for "basic" classes
>>>>> like PCA.
>>>>> 
>>>>> Second, if you think the basic PCA plotting routine needs enhancements,
>>>>> you still have two options.  On the one hand, you could (as you said)
>>>>> try to convince the maintainer of PCA to add what you want.  If it's
>>>>> generally valuable, then he'd probably do it --- and other classes that
>>>>> use it would benefit.  On the other hand, if it really is a special
>>>>> enhancement that only makes sense for your class, then you can derive a
>>>>> class from the basic PCA class
>>>>>     setClass("mySpecialPCA", contains=c("PCA"), *other stuff here*)
>>>>>  and implement your own version of the "plot" generic for this class.
>>>>> And you could tweak the "as.PCA" function so it returns an object of the
>>>>> mySpecialPCA class. And the user could still just "plot" the result
>>>>> without hacving to care what's happening behind the scenes.
>>>>> 
>>>>>> On 10/20/2014 5:59 PM, Michael Love wrote:
>>>>>> Ah, I see now. Personally, I don't think Bioconductor developers
>>>>>> should have to agree on single plotting functions for basic classes
>>>>>> like 'PCA' (because this logic applies equally to the situation of all
>>>>>> Bioconductor developers agreeing on single MA-plot, a single
>>>>>> variance-mean plot, etc). I think letting developers define their
>>>>>> plotPCA makes contributions easier (I don't have to ask the owner of
>>>>>> plot.PCA to incorporate something), even though it means we have a
>>>>>> growing list of generics.
>>>>>> 
>>>>>> Still you have a good point about splitting computation and plotting.
>>>>>> In practice, we subset the rows so PCA is not laborious.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Oct 20, 2014 at 5:38 PM, Kevin Coombes
>>>>>> <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com> 
>>>>>> <mailto:kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>>> 
>>>>>> wrote:
>>>>>> 
>>>>>>    Hi,
>>>>>> 
>>>>>>    I don't see how it needs more functions (as long as you can get
>>>>>>    developers to agree).  Suppose that someone can define a reusable
>>>>>>    PCA class.  This will contain a single "plot" generic function,
>>>>>>    defined once and reused by other classes. The existing "plotPCA"
>>>>>>    interface can also be implemented just once, in this class, as
>>>>>> 
>>>>>>        plotPCA <- function(object, ...) plot(as.PCA(object), ...)
>>>>>> 
>>>>>>    This can be exposed to users of your class through namespaces.
>>>>>>    Then the only thing a developer needs to implement in his own
>>>>>>    class is the single "as.PCA" function.  And he/she would have
>>>>>>    already been rquired to implement this as part of the old
>>>>>>    "plotPCA" function.  So it can be extracted from that, and the
>>>>>>    developer doesn't have to reimplement the visualization code from
>>>>>>    the PCA class.
>>>>>> 
>>>>>>    Best,
>>>>>>      Kevin
>>>>>> 
>>>>>> 
>>>>>>>    On 10/20/2014 5:15 PM, davide risso wrote:
>>>>>>>    Hi Kevin,
>>>>>>> 
>>>>>>>    I see your points and I agree (especially for the specific case
>>>>>>>    of plotPCA that involves some non trivial computations).
>>>>>>> 
>>>>>>>    On the other hand, having a wrapper function that starting from
>>>>>>>    the "raw" data gives you a pretty picture (with virtually zero
>>>>>>>    effort by the user) using a sensible choice of parameters that
>>>>>>>    are more or less OK for RNA-seq data is useful for practitioners
>>>>>>>    that just want to look for patterns in the data.
>>>>>>> 
>>>>>>>    I guess it would be the same to have a PCA method for each of the
>>>>>>>    objects and then using the plot method on those new objects, but
>>>>>>>    that would just create a lot more objects and functions than the
>>>>>>>    current approach (like Mike was saying).
>>>>>>> 
>>>>>>>    Your "as.pca" or "performPCA" approach would be definitely better
>>>>>>>    if all the different methods would create objects of the *same*
>>>>>>>    PCA class, but since we are talking about different packages, I
>>>>>>>    don't know how easy it would be to coordinate. But perhaps this
>>>>>>>    is the way we should go.
>>>>>>> 
>>>>>>>    Best,
>>>>>>>    davide
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
>>>>>>>    <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com> 
>>>>>>> <mailto:kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>        Hi,
>>>>>>> 
>>>>>>>        It depends.
>>>>>>> 
>>>>>>>        The "traditional" R approach to these matters is that you (a)
>>>>>>>        first perform some sort of an analysis and save the results
>>>>>>>        as an object and then (b) show or plot what you got.  It is
>>>>>>>        part (b) that tends to be really generic, and (in my opinion)
>>>>>>>        should have really generic names -- like "show" or "plot" or
>>>>>>>        "hist" or "image".
>>>>>>> 
>>>>>>>        With PCA in particular, you usually have to perform a bunch
>>>>>>>        of computations in order to get the principal components from
>>>>>>>        some part of the data.  As I understand it now, these
>>>>>>>        computations are performed along the way as part of the
>>>>>>>        various "plotPCA" functions.  The "R way" to do this would be
>>>>>>>        something like
>>>>>>>            pca <- performPCA(mySpecialObject)  # or
>>>>>>>        as.PCA(mySpecialObject)
>>>>>>>            plot(pca) # to get the scatter plot
>>>>>>>        This apporach has the user-friendly advantage that you can
>>>>>>>        tweak the plot (in terms of colors, symbols, ranges, titles,
>>>>>>>        etc) without having to recompute the principal components
>>>>>>>        every time. (I often find myself re-plotting the same PCA
>>>>>>>        several times, with different colors or symbols for different
>>>>>>>        factrors associated with the samples.) In addition, you could
>>>>>>>        then also do something like
>>>>>>>            screeplot(pca)
>>>>>>>        to get a plot of the percentages of variance explained.
>>>>>>> 
>>>>>>>        My own feeling is that if the object doesn't know what to do
>>>>>>>        when you tell it to "plot" itself, then you haven't got the
>>>>>>>        right abstraction.
>>>>>>> 
>>>>>>>        You may still end up needing generics for each kind of
>>>>>>>        computation you want to perform (PCA, RLE, MA, etc), which is
>>>>>>>        why I suggested an "as.PCA" function.  After all, "as" is
>>>>>>>        already pretty generic.  In the long run, l this would herlp
>>>>>>>        BioConductor developers, since they wouldn't all have to
>>>>>>>        reimplement the visualization code; they would just have to
>>>>>>>        figure out how to convert their own object into a PCA or RLE
>>>>>>>        or MA object.
>>>>>>> 
>>>>>>>        And I know that this "plotWhatever" approach is used
>>>>>>>        elsewhere in BioConductor, and it has always bothered me. It
>>>>>>>        just seemed that a post suggesting a new generic function
>>>>>>>        provided a reasonable opportunity to point out that there
>>>>>>>        might be a better way.
>>>>>>> 
>>>>>>>        Best,
>>>>>>>          Kevin
>>>>>>> 
>>>>>>>        PS: My own "ClassDicsovery" package, which is available from
>>>>>>>        RForge via
>>>>>>>        **|install.packages("ClassDiscovery",
>>>>>>>        repos="http://R-Forge.R-project.org 
>>>>>>> <http://r-forge.r-project.org/>"
>>>>>>>        <http://R-Forge.R-project.org 
>>>>>>> <http://r-forge.r-project.org/>>)|**
>>>>>>>        includes a "SamplePCA" class that does something roughly
>>>>>>>        similar to this for microarrays.
>>>>>>> 
>>>>>>>        PPS (off-topic): The worst offender in base R -- because it
>>>>>>>        doesn't use this "typical" approch -- is the "heatmap"
>>>>>>>        function.  Having tried to teach this function in several
>>>>>>>        different classes, I have come to the conclusion that it is
>>>>>>>        basically unusable by mortals.  And I think the problem is
>>>>>>>        that it tries to combine too many steps -- clustering rows,
>>>>>>>        clustering columns, scaling, visualization -- all in a single
>>>>>>>        fiunction
>>>>>>> 
>>>>>>> 
>>>>>>>>        On 10/20/2014 3:47 PM, davide risso wrote:
>>>>>>>>        Hi Kevin,
>>>>>>>> 
>>>>>>>>        I don't agree. In the case of EDASeq (as I suppose it is the
>>>>>>>>        case for DESeq/DESeq2) plotting the principal components of
>>>>>>>>        the count matrix is only one of possible exploratory plots
>>>>>>>>        (RLE plots, MA plots, etc.).
>>>>>>>>        So, in my opinion, it makes more sense from an object
>>>>>>>>        oriented point of view to have multiple plotting methods for
>>>>>>>>        a single "RNA-seq experiment" object.
>>>>>>>> 
>>>>>>>>        In addition, this is the same strategy adopted elsewhere in
>>>>>>>>        Bioconductor, e.g., for the plotMA method.
>>>>>>>> 
>>>>>>>>        Just my two cents.
>>>>>>>> 
>>>>>>>>        Best,
>>>>>>>>        davide
>>>>>>>> 
>>>>>>>>        On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
>>>>>>>>        <kevin.r.coom...@gmail.com <mailto:kevin.r.coom...@gmail.com>
>>>>>>>>        <mailto:kevin.r.coom...@gmail.com 
>>>>>>>> <mailto:kevin.r.coom...@gmail.com>>> wrote:
>>>>>>>> 
>>>>>>>>            I understand that breaking code is a problem, and that
>>>>>>>>            is admittedly the main reason not to immediately adopt
>>>>>>>>            my suggestion.
>>>>>>>> 
>>>>>>>>            But as a purely logical exercise, creating a "PCA"
>>>>>>>>            object X or something similar and using either
>>>>>>>>                plot(X)
>>>>>>>>            or
>>>>>>>>            plot(as.PCA(mySpecialObject))
>>>>>>>>            is a much more sensible use of object-oriented
>>>>>>>>            programming/design. This requires no new generics (to
>>>>>>>>            write or to learn).
>>>>>>>> 
>>>>>>>>            And you could use it to transition away from the current
>>>>>>>>            system by convincing the various package maintainers to
>>>>>>>>            re-implement plotPCA as follows:
>>>>>>>> 
>>>>>>>>            plotPCA <- function(object, ...) {
>>>>>>>>              plot(as.PCA(object), ...)
>>>>>>>>            }
>>>>>>>> 
>>>>>>>>            This would be relatively easy to eventually deprecate
>>>>>>>>            and teach users to switch to the alternative.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>>            On 10/20/2014 1:07 PM, Michael Love wrote:
>>>>>>>>>            hi Kevin,
>>>>>>>>> 
>>>>>>>>>            that would imply there is only one way to plot an
>>>>>>>>>            object of a given class. Additionally, it would break a
>>>>>>>>>            lot of code.?
>>>>>>>>> 
>>>>>>>>>            best,
>>>>>>>>> 
>>>>>>>>>            Mike
>>>>>>>>> 
>>>>>>>>>            On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
>>>>>>>>>            <kevin.r.coom...@gmail.com 
>>>>>>>>> <mailto:kevin.r.coom...@gmail.com>
>>>>>>>>>            <mailto:kevin.r.coom...@gmail.com 
>>>>>>>>> <mailto:kevin.r.coom...@gmail.com>>> wrote:
>>>>>>>>> 
>>>>>>>>>                But shouldn't they all really just be named "plot"
>>>>>>>>>                for the appropriate objects?  In which case, there
>>>>>>>>>                would already be a perfectly good generic....
>>>>>>>>> 
>>>>>>>>>                On Oct 20, 2014 10:27 AM, "Michael Love"
>>>>>>>>>                <michaelisaiahl...@gmail.com 
>>>>>>>>> <mailto:michaelisaiahl...@gmail.com>
>>>>>>>>>                <mailto:michaelisaiahl...@gmail.com 
>>>>>>>>> <mailto:michaelisaiahl...@gmail.com>>> wrote:
>>>>>>>>> 
>>>>>>>>>                    I noticed that 'plotPCA' functions are defined
>>>>>>>>>                    in EDASeq, DESeq2, DESeq,
>>>>>>>>>                    affycoretools, Rcade, facopy, CopyNumber450k,
>>>>>>>>>                    netresponse, MAIT (maybe
>>>>>>>>>                    more).
>>>>>>>>> 
>>>>>>>>>                    Sounds like a case for BiocGenerics.
>>>>>>>>> 
>>>>>>>>>                    best,
>>>>>>>>> 
>>>>>>>>>                    Mike
>>>>>>>>> 
>>>>>>>>>                    [[alternative HTML version deleted]]
>>>>>>>>> 
>>>>>>>>>                    _______________________________________________
>>>>>>>>>                    Bioc-devel@r-project.org 
>>>>>>>>> <mailto:Bioc-devel@r-project.org>
>>>>>>>>>                    <mailto:Bioc-devel@r-project.org 
>>>>>>>>> <mailto:Bioc-devel@r-project.org>> mailing list
>>>>>>>>>                    https://stat.ethz.ch/mailman/listinfo/bioc-devel 
>>>>>>>>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>            
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>            <http://www.avast.com/ <http://www.avast.com/>>
>>>>>>>> 
>>>>>>>>            This email is free from viruses and malware because
>>>>>>>>            avast! Antivirus <http://www.avast.com/ 
>>>>>>>> <http://www.avast.com/>> protection is
>>>>>>>>            active.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>        --
>>>>>>>>        Davide Risso, PhD
>>>>>>>>        Post Doctoral Scholar
>>>>>>>>        Division of Biostatistics
>>>>>>>>        School of Public Health
>>>>>>>>        University of California, Berkeley
>>>>>>>>        344 Li Ka Shing Center, #3370
>>>>>>>>        Berkeley, CA 94720-3370
>>>>>>>>        E-mail: davide.ri...@berkeley.edu 
>>>>>>>> <mailto:davide.ri...@berkeley.edu>
>>>>>>>>        <mailto:davide.ri...@berkeley.edu 
>>>>>>>> <mailto:davide.ri...@berkeley.edu>>
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>        
>>>>>>> ------------------------------------------------------------------------
>>>>>>>        <http://www.avast.com/ <http://www.avast.com/>>
>>>>>>> 
>>>>>>>        This email is free from viruses and malware because avast!
>>>>>>>        Antivirus <http://www.avast.com/ <http://www.avast.com/>> 
>>>>>>> protection is active.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>    --
>>>>>>>    Davide Risso, PhD
>>>>>>>    Post Doctoral Scholar
>>>>>>>    Division of Biostatistics
>>>>>>>    School of Public Health
>>>>>>>    University of California, Berkeley
>>>>>>>    344 Li Ka Shing Center, #3370
>>>>>>>    Berkeley, CA 94720-3370
>>>>>>>    E-mail: davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu> 
>>>>>>> <mailto:davide.ri...@berkeley.edu <mailto:davide.ri...@berkeley.edu>>
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>    
>>>>>> ------------------------------------------------------------------------
>>>>>>    <http://www.avast.com/ <http://www.avast.com/>>
>>>>>> 
>>>>>>    This email is free from viruses and malware because avast!
>>>>>>    Antivirus <http://www.avast.com/ <http://www.avast.com/>> protection 
>>>>>> is active.
>>>>> 
>>>>> 
>>>>> 
>>>>> ---
>>>>> This email is free from viruses and malware because avast! Antivirus 
>>>>> protection is active.
>>>>> 
>>>>> 
>>>>>       [[alternative HTML version deleted]]
>>>>> 
>>>>> 
>>>>> 
>>>>> ------------------------------
>>>>> 
>>>>> _______________________________________________
>>>>> Bioc-devel mailing list
>>>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel 
>>>>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>>>> 
>>>>> 
>>>>> End of Bioc-devel Digest, Vol 127, Issue 43
>>>>> *******************************************
>>>> 
>>>> _______________________________________________
>>>> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel 
>>>> <https://stat.ethz.ch/mailman/listinfo/bioc-devel>
>>> 
>>> 
>>>         [[alternative HTML version deleted]]
>>> 
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
> _______________________________________________
> Bioc-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Re: [Bioc-devel] plotPCA for BiocGenerics

Reply via email to