Hi Kevin,
I see your points and I agree (especially for the specific case
of plotPCA, which involves some non-trivial computations).
On the other hand, a wrapper function that starts from the "raw"
data and gives you a pretty picture (with virtually zero effort
by the user), using a sensible choice of parameters that are
more or less OK for RNA-seq data, is useful for practitioners
who just want to look for patterns in the data.
I guess it would be equivalent to have a PCA method for each of the
objects and then use the plot method on the resulting objects, but
that would just create a lot more objects and functions than the
current approach (as Mike was saying).
Your "as.PCA" or "performPCA" approach would definitely be better
if all the different methods created objects of the *same*
PCA class, but since we are talking about different packages, I
don't know how easy it would be to coordinate. But perhaps this
is the way we should go.
Best,
davide
On Mon, Oct 20, 2014 at 1:26 PM, Kevin Coombes
<kevin.r.coom...@gmail.com> wrote:
Hi,
It depends.
The "traditional" R approach to these matters is that you (a)
first perform some sort of an analysis and save the results
as an object and then (b) show or plot what you got. It is
part (b) that tends to be really generic, and (in my opinion)
should have really generic names -- like "show" or "plot" or
"hist" or "image".
With PCA in particular, you usually have to perform a bunch
of computations in order to get the principal components from
some part of the data. As I understand it now, these
computations are performed along the way as part of the
various "plotPCA" functions. The "R way" to do this would be
something like
pca <- performPCA(mySpecialObject)  # or as.PCA(mySpecialObject)
plot(pca)  # to get the scatter plot
This approach has the user-friendly advantage that you can
tweak the plot (in terms of colors, symbols, ranges, titles,
etc) without having to recompute the principal components
every time. (I often find myself re-plotting the same PCA
several times, with different colors or symbols for different
factors associated with the samples.) In addition, you could
then also do something like
screeplot(pca)
to get a plot of the percentages of variance explained.
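As a minimal sketch of the compute-once, plot-many-times pattern described above, the following uses base R's prcomp() and S3 methods. The names "performPCA" and "samplePCA" are hypothetical and chosen only for illustration; this is not the code of any existing package.

```r
# Hypothetical helper: compute the PCA once and return a classed object.
# 'performPCA' and the class name "samplePCA" are illustrative names only.
performPCA <- function(x, ...) {
    # x: numeric matrix with features in rows and samples in columns
    fit <- prcomp(t(x), ...)
    structure(
        list(scores = fit$x,
             variance = fit$sdev^2 / sum(fit$sdev^2)),
        class = "samplePCA"
    )
}

# plot() method: scatter plot of the first two components.
plot.samplePCA <- function(x, ...) {
    plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(x)
}

# screeplot() method: proportion of variance explained per component.
screeplot.samplePCA <- function(x, ...) {
    barplot(x$variance, names.arg = paste0("PC", seq_along(x$variance)),
            ylab = "Proportion of variance", ...)
    invisible(x)
}

set.seed(1)
counts <- matrix(rnorm(200), nrow = 20, ncol = 10)  # toy data, 10 samples
pca <- performPCA(counts)                           # computed once

# Restyle the same result as often as needed without recomputing.
plot(pca, col = rep(c("red", "blue"), each = 5))
plot(pca, pch = 19)
screeplot(pca)
```

The principal components are stored in the object, so changing colors or symbols only re-runs the drawing code.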
My own feeling is that if the object doesn't know what to do
when you tell it to "plot" itself, then you haven't got the
right abstraction.
You may still end up needing generics for each kind of
computation you want to perform (PCA, RLE, MA, etc), which is
why I suggested an "as.PCA" function. After all, "as" is
already pretty generic. In the long run, this would help
Bioconductor developers, since they wouldn't all have to
reimplement the visualization code; they would just have to
figure out how to convert their own object into a PCA or RLE
or MA object.
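One way this conversion idea might look with the S4 tools in the methods package is sketched below, using setAs() so that the existing as() does the dispatching. The classes "PCA" and "MyExperiment", their slots, and the log2 transform are assumptions made up for the example, not part of any real package.

```r
library(methods)

# Hypothetical shared result class; every name below is illustrative.
setClass("PCA", representation(scores = "matrix", sdev = "numeric"))

# Stand-in for a package-specific container.
setClass("MyExperiment", representation(counts = "matrix"))

# Each package would only supply the conversion to the shared class.
setAs("MyExperiment", "PCA", function(from) {
    fit <- prcomp(t(log2(from@counts + 1)))
    new("PCA", scores = fit$x, sdev = fit$sdev)
})

# The plotting code is written once, against the shared class.
setMethod("plot", signature(x = "PCA", y = "missing"), function(x, y, ...) {
    plot(x@scores[, 1], x@scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(x)
})

set.seed(1)
expt <- new("MyExperiment",
            counts = matrix(rpois(200, lambda = 50), nrow = 20))
pca <- as(expt, "PCA")   # no new generic needed for the conversion
plot(pca, pch = 19)
```

Under this design, a new package would need only its own setAs() definition to reuse the shared plot method.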
And I know that this "plotWhatever" approach is used
elsewhere in Bioconductor, and it has always bothered me. It
just seemed that a post suggesting a new generic function
provided a reasonable opportunity to point out that there
might be a better way.
Best,
Kevin
PS: My own "ClassDiscovery" package, which is available from
R-Forge via
install.packages("ClassDiscovery", repos="http://R-Forge.R-project.org")
includes a "SamplePCA" class that does something roughly
similar to this for microarrays.
PPS (off-topic): The worst offender in base R -- because it
doesn't use this "typical" approach -- is the "heatmap"
function. Having tried to teach this function in several
different classes, I have come to the conclusion that it is
basically unusable by mortals. And I think the problem is
that it tries to combine too many steps -- clustering rows,
clustering columns, scaling, visualization -- all in a single
function.
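For comparison, the same steps can be carried out one at a time with base R functions. This is only a rough sketch on toy data: it is not a reimplementation of heatmap(), which also reorders its dendrograms, so the resulting row and column order may differ.

```r
set.seed(1)
x <- matrix(rnorm(120), nrow = 12,
            dimnames = list(paste0("gene", 1:12), paste0("s", 1:10)))

# Step 1: scale each row to mean 0 and standard deviation 1.
scaled <- t(scale(t(x)))

# Step 2: cluster rows and columns separately.
rowClust <- hclust(dist(scaled))
colClust <- hclust(dist(t(scaled)))

# Step 3: draw the reordered matrix; the clusterings stay available for reuse.
ordered <- scaled[rowClust$order, colClust$order]
image(t(ordered), axes = FALSE, xlab = "samples", ylab = "genes")
```

Each intermediate result (scaled, rowClust, colClust) can be inspected or replaced on its own, which is the point of splitting the steps.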
On 10/20/2014 3:47 PM, davide risso wrote:
Hi Kevin,
I don't agree. In the case of EDASeq (as I suppose is also the
case for DESeq/DESeq2), plotting the principal components of
the count matrix is only one of several possible exploratory plots
(RLE plots, MA plots, etc.).
So, in my opinion, it makes more sense from an object-oriented
point of view to have multiple plotting methods for
a single "RNA-seq experiment" object.
In addition, this is the same strategy adopted elsewhere in
Bioconductor, e.g., for the plotMA method.
Just my two cents.
Best,
davide
On Mon, Oct 20, 2014 at 11:30 AM, Kevin Coombes
<kevin.r.coom...@gmail.com> wrote:
I understand that breaking code is a problem, and that
is admittedly the main reason not to immediately adopt
my suggestion.
But as a purely logical exercise, creating a "PCA"
object X or something similar and using either
plot(X)
or
plot(as.PCA(mySpecialObject))
is a much more sensible use of object-oriented
programming/design. This requires no new generics (to
write or to learn).
And you could use it to transition away from the current
system by convincing the various package maintainers to
re-implement plotPCA as follows:
plotPCA <- function(object, ...) {
    plot(as.PCA(object), ...)
}
This wrapper would be relatively easy to deprecate eventually,
while teaching users to switch to the alternative.
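A self-contained sketch of such a transitional wrapper is given below. It assumes a hypothetical S3 generic "as.PCA" with a method for plain matrices, and uses base R's .Deprecated() to emit the warning; the class and method names are illustrative only.

```r
# Hypothetical S3 generic and class names, for illustration only.
as.PCA <- function(object, ...) UseMethod("as.PCA")

# Example conversion for a plain matrix (features in rows, samples in columns).
as.PCA.matrix <- function(object, ...) {
    structure(list(scores = prcomp(t(object))$x), class = "PCA")
}

# The plotting code lives in the plot() method of the shared class.
plot.PCA <- function(x, ...) {
    plot(x$scores[, 1], x$scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(x)
}

# Transitional wrapper: same picture as before, plus a deprecation warning
# that points users to the replacement call.
plotPCA <- function(object, ...) {
    .Deprecated("plot(as.PCA(object))")
    plot(as.PCA(object), ...)
}

set.seed(1)
m <- matrix(rnorm(200), nrow = 20)
plotPCA(m, pch = 19)        # still works, but warns
plot(as.PCA(m), pch = 19)   # the recommended form
```

Existing scripts that call plotPCA() keep running during the transition, and the warning tells users what to switch to.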
On 10/20/2014 1:07 PM, Michael Love wrote:
hi Kevin,
that would imply there is only one way to plot an
object of a given class. Additionally, it would break a
lot of code.
best,
Mike
On Mon, Oct 20, 2014 at 12:50 PM, Kevin Coombes
<kevin.r.coom...@gmail.com> wrote:
But shouldn't they all really just be named "plot"
for the appropriate objects? In which case, there
would already be a perfectly good generic....
On Oct 20, 2014 10:27 AM, "Michael Love"
<michaelisaiahl...@gmail.com> wrote:
I noticed that 'plotPCA' functions are defined in EDASeq, DESeq2,
DESeq, affycoretools, Rcade, facopy, CopyNumber450k, netresponse,
MAIT (maybe more).
Sounds like a case for BiocGenerics.
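To illustrate what sharing one generic could look like, here is a hedged sketch using setGeneric() and setMethod() from the methods package. The classes "CountSetA" and "CountSetB" and the helper "pcaScatter" are made-up stand-ins, not the actual classes or code of the packages listed above.

```r
library(methods)

# A single shared generic; in practice it would live in one common package.
setGeneric("plotPCA", function(object, ...) standardGeneric("plotPCA"))

# Two stand-in classes playing the role of containers from different packages.
setClass("CountSetA", representation(counts = "matrix"))
setClass("CountSetB", representation(expr = "matrix"))

# Hypothetical shared helper: plot the first two principal components.
pcaScatter <- function(m, ...) {
    scores <- prcomp(t(m))$x
    plot(scores[, 1], scores[, 2], xlab = "PC1", ylab = "PC2", ...)
    invisible(scores)
}

# Each package registers its own method on the shared generic.
setMethod("plotPCA", "CountSetA", function(object, ...) {
    pcaScatter(log2(object@counts + 1), ...)
})
setMethod("plotPCA", "CountSetB", function(object, ...) {
    pcaScatter(object@expr, ...)
})

set.seed(1)
a <- new("CountSetA", counts = matrix(rpois(200, lambda = 50), nrow = 20))
b <- new("CountSetB", expr = matrix(rnorm(200), nrow = 20))
plotPCA(a)
plotPCA(b, col = "blue")
```

With one generic defined in a single place, loading several of these packages together would not mask each other's plotPCA definitions.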
best,
Mike
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Davide Risso, PhD
Post Doctoral Scholar
Division of Biostatistics
School of Public Health
University of California, Berkeley
344 Li Ka Shing Center, #3370
Berkeley, CA 94720-3370
E-mail: davide.ri...@berkeley.edu