>>>>> Pages, Herve >>>>> on Tue, 29 Jan 2019 16:44:47 +0000 writes:
> Hi Martin. Speed is not the concern: I just did some > quick benchmarking and didn't observe any significant > difference in method dispatch performance after doing > setGeneric("toto", function(x, a=0, b=0, c=0) > standardGeneric("toto")) vs doing setGeneric("toto", > signature="x", function(x, a=0, b=0, c=0) > standardGeneric("toto")). > Here is the real concern to me: > Aliases of the form > \alias{colSums,dgCMatrix,ANY,ANY-method} are a real pain > to maintain. It's also a pain for the end user to have to > do ?`colSums,dgCMatrix,ANY,ANY-method` to access the man > page for a particular method. I know Matrix uses "short" > aliases i.e. aliases of the form > \alias{colSums,dgCMatrix-method} so the user only needs to > do ?`colSums,dgCMatrix-method` but there is a lot of > fragility to the situation. > Here is why: The exact form that needs to be used for > these aliases can change anytime depending on what happens > in one of the upstream packages (not a concern for the > Matrix package because no upstream package defines colSums > methods). More precisely: If all the colSums methods > defined in the upstream packages and in my own package are > defined with setMethod statements of the form: > setMethod("colSums", signature(x="SomeClass"), ...) > then the aliases in the man pages must be of the form > \alias{colSums,SomeClass-method} and the end user can just > do ?`colSums,SomeClass-method`, which is good. But if > **one** upstream package decides to use a setMethod > statement of the form: > setMethod("colSums", signature(x="SomeClass", > na.rm="ANY", dims="ANY"), ...) > then the aliases for **all** the colSums methods in > **all** the downstream packages now must be of the form > \alias{colSums,SomeOtherClass,ANY,ANY-method}, even if the > method for SomeOtherClass is defined with > signature(x="SomeOtherClass"). Hmm... but to me, the behavior you describe in the above paragraph seems rather an implementation "infelicity" in R's help / documentation system, than an intrinsic necessity. Or have you thought more about this and discussed it with other S4 experts (John Chambers, Michael L., Martin Morgan, ...) and came to a different conclusion? Very generally: Just because the documentation (help system) rules are implemented as they are should *NOT* influence "the best way" to program things in R. and particularly for something such as S4 which has been adapted and tuned for a long time ... So, rather the documentation "setup" should adapt to what seems best from an R coding point of view. More specifically, if we are allowed to use short signatures in R code, i.e., signature(x=<someClass>) short for signature(x=<someClass>, na.m="ANY", dims="ANY") then the documentation \alias{} should allow to use the same principle, as the documentation / help "keys" which \alias{.} constructs will be similarly uniquely determined (at least as long as other packages do not describe methods for "my" <someClass>) So, the help / documentation (and "R CMD check" checks) should have been changed long ago, if you had sent patches to do so, n'est-ce pas? :-) ;-) [[yes, half jokingly]]. Martin > Also, as a consequence, now > the end user has to use the long syntax to access the man > pages for these methods. And if later the author of the > upstream package decides to switch back to the > setMethod("colSums", signature(x="SomeClass"), ...) form, > then I have to switch back all the aliases in all my > downstream packages to the short form again! > This fragility of the alias syntax was one of the > motivations for me to put many setGeneric statements of > the form setGeneric("someGeneric", signature="x") in > BiocGenerics over the years. So I don't have many dozens > of aliases that suddenly break for mysterious reasons ('R > CMD check' would suddenly starts reporting warnings for > these aliases despite no changes in my package or in R). > Best, > H. > On 1/29/19 03:16, Martin Maechler wrote: >>>>>>> Michael Lawrence on Mon, 28 Jan 2019 20:47:58 -0800 >>>>>>> writes: >> > That will have some consequences; for example, > >> documentation aliases will need to change. Not sure how > >> many packages will need to be fixed outside of Matrix, >> but > it's not an isolated change. Martin might comment >> on the > rationale for the full signature, since he >> defined those > generics. >> >> > On Mon, Jan 28, 2019 at 7:21 PM Pages, Herve > >> <hpa...@fredhutch.org> wrote: >> >> >> >> Actually there is a 4th solution which is to modify >> the >> definition of the implicit generics in the methods >> >> package (file makeBasicFunsList.R) to make them >> dispatch >> on their 1st arg only. Should be easy. Then >> no package >> will need to use a setGeneric statement >> >> anymore. Everybody will automatically get a clean >> >> implicit generic. Including the Matrix package which >> >> shouldn't need to be touched (except maybe for some >> >> aliases in its man page that might need to be changed >> >> from \alias{colSums,dgCMatrix,ANY,ANY-method} to >> >> \alias{colSums,dgCMatrix-method}). >> >> >> >> Anybody wants to try to make a patch for this? >> >> >> H. >> >> I've already replied without having read the above two >> messages. In my reply I had indeed more or less argued >> as Hervé does above. >> >> Michael, Hervé, .. : Why is it really so much better to >> disallow dispatch for the other compulsory arguments? >> Dispatch there allows to use methods for class "missing" >> which is nicer in my eyes than the traditional default >> argument + missing() "tricks". >> >> Is it mainly speed you are concerned about. If yes, do >> we have data (and data analysis) about performance here? >> >> Martin >> >> >> >> >> On 1/28/19 19:00, Michael Lawrence wrote: > I agree >> (2) >> is a good compromise. CC'ing Martin for his >> perspective. >> >> > >> >> > Michael >> >> > >> >> > On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve >> >> <hpa...@fredhutch.org> wrote: >> Hi Aaron, >> >> >> >> >> >> The 4 matrix summarization generics currently >> defined >> in BiocGenerics >> are defined as followed: >> >> >> >> >> >> setGeneric("rowSums", signature="x") >> >> >> setGeneric("colSums", signature="x") >> >> >> setGeneric("rowMeans", signature="x") >> >> >> setGeneric("colMeans", signature="x") >> >> >> >> >> >> The only reason for having these definitions in >> >> BiocGenerics is to >> restrict dispatch the first >> >> argument. This is cleaner than what we would >> get with >> >> the implicit generics where dispatch is on all >> arguments >> (it >> doesn't really make sense to dispatch >> on toggles >> like 'na.rm' or >> 'dims'). Sticking to >> simple dispatch >> when possible makes life easier for >> >> the developer >> (especially in times of troubleshooting) >> and for the user >> >> (methods are easier to discover >> and their man pages >> easier to access). >> >> >> >> >> >> However, the 4 statements above create new generics >> >> that mask the >> implicit generics defined in the >> Matrix >> package (Matrix doesn't contain >> any >> setGeneric >> statements for these generics, only >> setMethod >> >> statements). This is a very unsatisfying >> situation and it >> has hit me >> repeatedly over the >> last couple of years. >> >> >> >> >> >> We have basically 3 ways to go. From simpler to >> more >> complicated: >> >> >> >> >> >> 1) Give up on single dispatch for these >> generics. That >> is, we remove the >> 4 statements above >> from >> BiocGenerics. Then we use setMethod() in package >> >> code >> like Matrix does. >> >> >> >> >> >> 2) Convince the Matrix folks to put the 4 >> statements >> above in Matrix. >> Then any BioC package >> that needs to >> define methods for these generics >> >> would just need to >> import them from the Matrix >> package. Maybe we could >> >> even push this one step >> further by having BiocGenerics >> import and >> re-export >> these generics. This would make >> them "available" in >> BioC as >> soon as the BiocGenerics >> is loaded (and any >> package that needs to define >> >> methods on them would >> just need to import them from >> BiocGenerics). >> >> >> >> >> >> 3) Put the 4 statements above in a MatrixGenerics >> >> package. Then convince >> the Matrix folks to define >> >> methods on the generics defined in >> >> >> MatrixGenerics. Very unlikely to happen! >> >> >> >> >> >> IMO 2) is the best compromise. Want to give it a >> shot? >> >> >> >> >> >> H. >> >> >> >> >> >> >> >> >> On 1/27/19 13:45, Aaron Lun wrote: >>> This is a >> >> resurrection of some old threads: >> >> >>> >> >> >>> >> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_bioc-2Ddevel_2017-2DNovember_012273.html&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=pcpUyjpkQe6U79lZ_n2SANNp6Zj_s6i1Sq2yZx2NSjw&e= >> >> >>> >> >> >>> >> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_MatrixGenerics_issues&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=NrmcVnmvgkDp3p64J-izZU9VD5nFsFCWOTI-TsnzCpY&e= >> >> >>> >> >> >>> For those who are unfamiliar with this, the basic >> >> issue is that various >>> Matrix and BiocGenerics >> >> functions mask each other. This is mildly >>> frustrating >> >> in interactive sessions: >> >> >>> >> >> >>>> library(Matrix) >>>> library(DelayedArray) >>>> x >> <- >> rsparsematrix(10, 10, 0.1) >>>> colSums(x) # fails >> >>>> >> Matrix::colSums(x) # okay >>> ... but quite >> annoying >> during package development, requiring code >> like >>> this: >> >> >>> >> >> >>> if (is(x, "Matrix")) { >>> z <- Matrix::colSums(x) >> >> >>> } else { >>> z <- colSums(x) # assuming >> DelayedArray >> does the masking. >>> } >> >> >>> >> >> >>> ... which defeats the purpose of using S4 dispatch >> in >> the first place. >> >> >>> >> >> >>> I have been encountering this issue with >> increasing >> frequency in my >>> packages, as a lot of >> my code base >> needs to be able to interface with >>> >> both Matrix and >> Bioconductor objects (e.g., >> DelayedMatrices) at the >>> >> same time. What needs to >> happen so that I can just write: >> >> >>> >> >> >>> z <- colSums(x) >> >> >>> >> >> >>> ... and everything will work for both Matrix and >> >> Bioconductor classes? >>> It seems that many of these >> >> function names are implicit generics >>> anyway, can >> >> BiocGenerics take advantage of that for the time >> being? >> >> >>> >> >> >>> Best, >> >> >>> >> >> >>> Aaron >> >> >>> >> >> >>> _______________________________________________ >> >>> >> Bioc-devel@r-project.org mailing list >>> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=JtgGBnaZJH44fV8OUp-SwnHxhD_i_mdVkqoMfUoA5tM&e= >> >> >> -- >> >> >> Hervé Pagès >> >> >> >> >> >> Program in Computational Biology >> Division of >> Public >> Health Sciences >> Fred Hutchinson Cancer >> Research Center >> >> 1100 Fairview Ave. N, M1-B514 >> >> P.O. Box 19024 >> >> Seattle, WA 98109-1024 >> >> >> >> >> >> E-mail: hpa...@fredhutch.org >> Phone: (206) >> 667-5791 >> >> Fax: (206) 667-1319 >> >> >> >> >> >> _______________________________________________ >> >> >> Bioc-devel@r-project.org mailing list >> >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=c-Mmi30ouubEEHC5W9_X6DIwxblt1nQlIfgCaK8uCJU&s=U8Hu1kzglD_RP7t_eR5w_nYAIaupBgrEKx11geSZwVg&e= >> >> >> >> -- >> >> Hervé Pagès >> >> >> >> Program in Computational Biology Division of Public >> >> Health Sciences Fred Hutchinson Cancer Research Center >> >> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA >> >> 98109-1024 >> >> >> >> E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 >> Fax: >> (206) 667-1319 >> >> > -- > Hervé Pagès > Program in Computational Biology Division of Public Health > Sciences Fred Hutchinson Cancer Research Center 1100 > Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA > 98109-1024 > E-mail: hpa...@fredhutch.org Phone: (206) 667-5791 Fax: > (206) 667-1319 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel