We'll try a single-argument REDUCER and see how it goes.

BTW, I'm also going to swap out DataFrame for Vector in the rowData. DataFrame has been more difficult than anticipated (storing names, subsetting to get ranges out) and doesn't give any clear advantage over Vector.
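Under the single-argument convention adopted here, REDUCER always receives one list. When iterating, that list holds `last` and `current`; otherwise it holds every MAPPER result at once. The sketch below illustrates this. `reduceResults()` is a hypothetical driver that only mimics the two iterate modes; it is not the GenomicFiles implementation. `listSum` is Martin's `function(lst) sum(unlist(lst, use.names=FALSE))` example from this thread.

```r
## Hypothetical driver that mimics the two iterate modes; it is not the
## GenomicFiles implementation.
reduceResults <- function(results, REDUCER, iterate = FALSE) {
    if (!iterate)
        return(REDUCER(results))               # one call with every result
    last <- results[[1L]]
    for (current in results[-1L])
        last <- REDUCER(list(last, current))   # pairwise, but still one list
    last
}

## A list-based REDUCER, taken from Martin's example in this thread
listSum <- function(lst) sum(unlist(lst, use.names = FALSE))

chunks <- list(1:3, 4:6, 7:9)
reduceResults(chunks, listSum, iterate = TRUE)    # 45
reduceResults(chunks, listSum, iterate = FALSE)   # 45
```

Wrapping `last` and `current` in `list()` costs one extra call per step. The trade-off raised in the thread is really about convenience: a binary function such as `+` can no longer be passed directly and must be wrapped.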
Val

On 06/17/2014 02:59 PM, Michael Lawrence wrote:
I think there are two different use cases here. The first, and the one I think is driving the design, is that the user writes a function for a particular problem, where the value of iterate is known. The other use case is that the user gets a summary function from somewhere else (a package) and applies it using reduceBy*. In that case, the user would potentially need to write a wrapper, depending on the formals of the reusable function.

The only way I could make the second use case work with the current design is to have a higher-order function that returns a universal iterator, which detects the value of iterate via nargs() and behaves appropriately. The higher-order function would not need to be known to the user, just to the package developer.

On Tue, Jun 17, 2014 at 1:39 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

Val's out today and I'm at least part of the problem so...

On 06/17/2014 10:13 AM, Michael Lawrence wrote:

On Tue, Jun 17, 2014 at 7:00 AM, Valerie Obenchain <voben...@fhcrc.org> wrote:

Hi Michael, Ryan,

Yes, it would be ideal to have a single signature for both cases of 'iterate'. We went over the pros/cons again and at the end of the day decided to keep things as they are. No perfect solution here. These were the primary points:

- The disadvantage of defining REDUCER with only '...' is that '...' can represent variables other than just the output from MAPPER.

Do you mean that "..." will capture additional arguments? From where?

reduceBy* takes an argument ... and this is currently available to both the MAPPER and REDUCER, see below.

- The unappealing aspect of the variadic approach is introducing a new check each time REDUCER is called.

What is this check?

- Going the other direction, considering a single argument for REDUCER instead of two requires coercing 'last' and 'current' to a list before pulling them apart again.

What is the problem with constructing this list?
Isn't that one extremely fast line of code?

It's not the list construction but the lost convenience of named arguments, in addition to consistency with Reduce when the data are presented iteratively: REDUCER=`+` instead of REDUCER=function(lst) sum(unlist(lst, use.names=FALSE)).

It seems to me simpler to settle on one signature, and my preference would be for the single list argument, just because the call is smaller and simpler. Then have a convenient adaptor to handle the variadic case.

The variadic adapter concept is easy enough to understand in context, but would send me for a head scratch at some later time.

Martin

Valerie

On 06/15/14 16:36, Michael Lawrence wrote:

I kind of prefer the adaptor solution, just for the sake of API cleanliness (the MAPPER/REDUCER pair has some elegance), but I think we agree that the iterate switch introduces undesirable coupling.

On Sun, Jun 15, 2014 at 3:07 PM, Ryan <r...@thompsonclan.org> wrote:

What about having two separate reducer arguments, one for a reducer that takes two elements at a time and combines them, and the other for a reducer that takes a list and combines all the elements of the list? Specifying both at once would be an error. I think it makes more sense to say "these two arguments expect different things" than "this one argument expects a different thing depending on the value of another argument".

-Ryan

On Sun Jun 15 11:17:59 2014, Michael Lawrence wrote:

I just thought there is some benefit to the callback being the same regardless of the iterate setting. This would allow generalization across different data scales. Perhaps all that is needed is a constructor for an adapter closure, one for each direction. For example, the variadic adapter would look like:

    Variadic <- function(FUN) {
        function(x, y) {
            if (missing(y)) {
                do.call(FUN, x)
            } else {
                FUN(x, y)
            }
        }
    }

That would make it easy to, e.g., adapt rbind into the framework.
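Michael's Variadic adapter can be tested against both calling conventions. The sketch below makes a few assumptions. `runReducer()` is a hypothetical stand-in for the current behaviour: iterate=TRUE reduces pairwise, as Reduce does, while iterate=FALSE passes one list. `Listwise()` is a hypothetical adapter for the "other direction" he mentions. It detects the call style with missing(), as Variadic does, rather than with the nargs() check Michael suggests on June 17.

```r
## Michael's Variadic adapter as posted, with comments added: lets a
## variadic function (rbind, c, sum, ...) serve under either calling style.
Variadic <- function(FUN) {
    function(x, y) {
        if (missing(y)) {
            do.call(FUN, x)   # one argument: a list holding every result
        } else {
            FUN(x, y)         # two arguments: one pairwise step
        }
    }
}

## Hypothetical adapter for the other direction: a function expecting a
## single list also works when called pairwise.
Listwise <- function(FUN) {
    function(x, y) {
        if (missing(y)) FUN(x) else FUN(list(x, y))
    }
}

## Hypothetical stand-in for the two current calling conventions
runReducer <- function(results, REDUCER, iterate) {
    if (iterate) Reduce(REDUCER, results) else REDUCER(results)
}

chunks <- list(matrix(1:2, ncol = 2), matrix(3:4, ncol = 2),
               matrix(5:6, ncol = 2))
stackedPairwise <- runReducer(chunks, Variadic(rbind), iterate = TRUE)
stackedList <- runReducer(chunks, Variadic(rbind), iterate = FALSE)
identical(stackedPairwise, stackedList)   # TRUE: the same 3 x 2 matrix

listSum <- function(lst) sum(unlist(lst, use.names = FALSE))
runReducer(list(1, 2, 3), Listwise(listSum), iterate = TRUE)    # 6
runReducer(list(1, 2, 3), Listwise(listSum), iterate = FALSE)   # 6
```

The adapter suits genuinely variadic functions. `Variadic(`+`)` would still fail in list mode with more than two results, because `+` accepts at most two arguments.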
I wonder if there is precedent and better terminology from the functional programming domain?

Michael

On Sun, Jun 15, 2014 at 8:38 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:

On 06/15/2014 07:34 AM, Michael Lawrence wrote:

Hi guys,

Was just checking out GenomicFiles and was a little surprised that the arguments to the REDUCER are different depending on iterate=TRUE vs. iterate=FALSE. In my often flawed opinion, iteration should not be a concern of the REDUCER. It should be oblivious to the iteration mode. In other words, iterate=TRUE is just the special case of having two objects to combine instead of multiple.

My 'rationale' was that one would choose iterate=FALSE when one required all elements to perform the reduction. I thought of the list (rather than ...) as the general R data structure for representing N elements, with a special case (consistent with Reduce) made for the pairwise reduction of iterate=TRUE. Either way, the two cases (x, y vs. list(), x, y vs. ...) seem to require some explaining to the user. Is there a clear better choice? You're the second person to trip over this, so I guess there's a crack in the sidewalk...

Martin

What would be convenient (but unnecessary) is to detect from the formal arguments whether REDUCER is variadic or list-based. In other words, if REDUCER is defined like function(...) { }, it is called via do.call(); otherwise it is passed the list.

Thoughts? Maybe I'm totally confused?

Michael

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024
Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
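Michael's original suggestion was to inspect REDUCER's formal arguments and use do.call() when it is defined as function(...). That could be sketched as below. `isVariadic()` and `callReducer()` are hypothetical names. The sketch also assumes that any function whose *first* formal is `...` counts as variadic, so that `rbind` and `sum` qualify. That test is slightly broader than the strict function(...) form in the original message.

```r
## Hypothetical helper: TRUE when FUN's first formal argument is '...'.
## args() supplies a formals list for most primitives (e.g. sum), where
## formals() on the primitive itself returns NULL.
isVariadic <- function(FUN) {
    isTRUE(names(formals(args(FUN)))[1L] == "...")
}

## Hypothetical dispatcher for the all-results-at-once (iterate = FALSE) case
callReducer <- function(REDUCER, results) {
    if (isVariadic(REDUCER)) {
        do.call(REDUCER, results)   # function(...) style: spread the list
    } else {
        REDUCER(results)            # list style: pass the list itself
    }
}

results <- list(1:2, 3:4, 5:6)
callReducer(rbind, results)                       # a 3 x 2 matrix
callReducer(function(lst) length(lst), results)   # 3
isVariadic(sum)                                   # TRUE
isVariadic(`+`)                                   # FALSE
```

Because the check depends only on REDUCER, a driver could perform it once before reducing rather than on every call. That may partly address the concern in this thread about adding a check each time REDUCER is called.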