We'll try a single arg to REDUCER and see how it goes.

BTW I'm also going to swap out DataFrame for Vector in the rowData. DataFrame has been more difficult than anticipated (storing names, subsetting to get ranges out) and doesn't give any clear advantage over Vector.

Val



On 06/17/2014 02:59 PM, Michael Lawrence wrote:
I think there are two different use cases here. The first, the one that
I think is driving the design, is that the user writes a function for a
particular problem, where the value of iterate is known. The other use
case is that the user gets a summary function from somewhere else (a
package) and applies it using reduceBy*. In that case, the user would
potentially need to write a wrapper, depending on the formals of the
reusable function. The only way I could make the second use case work
with the current design is to have a higher order function that returns
a universal iterator that detects the value of iterate via nargs() and
behaves appropriately. The higher order function would not need to be
known to the user, just the package developer.
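
A minimal sketch of the higher-order function described above, assuming a list-based summary function as input. The names universalReducer and sumAll are hypothetical and are not part of GenomicFiles; the wrapper only illustrates the nargs() dispatch.

```r
# Hypothetical helper (not a GenomicFiles function): wrap a list-based
# summary function so that it also accepts the pairwise (last, current) call.
universalReducer <- function(FUN) {
    function(x, y) {
        if (nargs() == 1L) {
            # One argument: x is assumed to be the list of all MAPPER outputs.
            FUN(x)
        } else {
            # Two arguments: bundle the running result and the new value.
            FUN(list(x, y))
        }
    }
}

# A reusable list-based summary, as might come from another package.
sumAll <- function(lst) sum(unlist(lst, use.names = FALSE))

reducer <- universalReducer(sumAll)
resultList <- reducer(list(1, 2, 3))   # list-style call
resultPair <- reducer(3, 3)            # pairwise call
```

The package developer would apply the wrapper once; the user would keep passing the plain summary function.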



On Tue, Jun 17, 2014 at 1:39 PM, Martin Morgan <mtmor...@fhcrc.org> wrote:

    Val's out today and I'm at least part of the problem so...


    On 06/17/2014 10:13 AM, Michael Lawrence wrote:

        On Tue, Jun 17, 2014 at 7:00 AM, Valerie Obenchain
        <voben...@fhcrc.org> wrote:

            Hi Michael, Ryan,

            Yes, it would be ideal to have a single signature for both
            cases of 'iterate'. We went over the pros/cons again and at
            the end of the day decided to keep things as they are. No
            perfect solution here.

            These were the primary points:

            - A disadvantage of defining REDUCER with only '...' is that
            '...' can represent variables other than just the output
            from MAPPER.


        Do you mean that "..." will capture additional arguments? From
        where?


    reduceBy* takes an argument ... and this is currently available to
    both the MAPPER and REDUCER, see below.




            - The unappealing aspect of the variadic approach is
            introducing a new check each time REDUCER is called.


        What is this check?


            - Going the other direction, considering a single arg for
            REDUCER instead of two requires coercing 'last' and
            'current' to a list before pulling them apart again.


        What is the problem with constructing this list? Isn't that one
        extremely fast line of code?


    it's not the list construction but the lost convenience of named
    arguments, in addition to consistency with Reduce when the data are
    presented iteratively -- REDUCER=`+` instead of
    REDUCER=function(lst) sum(unlist(lst, use.names=FALSE)).
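
The contrast above can be sketched with base R alone. This is only an illustration with made-up values, not GenomicFiles code: the pairwise form works directly with Reduce, while the list form has to unpack its single argument.

```r
# Made-up MAPPER outputs, for illustration only.
vals <- list(1, 2, 3, 4)

# Pairwise form: `+` receives (last, current), exactly as Reduce calls it.
pairwise <- Reduce(`+`, vals)

# List form: the reducer receives everything at once and must unpack it.
listReducer <- function(lst) sum(unlist(lst, use.names = FALSE))
listBased <- listReducer(vals)
```

Both calls give the same total; the difference is only in what the callback has to look like.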



        It seems simpler to me to settle on one signature, and my
        preference would be for the single list argument, just because
        the call is smaller and simpler. Then have a convenient adaptor
        to handle the variadic case.


    The variadic adapter concept is easy enough to understand in
    context, but would send me for a head scratch at some later time.

    Martin





            Valerie



            On 06/15/14 16:36, Michael Lawrence wrote:

                I kind of prefer the adaptor solution, just for the sake
                of API cleanliness (the MAPPER/REDUCER pair has some
                elegance), but I think we agree that the iterate switch
                introduces undesirable coupling.




                On Sun, Jun 15, 2014 at 3:07 PM, Ryan
                <r...@thompsonclan.org> wrote:

                    What about having two separate reducer arguments, one
                    for a reducer that takes two elements at a time and
                    combines them, and the other for a reducer that takes
                    a list and combines all the elements of the list?
                    Specifying both at once would be an error. I think it
                    makes more sense to say "these two arguments expect
                    different things" than "this one argument expects a
                    different thing depending on the value of another
                    argument".
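
One way the two-argument proposal might look, as a rough sketch. The function reduceSketch and the argument names pairReducer and listReducer are hypothetical, chosen only for this illustration; they are not the reduceBy* interface.

```r
# Hypothetical stand-in for a reduceBy*-style function with two
# mutually exclusive reducer arguments.
reduceSketch <- function(values, pairReducer = NULL, listReducer = NULL) {
    if (!is.null(pairReducer) && !is.null(listReducer))
        stop("specify only one of 'pairReducer' and 'listReducer'")
    if (!is.null(pairReducer))
        return(Reduce(pairReducer, values))   # combine two elements at a time
    if (!is.null(listReducer))
        return(listReducer(values))           # combine all elements at once
    values                                    # no reducer: return values as-is
}

byPair <- reduceSketch(list(1, 2, 3), pairReducer = `+`)
byList <- reduceSketch(list(1, 2, 3), listReducer = length)
both <- try(reduceSketch(list(1), pairReducer = `+`, listReducer = length),
            silent = TRUE)
```

Each argument then documents exactly one calling convention, and supplying both fails early.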

                    -Ryan


                    On Sun Jun 15 11:17:59 2014, Michael Lawrence wrote:

                        I just thought there is some benefit for the
                        callback to be the same, regardless of the
                        iterate setting. This would allow generalization
                        across different data scales. Perhaps all that is
                        needed is a constructor for an adapter closure,
                        one for each direction.

                        For example, the variadic adapter would look like:

                        Variadic <- function(FUN) {
                            function(x, y) {
                                if (missing(y)) {
                                    # one argument: x is a list of all elements
                                    do.call(FUN, x)
                                } else {
                                    # two arguments: pairwise combination
                                    FUN(x, y)
                                }
                            }
                        }

                        That would make it easy to e.g. adapt rbind into
                        the framework. I wonder if there is precedent and
                        better terminology from the functional
                        programming domain?
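
A short usage sketch for the adapter, repeating the Variadic definition from above so that the block runs on its own. The data frames are made up for illustration, and Reduce stands in for whatever the framework does when iterate=TRUE.

```r
# Adapter as given in the message above.
Variadic <- function(FUN) {
    function(x, y) {
        if (missing(y)) do.call(FUN, x) else FUN(x, y)
    }
}

reducer <- Variadic(rbind)

# Made-up per-file results.
a <- data.frame(id = 1:2)
b <- data.frame(id = 3:4)
d <- data.frame(id = 5:6)

# List-style call: all elements at once.
allAtOnce <- reducer(list(a, b, d))

# Pairwise calls: Reduce passes (last, current) to the same closure.
stepwise <- Reduce(reducer, list(a, b, d))
```

The same closure serves both calling conventions, so rbind needs no hand-written wrapper.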

                        Michael



                        On Sun, Jun 15, 2014 at 8:38 AM, Martin Morgan
                        <mtmor...@fhcrc.org> wrote:

                            On 06/15/2014 07:34 AM, Michael Lawrence wrote:


                                Hi guys,

                                Was just checking out GenomicFiles and
                                was a little surprised that the
                                arguments to the REDUCER are different
                                depending on iterate=TRUE vs.
                                iterate=FALSE. In my often flawed
                                opinion, iteration should not be a
                                concern of the REDUCER. It should be
                                oblivious to the iteration mode. In
                                other words, when iterate=TRUE, it is a
                                special case of having two objects to
                                combine, instead of multiple.


                            My 'rationale' was that one would choose
                            iterate=FALSE when one required all elements
                            to perform the reduction. I thought of the
                            list (rather than ...) as the general R data
                            structure for representing N elements, with
                            a special case (consistent with Reduce) made
                            for the pairwise reduction of iterate=TRUE.
                            Either way, the two cases (x, y vs. list(),
                            x, y vs. ...) seem to require some
                            explaining to the user. Is there a clearly
                            better choice? You're the second person to
                            trip over this, so I guess there's a crack
                            in the sidewalk...

                            Martin


                                What would be convenient (but
                                unnecessary) is to detect from the
                                formal arguments whether REDUCER is
                                variadic or list-based. In other words,
                                if REDUCER is defined like
                                function(...) { } it is called via
                                do.call(), otherwise it is passed the
                                list.
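
The detection idea could be sketched as follows. The helpers isVariadic and callReducer are hypothetical names, and treating a function as variadic when its first formal is '...' is an assumption made for this sketch, not a rule taken from GenomicFiles.

```r
# Hypothetical check: treat REDUCER as variadic when its first formal
# argument is '...'. args() is used so that primitives also report formals.
isVariadic <- function(FUN) {
    identical(names(formals(args(FUN)))[1], "...")
}

# Hypothetical dispatcher: spread the list over '...' or pass it whole.
callReducer <- function(REDUCER, lst) {
    if (isVariadic(REDUCER)) do.call(REDUCER, lst) else REDUCER(lst)
}

variadicResult <- callReducer(function(...) sum(...), list(1, 2, 3))
listResult <- callReducer(function(lst) length(lst), list(1, 2, 3))
```

A reducer such as function(x, ...) would count as list-based under this rule, which is one reason the detection may surprise users.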

                                Thoughts? Maybe I'm totally confused?

                                Michael

                                [[alternative HTML version deleted]]

                                _______________________________________________
                                Bioc-devel@r-project.org mailing list
                                https://stat.ethz.ch/mailman/listinfo/bioc-devel



                                   --

                            Computational Biology / Fred Hutchinson
                            Cancer Research Center
                            1100 Fairview Ave. N.
                            PO Box 19024 Seattle, WA 98109

                            Location: Arnold Building M1 B861
                            Phone: (206) 667-2793
















_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
