Perhaps you and Andrew should take this discussion off list...

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Sep 17, 2021 at 3:45 PM Leonard Mada via R-help
<r-help@r-project.org> wrote:
>
> Why would you want to merge different factors?
>
> It makes no sense on real data. Even if some names are the same, the
> factors are not the same!
>
>
> The only real-data application that springs to mind is censoring (right
> or left, depending on the choice): but here we have both open and closed
> intervals, e.g. to the right (in the same data-set).
>
>
> Leonard
>
>
> On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> > I disagree, I don't really think it's too long or ugly, but if you
> > think it is, you could abbreviate it as 'i'.
> >
> >
> > x <- 0:20
> > breaks1 <- seq.int(0, 16, 4)
> > breaks2 <- seq.int(0, 20, 4)
> > data.frame(
> >     cut(x, breaks1, right = FALSE, i = TRUE),
> >     cut(x, breaks2, right = FALSE, i = TRUE),
> >     check.names = FALSE
> > )
> >
> >
> > I hope this helps.
> >
> > On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.m...@syonic.eu> wrote:
> >
> >     Hello Andrew,
> >
> >
> >     But "cut" generates factors. In most cases with real data, one
> >     also expects the ends of the interval to be included: the
> >     argument "include.lowest" is both ugly and too long.
> >
> >     [The test code in the ftable thread contains this error! I have
> >     run into this error a couple of times.]
> >
> >
> >     The only real situation that I can imagine to be problematic:
> >
> >     - if the interval goes to +Inf (or -Inf): I do not know if there
> >     would be any effects when including +Inf (or -Inf).
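
On the +Inf / -Inf question above, a quick check suggests that infinite
break points are treated like any other end point. This is only a minimal
sketch: the vector x and the breaks below are made up for illustration and
do not come from the thread.

```r
# Illustrative values only; not taken from the thread.
x <- c(-Inf, 0, 5, Inf)
breaks <- c(-Inf, 0, Inf)

# right = TRUE (the default) gives the bins (-Inf, 0] and (0, Inf].
# Without include.lowest, a value equal to the lowest break gets NA.
open_low <- .bincode(x, breaks, right = TRUE, include.lowest = FALSE)

# With include.lowest = TRUE the first bin becomes [-Inf, 0].
closed_low <- .bincode(x, breaks, right = TRUE, include.lowest = TRUE)

print(cbind(x, open_low, closed_low))
```

In this sketch only -Inf changes: it is NA in open_low and falls in bin 1
in closed_low, while +Inf is already covered by the closed right end.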
> >
> >
> >     Leonard
> >
> >
> >     On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> >>     While it is not explicitly mentioned anywhere in the
> >>     documentation for .bincode, I suspect 'include.lowest = FALSE' is
> >>     the default to keep the definitions of the bins consistent. For
> >>     example:
> >>
> >>
> >>     x <- 0:20
> >>     breaks1 <- seq.int(0, 16, 4)
> >>     breaks2 <- seq.int(0, 20, 4)
> >>     cbind(
> >>         .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
> >>         .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> >>     )
> >>
> >>
> >>     By having 'include.lowest = TRUE' with different ends, you can
> >>     get inconsistent behaviour. While this probably wouldn't be an
> >>     issue with 'real' data, this would seem like something you'd want
> >>     to avoid by default. The definitions of the bins are
> >>
> >>
> >>     [0, 4)
> >>     [4, 8)
> >>     [8, 12)
> >>     [12, 16]
> >>
> >>
> >>     and
> >>
> >>
> >>     [0, 4)
> >>     [4, 8)
> >>     [8, 12)
> >>     [12, 16)
> >>     [16, 20]
> >>
> >>
> >>     so you can see where the inconsistent behaviour comes from. You
> >>     might be able to get R-core to add argument 'warn', but probably
> >>     not to change the default of 'include.lowest'. I hope this helps.
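
To make the inconsistency concrete, the sketch below reuses x, breaks1 and
breaks2 from the example and picks out the values whose bin index depends
on where the breaks end. The names b1, b2, differs and uncovered are
introduced here purely for illustration.

```r
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

b1 <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b2 <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)

# Values assigned to different bins by the two sets of breaks
# (comparisons involving NA are dropped by which()).
differs <- x[which(b1 != b2)]
print(differs)

# Values that the shorter set of breaks does not cover at all.
uncovered <- x[is.na(b1)]
print(uncovered)
```

Only the value 16 changes bins: it lands in [12, 16] (bin 4) under breaks1
and in [16, 20] (bin 5) under breaks2. With include.lowest = FALSE it
would instead be NA under breaks1, so the two results would never
disagree on a bin index.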
> >>
> >>
> >>     On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.m...@syonic.eu>
> >>     wrote:
> >>
> >>         Thank you Andrew.
> >>
> >>
> >>         Is there any reason not to make include.lowest = TRUE the
> >>         default?
> >>
> >>
> >>         Regarding the NA:
> >>
> >>         The user still has to suspect that some values were not
> >>         included and run that test.
> >>
> >>
> >>         Leonard
> >>
> >>
> >>         On 9/18/2021 12:53 AM, Andrew Simmons wrote:
> >>>         Regarding your first point, argument 'include.lowest'
> >>>         already handles this specific case, see ?.bincode
> >>>
> >>>         As for your second point, it might be helpful, but since both
> >>>         'cut.default' and '.bincode' return NA if a value isn't
> >>>         within a bin, you could write something like this on your own.
> >>>         It might be worth pitching to R-bugs as a wishlist item.
> >>>
> >>>
> >>>
> >>>         On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
> >>>         <r-help@r-project.org> wrote:
> >>>
> >>>             Hello List members,
> >>>
> >>>
> >>>             the following improvements would be useful for function
> >>>             cut (and .bincode):
> >>>
> >>>
> >>>             1.) Argument: Include extremes
> >>>             extremes = TRUE
> >>>             if(right == FALSE) {
> >>>                 # include also right for last interval;
> >>>             } else {
> >>>                 # include also left for first interval;
> >>>             }
> >>>
> >>>
> >>>             2.) Argument: warn = TRUE
> >>>
> >>>             Warn if any values are not included in the intervals.
> >>>
> >>>
> >>>             Motivation:
> >>>             - reduce risk of errors when using function cut();
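
Both suggestions can be approximated today with a small wrapper around
cut(). The following is only a sketch under stated assumptions: the
function name cut_checked and its arguments extremes and warn are
hypothetical and do not exist in base R, and extremes is simply forwarded
to the existing include.lowest argument.

```r
# Hypothetical wrapper; 'cut_checked', 'extremes' and 'warn' are not
# part of base R.
cut_checked <- function(x, breaks, ..., extremes = TRUE, warn = TRUE) {
    # include.lowest closes the first interval on the left when
    # right = TRUE, or the last interval on the right when right = FALSE.
    res <- cut(x, breaks, include.lowest = extremes, ...)

    # Non-missing inputs that ended up in no interval.
    missed <- is.na(res) & !is.na(x)
    if (warn && any(missed))
        warning(sum(missed), " value(s) not included in any interval")
    res
}

# The values 17:20 lie beyond the last break, so a warning is raised;
# it is captured here so that its message can be inspected.
msg <- tryCatch(
    cut_checked(0:20, seq.int(0, 16, 4), right = FALSE),
    warning = function(w) conditionMessage(w)
)
print(msg)
```

Because the check only looks at NA values in the result, the same idea
should carry over to .bincode() without changes to either function.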
> >>>
> >>>
> >>>             Sincerely,
> >>>
> >>>
> >>>             Leonard
> >>>
> >>>             ______________________________________________
> >>>             R-help@r-project.org mailing list -- To UNSUBSCRIBE
> >>>             and more, see
> >>>             https://stat.ethz.ch/mailman/listinfo/r-help
> >>>             PLEASE do read the posting guide
> >>>             http://www.R-project.org/posting-guide.html
> >>>             and provide commented, minimal, self-contained,
> >>>             reproducible code.
> >>>
>