Perhaps you and Andrew should take this discussion off list...

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Sep 17, 2021 at 3:45 PM Leonard Mada via R-help <r-help@r-project.org> wrote:
>
> Why would you want to merge different factors?
>
> It makes no sense on real data. Even if some names are the same, the
> factors are not the same!
>
>
> The only real-data application that springs to mind is censoring (right
> or left, depending on the choice): but here we have both open and closed
> intervals, e.g. to the right (in the same data-set).
>
>
> Leonard
>
>
> On 9/18/2021 1:29 AM, Andrew Simmons wrote:
> > I disagree, I don't really think it's too long or ugly, but if you
> > think it is, you could abbreviate it as 'i'.
> >
> >
> > x <- 0:20
> > breaks1 <- seq.int(0, 16, 4)
> > breaks2 <- seq.int(0, 20, 4)
> > data.frame(
> >     cut(x, breaks1, right = FALSE, i = TRUE),
> >     cut(x, breaks2, right = FALSE, i = TRUE),
> >     check.names = FALSE
> > )
> >
> >
> > I hope this helps.
> >
> > On Fri, Sep 17, 2021 at 6:26 PM Leonard Mada <leo.m...@syonic.eu> wrote:
> >
> >     Hello Andrew,
> >
> >
> >     But "cut" generates factors. In most cases with real data one
> >     expects to have also the ends of the interval: the argument
> >     "include.lowest" is both ugly and too long.
> >
> >     [The test-code on the ftable thread contains this error! I have
> >     run through this error a couple of times.]
> >
> >
> >     The only real situation that I can imagine to be problematic:
> >
> >     - if the interval goes to +Inf (or -Inf): I do not know if there
> >     would be any effects when including +Inf (or -Inf).
> >
> >
> >     Leonard
> >
> >
> >     On 9/18/2021 1:14 AM, Andrew Simmons wrote:
> >>     While it is not explicitly mentioned anywhere in the
> >>     documentation for .bincode, I suspect 'include.lowest = FALSE' is
> >>     the default to keep the definitions of the bins consistent. For
> >>     example:
> >>
> >>
> >>     x <- 0:20
> >>     breaks1 <- seq.int(0, 16, 4)
> >>     breaks2 <- seq.int(0, 20, 4)
> >>     cbind(
> >>         .bincode(x, breaks1, right = FALSE, include.lowest = TRUE),
> >>         .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)
> >>     )
> >>
> >>
> >>     by having 'include.lowest = TRUE' with different ends, you can
> >>     get inconsistent behaviour. While this probably wouldn't be an
> >>     issue with 'real' data, this would seem like something you'd want
> >>     to avoid by default. The definitions of the bins are
> >>
> >>
> >>     [0, 4)
> >>     [4, 8)
> >>     [8, 12)
> >>     [12, 16]
> >>
> >>
> >>     and
> >>
> >>
> >>     [0, 4)
> >>     [4, 8)
> >>     [8, 12)
> >>     [12, 16)
> >>     [16, 20]
> >>
> >>
> >>     so you can see where the inconsistent behaviour comes from. You
> >>     might be able to get R-core to add argument 'warn', but probably
> >>     not to change the default of 'include.lowest'. I hope this helps
> >>
> >>
> >>     On Fri, Sep 17, 2021 at 6:01 PM Leonard Mada <leo.m...@syonic.eu> wrote:
> >>
> >>         Thank you Andrew.
> >>
> >>
> >>         Is there any reason not to make: include.lowest = TRUE the
> >>         default?
> >>
> >>
> >>         Regarding the NA:
> >>
> >>         The user still has to suspect that some values were not
> >>         included and run that test.
> >>
> >>
> >>         Leonard
> >>
> >>
> >>         On 9/18/2021 12:53 AM, Andrew Simmons wrote:
> >>>         Regarding your first point, argument 'include.lowest'
> >>>         already handles this specific case, see ?.bincode
> >>>
> >>>         Your second point, maybe it could be helpful, but since both
> >>>         'cut.default' and '.bincode' return NA if a value isn't
> >>>         within a bin, you could make something like this on your own.
> >>>         Might be worth pitching to R-bugs on the wishlist.
> >>>
> >>>
> >>>
> >>>         On Fri, Sep 17, 2021, 17:45 Leonard Mada via R-help
> >>>         <r-help@r-project.org> wrote:
> >>>
> >>>             Hello List members,
> >>>
> >>>
> >>>             the following improvements would be useful for function
> >>>             cut (and .bincode):
> >>>
> >>>
> >>>             1.) Argument: Include extremes
> >>>             extremes = TRUE
> >>>             if(right == FALSE) {
> >>>                 # include also right for last interval;
> >>>             } else {
> >>>                 # include also left for first interval;
> >>>             }
> >>>
> >>>
> >>>             2.) Argument: warn = TRUE
> >>>
> >>>             Warn if any values are not included in the intervals.
> >>>
> >>>
> >>>             Motivation:
> >>>             - reduce risk of errors when using function cut();
> >>>
> >>>
> >>>             Sincerely,
> >>>
> >>>
> >>>             Leonard
> >>>
> >>>             ______________________________________________
> >>>             R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>             https://stat.ethz.ch/mailman/listinfo/r-help
> >>>             PLEASE do read the posting guide
> >>>             http://www.R-project.org/posting-guide.html
> >>>             and provide commented, minimal, self-contained,
> >>>             reproducible code.
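
For readers who want to reproduce the two points discussed in this thread, here is a minimal sketch in R. The first part reruns the .bincode comparison to show where the two sets of breaks disagree. The second part is one possible user-level take on the proposed 'warn' behaviour: the name cut_warn and its warning text are hypothetical and are not part of base R or of any proposal accepted by R-core.

```r
# Bins from the thread: left-closed intervals, with the last bin also
# closed on the right because include.lowest = TRUE and right = FALSE.
x <- 0:20
breaks1 <- seq.int(0, 16, 4)
breaks2 <- seq.int(0, 20, 4)

b1 <- .bincode(x, breaks1, right = FALSE, include.lowest = TRUE)
b2 <- .bincode(x, breaks2, right = FALSE, include.lowest = TRUE)

# x = 16 is where the two codings disagree: it lands in bin 4 ([12, 16])
# with breaks1 but in bin 5 ([16, 20]) with breaks2.
print(cbind(x, b1, b2)[x == 16, ])

# Hypothetical helper (not part of base R): cut() plus a warning when
# non-missing values fall outside every bin and are returned as NA.
cut_warn <- function(x, breaks, ...) {
  out <- cut(x, breaks, ...)
  dropped <- is.na(out) & !is.na(x)
  if (any(dropped)) {
    warning(sum(dropped), " value(s) fall outside the breaks and were set to NA")
  }
  out
}

# Capture the warning message instead of letting it propagate.
msg <- tryCatch(
  cut_warn(x, breaks1, right = FALSE),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

Because both cut.default and .bincode already return NA for values outside the bins, the wrapper only needs to compare the NA pattern of the result with that of the input; it does not change how the bins themselves are defined.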