Re: [Rd] ifelse() woes ... can we agree on a ifelse2() ?

2016-08-07 Thread Uwe Ligges



On 06.08.2016 17:30, Duncan Murdoch wrote:

On 06/08/2016 10:18 AM, Martin Maechler wrote:

Dear R-devel readers,
( = people interested in the improvement and development of R).

This is not the first time that this topic is raised,
and I am in no state to promise that anything will result from
this thread ...

Still, I think the majority among us has agreed that

1) you should never use ifelse(test, yes, no)
   if you know that length(test) == 1, in which case
if(test) yes else no
   is much preferable  (though not equivalent: ifelse(NA, 1, 0) !)

2) it is potentially inefficient by design since it (almost
   always) evaluates both 'yes' and 'no' independent of 'test'.

3) is a nice syntax in principle, and so is often used, also by
   myself, in spite of '2)', just because nicely self-explaining
   code is sometimes clearly preferable to more efficient but
   less readable code.

4) it is too late to change ifelse() fundamentally, because it
   works according to its documentation
   (and I think very much the same as in S and S-PLUS) and has
   done so for ages.

 and if you don't agree with  1) -- 4)  you may pretend for
 a moment instead of starting to discuss them thoroughly.
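
A small illustration of points 1) and 2), offered as a sketch and not as part of the original message. The vector `x` and the variable names are made up for the example.

```r
# Point 1: for a length-one test, ifelse() and if() differ on NA.
scalar_ifelse <- ifelse(NA, 1, 0)          # NA is propagated
scalar_if <- tryCatch(if (NA) 1 else 0,    # if() refuses an NA condition
                      error = function(e) "error")

# Point 2: 'yes' is evaluated on the whole vector as soon as any test
# is TRUE, so sqrt() still sees the negative element and warns.
x <- c(4, -1)
outcome <- tryCatch(ifelse(x > 0, sqrt(x), 0),
                    warning = function(w) "warning")
```

Here `scalar_if` and `outcome` only record whether a condition was signalled; they are not meant as a recommended way to call ifelse().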

Recently, a useR has alerted me to the fact that my Rmpfr
package's arbitrary (high) precision numbers don't work for a
relatively simple function.

As I found, the reason was that this simple function used
 ifelse(.,.,.)
and the problem was that the (*simplified*) gist of ifelse(test, yes, no)
is

  test <- as.logical(test)
  ans <- test
  ans[ test] <- yes
  ans[!test] <- no

and in case of Rmpfr, the problem is that

   <logical>[<logical>]  <-  <mpfr>

cannot work correctly

[[ maybe it could in a future R, if I could define a method

   setReplaceMethod("[", c("logical","logical","mpfr"),
function(x,i,value) .)

   but that currently fails as the C-low-level dispatch for '[<-'
   does not look at the full signature
 ]]
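
The same effect can be seen without Rmpfr. The sketch below wraps the simplified gist quoted above in a hypothetical helper, `ifelse_gist()` (the name and the explicit recycling via rep() are assumptions made for the illustration), and applies it to Date values: because the result starts life as a logical vector, the class of 'yes' and 'no' is lost.

```r
# Hypothetical helper following the simplified gist; not the real ifelse().
# It assumes 'test' contains no NA values.
ifelse_gist <- function(test, yes, no) {
  test <- as.logical(test)
  n <- length(test)
  ans <- test                                  # result is initialised as logical
  ans[ test] <- rep(yes, length.out = n)[ test]
  ans[!test] <- rep(no,  length.out = n)[!test]
  ans
}

d <- as.Date("2016-08-07")
r <- ifelse_gist(c(TRUE, FALSE), d, d + 1)
class(r)   # no longer "Date": only the underlying day counts survive
```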

I vaguely remember having seen proposals for
light weight substitutes for ifelse(),  called
 ifelse1() or
 ifelse2() etc...

and I wonder if we should not try to see if there was a version
that could go into "base R" (maybe the 'utils' package, not
   'base'; that's not so important).

One difference to ifelse() would be that the type/mode/class of the
result is not initialized by logical by default, but rather by the
"common type" of yes and no ... maybe determined by c()'ing
parts of those.
The idea was that this would work for most S3 and S4 objects for
which logical 'length', (logical) indexing '[', and 'rep()' works.


I think your description is more or less:

   test <- as.logical(test)
   ans <- c(yes, no)[seq_along(test)]
   ans <- ans[seq_along(test)]
   ans[ test] <- yes[test]
   ans[!test] <- no[!test]

(though the implementation details would vary, and recycling rules would
apply if the lengths of test, yes and no weren't all equal).

You didn't mention what happens with attributes.  Currently we keep the
attributes from test, which probably doesn't make a lot of sense. In
particular,

ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))

returns nonsense, as does my translation of your idea above.

That implementation also drops attributes. I'd say this definition would
make more sense:

   test <- as.logical(test)
   ans <- yes
   ans[!test] <- no[!test]

(and this is suggested as an alternative in ?ifelse).  It generates an
error in my test example, which seems reasonable.  It gives the "right"
thing in

ifelse(c(TRUE, FALSE), factor(2:3), factor(3:2))

because the factors have the same levels.

The lack of symmetry between yes and no is slightly irksome, but I would
think in most cases you could choose attributes from just one of yes and
no to be what you want in the result (and use !test to swap the order if
necessary).
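
As a rough sketch of the definition above, wrapped in a hypothetical function `ifelse_yes()` (the name is an assumption, and no recycling or NA handling is attempted), applied to the two factors with identical levels:

```r
# Hypothetical wrapper around the three-line definition quoted above.
# Assumes test, yes and no have equal length and test has no NA.
ifelse_yes <- function(test, yes, no) {
  test <- as.logical(test)
  ans <- yes                 # attributes (class, levels) come from 'yes'
  ans[!test] <- no[!test]    # dispatches on the class of 'yes'
  ans
}

# Same levels in both factors: the result stays a factor.
res <- ifelse_yes(c(TRUE, FALSE), factor(2:3), factor(3:2))

# For comparison, ifelse() itself returns bare integer codes here.
odd <- ifelse(c(TRUE, FALSE), factor(2:3), factor(3:4))
```

With `res`, the first element is taken from 'yes' and the second from 'no', and the levels of 'yes' are kept.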



One possibility would also be to consider  a "numbers-only" or
rather "same type"-only {e.g., would also work for characters}
version.


I don't know what you mean by these.


Of course, an ifelse2()  should also be more efficient than
ifelse() in typical "atomic" cases.


I don't think it is obvious how to make it more efficient.  ifelse()
already skips evaluation of yes or no if not needed.  (An argument could
be made that it would be better to guarantee evaluation of both, but
it's usually easy enough to do this explicitly, so I don't see a need.)


Same from here: I do not see how this can easily be made more efficient,
since evaluating only parts causes a lot of copies of objects, which
slows things down; hence you need some complexity in yes and no to make
evaluating only parts of them more efficient at the R level.



Anyway, to solve the problem, we may want an additional argument to ifelse2()
that allows for specification of the type of the result (as vapply does)?
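
One way such an argument might look, purely as a sketch: the function `ifelse_typed()` and its `template` argument are hypothetical names invented here, loosely modelled on FUN.VALUE in vapply(), and nothing in this thread settles on this design.

```r
# Hypothetical sketch: the result is initialised from 'template', so its
# type/class is chosen by the caller instead of starting as logical.
ifelse_typed <- function(test, yes, no, template) {
  test <- as.logical(test)
  n <- length(test)
  ans <- rep(template, length.out = n)
  yes <- rep(yes, length.out = n)      # recycle to the length of test
  no  <- rep(no,  length.out = n)
  ypos <- which(test)
  npos <- which(!test)
  ans[ypos] <- yes[ypos]
  ans[npos] <- no[npos]
  ans[is.na(test)] <- NA               # NA in test gives NA in the result
  ans
}

r <- ifelse_typed(c(TRUE, FALSE, NA),
                  as.Date("2016-08-07"), as.Date("2016-08-08"),
                  template = as.Date(NA))
```

Because the assignments dispatch on the class of `template`, this only works for classes that have suitable `rep()` and `[<-` methods.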


Best,
Uwe


Duncan Murdoch




Thank you for your ideas and suggestions.
Again, there's no promise of implementation coming along wi

[Rd] table(exclude = NULL) always includes NA

2016-08-07 Thread Suharto Anggono Suharto Anggono via R-devel
This is an example from 
https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .

With R 2.7.2:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0

With R 3.3.1:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0
> table(a, b, useNA = "ifany")
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0
> table(a, b, exclude = NULL, useNA = "ifany")
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0

For the example, in R 3.3.1, the result of 'table' with exclude = NULL includes 
NA even if NA is not present. This differs from R 2.7.2, where the result comes 
from factor(exclude = NULL), which includes NA only if NA is present.

From R 3.3.1 help on 'table', in "Details" section:
'useNA' controls if the table includes counts of 'NA' values: the allowed 
values correspond to never, only if the count is positive and even for zero 
counts.  This is overridden by specifying 'exclude = NULL'.

Specifying 'exclude = NULL' overrides 'useNA' to what value? The documentation 
doesn't say. Looking at the code of function 'table', the value is "always".

For the example, in R 3.3.1, the result like in R 2.7.2 can be obtained with 
useNA = "ifany" and 'exclude' unspecified.
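
The three 'useNA' settings can be compared on the same data without relying on 'exclude'. This is only a sketch; the variable names for the dimensions are made up here.

```r
a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)

# Compare the shape of the table under each explicit useNA setting.
dim_no     <- dim(table(a, b, useNA = "no"))      # NA row dropped
dim_ifany  <- dim(table(a, b, useNA = "ifany"))   # NA row for a only
dim_always <- dim(table(a, b, useNA = "always"))  # NA row and NA column
```

Passing 'useNA' explicitly and leaving 'exclude' unspecified avoids depending on how the two arguments interact.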


The result of 'summary' of a logical vector is affected. As mentioned in 
http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels
 , in the code of function 'summary.default', for logical, table(object, 
exclude = NULL) is used.

With R 2.7.2:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's
logical       4       2       3
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE
logical       4       2
> summary(TRUE)
   Mode    TRUE
logical       1

With R 3.3.1:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's
logical       4       2       3
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE    NA's
logical       4       2       0
> summary(TRUE)
   Mode    TRUE    NA's
logical       1       0

In R 3.3.1, "NA's" is always in the result of 'summary' of a logical vector, 
unlike 'summary' of a numeric vector.
On the other hand, in R 3.3.1, FALSE is not in the result of 'summary' of a 
logical vector that doesn't contain FALSE.

I prefer the result of 'summary' of a logical vector like in R 2.7.2, or, 
alternatively, the result that always includes all possible values: FALSE, 
TRUE, NA.
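
The second alternative, a tally that always lists FALSE, TRUE and NA, could be sketched as follows. The helper `tally_logical()` is a hypothetical name used only for illustration; it is not the code of 'summary.default'.

```r
# Hypothetical helper: counts of FALSE, TRUE and NA, always all three.
tally_logical <- function(x) {
  stopifnot(is.logical(x))
  # Fixing the levels keeps FALSE and TRUE even when they do not occur;
  # useNA = "always" adds the NA count even when it is zero.
  counts <- table(factor(x, levels = c(FALSE, TRUE)), useNA = "always")
  setNames(as.vector(counts), c("FALSE", "TRUE", "NA's"))
}

log <- c(NA, logical(4), NA, !logical(2), NA)
tally_logical(log)
tally_logical(TRUE)
```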

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] problem with abline(lm(...)) for plot(y~x, log='xy')

2016-08-07 Thread Spencer Graves

Hello:


   In the following plot, the fitted line plots 100 percent above 
the points:



tstDat <- data.frame(x=10^(1:3), y=10^(1:3+.1*rnorm(3)))
tstFit <- lm(log(y)~log(x), tstDat)
plot(y~x, tstDat, log='xy')
abline(tstFit)


   I can get the correct line with the following:


tstPredDat <- data.frame(x=10^seq(1, 3, len=2))
tstPred <- predict(tstFit, tstPredDat)
lines(tstPredDat$x, exp(tstPred))


   I tried "abline(tstFit)" hoping it would work.  If the error had 
not been so obvious, I might not have noticed it.



   Thanks for your work to build a better R (and through that a 
better world).



   Spencer Graves



Re: [Rd] problem with abline(lm(...)) for plot(y~x, log='xy')

2016-08-07 Thread peter dalgaard
Try log10()...

-pd
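
A sketch of what this suggestion amounts to, assuming the point is that abline() on log axes draws its line in base-10 coordinates, so the fit must use log10() too. The seed and the names `fitE` and `fit10` are choices made for this illustration only.

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible
tstDat <- data.frame(x = 10^(1:3), y = 10^(1:3 + .1 * rnorm(3)))

fitE  <- lm(log(y) ~ log(x), tstDat)      # natural-log fit from the question
fit10 <- lm(log10(y) ~ log10(x), tstDat)  # base-10 fit, as suggested

# The slope is the same in both fits; only the intercept is rescaled.
ratio <- coef(fitE)[[1]] / coef(fit10)[[1]]   # equals log(10)

plot(y ~ x, tstDat, log = "xy")
abline(fit10)   # coefficients are now on the scale used by the log axes
```

The natural-log intercept is log(10) times too large for the log10 axes, which is why abline(tstFit) lands far above the points.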

> On 07 Aug 2016, at 21:03, Spencer Graves wrote:
> 
> Hello:
> 
> 
>   In the following plot, the fitted line plots 100 percent above the 
> points:
> 
> 
> tstDat <- data.frame(x=10^(1:3), y=10^(1:3+.1*rnorm(3)))
> tstFit <- lm(log(y)~log(x), tstDat)
> plot(y~x, tstDat, log='xy')
> abline(tstFit)
> 
> 
>   I can get the correct line with the following:
> 
> 
> tstPredDat <- data.frame(x=10^seq(1, 3, len=2))
> tstPred <- predict(tstFit, tstPredDat)
> lines(tstPredDat$x, exp(tstPred))
> 
> 
>   I tried "abline(tstFit)" hoping it would work.  If the error had not 
> been so obvious, I might not have noticed it.
> 
> 
>   Thanks for your work to build a better R (and through that a better 
> world).
> 
> 
>   Spencer Graves
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com
