On 19/06/2021 9:58 a.m., Remo Röthlin wrote:
Dear useRs
I’m encountering an unexpected behaviour when trying to apply format(x,
scientific = TRUE) on integer vectors (but not double vectors).
The resulting string is not formatted in scientific notation; with formatC(),
however, the result is as expected.
Is this the expected behaviour of format(x, scientific = TRUE)? I haven’t found
any information or discussion on a difference in scientific notation between
format and formatC.
If you look at the internals of the format.default() function, you'll
see that it ignores the "scientific" argument when the type of the
argument is integer:
https://github.com/wch/r-source/blob/23dc578c6f40acdf53f92bab88cf91ecd25cd2e8/src/main/paste.c#L543-L552
The help page describes that argument as:
`Either a logical specifying whether elements of a real or complex
vector should be encoded in scientific format, or an integer penalty
(see options("scipen")). Missing values correspond to the current
default penalty.`
so there's no reason to expect it applies to integer vectors as well.
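
A minimal sketch of the behaviour described above, using the same values as
the original post. Coercing with as.double() before calling format() is one
possible workaround (an assumption about what the poster wants, not something
the help page prescribes); the names `fixed` and `sci` are only illustrative.

```r
intvec <- c(-12300L, 12300L)

# 'scientific' is ignored for integer input, so fixed notation comes back
fixed <- format(intvec, scientific = TRUE)

# Coercing to double first lets 'scientific' take effect
sci <- format(as.double(intvec), scientific = TRUE)

print(fixed)  # [1] "-12300" " 12300"
print(sci)    # [1] "-1.23e+04" " 1.23e+04"
```

The cost of this workaround is that the vector no longer has integer type
during formatting, which should not matter for values in the integer range.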
I suspect the reason for this goes back to S, which was influenced more
by Fortran than by C; I think Fortran (at least as it was in the
70s and 80s) never used scientific notation on integers.
Duncan Murdoch
Both functions are implemented as .Internal() functions in C. While
do_formatC() relies directly on C’s built-in formatting capabilities,
do_format() does additional work.
Unfortunately my knowledge of R internals is not good enough to see why
format() treats integers differently in this case.
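
As a small illustration of the formatC() route mentioned above: the sketch
below assumes that two digits after the decimal point are wanted, so that the
result matches what format() gives for doubles (apart from the padding that
format() adds for a common width). The `digits` argument is part of formatC();
the name `out` is only illustrative.

```r
intvec <- c(-12300L, 12300L)

# With format = "e", formatC() gives scientific notation for integer input too;
# 'digits' sets the number of digits after the decimal point
out <- formatC(intvec, format = "e", digits = 2)

print(out)  # [1] "-1.23e+04" "1.23e+04"
```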
Warm regards,
Remo
SessionInfo and code to reproduce the issue with output (was also reproduced on
Windows 10 x64 R 4.1.0 and RStudio Cloud R 3.6.3 & R 4.0.3):
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

> Sys.getlocale()
[1] "de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8"
> numvec <- c(-1.23e4, 1.23e4)
> typeof(numvec) # double
[1] "double"
> intvec <- c(-1.23e4L, 1.23e4L)
> typeof(intvec) # integer
[1] "integer"
> numvec2 <- as.double(intvec)
> identical(numvec, numvec2)
[1] TRUE

> formatC(numvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"

> formatC(intvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(intvec, scientific = TRUE) # *Not* formatted as scientific notation
[1] "-12300" " 12300"

> formatC(numvec2, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec2, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.