On 19/06/2021 9:58 a.m., Remo Röthlin wrote:
Dear useRs

I’m encountering unexpected behaviour when applying format(x, 
scientific = TRUE) to integer vectors (but not to double vectors).
The resulting string is not formatted in scientific notation, whereas 
formatC() gives the expected result.

Is this the expected behaviour of format(x, scientific = TRUE)? I haven’t found 
any information or discussion on a difference in scientific notation between 
format and formatC.

If you look at the internals of the format.default() function, you'll see that it ignores the "scientific" argument when the type of the argument is integer:

https://github.com/wch/r-source/blob/23dc578c6f40acdf53f92bab88cf91ecd25cd2e8/src/main/paste.c#L543-L552

The help page describes that argument as:

`Either a logical specifying whether elements of a real or complex vector should be encoded in scientific format, or an integer penalty (see options("scipen")). Missing values correspond to the current default penalty.`

so there's no reason to expect it applies to integer vectors as well.
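
If scientific notation is wanted for integer input anyway, one possible workaround (a minimal sketch, not something the help page prescribes) is to coerce to double before calling format(). The helper name format_sci below is hypothetical and only for illustration; it is not part of base R.

```r
# Same values as in the original post, written as plain integer literals
intvec <- c(-12300L, 12300L)

# Integer input: the 'scientific' argument has no effect here
plain <- format(intvec, scientific = TRUE)

# Coercing to double first lets 'scientific = TRUE' take effect
sci <- format(as.double(intvec), scientific = TRUE)

# Hypothetical convenience wrapper (illustrative name, not a base R function)
format_sci <- function(x, ...) {
  format(as.double(x), scientific = TRUE, ...)
}

print(plain)
print(sci)
print(format_sci(intvec))
```

The coercion is lossless for R integers, since every 32-bit integer is exactly representable as a double.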

I suspect the reason for this goes back to S, which was influenced more by Fortran than by C, and I think Fortran (at least as it was in the 70s and 80s) never used scientific notation on integers.

Duncan Murdoch

Both functions are implemented as .Internal() functions in C, and while 
do_formatC() uses C’s built-in formatting capabilities directly, do_format() 
does additional work.
Unfortunately, my knowledge of R internals is not good enough to see why 
format() treats integers differently in this case.

Warm regards,

Remo

SessionInfo and code to reproduce the issue with output (was also reproduced on 
Windows 10 x64 R 4.1.0 and RStudio Cloud R 3.6.3 & R 4.0.3):

sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   
/Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: 
/Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.0
Sys.getlocale()
[1] "de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8"

numvec <- c(-1.23e4, 1.23e4)
typeof(numvec) # double
[1] "double"

intvec <- c(-1.23e4L, 1.23e4L)
typeof(intvec) # integer
[1] "integer"

numvec2 <- as.double(intvec)
identical(numvec, numvec2)
[1] TRUE

formatC(numvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
format(numvec, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"

formatC(intvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
format(intvec, scientific = TRUE) # *Not* formatted as scientific notation
[1] "-12300" " 12300"

formatC(numvec2, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
format(numvec2, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


