I see the problem on both Linux and Windows, R-3.0.1.

> vapply(as.numeric(9994:9995), function(x)format(x, scientific=FALSE, digits=3), "")
[1] "9994" " 9995"
> vapply(as.numeric(99994:99995), function(x)format(x, scientific=FALSE, digits=4), "")
[1] "99994" " 99995"
> vapply(as.numeric(999994:999995), function(x)format(x, scientific=FALSE, digits=5), "")
[1] "999994" " 999995"
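The three pairs above have one arithmetic property in common: rounded to 'digits' significant digits, the first value stays below the next power of ten while the second reaches it. The sketch below checks that property with a hypothetical helper, 'rolls_over()', built only on base R's signif() and sprintf(). It never calls format(), so it predicts which values are affected independently of any R version's padding behaviour. It assumes positive whole numbers.

```r
# Hypothetical helper (not part of base R): TRUE where rounding x to
# `digits` significant digits adds an integer digit, i.e. where the value
# rounds up to the next power of ten. Assumes positive whole numbers.
rolls_over <- function(x, digits) {
  rounded <- signif(x, digits)
  # "%.0f" prints only the integer part, so nchar() counts its digits
  nchar(sprintf("%.0f", as.double(rounded))) >
    nchar(sprintf("%.0f", as.double(x)))
}

# The pairs from the vapply() calls above: only the second value rolls over
rolls_over(c(9994, 9995), 3)
rolls_over(c(99994, 99995), 4)
rolls_over(c(999994, 999995), 5)

# Scan the same range as the z[i] listing, at digits = 3
flagged <- which(rolls_over(1:5e5, 3))
```

For digits = 3, 'flagged' holds 9995:9999 and 99950:99999, the same 55 values that carry the extra leading space in the z[i] listing from the R 3.0.1 session in this message.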
The values with the leading space are the ones that would round up to the next power of 10 when rounded to the requested number of significant digits:

> x <- as.numeric(1:5e5)
> z <- vapply(x, function(x)format(x, scientific=FALSE, digits=3), "")
> i <- grep(" ", z)
> z[i]
 [1] " 9995" " 9996" " 9997" " 9998" " 9999" " 99950" " 99951" " 99952"
 [9] " 99953" " 99954" " 99955" " 99956" " 99957" " 99958" " 99959" " 99960"
[17] " 99961" " 99962" " 99963" " 99964" " 99965" " 99966" " 99967" " 99968"
[25] " 99969" " 99970" " 99971" " 99972" " 99973" " 99974" " 99975" " 99976"
[33] " 99977" " 99978" " 99979" " 99980" " 99981" " 99982" " 99983" " 99984"
[41] " 99985" " 99986" " 99987" " 99988" " 99989" " 99990" " 99991" " 99992"
[49] " 99993" " 99994" " 99995" " 99996" " 99997" " 99998" " 99999"
> print(x[i], digits=3)
 [1] 1e+04 1e+04 1e+04 1e+04 1e+04 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
[13] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
[25] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
[37] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05
[49] 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05 1e+05

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Mathieu Basille
> Sent: Thursday, August 01, 2013 8:31 AM
> To: R help
> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
>
> This problem does not seem to be widely popular, but it affects at least two
> users (both on Linux, maybe a hint here?). To me, it looks like a bug, though I
> don't know whether it is an R bug or an OS-related one. Should I forward it to
> R-devel, or some other place where R gurus may have a chance to look at it?
>
> Mathieu.
>
>
> On 07/30/2013 02:34 PM, arun wrote:
> > Hi Mathieu,
> > yes, the original problem occurs on my system too. I am using R 3.0.1 on
> > Linux Mint 15.
> > I guess the default is trim=FALSE, but it still looks very strange,
> > especially with apply(), as it starts from " 99995" onwards.
> >
> > sessionInfo()
> > R version 3.0.1 (2013-05-16)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
> >  [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
> >  [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
> >  [7] LC_PAPER=C LC_NAME=C
> >  [9] LC_ADDRESS=C LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] stats graphics grDevices utils datasets methods base
> >
> > other attached packages:
> > [1] stringr_0.6.2 reshape2_1.2.2
> >
> > loaded via a namespace (and not attached):
> > [1] plyr_1.8 tools_3.0.1
> >
> >
> > ----- Original Message -----
> > From: Mathieu Basille <basille....@ase-research.org>
> > To: arun <smartpink...@yahoo.com>
> > Cc: R help <r-help@r-project.org>
> > Sent: Tuesday, July 30, 2013 2:29 PM
> > Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >
> > Thanks Arun for your answer. 'trim = TRUE' does indeed solve the symptoms
> > of the problem, and this is the solution I'm currently using. However, it
> > does not help to understand what the problem is, or what causes it.
> >
> > Can you confirm that the original problem also occurs on your computer (and
> > which OS you use)? It would be interesting, since David is not able to
> > reproduce the problem on Mac OS X.
> > Mathieu.
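The 'trim = TRUE' workaround discussed in these messages can be compared with a fully vectorized alternative in a small sketch. It assumes, as in the original example, that 'id' is an integer column, and it sets 'options(digits = 4)' to mimic knitr's default. The data frame is a hypothetical three-row stand-in for 'df2', not the original data.

```r
# Sketch: hypothetical three-row stand-in for the thread's df2.
old <- options(digits = 4)  # knitr's default, which exposed the problem
df <- data.frame(x = c(0.1, 0.2, 0.3), id = c(99994L, 99995L, 100000L))

# Per-value formatting of doubles, as happens inside apply(); trim = TRUE
# suppresses the leading blanks that format() would otherwise add.
per_value <- vapply(as.double(df$id),
                    function(v) format(v, scientific = FALSE, trim = TRUE),
                    character(1))

# Vectorized: format the whole integer column in one call.
whole_col <- format(df$id, scientific = FALSE, trim = TRUE)

# For an integer column, as.character() gives the same strings.
as_chr <- as.character(df$id)

options(old)  # restore the previous 'digits' setting
```

All three approaches should yield "99994", "99995" and "100000"; the vectorized forms also avoid calling format() once per row.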
> >
> > On 07/30/2013 02:15 PM, arun wrote:
> >> Hi,
> >> Try using trim=TRUE in format() (see ?format):
> >> options(digits=4)
> >>
> >> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim=TRUE, scientific = FALSE))
> >> df2$id2[99990:100010]
> >> # [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >> # [9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >> #[17] "100006" "100007" "100008" "100009" "100010"
> >>
> >>
> >> id2 <- format(1:110000, scientific = FALSE, trim=TRUE)
> >> id2[99990:100010]
> >> # [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >> # [9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >> #[17] "100006" "100007" "100008" "100009" "100010"
> >> A.K.
> >>
> >>
> >> ----- Original Message -----
> >> From: Mathieu Basille <basille....@ase-research.org>
> >> To: David Winsemius <dwinsem...@comcast.net>
> >> Cc: r-help@r-project.org
> >> Sent: Tuesday, July 30, 2013 2:07 PM
> >> Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
> >>
> >> Thanks David for your interest. I have to admit that your answer puzzles me
> >> even more than before. It seems that the underlying problem is way beyond
> >> my R skills...
> >>
> >> The generation of id2 is indeed quite demanding, especially compared to a
> >> simple 'as.character' call.
> >> Anyway, since it seems to be system-specific, here is the sessionInfo()
> >> that I forgot to attach to my first message:
> >>
> >> R version 3.0.1 (2013-05-16)
> >> Platform: x86_64-pc-linux-gnu (64-bit)
> >>
> >> locale:
> >>  [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
> >>  [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
> >>  [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
> >>  [7] LC_PAPER=C LC_NAME=C
> >>  [9] LC_ADDRESS=C LC_TELEPHONE=C
> >> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base
> >>
> >> In brief: the latest stable R available in Debian Testing. Hopefully this
> >> can help track down the problem.
> >> Mathieu.
> >>
> >>
> >> On 07/30/2013 01:58 PM, David Winsemius wrote:
> >>>
> >>> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
> >>>
> >>>> Dear list,
> >>>>
> >>>> Here is a simple example in which the behaviour of 'format' does not
> >>>> make sense to me. I have read the documentation and searched the
> >>>> archives, but nothing pointed me in the right direction to understand
> >>>> this behaviour. Let's start with a simple data frame:
> >>>>
> >>>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>>
> >>>> Let's now create a new variable 'id2', which is the character
> >>>> representation of 'id'.
> >>>> Note that I use 'scientific = FALSE' to ensure that long numbers such as
> >>>> 100,000 are not formatted in scientific notation (in this case 1e+05):
> >>>>
> >>>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>>
> >>>> Let's have a look at part of the result:
> >>>>
> >>>> df1$id2[99990:100010]
> >>>> [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996"
> >>>> [8] "99997" "99998" "99999" "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>
> >>> Some formatting processes are carried out by system functions. In this
> >>> case I am unable to reproduce the problem with the same code on
> >>> Mac OS 10.7.5 / R 3.0.1 Patched:
> >>>
> >>> > df1$id2[99990:100010]
> >>> [1] "99990" "99991" "99992" "99993" "99994" "99995" "99996" "99997"
> >>> [9] "99998" "99999" "100000" "100001" "100002" "100003" "100004" "100005"
> >>> [17] "100006" "100007" "100008" "100009" "100010"
> >>>
> >>> (I did notice that generation of the id2 variable seemed to take an
> >>> inordinately long time.)
> >>>
> >>> -- David.
> >>>>
> >>>> So far, so good. Let's now play with the 'digits' option:
> >>>>
> >>>> options(digits = 4)
> >>>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
> >>>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
> >>>> df2$id2[99990:100010]
> >>>> [1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996"
> >>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>> Notice the extra leading space from 99995 to 99999?
> >>>> To make sure it only happened there:
> >>>>
> >>>> df2$id2[which(df1$id2 != df2$id2)]
> >>>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
> >>>>
> >>>> And just to make sure it only occurs in an 'apply' call, here is the same
> >>>> directly on a numeric vector:
> >>>>
> >>>> id2 <- format(1:110000, scientific = FALSE)
> >>>> id2[99990:100010]
> >>>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
> >>>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
> >>>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
> >>>>
> >>>> Here every number is padded with leading spaces to a common width, which
> >>>> makes sense to me. Is there anything I'm misinterpreting in the behaviour
> >>>> of 'format'?
> >>>> Thanks in advance for any hint,
> >>>> Mathieu.
> >>>>
> >>>>
> >>>> PS: Some background for this question. It all comes from an Rmd document
> >>>> that knitr consistently failed to process, while the R code ran fine in
> >>>> batch or interactive R. knitr sets 'options(digits = 4)' by default,
> >>>> whereas R's default is 'options(digits = 7)'; this made one of my
> >>>> functions throw an error with knitr, but not in batch or interactive R. I
> >>>> managed to solve the problem using 'trim = TRUE' in 'format', but I still
> >>>> do not understand what's going on...
> >>>> If you're interested, see here for more details on the original problem:
> >>>> http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> ~$ whoami
> >>>> Mathieu Basille, PhD
> >>>>
> >>>> ~$ locate --details
> >>>> University of Florida \\
> >>>> Fort Lauderdale Research and Education Center
> >>>> (+1) 954-577-6314
> >>>> http://ase-research.org/basille
> >>>>
> >>>> ~$ fortune
> >>>> « The whole point is to say everything, and I lack the words
> >>>> And I lack the time, and I lack the daring.
> >>>> »
> >>>> -- Paul Éluard
> >>>>
> >>>> ______________________________________________
> >>>> R-help@r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>> David Winsemius
> >>> Alameda, CA, USA
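Part of the background to the thread is that apply() never sees the integer 'id' column at all: per its documentation, apply() coerces a data frame to a matrix with as.matrix(), and with all-numeric columns that matrix is double, so each call to format() formats a single double under the current 'digits' option. A minimal sketch of that coercion, on a hypothetical two-row data frame rather than the original data:

```r
# Hypothetical two-row stand-in for the thread's df1/df2.
df <- data.frame(x = c(0.5, 1.5), id = c(99995L, 100000L))

# apply() coerces a data frame with as.matrix(); mixing a double column with
# an integer column yields a double matrix, so 'id' loses its integer type.
m <- as.matrix(df)
id_type_in_df  <- typeof(df$id)       # "integer"
id_type_in_mat <- typeof(m[, "id"])   # "double"

# Each row then arrives as a named double vector, formatted one value at a time.
row_ids <- apply(df, 1, function(r) format(r["id"], scientific = FALSE, trim = TRUE))
```

Formatting the column directly, e.g. format(df$id, scientific = FALSE, trim = TRUE), avoids both the coercion and the per-row calls.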