Re: [Rd] Discrepancy: R sum() VS C or Fortran sum
Pierre,

It's comforting to think that the same simple summation loop implemented in R, C, and Fortran would give bit-wise identical results, but that isn't the case in practice. Floating-point results depend on many things: the chip used, the compiler used, the optimization flags, etc. For example, in the unlikely event that you run this on a 32-bit machine, you have to consider the messy 80-bit internal precision used by the floating-point hardware on that platform. That alone is enough to derail the "bit-wise equivalence train" right away, and there are plenty of other such issues that can crop up and play a role.

Of course, there may be something about R's double-precision evaluation that is wrong, but IMHO a failure to be bit-wise identical with a computation done elsewhere isn't enough to say a bug has been turned up. It's not unusual for computations to differ by the least significant bit or two. If that is enough to derail your computations of the Jacobians and Hessians, that is at least a teachable moment for your numerical methods class.

BTW, are you using finite-difference techniques for the derivatives, or computational (automatic) derivatives? The latter are not so sensitive to round-off error: I would expect them to be as accurate as symbolic derivatives.

HTH,

-Steve

On Fri, Mar 16, 2018 at 10:58 AM, Pierre Chausse wrote:
> Hi all,
>
> I found a discrepancy between the sum() in R and either a sum done in C or
> Fortran for a vector of just 5 elements. The difference is very small, but
> this is a very small part of a much larger numerical problem in which first
> and second derivatives are computed numerically. This is part of a
> numerical methods course I am teaching in which I want to compare the speed
> of R versus Fortran (we solve a general equilibrium problem all numerically,
> if you want to know). Because of this discrepancy, the Jacobian and Hessian
> in R versus in Fortran are quite different, which results in the Newton
> method producing a different solution (for a given stopping rule). Since
> the solution computed in Fortran is almost identical to the analytical
> solution, I suspect that the sum in Fortran may be more accurate (that's
> just a guess). Most of the time the sum produces identical results, but
> for some numbers it is different. The following example shows what happens:
>
> set.seed(12233)
> n <- 5
> a <- runif(n,1,5)
> e <- runif(n, 5*(1:n),10*(1:n))
> s <- runif(1, 1.2, 4)
> p <- runif(5, 3, 10)
> x <- c(e[-5], (sum(e*p)-sum(e[-5]*p[-5]))/p[5])
> u <- a^(1/s)*x^((s-1)/s)
> dyn.load("sumF.so")
>
> u[1] <- u[1]+.0001 ### If we do not add .0001, all differences are 0
> s1 <- sum(u)
> s2 <- .Fortran("sumf", as.double(u), as.integer(n), sf1=double(1),
>                sf2=double(1))[3:4]
> s3 <- .C("sumc", as.double(u), as.integer(n), sC=double(1))[[3]]
>
> s1-s2[[1]]  ## R versus compiler sum() (Fortran)
> [1] -7.105427e-15
>
> s1-s2[[2]]  ## R versus manual sum (Fortran)
> [1] -7.105427e-15
>
> s1-s3  ## R versus manual sum in C
> [1] -7.105427e-15
>
> s2[[2]]-s2[[1]]  ## manual sum versus compiler sum() (Fortran)
> [1] 0
>
> s3-s2[[2]]  ## Fortran versus C
> [1] 0
>
> My sumf and sumc are
>
>       subroutine sumf(x, n, sx1, sx2)
>       integer i, n
>       double precision x(n), sx1, sx2
>       sx1 = sum(x)
>       sx2 = 0.0d0
>       do i=1,n
>          sx2 = sx2+x(i)
>       end do
>       end
>
> void sumc(double *x, int *n, double *sum)
> {
>   int i;
>   double sum1 = 0.0;
>   for (i=0; i< *n; i++) {
>     sum1 += x[i];
>   }
>   *sum = sum1;
> }
>
> Can that be a bug? Thanks.
>
> --
> Pierre Chaussé
> Assistant Professor
> Department of Economics
> University of Waterloo

--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
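One likely (though not verified in this thread) contributor to the discrepancy: on many builds, R's sum() accumulates in extended (long double) precision, while the Fortran and C loops above accumulate in plain double, so the two can legitimately differ in the last bit or two. Below is a minimal sketch, entirely in R, that compares sum() against an explicit double-precision loop; the vector u is only a stand-in for Pierre's data, and the exact differences printed depend on the platform and build.

    set.seed(12233)
    u <- runif(5, 10, 100) + 1e-4        # stand-in for Pierre's vector u

    s_sum  <- sum(u)                     # R's internal summation
    s_loop <- 0
    for (ui in u) s_loop <- s_loop + ui  # each '+' rounds to double precision

    s_sum - s_loop                       # often 0, sometimes an ulp or so
    .Machine$double.eps * sum(abs(u))    # the scale of harmless round-off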
Re: [Rd] round(x, dig) [was "Development version of R fails tests .."]
> >> ive difference: 9.27902e-05
> >> Execution halted
> >>
> >> This happens while the 32bit architecture is installed, which is a bit
> >> surprising as I get the following results for the last installed
> >> version of R's development version:
> >>
> >> R Under development (unstable) (2020-01-25 r77715) -- "Unsuffered Consequences"
> >> Copyright (C) 2020 The R Foundation for Statistical Computing
> >> Platform: x86_64-pc-linux-gnu/32 (32-bit)
> >> [...]
> >> > round(10.7775, digits=3)
> >> [1] 10.778
> >>
> >> and
> >>
> >> R Under development (unstable) (2020-01-25 r77715) -- "Unsuffered Consequences"
> >> Copyright (C) 2020 The R Foundation for Statistical Computing
> >> Platform: x86_64-pc-linux-gnu/64 (64-bit)
> >> [...]
> >> > round(10.7775, digits=3)
> >> [1] 10.778
> >>
> >> On the other hand, the R 3.6.2 version, that I mainly use at the moment,
> >> gives the following results:
> >>
> >> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
> >> Copyright (C) 2019 The R Foundation for Statistical Computing
> >> Platform: x86_64-pc-linux-gnu/32 (32-bit)
> >> [...]
> >> > round(10.7775, digits=3)
> >> [1] 10.777
> >>
> >> and
> >>
> >> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
> >> Copyright (C) 2019 The R Foundation for Statistical Computing
> >> Platform: x86_64-pc-linux-gnu/64 (64-bit)
> >> [...]
> >> > round(10.7775, digits=3)
> >> [1] 10.777
> >>
> >> So it seems as if the behaviour of round() has changed between R 3.6.2
> >> and the development version. But I do not understand why this test all
> >> of a sudden failed if the results from the last successfully installed
> >> development version of R suggest that the test should be passed.
> >>
> >> Thanks in advance for any insight and tips.
> >>
> >> Cheers,
> >> Berwin
>
> Note that r77727 was the last of a few commits I made related to
> dealing with R's bug report PR#17668:
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668
> which itself triggered an involved dialogue, mostly online,
> visible at the PR's URL above.
>
> It led me to also write an R package 'round' (in order to compare
> R 3.6.x's and later round() versions, etc.) with a (not entirely
> polished) package vignette that explains how rounding to decimal
> digits is not at all trivial, and why and how I ended up (*)
> improving R's round(x, digits) algorithm in R-devel.
>
> The CRAN version of the package
> https://cran.r-project.org/package=round
>
>     install.packages("round")
>
> is not quite current, notably its vignette isn't, and so I have
> mentioned in the above thread
> ( https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17668#c8 )
> that the latest version of the vignette is also available as
> https://stat.ethz.ch/~maechler/R/Rounding.html
>
> You can install and load the devel version of 'round' by
>
>     remotes::install_gitlab("mmaechler/round")
>     require("round")
>
> and then look a bit at the different versions of round(.) using
>
>     example(roundX)
>
> i.e. using round::roundX(x, digits, version).
>
> For those who read so far: I'm really interested in getting
> critical (constructive) feedback and comments about what I've
> written there (in the bugzilla report and the package vignette).
> It seems almost nobody till now has had much interest and time to
> delve into the somewhat intriguing issues.
>
> Best regards,
> Martin Maechler
> ETH Zurich and R Core team

--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
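The version-to-version differences quoted above all trace back to the fact that 10.7775 has no exact binary representation, so round() is really rounding whichever double is closest to 10.7775. The short sketch below is one way to inspect what round() actually receives; it is not tied to any particular R version, and the %a conversion works only where the C library supports it.

    x <- 10.7775
    sprintf("%.20f", x)     # the double actually stored for 10.7775
    sprintf("%a", x)        # the same value in exact hex notation (if supported)
    x - 10.777              # distances to the two 3-digit candidates:
    10.778 - x              #   neither is exactly 0.0005
    round(x, digits = 3)    # the answer depends on the rounding algorithm in use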
Re: [Rd] dput()
Ben,

I'll edit and split your question just a little:

1) "Is there a way to get an *exact* ASCII representation of a double-precision value?"

2) "Is there a way to get round-trip behavior, i.e. to make sure that the value, when converted back to double, is identical() to the original?"

The hexNumeric idea mentioned by Duncan is a positive answer to the first question. It's a little hard to grok at first, but it is fully precise and represents a 64-bit double exactly. And since it is exact, it converts back identically.

But there is another way to get round-trip behavior. There is a set of routines called dtoa that, when given an IEEE double, produce the shortest sequence of base-10 digits that will map back to the double. There may be some rounding when producing these digits, but of all the digit sequences that would map back to the input x, these routines produce the shortest such sequence. A link to the original routines is here:

  http://www.netlib.org/fp/dtoa.c

and some searching will turn up variants of this code in newer guises.

A good question to ask: for all finite doubles, what is the length of the longest digit sequence required? I believe 17 digits is the maximum required. It may be 18, but I doubt it. I don't have an example at hand, and I spent some time looking when working with these routines. Oh, BTW, trailing or leading zeros do not count toward the length of the digit sequence.

-Steve

On Sat, Feb 29, 2020 at 4:21 AM Ben Bolker wrote:
>
> I think Robin knows about FAQ 7.31/floating point (author of
> 'Brobdingnag', among other numerical packages). I agree that this is
> surprising (to me).
>
> To reframe this question: is there a way to get an *exact* ASCII
> representation of a numeric value (i.e., guaranteeing the restored value
> is identical() to the original)?
>
> .deparseOpts has
>
>   ‘"digits17"’: Real and finite complex numbers are output using
>       format ‘"%.17g"’ which may give more precision than the
>       default (but the output will depend on the platform and there
>       may be loss of precision when read back).
>
> ... but this still doesn't guarantee that all precision is kept.
>
> Maybe
>
>   saveRDS(x,textConnection("out","w"),ascii=TRUE)
>   identical(x,as.numeric(out[length(out)])) ## TRUE
>
> ?
>
> On 2020-02-29 2:42 a.m., Rui Barradas wrote:
> > Hello,
> >
> > FAQ 7.31
> >
> > See also this StackOverflow post:
> > https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > At 00:08 on 29/02/20, robin hankin wrote:
> >> My interpretation of dput.Rd is that dput() gives an exact ASCII form
> >> of the internal representation of an R object. But:
> >>
> >> rhankin@cuttlefish:~ $ R --version
> >> R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
> >> Copyright (C) 2019 The R Foundation for Statistical Computing
> >> Platform: x86_64-pc-linux-gnu (64-bit)
> >>
> >> [snip]
> >>
> >> rhankin@cuttlefish:~ $ R --vanilla --quiet
> >>> x <- sum(dbinom(0:20,20,0.35))
> >>> dput(x)
> >> 1
> >>> x-1
> >> [1] -4.440892e-16
> >>> x==1
> >> [1] FALSE
> >>
> >> So, dput(x) gives 1, but x is not equal to 1. Can anyone advise?

--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
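Both routes discussed above can be demonstrated in a few lines of R, using robin's example value. A sketch follows, with the caveat (echoed in the .deparseOpts help quoted above) that the 17-digit decimal route can in principle be platform dependent, while the hex route is exact by construction on platforms that support hexadecimal floating-point conversion.

    x <- sum(dbinom(0:20, 20, 0.35))   # prints as 1, but is not exactly 1

    ## Route 1: exact hexadecimal representation (the hexNumeric idea)
    hx <- deparse(x, control = "hexNumeric")
    identical(eval(parse(text = hx)), x)    # TRUE: exact round trip

    ## Route 2: 17 significant decimal digits (the dtoa-style bound)
    dx <- sprintf("%.17g", x)
    identical(as.numeric(dx), x)            # TRUE here; platform dependent in theory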
Re: [Rd] Bug in optim for specific orders of magnitude
Collin,

It is interesting that you get such strange results when passing denormal values to optim(). Several possibilities for this spring to mind. A better question, though, might be: "What kind of results are you expecting when you pass such small values to optim()?"

In general, you shouldn't have a strong expectation that you'll get good results back when you have small coefficients like 1e-317 (or 1e-17) in your objective function. The algorithms behind optim() are not designed or implemented to do their best work with coefficients of magnitude far from 1. You can hope. You can get lucky. But I would not advise you to have strong expectations.

-Steve

On Fri, Dec 23, 2022 at 1:20 PM Collin Erickson wrote:
> Hello,
>
> I've come across what seems to be a bug in optim that has become a nuisance
> for me.
>
> To recreate the bug, run:
>
>   optim(c(0,0), function(x) {x[1]*1e-317}, lower=c(-1,-1), upper=c(1,1),
>         method='L-BFGS-B')
>
> The error message says:
>
>   Error in optim(c(0, 0), function(x) { :
>     non-finite value supplied by optim
>
> What makes this particularly treacherous is that this error only occurs for
> specific powers. By running the following code you will find that the error
> only occurs when the power is between -309 and -320; above and below that
> work fine.
>
>   p <- 1:1000
>   giveserror <- rep(NA, length(p))
>   for (i in seq_along(p)) {
>     tryout <- try({
>       optim(c(0,0), function(x) {x[1]*10^-p[i]}, lower=c(-1,-1),
>             upper=c(1,1), method='L-BFGS-B')
>     })
>     giveserror[i] <- inherits(tryout, "try-error")
>   }
>   p[giveserror]
>
> Obviously my function is much more complex than this and usually doesn't
> fail, but this reprex demonstrates that this is a problem. To avoid the
> error I may multiply by a factor or take the log, but it seems like a
> legitimate bug that should be fixed.
>
> I tried to look inside of optim to track down the error, but the error lies
> within the external C code:
>
>   .External2(C_optim, par, fn1, gr1, method, con, lower, upper)
>
> For reference, I am running R 4.2.2, but was also able to recreate this bug
> on another device running R 4.1.2 and another running 4.0.3.
>
> Thanks,
> Collin Erickson

--
Steven Dirkse, Ph.D.
GAMS Development Corp.
office: 202.342.0180
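As a practical matter, the workaround Collin mentions (rescaling so the objective's coefficients are nowhere near the denormal range) is easy to apply. The sketch below assumes only the overall scale of the objective is the problem; the exact point of failure inside C_optim is not identified in this thread.

    ## Minimizing f(x) = c * x[1] has the same argmin for any c > 0, so
    ## optimize with the scale factor pulled out of the objective and
    ## reattach it only when reporting the objective value.
    scale <- 1e-317
    res <- optim(c(0, 0), function(x) x[1],
                 lower = c(-1, -1), upper = c(1, 1), method = "L-BFGS-B")
    res$par             # location of the minimum, unaffected by the scale
    res$value * scale   # objective value back on the original (tiny) scale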
Re: [Rd] clarifying and adjusting the C API for R
Thanks for sharing this overview of an interesting and much-needed project.

You mention that R exports about 1500 symbols (on platforms supporting visibility), but this subject isn't mentioned explicitly again in your note, so I'm wondering how things tie together. Un-exported symbols cannot be part of the API: how would people use them in that case? In a perfect world the set of exported symbols would define the API or match it exactly, but I guess that isn't the case at present. So I conclude that R exports extra (i.e. non-API) symbols. Is part of the goal to remove these extra exports?

-Steve

On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel <r-devel@r-project.org> wrote:
> This is an update on some current work on the C API for use in R
> extensions.
>
> The internal R implementation makes use of tens of thousands of C
> entry points. On Linux and Windows, which support visibility
> restrictions, most of these are visible only within the R executable or
> shared library. About 1500 are not hidden and are visible to
> dynamically loaded shared libraries, such as ones in packages, and to
> embedding applications.
>
> There are two main reasons for limiting access to entry points in a
> software framework:
>
> - Some entry points are very easy to use in ways that corrupt internal
>   data, leading to segfaults or, worse, incorrect computations without
>   segfaults.
>
> - Some entry points expose internal structure and other implementation
>   details, which makes it hard to make improvements without breaking
>   client code that has come to depend on these details.
>
> The API of C entry points that can be used in R extensions, both for
> packages and embedding, has evolved organically over many years. The
> definition for the current release expressed in the Writing R
> Extensions manual (WRE) is roughly:
>
>   An entry point can be used if (1) it is declared in a header file
>   in R.home("include"), and (2) it is documented for use in WRE.
>
> Ideally, (1) would be necessary and sufficient, but for a variety of
> reasons that isn't achievable, at least not in the near term. (2) can
> be challenging to determine; in particular, it is not amenable to a
> computational answer.
>
> An experimental effort is underway to add annotations to the WRE
> Texinfo source to allow (2) to be answered unambiguously. The
> annotations so far mostly reflect my reading of WRE and may be revised
> as they are reviewed by others. The annotated document can be used for
> programmatically identifying what is currently considered part of the C
> API. The result so far is an experimental function tools:::funAPI():
>
> > head(tools:::funAPI())
>                  name                    loc apitype
> 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
> 2        alloc3DArray                    WRE     api
> 3          allocArray                    WRE     api
> 4           allocLang                    WRE     api
> 5           allocList                    WRE     api
> 6         allocMatrix                    WRE     api
>
> The 'apitype' field has three possible levels:
>
>   | api  | stable (ideally) API |
>   | eapi | experimental API     |
>   | emb  | embedding API        |
>
> Entry points in the embedding API would typically only be used in
> applications embedding R or providing new front ends, but might be
> reasonable to use in packages that support embedding.
>
> The 'loc' field indicates how the entry point is identified as part of
> an API: explicit mention in WRE, or declaration in a header file
> identified as fully part of an API.
>
> [tools:::funAPI() may not be completely accurate as it relies on
> regular expressions for examining header files considered part of the
> API rather than proper parsing.
> But it seems to be pretty close to what can be achieved with proper
> parsing. Proper parsing would add dependencies on additional tools,
> which I would like to avoid for now. One dependency already present is
> that a C compiler has to be on the search path and cc -E has to run the
> C pre-processor.]
>
> Two additional experimental functions are available for analyzing
> package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
> These examine installed packages.
>
> [These may produce some false positives on macOS; they may or may not
> work on Windows at this point.]
>
> Using these tools initially showed around 200 non-API entry points
> used across packages on CRAN and BIOC. Ideally this number should be
> reduced to zero. This will require a combination of additions to the
> API and changes in packages.
>
> Some entry points can safely be added to the API. Around 40 have
> already been added to WRE with API annotations; another 40 or so can
> probably be added after review.
>
> The remainder mostly fall into two groups:
>
> - Entry points that should never be used in packages, such as
>   SET_OBJECT or SETLENGTH (or a
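For readers who want to poke at this themselves, here is a small sketch built only on the experimental tools:::funAPI() output shown above (the name/loc/apitype columns). It assumes an R-devel build recent enough to provide that function; the function is internal and subject to change.

    api <- tools:::funAPI()               # experimental, R-devel only

    table(api$apitype)                    # counts of api / eapi / emb entry points
    head(subset(api, apitype == "emb"))   # entry points intended for embedding front ends

    ## quick membership test for an entry point a package is using
    "allocMatrix" %in% api$name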
[Rd] reducing a too-large matrix obtained with allocMatrix()
Hello,

I have some C code (for a shared lib called via .External) that uses

  PROTECT(w = allocMatrix(REALSXP, m, n));

mostly successfully. In rare cases, though, the row count m will be an overestimate. Is there a way to reallocate the matrix in place, something like

  reAllocMatrix(w, m-excess, n)  /* where excess is > 0 */

to chop off the last excess rows of w? I only find out about the excess rows during the single pass over the data made to copy it from its source to w. Short of this, all I know to do is allocate a matrix of the proper size and copy the values. Any other ways to do this?

--
Steve Dirkse
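Absent a supported way to shrink an allocated matrix in place, one common workaround is to have the C routine also report how many rows it actually filled and to trim on the R side. This does not avoid the copy; it just moves it into R, where it is one line. The sketch below shows the calling convention only: 'myMatrixBuilder' and its list(mat, nrow_used) return value are hypothetical, not existing code.

    ## Hypothetical wrapper: the .External routine returns the over-allocated
    ## matrix plus the number of rows actually filled.
    build_matrix <- function(src) {
      res <- .External("myMatrixBuilder", src)          # list(mat = w, nrow_used = k)
      res$mat[seq_len(res$nrow_used), , drop = FALSE]   # trim (copies the kept rows)
    }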