This is intended to be a bug report with proposed patch. I am posting to this mailing list as described in NOTE in "Bug Reporting in R".
Function `stats:::regularize.values()` is meant to preprocess `x` and `y` arguments to have "proper" values for later use during interpolation. If input is already "proper", I would expect it to reuse the same objects without creating new ones. However, this isn't the case and is the source of unneccessary extra memory usage in `approx()` and others. The root cause of this seems to be a forceful reordering in lines 37-39 of 'approx.R' file. If reordering is done only if `x` is unsorted then no copies are created. Also this doesn't seem like breaking any existing code. There is a patch attached. Reproducable code: x <- seq(1, 100, 1) y <- seq(1, 100, 1) reg_xy <- stats:::regularize.values(x, y, mean) # Regularized versions of `x` and `y` are identical to input but are stored at # different places identical(x, reg_xy[["x"]]) #> [1] TRUE .Internal(inspect(x)) #> @15719b0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(reg_xy[["x"]])) #> @2b84130 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... identical(y, reg_xy[["y"]]) #> [1] TRUE .Internal(inspect(y)) #> @2c91be0 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(reg_xy[["y"]])) #> @2bb4880 14 REALSXP g0c7 [NAM(3)] (len=100, tl=0) 1,2,3,4,5,... # Differs from original only by using `if (is.unsorted(x))` new_regularize.values <- function (x, y, ties) { x <- xy.coords(x, y, setLab = FALSE) y <- x$y x <- x$x if (any(na <- is.na(x) | is.na(y))) { ok <- !na x <- x[ok] y <- y[ok] } nx <- length(x) if (!identical(ties, "ordered")) { if (is.unsorted(x)) { o <- order(x) x <- x[o] y <- y[o] } if (length(ux <- unique(x)) < nx) { if (missing(ties)) warning("collapsing to unique 'x' values") y <- as.vector(tapply(y, match(x, x), ties)) x <- ux stopifnot(length(y) == length(x)) } } list(x = x, y = y) } new_reg_xy <- new_regularize.values(x, y, mean) # Output is still identical to input and also references to the same objects identical(x, new_reg_xy[["x"]]) #> [1] TRUE .Internal(inspect(x)) #> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(new_reg_xy[["x"]])) #> @15719b0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... identical(y, new_reg_xy[["y"]]) #> [1] TRUE .Internal(inspect(y)) #> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... .Internal(inspect(new_reg_xy[["y"]])) #> @2c91be0 14 REALSXP g1c7 [MARK,NAM(3)] (len=100, tl=0) 1,2,3,4,5,... # Current R version R.version #> _ #> platform x86_64-pc-linux-gnu #> arch x86_64 #> os linux-gnu #> system x86_64, linux-gnu #> status #> major 3 #> minor 5.2 #> year 2018 #> month 12 #> day 20 #> svn rev 75870 #> language R #> version.string R version 3.5.2 (2018-12-20) #> nickname Eggshell Igloo -- Best regards, Evgeni Chasnovski
Index: src/library/stats/R/approx.R =================================================================== --- src/library/stats/R/approx.R (revision 75926) +++ src/library/stats/R/approx.R (working copy) @@ -34,9 +34,11 @@ } nx <- length(x) if (!identical(ties, "ordered")) { - o <- order(x) - x <- x[o] - y <- y[o] + if (is.unsorted(x)) { + o <- order(x) + x <- x[o] + y <- y[o] + } if (length(ux <- unique(x)) < nx) { if (missing(ties)) warning("collapsing to unique 'x' values")
______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel