It should be possible to calculate object.size in the presence of sharing, at least with respect to all sub-nodes of a SEXP. E.g., during calculation, keep a hash of all SEXP pointers visited. If a pointer has already been visited, add only the size of the pointer to the total object size.
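The idea above could be sketched roughly as follows. This is only an illustration in C on a toy node type, not R's actual SEXP layout or the real object.size() code; `Node`, `PtrSet`, `ptrset_add`, `naive_size`, `shared_size` and `example` are all made-up names, and the fixed-capacity pointer set stands in for a proper hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a SEXP node; NOT R's real memory layout. */
typedef struct Node {
    size_t payload_bytes;     /* bytes of data owned directly by this node */
    size_t n_children;        /* number of references held by this node */
    struct Node **children;   /* references; several may point to one node */
} Node;

/* Tiny open-addressing set of visited pointers (fixed capacity for brevity). */
#define SET_CAP 1024
typedef struct {
    const void *slots[SET_CAP];
    size_t used;
} PtrSet;

/* Returns 1 if p was newly inserted, 0 if it had been seen before. */
int ptrset_add(PtrSet *s, const void *p) {
    size_t i = (size_t)(((uintptr_t)p >> 4) % SET_CAP);
    while (s->slots[i] != NULL) {
        if (s->slots[i] == p) return 0;
        i = (i + 1) % SET_CAP;
    }
    assert(s->used < SET_CAP - 1);  /* keep one slot empty so probing ends */
    s->slots[i] = p;
    s->used++;
    return 1;
}

/* Naive traversal: a shared node is counted once per reference.
   (It would also loop forever on cyclic structures.) */
size_t naive_size(const Node *n) {
    if (n == NULL) return 0;
    size_t total = sizeof(Node) + n->payload_bytes
                 + n->n_children * sizeof(Node *);
    for (size_t i = 0; i < n->n_children; i++)
        total += naive_size(n->children[i]);
    return total;
}

/* Sharing-aware traversal: every reference costs one pointer (already
   included in the parent's children array), but the node behind it is
   counted only the first time it is reached. */
size_t shared_size(const Node *n, PtrSet *seen) {
    if (n == NULL) return 0;
    if (!ptrset_add(seen, n)) return 0;  /* visited: nothing more to add */
    size_t total = sizeof(Node) + n->payload_bytes
                 + n->n_children * sizeof(Node *);
    for (size_t i = 0; i < n->n_children; i++)
        total += shared_size(n->children[i], seen);
    return total;
}

/* Example: a list whose two slots both reference the same 1000-byte vector. */
void example(size_t *naive_total, size_t *shared_total) {
    Node vec = { 1000, 0, NULL };
    Node *slots[2] = { &vec, &vec };
    Node list = { 0, 2, slots };
    PtrSet seen = { { NULL }, 0 };
    *naive_total = naive_size(&list);
    *shared_total = shared_size(&list, &seen);
}
```

In this toy example the naive total exceeds the sharing-aware total by exactly one copy of the shared vector (`sizeof(Node) + 1000` bytes). Note that this only handles sharing within the sub-nodes of a single object; it says nothing about memory shared between two separately measured objects.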
Travers

On Wed, Jan 23, 2019 at 1:33 AM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:
>
> On 1/22/19 6:17 PM, Kevin Ushey wrote:
> > I think that object.size() is most commonly used to answer the question,
> > "what R objects are consuming the most memory currently in my R session?"
> > and for that reason I think returning the size of the internal
> > representations of objects (for e.g. ALTREP objects; unevaluated promises)
> > is the right default behavior.
>
> I don't think one could answer that question at all in the presence of
> sharing (of objects with value semantics due to copy on write, string
> cache or other caches, sharing of objects with referential semantics
> such as environments, etc). Also the mapping from R objects (SEXPs) to
> what users might understand as objects would not be clear (which SEXPs
> belong to which "object", which SEXPs are too low-level for the user to
> be considered, etc). In principle, there could be a memory profiler
> working at SEXP level and exposing all the intricacies of the memory
> layout, answering reachability questions on a heap dump (so one could
> find out about a 1G integer vector and then list all bindings say in
> namespace environments from which it is reachable), but of course that
> would be a lot of work to implement and to maintain. The problem is not
> unique to R (e.g. see Java with the same problems of sharing that
> prevent meaningful definition for object size). I am not persuaded it
> makes sense to add more options to a function that does not have and
> cannot have a well defined user-level semantics, and I would discourage
> writing code that is trying to build on that function as I think that it
> might lead to confusion and frustration. I think equality for example is
> easier to define (just that one could come up with multiple meaningful
> definitions, so it makes sense to have multiple options).
>
> Best
> Tomas
>
> >
> > I also agree it would be worth considering adding arguments that control
> > how object.size() is computed for different kinds of R objects, since users
> > might want to use object.size() to answer different types of questions.
> >
> > All that said, if the ultimate goal here is to avoid having RStudio
> > materialize ALTREP objects in the background, then perhaps that change
> > should happen in RStudio :-)
> >
> > Best,
> > Kevin
> >
> > On Tue, Jan 22, 2019 at 8:21 AM Tierney, Luke <luke-tier...@uiowa.edu>
> > wrote:
> >
> >> On Mon, 21 Jan 2019, Martin Maechler wrote:
> >>
> >>>>>>>> Travers Ching
> >>>>>>>>     on Tue, 15 Jan 2019 12:50:45 -0800 writes:
> >>>
> >>>     > I have a toy alt-rep string package that generates
> >>>     > randomly seeded strings.  example: library(altstringisode)
> >>>     > x <- altrandomStrings(1e8) head(x) [1]
> >>>     > "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1"
> >>>     > "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc object.size(1e8)
> >>>
> >>>     > Object.size will call the set_altstring_Elt_method for
> >>>     > every single element, materializing (slowly) every element
> >>>     > of the vector.  This is a problem mostly in R-studio since
> >>>     > object.size is called automatically, defeating the purpose
> >>>     > of alt-rep.
> >>
> >> There is no sensible way in general to figure out how large the
> >> strings would be without computing them. There might be specifically
> >> for a deferred sequence conversion but it would require a fair bit of
> >> effort to figure out that would be better spent elsewhere.
> >>
> >> I've never been a big fan of object.size since what it is trying to
> >> compute isn't very well defined in the context of sharing and possible
> >> internal state changes (even before ALTREP byte code compilation could
> >> change the internals of a function [which object.size sees] and
> >> assigning into environments or evaluating promises can change
> >> environments [which object.size ignores]).
> >> The issue is not unlike the one faced by identical(), which has a
> >> bunch of options for the different ways objects can be identical,
> >> and might need even more.
> >>
> >> We could in general have object.size for an ALTREP return the
> >> object.size results of the current internal representation, but that
> >> might not always be appropriate. Again, what object.size is trying to
> >> compute isn't very well defined.
> >>
> >> RStudio does seem to call object.size on every assignment to
> >> .GlobalEnv. That might be worth revisiting.
> >>
> >>
> >> Best,
> >>
> >> luke
> >>
> >>> Hmm.  But still, the idea had been that object.size() *should*
> >>> return the size of the "de-ALTREP'ed" object *but* should not
> >>> de-ALTREP it.
> >>> That's what happens for integers, but indeed fails to happen for
> >>> such as.character(.)ed integers.
> >>>
> >>> From my eRum presentation (which took from the official ALTREP documentation
> >>> https://svn.r-project.org/R/branches/ALTREP/ALTREP.html ) :
> >>>
> >>>   > x <- 1:1e15
> >>>   > object.size(x) # 8000'000'000'000'048 bytes : 8000 TBytes -- ok, not really
> >>>   8000000000000048 bytes
> >>>   > is.unsorted(x) # FALSE : i.e., R *knows* it is sorted
> >>>   [1] FALSE
> >>>   > xs <- sort(x) #
> >>>   > .Internal(inspect(x))
> >>>   @80255f8 14 REALSXP g0c0 [NAM(7)]  1 : 1000000000000000 (compact)
> >>>   >
> >>>
> >>>   > cx <- as.character(x)
> >>>   > .Internal(inspect(cx))
> >>>   @80485d8 16 STRSXP g0c0 [NAM(1)]   <deferred string conversion>
> >>>     @80255f8 14 REALSXP g1c0 [MARK,NAM(7)]  1 : 1000000000000000 (compact)
> >>>   > system.time( print(object.size(x)), gc=FALSE)
> >>>   8000000000000048 bytes
> >>>      user  system elapsed
> >>>     0.000   0.000   0.001
> >>>   > system.time( print(object.size(cx)), gc=FALSE)
> >>>   Error: cannot allocate vector of size 8388608.0 Gb
> >>>   Timing stopped at: 11.43 0 11.46
> >>>   >
> >>>
> >>> One could consider it a bug that object.size(cx) is indeed
> >>> inspecting every string, i.e., accessing cx[i]
> >>> for all i.
> >>> Note that it is *not* deALTREPing cx itself :
> >>>
> >>>   > x <- 1:1e6
> >>>   > cx <- as.character(x)
> >>>   > .Internal(inspect(cx))
> >>>   @7f5b1a0 16 STRSXP g0c0 [NAM(1)]   <deferred string conversion>
> >>>     @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
> >>>   > system.time( print(object.size(cx)), gc=FALSE)
> >>>   64000048 bytes
> >>>      user  system elapsed
> >>>     0.369   0.005   0.374
> >>>   > .Internal(inspect(cx))
> >>>   @7f5b1a0 16 STRSXP g0c0 [NAM(7)]   <deferred string conversion>
> >>>     @7f5adb0 13 INTSXP g0c0 [NAM(7)]  1 : 1000000 (compact)
> >>>
> >>>     > Is there a way to avoid the problem of forced
> >>>     > materialization in rstudio?
> >>>
> >>>     > PS: Is there a way to tell if a post has been received by
> >>>     > the mailing list?  How long does it take to show up in the
> >>>     > archives?
> >>>
> >>> [ that (waiting time) distribution is quite right skewed... I'd
> >>>   guess its median to be less than 10 minutes... but we had
> >>>   artificially delayed it somewhat in the past to fight
> >>>   spammers, and ETH (the hosting institution) and others have
> >>>   increased spam and virus filtering so everything has become
> >>>   quite a bit slower ]
> >>>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >> --
> >> Luke Tierney
> >> Ralph E. Wareham Professor of Mathematical Sciences
> >> University of Iowa                  Phone:  319-335-3386
> >> Department of Statistics and        Fax:    319-335-3017
> >>    Actuarial Science
> >> 241 Schaeffer Hall                  email:  luke-tier...@uiowa.edu
> >> Iowa City, IA 52242                 WWW:    http://www.stat.uiowa.edu