Thanks for the detailed response, Gabriel! I think an object_size alt-rep method that package developers would need to implement might be hard to get right. One alternative could be an alt-rep method that returns the number of bytes/characters in a given string element, since I believe the object size of a CHARSXP depends only on the string's length. I think two optional alt-string methods would be nice:
`alt_string_elt_nchars` -- for the `nchar` function in R
`alt_string_elt_nbytes` -- for `object.size` (which might differ from nchars due to encoding)

Also, since this mainly affects RStudio, I opened an issue on their GitHub, and it sounds like they'll avoid calling object.size on alt-rep objects automatically. That would fix the main problem I've been having.

Thanks,
Travers

On Fri, Jan 18, 2019 at 2:49 PM Gabriel Becker <gabembec...@gmail.com> wrote:
>
> Travers,
>
> Great to hear you're trying out the ALTREP stuff, good on you :).
>
> Did you mean the get_altstring_Elt_method? I see the code in size.c within
> utils that grabs each element, but I don't see any setting (and the setters
> are no-ops currently anyway; they just do things the old way).
>
> One thing we have to decide is what object.size means for an altrep. I tend
> to think it should mean the size of the alternative representation currently
> in use in memory, but I see that a small note in ?object.size indicates that
> the size of objects with compact internal representations may be overestimated,
> so technically this is "as currently documented". The "we" here, of course,
> is the R-core team, so we'll have to see how they feel on the matter.
>
> As for what to do about it, one possibility is to add an object.size method
> to the ALTREP method table that gets called if object.size is called on an
> ALTREP object. In this case, it would be up to the class to define an
> appropriate object.size method. That would be relatively easy to do from a
> technical standpoint on R's side, but what comes out of object.size would be
> a bit "Wild West-y", without the consistency and correctness guarantees one
> might expect from a function in utils.
>
> Another option is to have object.size recurse to calling object.size on
> the two parts (SEXPs which together make up a CONS cell, I believe) that make
> up an ALTREP internally.
> Roughly speaking, one of these is usually the
> alternative representation while the other is the spot to put an object with
> the traditional representation if the payload is ever fully materialized in
> an altrep-unsafe way (e.g., C code grabs a writable dataptr via INTEGER,
> REAL, DATAPTR, etc.). Note there are exceptions to what I said above,
> though, such as the wrapper ALTREP classes, which always have the parent object
> (typically a traditionally laid-out vector), because the "alternative
> representation" part is strictly a metadata annotation in that case and
> contains no representation of the payload data for those classes.
>
> In this second case the result of object.size would be consistent across all
> ALTREP classes, but in both cases the result of object.size would no longer
> give any information about the size of a vector payload. This is consistent
> with how object.size deals with external pointers now, but could lead to some
> surprise in the case of vectors which the end user may not even know are
> ALTREPs.
>
> Thoughts from anyone else on this list?
>
> Anyway, thanks for pointing this out. I'll talk with Luke and see what makes
> sense to do here.
>
> Best,
> ~G
>
> On Wed, Jan 16, 2019 at 3:49 AM Travers Ching <trave...@gmail.com> wrote:
>>
>> I have a toy alt-rep string package that generates randomly seeded strings.
>>
>> example:
>> library(altstringisode)
>> x <- altrandomStrings(1e8)
>> head(x)
>> [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
>> object.size(x)
>>
>> object.size will call the set_altstring_Elt_method for every single
>> element, materializing (slowly) every element of the vector. This is
>> a problem mostly in RStudio, since object.size is called there
>> automatically, defeating the purpose of alt-rep.
>>
>> Is there a way to avoid the problem of forced materialization in RStudio?
>>
>> PS: Is there a way to tell if a post has been received by the mailing
>> list?
>> How long does it take to show up in the archives?
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
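The alt_string_elt_nbytes proposal at the top of this message amounts to letting object.size ask the class for each element's byte count instead of materializing the element. Below is a minimal, self-contained C model of that idea under stated assumptions. It is NOT R's ALTREP API: `lazy_strvec`, `strvec_class`, `elt_nbytes`, `object_size`, `make_vec`, and `free_vec` are hypothetical names, and the size it computes is a simplified proxy (string bytes plus terminator) that ignores how R actually accounts for object headers and allocation sizes. The fixed 30-character strings are chosen only to resemble the example output above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, NOT R's ALTREP API: a string vector whose elements
   are generated on demand, plus an optional per-element byte-count hook in
   the spirit of the proposed alt_string_elt_nbytes method. */

#define STR_LEN 30 /* every generated string has exactly 30 characters */

typedef struct lazy_strvec lazy_strvec;

typedef struct {
    /* Required: return element i, generating (materializing) it if needed. */
    const char *(*elt)(lazy_strvec *x, size_t i);
    /* Optional: byte size of element i without materializing it.
       NULL means the class does not provide this method. */
    size_t (*elt_nbytes)(const lazy_strvec *x, size_t i);
} strvec_class;

struct lazy_strvec {
    const strvec_class *cls;
    size_t length;
    unsigned seed;
    char **cache;          /* cache[i] stays NULL until element i is generated */
    size_t n_materialized; /* number of elements generated so far */
};

static const char alphabet[] =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Deterministic pseudo-random string for element i (simple LCG), cached. */
static const char *random_elt(lazy_strvec *x, size_t i) {
    assert(i < x->length);
    if (x->cache[i] == NULL) {
        char *s = malloc(STR_LEN + 1);
        if (s == NULL) abort();
        unsigned long state = x->seed + 2654435761UL * (unsigned long)(i + 1);
        for (int k = 0; k < STR_LEN; k++) {
            state = state * 1103515245UL + 12345UL;
            s[k] = alphabet[(state >> 16) % (sizeof alphabet - 1)];
        }
        s[STR_LEN] = '\0';
        x->cache[i] = s;
        x->n_materialized++;
    }
    return x->cache[i];
}

/* This class knows every element's length up front, so the answer is free. */
static size_t random_elt_nbytes(const lazy_strvec *x, size_t i) {
    (void)x;
    (void)i;
    return STR_LEN + 1; /* characters plus the terminating NUL */
}

static const strvec_class with_nbytes = { random_elt, random_elt_nbytes };
static const strvec_class without_nbytes = { random_elt, NULL };

/* Simplified size proxy: sum of per-element string bytes. R's real
   object.size accounting (object headers, allocation sizes) is ignored. */
static size_t object_size(lazy_strvec *x) {
    size_t total = 0;
    for (size_t i = 0; i < x->length; i++) {
        if (x->cls->elt_nbytes != NULL)
            total += x->cls->elt_nbytes(x, i);      /* no materialization */
        else
            total += strlen(x->cls->elt(x, i)) + 1; /* forces materialization */
    }
    return total;
}

/* Create an n-element vector with nothing materialized yet. */
static lazy_strvec make_vec(const strvec_class *cls, size_t n, unsigned seed) {
    lazy_strvec x = { cls, n, seed, calloc(n, sizeof(char *)), 0 };
    if (n > 0 && x.cache == NULL) abort();
    return x;
}

/* Release every generated string and the cache array. */
static void free_vec(lazy_strvec *x) {
    for (size_t i = 0; i < x->length; i++)
        free(x->cache[i]);
    free(x->cache);
}
```

With `elt_nbytes` supplied, `object_size` on a 1000-element vector returns 31000 and leaves `n_materialized` at 0; with the hook set to NULL it returns the same total but generates all 1000 strings, which is the forced materialization described in the original report. The hook only helps when a class can report lengths more cheaply than producing the strings; here that is trivial because every element has the same length.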