Thanks for the detailed response, Gabriel!

I think an object_size ALTREP method that package developers need to
implement might be hard to get right. One alternative could be an
ALTREP method that returns the number of bytes/characters in a given
string element, since I believe the object size of a CHARSXP depends
only on the string's length. Two optional ALTSTRING methods would be
nice:

`alt_string_elt_nchars` -- for the `nchar` function in R
`alt_string_elt_nbytes` -- for `object.size` (the byte count can differ
from the character count depending on the encoding)

Also, since this mainly affects RStudio, I opened an issue on their
GitHub, and it sounds like they'll stop calling object.size
automatically on ALTREP objects. That would fix the main problem I've
been having.

Thanks,
Travers

On Fri, Jan 18, 2019 at 2:49 PM Gabriel Becker <gabembec...@gmail.com> wrote:
>
> Travers,
>
> Great to hear you're trying out the ALTREP stuff, good on you :).
>
> Did you mean the get_altstring_Elt_method? I see the code in size.c within
> utils that grabs each element, but I don't see any setting (and the setters
> are no-ops currently anyway; they just do things the old way).
>
> One thing we have to decide is what object.size means for an ALTREP. I tend
> to think it should mean the size of the alternative representation currently
> in use in memory, but a small note in ?object.size says that the size of
> objects with compact internal representations may be overestimated, so
> technically the current behavior is "as documented". The "we" here, of
> course, is the R-core team, so we'll have to see how they feel about it.
>
> As for what to do about it, one possibility is to add an object.size method
> to the ALTREP method table that gets called if object.size is called on an
> ALTREP object. In this case, it would be up to the class to define an
> appropriate object.size method. That would be relatively easy to do from a
> technical standpoint on R's side, but what comes out of object.size would be
> a bit "Wild West-y", without the consistency and correctness guarantees one
> might expect from a function in utils.
>
> Another option is to have object.size recurse, calling object.size on the
> two parts (SEXPs which together make up a CONS cell, I believe) that make up
> an ALTREP internally. Roughly speaking, one of these is usually the
> alternative representation, while the other is the spot to put an object
> with the traditional representation if the payload is ever fully
> materialized in an ALTREP-unsafe way, e.g., C code grabs a writable data
> pointer via INTEGER, REAL, DATAPTR, etc. Note there are exceptions to what I
> said above, though, such as the wrapper ALTREP classes, which always hold the
> parent object (typically a traditionally laid-out vector), because the
> "alternative representation" part is strictly a metadata annotation for
> those classes and contains no representation of the payload data.
>
> In this second case the result of object.size would be consistent across all
> ALTREP classes, but in both cases the result of object.size would no longer
> give any information about the size of a vector payload. This is consistent
> with how object.size deals with external pointers now, but could be
> surprising for vectors that the end user may not even know are ALTREPs.
>
> Thoughts from anyone else on this list?
>
> Anyway, thanks for pointing this out. I'll talk with Luke and see what makes 
> sense to do here.
>
> Best,
> ~G
>
> On Wed, Jan 16, 2019 at 3:49 AM Travers Ching <trave...@gmail.com> wrote:
>>
>> I have a toy alt-rep string package that generates randomly seeded strings.
>>
>> example:
>> library(altstringisode)
>> x <- altrandomStrings(1e8)
>> head(x)
>> [1] "2PN0bdwPY7CA8M06zVKEkhHgZVgtV1" "5PN2qmWqBlQ9wQj99nsQzldVI5ZuGX" ... etc
>> object.size(x)  # materializes every element of x
>>
>> object.size will call the set_altstring_Elt_method for every single
>> element, materializing (slowly) every element of the vector. This is
>> a problem mostly in RStudio, which calls object.size automatically,
>> defeating the purpose of ALTREP.
>>
>> Is there a way to avoid this forced materialization in RStudio?
>>
>> PS: Is there a way to tell if a post has been received by the mailing
>> list?  How long does it take to show up in the archives?
>>
>> ______________________________________________
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel