Turns out it did not really need uintptr so I switched to unsafe.Pointer. 
Also added more tests.

Thanks Francis..

On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote:
>
> Serhat,
>
> That implementation looks very tidy. But it still uses uintptr. So it 
> doesn't solve the GC problems discussed above.
>
> On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer 
> wrote:
>>
>> Hi,
>>
>> I wrote a string utility library <https://github.com/jfcg/sixb> with 
>> many tests to ensure the conversion assumptions are correct. It could be 
>> helpful.
>>
>> Cheers..
>>
>> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote:
>>>
>>> I am looking at the correct way to convert from a byte slice to a string 
>>> and back with no allocations. All very unsafe.
>>>
>>> I think these two cases are fairly symmetrical. So to simplify the 
>>> discussion below I will only talk about converting from a string to []byte.
>>>
>>> func StringToBytes(s string) (b []byte)
>>>
>>> From what I have read it is currently not clear how to perform this 
>>> correctly.
>>>
>>> When I say correctly I mean that the function returns a `[]byte` which 
>>> contains all of and only the bytes in the string and never confuses the 
>>> garbage collector. We fully expect that the `[]byte` returned will contain 
>>> the same underlying memory as the string and modifying its contents will 
>>> modify the string, making the string dangerously mutable. We are 
>>> comfortable with the dangerously mutable string.
>>>
>>> Following the directions in unsafe you _might_ think that this would be 
>>> a good solution.
>>>
>>> func StringToBytes(s string) []byte {
>>>         return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
>>>                 Data: 
>>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>                 Len:  len(s),
>>>                 Cap:  len(s),
>>>         }))
>>> }
>>>
>>> The line
>>>
>>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
>>>
>>> here is a really carefully commented example of this approach from github
>>>
>>> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about 
>>> converting uintptr to and from unsafe.Pointer in a singe expression.
>>>
>>> However, it clearly violates rule 6 which states `...SliceHeader and 
>>> StringHeader are only valid when interpreting the content of an actual 
>>> slice or string value.`. The `[]byte` we are returning here is built from a 
>>> `&reflect.SliceHeader` and not based on an existing `[]byte`.
>>>
>>> So we can switch to
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Data = stringHeader.Data
>>>         sliceHeader.Len = stringHeader.Len
>>>         sliceHeader.Cap = stringHeader.Len
>>>         return b
>>> }
>>>
>>> Now we are using an existing []byte to build `sliceHeader` which is 
>>> good. But we end up with a new problem. sliceHeader.Data and 
>>> stringHeader.Data are both uintptr. So by creating them in one expression 
>>> and then writing them in another expression we violate the rule that 
>>> `uintptr cannot be stored in variable`.
>>>
>>> There is a possible sense that we are protected because both of our 
>>> `uinptr`s are actually real pointers inside a real string and []byte. This 
>>> seems to be indicated by the line `In this usage hdr.Data is really an 
>>> alternate way to refer to the underlying pointer in the string header, not 
>>> a uintptr variable itself.`
>>>
>>> This feels very unclear to me.
>>>
>>> In particular the code example in the unsafe package
>>>
>>> var s string
>>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
>>> hdr.Data = uintptr(unsafe.Pointer(p))              // case 6 (this case)
>>> hdr.Len = n
>>>
>>>
>>> is not the same as the case we are dealing with here. Specifically in 
>>> the unsafe package documentation we are writing from a uintpr stored in a 
>>> separate variable to another uinptr. They are probably very similar in 
>>> practice, but it isn't obvious and in my experience subtly incorrect code 
>>> often comes from relying on vague understandings of important documents.
>>>
>>> If we assume that our uinptrs are safe because they are backed by real 
>>> pointers then there is another issue with our string being garbage 
>>> collected.
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         // Our string is no longer referenced anywhere and could 
>>> potentially be garbage collected
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Data = stringHeader.Data
>>>         sliceHeader.Len = stringHeader.Len
>>>         sliceHeader.Cap = stringHeader.Len
>>>         return b
>>> }
>>>
>>>
>>> There is a discussion where this potential problem is raised
>>>
>>> https://github.com/golang/go/issues/25484
>>>
>>> we also see this issue mentioned in
>>>
>>>
>>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ
>>>
>>> The solution of 
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         // Our string is no longer referenced anywhere and could 
>>> potentially be garbage collected
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Data = stringHeader.Data
>>>         sliceHeader.Len = stringHeader.Len
>>>         sliceHeader.Cap = stringHeader.Len
>>>                  runtime.KeepAlive(&s)
>>>         return b
>>> }
>>>
>>>
>>> is proposed. This _probably_ works. But a survey of the implementations 
>>> of unsafe string/[]byte conversions in Go projects that we depend on at 
>>> work (this operation is very common), didn't show a single example of 
>>> anyone using the KeepAlive trick.
>>>
>>> In particular the person who initiated the conversation in golang-nuts 
>>> where the KeepAlive was suggested has implemented this conversion without it
>>>
>>> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28
>>>
>>> A workaround of writing 
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Data = stringHeader.Data
>>>         sliceHeader.Len = len(s)
>>>         sliceHeader.Cap = len(s)
>>>         // Maybe we managed to hold onto s until here?
>>>         return b
>>> }
>>>
>>>
>>> was proposed. I think the reasoning here is that the references to 
>>> `len(s)` keep the string alive. I am not totally convinced because I think 
>>> the compiler is free to write this as 
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Len = len(s)
>>>         sliceHeader.Cap = len(s)
>>>         // Compiler has reordered our code, and s might be garbage 
>>> collected
>>>         sliceHeader.Data = stringHeader.Data
>>>         return b
>>> }
>>>
>>> but, maybe this modification can never happen.
>>>
>>> At this point I don't think we have any clear answers about how to write 
>>> this code correctly.
>>>
>>> If we look inside the Go codebase we see a few interesting approaches
>>>
>>> going from []byte to string we can just
>>>
>>> return *(*string)(unsafe.Pointer(&b))
>>>
>>>
>>> This approach is used in 
>>>
>>> https://golang.org/src/strings/builder.go#L45
>>> https://github.com/golang/go/blob/master/src/runtime/string.go#L152
>>>
>>> So we can see that the Go std-lib isn't using Slice/StringHeader to 
>>> perform these conversions, which seems like a shame.
>>>
>>> Looking at how this is done on github reveals a variety of different 
>>> approaches and discussion on the topic don't seem to ever be conclusive.
>>>
>>>
>>>
>>> In general it seems that 
>>>
>>> func StringToBytes(s string) (b []byte) {
>>>         stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
>>>         sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
>>>         sliceHeader.Data = stringHeader.Data
>>>         sliceHeader.Len = len(s)
>>>         sliceHeader.Cap = len(s)
>>>         return b
>>> }
>>>
>>>
>>> is probably pretty good, and here is a very carefully commented example 
>>> of it from github.
>>>
>>> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62
>>>
>>> Is there an authoritative correct way to do this?
>>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/a827a9bc-82c6-4f4f-ad2f-530573d6d6d3%40googlegroups.com.

Reply via email to