I am looking at the correct way to convert from a byte slice to a string 
and back with no allocations. All very unsafe.

I think these two cases are fairly symmetrical. So to simplify the 
discussion below I will only talk about converting from a string to []byte.

func StringToBytes(s string) (b []byte)

>From what I have read it is currently not clear how to perform this 
correctly.

When I say correctly I mean that the function returns a `[]byte` which 
contains all of and only the bytes in the string and never confuses the 
garbage collector. We fully expect that the `[]byte` returned will contain 
the same underlying memory as the string and modifying its contents will 
modify the string, making the string dangerously mutable. We are 
comfortable with the dangerously mutable string.

Following the directions in unsafe you _might_ think that this would be a 
good solution.

func StringToBytes(s string) []byte {
        return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
                Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,
                Len:  len(s),
                Cap:  len(s),
        }))
}

The line

Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data,

here is a really carefully commented example of this approach from github

seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about 
converting uintptr to and from unsafe.Pointer in a singe expression.

However, it clearly violates rule 6 which states `...SliceHeader and 
StringHeader are only valid when interpreting the content of an actual 
slice or string value.`. The `[]byte` we are returning here is built from a 
`&reflect.SliceHeader` and not based on an existing `[]byte`.

So we can switch to

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Data = stringHeader.Data
        sliceHeader.Len = stringHeader.Len
        sliceHeader.Cap = stringHeader.Len
        return b
}

Now we are using an existing []byte to build `sliceHeader` which is good. 
But we end up with a new problem. sliceHeader.Data and stringHeader.Data 
are both uintptr. So by creating them in one expression and then writing 
them in another expression we violate the rule that `uintptr cannot be 
stored in variable`.

There is a possible sense that we are protected because both of our 
`uinptr`s are actually real pointers inside a real string and []byte. This 
seems to be indicated by the line `In this usage hdr.Data is really an 
alternate way to refer to the underlying pointer in the string header, not 
a uintptr variable itself.`

This feels very unclear to me.

In particular the code example in the unsafe package

var s string
hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1
hdr.Data = uintptr(unsafe.Pointer(p))              // case 6 (this case)
hdr.Len = n


is not the same as the case we are dealing with here. Specifically in the 
unsafe package documentation we are writing from a uintpr stored in a 
separate variable to another uinptr. They are probably very similar in 
practice, but it isn't obvious and in my experience subtly incorrect code 
often comes from relying on vague understandings of important documents.

If we assume that our uinptrs are safe because they are backed by real 
pointers then there is another issue with our string being garbage 
collected.

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        // Our string is no longer referenced anywhere and could 
potentially be garbage collected
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Data = stringHeader.Data
        sliceHeader.Len = stringHeader.Len
        sliceHeader.Cap = stringHeader.Len
        return b
}


There is a discussion where this potential problem is raised

https://github.com/golang/go/issues/25484

we also see this issue mentioned in

https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ

The solution of 

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        // Our string is no longer referenced anywhere and could 
potentially be garbage collected
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Data = stringHeader.Data
        sliceHeader.Len = stringHeader.Len
        sliceHeader.Cap = stringHeader.Len
                 runtime.KeepAlive(&s)
        return b
}


is proposed. This _probably_ works. But a survey of the implementations of 
unsafe string/[]byte conversions in Go projects that we depend on at work 
(this operation is very common), didn't show a single example of anyone 
using the KeepAlive trick.

In particular the person who initiated the conversation in golang-nuts 
where the KeepAlive was suggested has implemented this conversion without it

https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28

A workaround of writing 

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Data = stringHeader.Data
        sliceHeader.Len = len(s)
        sliceHeader.Cap = len(s)
        // Maybe we managed to hold onto s until here?
        return b
}


was proposed. I think the reasoning here is that the references to `len(s)` 
keep the string alive. I am not totally convinced because I think the 
compiler is free to write this as 

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Len = len(s)
        sliceHeader.Cap = len(s)
        // Compiler has reordered our code, and s might be garbage collected
        sliceHeader.Data = stringHeader.Data
        return b
}

but, maybe this modification can never happen.

At this point I don't think we have any clear answers about how to write 
this code correctly.

If we look inside the Go codebase we see a few interesting approaches

going from []byte to string we can just

return *(*string)(unsafe.Pointer(&b))


This approach is used in 

https://golang.org/src/strings/builder.go#L45
https://github.com/golang/go/blob/master/src/runtime/string.go#L152

So we can see that the Go std-lib isn't using Slice/StringHeader to perform 
these conversions, which seems like a shame.

Looking at how this is done on github reveals a variety of different 
approaches and discussion on the topic don't seem to ever be conclusive.



In general it seems that 

func StringToBytes(s string) (b []byte) {
        stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s))
        sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes))
        sliceHeader.Data = stringHeader.Data
        sliceHeader.Len = len(s)
        sliceHeader.Cap = len(s)
        return b
}


is probably pretty good, and here is a very carefully commented example of 
it from github.

https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62

Is there an authoritative correct way to do this?

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/438ffe67-82ab-4163-aaca-060a89da9359%40googlegroups.com.

Reply via email to