Hi, I wrote a string utility library <https://github.com/jfcg/sixb> with many tests to ensure the conversion assumptions are correct. It could be helpful.
Cheers.. On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote: > > I am looking at the correct way to convert from a byte slice to a string > and back with no allocations. All very unsafe. > > I think these two cases are fairly symmetrical. So to simplify the > discussion below I will only talk about converting from a string to []byte. > > func StringToBytes(s string) (b []byte) > > From what I have read it is currently not clear how to perform this > correctly. > > When I say correctly I mean that the function returns a `[]byte` which > contains all of and only the bytes in the string and never confuses the > garbage collector. We fully expect that the `[]byte` returned will contain > the same underlying memory as the string and modifying its contents will > modify the string, making the string dangerously mutable. We are > comfortable with the dangerously mutable string. > > Following the directions in unsafe you _might_ think that this would be a > good solution. > > func StringToBytes(s string) []byte { > return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{ > Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, > Len: len(s), > Cap: len(s), > })) > } > > The line > > Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, > > here is a really carefully commented example of this approach from github > > seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about > converting uintptr to and from unsafe.Pointer in a singe expression. > > However, it clearly violates rule 6 which states `...SliceHeader and > StringHeader are only valid when interpreting the content of an actual > slice or string value.`. The `[]byte` we are returning here is built from a > `&reflect.SliceHeader` and not based on an existing `[]byte`. > > So we can switch to > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Data = stringHeader.Data > sliceHeader.Len = stringHeader.Len > sliceHeader.Cap = stringHeader.Len > return b > } > > Now we are using an existing []byte to build `sliceHeader` which is good. > But we end up with a new problem. sliceHeader.Data and stringHeader.Data > are both uintptr. So by creating them in one expression and then writing > them in another expression we violate the rule that `uintptr cannot be > stored in variable`. > > There is a possible sense that we are protected because both of our > `uinptr`s are actually real pointers inside a real string and []byte. This > seems to be indicated by the line `In this usage hdr.Data is really an > alternate way to refer to the underlying pointer in the string header, not > a uintptr variable itself.` > > This feels very unclear to me. > > In particular the code example in the unsafe package > > var s string > hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1 > hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case) > hdr.Len = n > > > is not the same as the case we are dealing with here. Specifically in the > unsafe package documentation we are writing from a uintpr stored in a > separate variable to another uinptr. They are probably very similar in > practice, but it isn't obvious and in my experience subtly incorrect code > often comes from relying on vague understandings of important documents. > > If we assume that our uinptrs are safe because they are backed by real > pointers then there is another issue with our string being garbage > collected. > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > // Our string is no longer referenced anywhere and could > potentially be garbage collected > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Data = stringHeader.Data > sliceHeader.Len = stringHeader.Len > sliceHeader.Cap = stringHeader.Len > return b > } > > > There is a discussion where this potential problem is raised > > https://github.com/golang/go/issues/25484 > > we also see this issue mentioned in > > https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ > > The solution of > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > // Our string is no longer referenced anywhere and could > potentially be garbage collected > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Data = stringHeader.Data > sliceHeader.Len = stringHeader.Len > sliceHeader.Cap = stringHeader.Len > runtime.KeepAlive(&s) > return b > } > > > is proposed. This _probably_ works. But a survey of the implementations of > unsafe string/[]byte conversions in Go projects that we depend on at work > (this operation is very common), didn't show a single example of anyone > using the KeepAlive trick. > > In particular the person who initiated the conversation in golang-nuts > where the KeepAlive was suggested has implemented this conversion without it > > https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28 > > A workaround of writing > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Data = stringHeader.Data > sliceHeader.Len = len(s) > sliceHeader.Cap = len(s) > // Maybe we managed to hold onto s until here? > return b > } > > > was proposed. I think the reasoning here is that the references to > `len(s)` keep the string alive. I am not totally convinced because I think > the compiler is free to write this as > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Len = len(s) > sliceHeader.Cap = len(s) > // Compiler has reordered our code, and s might be garbage > collected > sliceHeader.Data = stringHeader.Data > return b > } > > but, maybe this modification can never happen. > > At this point I don't think we have any clear answers about how to write > this code correctly. > > If we look inside the Go codebase we see a few interesting approaches > > going from []byte to string we can just > > return *(*string)(unsafe.Pointer(&b)) > > > This approach is used in > > https://golang.org/src/strings/builder.go#L45 > https://github.com/golang/go/blob/master/src/runtime/string.go#L152 > > So we can see that the Go std-lib isn't using Slice/StringHeader to > perform these conversions, which seems like a shame. > > Looking at how this is done on github reveals a variety of different > approaches and discussion on the topic don't seem to ever be conclusive. > > > > In general it seems that > > func StringToBytes(s string) (b []byte) { > stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) > sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) > sliceHeader.Data = stringHeader.Data > sliceHeader.Len = len(s) > sliceHeader.Cap = len(s) > return b > } > > > is probably pretty good, and here is a very carefully commented example of > it from github. > > https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62 > > Is there an authoritative correct way to do this? > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/577868dd-433e-49e0-8eed-59aea3b62ca1%40googlegroups.com.