Serhat, That implementation looks very tidy. But it still uses uintptr. So it doesn't solve the GC problems discussed above.
On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer wrote: > > Hi, > > I wrote a string utility library <https://github.com/jfcg/sixb> with many > tests to ensure the conversion assumptions are correct. It could be helpful. > > Cheers.. > > On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote: >> >> I am looking at the correct way to convert from a byte slice to a string >> and back with no allocations. All very unsafe. >> >> I think these two cases are fairly symmetrical. So to simplify the >> discussion below I will only talk about converting from a string to []byte. >> >> func StringToBytes(s string) (b []byte) >> >> From what I have read it is currently not clear how to perform this >> correctly. >> >> When I say correctly I mean that the function returns a `[]byte` which >> contains all of and only the bytes in the string and never confuses the >> garbage collector. We fully expect that the `[]byte` returned will contain >> the same underlying memory as the string and modifying its contents will >> modify the string, making the string dangerously mutable. We are >> comfortable with the dangerously mutable string. >> >> Following the directions in unsafe you _might_ think that this would be a >> good solution. >> >> func StringToBytes(s string) []byte { >> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{ >> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >> Len: len(s), >> Cap: len(s), >> })) >> } >> >> The line >> >> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >> >> here is a really carefully commented example of this approach from github >> >> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about >> converting uintptr to and from unsafe.Pointer in a singe expression. >> >> However, it clearly violates rule 6 which states `...SliceHeader and >> StringHeader are only valid when interpreting the content of an actual >> slice or string value.`. The `[]byte` we are returning here is built from a >> `&reflect.SliceHeader` and not based on an existing `[]byte`. >> >> So we can switch to >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Data = stringHeader.Data >> sliceHeader.Len = stringHeader.Len >> sliceHeader.Cap = stringHeader.Len >> return b >> } >> >> Now we are using an existing []byte to build `sliceHeader` which is good. >> But we end up with a new problem. sliceHeader.Data and stringHeader.Data >> are both uintptr. So by creating them in one expression and then writing >> them in another expression we violate the rule that `uintptr cannot be >> stored in variable`. >> >> There is a possible sense that we are protected because both of our >> `uinptr`s are actually real pointers inside a real string and []byte. This >> seems to be indicated by the line `In this usage hdr.Data is really an >> alternate way to refer to the underlying pointer in the string header, not >> a uintptr variable itself.` >> >> This feels very unclear to me. >> >> In particular the code example in the unsafe package >> >> var s string >> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1 >> hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case) >> hdr.Len = n >> >> >> is not the same as the case we are dealing with here. Specifically in the >> unsafe package documentation we are writing from a uintpr stored in a >> separate variable to another uinptr. They are probably very similar in >> practice, but it isn't obvious and in my experience subtly incorrect code >> often comes from relying on vague understandings of important documents. >> >> If we assume that our uinptrs are safe because they are backed by real >> pointers then there is another issue with our string being garbage >> collected. >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> // Our string is no longer referenced anywhere and could >> potentially be garbage collected >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Data = stringHeader.Data >> sliceHeader.Len = stringHeader.Len >> sliceHeader.Cap = stringHeader.Len >> return b >> } >> >> >> There is a discussion where this potential problem is raised >> >> https://github.com/golang/go/issues/25484 >> >> we also see this issue mentioned in >> >> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ >> >> The solution of >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> // Our string is no longer referenced anywhere and could >> potentially be garbage collected >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Data = stringHeader.Data >> sliceHeader.Len = stringHeader.Len >> sliceHeader.Cap = stringHeader.Len >> runtime.KeepAlive(&s) >> return b >> } >> >> >> is proposed. This _probably_ works. But a survey of the implementations >> of unsafe string/[]byte conversions in Go projects that we depend on at >> work (this operation is very common), didn't show a single example of >> anyone using the KeepAlive trick. >> >> In particular the person who initiated the conversation in golang-nuts >> where the KeepAlive was suggested has implemented this conversion without it >> >> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28 >> >> A workaround of writing >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Data = stringHeader.Data >> sliceHeader.Len = len(s) >> sliceHeader.Cap = len(s) >> // Maybe we managed to hold onto s until here? >> return b >> } >> >> >> was proposed. I think the reasoning here is that the references to >> `len(s)` keep the string alive. I am not totally convinced because I think >> the compiler is free to write this as >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Len = len(s) >> sliceHeader.Cap = len(s) >> // Compiler has reordered our code, and s might be garbage >> collected >> sliceHeader.Data = stringHeader.Data >> return b >> } >> >> but, maybe this modification can never happen. >> >> At this point I don't think we have any clear answers about how to write >> this code correctly. >> >> If we look inside the Go codebase we see a few interesting approaches >> >> going from []byte to string we can just >> >> return *(*string)(unsafe.Pointer(&b)) >> >> >> This approach is used in >> >> https://golang.org/src/strings/builder.go#L45 >> https://github.com/golang/go/blob/master/src/runtime/string.go#L152 >> >> So we can see that the Go std-lib isn't using Slice/StringHeader to >> perform these conversions, which seems like a shame. >> >> Looking at how this is done on github reveals a variety of different >> approaches and discussion on the topic don't seem to ever be conclusive. >> >> >> >> In general it seems that >> >> func StringToBytes(s string) (b []byte) { >> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >> sliceHeader.Data = stringHeader.Data >> sliceHeader.Len = len(s) >> sliceHeader.Cap = len(s) >> return b >> } >> >> >> is probably pretty good, and here is a very carefully commented example >> of it from github. >> >> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62 >> >> Is there an authoritative correct way to do this? >> > -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/21938acb-e242-47a0-8742-2e8b5f9ee760%40googlegroups.com.