Turns out it did not really need uintptr so I switched to unsafe.Pointer. Also added more tests.
Thanks Francis.. On Friday, October 4, 2019 at 12:38:22 PM UTC+3, fra...@adeven.com wrote: > > Serhat, > > That implementation looks very tidy. But it still uses uintptr. So it > doesn't solve the GC problems discussed above. > > On Thursday, September 26, 2019 at 10:59:08 PM UTC+2, Serhat Şevki Dinçer > wrote: >> >> Hi, >> >> I wrote a string utility library <https://github.com/jfcg/sixb> with >> many tests to ensure the conversion assumptions are correct. It could be >> helpful. >> >> Cheers.. >> >> On Wednesday, September 18, 2019 at 12:42:31 PM UTC+3, Francis wrote: >>> >>> I am looking at the correct way to convert from a byte slice to a string >>> and back with no allocations. All very unsafe. >>> >>> I think these two cases are fairly symmetrical. So to simplify the >>> discussion below I will only talk about converting from a string to []byte. >>> >>> func StringToBytes(s string) (b []byte) >>> >>> From what I have read it is currently not clear how to perform this >>> correctly. >>> >>> When I say correctly I mean that the function returns a `[]byte` which >>> contains all of and only the bytes in the string and never confuses the >>> garbage collector. We fully expect that the `[]byte` returned will contain >>> the same underlying memory as the string and modifying its contents will >>> modify the string, making the string dangerously mutable. We are >>> comfortable with the dangerously mutable string. >>> >>> Following the directions in unsafe you _might_ think that this would be >>> a good solution. >>> >>> func StringToBytes(s string) []byte { >>> return *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{ >>> Data: >>> (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >>> Len: len(s), >>> Cap: len(s), >>> })) >>> } >>> >>> The line >>> >>> Data: (*(*reflect.StringHeader)(unsafe.Pointer(&s))).Data, >>> >>> here is a really carefully commented example of this approach from github >>> >>> seems to satisfy unsafe <https://golang.org/pkg/unsafe/> rule 5 about >>> converting uintptr to and from unsafe.Pointer in a singe expression. >>> >>> However, it clearly violates rule 6 which states `...SliceHeader and >>> StringHeader are only valid when interpreting the content of an actual >>> slice or string value.`. The `[]byte` we are returning here is built from a >>> `&reflect.SliceHeader` and not based on an existing `[]byte`. >>> >>> So we can switch to >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Data = stringHeader.Data >>> sliceHeader.Len = stringHeader.Len >>> sliceHeader.Cap = stringHeader.Len >>> return b >>> } >>> >>> Now we are using an existing []byte to build `sliceHeader` which is >>> good. But we end up with a new problem. sliceHeader.Data and >>> stringHeader.Data are both uintptr. So by creating them in one expression >>> and then writing them in another expression we violate the rule that >>> `uintptr cannot be stored in variable`. >>> >>> There is a possible sense that we are protected because both of our >>> `uinptr`s are actually real pointers inside a real string and []byte. This >>> seems to be indicated by the line `In this usage hdr.Data is really an >>> alternate way to refer to the underlying pointer in the string header, not >>> a uintptr variable itself.` >>> >>> This feels very unclear to me. >>> >>> In particular the code example in the unsafe package >>> >>> var s string >>> hdr := (*reflect.StringHeader)(unsafe.Pointer(&s)) // case 1 >>> hdr.Data = uintptr(unsafe.Pointer(p)) // case 6 (this case) >>> hdr.Len = n >>> >>> >>> is not the same as the case we are dealing with here. Specifically in >>> the unsafe package documentation we are writing from a uintpr stored in a >>> separate variable to another uinptr. They are probably very similar in >>> practice, but it isn't obvious and in my experience subtly incorrect code >>> often comes from relying on vague understandings of important documents. >>> >>> If we assume that our uinptrs are safe because they are backed by real >>> pointers then there is another issue with our string being garbage >>> collected. >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> // Our string is no longer referenced anywhere and could >>> potentially be garbage collected >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Data = stringHeader.Data >>> sliceHeader.Len = stringHeader.Len >>> sliceHeader.Cap = stringHeader.Len >>> return b >>> } >>> >>> >>> There is a discussion where this potential problem is raised >>> >>> https://github.com/golang/go/issues/25484 >>> >>> we also see this issue mentioned in >>> >>> >>> https://groups.google.com/forum/#!msg/golang-nuts/dcjzJy-bSpw/tcZYBzQqAQAJ >>> >>> The solution of >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> // Our string is no longer referenced anywhere and could >>> potentially be garbage collected >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Data = stringHeader.Data >>> sliceHeader.Len = stringHeader.Len >>> sliceHeader.Cap = stringHeader.Len >>> runtime.KeepAlive(&s) >>> return b >>> } >>> >>> >>> is proposed. This _probably_ works. But a survey of the implementations >>> of unsafe string/[]byte conversions in Go projects that we depend on at >>> work (this operation is very common), didn't show a single example of >>> anyone using the KeepAlive trick. >>> >>> In particular the person who initiated the conversation in golang-nuts >>> where the KeepAlive was suggested has implemented this conversion without it >>> >>> https://github.com/cespare/xxhash/blob/v2.1.0/xxhash_unsafe.go#L28 >>> >>> A workaround of writing >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Data = stringHeader.Data >>> sliceHeader.Len = len(s) >>> sliceHeader.Cap = len(s) >>> // Maybe we managed to hold onto s until here? >>> return b >>> } >>> >>> >>> was proposed. I think the reasoning here is that the references to >>> `len(s)` keep the string alive. I am not totally convinced because I think >>> the compiler is free to write this as >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Len = len(s) >>> sliceHeader.Cap = len(s) >>> // Compiler has reordered our code, and s might be garbage >>> collected >>> sliceHeader.Data = stringHeader.Data >>> return b >>> } >>> >>> but, maybe this modification can never happen. >>> >>> At this point I don't think we have any clear answers about how to write >>> this code correctly. >>> >>> If we look inside the Go codebase we see a few interesting approaches >>> >>> going from []byte to string we can just >>> >>> return *(*string)(unsafe.Pointer(&b)) >>> >>> >>> This approach is used in >>> >>> https://golang.org/src/strings/builder.go#L45 >>> https://github.com/golang/go/blob/master/src/runtime/string.go#L152 >>> >>> So we can see that the Go std-lib isn't using Slice/StringHeader to >>> perform these conversions, which seems like a shame. >>> >>> Looking at how this is done on github reveals a variety of different >>> approaches and discussion on the topic don't seem to ever be conclusive. >>> >>> >>> >>> In general it seems that >>> >>> func StringToBytes(s string) (b []byte) { >>> stringHeader := (*reflect.StringHeader)(unsafe.Pointer(&s)) >>> sliceHeader := (*reflect.SliceHeader)(unsafe.Pointer(&bytes)) >>> sliceHeader.Data = stringHeader.Data >>> sliceHeader.Len = len(s) >>> sliceHeader.Cap = len(s) >>> return b >>> } >>> >>> >>> is probably pretty good, and here is a very carefully commented example >>> of it from github. >>> >>> https://github.com/m3db/m3x/blob/master/unsafe/string.go#L62 >>> >>> Is there an authoritative correct way to do this? >>> >> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/a827a9bc-82c6-4f4f-ad2f-530573d6d6d3%40googlegroups.com.