I also stumbled upon this bug. I have the base64 encoding of some data in a []byte field in my struct. I call json.Marshal, then json.Unmarshal into the same struct type. And now, surprise: the data is automatically base64-decoded. What? Any encoder in any system should offer the following guarantee (enforced with fuzz tests): decode(encode(X)) == X. This is so intuitive. It's a shame this does not work in Go for []byte.
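
For anyone following along, here is a minimal sketch of the behavior discussed in this thread, not a definitive implementation. `Payload`, `TextPayload`, `TextBytes`, and the four helper functions are hypothetical names chosen for illustration; only `json.Marshal`, `json.Unmarshal`, and the `MarshalJSON`/`UnmarshalJSON` methods come from encoding/json. It puts the default base64 handling of a []byte field next to a custom type, along the lines suggested in the quoted thread, that reads and writes an ordinary JSON string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Payload is a hypothetical struct with a plain []byte field.
// encoding/json writes such a field as a base64 string and
// base64-decodes it again when reading.
type Payload struct {
	Data []byte
}

// TextBytes is a hypothetical []byte type that is encoded as an
// ordinary JSON string instead of base64.
type TextBytes []byte

// MarshalJSON encodes the bytes as a JSON string; escaping is left
// to encoding/json's own string encoder.
func (t TextBytes) MarshalJSON() ([]byte, error) {
	return json.Marshal(string(t))
}

// UnmarshalJSON decodes a JSON string (including escape sequences)
// and stores its bytes without any base64 step.
func (t *TextBytes) UnmarshalJSON(b []byte) error {
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return err
	}
	*t = TextBytes(s)
	return nil
}

// TextPayload mirrors Payload but uses TextBytes for the field.
type TextPayload struct {
	Data TextBytes
}

// decodeDefault unmarshals into the plain []byte field.
func decodeDefault(js string) ([]byte, error) {
	var p Payload
	err := json.Unmarshal([]byte(js), &p)
	return p.Data, err
}

// decodeText unmarshals into the TextBytes field.
func decodeText(js string) ([]byte, error) {
	var p TextPayload
	err := json.Unmarshal([]byte(js), &p)
	return p.Data, err
}

// encodeDefault marshals a plain []byte field.
func encodeDefault(data []byte) (string, error) {
	out, err := json.Marshal(Payload{Data: data})
	return string(out), err
}

// encodeText marshals a TextBytes field.
func encodeText(data []byte) (string, error) {
	out, err := json.Marshal(TextPayload{Data: TextBytes(data)})
	return string(out), err
}

func main() {
	const in = `{"Data":"aGVsbG8="}`

	d, _ := decodeDefault(in)
	t, _ := decodeText(in)
	fmt.Printf("%s %s\n", d, t) // hello aGVsbG8=

	e1, _ := encodeDefault([]byte("foo"))
	e2, _ := encodeText([]byte("foo"))
	fmt.Println(e1) // {"Data":"Zm9v"}
	fmt.Println(e2) // {"Data":"foo"}
}
```

Because `TextBytes` goes through `string`, encoding/json replaces invalid UTF-8 with the Unicode replacement character when encoding, so this sketch only suits data that is known to be text.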

On Tuesday, June 25, 2024 at 9:01:57 PM UTC+8 Mauro Lacy wrote:

> Related: https://stackoverflow.com/a/78662958/3768429
>
> On Monday 16 December 2013 at 19:37:30 UTC+1 Kyle Lemons wrote:
>
>> On Sun, Dec 15, 2013 at 2:05 PM, Brian Picciano <bgpic...@gmail.com> wrote:
>>
>>> I'm going to compress my three responses to one.
>>>
>>> > What I'm less clear on is exactly what your use case is for encoding
>>> > it as a string if you're dealing with it exclusively as bytes. I mean,
>>> > if you're using it as both, surely you're making the copy anyway at some
>>> > point.
>>>
>>> That's my point, I don't want to use string EVER in my application (I
>>> have no reason to for this particular one). But with encoding/json I have
>>> to, because I can't directly get []byte out of it for a JSON string value.
>>> So I have to convert.
>>>
>>> > JSON defines strings to be UTF-8 encoded, and as such is not suitable
>>> > for storing binary data. Encoding an unknown []byte with base64 eliminates
>>> > problems
>>>
>>> I think that's a decision the coder should make. If I am worried that my
>>> binary data can't be encoded into a UTF-8 string then I can encode it into
>>> hex or base64 or whatever I like. But if I KNOW that my data is coming in
>>> as a proper JSON string and going out the other end without being changed
>>> in between there's no reason I should be forced to pay the penalty of four
>>> extra copies ([]byte -> string (inside encoding/json) -> []byte -> app ->
>>> string -> []byte (inside encoding/json)).
>>
>> I assume you've benchmarked this and found that the extra copies are a
>> bottleneck? If not, don't assume that they are without some hard data,
>> especially if you're doing a lot of I/O (as I would expect of a networked
>> service). The JSON library does string/[]byte and []byte/string
>> conversions itself in some places.
>>
>>> > If it's textual data, then string is the correct type.
>>>
>>> That's true if I am actually interacting with the data.
>>> If I'm just carrying the data along and spitting it back out somewhere
>>> else then I don't really care what it is, and what I really need to
>>> optimize for is speed and memory. Four copies aren't helping.
>>>
>>> > Alternatively you can make your own type that implements MarshalJSON
>>> > and UnmarshalJSON.
>>>
>>> The problem with doing this (and the RawMessage) is that you skip the
>>> unicode (un)escaping step which encoding/json does for strings (internally,
>>> it actually does it while they're still []byte, so it's pretty trivial to
>>> have it do it for []byte fields too). I could just pass along the []byte
>>> untouched, with the backslashes and all still in there, and send it out the
>>> other end as a JSON string and no-one would be any wiser. But what if that
>>> other end isn't JSON? What if it's some custom binary interface? They're
>>> going to be receiving different data than was passed in.
>>
>> The implementations are probably pretty easy to write:
>> `js, err := json.Marshal(string(b))` etc.
>>
>>> On Sun, Dec 15, 2013 at 7:32 AM, egon <egon...@gmail.com> wrote:
>>>
>>>> You can use:
>>>>
>>>> type MyStruct struct {
>>>>     A, B json.RawMessage
>>>> }
>>>>
>>>> json.RawMessage is defined as type RawMessage []byte. Alternatively
>>>> you can make your own type that implements MarshalJSON and
>>>> UnmarshalJSON.
>>>>
>>>> The reason it does b64, is that it is "the correct way" to represent
>>>> byte array in a json. In other words by default it is safe, but you can
>>>> override the behavior by using RawMessage or a custom Marshaler.
>>>>
>>>> +egon
>>>>
>>>> On Saturday, December 14, 2013 10:25:18 PM UTC+2, Brian Picciano wrote:
>>>>>
>>>>> I'm sorry if this has been brought up already, I haven't been able to
>>>>> find anything on it in my searching.
>>>>> I also know this would be a fairly
>>>>> significant change and would break backwards compatibility, but it is a
>>>>> fairly annoying "feature" that I think is more of a hindrance than a help.
>>>>>
>>>>> Basically the current behavior is that if you have a struct with a
>>>>> []byte field that you pass into the json marshaler, it will represent that
>>>>> in the output json string as the base64 encoded version of what you put in,
>>>>> and if you're unmarshaling into a []byte it will try to base64 decode the
>>>>> json string first. I can understand why this might be thought to be "the
>>>>> right way", since it forces you to use string as a string and raw binary
>>>>> data as []byte. But it's a bit presumptuous to assume that there is no
>>>>> legitimate reason anyone should pass a string through to a []byte and work
>>>>> with them that way.
>>>>>
>>>>> Currently, if I want my destination struct to be something like:
>>>>>
>>>>> type MyStruct struct {
>>>>>     A, B []byte
>>>>> }
>>>>>
>>>>> And have that be filled by the json string: `{"A":"foo","B":"bar"}`,
>>>>> then I would first have to make a temporary struct like:
>>>>>
>>>>> type MyStructStr struct {
>>>>>     A, B string
>>>>> }
>>>>>
>>>>> And copy/convert each field over individually. Same goes if I want to
>>>>> convert from MyStruct back into json. This adds a lot of extra code and
>>>>> data copies. In encoding/json the data is initially passed in and
>>>>> (un)quoted as []byte, where it is then converted to string. So now I'm
>>>>> converting back to []byte. This is unnecessary and unoptimizes for the
>>>>> common case.
>>>>>
>>>>> I've hacked a version of encoding/json where I took out the b64 stuff.
>>>>> It works just fine and is actually less code than it used to be. So there's
>>>>> no technical reason it has to stay (to my knowledge). Again, I know this
>>>>> probably won't make it in for anything in versions 1.*, but for 2 I think
>>>>> it should be considered.
>>>>> Also, if this is the wrong place to post this
>>>>> please let me know, I'll happily move it.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/4a08aa30-c0d8-4e7e-af34-b7a06a32d1fbn%40googlegroups.com.