I also stumbled upon this bug. I have base64 of some data in a []byte field 
in my struct. I call json.Marshal, then json.Unmarshal into the same struct 
type. And now, surprise! The data is automatically base64-decoded. What... Any 
encoder in any system should have the following guarantee (enforced with fuzz 
tests): decode(encode(X)) == X. This is so intuitive. It's a shame this does not 
work in Go for []byte.
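A minimal sketch of the scenario above, assuming a hypothetical `Payload` type with one `Data []byte` field that already holds base64 text. With the standard `encoding/json` package, `Marshal` base64-encodes the field once more and `Unmarshal` removes exactly that one layer, so the round trip by itself should give back the original bytes. Checking it in isolation like this can help locate where an unexpected decode actually happens.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Payload is a hypothetical struct whose Data field holds base64 text.
type Payload struct {
	Data []byte
}

// roundTrip marshals p, unmarshals the result into a fresh Payload, and
// returns the decoded value together with the intermediate JSON.
func roundTrip(p Payload) (Payload, string, error) {
	js, err := json.Marshal(p)
	if err != nil {
		return Payload{}, "", err
	}
	var out Payload
	if err := json.Unmarshal(js, &out); err != nil {
		return Payload{}, "", err
	}
	return out, string(js), nil
}

func main() {
	in := Payload{Data: []byte("aGVsbG8=")} // base64 text of "hello"
	out, js, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(js)                             // {"Data":"YUdWc2JHOD0="}: encoded a second time
	fmt.Println(bytes.Equal(in.Data, out.Data)) // true: the round trip is lossless
}
```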

On Tuesday, June 25, 2024 at 9:01:57 PM UTC+8 Mauro Lacy wrote:

> Related: https://stackoverflow.com/a/78662958/3768429
>
> On Monday 16 December 2013 at 19:37:30 UTC+1 Kyle Lemons wrote:
>
>> On Sun, Dec 15, 2013 at 2:05 PM, Brian Picciano <bgpic...@gmail.com> 
>> wrote:
>>
>>> I'm going to compress my three responses to one.
>>>
>>> > What I'm less clear on is exactly what your use case is for encoding 
>>> it as a string if you're dealing with it exclusively as bytes. I mean, 
>>> if you're using it as both, surely you're making the copy anyway at some 
>>> point.
>>>
>>> That's my point: I don't want to use string EVER in my application (I 
>>> have no reason to for this particular one). But with encoding/json I have 
>>> to, because I can't directly get []byte out of it for a JSON string value. 
>>> So I have to convert.
>>>
>>> > JSON defines strings to be UTF-8 encoded, and as such is not suitable 
>>> for storing binary data.  Encoding an unknown []byte with base64 eliminates 
>>> problems
>>>
>>> I think that's a decision the coder should make. If I am worried that my 
>>> binary data can't be encoded into a UTF-8 string then I can encode it into 
>>> hex or base64 or whatever I like. But if I KNOW that my data is coming in 
>>> as a proper JSON string and going out the other end without being changed 
>>> in between there's no reason I should be forced to pay the penalty of four 
>>> extra copies ([]byte -> string (inside encoding/json) -> []byte -> app -> 
>>> string -> []byte (inside encoding/json)).
>>>
>>
>> I assume you've benchmarked this and found that the extra copies are a 
>> bottleneck?  If not, don't assume that they are without some hard data, 
>> especially if you're doing a lot of I/O (as I would expect of a networked 
>> service).  The JSON library does string/[]byte and []byte/string 
>> conversions itself in some places.
>>  
>>
>>> > If it's textual data, then string is the correct type.
>>>
>>> That's true if I am actually interacting with the data. If I'm just 
>>> carrying the data along and spitting it back out somewhere else, then I 
>>> don't really care what it is, and what I really need to optimize for is 
>>> speed and memory. Four copies aren't helping.
>>>
>>> > Alternatively you can make your own type that implements MarshalJSON
>>>  and UnmarshalJSON.
>>>
>>> The problem with doing this (and the RawMessage) is that you skip the 
>>> unicode (un)escaping step which encoding/json does for strings (internally, 
>>> it actually does it while they're still []byte, so it's pretty trivial to 
>>> have it do it for []byte fields too). I could just pass along the []byte 
>>> untouched, with the backslashes and all still in there, and send it out the 
>>> other end as a JSON string and no-one would be any wiser. But what if that 
>>> other end isn't JSON? What if it's some custom binary interface? They're 
>>> going to be receiving different data than was passed in.
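A small sketch of the difference described above, assuming a single hypothetical field `A`: decoding into `json.RawMessage` keeps the surrounding quotes and the `\u00e9` and `\"` escapes exactly as they appear in the input, while decoding into `string` resolves them. `rawVsString` is an illustrative helper, not part of `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawVsString decodes the same JSON document twice: once into a
// json.RawMessage field and once into a string field.
func rawVsString(doc []byte) (raw json.RawMessage, str string, err error) {
	var r struct{ A json.RawMessage }
	if err = json.Unmarshal(doc, &r); err != nil {
		return nil, "", err
	}
	var s struct{ A string }
	if err = json.Unmarshal(doc, &s); err != nil {
		return nil, "", err
	}
	return r.A, s.A, nil
}

func main() {
	doc := []byte(`{"A":"caf\u00e9 \"quoted\""}`)
	raw, str, err := rawVsString(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // quotes and escape sequences kept verbatim
	fmt.Println(str)         // escapes resolved: café "quoted"
}
```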
>>>
>>
>> The implementations are probably pretty easy to write: `js, err := 
>> json.Marshal(string(b))` etc.
>>  
>>
>>> On Sun, Dec 15, 2013 at 7:32 AM, egon <egon...@gmail.com> wrote:
>>>
>>>> You can use:
>>>>
>>>> type MyStruct struct {
>>>>     A, B json.RawMessage
>>>> }
>>>>
>>>> json.RawMessage is defined as type RawMessage []byte. Alternatively 
>>>> you can make your own type that implements MarshalJSON and 
>>>> UnmarshalJSON.
>>>>
>>>> The reason it does b64 is that it is "the correct way" to represent a 
>>>> byte array in JSON. In other words, by default it is safe, but you can 
>>>> override the behavior by using RawMessage or a custom Marshaler.
>>>>
>>>> +egon
>>>>
>>>> On Saturday, December 14, 2013 10:25:18 PM UTC+2, Brian Picciano wrote:
>>>>>
>>>>> I'm sorry if this has been brought up already, I haven't been able to 
>>>>> find anything on it in my searching. I also know this would be a fairly 
>>>>> significant change and would break backwards compatibility, but it is a 
>>>>> fairly annoying "feature" that I think is more of a hindrance than a help.
>>>>>
>>>>> Basically the current behavior is that if you have a struct with a 
>>>>> []byte field that you pass into the json marshaler, it will represent 
>>>>> that in the output json string as the base64-encoded version of what 
>>>>> you put in, and if you're unmarshaling into a []byte it will try to 
>>>>> base64-decode the json string first. I can understand why this might 
>>>>> be thought to be "the right way", since it forces you to use string as 
>>>>> a string and raw binary data as []byte. But it's a bit presumptuous to 
>>>>> assume that there is no legitimate reason anyone should pass a string 
>>>>> through to a []byte and work with them that way.
>>>>>
>>>>> Currently, if I want my destination struct to be something like:
>>>>>
>>>>> type MyStruct struct {
>>>>>     A, B []byte
>>>>> }
>>>>>
>>>>> And have that be filled by the json string: `{"A":"foo","B":"bar"}`, 
>>>>> then I would first have to make a temporary struct like:
>>>>>
>>>>> type MyStructStr struct {
>>>>>     A, B string
>>>>> }
>>>>>
>>>>> And copy/convert each field over individually. Same goes if I want to 
>>>>> convert from MyStruct back into json. This adds a lot of extra code and 
>>>>> data copies. In encoding/json the data is initially passed in and 
>>>>> (un)quoted as []byte, where it is then converted to string. So now I'm 
>>>>> converting back to []byte. This is unnecessary and penalizes the 
>>>>> common case.
>>>>>
>>>>> I've hacked a version of encoding/json where I took out the b64 stuff. 
>>>>> It works just fine and is actually less code than it used to be. So 
>>>>> there's no technical reason it has to stay (to my knowledge). Again, I 
>>>>> know this probably won't make it in for anything in versions 1.*, but 
>>>>> for 2 I think it should be considered. Also, if this is the wrong place 
>>>>> to post this, please let me know and I'll happily move it.
>>>>>  
>>>>>
>>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "golang-nuts" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to golang-nuts...@googlegroups.com.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>>

To view this discussion visit 
https://groups.google.com/d/msgid/golang-nuts/4a08aa30-c0d8-4e7e-af34-b7a06a32d1fbn%40googlegroups.com.
