On Wed, Nov 25, 2015 at 12:47 AM Ron <[email protected]> wrote:

> Sure.
>
> For example, I defined the below message in the proto file:
> message Person
> {
>  string first_name = 1;
>  string last_name = 2;
> }
>
>
> When I set the first_name field to "Ron" both binary serialization and
> JSON serialization work fine.
>
>
> But when I set it to "רון" (as UTF-8), the serialization to binary is
> still correct (shown here as base64):
>
> *CgbXqNeV158=*
>
> ... but when using *BinaryToJsonString* to get the JSON representation, the
> value is mishandled and is ultimately replaced with an empty string:
> { "firstName": "" }
>
>
> This example will probably work correctly with compilers that define
> char as unsigned by default, but with compilers that define char as signed
> (such as Microsoft's), I think you should get the same (incorrect) result
> I pasted above.
>
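
For reference, the base64 value quoted above can be checked by hand. The
sketch below is not protobuf code: EncodeShortStringField and Base64Encode
are hypothetical helpers written only for this illustration, and the encoder
assumes a field number below 16 and a payload shorter than 128 bytes, so that
the tag and the length each fit in a single byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes one length-delimited field (wire type 2).
// Assumes the tag and the length each fit in a single varint byte.
std::string EncodeShortStringField(int field_number, const std::string& value) {
  assert(field_number > 0 && field_number < 16 && value.size() < 128);
  std::string out;
  out.push_back(static_cast<char>((field_number << 3) | 2));  // tag byte
  out.push_back(static_cast<char>(value.size()));             // length byte
  out += value;                                               // raw UTF-8
  return out;
}

// Hypothetical helper: standard base64 with '=' padding.
std::string Base64Encode(const std::string& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  // Full 3-byte groups become 4 output characters.
  for (; i + 2 < in.size(); i += 3) {
    uint32_t n = (static_cast<uint8_t>(in[i]) << 16) |
                 (static_cast<uint8_t>(in[i + 1]) << 8) |
                 static_cast<uint8_t>(in[i + 2]);
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out.push_back(kAlphabet[(n >> 6) & 63]);
    out.push_back(kAlphabet[n & 63]);
  }
  if (i + 1 == in.size()) {  // one trailing byte
    uint32_t n = static_cast<uint8_t>(in[i]) << 16;
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out += "==";
  } else if (i + 2 == in.size()) {  // two trailing bytes
    uint32_t n = (static_cast<uint8_t>(in[i]) << 16) |
                 (static_cast<uint8_t>(in[i + 1]) << 8);
    out.push_back(kAlphabet[(n >> 18) & 63]);
    out.push_back(kAlphabet[(n >> 12) & 63]);
    out.push_back(kAlphabet[(n >> 6) & 63]);
    out.push_back('=');
  }
  return out;
}
```

Calling Base64Encode(EncodeShortStringField(1, "\xD7\xA8\xD7\x95\xD7\x9F"))
yields "CgbXqNeV158=". The bytes are 0A (field 1, wire type 2), 06 (length),
then D7 A8 D7 95 D7 9F, the UTF-8 encoding of the three Hebrew letters. This
agrees with the report that the binary serialization is correct and that the
problem appears only in the JSON conversion.
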
Thanks for the explanation. Could you file a bug for this on the protobuf
GitHub site? If you know of a solution, you are also welcome to send us a
pull request.


>
>
>
> On Tuesday, November 24, 2015 at 10:51:55 PM UTC+2, Feng Xiao wrote:
>>
>>
>>
>> On Tue, Nov 24, 2015 at 11:42 AM, Ron <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> When using *BinaryToJsonString* or *BinaryToJsonStream*, I seem to
>>> encounter a problem whenever a message contains a string with
>>> multibyte characters.
>>> After some debugging, it seems that things start to go wrong in
>>> *ReadCodePoint* (in json_escaping.cc), when the first byte of the
>>> multibyte character is read from the string (as char) and assigned
>>> to a variable of type uint32. Converting directly from a signed 1-byte
>>> value to an unsigned 4-byte value sign-extends it, producing a value
>>> different from the one expected a little later by the *if-else*
>>> statements that inspect it to determine the length of the multibyte
>>> character. From there things go wrong, and the string isn't serialized
>>> and just gets dropped...
>>>
>>> For now, as a temporary solution, I added a cast of the value returned by
>>> StringPiece's *operator[]* to uint8 before the assignment into uint32,
>>> but any advice or a more permanent solution would be appreciated.
>>>
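
The conversion problem described above can be reproduced without protobuf.
This is only a sketch under stated assumptions: the function names are
hypothetical, Utf8LengthFromLeadByte is a stand-in for the if-else chain and
not the actual code in json_escaping.cc, and signed char is used explicitly
to model a compiler where plain char is signed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Models a compiler where plain char is signed: the byte 0xD7 is the
// negative value -41, and widening it to 32 bits yields 0xFFFFFFD7.
uint32_t WidenAsSignedChar(const std::string& s, std::size_t i) {
  return static_cast<uint32_t>(static_cast<signed char>(s[i]));
}

// The workaround from the message: go through uint8 first, so the byte
// keeps its value 0xD7 whether or not char is signed.
uint32_t WidenViaUint8(const std::string& s, std::size_t i) {
  return static_cast<uint8_t>(s[i]);
}

// Hypothetical stand-in for the length checks: returns the UTF-8 sequence
// length implied by a lead byte, or 0 if the value is not a valid lead byte.
int Utf8LengthFromLeadByte(uint32_t lead) {
  if (lead <= 0x7F) return 1;
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;  // continuation byte, invalid lead byte, or out-of-range value
}

// Walks through the first byte of a two-byte character (U+05E8).
void CheckHebrewLeadByte() {
  const std::string resh = "\xD7\xA8";
  // Sign extension: no range check matches, so the length is unknown.
  assert(WidenAsSignedChar(resh, 0) == 0xFFFFFFD7u);
  assert(Utf8LengthFromLeadByte(WidenAsSignedChar(resh, 0)) == 0);
  // With the uint8 cast, the lead byte is recognized as a 2-byte sequence.
  assert(WidenViaUint8(resh, 0) == 0xD7u);
  assert(Utf8LengthFromLeadByte(WidenViaUint8(resh, 0)) == 2);
}
```

ASCII input such as "Ron" is unaffected, since every byte is below 0x80 and
has the same value whether char is signed or unsigned. That is consistent
with the failure showing up only for multibyte characters.
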
>> Could you provide a sample input that will fail for this reason?
>>
>>
>
>>> Thanks,
>>> Ron
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Protocol Buffers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/protobuf.
>>> For more options, visit https://groups.google.com/d/optout.
>>>

