Re: [protobuf] zigzag encode and zigzag decode invoking undefined behavior

'Josh Haberman' via Protocol Buffers Fri, 28 Apr 2017 17:26:45 -0700

If you have a change that makes protobuf code not rely on undefined 
behavior, and also doesn't sacrifice performance, then we should merge it.


Feel free to send a GitHub PR. But make sure the code is C++03 (protobuf 
doesn't use C++11 yet).

On Tuesday, April 25, 2017 at 1:36:31 PM UTC-7, Wolfgang Brehm wrote:
>
> I know, but the proposed solution is still worth considering as it gives 
> you the best of both worlds with just a little compile time overhead.
> It yields the same result on 2s complement machines as the current 
> function but also on other machines because it is well defined.
> This means that
> clang++ -fsanitize=undefined
> will not have anything to complain either.
> TL;DR: You loose nothing but time and gain better defined code, but 
> basically everything stays the same.
>
> Am Dienstag, 25. April 2017 20:17:33 UTC+2 schrieb Feng Xiao:
>>
>> Right now protobuf implementation assumes 2s complement and won't work on 
>> any other environments. That's undefined behavior according to C++ standard 
>> but in practice it's very unlikely to be a problem.
>>
>> On Tue, Apr 25, 2017 at 5:42 AM, Wolfgang Brehm <[email protected]> 
>> wrote:
>>
>>> right shifting negative integers is tecnically undefined, this means 
>>> that the implementation used for encoding integers:
>>> (n<<1)^(n>>(digits-1))
>>> is tecnically undefined according to:
>>>
>>>> The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits 
>>>> are zero-filled. If E1 has an unsigned type, the value of the result is E1 
>>>> × 2 E2, reduced modulo one more than the maximum value representable in 
>>>> the 
>>>> result type. Otherwise, if E1 has a signed type and non-negative value, 
>>>> and 
>>>> E1 × 2 E2 is representable in the corresponding unsigned type of the 
>>>> result 
>>>> type, then that value, converted to the result type, is the resulting 
>>>> value; otherwise, the behavior is undefined.
>>>
>>>
>>> and the implementation for decoding:
>>> (n>>1)^(-(n&1))
>>> depends on the 2s complement as well and is probably tecnically 
>>> undefined (but I cant find a quote from the standard).
>>>
>>> This is why I propose the following solution:
>>>
>>> template<typename S,class U = typename std::make_unsigned<S>::type>>
>>> U constexpr zigzag_encode(const S &n){
>>>   return (U(n)<<1)^(n<0?~U(0):U(0));
>>> }
>>>
>>> template<typename U,class S = typename std::make_signed<U>::type>>
>>> S constexpr zigzag_decode(const U &n){
>>>   return (n&1)?-1-S(n>>1):(n>>1);
>>> }
>>>
>>> This does not look as cool, but modern compilers (llvm and gcc were 
>>> tested) will compile to bascially the same instructions, gcc will introduce 
>>> 1 additional instruction for decode.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "Protocol Buffers" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at https://groups.google.com/group/protobuf.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.

Re: [protobuf] zigzag encode and zigzag decode invoking undefined behavior

Reply via email to