leerho commented on PR #111:
URL: 
https://github.com/apache/datasketches-rust/pull/111#issuecomment-4271498916

   As @proost points out, the different sketches use hash functions in very 
different ways so I'm not sure there is a common "hashing model" across all the 
different sketches.  
   
   Nonetheless, to have cross-language compatibility, each language needs to 
ensure that data is presented to hash functions in a deterministic and 
unambiguous way, so that, given the same data, the same hash algorithm produces 
the same hash values in every language.
   
   Different languages have their own primitive data representations and 
implicit integral and floating-point conversion rules, so these differences 
must be dealt with on a language-by-language basis.  
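   
   To make the ambiguity concrete, here is a small Rust illustration (not taken 
from any DataSketches code; the function names are hypothetical) of how the same 
8-bit pattern widens to two different 64-bit values depending on whether it is 
treated as unsigned or signed. A hash function fed the resulting bytes would see 
different input in each case.

```rust
/// Zero-extension: the bits are treated as an unsigned value.
fn zero_extend(raw: u8) -> i64 {
    raw as i64
}

/// Sign-extension: the bits are first reinterpreted as i8, then widened.
fn sign_extend(raw: u8) -> i64 {
    (raw as i8) as i64
}

fn main() {
    let raw: u8 = 0xFF;

    assert_eq!(zero_extend(raw), 255);
    assert_eq!(sign_extend(raw), -1);

    // The widened values occupy different bytes, so hashing them would differ.
    assert_ne!(
        zero_extend(raw).to_le_bytes(),
        sign_extend(raw).to_le_bytes()
    );
    println!("zero-extended: {:?}", zero_extend(raw).to_le_bytes());
    println!("sign-extended: {:?}", sign_extend(raw).to_le_bytes());
}
```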
   
   Java was the first language the library supported and, unfortunately, it 
does not yet support unsigned integral primitives (except for _char_).  As a 
result, Java sets the common standard for the integral types.  Java has a 
fairly simple set of rules for implicit widening integral conversions and 
floating-point conversions, which apply in expressions and to method input 
parameters.
   
   C++'s implicit widening conversion rules are more granular and complex 
because they have to account for machine-dependent type sizes and unsigned 
variants.  As you can see in this code 
[snippet](https://github.com/apache/datasketches-cpp/blob/master/theta/include/theta_sketch_impl.hpp#L145-L209),
 a _uint8_t_ is cast to an _int8_t_, which is then cast to an _int64_t_ before 
being sent to the hash function.  Although C++ can do some of this 
automatically, these extra methods were created to make the widening 
conversions explicit, so that data is handled correctly and consistently and 
the developer is left in no doubt about how it is being handled.
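   
   For the Rust side, one possible way to mirror that explicitness is a set of 
small widening helpers. The sketch below is only an illustration under stated 
assumptions: `widen_u8`, `widen_i8`, and `hash_input_bytes` are hypothetical 
names, not part of datasketches-rust, and the little-endian byte layout is an 
assumption about what the hash function would consume.

```rust
/// Hypothetical helper: reinterpret the u8 bits as i8, then sign-extend to
/// i64, following the uint8_t -> int8_t -> int64_t chain described above.
fn widen_u8(value: u8) -> i64 {
    i64::from(value as i8)
}

/// Hypothetical helper: a signed 8-bit value is sign-extended directly.
fn widen_i8(value: i8) -> i64 {
    i64::from(value)
}

/// Assumption: the hash function consumes the widened value as 8
/// little-endian bytes, independent of the host's native byte order.
fn hash_input_bytes(widened: i64) -> [u8; 8] {
    widened.to_le_bytes()
}

fn main() {
    // 200u8 and -56i8 share the same bit pattern, so both widen to -56.
    assert_eq!(widen_u8(200), -56);
    assert_eq!(widen_u8(200), widen_i8(-56));

    // A plain zero-extending conversion would produce a different value.
    assert_ne!(widen_u8(200), i64::from(200u8));

    println!("{:?}", hash_input_bytes(widen_u8(200)));
}
```

   Spelling out each conversion as its own function keeps the choice between 
sign-extension and zero-extension visible at the call site instead of leaving 
it to an `as` cast buried in an expression.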
   
   I am not a Rust expert, so I'll leave it to the Rust folks to decide on the 
best strategy.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

