I'd like to understand which operations can actually leverage "binary
comparisons" when using the DataStream API. This is regarding the
optimisation you receive when using Flink's own built-in serialization stack
as opposed to Avro/Kryo/Json/etc... whereby fields are compared without the
object needing to be deserialized.

I'm expecting that all operations which use field positions (i.e. `keyBy`
and `partitionCustom`) leverage binary comparisons, since they don't require
a reference to a deserialized object. **However I cannot see any other
methods that take field-offset parameters... they all take callbacks... so
does anything else in the DataStream API actually perform binary
comparisons?**

For example, the `join` operation requires a closure to perform the equality
check... which means the objects must be deserialized before being passed
into the closure... unless something really clever is happening
under-the-hood?

Please can you provide a list of stream operations that perform binary
comparisons / avoid deserialization?

Thanks!



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Reply via email to