Hi Stephen,
I found a very nice article [1], which might help you solve the issues
you are concerned about. The elegant solution to this problem might be
summarized as "do not implement equals() and hashCode() for POJO types,
use Object's default implementation". I'm not 100% sure that this has
no negative impact on other Flink components, but I _suppose_ it should
not (someone please correct me if I'm wrong).
Jan
[1] http://web.mit.edu/6.031/www/sp17/classes/15-equality/
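A minimal sketch of what "use Object's default implementation" means in
practice. The class Reading and its fields are hypothetical and exist only
for illustration. With the inherited equals() and hashCode(), identity
decides equality, so mutating a field cannot lose the object from a
hash-based collection. The trade-off is that a second instance with the
same field values (for example a deserialized copy) is a different object
and is not equal to the first, which is why this only suits POJOs that are
not used as keys.

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityEqualityDemo {

    // Hypothetical POJO: public class, no-arg constructor, getters and setters.
    // It deliberately does NOT override equals() or hashCode().
    public static class Reading {
        private String sensorId;
        private long value;

        public Reading() {}

        public Reading(String sensorId, long value) {
            this.sensorId = sensorId;
            this.value = value;
        }

        public String getSensorId() { return sensorId; }
        public void setSensorId(String sensorId) { this.sensorId = sensorId; }
        public long getValue() { return value; }
        public void setValue(long value) { this.value = value; }
    }

    public static void main(String[] args) {
        Reading a = new Reading("sensor-1", 42L);
        Reading b = new Reading("sensor-1", 42L);

        Set<Reading> seen = new HashSet<>();
        seen.add(a);
        a.setValue(43L); // the identity hash code does not depend on field values

        System.out.println(seen.contains(a)); // true: still found after mutation
        System.out.println(a.equals(b));      // false: same fields, different instance
        System.out.println(seen.contains(b)); // false: b was never added
    }
}
```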
On 10/7/19 1:37 PM, Chesnay Schepler wrote:
This question should only be relevant for cases where POJOs are used
as keys, in which case hashCode() /must not/ return a class-constant or
effectively random value, as this would break the hash partitioning.
This is somewhat alluded to in the keyBy() documentation
<https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations>,
but could be clarified.
It is in any case heavily discouraged to modify objects after they
have been emitted from a function; the mutability of POJOs is hence
usually not a problem.
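To illustrate the requirement on key types, here is a sketch of a POJO
whose hashCode() is derived from its field values, so equal keys hash
identically on every JVM. SensorKey and partitionFor() are hypothetical
names; partitionFor() is a simplified stand-in for hash partitioning and
is not Flink's actual key assignment logic.

```java
import java.util.Objects;

public class StableKeyHashDemo {

    // Hypothetical key POJO with value-based equals() and hashCode().
    public static class SensorKey {
        private String region;
        private int sensorId;

        public SensorKey() {}

        public SensorKey(String region, int sensorId) {
            this.region = region;
            this.sensorId = sensorId;
        }

        public String getRegion() { return region; }
        public void setRegion(String region) { this.region = region; }
        public int getSensorId() { return sensorId; }
        public void setSensorId(int sensorId) { this.sensorId = sensorId; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SensorKey)) return false;
            SensorKey other = (SensorKey) o;
            return sensorId == other.sensorId && Objects.equals(region, other.region);
        }

        @Override
        public int hashCode() {
            // Depends only on field values, never on object identity.
            return Objects.hash(region, sensorId);
        }
    }

    // Assumption: a toy partitioner used only to show the idea.
    public static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        SensorKey first = new SensorKey("eu", 7);
        SensorKey second = new SensorKey("eu", 7);

        System.out.println(first.equals(second)); // true
        // Equal keys always land in the same partition.
        System.out.println(partitionFor(first, 4) == partitionFor(second, 4)); // true
    }
}
```

A constant hashCode() would send every key to the same partition, and an
identity-based one would route equal keys inconsistently, which matches
the two cases ruled out above.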
On 02/10/2019 14:17, Stephen Connolly wrote:
I notice
https://ci.apache.org/projects/flink/flink-docs-stable/dev/types_serialization.html#rules-for-pojo-types
says that all non-transient fields need a setter.
That means that the fields cannot be final.
That means that hashCode() should probably just return a constant
value (otherwise an object could be mutated and then lost from a
hash-based collection).
Is it really the case that we have to either register a serializer or
abandon immutability and consequently force hashCode to be a constant
value?
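The hazard in question can be reproduced with a short sketch. Point is a
hypothetical mutable POJO with a value-based hashCode(); once a field
changes after insertion, the HashSet looks for it under the new hash code
and no longer finds it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class MutatedKeyDemo {

    // Hypothetical mutable POJO with value-based equals() and hashCode().
    public static class Point {
        private int x;
        private int y;

        public Point() {}

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        public int getX() { return x; }
        public void setX(int x) { this.x = x; }
        public int getY() { return y; }
        public void setY(int y) { this.y = y; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point other = (Point) o;
            return x == other.x && y == other.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Adds a point, mutates it, then reports whether the set still finds it.
    public static boolean foundAfterMutation() {
        Set<Point> points = new HashSet<>();
        Point p = new Point(1, 2);
        points.add(p);
        p.setX(99); // hash code changes while p is stored under the old one
        return points.contains(p);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false: the entry is "lost"
    }
}
```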
What are the recommended implementation patterns for the POJOs used
in a topology?
Thanks
-Stephen