Hi all,

After trying to understand exactly how keyBy works internally, I could not find anything more than "it applies obj.hashCode() % n", where n is the number of tasks/processors.
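For what it's worth, the "hashCode() % n" summary appears to be a simplification. My reading of Flink's KeyGroupRangeAssignment is that keyBy works in two steps. First, the key's hashCode() is scrambled with a murmur hash and taken modulo maxParallelism to pick a key group. Second, the key group is mapped to a subtask with keyGroup * parallelism / maxParallelism. With parallelism 4 and the default maxParallelism of 128, key groups 0-31 go to subtask 0, 32-63 to subtask 1, and so on, so key % 4 does not predict the subtask. The standalone sketch below copies that logic for illustration only. The method names mirror Flink's, but this is not the library API and the details may differ between Flink versions:

```java
// Standalone sketch (no Flink dependency) re-implementing, for illustration, the
// key-to-subtask logic found in Flink's KeyGroupRangeAssignment and MathUtils.
// Verify against the source of the Flink version you actually run.
public class KeyGroupSketch {

    // Smallest power of two >= x, for x > 0 (mirrors MathUtils.roundUpToPowerOfTwo).
    static int roundUpToPowerOfTwo(int x) {
        x = x - 1;
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
        return x + 1;
    }

    // Default max parallelism when none is configured: roundUp(1.5 * p), clamped to [128, 32768].
    static int defaultMaxParallelism(int parallelism) {
        return Math.min(
                Math.max(roundUpToPowerOfTwo(parallelism + parallelism / 2), 128),
                1 << 15);
    }

    // MurmurHash3 32-bit finalizer.
    static int bitMix(int in) {
        in ^= in >>> 16;
        in *= 0x85ebca6b;
        in ^= in >>> 13;
        in *= 0xc2b2ae35;
        in ^= in >>> 16;
        return in;
    }

    // Scrambles a hashCode() with one MurmurHash3 round and returns a non-negative int.
    static int murmurHash(int code) {
        code *= 0xcc9e2d51;
        code = Integer.rotateLeft(code, 15);
        code *= 0x1b873593;
        code = Integer.rotateLeft(code, 13);
        code = code * 5 + 0xe6546b64;
        code ^= 4; // length of the hashed input in bytes (one int)
        code = bitMix(code);
        if (code >= 0) {
            return code;
        } else if (code != Integer.MIN_VALUE) {
            return -code;
        } else {
            return 0;
        }
    }

    // Step 1: key -> key group in [0, maxParallelism).
    static int assignToKeyGroup(Object key, int maxParallelism) {
        return murmurHash(key.hashCode()) % maxParallelism;
    }

    // Step 2: key group -> subtask index in [0, parallelism); each subtask owns a contiguous range.
    static int operatorIndexForKeyGroup(int maxParallelism, int parallelism, int keyGroup) {
        return keyGroup * parallelism / maxParallelism;
    }

    static int assignKeyToParallelOperator(Object key, int maxParallelism, int parallelism) {
        return operatorIndexForKeyGroup(maxParallelism, parallelism,
                assignToKeyGroup(key, maxParallelism));
    }

    public static void main(String[] args) {
        int parallelism = 4;
        int maxParallelism = defaultMaxParallelism(parallelism); // 128 for parallelism 4
        for (int key = 0; key < 8; key++) {
            System.out.printf("key=%d  key %% 4=%d  keyBy subtask=%d%n",
                    key, key % parallelism,
                    assignKeyToParallelOperator(key, maxParallelism, parallelism));
        }
    }
}
```

Inside a real job, the same computation should be available directly as org.apache.flink.runtime.state.KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, parallelism), as long as maxParallelism matches what the job actually uses.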
This post, for example, https://stackoverflow.com/questions/45062061/why-is-keyed-stream-on-a-keyby-creating-skewed-downstream-execution, suggests implementing a KeySelector and writing our own hashCode function. However, none of this is clear to me, especially the hashCode part.

I am running on a PC with 4 slots/processors, and I would like to route each record to a specific processor by hashing a certain field. Ideally, let's say that the 4 processors have ids 0, 1, 2 and 3. Then I would like to send the tuples whose (key % 4) = 0 to the processor with id 0, the tuples whose (key % 4) = 1 to the processor with id 1, and so on. I would like to know exactly which processor/task each tuple goes to. Can I do that deterministically with keyBy in Flink?

Thanks in advance.
Max

-- 
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
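If the goal is exactly "a tuple with key k goes to subtask k % 4", one option that bypasses keyBy's hashing altogether is a custom partitioner passed to DataStream#partitionCustom(partitioner, keySelector), where the value returned by Partitioner#partition(key, numPartitions) selects the downstream channel. The caveat is that partitionCustom returns a plain DataStream rather than a KeyedStream, so keyed state and keyed windows are not available after it. The sketch below shows only the routing rule as plain Java with no Flink dependency. The class name is hypothetical; in a job, the method would sit in a class implementing org.apache.flink.api.common.functions.Partitioner<Integer>:

```java
// Hypothetical standalone sketch of a "key mod numPartitions" routing rule.
// In a Flink job this method would live in a class implementing
// org.apache.flink.api.common.functions.Partitioner<Integer> and be passed to
// DataStream#partitionCustom(partitioner, keySelector).
public class ModuloPartitionerSketch {

    // Same shape as Partitioner#partition(K key, int numPartitions).
    public int partition(Integer key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative keys,
        // where plain key % numPartitions would return a negative index.
        return Math.floorMod(key, numPartitions);
    }

    public static void main(String[] args) {
        ModuloPartitionerSketch p = new ModuloPartitionerSketch();
        for (int key = -2; key < 6; key++) {
            System.out.println("key=" + key + " -> subtask " + p.partition(key, 4));
        }
    }
}
```

Here numPartitions would be the parallelism of the downstream operator, so with 4 slots each key lands on a predictable subtask index.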