Use keyBy to deterministically hash each record to a processor/task/slot

m@xi Mon, 30 Oct 2017 08:22:03 -0700

Hi all,

I have been trying to understand exactly how keyBy works internally, but I
have not found anything beyond "it applies obj.hashCode() % n", where n is
the number of tasks/processors.


This post, for example,
https://stackoverflow.com/questions/45062061/why-is-keyed-stream-on-a-keyby-creating-skewed-downstream-execution,
suggests implementing a KeySelector and writing our own hashCode function.
However, none of this is clear to me, especially the hashCode part.

I am running a PC with 4 slots/processors, and I would like to hash each
record, based on a certain field, to a specific processor. Ideally, let's say
that the 4 processors have ids 0, 1, 2, 3. Then I would like to send the
tuples whose (key % 4) = 0 to the processor with id 0, those whose
(key % 4) = 1 to the processor with id 1, and so on.
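
To make the routing I have in mind concrete, here is a minimal plain-Java
sketch. It does not use any Flink API, and the class and method names
(ModuloRouting, targetTask) are made up for illustration. It only states the
mapping I want, not what keyBy actually does.

```java
// Hypothetical illustration of the desired mapping: key -> task id.
// Nothing here is a Flink class or method.
class ModuloRouting {

    // Returns the id (0 .. parallelism-1) of the task that should receive
    // a record with the given key. floorMod keeps the result non-negative
    // even if a key happens to be negative.
    static int targetTask(long key, int parallelism) {
        return (int) Math.floorMod(key, (long) parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4; // the 4 slots/processors from my setup
        for (long key = 0; key < 8; key++) {
            System.out.println("key " + key + " -> task " + targetTask(key, parallelism));
        }
    }
}
```

So every record whose key field gives the same remainder modulo 4 should
always land on the same, known task id.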

I would like to know exactly which processor/task each tuple goes to.
Can I do that deterministically with keyBy in Flink?

Thanks in advance.
Max



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Use keyBy to deterministically hash each record to a processor/task/slot
