Re: Questions about keyed streams

2021-09-28 Thread Dan Hill
Hi! I'm just getting back to this. Questions: 1. Across operators, do the same key group IDs get mapped to the same TaskManagers? E.g. if an item is in key group 1 of operator A and that runs on taskmanager-0, will key group 1 of operator B also run on taskmanager-0? 2. Are there any internal

Re: Questions about keyed streams

2021-07-29 Thread Arvid Heise
AFAIK you can now express the partition key in the Table API, and it will be used for co-location and optimization. So I'd probably give that a try first and convert the Table to a DataStream where needed. On Sat, Jul 24, 2021 at 9:22 PM Dan Hill wrote: > Thanks Fabian and Senhong! > > Here's an example

Re: Questions about keyed streams

2021-07-24 Thread Dan Hill
Thanks Fabian and Senhong! Here's an example diagram of the join that I want to do. There are more layers of joins. https://docs.google.com/presentation/d/17vYTBUIgrdxuYyEYXrSHypFhwwS7NdbyhVgioYMxPWc/edit#slide=id.p 1) Thanks! I'll look into these. 2) I'm using the same key across multiple Kaf

Re: Questions about keyed streams

2021-07-23 Thread Senhong Liu
Hi Dan, 1) If the key doesn’t change in the downstream operators and you want to avoid shuffling, DataStreamUtils#reinterpretAsKeyedStream might be helpful. 2) I am not sure whether you are saying that the data is already partitioned in Kafka and you want to avoid shuffling in th

Re: Questions about keyed streams

2021-07-22 Thread Fabian Paul
Hi Dan, 1) In general, there is no guarantee that your downstream operator runs on the same TM, even though it works on the same key group. Nevertheless, you can try to force this kind of behaviour to prevent the network transfer by either chaining the two operators (if no shuffle is in between) or confi
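
To illustrate the point about key groups, here is a minimal plain-Java sketch of how a key ends up at a subtask index. It is not Flink code: the index formula follows what Flink's KeyGroupRangeAssignment does as far as I understand it, the hash is a simplified stand-in for Flink's murmur-based hash, and the class name, method names, and the "user-42" key are hypothetical.

```java
// Sketch only: models key -> key group -> subtask index with plain arithmetic.
public class KeyGroupSketch {

    // Assumption: simplified stand-in for Flink's murmur hash of key.hashCode().
    public static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Subtask index that owns a key group at a given operator parallelism.
    public static int subtaskForKeyGroup(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        String userId = "user-42"; // hypothetical key

        int keyGroup = keyGroupFor(userId, maxParallelism);

        // Operators A and B share parallelism 4, operator C uses parallelism 8.
        int subtaskA = subtaskForKeyGroup(keyGroup, maxParallelism, 4);
        int subtaskB = subtaskForKeyGroup(keyGroup, maxParallelism, 4);
        int subtaskC = subtaskForKeyGroup(keyGroup, maxParallelism, 8);

        System.out.println("key group: " + keyGroup);
        System.out.println("operator A subtask: " + subtaskA);
        System.out.println("operator B subtask: " + subtaskB);
        System.out.println("operator C subtask: " + subtaskC);
    }
}
```

With equal parallelism and max parallelism, the same key group maps to the same subtask index in both operators. Which TaskManager hosts subtask i of each operator is a separate scheduling decision that this arithmetic does not cover, which is why co-location is not guaranteed unless the operators are chained or placed together.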

Questions about keyed streams

2021-07-21 Thread Dan Hill
Hi. 1) If I use the same key in downstream operators (my key is a user id), will the rows stay on the same TaskManager machine? I join in more info based on the user id as the key. I'd like for these to stay on the same machine rather than shuffle a bunch of user-specific info to multiple task m