Re: kafka partitions, data locality

2019-04-30 Thread Fabian Hueske
efan Richter [mailto:s.rich...@ververica.com] > *Sent:* Friday, April 26, 2019 11:15 AM > *To:* Smirnov Sergey Vladimirovich (39833) > *Cc:* Dawid Wysakowicz ; Ken Krugler < > kkrugler_li...@transpac.com>; u...@flink.apache.org; dev@flink.apache.org > *Subject:* Re: kafka partitio

RE: kafka partitions, data locality

2019-04-29 Thread Smirnov Sergey Vladimirovich (39833)
rugler <kkrugler_li...@transpac.com> Cc: u...@flink.apache.org; dev@flink.apache.org Subject: Re: kafka partitions, data locality Hi Smirnov, Actually there is a way to tell Flink that data is already partitioned. You can t

RE: kafka partitions, data locality

2019-04-26 Thread Smirnov Sergey Vladimirovich (39833)
[mailto:kkrugler_li...@transpac.com] Sent: Wednesday, April 17, 2019 9:23 PM To: Smirnov Sergey Vladimirovich (39833) <mailto:s.smirn...@tinkoff.ru> Subject: Re: kafka partitions, data locality Hi Sergey, As you surmised, once you do a keyBy/max on the Kafka topic, to group by clientId and find the max

Re: kafka partitions, data locality

2019-04-26 Thread Stefan Richter
, > Sergey > From: Ken Krugler [mailto:kkrugler_li...@transpac.com] > Sent: Wednesday, April 17, 2019 9:23 PM > To: Smirnov Sergey Vladimirovich (39833) <mailto:s.smirn...@tinkoff.ru> > Subject: Re: kafka partitions, data loc

Re: kafka partitions, data locality

2019-04-25 Thread Dawid Wysakowicz
; With best regards, > > Sergey > > *From:*Ken Krugler [mailto:kkrugler_li...@transpac.com] > *Sent:* Wednesday, April 17, 2019 9:23 PM > *To:* Smirnov Sergey Vladimirovich (39833) > *Subject:* Re: kafka partitions, data locality > >   > > Hi Sergey, > >   >

RE: kafka partitions, data locality

2019-04-23 Thread Smirnov Sergey Vladimirovich (39833)
: kafka partitions, data locality Hi Sergey, As you surmised, once you do a keyBy/max on the Kafka topic to group by clientId and find the max, the topology will include a partition/shuffle step. This is because Flink doesn’t know that client ids don’t span Kafka partitions. I don’t know of
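
To illustrate the point in the quoted reply, here is a minimal Python sketch (not Flink code) of why a keyBy forces a shuffle even when each clientId already lives in exactly one Kafka partition. Everything in it is an assumption made for illustration: the partition count, the parallelism, the record values, and both hash functions, which are stand-ins and not the real Kafka or Flink partitioners. The only thing it relies on from the thread is that the producer and Flink assign keys independently.

```python
import hashlib
import zlib
from collections import defaultdict

NUM_KAFKA_PARTITIONS = 4  # hypothetical topic size
FLINK_PARALLELISM = 4     # hypothetical parallelism of the keyed operator


def kafka_partition(client_id: str) -> int:
    # Stand-in for the producer-side partitioner (not Kafka's actual hash).
    return zlib.crc32(client_id.encode()) % NUM_KAFKA_PARTITIONS


def flink_subtask(client_id: str) -> int:
    # Stand-in for keyBy's key-to-subtask assignment (not Flink's actual hash).
    digest = hashlib.md5(client_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % FLINK_PARALLELISM


def max_per_client(records):
    # Equivalent of "group by clientId and find the max".
    best = {}
    for client_id, value in records:
        best[client_id] = max(value, best.get(client_id, value))
    return best


def routing(client_ids):
    # For each Kafka partition, collect the subtasks its keys would be sent to.
    targets = defaultdict(set)
    for client_id in client_ids:
        targets[kafka_partition(client_id)].add(flink_subtask(client_id))
    return dict(targets)


# Made-up input: 200 clients, three records each.
clients = [f"client-{i}" for i in range(200)]
records = [(cid, (i * 7) % 13) for i, cid in enumerate(clients * 3)]

# Group records the way the topic already groups them.
by_partition = defaultdict(list)
for client_id, value in records:
    by_partition[kafka_partition(client_id)].append((client_id, value))

# Because a client never spans partitions, per-partition maxima are already final.
local_results = {}
for part_records in by_partition.values():
    local_results.update(max_per_client(part_records))

if __name__ == "__main__":
    for partition, subtasks in sorted(routing(clients).items()):
        print(f"kafka partition {partition} -> flink subtasks {sorted(subtasks)}")
```

In this toy setup the per-partition maxima already match the global result, so the data locality exists; the second hash simply does not know about it and spreads each partition's keys over several subtasks, which is the network shuffle the reply describes.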