Re: How to partition within same physical node in Flink

2018-07-04 Thread Ashish Pokharel
Thanks - I will wait for Stefan’s comments before I start digging in. > On Jul 4, 2018, at 4:24 AM, Fabian Hueske wrote: > > Hi Ashish, > > I think we don't want to make it an official public API (at least not at this > point), but maybe you can dig into the internal API and leverage it for yo

Re: How to partition within same physical node in Flink

2018-07-04 Thread Fabian Hueske
Hi Ashish, I think we don't want to make it an official public API (at least not at this point), but maybe you can dig into the internal API and leverage it for your use case. I'm not 100% sure about all the implications, that's why I pulled in Stefan in this thread. Best, Fabian 2018-07-02 15:3

Re: How to partition within same physical node in Flink

2018-07-02 Thread ashish pok
Thanks Fabian! It sounds like KeyGroup will do the trick if that can be made publicly accessible. On Monday, July 2, 2018, 5:43:33 AM EDT, Fabian Hueske wrote: Hi Ashish, hi Vijay, Flink does not distinguish between different parts of a key (parent, child) in the public APIs. However,

Re: How to partition within same physical node in Flink

2018-07-02 Thread Fabian Hueske
Hi Ashish, hi Vijay, Flink does not distinguish between different parts of a key (parent, child) in the public APIs. However, there is an internal concept of KeyGroups which is similar to what you call physical partitioning. A KeyGroup is a group of keys that are always processed on the same physi
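
For readers following the KeyGroup idea, here is a rough, self-contained sketch of how a key might map to a key group and then to a parallel subtask. It is not Flink's code: the class `KeyGroupSketch`, its method names, and the `scramble` hash are hypothetical stand-ins, and the values 128 and 4 are arbitrary. The two-step shape (hash the key modulo the maximum parallelism, then map contiguous key-group ranges to subtasks) reflects my understanding of the internals and should be checked against the Flink version in use.

```java
import java.util.Objects;

public class KeyGroupSketch {

    // Hypothetical stand-in for Flink's internal hash; NOT the function Flink uses.
    static int scramble(int h) {
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & 0x7fffffff; // clear the sign bit so the result is non-negative
    }

    // Step 1: every key falls into exactly one key group in [0, maxParallelism).
    static int keyGroupFor(Object key, int maxParallelism) {
        return scramble(Objects.hashCode(key)) % maxParallelism;
    }

    // Step 2: key groups are handed to subtasks as contiguous ranges.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // assumed value for illustration
        int parallelism = 4;      // assumed value for illustration
        for (String cam : new String[] {"cam1", "cam2", "cam3"}) {
            int keyGroup = keyGroupFor(cam, maxParallelism);
            int subtask = subtaskFor(keyGroup, maxParallelism, parallelism);
            System.out.println(cam + " -> key group " + keyGroup + " -> subtask " + subtask);
        }
    }
}
```

All keys that share a key group end up on the same subtask, which is why the concept resembles physical partitioning. Which group a given key lands in is still decided by the hash, not by the user.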

Re: How to partition within same physical node in Flink

2018-06-28 Thread ashish pok
Fabian, All, Along this same line, we have a data source with a parent key and a child key. We need to first keyBy parent and then by child. If we want to have physical partitioning in a way where physical partitioning happens first by parent key and localize grouping by child key, is there

Re: How to partition within same physical node in Flink

2018-06-28 Thread Fabian Hueske
Hi Vijay, Flink does not provide fine-grained control to place keys to certain slots or machines. When specifying a key, it is up to Flink (i.e., its internal hash function) where the data is processed. This works well for large key spaces, but can be difficult if you have only a few keys. So, ev

Re: How to partition within same physical node in Flink

2018-06-26 Thread Vijay Balakrishnan
Hi Fabian, Thanks once again for your reply. I need to get the data from each cam/camera into 1 partition/slot and not move the gigantic video data around as much as I perform other operations on it. For example, I can get seq#1 and seq#2 for cam1 in cam1 partition/slot and then combine, split, parse, st

Re: How to partition within same physical node in Flink

2018-06-26 Thread Fabian Hueske
Hi, keyBy() does not work hierarchically. Each keyBy() overrides the previous partitioning. You can keyBy(cam, seq#) which guarantees that all records with the same (cam, seq#) are processed by the same parallel instance. However, Flink does not give any guarantees about how the (cam, seq#) partit
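
To illustrate the composite-key point, here is a small sketch that runs without Flink. It hashes the pair (cam, seq#) as one key and reduces it to an instance index. The class `CompositeKeySketch`, the method `instanceFor`, and the parallelism of 4 are hypothetical, and the plain modulo is a simplified stand-in for Flink's actual routing. The sketch only shows that the same pair always maps to the same instance, while nothing ties different seq# values of the same cam together.

```java
import java.util.Objects;

public class CompositeKeySketch {

    // Simplified stand-in for hash partitioning on a composite key.
    // Flink's real routing uses a different hash and goes through key groups.
    static int instanceFor(String cam, long seq, int parallelism) {
        return Math.floorMod(Objects.hash(cam, seq), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4; // assumed value for illustration
        for (long seq = 1; seq <= 3; seq++) {
            // The same (cam, seq) pair always yields the same index, but
            // consecutive seq values of cam1 are free to land anywhere.
            System.out.println("(cam1, " + seq + ") -> instance "
                    + instanceFor("cam1", seq, parallelism));
        }
    }
}
```

Keying by cam alone would keep all records of one camera on one instance, at the cost of limiting useful parallelism to the number of cameras.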

Re: How to partition within same physical node in Flink

2018-06-25 Thread Vijay Balakrishnan
I see a .slotSharingGroup for SingleOutputStreamOperator which can put parallel instances of operations in the same TM slot

Re: How to partition within same physical node in Flink

2018-06-25 Thread Vijay Balakrishnan
Thanks, Fabian. Been reading your excellent book on Flink Streaming. Can't wait for more chapters. Attached a pic. [image: partition-by-cam-ts.jpg] I have records with seq# 1 and cam1 and cam2. I also have records with varying seq#'s. By partitioning on cam field first (keyBy(cam)), I can get cam1

Re: How to partition within same physical node in Flink

2018-06-25 Thread Fabian Hueske
Hi, Flink distributes task instances to slots and does not expose physical machines. Records are partitioned to task instances by hash partitioning. It is also not possible to guarantee that the records in two different operators are sent to the same slot. Sharing information by side-passing it (e

How to partition within same physical node in Flink

2018-06-24 Thread Vijay Balakrishnan
Hi, Need to partition by cameraWithCube.getCam() 1st using parallelCamTasks (passed in as args). Then within each partition, need to partition again by cameraWithCube.getTs() but need to make sure each of the 2nd partition by getTs() runs on the same physical node? How do I achieve that? DataS