On 2018/09/13 03:30:28, Ken Krugler <kkrugler_li...@transpac.com> wrote: 
> Hi Bhaskar,
> 
> > On 2018/09/12 20:42:22, Ken Krugler <kkrugler_li...@transpac.com> wrote: 
> >> Hi Bhaskar,
> >> 
> >> I assume you don’t have 1000 streams, but rather one (keyed) stream with 
> >> 1000 different key values, yes?
> >> 
> >> If so, then this one stream is physically partitioned based on the 
> >> parallelism of the operator following the keyBy(), not per unique key.
> >> 
> >> The most common per-key “resource” is the memory required for each key's 
> >> state, if you’ve got any operations that need to maintain state 
> >> (accumulators, windows, etc).
> >> 
> >> For 1000 unique keys, this should be negligible.
> >> 
> >> — Ken
> >> 
> >> 
> >>> On Sep 12, 2018, at 9:55 AM, bhaskar.eba...@gmail.com 
> >>> <mailto:bhaskar.eba...@gmail.com> wrote:
> >>> 
> >>> Hi
> >>> 
> >>> I have created a KeyedStream with state, as explained below.
> >>> For example, I have created 1000 streams, out of which 50% of the streams 
> >>> receive data only once every 8 hours. Will the resources of these 
> >>> underutilized streams sit idle for that duration? Or does Flink's task 
> >>> manager have some strategy to reuse them for other new streams 
> >>> that arrive?
> >>> Regards
> >>> Bhaskar
> >> 
> > Hi Ken
> > As per the documentation: 
> > https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/
> > If we apply keyBy() on a DataStream, then the output is a KeyedStream.
> 
> Correct.
> 
> > Since it is a stream, it should execute in parallel, right?
> 
> It will operate in parallel based on the parallelism of the downstream 
> operations being applied to the KeyedStream.
> 
> The number of unique keys has nothing to do with the number of parallel 
> (simultaneous) operators being used to process the KeyedStream.
> 
> > There will be 1000 streams, each having keyed state.
> 
> “Stream” has a specific meaning in Flink, which I think you’re not using as 
> intended here.
> 
> > What you are saying is that the main overhead here is only memory. Does that 
> > mean these 1000 streams are going to run across 1000 task slots in 
> > parallel? Are these 1000 task slots the main memory overhead? Even if 50% of 
> > them are idle, is there no harm?
> 
> See above - you don’t have 1000 “task slots” and you don’t have 1000 streams.
> 
> You have N operators running at the same time, where N is based on the 
> parallelism that you set (either implicitly, or explicitly) for the 
> operator(s) processing the KeyedStream.
> 
> Note that if you have 1000 unique keys, and you’ve got (for example) a single 
> ValueState per key, then you’d have 1000 states.
> 
> But if you have, say, a sliding window, then the number of states per key can 
> grow significantly, since each key can have multiple states (one per 
> “open window”).
> 
> But also note that using RocksDB to handle state means that not all state has 
> to be in memory at the same time, so you’ve got more room to scale.
> 
> 
> Regards,
> 
> — Ken
> 
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
> 
> 
Thanks Ken for the detailed clarification!

Regards
Bhaskar
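
To illustrate the point made in the thread above (the number of parallel
operator instances comes from the configured parallelism, not from the number
of unique keys), here is a minimal plain-Java sketch. It does not use the Flink
API. The class and method names are hypothetical, the values 1000 keys,
parallelism 4, and max parallelism 128 are assumptions chosen for the example,
and the hashing is a simplification: Flink additionally applies a murmur hash
to the key's hash code before picking a key group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyedPartitioningSketch {

    // Simplified stand-in for key-group assignment: hash the key into one of
    // maxParallelism key groups, then map that key group onto a subtask index.
    // (Assumption: plain hashCode() is used here instead of a murmur hash.)
    public static int subtaskForKey(Object key, int maxParallelism, int parallelism) {
        int keyGroup = Math.floorMod(key.hashCode(), maxParallelism);
        return keyGroup * parallelism / maxParallelism;
    }

    // Routes numKeys distinct keys to their subtasks and keeps one state entry
    // per key (similar in spirit to a single ValueState per key).
    // Returns how many keys each subtask holds state for.
    public static int[] keysPerSubtask(int numKeys, int maxParallelism, int parallelism) {
        List<Map<String, Long>> statePerSubtask = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            statePerSubtask.add(new HashMap<>());
        }
        for (int i = 0; i < numKeys; i++) {
            String key = "key-" + i;
            int subtask = subtaskForKey(key, maxParallelism, parallelism);
            statePerSubtask.get(subtask).put(key, 0L);
        }
        int[] counts = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            counts[i] = statePerSubtask.get(i).size();
        }
        return counts;
    }

    public static void main(String[] args) {
        // 1000 unique keys, but only 4 parallel subtasks do the work.
        int[] counts = keysPerSubtask(1000, 128, 4);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("subtask " + i + " holds state for " + counts[i] + " keys");
        }
    }
}
```

However many keys are idle, the number of subtasks stays at 4 in this sketch;
an idle key costs only its stored state entry.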
