Hi Avi,

The typical approach is the one you've described in #1.  #2 is not
necessary -- keyBy(car => car.id) in #1 already hash-partitions the keys
(and their state) across the parallel instances, which is exactly what #2
is doing by hand.
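
For reference, a rough (untested) sketch of what #1 could look like end to
end: a KeyedProcessFunction with a ValueState flag per car id, plus a side
output so new and already-seen cars end up on different Kafka topics.
CarEvent, consumer, and the two producers below are placeholders for
whatever you already have:

  import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
  import org.apache.flink.streaming.api.functions.KeyedProcessFunction
  import org.apache.flink.streaming.api.scala._
  import org.apache.flink.util.Collector

  case class CarEvent(id: String)                  // placeholder event type

  val newCars = OutputTag[CarEvent]("new-cars")    // side output for unseen ids

  val cars = env
    .addSource(consumer)
    .keyBy(car => car.id)
    .process(new KeyedProcessFunction[String, CarEvent, CarEvent] {
      // one flag per car id; kept in RocksDB when the RocksDB backend is used
      lazy val seen: ValueState[java.lang.Boolean] = getRuntimeContext.getState(
        new ValueStateDescriptor[java.lang.Boolean]("seen", classOf[java.lang.Boolean]))

      override def processElement(
          car: CarEvent,
          ctx: KeyedProcessFunction[String, CarEvent, CarEvent]#Context,
          out: Collector[CarEvent]): Unit = {
        if (seen.value() == null) {      // first time this sensor sees the id
          seen.update(true)
          ctx.output(newCars, car)       // route to the "new" topic
        } else {
          out.collect(car)               // route to the "old" topic
        }
      }
    })

  cars.getSideOutput(newCars).addSink(newCarsProducer)  // e.g. FlinkKafkaProducer
  cars.addSink(oldCarsProducer)

With RocksDB the per-key flags live on disk rather than on the heap, so
billions of ids are mostly a question of disk size.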

-Jamie


On Wed, Nov 21, 2018 at 3:36 AM Avi Levi <avi.l...@bluevoyant.com> wrote:

> Hi,
> I am very new to Flink, so please be gentle :)
>
> *The challenge:*
> I have a road sensor that should scan billions of cars per day. For
> starters, I want to recognise whether each car that passes by is new or
> not. New cars (never seen before by that sensor) will be placed on a
> different Kafka topic than the rest (two topics in total: new and old),
> under the assumption that the state will contain billions of unique car
> ids.
>
> *Suggested Solutions:*
> My question is which of the two approaches below is better.
> Both approaches use RocksDB as the state backend.
>
> 1. Use ValueState and split the stream like this:
>
>   val domainsSrc = env
>     .addSource(consumer)
>     .keyBy(car => car.id)
>     .map(...)
>
> and check whether the state value is null to recognise new cars; if the
> car is new, I update the state.
> How will the persisted state be sharded among the nodes in the cluster
> (let's say I have 10 nodes)?
>
> 2. Use MapState and partition the stream into groups by some arbitrary
> factor, e.g.:
>
>   val domainsSrc = env
>     .addSource(consumer)
>     .keyBy { car =>
>       val h = car.id.hashCode % partitionFactor
>       math.abs(h)
>     }
>     .map(...)
>
> and check mapState.contains(car.id); if the id is not there, add it to
> the state.
>
> Which approach is better?
>
> Thanks in advance
> Avi
>
