Hi ,
I am very new to flink so please be gentle :)

*The challenge:*
I have a road sensor that should scan billons of cars per day. for starter
I want to recognise if each car that passes by is new or not. new cars
(never been seen before by that sensor ) will be placed on a different
topic on kafka than the other (total of two topics for new and old) .
 under the assumption that the state will contain billions of unique car
ids.

*Suggested Solutions*
My question is it which approach is better.
Both approaches using RocksDB

1. use the ValueState and to split the steam like
  *val domainsSrc = env*
*    .addSource(consumer)*
*    .keyBy(car => car.id <http://car.id>)*
*    .map(...)*
and checking if the state value is null to recognise new cars. if new than
I will update the state
how will the persistent data will be shard among the nodes in the cluster
(let's say that I have 10 nodes) ?

2. use MapState and to partition the stream to groups by some arbitrary
factor e.g
*val domainsSrc = env*
*    .addSource(consumer)*
*    .keyBy{ car =>*
*        val h car.id.hashCode % partitionFactor*
*        math.abs(h)*
*    } .map(...)*
and to check *mapState.keys.contains(car.id <http://car.id>) *if not - add
it to the state

which approach is better ?

Thanks in advance
Avi

Reply via email to