Hi,

I am very new to Flink, so please be gentle :)

*The challenge:*
I have a road sensor that should scan billions of cars per day. For starters, I want to recognise whether each car that passes by is new or not. New cars (never seen before by that sensor) will be placed on a different Kafka topic than the others (two topics in total, one for new cars and one for old ones), under the assumption that the state will contain billions of unique car IDs.

*Suggested solutions*
My question is which approach is better. Both approaches use RocksDB.

1. Use ValueState and split the stream like this:

    val domainsSrc = env
      .addSource(consumer)
      .keyBy(car => car.id)
      .map(...)

   and check whether the state value is null to recognise new cars. If the car is new, I will update the state. How will the persistent data be sharded among the nodes in the cluster (let's say that I have 10 nodes)?

2. Use MapState and partition the stream into groups by some arbitrary factor, e.g.

    val domainsSrc = env
      .addSource(consumer)
      .keyBy { car =>
        val h = car.id.hashCode % partitionFactor
        math.abs(h)
      }
      .map(...)

   and check mapState.keys.contains(car.id); if the ID is not there, add it to the state.

Which approach is better?

Thanks in advance,
Avi
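
P.S. To make the comparison concrete, here is a rough plain-Scala sketch of the two checks with no Flink APIs involved. The Car case class, the helper names and the in-memory sets are my own stand-ins for the real ValueState and MapState, so please treat it as an illustration of the logic only, not as the actual job.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the sensor event; only `id` comes from the question.
final case class Car(id: String)

object NewCarDetection {
  // Approach 1: one entry per car id, modelling a ValueState keyed by car.id.
  // Returns true when the id has not been seen before, and records it.
  def isNewPerKey(seen: mutable.Set[String], car: Car): Boolean =
    seen.add(car.id)

  // Approach 2: the key selector from the question, mapping a car to a bucket.
  // The modulo result is smaller in magnitude than partitionFactor, so abs is safe.
  def bucketOf(car: Car, partitionFactor: Int): Int =
    math.abs(car.id.hashCode % partitionFactor)

  // Approach 2: one set of ids per bucket, modelling a MapState keyed by bucket.
  def isNewPerBucket(
      buckets: mutable.Map[Int, mutable.Set[String]],
      car: Car,
      partitionFactor: Int
  ): Boolean = {
    val ids = buckets.getOrElseUpdate(
      bucketOf(car, partitionFactor),
      mutable.Set.empty[String]
    )
    ids.add(car.id)
  }

  def main(args: Array[String]): Unit = {
    val cars = Seq(Car("A1"), Car("B2"), Car("A1"), Car("C3"), Car("B2"))
    val seen = mutable.Set.empty[String]
    val buckets = mutable.Map.empty[Int, mutable.Set[String]]
    cars.foreach { car =>
      val perKey = isNewPerKey(seen, car)
      val perBucket = isNewPerBucket(buckets, car, partitionFactor = 4)
      println(s"${car.id}: perKey=$perKey perBucket=$perBucket")
    }
  }
}
```

In this model both variants store every unique car ID exactly once and return the same new/old verdict for each car; they differ only in how the IDs are grouped under keys.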