Hello, One of the applications Spire [1] is using Flink for is to process AIS [2] data collected by our satellites and from other sources. AIS is transmitting a ships' static and dynamic information, such as names, callsigns or positions. One of the challenges processing AIS data is that there are no unique keys, since the mmsi or imo can be spoofed or is sometimes shared between vessels.
To deal with multiple vessels per mmsi we use a Keyed Process Function that keeps state per detected vessel, data about the vessel is stored in the state of the function and is hard to transfer out of the batch processing. Batch processing really helps to collect data about a vessel and is therefore necessary for us before we can switch to stream mode. Since the state and the outputs are not the same the reconstruction of the state for stream mode can't be achieved by feeding the outputs into the pipeline via some source. Therefore we need code in our batch job just to deal with extracting the state. A vessel is usually outputted for each update that is received for it, but outputting it together with it's entire state is not desirable for performance reasons in batch mode. Also some vessels should never be outputted but need to be restored. The pipeline has a couple of stateful functions and the more we add the harder it gets to restore the state. Best, Jörn