Hello,

One of the applications Spire [1] is using Flink for is to process AIS [2]
data collected by our satellites and from other sources. AIS is
transmitting a ships' static and dynamic information, such as names,
callsigns or positions. One of the challenges processing AIS data is that
there are no unique keys, since the mmsi or imo can be spoofed or is
sometimes shared between vessels.

To deal with multiple vessels per mmsi we use a Keyed Process Function that
keeps state per detected vessel, data about the vessel is stored in the
state of the function and is hard to transfer out of the batch processing.
Batch processing really helps to collect data about a vessel and is
therefore necessary for us before we can switch to stream mode.
Since the state and the outputs are not the same the reconstruction of the
state for stream mode can't be achieved by feeding the outputs into the
pipeline via some source. Therefore we need code in our batch job just to
deal with extracting the state.

A vessel is usually outputted for each update that is received for it, but
outputting it together with it's entire state is not desirable for
performance reasons in batch mode. Also some vessels should never be
outputted but need to be restored.

The pipeline has a couple of stateful functions and the more we add the
harder it gets to restore the state.

Best,
Jörn

Reply via email to