Hey,


I’m a new user to Flink and I’m trying to figure out if I can build a
pipeline I’m working on using Flink.

I have a data source that emits a continuous data stream at anywhere
between 45 MB/s and 600 MB/s (yes, that’s megabytes per second, not
megabits, and NOT a series of individual messages but an actual continuous
stream of data, where some data may depend on previous or future data to
be fully deciphered).

I need to pass the data through several processing stages (which
manipulate the data but still produce output of the same order of
magnitude at each stage), and the processing needs to be low-latency.

The data itself CAN be segmented, but the segments will be HUGE (~100 MB
– 250 MB), and I would like to stream data in and out of the processors
ASAP instead of waiting for full segments to complete at each stage (so
bytes flow in and out as soon as they are available).



The obvious solution would be to split the data into very small buffers,
but since each segment must be sent in its entirety to the same processor
node (and not split across several nodes), such micro-batching seems like
a bad idea: it would spread a single segment’s buffers across multiple
nodes.
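To make the constraint concrete, here is a minimal sketch (plain Python, outside Flink, with hypothetical names) of what I mean: each segment is chopped into small buffers tagged with a segment ID, and a deterministic hash partitioner (the same idea as Flink's keyBy) must route every buffer of one segment to the same worker.

```python
import hashlib

def chunk_segment(segment_id: str, data: bytes, chunk_size: int = 64 * 1024):
    """Split a huge segment into small buffers, each tagged with its segment ID."""
    for offset in range(0, len(data), chunk_size):
        yield (segment_id, offset, data[offset:offset + chunk_size])

def route(segment_id: str, num_workers: int) -> int:
    """Deterministically map a segment ID to one worker (keyBy-style hashing)."""
    digest = hashlib.sha1(segment_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_workers

# The invariant I need: all buffers of one segment land on exactly one worker,
# while buffers can still be forwarded as soon as each one is produced.
chunks = list(chunk_segment("seg-42", b"x" * 200_000))
workers = {route(seg_id, 8) for seg_id, _, _ in chunks}
assert len(workers) == 1
```

The question is whether Flink can give me this keyed-but-streaming behavior without buffering a full 100–250 MB segment at any stage.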



Is there any way to accomplish this with Flink? Or is Flink the wrong
platform for that type of processing?



Any help would be greatly appreciated!



Thanks,



Tal
