Hi Tal, that sounds like an interesting use case. I think I need a bit more detail about your use case to see how it can be done with Flink. You said you need low latency: what latency is acceptable for you?
Also, I was wondering how you are going to feed the input data into Flink. If the data is coming from multiple sources, maybe everything can be done completely in parallel. Do you need any fault-tolerance guarantees?

You can use Flink's DataStream abstraction with different data types, and you could create a DataStream<Byte>. Flink would internally still send multiple of those records in one buffer. But I think the more efficient approach is, as you suggested, to use a DataStream<byte[]> of larger chunks. What kind of transformations are you planning to do on the stream?

Regarding the amount of data we are talking about here: Flink is certainly able to handle those loads. I recently did some tests with our KafkaConsumer, and I was able to read 390 megabytes/second on my laptop, using a parallelism of one (so only one reading thread). My SSD has a read rate of 530 MB/s. With sufficiently fast hardware, a few Flink TaskManagers will be able to read 600 MB/s.

On Wed, Jan 20, 2016 at 1:39 PM, Ritesh Kumar Singh <riteshoneinamill...@gmail.com> wrote:

> I think with sufficient processing power Flink can do the above-mentioned
> task using the stream API
> <https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html>.
>
> Thanks,
> *Ritesh Kumar Singh,*
> *https://riteshtoday.wordpress.com/* <https://riteshtoday.wordpress.com/>
>
> On Wed, Jan 20, 2016 at 11:18 AM, Tal Maoz <magogo...@gmail.com> wrote:
>
>> Hey,
>>
>> I'm a new user to Flink and I'm trying to figure out if I can build a
>> pipeline I'm working on using Flink.
>>
>> I have a data source that sends out a continuous data stream at a
>> bandwidth of anywhere between 45MB/s and 600MB/s (yes, that's MiB/s, not
>> Mib/s, and NOT a series of individual messages but an actual continuous
>> stream of data, where some data may depend on previous or future data to
>> be fully deciphered).
>>
>> I need to be able to pass the data through several processing stages
>> (that manipulate the data but still produce the same order of magnitude
>> of output at each stage), and I need the processing to be done with low
>> latency.
>>
>> The data itself CAN be segmented, but the segments will be HUGE
>> (~100MB – 250MB), and I would like to be able to stream data in and out
>> of the processors ASAP instead of waiting for full segments to be
>> completed at each stage (so bytes will flow in/out as soon as they are
>> available).
>>
>> The obvious solution would be to split the data into very small buffers,
>> but since each segment would have to be sent completely to the same
>> processor node (and not split between several nodes), doing such
>> micro-batching would be a bad idea, as it would spread a single
>> segment's buffers between multiple nodes.
>>
>> Is there any way to accomplish this with Flink? Or is Flink the wrong
>> platform for that type of processing?
>>
>> Any help would be greatly appreciated!
>>
>> Thanks,
>>
>> Tal
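P.S. To make the DataStream<byte[]> suggestion above concrete: the idea is to split each large segment into fixed-size byte[] chunks and emit those as records, rather than one record per byte. Here is a minimal sketch of that chunking step in plain Java, independent of Flink's API; the class and method names (Chunker, splitIntoChunks) are illustrative, not part of Flink:

```java
import java.util.ArrayList;
import java.util.List;

public class Chunker {

    // Split a segment into fixed-size chunks; the last chunk holds the
    // remainder. Each byte[] would become one record of a DataStream<byte[]>.
    static List<byte[]> splitIntoChunks(byte[] segment, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < segment.length; off += chunkSize) {
            int len = Math.min(chunkSize, segment.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(segment, off, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] segment = new byte[10];
        List<byte[]> chunks = splitIntoChunks(segment, 4);
        System.out.println(chunks.size());        // 3 chunks: 4 + 4 + 2 bytes
        System.out.println(chunks.get(2).length); // last chunk has 2 bytes
    }
}
```

To keep all chunks of one segment on the same downstream node (your concern about segments being split between nodes), each chunk would additionally carry a segment id, and the stream would be partitioned by that id (in Flink, via keyBy) so that all records with the same id are routed to the same parallel task.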