I meant to respond to this thread yesterday, but I got busy with work and
it slipped my mind.

This is possibly doable using Flink Streaming; others can correct me here.

*Assumption:* Both the batch and streaming processes read from a single
Kafka topic, and by "batched data" I assume you mean the same data that's
being fed to streaming, but aggregated over a longer time period.

This could be done using a Lambda-like architecture:

1. A Kafka topic that's ingesting data to be distributed to various
consumers.
2. A Flink Streaming process with a small time window (minutes/seconds)
that's ingesting from Kafka and handles data over this small window.
3. Another Flink Streaming process with a very long time window (a few
hours?) that's also ingesting from Kafka and is munging over large time
periods of data (think mini-batch that extends streaming).
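
To make the three steps above concrete, here is a minimal sketch in plain
Python rather than the Flink API. The in-memory list stands in for the Kafka
topic, and the function name, the (timestamp, key) event shape, and the
window sizes are all assumptions made for illustration only.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns {window_start_seconds: {key: count}}.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Stand-in for the single Kafka topic: (timestamp in seconds, key).
topic = [(5, "a"), (42, "b"), (61, "a"), (3599, "a"), (3600, "b"), (7300, "a")]

# Consumer 1: short window (1 minute), the "streaming" view.
short_view = tumbling_window_counts(topic, 60)

# Consumer 2: long window (2 hours), the "mini-batch" view over the same data.
long_view = tumbling_window_counts(topic, 2 * 60 * 60)

print(short_view)
print(long_view)
```

Both consumers read the same events; only the window size differs, which is
the whole point of the setup described above.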

This should work, and you don't need a separate batch process. A similar
architecture using Spark Streaming (for both batch and streaming) is
demonstrated by Cloudera's Oryx 2.0 project; see http://oryx.io


On Thu, Jul 21, 2016 at 12:41 PM, milind parikh <milindspar...@gmail.com>
wrote:

> At this point in time, imo, batch processing is not why you should be
> considering Flink.
>
> That said, I predict that the stream processing (and event processing)
> will become the dominant methodology; as we begin to gravitate towards  "I
> can't wait; I want it now" phenomenon. In that methodology,  I believe
> Flink represents the cutting edge of what is possible; at this point in
> time.
>
> Regards
> Milind
>
> On Jul 20, 2016 4:57 PM, "Leith Mudge" <lei...@palamir.com> wrote:
>
> Thanks Milind & Till,
>
>
>
> This is what I thought from my reading of the documentation but it is nice
> to have it confirmed by people more knowledgeable.
>
>
>
> Supplementary to this question is whether Flink is the best choice for
> batch processing at this point in time, or whether I would be better off
> looking at a more mature and dedicated batch processing engine such as
> Spark. I do like the choices offered by adopting the unified programming
> model outlined in the Apache Beam/Google Cloud Dataflow SDK, which purports
> to have runners for both Flink and Spark.
>
>
>
> Regards,
>
>
>
> Leith
>
> *From: *Till Rohrmann <trohrm...@apache.org>
> *Date: *Wednesday, 20 July 2016 at 5:05 PM
> *To: *<user@flink.apache.org>
> *Subject: *Re: Using Kafka and Flink for batch processing of a batch data
> source
>
>
>
> At the moment there is also no batch source for Kafka. I'm also not so
> sure how you would define a batch given a Kafka stream. Only reading till a
> certain offset? Or maybe until one has read n messages?
>
>
>
> I think it's best to write the batch data to HDFS or another batch data
> store.
>
>
>
> Cheers,
>
> Till
>
>
>
> On Wed, Jul 20, 2016 at 8:08 AM, milind parikh <milindspar...@gmail.com>
> wrote:
>
> It likely does not make sense to publish a file ( "batch data") into
> Kafka; unless the file is very small.
>
> An improvised pub-sub mechanism for Kafka could be to (a) write the file
> into a persistent store outside of Kafka and (b) publish a message into
> Kafka about that write so as to enable processing of that file.
>
> If you really needed to have provenance around processing, you could route
> data processing through Nifi before Flink.
>
> Regards
> Milind
>
>
>
> On Jul 19, 2016 9:37 PM, "Leith Mudge" <lei...@palamir.com> wrote:
>
> I am currently working on an architecture for a big data streaming and
> batch processing platform. I am planning on using Apache Kafka for a
> distributed messaging system to handle data from streaming data sources and
> then pass on to Apache Flink for stream processing. I would also like to
> use Flink's batch processing capabilities to process batch data.
>
> Does it make sense to pass the batched data through Kafka on a periodic
> basis as a source for Flink batch processing (is this even possible?) or
> should I just write the batch data to a data store and then process by
> reading into Flink?
>
>
> ------------------------------
>
>
> | All rights in this email and any attached documents or files are
> expressly reserved. This e-mail, and any files transmitted with it,
> contains confidential information which may be subject to legal privilege.
> If you are not the intended recipient, please delete it and notify Palamir
> Pty Ltd by e-mail. Palamir Pty Ltd does not warrant this transmission or
> attachments are free from viruses or similar malicious code and does not
> accept liability for any consequences to the recipient caused by opening or
> using this e-mail. For the legal protection of our business, any email sent
> or received by us may be monitored or intercepted. | Please consider the
> environment before printing this email. |
>
>
>
