It likely does not make sense to publish a file ("batch data") into Kafka unless the file is very small.
An improvised pub-sub mechanism for Kafka could be to (a) write the file into a persistent store outside of Kafka and (b) publish a message into Kafka about that write, so as to enable processing of that file. If you really need provenance around the processing, you could route the data through NiFi before Flink. (Rough sketches of both halves of this pattern follow the quoted message below.)

Regards,
Milind

On Jul 19, 2016 9:37 PM, "Leith Mudge" <lei...@palamir.com> wrote:

> I am currently working on an architecture for a big data streaming and
> batch processing platform. I am planning on using Apache Kafka as a
> distributed messaging system to handle data from streaming data sources,
> which is then passed on to Apache Flink for stream processing. I would
> also like to use Flink's batch processing capabilities to process batch
> data.
>
> Does it make sense to pass the batched data through Kafka on a periodic
> basis as a source for Flink batch processing (is this even possible?), or
> should I just write the batch data to a data store and then process it by
> reading it into Flink?
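For concreteness, step (b) above might look something like the producer-side sketch below. This is a minimal illustration, not anything from the thread itself: the topic name "file-events", the broker address, and the HDFS path are all assumptions.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FileEventNotifier {
    public static void main(String[] args) {
        // Producer configuration; the broker address is an assumption.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // (a) The file itself lives in a persistent store outside Kafka;
        // this HDFS path is a hypothetical example.
        String filePath = "hdfs://namenode:8020/data/batch/2016-07-19/export.csv";

        // (b) Publish a small pointer message into Kafka so a downstream
        // consumer (e.g. a Flink job) knows the file is ready to process.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("file-events", filePath));
        }
    }
}

Publishing only the file's location keeps the Kafka message tiny while the bulk data stays in the external store.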
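The other half of the pattern, reading the batch data into Flink straight from the store rather than through Kafka, might then look like the following DataSet sketch; again, the path is a hypothetical assumption.

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchFileJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Read the batch file directly from the persistent store; Kafka only
        // carried the notification, not the data itself.
        DataSet<String> lines = env.readTextFile(
                "hdfs://namenode:8020/data/batch/2016-07-19/export.csv");

        // Placeholder transformation; print() also triggers execution.
        lines.first(10).print();
    }
}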