Re: Kafka batches

2016-08-03 Thread Prabhu V
Thanks for the reply. Yes, I mean a manual commit in #3, because in that case the offsets would accurately reflect the number of messages processed. My understanding is that the current checkpointing process commits the state of all the operators separately; the Kafka connector will commi

Re: Kafka batches

2016-08-03 Thread Stephan Ewen
There is a pull request for Kerberos keytab-based authentication. That way, streaming jobs can run longer than 7 days. https://github.com/apache/flink/pull/2275

On Wed, Aug 3, 2016 at 3:19 PM, Ufuk Celebi wrote:
> On Wed, Aug 3, 2016 at 2:07 PM, Prabhu V wrote:
> > Observations with Streaming

Re: Kafka batches

2016-08-03 Thread Ufuk Celebi
On Wed, Aug 3, 2016 at 2:07 PM, Prabhu V wrote:
> Observations with Streaming.
>
> 1) Long-running Kerberos fails in 7 days (the data that is held in the
> window buffer is lost and restart results in event loss)

This is a known issue, I think. Looping in Max, who knows the details.

> 2)

Kafka batches

2016-08-03 Thread Prabhu V
I understand Flink does streaming, but I feel my requirement is more batch-oriented: read from a Kafka cluster, do a little data massaging, and bucket data into Hadoop files that are at least one HDFS block in size. Our environment is YARN and kerberized (Kafka and Hadoop; I am currently allowed to pass the