Thanks for the reply.
Yeah, I mean a manual commit in #3, because in that case the offsets
would accurately reflect the number of messages processed.
My understanding is that the current checkpointing process commits the
state of all the operators separately, and the Kafka connector will commit
the offsets as part of this process.
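
For reference, a minimal sketch of what I understand that behavior to be,
assuming a Flink 1.1-era API with the 0.9 Kafka connector (the topic,
broker address, and group id are placeholders):

import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class CheckpointedKafkaRead {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // With checkpointing enabled, the connector snapshots its offsets as
        // part of each checkpoint and commits them back to the broker only
        // after the checkpoint completes, so the committed offsets track
        // what has actually been processed.
        env.enableCheckpointing(60000); // checkpoint every 60 seconds

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "my-group");             // placeholder

        env.addSource(new FlinkKafkaConsumer09<String>(
                "my-topic", new SimpleStringSchema(), props))
           .print();

        env.execute("checkpointed kafka read");
    }
}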
There is a pull request for Kerberos keytab-based authentication. That way,
streaming jobs can run longer than 7 days.
https://github.com/apache/flink/pull/2275
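
Once that lands, the user-facing change should just be configuration,
something like the following flink-conf.yaml entries. The exact key names
are an assumption here and may differ by release, so check the docs for
your version:

# flink-conf.yaml (sketch; key names are an assumption, verify per release)
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.COM

With a keytab, the cluster can re-obtain tickets itself instead of relying
on a ticket cache or delegation token that hits its maximum renewal
lifetime (often 7 days).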
On Wed, Aug 3, 2016 at 3:19 PM, Ufuk Celebi wrote:
> On Wed, Aug 3, 2016 at 2:07 PM, Prabhu V wrote:
> > Observations with Streaming
On Wed, Aug 3, 2016 at 2:07 PM, Prabhu V wrote:
> Observations with Streaming.
>
> 1) Long-running jobs with Kerberos fail after 7 days (the data that is held
> in the window buffer is lost, and a restart results in event loss)
This is a known issue, I think. Looping in Max, who knows the details.
> 2)
I understand Flink does streaming, but I feel my requirement is more
batch-oriented:
Read from a Kafka cluster,
Do a little data massaging,
Bucket the data into Hadoop files that are at least one HDFS block in size
(rough sketch below).
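
Roughly what I have in mind (a sketch assuming the Flink 1.1-era RollingSink
from flink-connector-filesystem; the topic, broker, HDFS path, 128 MB block
size, and the trim() massaging step are all placeholders):

import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.RollingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaToHdfsBuckets {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // The sink finalizes in-progress files on checkpoints, so
        // checkpointing should be enabled for exactly-once output.
        env.enableCheckpointing(60000);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "broker:9092"); // placeholder
        props.setProperty("group.id", "kafka-to-hdfs");        // placeholder

        DataStream<String> raw = env.addSource(
                new FlinkKafkaConsumer09<String>(
                        "my-topic", new SimpleStringSchema(), props));

        // Stand-in for the "little data massaging" step.
        DataStream<String> massaged = raw.map(new MapFunction<String, String>() {
            @Override
            public String map(String value) {
                return value.trim();
            }
        });

        // Roll to a new part file once the current one reaches one HDFS
        // block; 128 MB is assumed here, adjust to your block size.
        RollingSink<String> sink = new RollingSink<String>("hdfs:///data/buckets");
        sink.setBatchSize(128L * 1024 * 1024);

        massaged.addSink(sink);
        env.execute("kafka to hdfs buckets");
    }
}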
Our environment is YARN and Kerberized (Kafka and Hadoop); I am currently
allowed to pass the