Failover

2019-01-20 Thread Amin Sadeghi
Hi,
Please help me with Kafka failover:
https://stackoverflow.com/questions/54274799/kafka-broker-failover-not-worked


Re: Failover

2019-01-20 Thread M. Manna
I’m simply giving my comments, but others please correct me if this is not
accurate.

For Scenario 1 - your leader, node0, is offline, and your quorum is now not
stable. If a leader goes offline it is not certain who will take over. That
is why 3 is recommended as a minimum, although it is not the best number to
deploy in production.

For Scenario 2 - your leader, node0, is online. So even if node 1 or 2 is
stopped, the consumers can still get messages, because consumers read from
leaders, not followers.

So this is not unexpected.
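For context, the kind of setup discussed here (3 brokers, with a topic
replicated across all of them) would typically be created along the following
lines; the topic name, partition count and ZooKeeper address are placeholders,
not taken from this thread:

    bin/kafka-topics.sh --zookeeper localhost:2181 --create \
        --topic my-topic --partitions 3 --replication-factor 3

With a replication factor of 3, each partition has a leader plus two follower
replicas, so one of the followers can be elected leader if the broker hosting
the current leader goes down.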

Thanks,


Re: Failover

2019-01-20 Thread Amin Sadeghi
Thanks for the answers.
But if a leader goes offline, when does a follower become the leader? In
other words, when will the leadership of its partitions change?



Re: Failover

2019-01-20 Thread suresh sargar
ZooKeeper checks the heartbeat of each node. As soon as ZooKeeper can no
longer get a heartbeat from the leader, re-election happens. The heartbeat
check interval is configurable. If you run kafka-topics.sh with the
--describe option, you will see the newly elected leader, and the ISR will
also be updated.
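As an illustration (the topic name and ZooKeeper address below are
placeholders, not from this thread), the describe call and a typical line of
its output look roughly like this:

    bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic my-topic

    Topic: my-topic  Partition: 0  Leader: 1  Replicas: 0,1,2  Isr: 1,2

After a leader failure and re-election, the Leader column should point at one
of the surviving brokers, and the Isr list shrinks to the replicas that are
still in sync.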



Re: Failover

2019-01-20 Thread Amin Sadeghi
OK, but after re-election happens, the consumers do not consume messages.



Re: Failover

2019-01-20 Thread M. Manna
I think you’re slightly confused about what’s going on here.

Your quorum is broken - when it drops to 2 nodes, how does it make sense to
elect 1 broker and 1 follower? And why is it that node 1 is the broker, not
the follower?

Your issue is the quorum setup. Your setup can only sustain 1 failure, and
you have already hit that with node0 going down. The heartbeat isn’t the
issue here.

Thanks,



Re: Failover

2019-01-20 Thread M. Manna
Could you show us the output of kafka-topics.sh --describe for the topic in
question?





Re: Why do the offsets of the consumer-group (app-id) of my Kafka Streams Application get reset after application restart?

2019-01-20 Thread Matthias J. Sax
It seems this question was cross-posted on SO:
https://stackoverflow.com/questions/54145281/why-do-the-offsets-of-the-consumer-group-app-id-of-my-kafka-streams-applicatio


On 1/14/19 8:49 AM, Jonathan Santilli wrote:
> Hello Bill, thanks a lot for the reply,
> I will implement your recommendation about the
> *KafkaStreams#setGlobalStateRestoreListener.*
> 
> About your question:
> 
> *When you say you have used both "exactly once" and "at least once" - for the*
> *"at least once" case, did you run for a while in that mode and then restart?*
> 
> *Yes, I have done that, among other combinations, but the behaviour is the same.*
> 
> This is what I see in the logs after restart:
> 
> INFO  [*APP-ID*-51df00e9-8b2e-42e5-8d62-6fbf506035d2-StreamThread-3]
> internals.StoreChangelogReader (StoreChangelogReader.java:215) -
> stream-thread [*APP-ID*-51df00e9-8b2e-42e5-8d62-6fbf506035d2-StreamThread-3]
> No checkpoint found for task 1_8 state store
> KTABLE-SUPPRESS-STATE-STORE-11 changelog
> *APP-ID-KTABLE-SUPPRESS-STATE-STORE-11-changelog-8* with EOS turned
> on. Reinitializing the task and restore its state from the beginning.
> 
> INFO  [*APP-ID*-51df00e9-8b2e-42e5-8d62-6fbf506035d2-StreamThread-3]
> internals.Fetcher (Fetcher.java:583) - [Consumer
> clientId=*APP-ID*-51df00e9-8b2e-42e5-8d62-6fbf506035d2-StreamThread-3-restore-consumer,
> groupId=] Resetting offset for partition
> *APP-ID-KTABLE-SUPPRESS-STATE-STORE-11-changelog-8
> to offset 0*.
> 
> 
> Before I restart, I always check the LAG for the consumer group (*APP-ID*)
> reading from the output topic 'outPutTopicNameOfGroupedData' to verify it is
> 1. Immediately after the restart, and after seeing the logs above, the LAG
> for that consumer group (*APP-ID*) on the output topic
> 'outPutTopicNameOfGroupedData' goes up so much that the App reading from the
> 'outPutTopicNameOfGroupedData' topic is re-processing the data again.
> 
> I hope someone can give me some clue; I would really appreciate it.
> 
> 
> Cheers!
> --
> Jonathan
> 
> 
> On Mon, Jan 14, 2019 at 4:12 PM Bill Bejeck  wrote:
> 
>> Hi Jonathan,
>>
>> With EOS enabled, Kafka Streams does not use checkpoint files for restoring
>> state stores; it will replay the data contained in the changelog topic.
>> But this should not affect where the input source topic(s) are consumed from
>> after a restart; the changelog topics are only consumed from during a restore
>> (or for keeping standby tasks up to date).
>>
>> When you say you have used both "exactly once" and "at least once" - for the
>> "at least once" case, did you run for a while in that mode and then restart? You
>> can confirm how much data, and from which offset, Streams is restoring a
>> state store by using a custom implementation of the StateRestoreListener
>> interface and setting it via KafkaStreams#setGlobalStateRestoreListener.
>>
>> -Bill
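As an illustration of the listener Bill mentions, here is a minimal sketch;
the class name and the log messages are made up for this example, and the
commented-out wiring assumes the usual builder/props setup rather than
anything from this thread:

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.StateRestoreListener;

// Sketch: logs how much data each state store restores, and from which offsets.
public class LoggingRestoreListener implements StateRestoreListener {

    @Override
    public void onRestoreStart(TopicPartition partition, String storeName,
                               long startingOffset, long endingOffset) {
        System.out.printf("Restore start for %s %s: offsets %d to %d%n",
                storeName, partition, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition partition, String storeName,
                                long batchEndOffset, long numRestored) {
        System.out.printf("Restored %d records for %s %s, up to offset %d%n",
                numRestored, storeName, partition, batchEndOffset);
    }

    @Override
    public void onRestoreEnd(TopicPartition partition, String storeName,
                             long totalRestored) {
        System.out.printf("Restore finished for %s %s: %d records%n",
                storeName, partition, totalRestored);
    }
}

// Wiring (sketch): the listener must be registered before start().
//   KafkaStreams streams = new KafkaStreams(builder.build(), props);
//   streams.setGlobalStateRestoreListener(new LoggingRestoreListener());
//   streams.start();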
>>
>>
>> On Mon, Jan 14, 2019 at 7:32 AM Jonathan Santilli <
>> jonathansanti...@gmail.com> wrote:
>>
>>> I have a Kafka Streams application for which, whenever I restart it, the
>>> offsets for the topic partitions (*KTABLE-SUPPRESS-STATE-STORE*) it is
>>> consuming get reset to 0. Hence, for all partitions, the lags increase
>> and
>>> the app needs to reprocess all the data.
>>>
>>> I have ensured the lag is 1 for every partition before the restart. All
>>> consumers that belong to that consumer-group-id (app-id) are active. The
>>> restart is immediate, it takes around 30 secs.
>>>
>>> The app is using exactly-once as the processing guarantee.
>>>
>>> I have read this answer: How does an offset expire for an Apache Kafka
>>> consumer group?
>>> <https://stackoverflow.com/questions/39131465/how-does-an-offset-expire-for-an-apache-kafka-consumer-group>
>>>
>>> I have tried with *auto.offset.reset = latest* and *auto.offset.reset =
>>> earliest*.
>>>
>>> I assume that after the restart the app should pick-up from the latest
>>> committed offset for that consumer group.
>>>
>>> Is it possible to know why the offsets are getting reset to 0?
>>>
>>> I would really appreciate any clue about this.
>>>
>>> This is the code the App execute:
>>>
>>> final StreamsBuilder builder = new StreamsBuilder();
>>> final KStream<..., ...> events = builder
>>> .stream(inputTopicNames, Consumed.with(..., ...)
>>> .withTimestampExtractor(...));
>>>
>>> events
>>> .filter((k, v) -> ...)
>>> .flatMapValues(v -> ...)
>>> .flatMapValues(v -> ...)
>>> .selectKey((k, v) -> v)
>>> .groupByKey(Grouped.with(..., ...))
>>> .windowedBy(
>>> TimeWindows.of(Duration.ofSeconds(windowSizeInSecs))
>>> .advanceBy(Duration.ofSeconds(windowSizeInSecs))
>>> .grace(Duration.ofSeconds(windowSizeGraceInSecs)))
>>> .reduce((agg, newValue) -> {
>>> ...
>>> return agg;
>>> })
>>> .suppress(Suppressed.untilWindowCloses(
>>>   Suppressed.BufferConfig.unbounded()))
>>> .toStream()
>>> .to(outPutTopicNameOfGroupedData, Pro