Re: consumer group, why are commit requests not considered effective heartbeats?

2016-03-25 Thread Zaiming Shi
Hi Jason

thanks for the reply!

Forgot to mention that we tried to test the simplest scenario, in which
there was only one member in the group. I think that should rule out group
rebalancing, right?

On Thursday, March 24, 2016, Jason Gustafson  wrote:

> Hi Zaiming,
>
> I think the problem is not that commit requests aren't considered as
> effective as heartbeats (they are), but that you can't rejoin the group
> using only commits/heartbeats. Every time the group rebalances, all members
> must rejoin the group by sending a JoinGroup request. Once a rebalance has
> begun (e.g. because a new consumer has been started), then each member must
> send the JoinGroup before expiration of the session timeout. If not, then
> they will be kicked out of the group even if they are still sending
> heartbeats. Does that make sense?
>
> -Jason
>
>
>
> On Wed, Mar 23, 2016 at 10:03 AM, Zaiming Shi  > wrote:
>
> > Hi there!
> >
> > We have noticed that when commit requests are sent intensively, we
> > receive IllegalGenerationId.
> > Here are the settings we had problems with: session-timeout: 30 sec,
> > heartbeat-rate: 3 sec.
> > Problem resolved by increasing the session timeout to 180 sec.
> >
> > So I suppose, due to whatever reason (either the client didn't send
> > heartbeat, or the broker didn't process the heartbeats in time), the
> > session was considered dead in group coordinator.
> >
> > My question is: why can't commit requests be taken as an indicator of
> > the member being alive, and hence not kill the session?
> >
> > Regards
> > -Zaiming
> >
>
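To make Jason's point concrete: with the 0.9 Java consumer, heartbeats,
offset commits, and the JoinGroup rejoin are all driven from inside poll(),
so the practical rule is to keep the poll loop turning well within
session.timeout.ms. A minimal sketch (process() is a hypothetical handler):

while (running) {
    // poll() drives the group protocol: heartbeats and any pending
    // JoinGroup are sent from inside this call, so each loop iteration,
    // including message processing, must finish within the session timeout.
    ConsumerRecords<String, byte[]> records = consumer.poll(1000);
    for (ConsumerRecord<String, byte[]> record : records) {
        process(record);
    }
    consumer.commitSync();
}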


Broker hangs all of a sudden.

2016-03-25 Thread Jordan Sheinfeld
Hi All,

This really looks like a bug to me.
I'm running Kafka 2.11-0.9.0.1.
Running 2 brokers, 3 consumers.
On a day to day basis, everything is working smoothly.
All of a sudden, out of nowhere, one of the brokers hangs and producers
fail to send messages to it; since it is the leader of some partitions
with no replicas on other brokers (replication-factor = 1), those messages
have nowhere else to go. The strange thing is that it can happen even when
the broker is not under heavy load.
In ZooKeeper I see only the working node registered under /brokers/ids; the
hung one is not there.
Restarting the node fixes everything.

Logs of the stuck node can be found at:
https://dl.dropboxusercontent.com/u/25559686/broker.rar

It was stuck from around 08:41 until the broker was restarted at 09:12.

Any help is highly appreciated.

Best Regards,
 Jordan.


Supervisord for Kafka 0.8.1

2016-03-25 Thread Kashyap Mhaisekar
Hi,
I'm having trouble configuring Kafka server start with supervisord.

Has anyone from this group succeeded in integrating Kafka server start and
stop via supervisord? Can you please share the snippet of how it is
configured?

Thanks
Kashyap


Re: Supervisord for Kafka 0.8.1

2016-03-25 Thread Achanta Vamsi Subhash
We use daemontools and this is our run file:

#!/bin/bash

PAC=kafka-0.8.2.x
APP_HOME=/usr/share/$PAC
# app options
APP_CONFIG_HOME=${APP_HOME}/config
APP_OPTS="${APP_CONFIG_HOME}/server.properties"
JVM_OPTS=""

# jvm user options
if [ "abc$KAFKA_HEAP_OPTS" == "abc" ]; then
  export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"
fi

if [ "abc$KAFKA_JVM_PERFORMANCE_OPTS" == "abc" ]; then
  export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 \
    -Djava.awt.headless=true"
fi

exec setuidgid $APP_USER $APP_HOME/bin/kafka-server-start.sh $JVM_OPTS \
  $APP_OPTS 2>&1


On Fri, Mar 25, 2016 at 8:13 PM, Kashyap Mhaisekar 
wrote:

> Hi,
> I'm having trouble configuring Kafka server start with supervisord.
>
> Has anyone from this group succeeded in integrating Kafka server start and
> stop via supervisord? Can you please share the snippet of how it is
> configured?
>
> Thanks
> Kashyap
>



-- 
Regards
Vamsi Subhash


Re: consumer group, why are commit requests not considered effective heartbeats?

2016-03-25 Thread Jason Gustafson
Hi Zaiming,

It rules out the most likely cause of rebalance, but not the only one.
Rebalances can also be caused by a topic metadata change or a coordinator
change. Can you post some logs from the consumer around the time that the
unexpected rebalance occurred?

-Jason

On Fri, Mar 25, 2016 at 12:09 AM, Zaiming Shi  wrote:

> Hi Jason
>
> thanks for the reply!
>
> Forgot to mention that we tried to test the simplest scenario in which
> there was only one member in the group. I think that should rule out group
>  rebalancing right?
>
> On Thursday, March 24, 2016, Jason Gustafson  wrote:
>
> > Hi Zaiming,
> >
> > I think the problem is not that commit requests aren't considered as
> > effective as heartbeats (they are), but that you can't rejoin the group
> > using only commits/heartbeats. Every time the group rebalances, all
> members
> > must rejoin the group by sending a JoinGroup request. Once a rebalance
> has
> > begun (e.g. because a new consumer has been started), then each member
> must
> > send the JoinGroup before expiration of the session timeout. If not, then
> > they will be kicked out of the group even if they are still sending
> > heartbeats. Does that make sense?
> >
> > -Jason
> >
> >
> >
> > On Wed, Mar 23, 2016 at 10:03 AM, Zaiming Shi  > > wrote:
> >
> > > Hi there!
> > >
> > > We have noticed that when commit requests are sent intensively, we
> > > receive IllegalGenerationId.
> > > Here are the settings we had problems with: session-timeout: 30 sec,
> > > heartbeat-rate: 3 sec.
> > > Problem resolved by increasing the session timeout to 180 sec.
> > >
> > > So I suppose, due to whatever reason (either the client didn't send
> > > heartbeat, or the broker didn't process the heartbeats in time), the
> > > session was considered dead in group coordinator.
> > >
> > > My question is: why can't commit requests be taken as an indicator of
> > > the member being alive, and hence not kill the session?
> > >
> > > Regards
> > > -Zaiming
> > >
> >
>


Re: New-Consumer group not showing up

2016-03-25 Thread Jason Gustafson
Hey Ryan,

Sounds like you might be using the so-called "simple consumer" mode. If you
use assign() to give your consumer a specific partition, you're not
actually using a consumer group, so there won't be any coordination with
other consumers. If you use subscribe() on the other hand, then you should
see the group listed in the output of consumer-groups.sh.
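Concretely, the distinction comes down to which of these two calls the
application makes; a given consumer instance uses one mode or the other,
never both (topic and partition names here are illustrative):

// Group mode: the coordinator assigns partitions, and the group shows up
// in the consumer-groups tool.
consumer.subscribe(Arrays.asList("my-topic"));

// "Simple consumer" mode: partitions are pinned manually, no group
// coordination takes place, so the tool has nothing to list.
consumer.assign(Arrays.asList(new TopicPartition("my-topic", 0)));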

-Jason

On Thu, Mar 24, 2016 at 7:37 PM, craig w  wrote:

> Did you subscribe and poll? I believe your consumer group won't show up
> until it has been assigned one or more partitions.
> On Mar 24, 2016 9:48 PM, "Ryan Phillips"  wrote:
>
> > I am only assigning this consumer one partition to listen to. Perhaps
> > that is where the issue lies, because the kafka-console-consumer's
> > group id (listening on all partitions) shows up correctly within the
> > kafka-consumer-groups command.
> >
> > On Thu, Mar 24, 2016 at 6:59 PM, Ryan Phillips 
> wrote:
> > > Greetings,
> > >
> > > I’m attempting to use the New Consumer in my project (0.9.0.1), and it
> > > appears to be working nicely. The issue that I am seeing is that the
> > > group.id is not showing up within:
> > >
> > > kafka-consumer-groups.sh --bootstrap-server
> > > kafka0.local:9092,kafka1.local:9092 --new-consumer --list
> > >
> > > The command loads up, but the consumer that is running is not
> > > displayed. The consumer is clearly processing a massive amount of
> > > messages. I have included the properties used at the end of my
> > > message.
> > >
> > > Thanks,
> > > Ryan
> > >
> > > Properties props = new Properties();
> > > props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
> > > props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokerSeeds);
> > > props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
> > > props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
> > > props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 6 * 1024 *
> > 1024);
> > > props.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30 * 1000);
> > > props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15 * 1000);
> > > props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
> > > "org.apache.kafka.common.serialization.StringDeserializer");
> > > props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
> > > "org.apache.kafka.common.serialization.ByteArrayDeserializer");
> > > consumer = new KafkaConsumer(props);
> >
>


Re: New-Consumer group not showing up

2016-03-25 Thread Ryan Phillips
Thanks Jason and Craig.

Jason, that is exactly correct. I am using the simple consumer mode.
Is there an easy way to find out the lag for consumers using the
Simple Consumer mode?

Regards,
Ryan
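For what it's worth, one way to estimate lag from inside an assign()-mode
consumer with the 0.9 API is to compare the current position against the
log-end offset; a sketch, with topic and partition names illustrative:

TopicPartition tp = new TopicPartition("my-topic", 0);
long current = consumer.position(tp);  // next offset this consumer will read
consumer.seekToEnd(tp);                // move to the log-end offset
long logEnd = consumer.position(tp);   // resolves the end offset
consumer.seek(tp, current);            // restore the original position
System.out.println("Lag for " + tp + ": " + (logEnd - current));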

On Fri, Mar 25, 2016 at 11:47 AM, Jason Gustafson  wrote:
> Hey Ryan,
>
> Sounds like you might be using the so-called "simple consumer" mode. If you
> use assign() to give your consumer a specific partition, you're not
> actually using a consumer group, so there won't be any coordination with
> other consumers. If you use subscribe() on the other hand, then you should
> see the group listed in the output of consumer-groups.sh.
>
> -Jason
>
> On Thu, Mar 24, 2016 at 7:37 PM, craig w  wrote:
>
>> Did you subscribe and poll? I believe your consumer group won't show up
>> until it has been assigned one or more partitions.
>> On Mar 24, 2016 9:48 PM, "Ryan Phillips"  wrote:
>>
>> > I am only assigning this consumer one partition to listen to. Perhaps
>> > that is where the issue lies, because the kafka-console-consumer's
>> > group id (listening on all partitions) shows up correctly within the
>> > kafka-consumer-groups command.
>> >
>> > On Thu, Mar 24, 2016 at 6:59 PM, Ryan Phillips 
>> wrote:
>> > > Greetings,
>> > >
>> > > I’m attempting to use the New Consumer in my project (0.9.0.1), and it
>> > > appears to be working nicely. The issue that I am seeing is that the
>> > > group.id is not showing up within:
>> > >
>> > > kafka-consumer-groups.sh --bootstrap-server
>> > > kafka0.local:9092,kafka1.local:9092 --new-consumer --list
>> > >
>> > > The command loads up, but the consumer that is running is not
>> > > displayed. The consumer is clearly processing a massive amount of
>> > > messages. I have included the properties used at the end of my
>> > > message.
>> > >
>> > > Thanks,
>> > > Ryan
>> > >
>> > > Properties props = new Properties();
>> > > props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
>> > > props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, kafkaBrokerSeeds);
>> > > props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
>> > > props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
>> > > props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 6 * 1024 *
>> > 1024);
>> > > props.put(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30 * 1000);
>> > > props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 15 * 1000);
>> > > props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
>> > > "org.apache.kafka.common.serialization.StringDeserializer");
>> > > props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
>> > > "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>> > > consumer = new KafkaConsumer(props);
>> >
>>


Re: Supervisord for Kafka 0.8.1

2016-03-25 Thread Daniel Tamai
Hi Kashyap,

Using supervisord 3.2, this is the snippet from my supervisord.conf for
Kafka:

[program:kafka]
command=%(ENV_KAFKA_HOME)s/bin/kafka-server-start.sh %(ENV_CONFIG_DIR)s/server.properties
stdout_logfile=%(ENV_LOG_DIR)s/kafka_out.log
stderr_logfile=%(ENV_LOG_DIR)s/kafka_err.log
umask=022
autostart=false
autorestart=true
startsecs=10
startretries=3

I don't think this will work with supervisord 3.1.

On Fri, Mar 25, 2016 at 13:08, Achanta Vamsi Subhash <
achanta.va...@flipkart.com> wrote:

> We use daemontools and this is our run file:
>
> #!/bin/bash
>
> PAC=kafka-0.8.2.x
> APP_HOME=/usr/share/$PAC
> # app options
> APP_CONFIG_HOME=${APP_HOME}/config
> APP_OPTS="${APP_CONFIG_HOME}/server.properties"
> JVM_OPTS=""
>
> # jvm user options
> if [ "abc$KAFKA_HEAP_OPTS" == "abc" ]; then
> export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"
> fi
>
> if [ "abc$KAFKA_JVM_PERFORMANCE_OPTS" == "abc" ]; then
> export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC
> -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
> -Djava.awt.headless=true"
> fi
>
> exec setuidgid $APP_USER $APP_HOME/bin/kafka-server-start.sh $JVM_OPTS
> $APP_OPTS 2>&1
>
>
> On Fri, Mar 25, 2016 at 8:13 PM, Kashyap Mhaisekar 
> wrote:
>
> > Hi,
> > I'm having trouble configuring Kafka server start with supervisord.
> >
> > Has anyone from this group succeeded in integrating Kafka server start
> and
> > stop via supervisord? Can you please share the snippet of how it is
> > configured?
> >
> > Thanks
> > Kashyap
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>


Re: Sessionizing inputs with Kafka Streams

2016-03-25 Thread Guozhang Wang
Hello Josh,

We are aware that the Transformer / Processor APIs can be improved; for
example, the punctuate() function should be able to return the same typed
value R for Transformer.

As for now, in your case you can return a sentinel from transform, and add
a "filter" right after it that removes the sentinel values.

Guozhang


On Wed, Mar 23, 2016 at 7:02 PM, josh gruenberg  wrote:

> Thank you, Guozhang! In my exploration, I did overlook the "transform"
> method; this looks promising.
>
> I could still use a little more help: I'm confused because for this
> sessionization use-case, an invocation of the 'transform' method usually
> suggests that a session is still active, so I'll have nothing to emit from
> 'transform'. Instead, I'm guessing I'll need to produce my results from the
> 'punctuate' callback. So my questions are:
>
> 1. what should I return from 'transform' to indicate that I have no output
> at this time? From my reading of 'KStreamTransformProcessor.process', it
> appears that "null" won't fly. Should I return a dummy KeyValue, and then
> filter that out downstream? Seems a little cumbersome, but perhaps not
> terrible as an interim solution... Is there a better way?
> 2. To emit completed aggregations in response to 'punctuate', can I just
> send them via 'context.forward'? (I'll note that this doesn't appear to
> enforce any type-safety, which could lead to maintainability issues.)
>
> Finally, I'll add that this pattern feels like it's abusing the Transformer
> SPI. The interface assumes that transformation is always 1:1, which is
> artificially limiting. I imagine some sort of generalization of this part
> of the system could improve usability. For example, both 'transform' and
> 'punctuate' might be reframed as void methods that receive a type-safe
> interface for 'context.forward'. (I have this small change drafted up
> within the kafka trunk sources, and could submit a PR if the maintainers
> are interested?)
>
> Thanks,
> -josh
>
> On Wed, Mar 23, 2016 at 11:02 AM Guozhang Wang  wrote:
>
> > Hello Josh,
> >
> > As of now Kafka Streams does not yet support session windows as in the
> > Dataflow model, though we do have near term plans to support it.
> >
> > As for now you can still work around it with the Processor, by calling
> > "KStream.transform()" function, which can still return you a stream
> object.
> > In your customized "Transformer" implementation, you can attach a state
> > store of your own and access it in the "transform" function, and only
> > return the results, for example, when one session has ended.
> >
> > As a concrete example, Confluent has some internal tools that use Kafka
> > Streams already for some online operations, where a sessioned window
> > processor is needed as well. We use the "transform" function in the
> > Streams DSL (i.e. "KStreamBuilder") in the following sketch:
> >
> > --
> >
> > builder.addStateStore(/* new RocksDBKeyValueStoreSupplier(..)*/,
> > "store-name");
> >
> > stream1 = builder.stream("source-topic");
> >
> > stream1.transform(MyTransformerFunc, "store-name");
> >
> > --
> >
> > then in MyTransformerFunc:
> >
> > public void init(ProcessorContext context) {
> >   this.kvStore = context.getStateStore("store-name");
> >
> >
> >// now you can access this store in the transform function.
> > }
> >
> > --
> >
> >
> > Hope this helps.
> >
> > Guozhang
> >
> > On Tue, Mar 22, 2016 at 11:51 AM, josh gruenberg 
> wrote:
> >
> > > Hello there,
> > >
> > > I've been experimenting with the Kafka Streams preview, and I'm excited
> > > about its features and capabilities! My team is enthusiastic about the
> > > lightweight operational profile, and the support for local state is
> very
> > > compelling.
> > >
> > > However, I'm having trouble designing a solution with KStreams to
> > satisfy a
> > > particular use-case: we want to "Sessionize" a stream of events, by
> > > gathering together inputs that share a common identifier and occur
> > without
> > > a configurable interruption (gap) in event-time.
> > >
> > > This is achievable with other streaming frameworks (eg, using
> > > Beam/Dataflow's "Session" windows, or SparkStreaming's mapWithState
> with
> > > its "timeout" capability), but I don't see how to approach it with the
> > > current Kafka Streams API.
> > >
> > > I've investigated using the aggregateWithKey function, but this doesn't
> > > appear to support data-driven windowing. I've also considered using a
> > > custom Processor to perform the aggregation, but don't see how to take
> an
> > > output-stream from a Processor and continue to work with it. This area
> of
> > > the system is undocumented, so I'm not sure how to proceed.
> > >
> > > Am I missing something? Do you have any suggestions?
> > >
> > > -josh
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang


Re: Sessionizing inputs with Kafka Streams

2016-03-25 Thread Guozhang Wang
Josh,

If you have some ideas about improving the Transformer / Processor APIs, or
about supporting sessioned windows, please do feel free to create a JIRA
and start discussions there. Also, PRs are more than welcome :)

Guozhang

On Fri, Mar 25, 2016 at 10:50 AM, Guozhang Wang  wrote:

> Hello Josh,
>
> We are aware that the Transformer / Processor APIs can be improved; for
> example, the punctuate() function should be able to return the same typed
> value R for Transformer.
>
> As for now, in your case you can return a sentinel from transform, and add
> a "filter" right after it that removes the sentinel values.
>
> Guozhang
>
>
> On Wed, Mar 23, 2016 at 7:02 PM, josh gruenberg  wrote:
>
>> Thank you, Guozhang! In my exploration, I did overlook the "transform"
>> method; this looks promising.
>>
>> I could still use a little more help: I'm confused because for this
>> sessionization use-case, an invocation of the 'transform' method usually
>> suggests that a session is still active, so I'll have nothing to emit from
>> 'transform'. Instead, I'm guessing I'll need to produce my results from
>> the
>> 'punctuate' callback. So my questions are:
>>
>> 1. what should I return from 'transform' to indicate that I have no output
>> at this time? From my reading of 'KStreamTransformProcessor.process', it
>> appears that "null" won't fly. Should I return a dummy KeyValue, and then
>> filter that out downstream? Seems a little cumbersome, but perhaps not
>> terrible as an interim solution... Is there a better way?
>> 2. To emit completed aggregations in response to 'punctuate', can I just
>> send them via 'context.forward'? (I'll note that this doesn't appear to
>> enforce any type-safety, which could lead to maintainability issues.)
>>
>> Finally, I'll add that this pattern feels like it's abusing the
>> Transformer
>> SPI. The interface assumes that transformation is always 1:1, which is
>> artificially limiting. I imagine some sort of generalization of this part
>> of the system could improve usability. For example, both 'transform' and
>> 'punctuate' might be reframed as void methods that receive a type-safe
>> interface for 'context.forward'. (I have this small change drafted up
>> within the kafka trunk sources, and could submit a PR if the maintainers
>> are interested?)
>>
>> Thanks,
>> -josh
>>
>> On Wed, Mar 23, 2016 at 11:02 AM Guozhang Wang 
>> wrote:
>>
>> > Hello Josh,
>> >
>> > As of now Kafka Streams does not yet support session windows as in the
>> > Dataflow model, though we do have near term plans to support it.
>> >
>> > As for now you can still work around it with the Processor, by calling
>> > "KStream.transform()" function, which can still return you a stream
>> object.
>> > In your customized "Transformer" implementation, you can attach a state
>> > store of your own and access it in the "transform" function, and only
>> > return the results, for example, when one session has ended.
>> >
>> > As a concrete example, Confluent has some internal tools that use Kafka
>> > Streams already for some online operations, where a sessioned window
>> > processor is needed as well. We use the "transform" function in the
>> > Streams DSL (i.e. "KStreamBuilder") in the following sketch:
>> >
>> > --
>> >
>> > builder.addStateStore(/* new RocksDBKeyValueStoreSupplier(..)*/,
>> > "store-name");
>> >
>> > stream1 = builder.stream("source-topic");
>> >
>> > stream1.transform(MyTransformerFunc, "store-name");
>> >
>> > --
>> >
>> > then in MyTransformerFunc:
>> >
>> > public void init(ProcessorContext context) {
>> >   this.kvStore = context.getStateStore("store-name");
>> >
>> >
>> >// now you can access this store in the transform function.
>> > }
>> >
>> > --
>> >
>> >
>> > Hope this helps.
>> >
>> > Guozhang
>> >
>> > On Tue, Mar 22, 2016 at 11:51 AM, josh gruenberg 
>> wrote:
>> >
>> > > Hello there,
>> > >
>> > > I've been experimenting with the Kafka Streams preview, and I'm
>> excited
>> > > about its features and capabilities! My team is enthusiastic about the
>> > > lightweight operational profile, and the support for local state is
>> very
>> > > compelling.
>> > >
>> > > However, I'm having trouble designing a solution with KStreams to
>> > satisfy a
>> > > particular use-case: we want to "Sessionize" a stream of events, by
>> > > gathering together inputs that share a common identifier and occur
>> > without
>> > > a configurable interruption (gap) in event-time.
>> > >
>> > > This is achievable with other streaming frameworks (eg, using
>> > > Beam/Dataflow's "Session" windows, or SparkStreaming's mapWithState
>> with
>> > > its "timeout" capability), but I don't see how to approach it with the
>> > > current Kafka Streams API.
>> > >
>> > > I've investigated using the aggregateWithKey function, but this
>> doesn't
>> > > appear to support data-driven windowing. I've also considered using a
>> > > custom Processor to perform the aggregation, but don't see how to
>> > > take an output-stream from a Processor and continue to work with it.

How does Cloudera Manager collect Kafka metrics

2016-03-25 Thread yeshwanth kumar
Can someone explain how Cloudera Manager collects Kafka metrics, such as
TotalMessages in a topic, and the total bytes read from and written to Kafka?


please let me know

Thanks,
-Yeshwanth
Can you Imagine what I would do if I could do all I can - Art of War


Re: How does Cloudera Manager collect Kafka metrics

2016-03-25 Thread Timothy Chen
That's all information available from the JMX endpoints in Kafka.

Tim
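For example, those broker meters can be read over JMX directly. A minimal
sketch, assuming the broker was started with JMX_PORT=9999; the MBean shown
is the standard aggregate BrokerTopicMetrics meter (per-topic variants add
a topic=... tag):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class KafkaJmxProbe {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName messagesIn = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            // Yammer meters expose Count, OneMinuteRate, etc. as attributes
            System.out.println("MessagesInPerSec count: "
                    + mbsc.getAttribute(messagesIn, "Count"));
        }
    }
}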

On Fri, Mar 25, 2016 at 1:21 PM, yeshwanth kumar  wrote:
> Can someone explain how Cloudera Manager collects Kafka metrics, such as
> TotalMessages in a topic, and the total bytes read from and written to Kafka?
>
>
> please let me know
>
> Thanks,
> -Yeshwanth
> Can you Imagine what I would do if I could do all I can - Art of War


Re: consumer group, why are commit requests not considered effective heartbeats?

2016-03-25 Thread Zaiming Shi
Hi Jason

If I understand correctly, when the coordinator changes the consumer
should get a 'NotCoordinatorForGroup' exception, not 'IllegalGenerationId'.
Topic metadata change? Like the number of partitions changing?
I was testing in a pretty stable cluster, and the issue was reproduced
several times.
I had no such issue after we changed the session timeout to 3 minutes.
--- Does this rule out the topic metadata change?

The logs are lost because I was running debug mode in our Erlang client to
help debug this issue for my colleague, who's using the new Java client.
My colleague has observed very likely the same pattern as I described above.
He is trying to get hold of a minimal setup for a reliable reproduction.

I will also try to reproduce it in Erlang, and post here a (hopefully
sensible)
sequence of timestamped heartbeat and commit requests and responses.

Will ask more questions if we have new findings.

Regards
-Zaiming



On Fri, Mar 25, 2016 at 5:43 PM, Jason Gustafson  wrote:

> Hi Zaiming,
>
> It rules out the most likely cause of rebalance, but not the only one.
> Rebalances can also be caused by a topic metadata change or a coordinator
> change. Can you post some logs from the consumer around the time that the
> unexpected rebalance occurred?
>
> -Jason
>
> On Fri, Mar 25, 2016 at 12:09 AM, Zaiming Shi  wrote:
>
> > Hi Jason
> >
> > thanks for the reply!
> >
> > Forgot to mention that we tried to test the simplest scenario in which
> > there was only one member in the group. I think that should rule out
> group
> >  rebalancing right?
> >
> > On Thursday, March 24, 2016, Jason Gustafson  wrote:
> >
> > > Hi Zaiming,
> > >
> > > I think the problem is not that commit requests aren't considered as
> > > effective as heartbeats (they are), but that you can't rejoin the group
> > > using only commits/heartbeats. Every time the group rebalances, all
> > members
> > > must rejoin the group by sending a JoinGroup request. Once a rebalance
> > has
> > > begun (e.g. because a new consumer has been started), then each member
> > must
> > > send the JoinGroup before expiration of the session timeout. If not,
> then
> > > they will be kicked out of the group even if they are still sending
> > > heartbeats. Does that make sense?
> > >
> > > -Jason
> > >
> > >
> > >
> > > On Wed, Mar 23, 2016 at 10:03 AM, Zaiming Shi  > > > wrote:
> > >
> > > > Hi there!
> > > >
> > > > We have noticed that when commit requests are sent intensively,
> we
> > > > receive IllegalGenerationId.
> > > > Here are the settings we had problems with: session-timeout: 30 sec,
> > > > heartbeat-rate: 3 sec.
> > > > Problem resolved by increasing the session timeout to 180 sec.
> > > >
> > > > So I suppose, due to whatever reason (either the client didn't send
> > > > heartbeat, or the broker didn't process the heartbeats in time), the
> > > > session was considered dead in group coordinator.
> > > >
> > > > My question is: why can't commit requests be taken as an indicator of
> > > > the member being alive, and hence not kill the session?
> > > >
> > > > Regards
> > > > -Zaiming
> > > >
> > >
> >
>


Bucket records based on time(kafka-hdfs-connector)

2016-03-25 Thread Mohammad Tariq
Hi kafka gurus,

This might sound a little off track, but I don't know where else to go.
I tried the Confluent Google group but it seems to be quite
unresponsive/inactive. Please bear with me. Many thanks in advance!

I am trying to copy data from Kafka into Hive tables using the
kafka-hdfs-connector provided by Confluent. While I am able to do it
successfully, I was wondering how to bucket the incoming data based on a
time interval. For example, I would like to have a new partition created
every 5 minutes.

I tried io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner with
partition.duration.ms but I think I am doing it the wrong way. I see only
one partition in the Hive table with all the data going into that
particular partition. Something like this:

hive> show partitions test;
OK
partition
year=2016/month=03/day=15/hour=19/minute=03

And all the avro objects are getting copied into this partition.

Instead, I would like to have something like this:

hive> show partitions test;
OK
partition
year=2016/month=03/day=15/hour=19/minute=03
year=2016/month=03/day=15/hour=19/minute=08
year=2016/month=03/day=15/hour=19/minute=13

Initially the connector will create the path
year=2016/month=03/day=15/hour=19/minute=03 and will continue to copy all
the incoming data into this directory for the next 5 minutes; at the start
of the 6th minute it should create a new path, i.e.
year=2016/month=03/day=15/hour=19/minute=08, and copy the data for the next
5 minutes into this directory, and so on.

This is what my config file looks like:

name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test
hdfs.url=hdfs://localhost:9000
flush.size=3
partitioner.class=io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner
partition.duration.ms=30
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/'minute'=MM/
locale=en
timezone=GMT
logs.dir=/kafka-connect/logs
topics.dir=/kafka-connect/topics
hive.integration=true
hive.metastore.uris=thrift://localhost:9083
schema.compatibility=BACKWARD

It would be really helpful if someone could point me in the right
direction. I would be glad to share more details in case it's required.
Don't want to make this email look like one that never ends.

Thank you so much for your valuable time!


Tariq, Mohammad
about.me/mti



Re: Supervisord for Kafka 0.8.1

2016-03-25 Thread Kashyap Mhaisekar
Thanks guys. Will try these two.
On Mar 25, 2016 12:36, "Daniel Tamai"  wrote:

> Hi Kashyap,
>
> Using supervisord 3.2, this is the snippet from my supervisord.conf for
> Kafka:
>
> [program:kafka]
> command=%(ENV_KAFKA_HOME)s/bin/kafka-server-start.sh %(ENV_CONFIG_DIR)s/server.properties
> stdout_logfile=%(ENV_LOG_DIR)s/kafka_out.log
> stderr_logfile=%(ENV_LOG_DIR)s/kafka_err.log
> umask=022
> autostart=false
> autorestart=true
> startsecs=10
> startretries=3
>
> I don't think this will work with supervisord 3.1.
>
> On Fri, Mar 25, 2016 at 13:08, Achanta Vamsi Subhash <
> achanta.va...@flipkart.com> wrote:
>
> > We use daemontools and this is our run file:
> >
> > #!/bin/bash
> >
> > PAC=kafka-0.8.2.x
> > APP_HOME=/usr/share/$PAC
> > # app options
> > APP_CONFIG_HOME=${APP_HOME}/config
> > APP_OPTS="${APP_CONFIG_HOME}/server.properties"
> > JVM_OPTS=""
> >
> > # jvm user options
> > if [ "abc$KAFKA_HEAP_OPTS" == "abc" ]; then
> > export KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"
> > fi
> >
> > if [ "abc$KAFKA_JVM_PERFORMANCE_OPTS" == "abc" ]; then
> > export KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseG1GC
> > -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
> > -Djava.awt.headless=true"
> > fi
> >
> > exec setuidgid $APP_USER $APP_HOME/bin/kafka-server-start.sh $JVM_OPTS
> > $APP_OPTS 2>&1
> >
> >
> > On Fri, Mar 25, 2016 at 8:13 PM, Kashyap Mhaisekar 
> > wrote:
> >
> > > Hi,
> > > I'm having trouble configuring Kafka server start with supervisord.
> > >
> > > Has anyone from this group succeeded in integrating Kafka server start
> > and
> > > stop via supervisord? Can you please share the snippet of how it is
> > > configured?
> > >
> > > Thanks
> > > Kashyap
> > >
> >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>


Re: consumer group, why are commit requests not considered effective heartbeats?

2016-03-25 Thread Jason Gustafson
Hi Zaiming,

Yeah, you're right. A coordinator change won't cause a rebalance (it hasn't
been that way since we added group metadata persistence). I went back and
checked the code and we actually do not reset the heartbeat timer when a
commit is received. I'm not sure whether there's a good reason for that,
but nothing is coming to mind. At least when the group is stable, the
commit could be treated as an implicit heartbeat. Feel free to create a
JIRA and we can see what others think. Out of curiosity, is this a
significant problem for the Erlang client you're writing?
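In the meantime, the workaround from this thread maps onto the new
consumer's settings roughly as follows (values illustrative, mirroring
Zaiming's report of raising the timeout from 30 to 180 seconds):

// Give the coordinator more slack before declaring the member dead
// while commits are in flight.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "180000");  // was "30000"
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");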

-Jason

On Fri, Mar 25, 2016 at 1:38 PM, Zaiming Shi  wrote:

> Hi Jason
>
> If I understand correctly, when the coordinator changes the consumer
> should get a 'NotCoordinatorForGroup' exception, not 'IllegalGenerationId'.
> Topic metadata change? Like the number of partitions changing?
> I was testing in a pretty stable cluster, and the issue was reproduced
> several times.
> I had no such issue after we changed the session timeout to 3 minutes.
> --- does this rule out the topic metadata change?
>
> The logs are lost because I was running debug mode in our Erlang client to
> help debug this issue for my colleague, who's using the new Java client.
> My colleague has observed very likely the same pattern as I described
> above.
> He is trying to get hold of a minimal setup for a reliable reproduction.
>
> I will also try to reproduce it in Erlang, and post here a (hopefully
> sensible)
> sequence of timestamped heartbeat and commit requests and responses.
>
> Will ask more questions if we have new findings.
>
> Regards
> -Zaiming
>
>
>
> On Fri, Mar 25, 2016 at 5:43 PM, Jason Gustafson 
> wrote:
>
> > Hi Zaiming,
> >
> > It rules out the most likely cause of rebalance, but not the only one.
> > Rebalances can also be caused by a topic metadata change or a coordinator
> > change. Can you post some logs from the consumer around the time that the
> > unexpected rebalance occurred?
> >
> > -Jason
> >
> > On Fri, Mar 25, 2016 at 12:09 AM, Zaiming Shi  wrote:
> >
> > > Hi Jason
> > >
> > > thanks for the reply!
> > >
> > > Forgot to mention that we tried to test the simplest scenario in
> which
> > > there was only one member in the group. I think that should rule out
> > group
> > >  rebalancing right?
> > >
> > > On Thursday, March 24, 2016, Jason Gustafson 
> wrote:
> > >
> > > > Hi Zaiming,
> > > >
> > > > I think the problem is not that commit requests aren't considered as
> > > > effective as heartbeats (they are), but that you can't rejoin the
> group
> > > > using only commits/heartbeats. Every time the group rebalances, all
> > > members
> > > > must rejoin the group by sending a JoinGroup request. Once a
> rebalance
> > > has
> > > > begun (e.g. because a new consumer has been started), then each
> member
> > > must
> > > > send the JoinGroup before expiration of the session timeout. If not,
> > then
> > > > they will be kicked out of the group even if they are still sending
> > > > heartbeats. Does that make sense?
> > > >
> > > > -Jason
> > > >
> > > >
> > > >
> > > > On Wed, Mar 23, 2016 at 10:03 AM, Zaiming Shi  > > > > wrote:
> > > >
> > > > > Hi there!
> > > > >
> > > > > We have noticed that when commit requests are sent intensively,
> > we
> > > > > receive IllegalGenerationId.
> > > > > Here are the settings we had problems with: session-timeout: 30 sec,
> > > > > heartbeat-rate: 3 sec.
> > > > > Problem resolved by increasing the session timeout to 180 sec.
> > > > >
> > > > > So I suppose, due to whatever reason (either the client didn't send
> > > > > heartbeat, or the broker didn't process the heartbeats in time),
> the
> > > > > session was considered dead in group coordinator.
> > > > >
> > > > > My question is: why can't commit requests be taken as an indicator
> of
> > > > > the member being alive, and hence not kill the session?
> > > > >
> > > > > Regards
> > > > > -Zaiming
> > > > >
> > > >
> > >
> >
>