Re: kafka-connect-hdfs failure due to corrupt WAL file

2016-08-02 Thread Ewen Cheslack-Postava
If you're not worried about duplicates, then yes, you can delete the WAL to recover. By the way, we're aware of some issues in that code that we're addressing here: https://github.com/confluentinc/kafka-connect-hdfs/pull/96 -Ewen On Tue, Aug 2, 2016 at 3:12 PM, Prabhu V wrote: > Hi, > > I am u

Re: Consumer poll in 0.9.0.1 hanging

2016-08-02 Thread Ewen Cheslack-Postava
It seems like we definitely shouldn't block indefinitely, but what is probably happening is that the consumer is fetching metadata, not finding the topic, then getting hung up somewhere. It probably won't hang indefinitely -- there is a periodic refresh of metadata, defaulting to every 5 minutes,

Re: Same partition number of different Kafka topics

2016-08-02 Thread Ewen Cheslack-Postava
Jack, The partition is always selected by the client -- if it weren't the brokers would need to forward requests since different partitions are handled by different brokers. The only "default Kafka partitioner" is the one that you could consider "standardized" by the Java client implementation. So
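A minimal sketch of what client-side partition selection means in practice, assuming a keyed record and a hash-then-modulo scheme. The class and method names are hypothetical, and `Arrays.hashCode` is only a stand-in: the Java client's partitioner hashes the serialized key with murmur2, which is not reproduced here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the Kafka client's partitioner.
public class ClientSidePartitioning {

    // Maps a key to a partition number in [0, numPartitions).
    // Arrays.hashCode is a stand-in for the client's real hash function.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Mask the sign bit so the result of the modulo is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The computation runs in the client, before any request is sent,
        // so the record can go straight to the broker leading that partition.
        System.out.println(partitionFor("user-42", 6));
    }
}
```

Under this kind of scheme, the same key lands on the same partition number in two topics only when both producers use the same partitioner and both topics have the same partition count.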

Re: KafkaConsumer blocks indefinitely when server settings are wrong

2016-08-02 Thread Ewen Cheslack-Postava
This is unfortunate, but a known issue. See https://issues.apache.org/jira/browse/KAFKA-1894 The producer suffers from a similar issue with its initial metadata fetch on the first send(). -Ewen On Thu, Jul 28, 2016 at 12:46 PM, Oleg Zhurakousky < ozhurakou...@hortonworks.com> wrote: > Also, read

Re: Jars in Kafka 0.10

2016-08-02 Thread Ewen Cheslack-Postava
This is a combination of Connect and Streams, and actually probably more related to Connect because it pulls in a bunch of jars to implement its REST API. You don't need all of them to run any of the individual components, but they are currently bundled altogether in a way that is, unfortunately, n

Re: How many TCP connections Java producers opens to feed data to broker?

2016-08-02 Thread Ewen Cheslack-Postava
One connection to each broker. You can get very high throughput even with a single TCP connection. The producer handles batching internally so you don't send small requests if you have high data rates. -Ewen On Wed, Jul 27, 2016 at 5:41 AM, Vladimir Picka wrote: > Hello, > > does it use just on
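As a rough sketch of the batching mentioned above, the producer settings `batch.size` and `linger.ms` control how records are grouped per partition before a request is sent. The helper name, the broker address, and the values below are assumptions for illustration, not recommendations from the thread.

```java
import java.util.Properties;

// Hypothetical helper that builds producer settings related to batching.
public class ProducerBatchingConfig {

    static Properties batchingProps(String bootstrapServers) {
        Properties props = new Properties();
        // Address of at least one broker; the value is supplied by the caller.
        props.put("bootstrap.servers", bootstrapServers);
        // Upper bound, in bytes, on a batch of records for one partition.
        props.put("batch.size", "16384");
        // How long, in milliseconds, the producer may wait for more records
        // before sending a batch that is not yet full (illustrative value).
        props.put("linger.ms", "5");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address.
        System.out.println(batchingProps("localhost:9092"));
    }
}
```

These properties would be passed to the producer's constructor together with the serializer settings, which are omitted here.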

Understand producer metrics

2016-08-02 Thread David Yu
I'm having a hard time finding documentation explaining the set of producer metrics exposed by Kafka. Can anyone explain the following? - batch-size-avg - Is this the number of msgs or number of bytes? Does this only make sense for async producers? - incoming-byte-rate/outgoing-byte-rate

Using automatic brokerId generation

2016-08-02 Thread Digumarthi, Prabhakar Venkata Surya
Hi, I am currently using Kafka version 0.9.1.0. If I enable automatic brokerId generation, and one of my brokers dies and a new broker is started with a different brokerId, is there a way to make the new broker ID part of a partition's replica set automatically?

Kafka In cloud environment

2016-08-02 Thread Digumarthi, Prabhakar Venkata Surya
If I use automatic brokerId generation and a broker dies, and a new broker is added with a different broker ID, will the replica set get updated automatically?

Issue with KTable State Store

2016-08-02 Thread Srinidhi Muppalla
Hey All, We are having issues successfully storing and accessing a KTable on our cluster, which happens to be on AWS. We are trying to store a KTable of counts of 'success' and 'failure' strings, similar to the WordCountDemo in the documentation. The Kafka Streams application that creates the KT

Re: Kafka Consumer poll

2016-08-02 Thread Oleg Zhurakousky
Also keep in mind that unfortunately KafkaConsumer.poll(..) will deadlock regardless of the timeout if a connection to the broker cannot be established, and it won't react to thread interrupts. This essentially means that the only way to exit is to kill the JVM. This is all because Kafka fetches topic me

Using KafkaConnect or KafkaStreams? Join the "powered by" list :)

2016-08-02 Thread Gwen Shapira
Hi, You know the famous "Powered by Kafka" page? https://cwiki.apache.org/confluence/display/KAFKA/Powered+By Where the cool companies are showing off their use of Kafka? We want to do the same for KafkaConnect and KafkaStreams - showcase the early adopters of the technology. If you are using e

Re: Kafka ETL for Parquet

2016-08-02 Thread Shikhar Bhushan
Hi Kidong, What specific issues did you run into when trying this out? I think the basic idea would be to depend on the avro-serializer package and proceed with implementing your custom Converter similarly to AvroConverter interface. You only need the deserialization bits (`toConnectData`), and c

kafka-connect-hdfs failure due to corrupt WAL file

2016-08-02 Thread Prabhu V
Hi, I am using kafka-connect-hdfs on 2 nodes, and one of the nodes had to be rebooted while the process was running. Upon restart the process fails with 16/08/02 21:43:30 ERROR hdfs.TopicPartitionWriter: Recovery failed at state RECOVERY_PARTITION_PAUSED org.apache.kafka.connect.errors.ConnectEx

Re: Opening up Kafka JMX port for Kafka Consumer in Kafka Streams app

2016-08-02 Thread David Garcia
Have you looked at kafka manager: https://github.com/yahoo/kafka-manager It provides consumer level metrics. -David On 8/2/16, 12:36 PM, "Phillip Mann" wrote: Hello all, This is a bit of a convoluted title but we are trying to set up monitoring on our Kafka Cluster and Kafka Strea

Opening up Kafka JMX port for Kafka Consumer in Kafka Streams app

2016-08-02 Thread Phillip Mann
Hello all, This is a bit of a convoluted title but we are trying to set up monitoring on our Kafka Cluster and Kafka Streams app. I currently have JMX port open on our Kafka cluster across our brokers. I am able to use a Java JMX client to get certain metrics that are helpful to us. However,

Re: Kafka Consumer poll

2016-08-02 Thread Kamal C
See the answers inline. On Tue, Aug 2, 2016 at 12:23 AM, sat wrote: > Hi, > > I am new to Kafka. We are planning to use Kafka messaging for our > application. I was playing with Kafka 0.9.0.1 version and i have following > queries. Sorry for asking basic questions. > > > 1) I have instantiated K

Re: KTable and Rebalance Operations

2016-08-02 Thread Matthias J. Sax
Hi David, on startup of the second application instance, the KTable is effectively partitioned into two distinct partial KTables, each holding the key-value pairs for their corresponding assigned partitions. Thus, your "lookups" on each instance can only access the key-value pairs for the set of

KTable and Rebalance Operations

2016-08-02 Thread David Garcia
Hello, I’ve googled around for this, but haven’t had any luck. Based upon this: http://docs.confluent.io/3.0.0/streams/architecture.html#state KTables are local to instances. An instance will process one or more partitions from one or more topics. How does Kstreams/Ktables handle the followi

Re: Kafka streams Issue

2016-08-02 Thread Guozhang Wang
Hi Hamza, We are also working on letting users have some indirect control over the data volume based on caching: https://cwiki.apache.org/confluence/display/KAFKA/KIP-63%3A+Unify+store+and+downstream+caching+in+streams Guozhang On Fri, Jul 29, 2016 at 8:24 AM, Hamza HACHANI wrote: > Thanks

Re: Cluster config

2016-08-02 Thread Guozhang Wang
Which Kafka version are you using and how many consumers? Guozhang On Thu, Jul 28, 2016 at 1:57 PM, Kessiler Rodrigues wrote: > The replication factor is 4. > > > > > On Jul 28, 2016, at 5:55 PM, David Garcia wrote: > > > > What is your replication for these topics? > > > > On 7/28/16, 3:03 PM

[VOTE] 0.10.0.1 RC1

2016-08-02 Thread Ismael Juma
Hello Kafka users, developers and client-developers, This is the second candidate for the release of Apache Kafka 0.10.0.1. This is a bug fix release and it includes fixes and improvements from 52 JIRAs (including a few critical bugs). See the release notes for more details: http://home.apache.or

Re: Reactive Kafka performance

2016-08-02 Thread Michael Noll
David, you wrote: > Each task would effectively block on DB-io for every history-set retrieval; > obviously we would use a TTL cache (KTable could be useful here, but it > wouldn’t be able to hold “all” of the history for every user) Can you elaborate a bit why you think a KTable wouldn't be abl

Expose Kafka Server Configuration

2016-08-02 Thread Chris Barlock
Does Kafka expose its server configuration in any way that I can get it programmatically? Specifically, I'm interested in knowing the message.max.bytes value. Chris

Re: Kafka java consumer processes duplicate messages

2016-08-02 Thread Amit K
Thanks for your reply. I see the duplicates when I bring down and up a broker when load testing is in progress. If I keep it down for whole test, everything is fine. I will try modes as you mentioned earlier and now and monitor the performance. On Tue, Aug 2, 2016 at 12:57 PM, R Krishna wrote:

Re: Kafka java consumer processes duplicate messages

2016-08-02 Thread R Krishna
Sure, rebalance is a normal cause for duplicates. Sure, "As I lower value of auto.commit.interval.ms, the performance deteriorates drastically" but you should see fewer duplicates. Did you try commit async or storing offsets somewhere else? On Aug 1, 2016 10:59 PM, "Amit K" wrote: > Thanks for rep
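One way to read "storing offsets somewhere else" is to have the application remember the last offset it processed per partition and skip anything at or below it. The sketch below is a hypothetical, in-memory illustration of that idea only; it does not use the Kafka client, and a real deployment would need to persist this state together with the processing results.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: tracks the highest processed offset per topic-partition
// so that records redelivered after a rebalance or restart can be skipped.
public class OffsetDeduplicator {

    private final Map<String, Long> lastProcessed = new HashMap<>();

    // Returns true if the record is new and records its offset;
    // returns false if this offset (or a later one) was already processed.
    boolean shouldProcess(String topic, int partition, long offset) {
        String key = topic + "-" + partition;
        Long last = lastProcessed.get(key);
        if (last != null && offset <= last) {
            return false;
        }
        lastProcessed.put(key, offset);
        return true;
    }

    public static void main(String[] args) {
        OffsetDeduplicator dedup = new OffsetDeduplicator();
        System.out.println(dedup.shouldProcess("orders", 0, 10L));
        // The same offset delivered again is rejected.
        System.out.println(dedup.shouldProcess("orders", 0, 10L));
    }
}
```

This relies on offsets within a partition being delivered in increasing order, and it trades a lookup per record for tolerance of redelivery.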