Re: What's the relationship between Kafka and Zookeeper ?

2016-09-10 Thread David Garcia
Data is always provided by the leader of a topic-partition (i.e. a broker). Here is a summary of how zookeeper is used: https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka -David On 9/10/16, 3:47 PM, "Eric Ho" wrote: I notice that some Spark programs would contact someth

Re: what's the relationship between Zookeeper and Kafka ?

2016-09-10 Thread Valerio Bruno
AFAIK Kafka uses Zookeeper to coordinate the Kafka cluster (set of brokers). Consumers usually connect to Zookeeper to retrieve the list of brokers, then connect to a broker. *Valerio* On 10 September 2016 at 22:11, Eric Ho wrote: > I notice that some Spark programs would contact something lik

what's the relationship between Zookeeper and Kafka ?

2016-09-10 Thread Eric Ho
I notice that some Spark programs would contact something like 'zoo1:2181' when trying to suck data out of Kafka. Does the kafka data actually get routed out of zookeeper before delivering the payload onto Spark ? -- -eric ho

Re: Performance issue with KafkaStreams

2016-09-10 Thread Ara Ebrahimi
Hi Eno, Could you elaborate more on tuning Kafka Streams applications? What are the relationships between partitions and num.stream.threads, num.consumer.fetchers, and other such parameters? On a single node setup with x partitions, what’s the best way to make sure these partitions are consumed

MockClientSupplier

2016-09-10 Thread Andy Chambers
Hi, The MockClientSupplier looks like it would be useful for developers wishing to write unit tests for Kafka Streams apps. Is it public? If so, can someone help me out with the Maven coordinates? Currently depending on these Maven coordinates [org.apache.kafka/kafka-streams "0.10.0.1"] [org.

What's the relationship between Kafka and Zookeeper ?

2016-09-10 Thread Eric Ho
I notice that some Spark programs would contact something like 'zoo1:2181' when trying to suck data out of Kafka. Does the kafka data actually get routed out of zookeeper before delivering the payload onto Spark ? -- -eric ho

Re: Time of derived records in Kafka Streams

2016-09-10 Thread Eno Thereska
Hi Elias, Good question. The general answer is that each time a record is output, the timestamp is that of the current Kafka Streams task that processes it, so it's the internal Kafka Streams time. If the Kafka Streams task is processing records with event time, the timestamp at any point is th

Re: Performance issue with KafkaStreams

2016-09-10 Thread Eno Thereska
Hi Caleb, We have a benchmark that we run nightly to keep track of performance. The numbers we have do indicate that consuming through streams is indeed slower than just a pure consumer, however the performance difference is not as large as you are observing. Would it be possible for you to run