Data is always provided by the leader of a topic-partition (i.e. a broker).
Here is a summary of how ZooKeeper is used:
https://www.quora.com/What-is-the-actual-role-of-ZooKeeper-in-Kafka
-David
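
To make the point above concrete, here is a minimal toy sketch (plain Python, not the Kafka client API) of the idea that coordination metadata only says which broker leads a topic-partition, while the records themselves are fetched from that leader. The names `Broker`, `ClusterMetadata`, and `consume` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy stand-in for a broker; holds the logs of the partitions it leads."""
    broker_id: int
    logs: dict = field(default_factory=dict)  # (topic, partition) -> list of records

    def fetch(self, topic, partition, offset):
        # Return every record at or after the requested offset.
        return self.logs[(topic, partition)][offset:]


@dataclass
class ClusterMetadata:
    """Toy stand-in for cluster metadata: maps each topic-partition to its leader."""
    leaders: dict  # (topic, partition) -> Broker

    def leader_for(self, topic, partition):
        return self.leaders[(topic, partition)]


def consume(metadata, topic, partition, offset=0):
    # Step 1: the metadata lookup returns only the leader's identity, no records.
    leader = metadata.leader_for(topic, partition)
    # Step 2: the payload is read from the leader broker itself.
    return leader.broker_id, leader.fetch(topic, partition, offset)


# Hypothetical two-broker cluster with one topic split over two partitions.
broker1 = Broker(1, {("events", 0): ["a", "b", "c"]})
broker2 = Broker(2, {("events", 1): ["x", "y"]})
metadata = ClusterMetadata({("events", 0): broker1, ("events", 1): broker2})

print(consume(metadata, "events", 1))
```

In this sketch the metadata object never holds any records, which mirrors the answer to the original question: the payload does not pass through the coordination service.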
On 9/10/16, 3:47 PM, "Eric Ho" wrote:
I notice that some Spark programs would contact someth
AFAIK Kafka uses ZooKeeper to coordinate the Kafka cluster (the set of
brokers).
Consumers usually connect to ZooKeeper to retrieve the list of brokers, then
connect to the brokers.
*Valerio*
On 10 September 2016 at 22:11, Eric Ho wrote:
> I notice that some Spark programs would contact something lik
I notice that some Spark programs would contact something like 'zoo1:2181'
when trying to suck data out of Kafka.
Does the Kafka data actually get routed through ZooKeeper before the
payload is delivered to Spark?
--
-eric ho
Hi Eno,
Could you elaborate more on tuning Kafka Streaming applications? What are the
relationships between partitions and num.stream.threads, num.consumer.fetchers,
and other such parameters? On a single node setup with x partitions, what’s the
best way to make sure these partitions are consumed
Hi,
The MockClientSupplier looks like it would be useful for developers wishing
to write unit tests for Kafka Streams apps. Is it public? If so, can
someone help me out with the Maven coordinates? I am currently depending on
these Maven coordinates:
[org.apache.kafka/kafka-streams "0.10.0.1"]
[org.
Hi Elias,
Good question. The general answer is that each time a record is output, the
timestamp is that of the current Kafka Streams task that processes it, so it's
the internal Kafka Streams time. If the Kafka Streams task is processing
records with event time, the timestamp at any point is th
Hi Caleb,
We have a benchmark that we run nightly to keep track of performance. The
numbers we have do indicate that consuming through Streams is indeed slower
than using a plain consumer; however, the performance difference is not as large
as you are observing. Would it be possible for you to run