On 1/31/13 3:30 PM, Marc Labbe wrote:
Hi,
I am fairly new to Kafka and Scala, and I am trying to make sense of the
consumer re-design changes, proposed and implemented for 0.8 and after, which
will affect other language implementations. There are documentation pages on
the wiki and JIRA issues, but I still can't figure out what's already there for
0.8, what will be there in the future, and how it affects the consumers
written in other languages (Python in my case).
For instance, I am looking at
https://cwiki.apache.org/KAFKA/consumer-client-re-design.html and the very
well documented
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Detailed+Consumer+Coordinator+Design
and
I am not sure which parts are in the works, which are done, and which are
still proposals. I feel some of the changes are already in 0.8, but not
completely, referring especially to KAFKA-364 and KAFKA-264.
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
and
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Detailed+Consumer+Coordinator+Design
are the current design docs (as far as I know).
Is this all accurate and up to date? There are talks of a coordinator as
well but from what I see, this hasn't been implemented so far.
From my understanding, the client redesign has not been finalized and
is still in progress/to-do.
After all, maybe my question is: other than the wire protocol changes, what
changes should I expect to make to a SimpleConsumer client written in Python
for v0.8? What should I do next to implement a high-level consumer
(ZookeeperConsumerConnector?) which fits with the design proposal?
With 0.8, you will not need to connect to ZooKeeper from the clients.
With KAFKA-657, offsets are centrally managed by the broker. Any broker
can handle these requests.
Has anyone started making changes to their implementation yet (thinking
Brod or Samsa)? I'll post that question on github too.
I am working on updating my Python client,
https://github.com/mumrah/kafka-python; there is still a ways to go. The
biggest change (besides centralized offset management) is that each
topic+partition is owned by a specific broker (the leader). When
producing messages, you must send them to the correct leader. This
requires that clients maintain some state about what belongs where, which is
a pain, but such is the cost of replication.
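
To make the "state of what belongs where" a bit more concrete, here is a
rough sketch of the kind of bookkeeping a client might keep. It is not code
from kafka-python or from Kafka itself: the names Broker, LeaderCache and
group_by_leader are made up for illustration, and how the leader information
is actually fetched from a broker is left out entirely.

```python
from typing import Dict, List, NamedTuple, Tuple


class Broker(NamedTuple):
    """Hypothetical broker address; not a type from any Kafka client."""
    host: str
    port: int


class LeaderCache:
    """Client-side map of (topic, partition) -> current leader broker."""

    def __init__(self) -> None:
        self._leaders: Dict[Tuple[str, int], Broker] = {}

    def update(self, topic: str, partition: int, leader: Broker) -> None:
        # Record the leader whenever fresh metadata has been obtained.
        self._leaders[(topic, partition)] = leader

    def leader_for(self, topic: str, partition: int) -> Broker:
        # Raises KeyError when the leader is unknown, which signals that
        # metadata has to be fetched (or re-fetched) first.
        return self._leaders[(topic, partition)]

    def invalidate(self, topic: str, partition: int) -> None:
        # Drop a stale entry, e.g. after a broker reports that it is no
        # longer the leader for this partition.
        self._leaders.pop((topic, partition), None)


def group_by_leader(
    cache: LeaderCache, messages: List[Tuple[str, int, bytes]]
) -> Dict[Broker, List[Tuple[str, int, bytes]]]:
    """Group (topic, partition, payload) tuples by the leading broker."""
    batches: Dict[Broker, List[Tuple[str, int, bytes]]] = {}
    for topic, partition, payload in messages:
        broker = cache.leader_for(topic, partition)
        batches.setdefault(broker, []).append((topic, partition, payload))
    return batches
```

A producer built this way would send one request per broker returned by
group_by_leader, and call invalidate plus a metadata refresh whenever a send
is rejected because leadership has moved.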
Thanks and cheers!
marc
-David