[ https://issues.apache.org/jira/browse/KAFKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750549#comment-13750549 ]

Tejas Patil commented on KAFKA-1012:
------------------------------------

Re transactionality:
In the current patch (with the embedded producer), a commit request (i.e., a 
produce request for the offsets topic) gets written to 2 places: the logs of 
the offsets topic and the offset manager's backend storage. If there is an 
error in writing any offset message to the logs, this is indicated in the 
response of the produce request. The embedded producer internally retries the 
request (with the failed messages only) after checking the error status in the 
response. Only those messages which made it to the logs are passed to the 2nd 
part (the offset manager's backend storage). As the backend would basically be 
a hash table or ZK, it is assumed that the offset manager won't fail to write 
data to the backend. To sum up, there is no notion of transactions. Brokers 
"greedily" try to commit as many messages as they can; if some offset messages 
fail, the embedded producer re-sends a request just for the failed ones. A 
sketch of this retry loop is below.
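
To make the loop concrete, here is a minimal self-contained sketch in Scala. 
All names (OffsetMessage, appendToOffsetsLog, writeToBackend) are illustrative 
stand-ins rather than the patch's actual API, and the log append is simulated 
with a single transient failure:

{code}
// Hypothetical sketch of the "greedy" commit path; names and types are
// illustrative only, not the actual API in the patch.
object GreedyCommitSketch {
  case class OffsetMessage(group: String, topic: String, partition: Int, offset: Long)

  // Stand-in for appending to the offsets topic's log: returns the subset of
  // messages that failed and must be retried. Here we simulate a transient
  // failure for partition 1 on the first attempt.
  private var firstAttempt = true
  def appendToOffsetsLog(batch: Seq[OffsetMessage]): Seq[OffsetMessage] =
    if (firstAttempt) { firstAttempt = false; batch.filter(_.partition == 1) }
    else Seq.empty

  // Stand-in for the offset manager's backend (hash table / ZK); per the
  // comment above, this write is assumed not to fail.
  val backend = scala.collection.mutable.Map[(String, String, Int), Long]()
  def writeToBackend(batch: Seq[OffsetMessage]): Unit =
    batch.foreach(m => backend((m.group, m.topic, m.partition)) = m.offset)

  def commitGreedily(batch: Seq[OffsetMessage], maxRetries: Int): Unit = {
    var pending = batch
    var attempts = 0
    while (pending.nonEmpty && attempts < maxRetries) {
      val failed = appendToOffsetsLog(pending)
      // Only messages that made it to the logs reach the backend.
      writeToBackend(pending.filterNot(failed.contains))
      pending = failed // re-send a request just for the failed ones
      attempts += 1
    }
  }

  def main(args: Array[String]): Unit = {
    commitGreedily(Seq(
      OffsetMessage("group1", "topicA", 0, 42L),
      OffsetMessage("group1", "topicA", 1, 17L)), maxRetries = 3)
    println(backend) // both offsets present after one retry
  }
}
{code}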

Re "per-topic max message size":
Does Kafka support per-topic max message size ? : I could not find such config.
What does "server impact" includes : volume of offsets data stored in logs or 
having large metadata in memory ? Log cleaner would dedupe the logs of this 
topic frequently and so the size of logs would be pruned from time to time. 
About holding this metadata in in-memory hash table, I think its a nice thing 
to have a cap on message size to prevent in-memory table consuming large 
memory. Would include the same in coming patch. It would be helpful even for 
next phase when we move off embedded producer and start using offset commit 
request.
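
For illustration, here is the kind of cap I have in mind, applied before an 
entry reaches the in-memory table. The limit value, names, and result type are 
all assumptions, not the patch's code:

{code}
// Hypothetical size cap on offset metadata before it enters the broker's
// in-memory table; the config value and names are assumptions.
object MetadataCapSketch {
  val MaxMetadataSize = 1024 // assumed cap, in bytes

  sealed trait CommitResult
  case object Committed extends CommitResult
  case object MetadataTooLarge extends CommitResult

  val offsetsTable = scala.collection.mutable.Map[(String, String, Int), (Long, String)]()

  def putOffset(group: String, topic: String, partition: Int,
                offset: Long, metadata: String): CommitResult =
    if (metadata.getBytes("UTF-8").length > MaxMetadataSize)
      MetadataTooLarge // reject instead of letting the table grow unbounded
    else {
      offsetsTable((group, topic, partition)) = (offset, metadata)
      Committed
    }

  def main(args: Array[String]): Unit = {
    println(putOffset("group1", "topicA", 0, 42L, "checkpoint-7")) // Committed
    println(putOffset("group1", "topicA", 1, 17L, "x" * 2048))     // MetadataTooLarge
  }
}
{code}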

Re "partitioning the offset topic":
Its recommended to have a #(partitions for offsets topic) >= #brokers so that 
all brokers get a somewhat similar[*] amount of traffic of offset commits. The 
replication factor of the offsets topic should be more than that of any normal 
kafka topic to achieve high availability of the offset information.
[*]: There can be a imbalance in server load if some consumer groups have lot 
of consumers or if some consumers have shorter offset commit interval than the 
others. So we cannot get a guarantee about "equal" load across all brokers. We 
expect that the load to be "similar" across all brokers.
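
To illustrate why #(partitions) >= #brokers spreads the load, here is a sketch 
of keyed partitioning on the consumer group id; the exact partitioning 
function used for the offsets topic is an assumption here:

{code}
// Sketch of spreading consumer groups across the offsets topic's partitions
// by hashing the group id; the exact function is an assumption.
object OffsetsPartitioningSketch {
  def partitionFor(group: String, numPartitions: Int): Int =
    (group.hashCode & 0x7fffffff) % numPartitions // mask keeps the value non-negative

  def main(args: Array[String]): Unit = {
    val numPartitions = 8 // recommended: >= number of brokers
    Seq("billing", "search", "audit").foreach { g =>
      println(s"$g -> partition ${partitionFor(g, numPartitions)}")
    }
  }
}
{code}

Since each partition has its own leader broker, different groups' commit 
traffic lands on different brokers, subject to the caveat above.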

Thanks for all your comments, [~criccomini]!
                
> Implement an Offset Manager and hook offset requests to it
> ----------------------------------------------------------
>
>                 Key: KAFKA-1012
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1012
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: consumer
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Minor
>         Attachments: KAFKA-1012.patch, KAFKA-1012-v2.patch
>
>
> After KAFKA-657, we have a protocol for consumers to commit and fetch offsets 
> from brokers. Currently, consumers are not using this API and are talking 
> directly to Zookeeper. 
> This Jira will involve the following:
> 1. Add a special topic in Kafka for storing offsets
> 2. Add an OffsetManager interface which would handle storing, accessing, 
> loading and maintaining consumer offsets
> 3. Implement offset managers for both choices: the existing ZK-based storage 
> and the inbuilt storage for offsets
> 4. Leader brokers would now maintain an additional hash table of offsets for 
> the group-topic-partitions that they lead
> 5. Consumers should now use the OffsetCommit and OffsetFetch APIs
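
As a rough illustration of points 2-4 above, here is a hypothetical shape for 
the OffsetManager interface and its inbuilt implementation; the names and 
signatures are assumptions, not the attached patch's code:

{code}
// Hypothetical shape of the OffsetManager interface from points 2-4 above;
// names and signatures are assumptions, not the attached patch's code.
object OffsetManagerSketch {
  trait OffsetManager {
    def putOffset(group: String, topic: String, partition: Int, offset: Long): Unit
    def getOffset(group: String, topic: String, partition: Int): Option[Long]
  }

  // Inbuilt storage: the leader broker's in-memory hash table (point 4).
  class InbuiltOffsetManager extends OffsetManager {
    private val table = scala.collection.mutable.Map[(String, String, Int), Long]()
    def putOffset(group: String, topic: String, partition: Int, offset: Long): Unit =
      table((group, topic, partition)) = offset
    def getOffset(group: String, topic: String, partition: Int): Option[Long] =
      table.get((group, topic, partition))
  }

  // A ZK-based manager would implement the same trait against the existing
  // /consumers/<group>/offsets/<topic>/<partition> paths.

  def main(args: Array[String]): Unit = {
    val mgr = new InbuiltOffsetManager
    mgr.putOffset("group1", "topicA", 0, 42L)
    println(mgr.getOffset("group1", "topicA", 0)) // Some(42)
  }
}
{code}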
