Jay Kreps created KAFKA-642:
-------------------------------

             Summary: Protocol tweaks for 0.8
                 Key: KAFKA-642
                 URL: https://issues.apache.org/jira/browse/KAFKA-642
             Project: Kafka
          Issue Type: Bug
            Reporter: Jay Kreps


There are a couple of things in the protocol that are not ideal. It would be 
good to tweak these for 0.8 so we start clean.

Here is a set of problems and proposals:

Problems:
1. The correlation id is not present in all of the requests. I don't think it 
can work as intended because of this.
2. On reflection, I am not sure that we need a correlation id field. Since we 
need to guarantee that processing is sequential on any particular socket 
anyway, we can correlate with a simple queue: as the client sends requests it 
adds them to a queue, and as it receives responses it matches each one to 
whatever is at the head of the queue.
3. The metadata response seems to have a number of problems. Among them is that 
it repeats all the broker information many times. The response includes the 
ISR, the leader (if there is one), and the replicas, and each of these repeats 
all the broker information. This is super weird. I think what we should be 
doing here is including the information for every broker once and then just 
having the appropriate broker ids for the ISR, leader, and replicas.
4. For topic discovery I think we need to support the case where no topics are 
specified in the metadata request and for this return information about all 
topics. I don't think we do this now.
5. I don't understand what the creator id is.
6. The offset request and response are not fully thought through and should be 
generalized.
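Problems 1 and 2 contrast two ways of pairing responses with requests. A minimal Python sketch of both follows; the class and method names are hypothetical and not part of any Kafka client API, and the queue version assumes, as problem 2 does, that responses on a socket come back in the order the requests were sent.

```python
from collections import deque
import itertools


class QueueCorrelator:
    """Problem 2: match responses purely by order, with no correlation id.

    Relies on the server answering requests on one socket strictly in the
    order they were sent (an assumption, as stated in problem 2).
    """

    def __init__(self):
        self._in_flight = deque()

    def on_send(self, request):
        self._in_flight.append(request)  # remember requests in send order

    def on_receive(self, response):
        if not self._in_flight:
            raise RuntimeError("response received with no outstanding request")
        # The oldest outstanding request is the one this response answers.
        return self._in_flight.popleft(), response


class IdCorrelator:
    """Problem 1 / proposals 1, 2: tag each request with an id the server echoes."""

    def __init__(self):
        self._ids = itertools.count()
        self._pending = {}

    def on_send(self, request):
        correlation_id = next(self._ids)
        self._pending[correlation_id] = request
        return correlation_id  # would be written into the request header

    def on_receive(self, correlation_id, response):
        # The echoed id identifies the request; it can also be logged on both
        # sides to trace a single request from client to server.
        return self._pending.pop(correlation_id), response


q = QueueCorrelator()
q.on_send("metadata")
q.on_send("fetch")
print(q.on_receive("metadata-response"))  # ('metadata', 'metadata-response')

ids = IdCorrelator()
a = ids.on_send("metadata")
b = ids.on_send("fetch")
print(ids.on_receive(b, "fetch-response"))  # ('fetch', 'fetch-response')
```

The queue version needs no extra field on the wire but depends entirely on the per-socket ordering guarantee; the id version does not depend on order, but it only works if every request type carries the id, which is the gap problem 1 points out.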

Proposals:
1, 2. Correlation id. This is not strictly speaking needed, but it may be 
useful for debugging to be able to trace a particular request from client to 
server. So we will extend it to all of the requests.
3. For the metadata response I will try to fix this up by normalizing out the 
broker list and having the ISR, replicas, and leader fields contain just the 
node ids.
4. This should be uncontroversial and easy to add.
5. Let's remove the creator id; it isn't used.
6. Let's generalize the offset request. My proposal is below:

Rename the TopicMetadata API to ClusterMetadata, as it will contain all the 
data that is known cluster-wide. Then let's generalize the offset request into 
PartitionMetadata, namely information about a particular partition on a 
particular server.
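As a rough illustration of proposals 3 and 4, the renamed ClusterMetadata response could be modelled as below. This is a Python sketch of the data shape, not a wire format: the field names, the host/port fields on Broker, and the `build_cluster_metadata` helper are all hypothetical. Each broker appears once, partitions refer to brokers only by node id, and an empty topic list selects every topic; topics the sketch does not know about are simply skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Broker:
    node_id: int
    host: str  # hypothetical field
    port: int  # hypothetical field


@dataclass
class PartitionInfo:
    partition_id: int
    leader: Optional[int]  # node id, or None when there is no leader
    replicas: List[int]    # node ids only (proposal 3)
    isr: List[int]         # node ids only (proposal 3)


@dataclass
class ClusterMetadataResponse:
    brokers: List[Broker]  # each broker described exactly once
    topics: Dict[str, List[PartitionInfo]] = field(default_factory=dict)

    def broker(self, node_id: int) -> Broker:
        # Resolve a node id from leader/replicas/isr back to the full record.
        for b in self.brokers:
            if b.node_id == node_id:
                return b
        raise KeyError(node_id)


def build_cluster_metadata(requested_topics, brokers, partitions_by_topic):
    """Hypothetical helper: an empty topic list means "all topics" (proposal 4)."""
    names = requested_topics or sorted(partitions_by_topic)
    return ClusterMetadataResponse(
        brokers=list(brokers),
        # Unknown topics are skipped in this sketch.
        topics={n: partitions_by_topic[n] for n in names if n in partitions_by_topic},
    )


brokers = [Broker(1, "broker1", 9092), Broker(2, "broker2", 9092)]
state = {"clicks": [PartitionInfo(0, leader=1, replicas=[1, 2], isr=[1, 2])]}
resp = build_cluster_metadata([], brokers, state)
print(resp.broker(resp.topics["clicks"][0].leader).host)  # broker1
```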

The format of PartitionMetadata would be the following:

PartitionMetadataRequest => [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]]
  TopicName => string
  PartitionId => uint32
  MinSegmentTime => uint64
  MaxSegmentInfos => int32
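The request grammar above could be serialized roughly as follows. This Python sketch assumes big-endian integers, strings encoded as an int16 byte length followed by UTF-8 bytes, and arrays prefixed by an int32 element count; none of those encoding details are stated in this proposal, and `encode_partition_metadata_request` is a hypothetical helper.

```python
import struct


def encode_string(s):
    # Assumed encoding: int16 byte length followed by the UTF-8 bytes.
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_partition_metadata_request(topics):
    """Encode [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]].

    `topics` maps a topic name to a list of
    (partition_id, min_segment_time, max_segment_infos) tuples.
    """
    parts = [struct.pack(">i", len(topics))]              # outer array count
    for topic, partitions in topics.items():
        parts.append(encode_string(topic))                # TopicName
        parts.append(struct.pack(">i", len(partitions)))  # inner array count
        for partition_id, min_segment_time, max_segment_infos in partitions:
            # PartitionId uint32, MinSegmentTime uint64, MaxSegmentInfos int32
            parts.append(struct.pack(">IQi", partition_id, min_segment_time,
                                     max_segment_infos))
    return b"".join(parts)


request = encode_partition_metadata_request({"clicks": [(0, 0, 10)]})
print(len(request))  # 32
```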

PartitionMetadataResponse => [TopicName [PartitionMetadata]]
  TopicName => string
  PartitionMetadata => PartitionId LogSize NumberOfSegments LogEndOffset HighwaterMark [SegmentData]
  SegmentData => StartOffset LastModifiedTime
  LogSize => uint64
  NumberOfSegments => int32
  LogEndOffset => int64
  HighwaterMark => int64

This would be general enough that we could continue to add to it for any new 
pieces of data we need.
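A matching decoder for the response grammar, under the same encoding assumptions (big-endian, int16-length strings, int32-count arrays). The proposal does not give types for PartitionId in the response or for the SegmentData fields, so this sketch assumes PartitionId is a uint32 as in the request and that StartOffset and LastModifiedTime are both int64. `Reader` and `decode_partition_metadata_response` are hypothetical helpers.

```python
import struct


class Reader:
    """Minimal big-endian cursor over a response buffer."""

    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def read(self, fmt):
        values = struct.unpack_from(fmt, self.buf, self.pos)
        self.pos += struct.calcsize(fmt)
        return values if len(values) > 1 else values[0]

    def read_string(self):
        length = self.read(">h")  # assumed int16 length prefix
        data = self.buf[self.pos:self.pos + length]
        self.pos += length
        return data.decode("utf-8")


def decode_partition_metadata_response(buf):
    r = Reader(buf)
    result = {}
    for _ in range(r.read(">i")):  # [TopicName [PartitionMetadata]]
        topic = r.read_string()
        partitions = []
        for _ in range(r.read(">i")):
            # PartitionId LogSize NumberOfSegments LogEndOffset HighwaterMark
            partition_id, log_size, num_segments, log_end, hw = r.read(">IQiqq")
            # [SegmentData], each assumed to be (StartOffset, LastModifiedTime)
            segments = [r.read(">qq") for _ in range(r.read(">i"))]
            partitions.append({
                "partition_id": partition_id,
                "log_size": log_size,
                "number_of_segments": num_segments,
                "log_end_offset": log_end,
                "highwater_mark": hw,
                "segments": segments,
            })
        result[topic] = partitions
    return result


sample = (struct.pack(">i", 1) + struct.pack(">h", 6) + b"clicks"
          + struct.pack(">i", 1)
          + struct.pack(">IQiqq", 0, 2048, 1, 100, 95)
          + struct.pack(">i", 1) + struct.pack(">qq", 0, 1356998400000))
print(decode_partition_metadata_response(sample))
```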
