Those are good questions. See my answers inlined below.



On Fri, Jul 18, 2014 at 1:33 PM, shweta khare <>

> hi,
> I have the following doubts regarding some kafka config parameters:
> For example, if I have a Throughput topic with replication factor 1 and a
> single partition 0, then I will see the following files under
> /tmp/kafka-logs/Throughput_0:
> 00000000000000000000.index
> 00000000000000000000.log
> 00000000000070117826.index
> 00000000000070117826.log
> 1) * <>:*
> The period of time we hold log files around after they are removed from the
> > *index*. This period of time allows any in-progress reads to complete
> > uninterrupted without locking. [6000]
>  In the above description, does “*index*” refer to the in-memory
> segment list and not the 00000****.index file (in the example above)?
> As per documentation, kafka maintains an in-memory segment list:
>  To enable read operations, kafka maintains an in-memory range(segment
> > list) for each file. To avoid locking reads while still allowing deletes
> > that modify the segment list we use a copy-on-write style segment list
> > implementation that provides consistent views to allow a binary search to
> > proceed on an immutable static snapshot view of the log segments while
> > deletes are progressing.

Yes, this refers to the in-memory segment list, not the .index file.
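
To make the copy-on-write idea concrete, here is a minimal sketch in Python. It is not Kafka's implementation: the SegmentList class and find_segment function are hypothetical names, and the only values taken from this thread are the two base offsets from the file names above. It shows why a reader holding a snapshot is unaffected by a delete, which is what the delay before removing the files relies on.

```python
import bisect


class SegmentList:
    """Toy copy-on-write list of segment base offsets (illustration only)."""

    def __init__(self, base_offsets):
        # An immutable, sorted tuple is the snapshot that readers work on.
        self._snapshot = tuple(sorted(base_offsets))

    def view(self):
        # A reader takes the current tuple once; that tuple never changes.
        return self._snapshot

    def delete_oldest(self):
        # A delete builds a new tuple instead of mutating the old one.
        old = self._snapshot
        self._snapshot = old[1:]
        # The caller would keep this segment's files around for the
        # configured delay before actually removing them from disk.
        return old[0]


def find_segment(snapshot, offset):
    """Binary-search a snapshot for the segment that holds `offset`."""
    i = bisect.bisect_right(snapshot, offset) - 1
    if i < 0:
        raise KeyError(offset)
    return snapshot[i]


segments = SegmentList([0, 70117826])
reader_view = segments.view()        # an in-progress read takes a snapshot
removed = segments.delete_oldest()   # a delete happens while the read runs
```

The reader's snapshot still resolves offsets in the removed segment, while a new reader calling segments.view() only sees the remaining one.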

> 2) *socket.request.max.bytes: *The maximum request size the server will
> allow.
> how is this different from message.max.bytes (The maximum size of a message
> that the server can receive.)

A request can consist of data from multiple topic partitions and therefore
can contain many messages. A request bigger than socket.request.max.bytes
will be rejected.
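
As a rough illustration of the two limits, the sketch below treats a request as a map from (topic, partition) to a list of message payloads. The check_request function, the "Other" topic, and the tiny byte limits are all made up for the example, and real request sizes also include protocol headers, which are ignored here.

```python
def check_request(request, message_max_bytes, socket_request_max_bytes):
    """Apply a per-request limit and a per-message limit (simplified).

    `request` maps (topic, partition) -> list of message payloads (bytes).
    """
    # The request limit applies to everything in the request combined.
    total = sum(len(m) for msgs in request.values() for m in msgs)
    if total > socket_request_max_bytes:
        return "request rejected"
    # The message limit applies to each message on its own.
    for msgs in request.values():
        for m in msgs:
            if len(m) > message_max_bytes:
                return "message rejected"
    return "accepted"


# Three 8-byte messages spread over two topic partitions: 24 bytes in total.
request = {
    ("Throughput", 0): [b"a" * 8, b"b" * 8],
    ("Other", 0): [b"c" * 8],
}

ok = check_request(request, message_max_bytes=10, socket_request_max_bytes=25)
too_big_request = check_request(request, message_max_bytes=10, socket_request_max_bytes=20)
too_big_message = check_request({("Throughput", 0): [b"a" * 12]}, 10, 25)
```

In the second call every message is within the message limit, yet the request as a whole is over the request limit.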

> 3) * <>: *
> > The maximum amount of time the *server *will block before answering the
> > fetch request if there isn't sufficient data to immediately satisfy
> > fetch.min.bytes
> Does the server above refer to kafka consumer, which will block for
> How is different from *
> <>* ?
> is used in the server and is used in
the consumer client in case the server doesn't send a response in time. should be larger than

> 4) Is there any correlation between a producer's
> *queue.buffering.max.messages* and  *send.buffer.bytes? *

Not directly. The former caps how many unsent messages the async producer can
hold in its queue (batch.num.messages sets how many are grouped into one
produce request), and the latter controls the socket buffer size.

> 5) Will batching not happen in case producer.type=async and
> request.required.acks=1 or -1 ? Since next message will only be sent after
> an ack is received from leader/all ISR replicas?

Batching is independent of the ack mode. We simply group multiple messages
into a single produce request. The ack mode is used in the produce request.
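
A small sketch of that separation, assuming a hypothetical build_produce_requests helper and a dict standing in for a produce request (neither is part of the Kafka client): the grouping depends only on the batch size, and the ack mode is just a field carried by each request.

```python
def build_produce_requests(messages, batch_size, required_acks):
    """Group messages into batches; attach the ack mode to each request."""
    requests = []
    for start in range(0, len(messages), batch_size):
        requests.append({
            "required_acks": required_acks,  # how the broker should acknowledge
            "messages": messages[start:start + batch_size],
        })
    return requests


messages = ["m0", "m1", "m2", "m3", "m4"]
with_all_acks = build_produce_requests(messages, batch_size=2, required_acks=-1)
with_no_acks = build_produce_requests(messages, batch_size=2, required_acks=0)
```

Both calls produce the same three batches; only the required_acks value in each request differs.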

> 6) *
> <>: *
> After every 10 mins I see the following on my producer side:
> 1200483 [main] INFO  kafka.client.ClientUtils$  - Fetching metadata from
> broker id:0,host:localhost,port:9092 with correlation id 15078270 for 1
> topic(s) Set(Throughput)
> 1200484 [main] INFO  kafka.producer.SyncProducer  - Connected to
> localhost:9092 for producing
> 1200486 [main] INFO  kafka.producer.SyncProducer  - Disconnecting from
> localhost:9092
> 1200486 [main] INFO  kafka.producer.SyncProducer  - Disconnecting from
> sdp08:9092
> 1200487 [main] INFO  kafka.producer.SyncProducer  - Connected to sdp08:9092
> for producing
> Why is there a disconnection and re-connection happening on each metadata
> refresh even though the leader is alive? I have noticed that I lose some
> messages when this happens (with request.required.acks=0).

Yes, currently we close the connection after issuing a metadata request to
avoid holding idle connections open. Refreshing metadata periodically is
useful for picking up changes such as an increase in the number of partitions
in a topic. The data loss you saw is related to acks=0: the producer does not
wait for an acknowledgement, so it cannot detect or retry a failed send.

> thank you,
> shweta
