Re: Some doubts regarding kafka config parameters

Jun Rao Mon, 21 Jul 2014 08:06:37 -0700

Those are good questions. See my answers inlined below.

Thanks,

Jun


On Fri, Jul 18, 2014 at 1:33 PM, shweta khare <shweta.p.kh...@gmail.com>
wrote:

> Hi,
>
> I have the following doubts regarding some kafka config parameters:
>
> For example, if I have a Throughput topic with replication factor 1 and a
> single partition 0, then I see the following files under
> /tmp/kafka-logs/Throughput_0:
>
> 00000000000000000000.index
> 00000000000000000000.log
>
> 00000000000070117826.index
> 00000000000070117826.log
>
>
> 1) *log.delete.delay.ms*:
>
> > The period of time we hold log files around after they are removed from the
> > *index*. This period of time allows any in-progress reads to complete
> > uninterrupted without locking. [6000]
>
> In the above description, does “*index*” refer to the in-memory
> segment list and not the 00000****.index file (in the example above)?
>
> According to the documentation, Kafka maintains an in-memory segment list:
>
> > To enable read operations, Kafka maintains an in-memory range (segment
> > list) for each file. To avoid locking reads while still allowing deletes
> > that modify the segment list we use a copy-on-write style segment list
> > implementation that provides consistent views to allow a binary search to
> > proceed on an immutable static snapshot view of the log segments while
> > deletes are progressing.
>


Yes, this refers to the in-memory segment list, not the .index file.
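To make the copy-on-write idea concrete, here is a minimal Python sketch under a much simplified model: the segment list is just a sorted tuple of base offsets (0 and 70117826 from the example above), readers binary-search whatever snapshot they grabbed, and a removed segment's files are only queued for deletion once log.delete.delay.ms has elapsed. The class and method names are hypothetical; this is not Kafka's actual implementation.

```python
import bisect
import threading


class SegmentList:
    """Copy-on-write list of segment base offsets (hypothetical sketch)."""

    def __init__(self, base_offsets):
        # Immutable tuple: readers can binary-search it without taking a lock.
        self._segments = tuple(sorted(base_offsets))
        self._lock = threading.Lock()  # serializes writers only
        self._pending = []  # (delete_at_ms, base_offset) of removed segments

    def snapshot(self):
        """Return the current immutable view; safe to use while deletes proceed."""
        return self._segments

    def find_segment(self, offset):
        """Binary-search a snapshot for the segment whose range holds `offset`."""
        view = self.snapshot()
        i = bisect.bisect_right(view, offset) - 1
        if i < 0:
            raise KeyError(f"offset {offset} precedes the first segment")
        return view[i]

    def remove_oldest(self, now_ms, delete_delay_ms):
        """Drop the oldest segment from the list now; delete its files later."""
        with self._lock:
            oldest = self._segments[0]
            self._segments = self._segments[1:]  # swap in a new tuple, never mutate
            self._pending.append((now_ms + delete_delay_ms, oldest))
        return oldest

    def files_due_for_deletion(self, now_ms):
        """Return base offsets whose delete delay has elapsed, and forget them."""
        with self._lock:
            due = [b for t, b in self._pending if t <= now_ms]
            self._pending = [(t, b) for t, b in self._pending if t > now_ms]
        return due


segments = SegmentList([0, 70117826])
reader_view = segments.snapshot()             # an in-progress read holds this view
segments.remove_oldest(now_ms=0, delete_delay_ms=6000)
print(segments.find_segment(80000000))        # 70117826
print(reader_view)                            # (0, 70117826): old view is untouched
print(segments.files_due_for_deletion(5999))  # []
print(segments.files_due_for_deletion(6000))  # [0]
```

Because removal swaps in a new tuple rather than mutating the old one, a reader holding the old snapshot can finish its lookup, and the delay gives it time to finish reading the file before the file itself goes away.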

>
>
>
> 2) *socket.request.max.bytes*: The maximum request size the server will
> allow.
>
> How is this different from message.max.bytes (the maximum size of a message
> that the server can receive)?
>

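A request can consist of data from multiple topic partitions and therefore
can contain many messages. message.max.bytes limits the size of each
individual message, whereas socket.request.max.bytes limits the size of the
request as a whole. A request bigger than socket.request.max.bytes will be
rejected.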
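A rough sketch of how the two limits differ, assuming a hypothetical broker-side check in which sizes are plain payload byte counts (a real request also carries headers and per-message overhead). check_request is an illustrative name, not a Kafka API.

```python
def check_request(messages_by_partition, message_max_bytes, socket_request_max_bytes):
    """Apply both limits to one produce request (simplified, hypothetical).

    messages_by_partition maps (topic, partition) -> list of message payloads.
    Returns "ok", "request too large", or "message too large for <partition>".
    """
    # socket.request.max.bytes bounds the whole request, across all partitions.
    total = sum(len(m) for msgs in messages_by_partition.values() for m in msgs)
    if total > socket_request_max_bytes:
        return "request too large"
    # message.max.bytes bounds each individual message.
    for tp, msgs in messages_by_partition.items():
        for m in msgs:
            if len(m) > message_max_bytes:
                return f"message too large for {tp}"
    return "ok"


# Three 600-byte messages spread over two partitions: each message is small,
# but together they exceed a 1500-byte request limit.
request = {("Throughput", 0): [b"x" * 600, b"y" * 600], ("Other", 1): [b"z" * 600]}
print(check_request(request, message_max_bytes=1000, socket_request_max_bytes=4096))  # ok
print(check_request(request, message_max_bytes=1000, socket_request_max_bytes=1500))  # request too large
```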


>
> 3) *fetch.wait.max.ms*:
>
> > The maximum amount of time the *server* will block before answering the
> > fetch request if there isn't sufficient data to immediately satisfy
> > fetch.min.bytes
>
> Does the *server* above refer to the Kafka consumer, which will block for
> fetch.wait.max.ms? How is fetch.wait.max.ms different from
> *consumer.timeout.ms*?
>

fetch.wait.max.ms is used in the server, and consumer.timeout.ms is used in
the consumer client in case the server doesn't send a response in time.
Since the server may legitimately hold a fetch for up to fetch.wait.max.ms,
consumer.timeout.ms should be larger than fetch.wait.max.ms.
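A toy timeline model of the answer above, assuming a fetch issued at t=0: the broker replies once fetch.min.bytes is satisfied or after fetch.wait.max.ms, whichever comes first, and the client gives up after consumer.timeout.ms. The function names are hypothetical and the model ignores network latency.

```python
def broker_reply_ms(data_ready_ms, fetch_wait_max_ms):
    """When the broker answers a fetch sent at t=0.

    It replies as soon as fetch.min.bytes worth of data is available
    (data_ready_ms), or after fetch.wait.max.ms, whichever comes first.
    data_ready_ms=None means no new data arrives during the wait.
    """
    if data_ready_ms is None:
        return fetch_wait_max_ms
    return min(data_ready_ms, fetch_wait_max_ms)


def client_gives_up(data_ready_ms, fetch_wait_max_ms, consumer_timeout_ms):
    """True if the client times out before the broker's reply arrives."""
    return consumer_timeout_ms < broker_reply_ms(data_ready_ms, fetch_wait_max_ms)


# Idle topic, fetch.wait.max.ms=100: a 500 ms client timeout outlasts the
# broker's empty reply, while a 50 ms timeout fires although the broker is healthy.
print(client_gives_up(None, fetch_wait_max_ms=100, consumer_timeout_ms=500))  # False
print(client_gives_up(None, fetch_wait_max_ms=100, consumer_timeout_ms=50))   # True
```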


>
> 4) Is there any correlation between a producer's
> *queue.buffering.max.messages* and *send.buffer.bytes*?
>
>
The former controls how many messages are grouped into a produce request
and the latter controls the socket buffer size.
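To illustrate that the two settings are measured in different units, here is a small sketch, assuming that grouping is count-based (group_by_count is a hypothetical helper) and that a socket send buffer size maps to the standard SO_SNDBUF socket option. It uses only Python's standard socket module and is not Kafka's producer code.

```python
import socket


def group_by_count(messages, max_messages):
    """Split queued messages into groups of at most max_messages (a message count)."""
    return [messages[i:i + max_messages] for i in range(0, len(messages), max_messages)]


def producer_socket(send_buffer_bytes):
    """Create a TCP socket and request a kernel send buffer of send_buffer_bytes."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer_bytes)
    return s


print(group_by_count(list(range(7)), 3))  # [[0, 1, 2], [3, 4, 5], [6]]

with producer_socket(100 * 1024) as sock:
    # The OS may round or scale the value (Linux, for example, reports double).
    print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
```

Changing the group size does not touch the socket, and changing the buffer size does not change how many messages end up in a group.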


> 5) Does batching not happen when producer.type=async and
> request.required.acks=1 or -1, since the next message will only be sent
> after an ack is received from the leader/all ISR replicas?
>
>
Batching is independent of the ack mode. We simply group multiple messages
into a single produce request, and the ack mode is specified in that produce
request, so the ack covers the whole batch rather than each message.
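A minimal sketch of batching with the ack mode attached per request, assuming a hypothetical ProduceRequest type and batch_size parameter: the batches come out the same whatever request.required.acks is set to, and the producer would wait for one ack per request, not one per message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProduceRequest:
    required_acks: int  # request.required.acks: 0, 1 or -1
    messages: tuple     # every message grouped into this one request


def build_requests(queued, batch_size, required_acks):
    """Group queued messages into requests; the ack mode rides on each request."""
    return [
        ProduceRequest(required_acks, tuple(queued[i:i + batch_size]))
        for i in range(0, len(queued), batch_size)
    ]


for acks in (0, 1, -1):
    reqs = build_requests(["m1", "m2", "m3", "m4", "m5"], batch_size=2, required_acks=acks)
    # Same grouping regardless of acks: three requests holding 2, 2 and 1 messages.
    print(acks, [len(r.messages) for r in reqs])
```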


> 6) *topic.metadata.refresh.interval.ms*:
> Every 10 minutes I see the following on my producer side:
>
> 1200483 [main] INFO  kafka.client.ClientUtils$  - Fetching metadata from broker id:0,host:localhost,port:9092 with correlation id 15078270 for 1 topic(s) Set(Throughput)
> 1200484 [main] INFO  kafka.producer.SyncProducer  - Connected to localhost:9092 for producing
> 1200486 [main] INFO  kafka.producer.SyncProducer  - Disconnecting from localhost:9092
> 1200486 [main] INFO  kafka.producer.SyncProducer  - Disconnecting from sdp08:9092
> 1200487 [main] INFO  kafka.producer.SyncProducer  - Connected to sdp08:9092 for producing
>
> Why is there a disconnection and reconnection on each metadata refresh even
> though the leader is alive? I have noticed that I lose some messages when
> this happens (with request.required.acks=0).
>

Yes, currently we close the connection after issuing metadata requests to
avoid keeping idle connections open. Refreshing metadata periodically is
useful for picking up changes such as an increase in the number of partitions
in a topic. The data loss you saw is related to acks=0; see the explanation in
http://kafka.apache.org/documentation.html#producerconfigs for details.
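A simulated sketch of the refresh cycle, assuming the refresh fires roughly every topic.metadata.refresh.interval.ms (600000 ms, the 10 minutes observed above) measured from producer start, and ignoring the initial metadata fetch at startup. The helper names are hypothetical. The 1200483 ms timestamp in the log lines up with the second refresh at 1200000 ms, and the event list simply mirrors the connect/disconnect sequence in that log.

```python
def refresh_times(run_ms, interval_ms):
    """Approximate times (ms since start) at which periodic refreshes fire."""
    return list(range(interval_ms, run_ms + 1, interval_ms))


def refresh_events(broker, leader):
    """Connection events for one refresh, in the order the quoted log shows."""
    return [
        f"connect {broker}",     # connection opened for the metadata request
        f"disconnect {broker}",  # closed right away so it does not sit idle
        f"disconnect {leader}",  # the log also shows the producing connection closing
        f"connect {leader}",     # ...and being reopened to the same leader
    ]


interval_ms = 600_000  # topic.metadata.refresh.interval.ms = 10 minutes
print(refresh_times(1_300_000, interval_ms))  # [600000, 1200000]
for event in refresh_events("localhost:9092", "sdp08:9092"):
    print(event)
```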

>
> thank you,
> shweta
>
