Sorry, I should have read the release notes before I asked this question.
The answer was in there.

> Internally the implementation of the offset storage is just a compacted
> <http://kafka.apache.org/documentation.html#compaction> Kafka topic
> (__consumer_offsets) keyed on the consumer’s group, topic, and partition.
> The offset commit request writes the offset to the compacted Kafka topic
> using the highest level of durability guarantee that Kafka provides
> (acks=-1) so that offsets are never lost in the presence of uncorrelated
> failures. Kafka maintains an in-memory view of the latest offset per
> <consumer group, topic, partition> triplet, so offset fetch requests can be
> served quickly without requiring a full scan of the compacted offsets
> topic. With this feature, consumers can checkpoint offsets very often,
> possibly per message.
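
To make the description above concrete, here is a small conceptual sketch
(this is not Kafka's actual code; the class and method names are invented
for illustration) of how replaying a compacted, keyed log produces an
in-memory view of the latest offset per <group, topic, partition> triplet:

    import java.util.HashMap;
    import java.util.Map;

    // Toy model of the broker-side offset cache: later entries in the
    // compacted __consumer_offsets log overwrite earlier ones for the
    // same <group, topic, partition> key, so only the latest commit
    // needs to be kept in memory.
    class OffsetCacheSketch {
        private final Map<String, Long> latest = new HashMap<>();

        private static String key(String group, String topic, int partition) {
            return group + "/" + topic + "/" + partition;
        }

        // Called while replaying (or appending to) the compacted log.
        void commit(String group, String topic, int partition, long offset) {
            latest.put(key(group, topic, partition), offset);
        }

        // Offset fetches are served from memory; no log scan required.
        Long fetch(String group, String topic, int partition) {
            return latest.get(key(group, topic, partition));
        }
    }

Because the topic is compacted on that key, the log itself eventually
retains only the most recent commit per triplet, which is exactly what the
in-memory view holds.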



On Sun, Feb 8, 2015 at 9:39 AM, Christopher Piggott <cpigg...@gmail.com>
wrote:

> > The consumer used ZooKeeper to store offsets; in 0.8.2 there's an option
> > to use Kafka itself for that (by setting *offsets.storage = kafka*).
>
> Does it still really live in ZooKeeper, with Kafka proxying the requests
> through?
>
> On Sun, Feb 8, 2015 at 9:25 AM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
>> Hi Eduardo,
>>
>> 1. "Why sometimes the applications prefer to connect to zookeeper instead
>> brokers?"
>>
>> I assume you are talking about the clients and some of our tools?
>> These are parts of an older design and we are actively working on fixing
>> this. The consumer used Zookeeper to store offsets, in 0.8.2 there's an
>> option to use Kafka itself for that (by setting *offsets.storage =
>> kafka*).
>> We are planning on fixing the tools in 0.9, but obviously they are less
>> performance sensitive than the consumers.
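
For reference, a minimal sketch of what that setting looks like with the
0.8.2 high-level consumer API; the group name, ZooKeeper address, and class
name are placeholders, and auto-commit is disabled here only so the
explicit commit is visible:

    import java.util.Properties;
    import kafka.consumer.Consumer;
    import kafka.consumer.ConsumerConfig;
    import kafka.javaapi.consumer.ConsumerConnector;

    public class KafkaOffsetStorageExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("zookeeper.connect", "zk1:2181");  // still used for group membership in 0.8.2
            props.put("group.id", "example-group");
            props.put("offsets.storage", "kafka");       // commit offsets to __consumer_offsets instead of ZooKeeper
            props.put("dual.commit.enabled", "false");   // "true" while migrating an existing group
            props.put("auto.commit.enable", "false");    // commit explicitly below

            ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

            // ... create message streams and consume ...

            connector.commitOffsets();  // offsets now go to the compacted Kafka topic
        }
    }

With *dual.commit.enabled=true* the consumer writes offsets to both
ZooKeeper and Kafka, which lets an existing group migrate without losing
its position.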
>>
>> 2. Regarding your tests and disk usage - I'm not sure exactly what fills
>> your disk. If it's the Kafka message logs (i.e. log.dir), then we expect
>> to store the size of all messages sent times the replication factor
>> configured for each topic. We keep messages for the amount of time
>> specified in the *log.retention* parameters. If the disk is filled within
>> minutes, either set log.retention.minutes very low (at the risk of losing
>> data if consumers need to restart), or make sure your disk capacity
>> matches the rate at which producers send data.
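
To put rough numbers on that: if producers send 50 MB/s and the replication
factor is 3, the cluster writes about 150 MB/s to disk, so 250 GB of total
log space fills in roughly 250,000 MB / 150 MB/s ≈ 28 minutes. Below is an
illustrative broker-configuration excerpt that caps retention; the values
are placeholders to be sized against your own traffic, not recommendations:

    # Example broker settings (server.properties); values are illustrative only.
    log.retention.minutes=30          # delete log segments older than 30 minutes
    log.retention.bytes=10737418240   # and/or cap each partition's log at ~10 GB
    log.segment.bytes=1073741824      # 1 GB segments, so old data becomes deletable promptly

Note that log.retention.bytes applies per partition, so the effective cap
is that value times the number of partitions (and the replication factor,
across the cluster).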
>>
>> Gwen
>>
>>
>> On Sat, Feb 7, 2015 at 3:01 AM, Eduardo Costa Alfaia <
>> e.costaalf...@unibs.it> wrote:
>>
>> > Hi Guys,
>> >
>> > I have some doubts about Kafka. The first is: why do applications
>> > sometimes prefer to connect to ZooKeeper instead of the brokers?
>> > Connecting to ZooKeeper could create overhead, because we are inserting
>> > another element between producer and consumer. Another question is
>> > about the data sent by the producer: in my tests the producer sends
>> > messages to the brokers and within a few minutes my hard disk is full
>> > (my hard disk has 250 GB). Is there something I can do in the
>> > configuration to minimize this?
>> >
>> > Thanks
>> > --
>> > Privacy notice: http://www.unibs.it/node/8155
>> >
>>
>
>
