Sorry, I should have read the release notes before I asked this question. The answer was in there.
> Internally, the implementation of the offset storage is just a compacted
> (<http://kafka.apache.org/documentation.html#compaction>) Kafka topic
> (__consumer_offsets) keyed on the consumer's group, topic, and partition.
> The offset commit request writes the offset to the compacted Kafka topic
> using the highest level of durability guarantee that Kafka provides
> (acks=-1), so that offsets are never lost in the presence of uncorrelated
> failures. Kafka maintains an in-memory view of the latest offset per
> <consumer group, topic, partition> triplet, so offset fetch requests can be
> served quickly without requiring a full scan of the compacted offsets
> topic. With this feature, consumers can checkpoint offsets very often,
> possibly per message.
>
> On Sun, Feb 8, 2015 at 9:39 AM, Christopher Piggott <cpigg...@gmail.com> wrote:
>
> > The consumer used Zookeeper to store offsets; in 0.8.2 there's an option
> > to use Kafka itself for that (by setting *offsets.storage = kafka*).
> >
> > Does it still really live in zookeeper, with kafka proxying the requests
> > through?
> >
> > On Sun, Feb 8, 2015 at 9:25 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
> >
> > > Hi Eduardo,
> > >
> > > 1. "Why sometimes the applications prefer to connect to zookeeper
> > > instead of brokers?"
> > >
> > > I assume you are talking about the clients and some of our tools?
> > > These are parts of an older design and we are actively working on
> > > fixing this. The consumer used Zookeeper to store offsets; in 0.8.2
> > > there's an option to use Kafka itself for that (by setting
> > > *offsets.storage = kafka*).
> > > We are planning on fixing the tools in 0.9, but obviously they are
> > > less performance sensitive than the consumers.
> > >
> > > 2. Regarding your tests and disk usage - I'm not sure exactly what
> > > fills your disk. If it's the Kafka transaction logs (i.e. log.dir),
> > > then we expect to store the size of all messages sent times the
> > > replication factor configured for each topic.
> > > We keep messages for the amount of time specified in the
> > > *log.retention* parameters. If the disk fills within minutes, either
> > > set log.retention.minutes very low (at the risk of losing data if
> > > consumers need to restart), or make sure your disk capacity matches
> > > the rates at which producers send data.
> > >
> > > Gwen
> > >
> > > On Sat, Feb 7, 2015 at 3:01 AM, Eduardo Costa Alfaia <e.costaalf...@unibs.it> wrote:
> > >
> > > > Hi Guys,
> > > >
> > > > I have some doubts about Kafka. The first is: why do some
> > > > applications prefer to connect to zookeeper instead of the brokers?
> > > > Connecting to zookeeper could create overhead, because we are
> > > > inserting another element between producer and consumer. Another
> > > > question is about the information sent by the producer: in my tests
> > > > the producer sends messages to the brokers and within a few minutes
> > > > my hard disk is full (my hard disk has 250GB). Is there something to
> > > > do in the configuration to minimize this?
> > > >
> > > > Thanks
> > > > --
> > > > Informativa sulla Privacy: http://www.unibs.it/node/8155
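
For reference, the 0.8.2 switch Gwen mentions is a high-level consumer property. A minimal sketch of a consumer config using Kafka-based offset storage (values such as the group name and ZooKeeper address are placeholders; `dual.commit.enabled` is the migration aid documented alongside `offsets.storage` for moving off ZooKeeper):

```
group.id=my-group
zookeeper.connect=localhost:2181
# Store offsets in the __consumer_offsets topic instead of ZooKeeper.
offsets.storage=kafka
# During migration, commit to both Kafka and ZooKeeper; disable once all
# consumers in the group have been upgraded.
dual.commit.enabled=true
```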
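
The mechanism described at the top of the thread — a compacted topic keyed on the (group, topic, partition) triplet, plus an in-memory view of the latest committed offset — can be sketched as a toy model. This is not Kafka code, just an illustration of why fetches need no log scan and why compaction keeps the topic small:

```python
# Toy model (not Kafka's implementation) of a compacted offsets topic
# with a materialized in-memory view, as described in the thread above.

class OffsetStore:
    def __init__(self):
        self.log = []     # append-only "topic": (key, offset) records
        self.latest = {}  # in-memory view: key -> latest committed offset

    def commit(self, group, topic, partition, offset):
        key = (group, topic, partition)
        self.log.append((key, offset))  # durable append (acks=-1 in Kafka)
        self.latest[key] = offset       # update the materialized view

    def fetch(self, group, topic, partition):
        # Served from memory; no scan of the log is required.
        return self.latest.get((group, topic, partition))

    def compact(self):
        # Log compaction retains only the newest record per key.
        self.log = list(self.latest.items())

store = OffsetStore()
store.commit("g1", "clicks", 0, 10)
store.commit("g1", "clicks", 0, 42)   # supersedes the earlier commit
store.commit("g1", "clicks", 1, 7)
store.compact()
print(store.fetch("g1", "clicks", 0))  # -> 42
print(len(store.log))                  # -> 2, one record per key survives
```

Because fetches hit only the dictionary, frequent commits (even per message) add log append cost but never slow down reads — the same property the real design is after.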
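
Gwen's sizing point — the log directory holds everything produced during the retention window, once per replica — turns into simple back-of-envelope arithmetic. A sketch with hypothetical numbers (the function name and the example rates are mine, not from the thread):

```python
def required_disk_bytes(msgs_per_sec, avg_msg_bytes, retention_secs,
                        replication_factor):
    """Rough lower bound on cluster-wide log.dir usage: everything
    produced during the retention window is kept once per replica.
    Ignores per-message log overhead and compression."""
    return msgs_per_sec * avg_msg_bytes * retention_secs * replication_factor

# Example: 10,000 msgs/s of 1 KB each, 24 h retention, replication factor 3.
total = required_disk_bytes(10_000, 1_000, 24 * 3600, 3)
print(total / 1e12)  # -> 2.592 (TB across the cluster)
```

Run against Eduardo's 250 GB disk, this kind of estimate shows immediately whether the retention settings or the producer rate is the problem.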