Thank you Christian -- I appreciate your taking the time to help me out on
this.

Here's what I found while continuing to dig into this.

If I take 30024 and subtract the number of messages I know I have in Kafka
(3500) I get 26524.

If I reset thus:

     set /kafka/consumers/myGroupName/offsets/myTopicName/0 26524

... and then re-run my consumer - I get all 3500 messages again.

If I do this:

     set /kafka/consumers/myGroupName/offsets/myTopicName/0 26624

In other words, I increase the offset number by 100 -- then I get exactly
3400 messages on my consumer -- exactly 100 fewer than before, which I think
makes sense, since I started the offset 100 higher...

This seems to suggest that each number between 26524 and 30024 in the log
represents one of my 3500 messages on this topic, but what you say suggests
that they represent a byte count of the actual messages and not "one number
per message"...
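
For what it's worth, the arithmetic above can be sketched in a few lines of
Python. This is only a toy model, not Kafka code: it assumes each offset is a
per-message sequence number. The function name replayable and the clamping of
too-small offsets to the earliest one are assumptions made for illustration
(the clamp mirrors the observation that resetting to 0 or 1 replays
everything, which probably depends on the consumer's offset-reset
configuration).

```python
def replayable(start_offset, earliest, log_end):
    """Count the messages a consumer would receive from start_offset,
    assuming one offset number per message (hypothetical model)."""
    # Assumption: an offset below the earliest retained one is treated
    # as "start from the earliest message".
    start = max(start_offset, earliest)
    # Nothing is left to read once the start reaches the end of the log.
    return max(log_end - start, 0)


LOG_END = 30024               # value stored in ZooKeeper after a full run
TOTAL_MESSAGES = 3500         # count reported by the console consumer
EARLIEST = LOG_END - TOTAL_MESSAGES   # 26524

print(replayable(26524, EARLIEST, LOG_END))   # 3500
print(replayable(26624, EARLIEST, LOG_END))   # 3400
```

If the offsets were byte positions instead, starting 100 higher would only
skip exactly 100 messages if every message were one byte long, which seems
unlikely for these documents.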

I also find that if I issue this command:

     bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName \
       --broker-list=192.168.56.3:9092 --time=-2

I get back that same number -- 26524...

Hmmmm....  A little confused still...  These messages are literally stored
in the Kafka logs, yes?  I think I'll go digging in there and see...

Thanks again!





On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <christian.po...@gmail.com
> wrote:

> The number is the log-ordered number of bytes. So really, the offset is
> kinda like the "number of bytes" to begin reading from. 0 means read the
> log from the beginning. The second message is 0 + size of message. So the
> message "ids" are really just the offset of the previous message sizes.
>
> For example, if I have three messages of 10 bytes each, and set the
> consumer offset to 0, I'll read everything. If I set the offset to 10,
> I'll read the second and third messages, and so on.
>
> see more here:
>
> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> and here: http://kafka.apache.org/documentation.html#introduction
>
> HTH!
>
> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> j...@johnbickerstaff.com
> > wrote:
>
> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> >
> > I'm using Kafka to hold messages from a service that prepares "documents"
> > for SOLR.
> >
> > A second micro service (a consumer) requests these messages, does any
> > final processing, and fires them into SOLR.
> >
> > The whole thing is (in part) designed to be used for disaster recovery -
> > allowing the rebuild of the SOLR index in the shortest possible time.
> >
> > To do this (and to be able to use it for re-indexing SOLR while testing
> > relevancy) I need to be able to "play all messages from the beginning" at
> > will.
> >
> > I find I can use the zkCli.sh tool to delete the Consumer Group Name like
> > this:
> >      rmr /kafka/consumers/myGroupName
> >
> > After which my microservice will get all the messages again when it runs.
> >
> > I was trying to find a way to do this programmatically without actually
> > using the "low level" consumer api since the high level one is so simple
> > and my code already works.  So I started playing with Zookeeper api for
> > duplicating "rmr /kafka/consumers/myGroupName"
> >
> > *The Question: What does that offset actually represent?*
> >
> > It was at this point that I discovered the offset must represent
> > something other than what I thought it would.  Things obviously work,
> > but I'm wondering what, exactly, the offsets represent.
> >
> > To clarify - if I run this command on a zookeeper node, after the
> > microservice has run:
> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> >
> > I get the following:
> >
> > 30024
> > cZxid = 0x3600000355
> > ctime = Fri Feb 12 07:27:50 MST 2016
> > mZxid = 0x3600000357
> > mtime = Fri Feb 12 07:29:50 MST 2016
> > pZxid = 0x3600000355
> > cversion = 0
> > dataVersion = 2
> > aclVersion = 0
> > ephemeralOwner = 0x0
> > dataLength = 5
> > numChildren = 0
> >
> > Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
> > running this command:
> >      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka \
> >        --topic myTopicName --from-beginning
> >
> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> >
> > So - what does that 30024 actually represent?  If I reset that number to
> > 1 or 0 and re-run my consumer microservice, I get all the messages again -
> > and the number again goes to 30024.  However, I'm not comfortable
> > trusting that because my assumption that the number represents a simple
> > count of messages that have been sent to this consumer is obviously wrong.
> >
> > (I reset the number like this -- to 1 -- and assume there's an API
> > command that will do it too.)
> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> >
> > Can someone help me clarify or point me at a doc that explains what is
> > getting counted here?  You can shoot me if you like for attempting the
> > hack-ish solution of re-setting the offset through the Zookeeper API,
> > but I would still like to understand what, exactly, is represented by
> > that number 30024.
> >
> > I need to hand off to IT for the Disaster Recovery portion and saying
> > "trust me, it just works" isn't going to fly very far...
> >
> > Thanks.
> >
>
>
>
> --
> *Christian Posta*
> twitter: @christianposta
> http://www.christianposta.com/blog
> http://fabric8.io
>
