Hmmm...  more info.

So, inside /var/log/kafka-logs/myTopicName-0 I find two files:

00000000000000026524.index  00000000000000026524.log

Interestingly, both bear the number of the "lowest" offset returned by the
GetOffsetShell command (--time=-2) I mentioned earlier.
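That match is expected: Kafka names each log segment after its base offset (the offset of the first message it contains), zero-padded to 20 digits, and the .log and .index files of one segment share that name. A segment named 26524 that is also the --time=-2 result suggests it is the oldest segment still on disk, with older segments most likely removed by retention. A small Python sketch of the naming rule and of finding which segment holds a given offset (the helper names are hypothetical, not part of any Kafka tool):

```python
import bisect


def segment_name(base_offset: int, suffix: str = ".log") -> str:
    """Segment files are named after their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def base_offset_of(filename: str) -> int:
    """Recover the base offset from a name such as '00000000000000026524.log'."""
    return int(filename.split(".", 1)[0])


def segment_for(offset: int, base_offsets: list[int]) -> int:
    """Return the base offset of the segment that holds `offset`.

    `base_offsets` must be sorted; each segment covers offsets from its own
    base offset up to (not including) the next segment's base offset.
    """
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is older than the oldest retained segment")
    return base_offsets[i]
```

With a single segment on disk, every retained offset from 26524 upward maps to 00000000000000026524.log.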

If I "cat" the 000.....26524.log file, I get all my messages on the
command line, as if I'd run the console consumer with --from-beginning.

I'm not sure what the index contains; it's unreadable with the simple tools
I've tried.
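The .index file is binary, which is why simple tools show nothing useful. In Kafka 0.8 it is a sparse offset index: a sequence of 8-byte entries, each a 4-byte offset relative to the segment's base offset followed by a 4-byte byte position in the matching .log file, with an entry added roughly every index.interval.bytes of log data. Kafka also ships kafka.tools.DumpLogSegments (run through bin/kafka-run-class.sh with --files) to print segment contents. The Python sketch below assumes that big-endian 8-byte layout; the function names are hypothetical. A lookup takes the last entry at or below the wanted offset and scans the .log forward from that position:

```python
import bisect
import struct

# Assumed entry layout: 4-byte relative offset + 4-byte file position, big-endian.
ENTRY = struct.Struct(">ii")


def read_index(data: bytes, base_offset: int) -> list[tuple[int, int]]:
    """Decode raw .index bytes into (absolute_offset, log_byte_position) pairs."""
    entries: list[tuple[int, int]] = []
    for start in range(0, len(data) - ENTRY.size + 1, ENTRY.size):
        rel, pos = ENTRY.unpack_from(data, start)
        # Offsets in a valid index strictly increase; a non-increasing entry is
        # taken to be the zero-filled, preallocated tail of an active segment.
        if entries and base_offset + rel <= entries[-1][0]:
            break
        entries.append((base_offset + rel, pos))
    return entries


def scan_start(entries: list[tuple[int, int]], target: int) -> int:
    """Byte position in the .log from which to scan forward for `target`."""
    i = bisect.bisect_right([off for off, _ in entries], target) - 1
    return entries[i][1] if i >= 0 else 0  # no entry at or below: start of file
```

Because the index is sparse, it stays small; the cost is a short sequential scan of the .log after each lookup.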

I'm still scratching my head a bit, because the Kafka introduction you
linked says this:

The messages in the partitions are each assigned a sequential id number
called the *offset* that uniquely identifies each message within the
partition.

I see how that could be exactly what you said (the byte count of the
previous message(s)), but the picture implies a linear progression: 1, 2,
3, etc. (That could be an oversimplification for the purposes of the
introduction; I get that.)
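In Kafka 0.8 and later the picture is literal: offsets are sequential message numbers assigned per partition, and retention deletes old messages without renumbering the ones that remain. That is how a partition holding 3500 messages can span offsets 26524 through 30023. A toy Python model (the class is hypothetical, not Kafka's implementation):

```python
class PartitionLog:
    """Toy model of one partition: sequential offsets that are never renumbered."""

    def __init__(self) -> None:
        self.start_offset = 0          # earliest offset still retained
        self.messages: list[str] = []  # messages[i] has offset start_offset + i

    @property
    def end_offset(self) -> int:
        """Offset the next appended message will receive."""
        return self.start_offset + len(self.messages)

    def append(self, msg: str) -> int:
        offset = self.end_offset
        self.messages.append(msg)
        return offset

    def delete_before(self, offset: int) -> None:
        """Drop messages older than `offset`, as retention would."""
        drop = max(0, min(offset, self.end_offset) - self.start_offset)
        self.messages = self.messages[drop:]
        self.start_offset += drop

    def read(self, offset: int) -> str:
        if not self.start_offset <= offset < self.end_offset:
            raise IndexError(f"offset {offset} is out of range")
        return self.messages[offset - self.start_offset]
```

Real retention removes whole segment files rather than single messages, which is why the oldest retained offset coincides with a segment file name.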

Feel free to comment or not. I'm going to keep digging into it as best I
can, and any clarifications will be gratefully accepted.



On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Thank you, Christian. I appreciate your taking the time to help me out on
> this.
>
> Here's what I found while continuing to dig into this.
>
> If I take 30024 and subtract the number of messages I know I have in Kafka
> (3500) I get 26524.
>
> If I reset it like this:
>      set /kafka/consumers/myGroupName/offsets/myTopicName/0 26524
>
> ... and then re-run my consumer - I get all 3500 messages again.
>
> If I do this:
>      set /kafka/consumers/myGroupName/offsets/myTopicName/0 26624
>
> In other words, I increase the offset by 100, and then I get exactly 3400
> messages on my consumer: exactly 100 fewer than before, which I think
> makes sense, since I started the offset 100 higher.
>
> This seems to suggest that each number between 26524 and 30024 in the log
> represents one of my 3500 messages on this topic, but what you say suggests
> that they represent the byte count of the actual messages and not "one
> number per message".
>
> I also find that if I issue this command:
>
>      bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName \
>          --broker-list=192.168.56.3:9092 --time=-2
>
> I get back that same number -- 26524...
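These numbers fit logical offsets exactly. The value in ZooKeeper is the offset of the next message the group will read; GetOffsetShell with --time=-2 reports the earliest offset still in the log, and --time=-1 the latest (the offset the next produced message will receive). The messages left to consume are therefore latest minus committed. A Python check of that arithmetic, assuming the latest offset is 30024 (what the consumer committed after reading everything); the function name is hypothetical:

```python
def remaining(committed: int, earliest: int, latest: int) -> int:
    """Messages a consumer will read when starting at `committed`.

    Assumes earliest <= committed <= latest; offsets outside that range are
    handled by the consumer's reset policy, not by this arithmetic.
    """
    if not earliest <= committed <= latest:
        raise ValueError("committed offset is outside the retained log")
    return latest - committed


EARLIEST, LATEST = 26524, 30024  # --time=-2 result; assumed log end offset
print(remaining(26524, EARLIEST, LATEST))  # 3500: start at the oldest retained message
print(remaining(26624, EARLIEST, LATEST))  # 3400: 100 offsets skipped = 100 messages skipped
print(remaining(30024, EARLIEST, LATEST))  # 0: fully caught up
```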
>
> Hmmm... I'm still a little confused. These messages are literally stored
> in the Kafka logs, yes? I think I'll go digging in there and see.
>
> Thanks again!
>
>
>
>
>
> On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta
> <christian.po...@gmail.com> wrote:
>
>> The number is the log-ordered number of bytes. So really, the offset is
>> kinda like the "number of bytes" at which to begin reading. 0 means read
>> the log from the beginning. The second message is at 0 + the size of the
>> first message. So the message "ids" are really just the running total of
>> the previous messages' sizes.
>>
>> For example, if I have three messages of 10 bytes each and set the
>> consumer offset to 0, I'll read everything. If I set the offset to 10,
>> I'll read the second and third messages, and so on.
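This description matches the original design in the linked NetDB paper (Kafka 0.7), where an offset was a byte position in the log. Since Kafka 0.8, offsets are logical message numbers and the broker translates them into byte positions through the .index file, which is why the experiments earlier in this thread move exactly one offset per message. Using the three 10-byte messages from the example, a Python sketch contrasting the two addressing schemes (the names are illustrative only):

```python
from itertools import accumulate

sizes = [10, 10, 10]  # three 10-byte messages, as in the example above

# Kafka 0.7 style: a message's offset is its starting byte position in the log.
byte_offsets = [0, *accumulate(sizes)][:-1]

# Kafka 0.8+ style: a message's offset is its sequence number in the partition.
logical_offsets = list(range(len(sizes)))


def messages_from(offsets: list[int], start: int) -> int:
    """Count messages a consumer reads when positioned at `start`."""
    return sum(1 for off in offsets if off >= start)


print(byte_offsets)     # [0, 10, 20]
print(logical_offsets)  # [0, 1, 2]
print(messages_from(byte_offsets, 10), messages_from(logical_offsets, 1))  # 2 2
```

Both schemes position the consumer at the second message; only the stored number differs, and with logical offsets it no longer depends on message sizes.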
>>
>> see more here:
>>
>> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
>> and here: http://kafka.apache.org/documentation.html#introduction
>>
>> HTH!
>>
>> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff
>> <j...@johnbickerstaff.com> wrote:
>>
>> > *Use Case: Disaster Recovery & Re-indexing SOLR*
>> >
>> > I'm using Kafka to hold messages from a service that prepares
>> > "documents" for SOLR.
>> >
>> > A second microservice (a consumer) requests these messages, does any
>> > final processing, and fires them into SOLR.
>> >
>> > The whole thing is (in part) designed to be used for disaster recovery -
>> > allowing the rebuild of the SOLR index in the shortest possible time.
>> >
>> > To do this (and to be able to use it for re-indexing SOLR while testing
>> > relevancy) I need to be able to "play all messages from the beginning"
>> > at will.
>> >
>> > I find I can use the zkCli.sh tool to delete the consumer group like
>> > this:
>> >      rmr /kafka/consumers/myGroupName
>> >
>> > After that, my microservice gets all the messages again when it runs.
>> >
>> > I was trying to find a way to do this programmatically without using
>> > the "low level" consumer API, since the high-level one is so simple and
>> > my code already works. So I started playing with the ZooKeeper API to
>> > duplicate "rmr /kafka/consumers/myGroupName".
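Duplicating rmr programmatically amounts to a recursive delete of the group's ZooKeeper node. A sketch assuming the third-party Python ZooKeeper client kazoo, whose KazooClient offers exists(path) and delete(path, recursive=True); reset_group is a hypothetical helper, and the client is passed in so the logic can be tried without a live ensemble. The consumer should presumably be stopped first, or it may re-commit its offsets right after the delete:

```python
def reset_group(zk, group: str, chroot: str = "/kafka") -> str:
    """Hypothetical helper: delete a consumer group's ZooKeeper state.

    Equivalent to `rmr /kafka/consumers/<group>` in zkCli.sh. `zk` is any
    client exposing kazoo-style exists(path) and delete(path, recursive=True).
    """
    path = f"{chroot}/consumers/{group}"
    if zk.exists(path):
        zk.delete(path, recursive=True)
    return path


# Possible usage with kazoo (requires a reachable ZooKeeper):
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="192.168.56.5:2181")
#   zk.start()
#   reset_group(zk, "myGroupName")
#   zk.stop()
```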
>> >
>> > *The Question: What does that offset actually represent?*
>> >
>> > It was at this point that I discovered the offset must represent
>> > something other than what I thought it would. Things obviously work, but
>> > I'm wondering what exactly the offsets represent.
>> >
>> > To clarify: if I run this command on a ZooKeeper node after the
>> > microservice has run:
>> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
>> >
>> > I get the following:
>> >
>> > 30024
>> > cZxid = 0x3600000355
>> > ctime = Fri Feb 12 07:27:50 MST 2016
>> > mZxid = 0x3600000357
>> > mtime = Fri Feb 12 07:29:50 MST 2016
>> > pZxid = 0x3600000355
>> > cversion = 0
>> > dataVersion = 2
>> > aclVersion = 0
>> > ephemeralOwner = 0x0
>> > dataLength = 5
>> > numChildren = 0
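The node stores the offset as plain ASCII text, which is why dataLength = 5: the five bytes of "30024". The path follows the layout /consumers/<group>/offsets/<topic>/<partition> under the /kafka chroot used in this setup. A Python sketch of building the path and decoding the stored value (the helper names are hypothetical):

```python
def offset_path(group: str, topic: str, partition: int, chroot: str = "/kafka") -> str:
    """ZooKeeper node holding a group's committed offset for one partition."""
    return f"{chroot}/consumers/{group}/offsets/{topic}/{partition}"


def decode_offset(data: bytes) -> int:
    """The znode value is the offset written as ASCII decimal digits."""
    return int(data.decode("ascii"))


def encode_offset(offset: int) -> bytes:
    """Inverse of decode_offset, for writing a value back."""
    return str(offset).encode("ascii")
```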
>> >
>> > Now - I have exactly 3500 messages in this Kafka topic.  I verify that by
>> > running this command:
>> >      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka \
>> >          --topic myTopicName --from-beginning
>> >
>> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
>> >
>> > So what does that 30024 actually represent? If I reset that number to 1
>> > or 0 and re-run my consumer microservice, I get all the messages again,
>> > and the number again goes to 30024. However, I'm not comfortable trusting
>> > that, because my assumption that the number represents a simple count of
>> > messages that have been sent to this consumer is obviously wrong.
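Resetting to 0 or 1 most likely works because those offsets no longer exist: the oldest retained offset is 26524, so the fetch fails as out of range and the high-level consumer falls back to its auto.offset.reset setting. Since deleting the group also replays everything, this consumer is presumably configured with auto.offset.reset=smallest (the 0.8 default is largest). A Python model of that decision; start_offset is a hypothetical name and the logic is a simplification of the consumer's behavior:

```python
from typing import Optional


def start_offset(committed: Optional[int], earliest: int, latest: int,
                 reset: str = "largest") -> int:
    """Where the consumer begins reading, modelling auto.offset.reset.

    `committed` is None when the group has no stored offset (e.g. after rmr).
    """
    if committed is not None and earliest <= committed <= latest:
        return committed  # valid stored offset: resume from it
    if reset == "smallest":
        return earliest   # replay everything still retained
    if reset == "largest":
        return latest     # skip to new messages only
    raise ValueError(f"unknown auto.offset.reset value: {reset}")
```

Under this model, start_offset(1, 26524, 30024, "smallest") returns 26524, which is why writing 1 behaves like writing the earliest offset.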
>> >
>> > (I reset the number to 1 like this, and I assume there's an API call
>> > that will do it too.)
>> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
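A less hack-ish reset than writing 1 is to write the actual earliest offset (the GetOffsetShell --time=-2 value), so the result does not depend on auto.offset.reset. A sketch assuming a kazoo-style client with ensure_path(path) and set(path, value); rewind_partition is a hypothetical helper, and it should only run while the consumer group is stopped, since a running consumer would overwrite the value on its next commit:

```python
def rewind_partition(zk, group: str, topic: str, partition: int,
                     earliest: int, chroot: str = "/kafka") -> str:
    """Hypothetical helper: point a stopped consumer group at `earliest`.

    `earliest` should come from GetOffsetShell --time=-2 (26524 in this thread).
    """
    path = f"{chroot}/consumers/{group}/offsets/{topic}/{partition}"
    zk.ensure_path(path)                         # create the node if it is missing
    zk.set(path, str(earliest).encode("ascii"))  # stored as ASCII digits
    return path
```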
>> >
>> > Can someone help me clarify or point me at a doc that explains what is
>> > getting counted here? You can shoot me if you like for attempting the
>> > hack-ish solution of resetting the offset through the ZooKeeper API, but
>> > I would still like to understand what, exactly, is represented by that
>> > number 30024.
>> >
>> > I need to hand off to IT for the Disaster Recovery portion and saying
>> > "trust me, it just works" isn't going to fly very far...
>> >
>> > Thanks.
>> >
>>
>>
>>
>> --
>> *Christian Posta*
>> twitter: @christianposta
>> http://www.christianposta.com/blog
>> http://fabric8.io
>>
>
>
