Hi John,

Glad to help :) I ran into similar issues recently and was also confused by
what the offsets mean, so I understand your pain haha.
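
In case it helps anyone who finds this thread later, here is a rough Python
sketch of the behaviour discussed below. It is only a toy model under my own
assumptions, not Kafka code: the PartitionLog class and its methods are made up
for illustration. It shows that offsets keep increasing after old messages
expire, so the earliest offset need not be 0, using the numbers from John's
topic.

```python
from collections import deque


class PartitionLog:
    """Toy model of a single partition (hypothetical, not a Kafka API)."""

    def __init__(self):
        self.next_offset = 0     # offset the next appended message receives
        self.messages = deque()  # (offset, payload) pairs still retained

    def append(self, payload):
        self.messages.append((self.next_offset, payload))
        self.next_offset += 1    # never reset, even when messages expire

    def expire(self, keep_last):
        """Simulate retention: drop all but the newest keep_last messages."""
        while len(self.messages) > keep_last:
            self.messages.popleft()

    def earliest(self):
        # Comparable to what GetOffsetShell --time=-2 reports
        return self.messages[0][0] if self.messages else self.next_offset

    def latest(self):
        # The offset the next message will get
        return self.next_offset

    def read_from(self, offset):
        return [payload for off, payload in self.messages if off >= offset]


log = PartitionLog()
for i in range(26524):       # weeks of test messages
    log.append(f"old-{i}")
log.expire(keep_last=0)      # short retention removes them all
for i in range(3500):        # the messages currently in the topic
    log.append(f"doc-{i}")

print(log.earliest(), log.latest())
print(len(log.read_from(26524)), len(log.read_from(26624)))
```

Each offset identifies one message, which is why starting 100 offsets later
returns 100 fewer messages.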

Best of luck,
Leo

On Tue, Feb 23, 2016 at 1:53 PM, John Bickerstaff <j...@johnbickerstaff.com>
wrote:

> Thanks Leo!
>
> =========
> TL;DR summary:
>
> You're correct - I didn't absolutely need the offset.
> I had to provide Disaster Recovery advice and couldn't explain the offset
> numbers, which wouldn't fly.
> The text below explains how I got myself confused -- in case it helps
> someone else later.
> Thanks for your reply!
> =========
>
> You're right.  Strictly speaking, I don't need the offset.  In my testing
> I've been issuing the rmr /kafka/consumers command from the Zookeeper
> zkCli.sh.
> I'm adding it to my microservice using the Zookeeper API this week - since
> that seems a lot easier than figuring out the low level Kafka API code and
> it works just as well.
>
> Being a developer, I just couldn't help trying to change the least
> significant thing required to get the job done - and the Zookeeper API does
> allow me to change that offset number...  Which led me to try to understand
> why that number wasn't matching my expectations...
>
> In addition, I'm building a SOLR / Kafka / Zookeeper infrastructure from
> scratch and part of my mandate is to provide a handoff to our (very capable
> and very careful) IT manager.  The handoff is to include plans and
> documentation for disaster recovery as well as how to build and manage the
> cluster.
>
> For both of those reasons, my curiosity was piqued and I wanted to find out
> exactly what was going on.  I could just imagine the look on our IT
> manager's face when I said "Trust me, the numbers don't line up, but it
> won't affect disaster recovery."
>
> In hindsight, I understand what I did that confused me.  Since I'm still in
> development "mode" I sent messages to the same topic repeatedly for weeks.
> Then instead of deleting the topic, I issued the following command to
> shorten the retention of the messages:
>
> bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
> topicName --config retention.ms=1000
>
> Then I reset it once the messages were deleted, thus:
>
> bin/kafka-topics.sh --zookeeper 192.168.56.5:2181/kafka --alter --topic
> topicName --delete-config retention.ms
>
> What I didn't realize is that (not unreasonably) the offset count isn't
> reset by changing the config retention setting.  As you said, it won't
> necessarily be 0.
>
> Sending the same set of messages repeatedly resulted in having a very large
> count in the offset - a count that bore no relation to the number of
> messages in the topic - which worried me because I couldn't explain it --
> and things I can't explain make me nervous in the context of disaster
> recovery...
>
> I appreciate your confirmation of my theory about what is going on.
>
> --JohnB (aka solrJohn)
>
> On Thu, Feb 18, 2016 at 12:19 PM, Leo Lin <leo....@brigade.com> wrote:
>
> > Hi John,
> >
> > Kafka offsets are sequential id numbers that identify messages in each
> > partition. They might not be sequential within a topic (which can have
> > multiple partitions).
> >
> > Offsets don't necessarily start at 0 since messages are deleted.
> >
> > bin/kafka-run-class.sh kafka.tools.GetOffsetShell is pretty neat for
> > looking at the offsets in your topic.
> >
> > I'm not sure why resetting the offset is needed in your case. If you need
> > to read from the beginning using the high level consumer,
> > you just need to delete that consumer group in zookeeper and set
> > "auto.offset.reset" to "smallest". (This will direct the consumer to look
> > for the smallest offset if it doesn't find one in zookeeper.)
> >
> > On Wed, Feb 17, 2016 at 1:06 PM, John Bickerstaff <
> > j...@johnbickerstaff.com>
> > wrote:
> >
> > > Hmmm...  more info.
> > >
> > > So, inside /var/log/kafka-logs/myTopicName-0 I find two files
> > >
> > > 00000000000000026524.index  00000000000000026524.log
> > >
> > > Interestingly, they both bear the number of the "lowest" offset returned
> > > by the command I mention above.
> > >
> > > If I "cat" the 000.....26524.log file, I get all my messages on the
> > > commandline as if I'd issued the --from-beginning command
> > >
> > > I'm not sure what the index has, it's unreadable by the simple tools
> > > I've tried....
> > >
> > > I'm still scratching my head a bit - as the link you sent for Kafka
> > > introduction says this:
> > >
> > > The messages in the partitions are each assigned a sequential id number
> > > called the *offset* that uniquely identifies each message within the
> > > partition.
> > > I see how that could be exactly what you said (the previous message(s)
> > > byte count) -- but the picture implies that it's a linear progression -
> > > 1,2,3 etc...  (and that could be an oversimplification for purposes of
> > > the introduction - I get that...)
> > >
> > > Feel free to comment or not - I'm going to keep digging into it as best
> > > I can - any clarifications will be gratefully accepted...
> > >
> > >
> > >
> > > On Wed, Feb 17, 2016 at 1:50 PM, John Bickerstaff <
> > > j...@johnbickerstaff.com>
> > > wrote:
> > >
> > > > Thank you Christian -- I appreciate your taking the time to help me
> > > > out on this.
> > > >
> > > > Here's what I found while continuing to dig into this.
> > > >
> > > > If I take 30024 and subtract the number of messages I know I have in
> > > > Kafka (3500) I get 26524.
> > > >
> > > > If I reset thus:
> > > >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 26524
> > > >
> > > > ... and then re-run my consumer - I get all 3500 messages again.
> > > >
> > > > If I do this:
> > > >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 26624
> > > >
> > > > In other words, I increase the offset number by 100 -- then I get
> > > > exactly 3400 messages on my consumer -- exactly 100 fewer than before,
> > > > which I think makes sense, since I started the offset 100 higher...
> > > >
> > > > This seems to suggest that each number between 26524 and 30024 in the
> > > > log represents one of my 3500 messages on this topic, but what you say
> > > > suggests that they represent the byte count of the actual messages and
> > > > not "one number per message"...
> > > >
> > > > I also find that if I issue this command:
> > > >
> > > > bin/kafka-run-class.sh kafka.tools.GetOffsetShell --topic=myTopicName
> > > > --broker-list=192.168.56.3:9092  --time=-2
> > > >
> > > > I get back that same number -- 26524...
> > > >
> > > > Hmmmm....  A little confused still...  These messages are literally
> > > > stored in the Kafka logs, yes?  I think I'll go digging in there and
> > > > see...
> > > >
> > > > Thanks again!
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Feb 17, 2016 at 12:38 PM, Christian Posta <
> > > > christian.po...@gmail.com> wrote:
> > > >
> > > >> The number is the log-ordered number of bytes. So really, the offset
> > > >> is kinda like the "number of bytes" to begin reading from. 0 means
> > > >> read the log from the beginning. The second message is 0 + size of
> > > >> message. So the message "ids" are really just the offset of the
> > > >> previous message sizes.
> > > >>
> > > >> For example, if I have three messages of 10 bytes each, and set the
> > > >> consumer offset to 0, I'll read everything. If I set the offset to
> > > >> 10, I'll read the second and third messages, and so on.
> > > >>
> > > >> see more here:
> > > >>
> > > >> http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
> > > >> and here: http://kafka.apache.org/documentation.html#introduction
> > > >>
> > > >> HTH!
> > > >>
> > > >> On Wed, Feb 17, 2016 at 12:16 PM, John Bickerstaff <
> > > >> j...@johnbickerstaff.com
> > > >> > wrote:
> > > >>
> > > >> > *Use Case: Disaster Recovery & Re-indexing SOLR*
> > > >> >
> > > >> > I'm using Kafka to hold messages from a service that prepares
> > > >> > "documents" for SOLR.
> > > >> >
> > > >> > A second micro service (a consumer) requests these messages, does
> > > >> > any final processing, and fires them into SOLR.
> > > >> >
> > > >> > The whole thing is (in part) designed to be used for disaster
> > > >> > recovery - allowing the rebuild of the SOLR index in the shortest
> > > >> > possible time.
> > > >> >
> > > >> > To do this (and to be able to use it for re-indexing SOLR while
> > > >> > testing relevancy) I need to be able to "play all messages from the
> > > >> > beginning" at will.
> > > >> >
> > > >> > I find I can use the zkCli.sh tool to delete the Consumer Group
> > > >> > Name like this:
> > > >> >      rmr /kafka/consumers/myGroupName
> > > >> >
> > > >> > After which my microservice will get all the messages again when
> > > >> > it runs.
> > > >> >
> > > >> > I was trying to find a way to do this programmatically without
> > > >> > actually using the "low level" consumer api since the high level
> > > >> > one is so simple and my code already works.  So I started playing
> > > >> > with the Zookeeper api for duplicating
> > > >> > "rmr /kafka/consumers/myGroupName"
> > > >> >
> > > >> > *The Question: What does that offset actually represent?*
> > > >> >
> > > >> > It was at this point that I discovered the offset must represent
> > > >> > something other than what I thought it would.  Things obviously
> > > >> > work, but I'm wondering what, exactly, the offsets represent.
> > > >> >
> > > >> > To clarify - if I run this command on a zookeeper node, after the
> > > >> > microservice has run:
> > > >> >      get /kafka/consumers/myGroupName/offsets/myTopicName/0
> > > >> >
> > > >> > I get the following:
> > > >> >
> > > >> > 30024
> > > >> > cZxid = 0x3600000355
> > > >> > ctime = Fri Feb 12 07:27:50 MST 2016
> > > >> > mZxid = 0x3600000357
> > > >> > mtime = Fri Feb 12 07:29:50 MST 2016
> > > >> > pZxid = 0x3600000355
> > > >> > cversion = 0
> > > >> > dataVersion = 2
> > > >> > aclVersion = 0
> > > >> > ephemeralOwner = 0x0
> > > >> > dataLength = 5
> > > >> > numChildren = 0
> > > >> >
> > > >> > Now - I have exactly 3500 messages in this Kafka topic.  I verify
> > > >> > that by running this command:
> > > >> >      bin/kafka-console-consumer.sh --zookeeper 192.168.56.5:2181/kafka
> > > >> > --topic myTopicName --from-beginning
> > > >> >
> > > >> > When I hit Ctrl-C, it tells me it consumed 3500 messages.
> > > >> >
> > > >> > So - what does that 30024 actually represent?  If I reset that
> > > >> > number to 1 or 0 and re-run my consumer microservice, I get all the
> > > >> > messages again - and the number again goes to 30024.  However, I'm
> > > >> > not comfortable trusting that because my assumption that the number
> > > >> > represents a simple count of messages that have been sent to this
> > > >> > consumer is obviously wrong.
> > > >> >
> > > >> > (I reset the number like this -- to 1 -- and assume there's an API
> > > >> > command that will do it too.)
> > > >> >      set /kafka/consumers/myGroupName/offsets/myTopicName/0 1
> > > >> >
> > > >> > Can someone help me clarify or point me at a doc that explains what
> > > >> > is getting counted here?  You can shoot me if you like for
> > > >> > attempting the hack-ish solution of re-setting the offset through
> > > >> > the Zookeeper API, but I would still like to understand what,
> > > >> > exactly, is represented by that number 30024.
> > > >> >
> > > >> > I need to hand off to IT for the Disaster Recovery portion and
> > > >> > saying "trust me, it just works" isn't going to fly very far...
> > > >> >
> > > >> > Thanks.
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *Christian Posta*
> > > >> twitter: @christianposta
> > > >> http://www.christianposta.com/blog
> > > >> http://fabric8.io
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > "Dream no small dreams for they have no power to move the hearts of men."
> >
> > Johann Wolfgang von Goethe
> >
>



-- 
"Dream no small dreams for they have no power to move the hearts of men."

Johann Wolfgang von Goethe
