Nice new email address, Gwen. :)

On Aug 3, 2015, at 3:17 PM, Gwen Shapira <g...@confluent.io> wrote:

> You are correct. You can see that ZookeeperConsumerConnector is hardcoded
> with null metadata.
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala#L310
> 
> More interesting, it looks like the Metadata is not exposed in the new
> KafkaConsumer either.
> 
> Mind sharing what you planned to use it for? This will help us figure out
> how to expose it :)
> 

I'm juggling a couple of designs for what I'm trying to do, so it may turn 
out that I don't actually need it.

My generic answer is, I have some application state related to the processing 
of a topic, and I wanted to store it somewhere persistent that will survive 
across process death and disk failure. So storing it with the offset seemed 
like a nice solution. I could alternatively store it in a standalone kafka 
topic instead.

The more detailed answer: I'm doing a time-based merge sort of 3 topics (A, B, 
and C) and outputting the results into a new output topic (let's call it 
"sorted-topic"). Except that during the initial creation of "sorted-topic", I 
want all of topic A to be output to sorted-topic first, followed by an ongoing 
merge sort of topics B, C, and any updates to A that come along.
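For what it's worth, the merge step is basically a k-way merge by timestamp. A 
minimal sketch of just that part (the Msg type and in-memory lists stand in for 
consumed records; none of this is the actual consumer wiring):

```java
import java.util.*;

// Sketch of a time-based k-way merge of several sorted streams.
// Msg and the in-memory lists are stand-ins for consumed topic data;
// the names here are illustrative, not a Kafka API.
public class MergeSortSketch {
    static final class Msg {
        final String topic; final long ts; final String value;
        Msg(String topic, long ts, String value) {
            this.topic = topic; this.ts = ts; this.value = value;
        }
    }

    // Merge per-topic lists (each already ordered by timestamp) into
    // one timestamp-ordered stream, using a priority queue keyed on
    // the next unconsumed message of each stream.
    static List<Msg> merge(List<List<Msg>> streams) {
        // Each queue entry is {streamIndex, positionInStream}.
        PriorityQueue<int[]> pq = new PriorityQueue<>(
            Comparator.comparingLong(e -> streams.get(e[0]).get(e[1]).ts));
        for (int i = 0; i < streams.size(); i++)
            if (!streams.get(i).isEmpty()) pq.add(new int[]{i, 0});
        List<Msg> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            int[] e = pq.poll();
            List<Msg> s = streams.get(e[0]);
            out.add(s.get(e[1]));
            if (e[1] + 1 < s.size()) pq.add(new int[]{e[0], e[1] + 1});
        }
        return out;
    }
}
```

In the real thing the lists would be replaced by live consumer iterators over A, 
B, and C, but the ordering logic is the same.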

I was trying to handle the situation where I crash while initially copying 
topic A into sorted-topic. I was thinking that I could save some metadata in my 
topic A checkpoint that says "still doing initial copy of A", so that when I 
start up next time, I would know to continue copying topic A to the output. 
Once I have finished copying all of A to sorted-topic, I would store "finished 
doing initial copy of A" into my checkpoint, so that upon restart, I would 
check it and know to start doing the merge sort of A, B, and C.
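Encoding that phase in the commit metadata would be something like this sketch 
(the phase strings and helper names are made up; in practice the string would 
ride along in the OffsetCommitRequest's metadata field):

```java
// Sketch of a two-phase checkpoint encoded as a metadata string.
// The phase names and this helper class are hypothetical, not a Kafka API.
public class CheckpointSketch {
    static final String INITIAL_COPY = "initial-copy-of-A";
    static final String MERGING = "merging-A-B-C";

    // The string that would be committed alongside the offset.
    static String metadataFor(boolean initialCopyDone) {
        return initialCopyDone ? MERGING : INITIAL_COPY;
    }

    // On restart: decide which phase to resume from the last committed
    // metadata (null means no checkpoint was ever written).
    static String resumePhase(String committedMetadata) {
        if (committedMetadata == null || committedMetadata.equals(INITIAL_COPY))
            return INITIAL_COPY;   // keep copying topic A into sorted-topic
        return MERGING;            // start/continue the 3-way merge sort
    }
}
```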

I have a couple of other designs that seem cleaner, though, so I might not 
actually need it.

-James

> Gwen
> 
> 
> On Mon, Aug 3, 2015 at 1:52 PM, James Cheng <jch...@tivo.com> wrote:
> 
>> According to
>> https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-OffsetCommitRequest,
>> we can store custom metadata with our checkpoints. It looks like the high
>> level consumer does not support committing offsets with metadata, and that
>> in order to checkpoint with custom metadata, we have to issue the
>> OffsetCommitRequest ourselves. Is that correct?
>> 
>> Thanks,
>> -James
>> 
>> 
