Re: Consuming a snapshot from log compacted topic

2015-04-14 Thread Will Funnell
Hi, Any update on the above patch? Hoping you might be able to review it soon. Thanks. On 23 February 2015 at 21:21, Will Funnell wrote: > Hey guys, > > I created a patch based on your feedback. > > Let me know what you think. > > https://issues.apache.org/jira/browse/KAFKA-1977 > > On 20 F

Re: Consuming a snapshot from log compacted topic

2015-02-23 Thread Will Funnell
Hey guys, I created a patch based on your feedback. Let me know what you think. https://issues.apache.org/jira/browse/KAFKA-1977 On 20 February 2015 at 01:43, Joel Koshy wrote: > The log end offset (of a partition) changes when messages are appended > to the partition. (It is not correlated w

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Joel Koshy
The log end offset (of a partition) changes when messages are appended to the partition. (It is not correlated with the consumer's offset). On Thu, Feb 19, 2015 at 08:58:10PM +, Will Funnell wrote: > So at what point does the log end offset change? When you commit? > > On 19 February 2015 at

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Will Funnell
So at what point does the log end offset change? When you commit? On 19 February 2015 at 18:47, Joel Koshy wrote: > > If I consumed up to the log end offset and log compaction happens in > > between, I would have missed some messages. > > Compaction actually only runs on the rolled over segments

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Joel Koshy
> If I consumed up to the log end offset and log compaction happens in
> between, I would have missed some messages.
Compaction actually only runs on the rolled over segments (not the active - i.e., latest segment). The log-end-offset will be in the latest segment which does not participate in com
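A toy model may help picture the point about rolled-over versus active segments. The sketch below is not Kafka code: the `compact` function, the list-of-segments layout, and the `(offset, key, value)` tuples are all assumptions made for illustration. It only shows that compacting the rolled-over segments leaves the active segment, and therefore the log end offset, untouched.

```python
from typing import List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value); hypothetical layout
Segment = List[Record]


def compact(segments: List[Segment]) -> List[Segment]:
    """Toy compaction: keep the latest record per key across the rolled-over
    segments and leave the last (active) segment exactly as it is."""
    *rolled, active = segments
    latest = {}
    for segment in rolled:
        for offset, key, value in segment:
            latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    compacted = sorted(latest.values())  # surviving records keep their offsets
    return [compacted, active]


def log_end_offset(segments: List[Segment]) -> int:
    """Offset that the next appended record would receive."""
    return segments[-1][-1][0] + 1


segments = [
    [(0, "a", "1"), (1, "b", "2")],  # rolled over
    [(2, "a", "3")],                 # rolled over
    [(3, "b", "4"), (4, "c", "5")],  # active segment
]
after = compact(segments)
```

In this model the record at offset 0 is dropped because key "a" has a later record at offset 2, while the active segment and the log end offset are the same before and after.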

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Will Funnell
> The log end offset is just the end of the committed messages in the log
> (the last thing the consumer has access to). It isn't the same as the
> cleaner point but is always later than it so it would work just as well.
Isn't this just roughly the same value as using c.getOffsetsBefore() with a p

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Jay Kreps
The log end offset is just the end of the committed messages in the log (the last thing the consumer has access to). It isn't the same as the cleaner point but is always later than it so it would work just as well. -Jay On Thu, Feb 19, 2015 at 8:54 AM, Will Funnell wrote: > > I'm not sure if I

Re: Consuming a snapshot from log compacted topic

2015-02-19 Thread Will Funnell
> I'm not sure if I misunderstood Jay's suggestion, but I think it is > along the lines of: we expose the log-end-offset (actually the high > watermark) of the partition in the fetch response. However, this is > not exposed to the consumer (either in the new ConsumerRecord class > or the existing M

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Jay Kreps
Yeah I was thinking either along the lines Joel was suggesting or else adding a logEndOffset(TopicPartition) method or something like that. As Joel says the consumer actually has this information internally (we return it with the fetch request) but doesn't expose it. -Jay On Wed, Feb 18, 2015 at

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Joel Koshy
> > 2. Make the log end offset available more easily in the consumer. > > Was thinking something would need to be added in LogCleanerManager, in the > updateCheckpoints function. Where would be best to publish the information > to make it more easily available, or would you just expose the > offse

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Will Funnell
> Do you have to separate the snapshot from the "normal" update flow.
We are trying to avoid using another datasource if possible to have one source of truth.
> I think what you are saying is that you want to create a snapshot from the
> Kafka topic but NOT do continual reads after that point. Fo

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Joel Koshy
> You are also correct and perceptive to notice that if you check the end of
> the log then begin consuming and read up to that point compaction may have
> already kicked in (if the reading takes a while) and hence you might have
> an incomplete snapshot.
Isn't it sufficient to just repeat the che

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread Jay Kreps
If you catch up off a compacted topic and keep consuming then you will become consistent with the log. I think what you are saying is that you want to create a snapshot from the Kafka topic but NOT do continual reads after that point. For example you might be creating a backup of the data to a fil
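The one-off snapshot idea discussed in this thread (note the end of the log, read from the beginning up to that point, then stop) can be sketched against an in-memory list. This is a minimal illustration, not consumer code: the `snapshot` function, the `(offset, key, value)` tuples, and the sample data are assumptions, and no real Kafka client API is used.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value); hypothetical layout


def snapshot(log: List[Record], end_offset: int) -> Dict[str, str]:
    """Read from the start of the log and stop at end_offset (exclusive),
    keeping only the most recent value seen for each key."""
    state: Dict[str, str] = {}
    for offset, key, value in log:
        if offset >= end_offset:
            break  # anything at or past the captured end is not part of this snapshot
        state[key] = value
    return state


log = [(0, "a", "1"), (1, "b", "2"), (2, "a", "3"), (3, "c", "4")]
end_offset = 3  # captured before reading begins
result = snapshot(log, end_offset)
```

Records appended after the end offset was captured (offset 3 here) are left out, which is what separates a one-off snapshot from continual consumption.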

Re: Consuming a snapshot from log compacted topic

2015-02-18 Thread svante karlsson
Do you have to separate the snapshot from the "normal" update flow? I've used a compacting Kafka topic as the source of truth for a Solr database and fed the topic both with real-time updates and "snapshots" from a Hive job. This worked very well. The nice point is that there is a seamless transiti

Consuming a snapshot from log compacted topic

2015-02-18 Thread Will Funnell
We are currently using Kafka 0.8.1.1 with log compaction in order to provide streams of messages to our clients. As well as constantly consuming the stream, one of our use cases is to provide a snapshot, meaning the user will receive a copy of every message at least once. Each one of these messag