I agree that log.cleaner.min.compaction.lag.ms gives slightly more
flexibility for potentially-lagging consumers than tuning
segment.roll.ms for the exact same scenario.

If more people think that the use-case of "a consumer which must see
every single record, is running on a compacted topic, and is lagging
enough that tuning segment.roll.ms won't help" is important enough
that we need to address it, I won't object to proceeding with the KIP
(i.e. I'm probably -0 on this). It is easy to come up with a scenario
in which a feature is helpful (heck, I do it all the time); I'm just
not sure there is a real problem that cannot be addressed using
Kafka's existing behavior.

I do think it would be an excellent idea to revisit the log
compaction configurations and see whether they make sense to users.
For example, if "log.cleaner.min.compaction.lag.X" can replace
"log.cleaner.min.cleanable.ratio" as an easier-to-tune alternative,
I'll be more excited about the replacement, even without a strong
use-case for a specific compaction lag.
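
For concreteness, here is how the three knobs under discussion might
look side by side (a sketch only: the 24-hour values are made up, and
I'm using the config names as they appear in this thread):

  import java.util.Properties;

  public class CompactionKnobsSketch {
      public static void main(String[] args) {
          Properties brokerProps = new Properties();
          // Existing: roll a new segment every 24h. The active segment is
          // never compacted, so this loosely shields lagging consumers.
          brokerProps.put("segment.roll.ms", "86400000");
          // Existing: only clean a log once 50% of it is "dirty". Hard to
          // reason about in terms of time.
          brokerProps.put("log.cleaner.min.cleanable.ratio", "0.5");
          // Proposed: records younger than 24h are never compacted,
          // regardless of segment boundaries -- a time-based guarantee.
          brokerProps.put("log.cleaner.min.compaction.lag.ms", "86400000");
          System.out.println(brokerProps);
      }
  }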

Gwen

On Mon, May 16, 2016 at 7:46 PM, Jay Kreps <j...@confluent.io> wrote:
> I think it would be good to hammer out some of the practical use cases--I
> definitely share your disdain for adding more configs. Here is my sort of
> theoretical understanding of why you might want this.
>
> As you say, a consumer bootstrapping itself in the compacted part of the
> log isn't actually traversing through valid states globally. For example,
> if you have written the following:
>   offset, key, value
>   0, k0, v0
>   1, k1, v1
>   2, k0, v2
> it could be compacted to
>   1, k1, v1
>   2, k0, v2
> Thus at offset 1 in the compacted log, you would have applied k1, but not
> k0. So even though k0 and k1 both have valid values, they get applied out
> of order. This is totally normal; there is obviously no way to both
> compact and retain every valid state.
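>
> To make that concrete, here is a tiny simulation of the effect (an
> illustration only, not Kafka's actual cleaner logic):
>
>   import java.util.LinkedHashMap;
>   import java.util.Map;
>
>   public class CompactionReplaySketch {
>       public static void main(String[] args) {
>           // The original log: offsets 0..2, as in the example above.
>           String[][] log = { {"k0", "v0"}, {"k1", "v1"}, {"k0", "v2"} };
>           // Compaction keeps only the last offset for each key.
>           Map<String, Integer> lastOffset = new LinkedHashMap<>();
>           for (int o = 0; o < log.length; o++)
>               lastOffset.put(log[o][0], o);
>           // Replay the compacted log, printing state after each record.
>           Map<String, String> state = new LinkedHashMap<>();
>           for (int o = 0; o < log.length; o++) {
>               if (lastOffset.get(log[o][0]) == o) {
>                   state.put(log[o][0], log[o][1]);
>                   System.out.println("offset " + o + ": " + state);
>               }
>           }
>           // Prints {k1=v1} at offset 1: a state that never existed in
>           // the original history.
>       }
>   }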
>
> For many things this is a non-issue since they treat items only on a
> per-key basis without any global notion of consistency.
>
> But let's say you want to guarantee that a caught-up, real-time consumer
> only traverses valid states. How can you do this? It's actually a bit
> tough. Generally speaking, since we don't compact the active segment, a
> real-time consumer should have this property, but this doesn't really
> give a hard SLA. With a small segment size and a lagging consumer, you
> could imagine the compactor potentially getting ahead of the consumer.
>
> So effectively what this config would establish is a guarantee that as
> long as you consume all messages within log.cleaner.min.compaction.lag.ms
> of their being written, you will get every single produced record.
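>
> In consumer terms, the contract might be checked like this (a rough
> sketch; the topic name and the 24h value are placeholders, and in
> practice you'd do something smarter than throwing):
>
>   import java.time.Duration;
>   import java.util.Collections;
>   import java.util.Properties;
>   import org.apache.kafka.clients.consumer.ConsumerRecord;
>   import org.apache.kafka.clients.consumer.KafkaConsumer;
>
>   public class CompactionLagSlaSketch {
>       public static void main(String[] args) {
>           Properties props = new Properties();
>           props.put("bootstrap.servers", "localhost:9092");
>           props.put("group.id", "lag-sla-sketch");
>           props.put("key.deserializer",
>               "org.apache.kafka.common.serialization.StringDeserializer");
>           props.put("value.deserializer",
>               "org.apache.kafka.common.serialization.StringDeserializer");
>           // Must match the broker's log.cleaner.min.compaction.lag.ms.
>           long minCompactionLagMs = 24L * 60 * 60 * 1000;
>           try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>               consumer.subscribe(Collections.singletonList("my-compacted-topic"));
>               while (true) {
>                   for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
>                       long age = System.currentTimeMillis() - r.timestamp();
>                       if (age > minCompactionLagMs) {
>                           // We fell behind the compactor; records may already
>                           // have been compacted away, so re-bootstrap instead.
>                           throw new IllegalStateException("lag exceeds compaction lag");
>                       }
>                       System.out.printf("apply %s=%s%n", r.key(), r.value());
>                   }
>               }
>           }
>       }
>   }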
>
> -Jay
>
> On Mon, May 16, 2016 at 6:42 PM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Hi Eric,
>>
>> Thank you for submitting this improvement suggestion.
>>
>> Do you mind clarifying the use-case for me?
>>
>> Looking at your gist:
>> https://gist.github.com/ewasserman/f8c892c2e7a9cf26ee46
>>
>> If my consumer started reading all the CDC topics from the very
>> beginning, when they were created, without ever stopping, it is
>> obviously guaranteed to see every single consistent state of the
>> database.
>> If my consumer joined late (let's say after Tq got clobbered by Tr), it
>> will get a mixed state, but if it continues listening on those
>> topics, always following the logs to their end, it is guaranteed to
>> see a consistent state as soon as a new transaction commits. Am I
>> missing anything?
>>
>> Basically, I do not understand why you claim: "However, to recover all
>> the tables at the same checkpoint, with each independently compacting,
>> one may need to move to an even more recent checkpoint when a
>> different table had the same read issue with the new checkpoint. Thus
>> one could never be assured of this process terminating."
>>
>> I mean, it is true that you need to continuously read forward in order
>> to get to a consistent state, but why can't you be assured of getting
>> there?
>>
>> We are doing something very similar in KafkaConnect, where we need a
>> consistent view of our configuration. We make sure that if the current
>> state is inconsistent (i.e. there is data that is not "committed"
>> yet), we continue reading to the log end until we get to a consistent
>> state.
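>>
>> The pattern is roughly this (a sketch of the idea, not the actual
>> KafkaConnect code; the topic name, apply() and isConsistent() are
>> placeholders for the application-level logic):
>>
>>   import java.time.Duration;
>>   import java.util.ArrayList;
>>   import java.util.List;
>>   import java.util.Map;
>>   import java.util.Properties;
>>   import org.apache.kafka.clients.consumer.ConsumerRecord;
>>   import org.apache.kafka.clients.consumer.KafkaConsumer;
>>   import org.apache.kafka.common.TopicPartition;
>>
>>   public class ReadToEndSketch {
>>       public static void main(String[] args) {
>>           Properties props = new Properties();
>>           props.put("bootstrap.servers", "localhost:9092");
>>           props.put("key.deserializer",
>>               "org.apache.kafka.common.serialization.StringDeserializer");
>>           props.put("value.deserializer",
>>               "org.apache.kafka.common.serialization.StringDeserializer");
>>           try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
>>               List<TopicPartition> parts = new ArrayList<>();
>>               consumer.partitionsFor("config-topic").forEach(
>>                   p -> parts.add(new TopicPartition(p.topic(), p.partition())));
>>               consumer.assign(parts);
>>               consumer.seekToBeginning(parts);
>>               while (true) {
>>                   // Snapshot the current end offsets, then read up to them.
>>                   Map<TopicPartition, Long> end = consumer.endOffsets(parts);
>>                   while (parts.stream().anyMatch(tp -> consumer.position(tp) < end.get(tp))) {
>>                       for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1)))
>>                           apply(r);
>>                   }
>>                   if (isConsistent())
>>                       break;  // caught up and consistent, so we are done
>>                   // Otherwise new data arrived or we are mid-transaction:
>>                   // take a fresh end-offset snapshot and keep reading.
>>               }
>>           }
>>       }
>>       static void apply(ConsumerRecord<String, String> r) { /* update state */ }
>>       static boolean isConsistent() { return true; }  // placeholder
>>   }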
>>
>> I am not convinced the new functionality is necessary, or even helpful.
>>
>> Gwen
>>
>> On Mon, May 16, 2016 at 4:07 PM, Eric Wasserman
>> <eric.wasser...@gmail.com> wrote:
>> > I would like to begin discussion on KIP-58
>> >
>> > The KIP is here:
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
>> >
>> > Jira: https://issues.apache.org/jira/browse/KAFKA-1981
>> >
>> > Pull Request: https://github.com/apache/kafka/pull/1168
>> >
>> > Thanks,
>> >
>> > Eric
>>
