.... and Spark's implementation is another good reason to allow compaction lag.
I'm convinced :) We need to decide:

1) Do we need just a .ms config, or anything else? Consumer lag is measured
(and monitored) in messages, so if we need this feature to somehow work in
tandem with consumer lag monitoring, I think we need .messages too.

2) Does this new configuration allow us to get rid of the cleaner.ratio
config?

Gwen

On Tue, May 17, 2016 at 9:43 AM, Eric Wasserman <eric.wasser...@gmail.com>
wrote:

> James,
>
> Your pictures do an excellent job of illustrating my point.
>
> My mention of the additional "10's of minutes to hours" refers to how far
> after the original target checkpoint (T1 in your diagram) one may need to
> go to get to a checkpoint where all partitions of all topics are in the
> uncompacted region of their respective logs. In terms of your diagram: the
> T3 transaction could have been written 10's of minutes to hours after T1,
> as that was how much time it took all readers to get to T1.
>
>> You would not have to start over from the beginning in order to read to T3.
>
> While I agree this is technically true, in practice it could be very
> onerous to actually do. For example, we use the Kafka consumer that is
> part of the Spark Streaming library to read table topics. It accepts a
> range of offsets to read for each partition. Say we originally target
> ranges from offset 0 to the offset of T1 for each topic+partition. There
> really is no way to have the library arrive at T1 and then "keep going" to
> T3. What is worse, given Spark's design, if you lost a worker during your
> calculations you would be in a rather sticky position. Spark achieves
> resiliency not by data redundancy but by keeping track of how to reproduce
> the transformations leading to a state. In the face of a lost worker,
> Spark would try to re-read that portion of the data on the lost worker
> from Kafka. However, in the interim compaction may have moved past the
> reproducible checkpoint (T3), rendering the data inconsistent. At best the
> entire calculation would need to start over, targeting some later
> transaction checkpoint.
>
> Needless to say, with the proposed feature everything is quite simple. As
> long as we set the compaction lag large enough we can be assured that T1
> will remain in the uncompacted region and thereby be reproducible. Thus
> reading from 0 to the offsets in T1 will be sufficient for the duration of
> the calculation.
>
> Eric
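For illustration, here is a minimal spark-shell-style sketch of the
fixed-offset-range read Eric describes, assuming the spark-streaming-kafka
0.8 integration (OffsetRange / KafkaUtils.createRDD). The topic name, broker
address, and per-partition T1 offsets below are hypothetical placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

    val sc = new SparkContext(new SparkConf().setAppName("t1-range-read"))

    // Hypothetical per-partition offsets of the T1 checkpoint; in practice
    // these would be taken from the transaction checkpoint itself.
    val t1Offsets = Map(0 -> 4711L, 1 -> 4805L)

    // One fixed range per topic+partition: from offset 0 up to the offset of T1.
    val offsetRanges = t1Offsets.map { case (partition, t1Offset) =>
      OffsetRange("table-topic", partition, 0L, t1Offset)
    }.toArray

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // createRDD reads exactly the given ranges; there is no way to have it
    // arrive at T1 and then "keep going" to T3.
    val tableRows = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

Because createRDD materializes exactly these ranges, a recomputation after a
lost worker re-reads the same offsets; if compaction has already processed
that region, the re-read data would no longer match what was originally read,
which is the failure mode described in the quoted message.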