On Tue, Jan 3, 2017 at 11:01 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> On Tue, Jan 3, 2017 at 6:14 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Ewen,
> >
> > Thanks for the review. As Radai explained, it would be complex in
> > terms of user configuration if we were to use committed offset to
> > decide data deletion. We need a way to specify which groups need to
> > consume data of this partition. The broker will also need to consume
> > the entire offsets topic in that approach which has some overhead. I
> > don't think it is that hard to implement. But it will likely take
> > more time to discuss that approach due to the new config and the
> > server side overhead.
> >
> > We choose to put this API in AdminClient because the API is more like
> > an administrative operation (such as listGroups, deleteTopics) than a
> > consumer operation. It is not necessarily called by consumer only.
> > For example, we can implement the "delete data before committed
> > offset" approach by running an external service which calls
> > purgeDataBefore() API based on committed offset of consumer groups.
> >
> > I am not aware that AdminClient is not a public API. Suppose it is
> > not public now, I assume we plan to make it public in the future as
> > part of KIP-4. Are we not making it public because its interface is
> > not stable? If so, can we just tag this new API as not stable in the
> > code?
>
> The AdminClient planned for KIP-4 is a new Java-based implementation.
> It's definitely confusing that both will be (could be?) named
> AdminClient, but we've kept the existing Scala AdminClient out of the
> public API and have not required KIPs for changes to it.
>
> That said, I agree that if this type of API makes it into Kafka, having
> a (new, Java-based) AdminClient method would definitely be a good idea.
> An alternative path might be to have a Consumer-based implementation
> since that seems like a very intuitive, natural way to use the
> protocol. I think optimizing for the expected use case would be a good
> idea.
>
> -Ewen

Are you saying that the Scala AdminClient is not a public API and that
we discourage adding any new feature to this class?

I still prefer to add it to AdminClient (the Java version in the future
and the Scala version in the short term) because I feel it belongs with
the admin operations rather than in the KafkaConsumer interface. For
example, if in the future we implement the "delete data before committed
offset" strategy in an external service, I feel it is a bit awkward if
the service has to instantiate a KafkaConsumer and call
KafkaConsumer.purgeDataBefore(...) to purge data. In other words, our
expected use case doesn't necessarily bind this API to the consumer.

I don't feel strongly about this issue. Let's see what other
committers/developers think about this.

> > Thanks,
> > Dong
> >
> > On Tue, Jan 3, 2017 at 3:56 PM, Ewen Cheslack-Postava
> > <e...@confluent.io> wrote:
> >
> > > Dong,
> > >
> > > Looks like that's an internal link,
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
> > > is the right one.
> > >
> > > I have a question about one of the rejected alternatives:
> > >
> > > > Using committed offset instead of an extra API to trigger data
> > > > purge operation.
> > >
> > > The KIP says this would be more complicated to implement. Why is
> > > that? I think brokers would have to consume the entire offsets
> > > topic, but the data stored in memory doesn't seem to change and
> > > applying this when updated offsets are seen seems basically the
> > > same. It might also be possible to make it work even with multiple
> > > consumer groups if that was desired (although that'd require
> > > tracking more data in memory) as a generalization without requiring
> > > coordination between the consumer groups. Given the motivation, I'm
> > > assuming this was considered unnecessary since this specifically
> > > targets intermediate stream processing topics.
> > >
> > > Another question is why expose this via AdminClient (which isn't
> > > public API afaik)? Why not, for example, expose it on the Consumer,
> > > which is presumably where you'd want access to it since the
> > > functionality depends on the consumer actually having consumed the
> > > data?
> > >
> > > -Ewen
> > >
> > > On Tue, Jan 3, 2017 at 2:45 PM, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We created KIP-107 to propose addition of purgeDataBefore() API in
> > > > AdminClient.
> > > >
> > > > Please find the KIP wiki in the link
> > > > https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Kafka+purgeDataBefore%28%29+API+design+proposal.
> > > > We would love to hear your comments and suggestions.
> > > >
> > > > Thanks,
> > > > Dong
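
For readers following the thread, the "delete data before committed offset" idea run as an external service could be sketched roughly as below. This is only an illustration under stated assumptions, not the KIP's implementation: `safe_purge_offsets`, `purge_once`, `fetch_committed`, `purge_data_before` and `required_groups` are hypothetical names, and the real purgeDataBefore() signature is whatever the KIP ends up defining. The sketch only shows the bookkeeping: for each partition, take the lowest offset committed by the groups that are required to consume it, and skip partitions that some required group has not committed to yet.

```python
from typing import Callable, Dict, Iterable, Tuple

# (topic, partition); a hypothetical stand-in for a TopicPartition type.
TopicPartition = Tuple[str, int]


def safe_purge_offsets(
    committed: Dict[str, Dict[TopicPartition, int]],
    required_groups: Iterable[str],
) -> Dict[TopicPartition, int]:
    """Return, per partition, the lowest offset committed by the required groups.

    A partition is included only if every required group has committed an
    offset for it; otherwise purging could drop data a group still needs.
    """
    groups = list(required_groups)
    if not groups:
        return {}
    # Start from the first group's partitions and keep only those
    # that every other required group has also committed.
    partitions = set(committed.get(groups[0], {}))
    for group in groups[1:]:
        partitions &= set(committed.get(group, {}))
    return {tp: min(committed[g][tp] for g in groups) for tp in partitions}


def purge_once(
    fetch_committed: Callable[[], Dict[str, Dict[TopicPartition, int]]],
    purge_data_before: Callable[[Dict[TopicPartition, int]], None],
    required_groups: Iterable[str],
) -> Dict[TopicPartition, int]:
    """One pass of the hypothetical service: read offsets, then request a purge."""
    offsets = safe_purge_offsets(fetch_committed(), required_groups)
    if offsets:
        # Assumed callback standing in for the proposed purgeDataBefore() call.
        purge_data_before(offsets)
    return offsets
```

The explicit `required_groups` argument mirrors the configuration concern raised in the thread: something has to say which groups must have consumed a partition before its data can go.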