On Tue, Jan 3, 2017 at 11:01 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> On Tue, Jan 3, 2017 at 6:14 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hey Ewen,
> >
> > Thanks for the review. As Radai explained, it would be complex in terms of
> > user configuration if we were to use committed offset to decide data
> > deletion. We need a way to specify which groups need to consume data of
> > this partition. The broker will also need to consume the entire offsets
> > topic in that approach which has some overhead. I don't think it is that
> > hard to implement. But it will likely take more time to discuss that
> > approach due to the new config and the server side overhead.
> >
> > We choose to put this API in AdminClient because the API is more like an
> > administrative operation (such as listGroups, deleteTopics) than a consumer
> > operation. It is not necessarily called by consumer only. For example, we
> > can implement the "delete data before committed offset" approach by running
> > an external service which calls purgeDataBefore() API based on committed
> > offset of consumer groups.
> >
> > I am not aware that AdminClient is not a public API. Suppose it is not
> > public now, I assume we plan to make it public in the future as part of
> > KIP-4. Are we not making it public because its interface is not stable?
> > If so, can we just tag this new API as not stable in the code?
> >
>
>
> The AdminClient planned for KIP-4 is a new Java-based implementation. It's
> definitely confusing that both will be (could be?) named AdminClient, but
> we've kept the existing Scala AdminClient out of the public API and have
> not required KIPs for changes to it.
>
> That said, I agree that if this type of API makes it into Kafka, having a
> (new, Java-based) AdminClient method would definitely be a good idea. An
> alternative path might be to have a Consumer-based implementation since
> that seems like a very intuitive, natural way to use the protocol. I think
> optimizing for the expected use case would be a good idea.
>
> -Ewen
>
Are you saying that the Scala AdminClient is not a public API and that we
discourage adding any new features to this class?

I still prefer to add it to AdminClient (the Java version in the future and
the Scala version in the short term) because I feel it belongs with admin
operations rather than on the KafkaConsumer interface. For example, if in the
future we implement the "delete data before committed offset" strategy in an
external service, I feel it would be a bit awkward if the service had to
instantiate a KafkaConsumer and call KafkaConsumer.purgeDataBefore(...) to
purge data. In other words, our expected use case doesn't necessarily tie
this API to the consumer.
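
To make the external-service idea a bit more concrete, here is a rough sketch
of the offset calculation such a service might do. Everything in it is
hypothetical: PurgeApi is only a stand-in for whatever client ends up exposing
purgeDataBefore(), the map-based signature is an assumption for illustration
rather than the one in the KIP, and partitions are represented as plain
strings. The only point is that the service needs the committed offsets of the
groups it cares about and an admin-style call, not a consumer subscription.

```java
import java.util.HashMap;
import java.util.Map;

public class CommittedOffsetPurger {

    /** Hypothetical stand-in for the proposed admin call; not an existing Kafka interface. */
    public interface PurgeApi {
        void purgeDataBefore(Map<String, Long> offsetsByPartition);
    }

    /**
     * For each partition, returns the smallest offset committed by any group.
     * Partitions that some group has not committed yet are left out, so data
     * that a group still has to read is never purged.
     *
     * @param committedByGroup group id -> (partition name -> committed offset)
     */
    public static Map<String, Long> safePurgeOffsets(
            Map<String, Map<String, Long>> committedByGroup) {
        Map<String, Long> minOffsets = new HashMap<>();
        Map<String, Integer> groupsSeen = new HashMap<>();
        for (Map<String, Long> offsets : committedByGroup.values()) {
            for (Map.Entry<String, Long> e : offsets.entrySet()) {
                minOffsets.merge(e.getKey(), e.getValue(), Long::min);
                groupsSeen.merge(e.getKey(), 1, Integer::sum);
            }
        }
        // Keep only partitions for which every group has a committed offset.
        int groupCount = committedByGroup.size();
        minOffsets.keySet().removeIf(p -> groupsSeen.get(p) < groupCount);
        return minOffsets;
    }

    /** Computes the safe offsets and hands them to the (hypothetical) purge call. */
    public static void purgeConsumedData(
            PurgeApi api, Map<String, Map<String, Long>> committedByGroup) {
        Map<String, Long> offsets = safePurgeOffsets(committedByGroup);
        if (!offsets.isEmpty()) {
            api.purgeDataBefore(offsets);
        }
    }
}
```

How the service learns the committed offsets, and which groups count as
required for a partition, is exactly the configuration question raised earlier
in the thread, so the sketch takes both as input.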

I don't feel strongly about this issue. Let's see what other
committers/developers think about this.


>
> >
> > Thanks,
> > Dong
> >
> > On Tue, Jan 3, 2017 at 3:56 PM, Ewen Cheslack-Postava <e...@confluent.io>
> > wrote:
> >
> > > Dong,
> > >
> > > Looks like that's an internal link,
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
> > > is the right one.
> > >
> > > I have a question about one of the rejected alternatives:
> > >
> > > > Using committed offset instead of an extra API to trigger data purge operation.
> > >
> > > The KIP says this would be more complicated to implement. Why is that? I
> > > think brokers would have to consume the entire offsets topic, but the data
> > > stored in memory doesn't seem to change and applying this when updated
> > > offsets are seen seems basically the same. It might also be possible to
> > > make it work even with multiple consumer groups if that was desired
> > > (although that'd require tracking more data in memory) as a generalization
> > > without requiring coordination between the consumer groups. Given the
> > > motivation, I'm assuming this was considered unnecessary since this
> > > specifically targets intermediate stream processing topics.
> > >
> > > Another question is why expose this via AdminClient (which isn't public API
> > > afaik)? Why not, for example, expose it on the Consumer, which is
> > > presumably where you'd want access to it since the functionality depends on
> > > the consumer actually having consumed the data?
> > >
> > > -Ewen
> > >
> > > On Tue, Jan 3, 2017 at 2:45 PM, Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > We created KIP-107 to propose addition of purgeDataBefore() API in
> > > > AdminClient.
> > > >
> > > > Please find the KIP wiki in the link
> > > > https://iwww.corp.linkedin.com/wiki/cf/display/ENGS/Kafka+purgeDataBefore%28%29+API+design+proposal.
> > > > We would love to hear your comments and suggestions.
> > > >
> > > > Thanks,
> > > > Dong
> > > >
> > >
> >
>
