wdwars...@gmail.com Bill Warshaw Thanks!
On Mon, Feb 1, 2016 at 1:08 AM, Gwen Shapira <g...@confluent.io> wrote:

> What is your wiki user name?
>
> On Sat, Jan 30, 2016 at 11:35 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
>
> > Hello again,
> >
> > Is there anyone on this thread who has admin access to the Kafka Confluence wiki? I want to create a KIP but I don't have permissions to actually create a page.
> >
> > Bill
> >
> > On Fri, Jan 22, 2016 at 3:29 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Bill,
> > >
> > > Sounds good. If you want to drive pushing this feature, you can try to first submit a KIP proposal:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > >
> > > This admin command may have some correlations with KIP-4:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > >
> > > Guozhang
> > >
> > > On Fri, Jan 22, 2016 at 10:58 AM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > >
> > > > A function such as "deleteUpToOffset(TopicPartition tp, long minOffsetToRetain)" exposed through AdminUtils would be perfect. I would agree that a one-time admin tool would be a good fit for our use case, as long as we can programmatically invoke it. I realize that isn't completely trivial, since AdminUtils just updates Zookeeper metadata.
> > > >
> > > > On Thu, Jan 21, 2016 at 7:35 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > >
> > > > > Bill,
> > > > >
> > > > > For your case since once the log is cleaned up to the given offset watermark (or threshold, whatever the name is), future cleaning with the same watermark will effectively be a no-op, so I feel your scenario will be better fit as a one-time admin tool to cleanup the logs rather than customizing the periodic cleaning policy.
> > > > > Does this sound reasonable to you?
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Jan 20, 2016 at 7:09 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > >
> > > > > > For our particular use case, we would need to. This proposal is really two separate pieces: custom log compaction policy, and the ability to set arbitrary key-value pairs in a Topic configuration.
> > > > > >
> > > > > > I believe that Kafka's current behavior of throwing errors when it encounters configuration keys that aren't defined is meant to help users not misconfigure their configuration files. If that is the sole motivation for it, I would propose adding a property namespace, and allow users to configure arbitrary properties behind that particular namespace, while still enforcing strict parsing for all other properties.
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 9:23 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > >
> > > > > > > So do you need to periodically update the key-value pairs to "advance the threshold for each topic"?
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > > On Wed, Jan 20, 2016 at 5:51 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > >
> > > > > > > > Compaction would be performed in the same manner as it is currently. There is a predicate applied in the "shouldRetainMessage" function in LogCleaner; ultimately we just want to be able to swap a custom implementation of that particular method in. Nothing else in the compaction codepath would need to change.
> > > > > > > >
> > > > > > > > For advancing the "threshold transaction_id", ideally we would be able to set arbitrary key-value pairs on the topic configuration. We have access to the topic configuration during log compaction, so a custom policy class would also have access to that config, and could read anything we stored in there.
> > > > > > > >
> > > > > > > > On Wed, Jan 20, 2016 at 8:14 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello Bill,
> > > > > > > > >
> > > > > > > > > Just to clarify your use case, is your "log compaction" executed manually, or it is triggered periodically like the current log cleaning by-key does? If it is the latter case, how will you advance the "threshold transaction_id" each time when it executes?
> > > > > > > > >
> > > > > > > > > Guozhang
> > > > > > > > >
> > > > > > > > > On Wed, Jan 20, 2016 at 1:50 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > > > >
> > > > > > > > > > Damian, I appreciate your quick response.
> > > > > > > > > >
> > > > > > > > > > Our transaction_id is incrementing for each transaction, so we will only ever have one message in Kafka with a given transaction_id. We thought about using a rolling counter that is incremented on each checkpoint as the key, and manually triggering compaction after the checkpoint is complete, but our checkpoints are asynchronous.
> > > > > > > > > > This means that we would have a set of messages appended to the log after the checkpoint started, with value of the previous key + 1, that would also be compacted down to a single entry.
> > > > > > > > > >
> > > > > > > > > > Our particular custom policy would delete all messages whose key was less than a given transaction_id that we passed in. I can imagine a wide variety of other custom policies that could be used for retention based on the key and value of the message.
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 20, 2016 at 1:35 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hello,
> > > > > > > > > > >
> > > > > > > > > > > I'm working on a team that is starting to use Kafka as a distributed transaction log for a set of in-memory databases which can be replicated across nodes. We decided to use Kafka instead of Bookkeeper for a variety of reasons, but there are a couple spots where Kafka is not a perfect fit.
> > > > > > > > > > >
> > > > > > > > > > > The biggest issue facing us is deleting old transactions from the log after checkpointing the database.
> > > > > > > > > > > We can't use any of the built-in size or time-based deletion mechanisms efficiently, because we could get ourselves into a dangerous state where we're deleting transactions that haven't been checkpointed yet. The current approach we're looking at is rolling a new topic each time we checkpoint, and deleting the old topic once all replicas have consumed everything in it.
> > > > > > > > > > >
> > > > > > > > > > > Another idea we came up with is using a pluggable compaction policy; we would set the message key as the offset or transaction id, and the policy would delete all messages with a key smaller than that id.
> > > > > > > > > > >
> > > > > > > > > > > I took a stab at implementing the hook in Kafka for pluggable compaction policies at
> > > > > > > > > > > https://github.com/apache/kafka/compare/trunk...bill-warshaw:pluggable_compaction_policy
> > > > > > > > > > > (rough implementation), and it seems fairly straightforward.
> > > > > > > > > > > One problem that we run into is that the custom policy class can only access information that is defined in the configuration, and the configuration doesn't allow custom key-value pairs; if we wanted to pass it information dynamically, we'd have to use some hack like calling Zookeeper from within the class.
> > > > > > > > > > >
> > > > > > > > > > > To get around this, my best idea is to add the ability to specify arbitrary key-value pairs in the configuration, that our client could use to pass information to the custom policy. Does this set off any alarm bells for you guys? If so, are there other approaches we could take that come to mind?
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your time,
> > > > > > > > > > > Bill Warshaw
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > <http://appianworld.com>
> > > > > > > > > > This message and any attachments are solely for the intended recipient. If you are not the intended recipient, disclosure, copying, use, or distribution of the information included in this message is prohibited -- please immediately and permanently delete this message.
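For readers following the thread, the two ideas discussed above (a namespaced escape hatch for arbitrary topic config keys, and a retention predicate that drops every message whose key is below a threshold transaction_id) can be sketched roughly as follows. This is a standalone Python illustration under stated assumptions, not Kafka code: `validate_topic_config`, `should_retain_message`, `compact`, the `Message` type, the `custom.` prefix and the `custom.min.transaction.id` key are all hypothetical names chosen for the example, and `KNOWN_KEYS` is only a tiny illustrative subset of real topic settings.

```python
from dataclasses import dataclass

# Illustrative subset of recognized topic settings (assumption: the real
# list is much longer and lives in the broker, not in client code).
KNOWN_KEYS = {"cleanup.policy", "retention.ms"}

# Hypothetical namespace under which arbitrary key-value pairs are allowed.
CUSTOM_PREFIX = "custom."


def validate_topic_config(config):
    """Strictly reject unknown keys, except those under the custom namespace."""
    for key in config:
        if key not in KNOWN_KEYS and not key.startswith(CUSTOM_PREFIX):
            raise ValueError(f"unknown topic config key: {key}")
    return config


@dataclass(frozen=True)
class Message:
    key: int      # the transaction_id is used as the message key
    value: bytes


def should_retain_message(message, config):
    """Keep a message only if its key is at or above the configured threshold."""
    threshold = int(config.get(CUSTOM_PREFIX + "min.transaction.id", 0))
    return message.key >= threshold


def compact(log, config):
    """Apply the retention predicate to every message, preserving order."""
    return [m for m in log if should_retain_message(m, config)]


if __name__ == "__main__":
    cfg = validate_topic_config(
        {"cleanup.policy": "compact", "custom.min.transaction.id": "3"}
    )
    log = [Message(i, b"payload") for i in range(1, 6)]
    print([m.key for m in compact(log, cfg)])
```

Running `compact` a second time with the same threshold returns the same list, which matches Guozhang's observation that repeated cleaning with an unchanged watermark is effectively a no-op.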