You should be good to go now.

On Sun, Jan 31, 2016 at 10:09 PM, Bill Warshaw <bill.wars...@appian.com> wrote:

> wdwars...@gmail.com
> Bill Warshaw
>
> Thanks!
>
> On Mon, Feb 1, 2016 at 1:08 AM, Gwen Shapira <g...@confluent.io> wrote:
>
> > What is your wiki user name?
> >
> > On Sat, Jan 30, 2016 at 11:35 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> >
> > > Hello again,
> > >
> > > Is there anyone on this thread who has admin access to the Kafka Confluence wiki? I want to create a KIP but I don't have permissions to actually create a page.
> > >
> > > Bill
> > >
> > > On Fri, Jan 22, 2016 at 3:29 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > > > Bill,
> > > >
> > > > Sounds good. If you want to drive pushing this feature, you can try to first submit a KIP proposal:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > > >
> > > > This admin command may have some correlations with KIP-4:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations
> > > >
> > > > Guozhang
> > > >
> > > > On Fri, Jan 22, 2016 at 10:58 AM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > >
> > > > > A function such as "deleteUpToOffset(TopicPartition tp, long minOffsetToRetain)" exposed through AdminUtils would be perfect. I would agree that a one-time admin tool would be a good fit for our use case, as long as we can programmatically invoke it. I realize that isn't completely trivial, since AdminUtils just updates Zookeeper metadata.
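
As a rough illustration of the semantics requested above, the sketch below models a single partition as an in-memory map from offset to message and drops everything below minOffsetToRetain. The class PartitionLogSketch and all of its methods are hypothetical; none of this is Kafka's AdminUtils, and the TopicPartition argument is omitted because the sketch covers one partition only. It is meant to show the intended contract, not how a broker would actually truncate log segments.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for one partition's log; not a Kafka class. */
final class PartitionLogSketch {
    // Messages keyed by offset, kept in ascending offset order.
    private final TreeMap<Long, String> messagesByOffset = new TreeMap<>();

    void append(long offset, String value) {
        messagesByOffset.put(offset, value);
    }

    /**
     * Removes every message whose offset is strictly below minOffsetToRetain
     * and returns how many messages were removed.
     */
    int deleteUpToOffset(long minOffsetToRetain) {
        // headMap(..., false) is a live view of all entries with a smaller offset.
        NavigableMap<Long, String> belowWatermark =
                messagesByOffset.headMap(minOffsetToRetain, false);
        int removed = belowWatermark.size();
        belowWatermark.clear(); // clearing the view deletes from the backing map
        return removed;
    }

    int size() {
        return messagesByOffset.size();
    }

    /** Lowest retained offset; throws NoSuchElementException if the log is empty. */
    long firstOffset() {
        return messagesByOffset.firstKey();
    }
}
```

In this model, calling deleteUpToOffset a second time with the same value removes nothing, because no offsets below the watermark remain.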
> > > > > On Thu, Jan 21, 2016 at 7:35 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > >
> > > > > > Bill,
> > > > > >
> > > > > > For your case since once the log is cleaned up to the given offset watermark (or threshold, whatever the name is), future cleaning with the same watermark will effectively be a no-op, so I feel your scenario will be better fit as a one-time admin tool to cleanup the logs rather than customizing the periodic cleaning policy. Does this sound reasonable to you?
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > > On Wed, Jan 20, 2016 at 7:09 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > >
> > > > > > > For our particular use case, we would need to. This proposal is really two separate pieces: custom log compaction policy, and the ability to set arbitrary key-value pairs in a Topic configuration.
> > > > > > >
> > > > > > > I believe that Kafka's current behavior of throwing errors when it encounters configuration keys that aren't defined is meant to help users not misconfigure their configuration files. If that is the sole motivation for it, I would propose adding a property namespace, and allow users to configure arbitrary properties behind that particular namespace, while still enforcing strict parsing for all other properties.
> > > > > > >
> > > > > > > On Wed, Jan 20, 2016 at 9:23 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > >
> > > > > > > > So do you need to periodically update the key-value pairs to "advance the threshold for each topic"?
> > > > > > > >
> > > > > > > > Guozhang
> > > > > > > >
> > > > > > > > On Wed, Jan 20, 2016 at 5:51 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > > >
> > > > > > > > > Compaction would be performed in the same manner as it is currently. There is a predicate applied in the "shouldRetainMessage" function in LogCleaner; ultimately we just want to be able to swap a custom implementation of that particular method in. Nothing else in the compaction codepath would need to change.
> > > > > > > > >
> > > > > > > > > For advancing the "threshold transaction_id", ideally we would be able to set arbitrary key-value pairs on the topic configuration. We have access to the topic configuration during log compaction, so a custom policy class would also have access to that config, and could read anything we stored in there.
> > > > > > > > >
> > > > > > > > > On Wed, Jan 20, 2016 at 8:14 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Hello Bill,
> > > > > > > > > >
> > > > > > > > > > Just to clarify your use case, is your "log compaction" executed manually, or it is triggered periodically like the current log cleaning by-key does? If it is the latter case, how will you advance the "threshold transaction_id" each time when it executes?
> > > > > > > > > >
> > > > > > > > > > Guozhang
> > > > > > > > > >
> > > > > > > > > > On Wed, Jan 20, 2016 at 1:50 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Damian, I appreciate your quick response.
> > > > > > > > > > >
> > > > > > > > > > > Our transaction_id is incrementing for each transaction, so we will only ever have one message in Kafka with a given transaction_id. We thought about using a rolling counter that is incremented on each checkpoint as the key, and manually triggering compaction after the checkpoint is complete, but our checkpoints are asynchronous. This means that we would have a set of messages appended to the log after the checkpoint started, with value of the previous key + 1, that would also be compacted down to a single entry.
> > > > > > > > > > >
> > > > > > > > > > > Our particular custom policy would delete all messages whose key was less than a given transaction_id that we passed in. I can imagine a wide variety of other custom policies that could be used for retention based on the key and value of the message.
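
The custom policy described above amounts to a retention predicate over message keys. The following is a minimal sketch of that idea, assuming keys are numeric transaction ids; ThresholdRetentionSketch, retainAtOrAbove, and compact are hypothetical names for illustration and do not correspond to LogCleaner or any other Kafka class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical illustration of a threshold-based retention policy; not Kafka code. */
final class ThresholdRetentionSketch {

    /** Policy: keep a message only if its key (transaction id) is at or above the threshold. */
    static LongPredicate retainAtOrAbove(long thresholdTransactionId) {
        return key -> key >= thresholdTransactionId;
    }

    /** Applies a retention predicate to a list of message keys, preserving their order. */
    static List<Long> compact(List<Long> keys, LongPredicate shouldRetain) {
        List<Long> kept = new ArrayList<>();
        for (long key : keys) {
            if (shouldRetain.test(key)) {
                kept.add(key);
            }
        }
        return kept;
    }
}
```

Because the policy is just a predicate, other key- or value-based retention rules could be passed to the same compact method without changing it.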
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jan 20, 2016 at 1:35 PM, Bill Warshaw <bill.wars...@appian.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > I'm working on a team that is starting to use Kafka as a distributed transaction log for a set of in-memory databases which can be replicated across nodes. We decided to use Kafka instead of Bookkeeper for a variety of reasons, but there are a couple spots where Kafka is not a perfect fit.
> > > > > > > > > > > >
> > > > > > > > > > > > The biggest issue facing us is deleting old transactions from the log after checkpointing the database. We can't use any of the built-in size or time-based deletion mechanisms efficiently, because we could get ourselves into a dangerous state where we're deleting transactions that haven't been checkpointed yet. The current approach we're looking at is rolling a new topic each time we checkpoint, and deleting the old topic once all replicas have consumed everything in it.
> > > > > > > > > > > >
> > > > > > > > > > > > Another idea we came up with is using a pluggable compaction policy; we would set the message key as the offset or transaction id, and the policy would delete all messages with a key smaller than that id.
> > > > > > > > > > > > I took a stab at implementing the hook in Kafka for pluggable compaction policies at https://github.com/apache/kafka/compare/trunk...bill-warshaw:pluggable_compaction_policy (rough implementation), and it seems fairly straightforward. One problem that we run into is that the custom policy class can only access information that is defined in the configuration, and the configuration doesn't allow custom key-value pairs; if we wanted to pass it information dynamically, we'd have to use some hack like calling Zookeeper from within the class.
> > > > > > > > > > > > To get around this, my best idea is to add the ability to specify arbitrary key-value pairs in the configuration, that our client could use to pass information to the custom policy. Does this set off any alarm bells for you guys? If so, are there other approaches we could take that come to mind?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your time,
> > > > > > > > > > > > Bill Warshaw
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > <http://appianworld.com>
> > > > > > > > > > > This message and any attachments are solely for the intended recipient. If you are not the intended recipient, disclosure, copying, use, or distribution of the information included in this message is prohibited -- please immediately and permanently delete this message.
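
The property-namespace idea raised in the thread (arbitrary keys allowed under one prefix, strict checking everywhere else) can be sketched as below. This is only an illustration under stated assumptions: the "custom." prefix, the class NamespacedConfigSketch, and the key custom.threshold.transaction.id are hypothetical and not part of Kafka's configuration handling. The two known keys are real topic-level settings, used here as a small stand-in for the full set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator illustrating a pass-through namespace; not Kafka code. */
final class NamespacedConfigSketch {
    // Assumed prefix under which arbitrary user-defined keys would be accepted.
    static final String CUSTOM_PREFIX = "custom.";

    // Small stand-in for the set of keys the parser already understands.
    private static final Set<String> KNOWN_KEYS =
            new HashSet<>(Arrays.asList("cleanup.policy", "retention.ms"));

    /** Throws IllegalArgumentException for any key that is neither known nor namespaced. */
    static void validate(Map<String, String> props) {
        for (String key : props.keySet()) {
            if (key.startsWith(CUSTOM_PREFIX)) {
                continue; // arbitrary keys are allowed inside the namespace
            }
            if (!KNOWN_KEYS.contains(key)) {
                throw new IllegalArgumentException("Unknown configuration key: " + key);
            }
        }
    }
}
```

With this rule, a misspelled standard key is still rejected, while a client could store a value such as custom.threshold.transaction.id for a custom policy to read.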