Hey Bill,
I have some follow up questions after Jun's questions:
- It seems that the consumer will need to write log.retention.min.timestamp
periodically to zookeeper as dynamic configuration of the topic, so that
broker can pick up log.retention.min.timestamp. However, this introduces
dependency
Bill,
That's a good question. I am thinking of the following approach for
implementing trim(): (1) client issues metadata request to the broker to
determine the leader of topic/partitions and groups topic/partitions by the
leader broker; (2) client sends a TrimRequest to each broker with
partition
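A minimal Java sketch of the client-side flow described above, under stated assumptions: `TrimRequest` does not exist yet (it is the proposal here), and the leader lookup is modeled as a plain partition-to-broker map rather than a real MetadataResponse. The interesting step is (2), grouping partitions by leader so one request goes to each broker:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TrimSketch {
    // Hypothetical stand-in for "partition X, led by broker Y, trim to offset Z",
    // as a real client would derive from a metadata request.
    record TrimTarget(String topicPartition, int leaderBrokerId, long trimOffset) {}

    // Step (2): group the partitions to trim by their leader broker, so the
    // client can send a single (hypothetical) TrimRequest per broker.
    static Map<Integer, List<TrimTarget>> groupByLeader(List<TrimTarget> targets) {
        return targets.stream()
                .collect(Collectors.groupingBy(TrimTarget::leaderBrokerId));
    }

    public static void main(String[] args) {
        List<TrimTarget> targets = List.of(
                new TrimTarget("events-0", 1, 4200L),
                new TrimTarget("events-1", 2, 3100L),
                new TrimTarget("events-2", 1, 5000L));
        // Broker 1 leads two of the partitions, broker 2 leads one,
        // so this client would issue two TrimRequests in total.
        groupByLeader(targets).forEach((broker, parts) ->
                System.out.println("broker " + broker + " -> " + parts.size() + " partition(s)"));
    }
}
```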
Hi Jun,
Those are valid concerns. For our particular use case, application events
triggering the timestamp update will never occur more than once an hour,
and we maintain a sliding window so that we don't delete messages too close
to what our consumers may be reading.
For more general use cases,
Hi, Bill,
Thanks for the proposal. Sorry for the late reply.
The motivation of the proposal makes sense: don't delete the messages until
the application tells you so.
I am wondering if the current proposal is the best way to address the need
though. There are a couple of issues that I saw with the
+1.
On Fri, Oct 7, 2016 at 3:35 PM, Gwen Shapira wrote:
> +1 (binding)
>
> On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw wrote:
> > Bumping for visibility. KIP is here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
> >
> > On Wed, Aug
+1 (binding)
On Wed, Oct 5, 2016 at 1:55 PM, Bill Warshaw wrote:
> Bumping for visibility. KIP is here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
>
> On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw wrote:
>
>> Hello Guozhang,
>>
>> KIP-71
Bumping for visibility. KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
On Wed, Aug 24, 2016 at 2:32 PM Bill Warshaw wrote:
> Hello Guozhang,
>
> KIP-71 seems unrelated to this KIP. KIP-47 is just adding a new deletion
> policy (m
Hello Guozhang,
KIP-71 seems unrelated to this KIP. KIP-47 is just adding a new deletion
policy (minimum timestamp), while KIP-71 is allowing deletion and
compaction to coexist.
They both will touch LogManager, but the change for KIP-47 is very isolated.
On Wed, Aug 24, 2016 at 2:21 PM Guozhang
Hi Bill,
I would like to reason if there is any correlation between this KIP and
KIP-71
https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist
I feel they are orthogonal but would like to double check with you.
Guozhang
On Wed, Aug 24, 2016
I'd like to re-awaken this voting thread now that KIP-33 has merged. This
KIP is now completely unblocked. I have a working branch off of trunk with
my proposed fix, including testing.
On Mon, May 9, 2016 at 8:30 PM Guozhang Wang wrote:
> Jay, Bill:
>
> I'm thinking of one general use case of
Jay, Bill:
I'm thinking of one general use case of using timestamp rather than offset
for log deletion, which is that for expiration handling in data
replication, when the source data store decides to expire some data records
based on their timestamps, today we need to configure the corresponding
Yes, I'd agree that offset is a more precise configuration than timestamp.
If there was a way to set a partition-level configuration, I would rather
use log.retention.min.offset than timestamp. If you have an approach in
mind I'd be open to investigating it.
On Mon, May 2, 2016 at 5:33 PM, Jay Kr
Gotcha, good point. But barring that limitation, you agree that that makes
more sense?
-Jay
On Mon, May 2, 2016 at 2:29 PM, Bill Warshaw wrote:
> The problem with offset as a config option is that offsets are
> partition-specific, so we'd need a per-partition config. This would work
> for our
The problem with offset as a config option is that offsets are
partition-specific, so we'd need a per-partition config. This would work
for our particular use case, where we have single-partition topics, but for
multiple-partition topics it would delete from all partitions based on a
global topic-
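To make that concern concrete, a toy sketch (not Kafka code; the end offsets and the single cutoff value are made up): offsets advance independently per partition, so one topic-wide "minimum offset" cutoff trims each partition by a different, essentially arbitrary amount.

```java
import java.util.Map;

public class OffsetConfigSketch {
    // Records from topicWideMinOffset (inclusive) up to endOffset would remain.
    static long retainedRecords(long endOffset, long topicWideMinOffset) {
        return Math.max(0, endOffset - topicWideMinOffset);
    }

    public static void main(String[] args) {
        // Two partitions of the same topic, at very different end offsets.
        Map<String, Long> endOffsets = Map.of("p0", 10_000L, "p1", 600L);
        long minOffset = 500L; // one topic-level config value applied to both
        // p0 keeps 9500 records while p1 keeps only 100: the same numeric
        // cutoff means something different in every partition.
        endOffsets.forEach((p, end) ->
                System.out.println(p + " retains " + retainedRecords(end, minOffset)));
    }
}
```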
I think you are saying you considered a kind of trim() api that would
synchronously chop off the tail of the log starting from a given offset.
That would be one option, but what I was saying was slightly different: in
the proposal you have where there is a config that controls retention that
the us
1. Initially I looked at using the actual offset, by adding a call to
AdminUtils to just delete anything in a given topic/partition to a given
offset. I ran into a lot of trouble here trying to work out how the system
would recognize that every broker had successfully deleted that range from
the
Two comments:
1. Is there a reason to use physical time rather than offset? The idea
is for the consumer to say when it has consumed something so it can be
deleted, right? It seems like offset would be a much more precise way to do
this--i.e. the consumer says "I have checkpointed stat
Thanks. I'm +1 on this proposal given the comment above.
On Mon, May 2, 2016 at 9:34 AM, Bill Warshaw wrote:
> Yeah 1 and 2 could easily be combined into the same predicate.
>
> On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang wrote:
>
> > Can we do 1 and 2 in one pass, and 3 in another pass? It
Yeah 1 and 2 could easily be combined into the same predicate.
On Mon, May 2, 2016 at 10:27 AM, Guozhang Wang wrote:
> Can we do 1 and 2 in one pass, and 3 in another pass? It may result in
> different results but semantically it should be acceptable. Arguably saving
> one pass on the segment li
Can we do 1 and 2 in one pass, and 3 in another pass? It may result in
different results but semantically it should be acceptable. Arguably saving
one pass on the segment list may not be huge, but if it is straightforward
to do I'd suggest choosing this option.
Guozhang
On Mon, May 2, 2016 at 7:
Conditions 1, 2 and 3 will all be checked sequentially. If any of the
three conditions is true, that segment will be deleted.
This is what it looks like in my commit:
https://github.com/apache/kafka/blob/a229462df567f91f76122668037e1bcbbbdff41b/core/src/main/scala/kafka/log/LogManager.scala#L423-
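A rough Java sketch of that check, under stated assumptions: the real code is the Scala `LogManager` linked above, and I'm reading the three conditions as time-based retention, size-based retention, and the new minimum-timestamp policy from this KIP. The names `Segment`, `Config`, and `shouldDelete` are illustrative, not real Kafka classes.

```java
public class RetentionSketch {
    // Toy stand-in for a log segment: newest record timestamp and size on disk.
    record Segment(long lastTimestampMs, long sizeBytes) {}

    // Assumed topic-level settings, named after the configs in this discussion.
    record Config(long retentionMs, long retentionBytes, long retentionMinTimestampMs) {}

    // "Conditions 1, 2 and 3 will all be checked sequentially. If any of the
    // three conditions is true, that segment will be deleted."
    static boolean shouldDelete(Segment s, Config c, long nowMs, long totalLogBytes) {
        boolean tooOld     = s.lastTimestampMs() < nowMs - c.retentionMs();     // 1) time-based retention
        boolean overSize   = totalLogBytes > c.retentionBytes();                // 2) size-based retention (crude)
        boolean belowMinTs = s.lastTimestampMs() < c.retentionMinTimestampMs(); // 3) min-timestamp policy (KIP-47)
        return tooOld || overSize || belowMinTs;
    }
}
```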
Thanks Bill.
Read through the KIP, LGTM overall. One clarification question:
With this KIP the LogManager's cleanup logic would be, for each segment
1) delete the segment if its last timestamp is < current timestamp -
log.retention.time (ms, minutes, hours, etc.).
2) delete the segment if its last
I'd like to re-initiate the vote for KIP-47 now that KIP-33 has been
accepted and is in-progress. I've updated the KIP (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy).
I have a commit with the functionality for KIP-47 ready to go once KIP-33
is
For convenience, the KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy
Do you mind updating the KIP with time formats we plan on supporting
in the configuration?
On Wed, Mar 9, 2016 at 11:44 AM, Bill Warshaw wrote:
> Hello,
>
> I'd l
Hello,
I'd like to initiate the vote for KIP-47.
Thanks,
Bill Warshaw