Looks good to me.

Any opinions on changing the LayerManager to keep the layers in a Deque
rather than a LinkedList. I think it would only require a change to the
following method:

public final BloomFilter get(int depth)

Performance will be the same as the Deque can be a LinkedList. This is more
about how any custom downstream code is currently using the collection of
layers.

Alex

On Wed, 17 Apr 2024 at 10:00, Claude Warren <cla...@xenei.com> wrote:

> I have an open pull request to fix this problem.  I could use another
> review: https://github.com/apache/commons-collections/pull/476
>
>
> On Tue, Apr 9, 2024 at 11:29 AM Claude Warren <cla...@xenei.com> wrote:
>
> > Alex,
> >
> > I like your solution.  To answer your question. We create a BloomFilter
> > that has a timestamp associated with it.  When the timestamp is greater
> > than System.currentTimeMillis() the filter is removed.  The custom
> cleanup
> > calls Cleanup.removeEmptyTarget().andThen(<timestampCleanup>)
> >
> > I think that creating a cleanup() or clean() method on the
> > LayeredBloomFilter is the appropriate solution and that it should call
> > cleanup() on the LayerManager. (so 2 new methods, one exposed).
> >
> > The next() method is used when external circumstances dictate that a new
> > layer should be created.  I think a StableBloomFilter I implemented
> > required it,  but I do not have the code to hand at the moment.
> >
> > Claude
> >
> >
> > On Tue, Apr 9, 2024 at 10:38 AM Alex Herbert <alex.d.herb...@gmail.com>
> > wrote:
> >
> >> Hi Claude,
> >>
> >> Q. What is your current clean-up filter, i.e.
> >> the Consumer<LinkedList<BloomFilter>>? I assume you are using a custom
> >> one.
> >>
> >> The current collections code only has 2 functional implementations. One
> >> will remove the newest filter if it is empty, one will remove the oldest
> >> filters until the size is below a limit. Since neither of those will
> >> iterate the list and purge stale objects then I assume you are using a
> >> custom clean-up filter. So you had to have created the layer manager
> with
> >> your custom filter. Assuming this then there are at least two solutions
> >> for
> >> the current code:
> >>
> >> 1. The current implementation always calls the clean-up filter with the
> >> same LinkedList since it is final. So you can capture the list and do
> what
> >> you want with it:
> >>
> >>         @SuppressWarnings("rawtypes")
> >>         LinkedList[] captured = new LinkedList[1];
> >>         Consumer<LinkedList<BloomFilter>> cleanup = list -> {
> >>             captured[0] = list;
> >>             // ... do clean-up
> >>         };
> >>
> >>         // Once you have captured the list, you can clean it when you
> >> want:
> >>         // unchecked conversion
> >>         cleanup.accept(captured[0]);
> >>
> >> Obviously this is not ideal as you have to manage the captured list to
> >> call
> >> cleanup. But it delivers exactly what you require in terms of being able
> >> to
> >> call cleanup at any time.
> >>
> >> 2. The call to next() will clean the layers but also add a new layer. So
> >> your custom clean method could clean stale objects and also any empty
> >> filters not at the end of the list. This will avoid building up lots of
> >> empty filters when you frequently trigger next() to purge stale filters.
> >> You can call next() directly on the LayeredBloomFilter. I do not know
> what
> >> extend check you are using so there is some management to be done with
> the
> >> other settings of the LayerManager to avoid removing any functional
> layers
> >> which are currently empty.
> >>
> >> --
> >>
> >> As to exposing the LayerManager and adding a clean() method to the
> >> LayerManager, I think this is not in keeping with the current design.
> The
> >> LayerManager is used during construction and then never used again. So
> >> functionality to act on the layers is public through the
> >> LayeredBloomFilter
> >> (e.g. calling next()). So perhaps the change to the API should be to
> add a
> >> cleanup() method to LayeredBloomFilter. This does the same as next(),
> but
> >> does not add a new layer.
> >>
> >> I cannot recall the use case for next() in the LayeredBloomFilter. Would
> >> the addition of cleanup() make the next() method redundant?
> >>
> >> --
> >>
> >> Note: The typing against LinkedList could be updated to java.util.Deque.
> >> The only issue with this is the method:
> >> public final BloomFilter get(int depth)
> >>
> >> This is not supported by the Deque interface. However the LinkedList
> >> implementation of get(int) will use the iterator from the start or end
> of
> >> the list (whichever is closer) to find the element. This can use the
> >> iterator/descendingIterator method of Deque for the same performance
> (but
> >> the code to do this has to be written).
> >>
> >> Alex
> >>
> >>
> >> On Tue, 9 Apr 2024 at 08:45, Claude Warren <cla...@xenei.com> wrote:
> >>
> >> > Greetings,
> >> >
> >> > I am attempting to use the Bloomfilter code in Kafka to manage PID
> >> > generation.  The requirement is to remove pid tracking after a period
> of
> >> > time.  This is possible with the LayeredBloomFilter but it has an edge
> >> case
> >> > problem.
> >> >
> >> > The LayeredBloomFilter uses the LayerManager to manage the filters
> that
> >> > comprise the layers of the LayerdBloomFilter.
> >> > The LayerManager uses a Consumer<LinkedList<BloomFilter>> called
> >> > filterCleanup to remove old layers.
> >> > The filterCleanup is only called when a new layer is added to the
> >> layered
> >> > filter.
> >> >
> >> > This solution works well in the general case where data is flowing
> >> through
> >> > the layered filter.  However if nothing is added to the filter,
> >> > filterCleanup is not called.
> >> >
> >> > In the Kafka case we have a LayeredBloomFilter for PIDs for each
> >> producer.
> >> > As long as a producer is producing PIDs the filter gets updated.
> >> >
> >> > However, if a producer drops from the system or goes offline for a
> >> period
> >> > of time, then they will no longer be producing PIDs and their old
> >> expired
> >> > data will remain.
> >> >
> >> > We want to remove the producer from the collection when there are no
> >> more
> >> > PIDs being tracked.
> >> >
> >> > I think this can be solved by adding a clean() method to the
> >> LayerManager
> >> > that simply calls the existing filterCleanup.
> >> > It would be easier to access this method if the LayeredBloomFilter
> had a
> >> > method to return the LayerManager that was passed in the constructor.
> >> >
> >> > Does anyone see any issues with this approach?  Are there other
> >> solutions
> >> > to be had?
> >> >
> >> > Questions and comments welcomed.
> >> > --
> >> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >> >
> >>
> >
> >
> > --
> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to