Looks good to me. Any opinions on changing the LayerManager to keep the layers in a Deque rather than a LinkedList. I think it would only require a change to the following method:
public final BloomFilter get(int depth) Performance will be the same as the Deque can be a LinkedList. This is more about how any custom downstream code is currently using the collection of layers. Alex On Wed, 17 Apr 2024 at 10:00, Claude Warren <cla...@xenei.com> wrote: > I have an open pull request to fix this problem. I could use another > review: https://github.com/apache/commons-collections/pull/476 > > > On Tue, Apr 9, 2024 at 11:29 AM Claude Warren <cla...@xenei.com> wrote: > > > Alex, > > > > I like your solution. To answer your question. We create a BloomFilter > > that has a timestamp associated with it. When the timestamp is greater > > than System.currentTimeMillis() the filter is removed. The custom > cleanup > > calls Cleanup.removeEmptyTarget().andThen(<timestampCleanup>) > > > > I think that creating a cleanup() or clean() method on the > > LayeredBloomFilter is the appropriate solution and that it should call > > cleanup() on the LayerManager. (so 2 new methods, one exposed). > > > > The next() method is used when external circumstances dictate that a new > > layer should be created. I think a StableBloomFilter I implemented > > required it, but I do not have the code to hand at the moment. > > > > Claude > > > > > > On Tue, Apr 9, 2024 at 10:38 AM Alex Herbert <alex.d.herb...@gmail.com> > > wrote: > > > >> Hi Claude, > >> > >> Q. What is your current clean-up filter, i.e. > >> the Consumer<LinkedList<BloomFilter>>? I assume you are using a custom > >> one. > >> > >> The current collections code only has 2 functional implementations. One > >> will remove the newest filter if it is empty, one will remove the oldest > >> filters until the size is below a limit. Since neither of those will > >> iterate the list and purge stale objects then I assume you are using a > >> custom clean-up filter. So you had to have created the layer manager > with > >> your custom filter. Assuming this then there are at least two solutions > >> for > >> the current code: > >> > >> 1. The current implementation always calls the clean-up filter with the > >> same LinkedList since it is final. So you can capture the list and do > what > >> you want with it: > >> > >> @SuppressWarnings("rawtypes") > >> LinkedList[] captured = new LinkedList[1]; > >> Consumer<LinkedList<BloomFilter>> cleanup = list -> { > >> captured[0] = list; > >> // ... do clean-up > >> }; > >> > >> // Once you have captured the list, you can clean it when you > >> want: > >> // unchecked conversion > >> cleanup.accept(captured[0]); > >> > >> Obviously this is not ideal as you have to manage the captured list to > >> call > >> cleanup. But it delivers exactly what you require in terms of being able > >> to > >> call cleanup at any time. > >> > >> 2. The call to next() will clean the layers but also add a new layer. So > >> your custom clean method could clean stale objects and also any empty > >> filters not at the end of the list. This will avoid building up lots of > >> empty filters when you frequently trigger next() to purge stale filters. > >> You can call next() directly on the LayeredBloomFilter. I do not know > what > >> extend check you are using so there is some management to be done with > the > >> other settings of the LayerManager to avoid removing any functional > layers > >> which are currently empty. > >> > >> -- > >> > >> As to exposing the LayerManager and adding a clean() method to the > >> LayerManager, I think this is not in keeping with the current design. > The > >> LayerManager is used during construction and then never used again. So > >> functionality to act on the layers is public through the > >> LayeredBloomFilter > >> (e.g. calling next()). So perhaps the change to the API should be to > add a > >> cleanup() method to LayeredBloomFilter. This does the same as next(), > but > >> does not add a new layer. > >> > >> I cannot recall the use case for next() in the LayeredBloomFilter. Would > >> the addition of cleanup() make the next() method redundant? > >> > >> -- > >> > >> Note: The typing against LinkedList could be updated to java.util.Deque. > >> The only issue with this is the method: > >> public final BloomFilter get(int depth) > >> > >> This is not supported by the Deque interface. However the LinkedList > >> implementation of get(int) will use the iterator from the start or end > of > >> the list (whichever is closer) to find the element. This can use the > >> iterator/descendingIterator method of Deque for the same performance > (but > >> the code to do this has to be written). > >> > >> Alex > >> > >> > >> On Tue, 9 Apr 2024 at 08:45, Claude Warren <cla...@xenei.com> wrote: > >> > >> > Greetings, > >> > > >> > I am attempting to use the Bloomfilter code in Kafka to manage PID > >> > generation. The requirement is to remove pid tracking after a period > of > >> > time. This is possible with the LayeredBloomFilter but it has an edge > >> case > >> > problem. > >> > > >> > The LayeredBloomFilter uses the LayerManager to manage the filters > that > >> > comprise the layers of the LayerdBloomFilter. > >> > The LayerManager uses a Consumer<LinkedList<BloomFilter>> called > >> > filterCleanup to remove old layers. > >> > The filterCleanup is only called when a new layer is added to the > >> layered > >> > filter. > >> > > >> > This solution works well in the general case where data is flowing > >> through > >> > the layered filter. However if nothing is added to the filter, > >> > filterCleanup is not called. > >> > > >> > In the Kafka case we have a LayeredBloomFilter for PIDs for each > >> producer. > >> > As long as a producer is producing PIDs the filter gets updated. > >> > > >> > However, if a producer drops from the system or goes offline for a > >> period > >> > of time, then they will no longer be producing PIDs and their old > >> expired > >> > data will remain. > >> > > >> > We want to remove the producer from the collection when there are no > >> more > >> > PIDs being tracked. > >> > > >> > I think this can be solved by adding a clean() method to the > >> LayerManager > >> > that simply calls the existing filterCleanup. > >> > It would be easier to access this method if the LayeredBloomFilter > had a > >> > method to return the LayerManager that was passed in the constructor. > >> > > >> > Does anyone see any issues with this approach? Are there other > >> solutions > >> > to be had? > >> > > >> > Questions and comments welcomed. > >> > -- > >> > LinkedIn: http://www.linkedin.com/in/claudewarren > >> > > >> > > > > > > -- > > LinkedIn: http://www.linkedin.com/in/claudewarren > > > > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren >