I have an open pull request to fix this problem. I could use another review: https://github.com/apache/commons-collections/pull/476
On Tue, Apr 9, 2024 at 11:29 AM Claude Warren <cla...@xenei.com> wrote: > Alex, > > I like your solution. To answer your question. We create a BloomFilter > that has a timestamp associated with it. When the timestamp is greater > than System.currentTimeMillis() the filter is removed. The custom cleanup > calls Cleanup.removeEmptyTarget().andThen(<timestampCleanup>) > > I think that creating a cleanup() or clean() method on the > LayeredBloomFilter is the appropriate solution and that it should call > cleanup() on the LayerManager. (so 2 new methods, one exposed). > > The next() method is used when external circumstances dictate that a new > layer should be created. I think a StableBloomFilter I implemented > required it, but I do not have the code to hand at the moment. > > Claude > > > On Tue, Apr 9, 2024 at 10:38 AM Alex Herbert <alex.d.herb...@gmail.com> > wrote: > >> Hi Claude, >> >> Q. What is your current clean-up filter, i.e. >> the Consumer<LinkedList<BloomFilter>>? I assume you are using a custom >> one. >> >> The current collections code only has 2 functional implementations. One >> will remove the newest filter if it is empty, one will remove the oldest >> filters until the size is below a limit. Since neither of those will >> iterate the list and purge stale objects then I assume you are using a >> custom clean-up filter. So you had to have created the layer manager with >> your custom filter. Assuming this then there are at least two solutions >> for >> the current code: >> >> 1. The current implementation always calls the clean-up filter with the >> same LinkedList since it is final. So you can capture the list and do what >> you want with it: >> >> @SuppressWarnings("rawtypes") >> LinkedList[] captured = new LinkedList[1]; >> Consumer<LinkedList<BloomFilter>> cleanup = list -> { >> captured[0] = list; >> // ... do clean-up >> }; >> >> // Once you have captured the list, you can clean it when you >> want: >> // unchecked conversion >> cleanup.accept(captured[0]); >> >> Obviously this is not ideal as you have to manage the captured list to >> call >> cleanup. But it delivers exactly what you require in terms of being able >> to >> call cleanup at any time. >> >> 2. The call to next() will clean the layers but also add a new layer. So >> your custom clean method could clean stale objects and also any empty >> filters not at the end of the list. This will avoid building up lots of >> empty filters when you frequently trigger next() to purge stale filters. >> You can call next() directly on the LayeredBloomFilter. I do not know what >> extend check you are using so there is some management to be done with the >> other settings of the LayerManager to avoid removing any functional layers >> which are currently empty. >> >> -- >> >> As to exposing the LayerManager and adding a clean() method to the >> LayerManager, I think this is not in keeping with the current design. The >> LayerManager is used during construction and then never used again. So >> functionality to act on the layers is public through the >> LayeredBloomFilter >> (e.g. calling next()). So perhaps the change to the API should be to add a >> cleanup() method to LayeredBloomFilter. This does the same as next(), but >> does not add a new layer. >> >> I cannot recall the use case for next() in the LayeredBloomFilter. Would >> the addition of cleanup() make the next() method redundant? >> >> -- >> >> Note: The typing against LinkedList could be updated to java.util.Deque. >> The only issue with this is the method: >> public final BloomFilter get(int depth) >> >> This is not supported by the Deque interface. However the LinkedList >> implementation of get(int) will use the iterator from the start or end of >> the list (whichever is closer) to find the element. This can use the >> iterator/descendingIterator method of Deque for the same performance (but >> the code to do this has to be written). >> >> Alex >> >> >> On Tue, 9 Apr 2024 at 08:45, Claude Warren <cla...@xenei.com> wrote: >> >> > Greetings, >> > >> > I am attempting to use the Bloomfilter code in Kafka to manage PID >> > generation. The requirement is to remove pid tracking after a period of >> > time. This is possible with the LayeredBloomFilter but it has an edge >> case >> > problem. >> > >> > The LayeredBloomFilter uses the LayerManager to manage the filters that >> > comprise the layers of the LayerdBloomFilter. >> > The LayerManager uses a Consumer<LinkedList<BloomFilter>> called >> > filterCleanup to remove old layers. >> > The filterCleanup is only called when a new layer is added to the >> layered >> > filter. >> > >> > This solution works well in the general case where data is flowing >> through >> > the layered filter. However if nothing is added to the filter, >> > filterCleanup is not called. >> > >> > In the Kafka case we have a LayeredBloomFilter for PIDs for each >> producer. >> > As long as a producer is producing PIDs the filter gets updated. >> > >> > However, if a producer drops from the system or goes offline for a >> period >> > of time, then they will no longer be producing PIDs and their old >> expired >> > data will remain. >> > >> > We want to remove the producer from the collection when there are no >> more >> > PIDs being tracked. >> > >> > I think this can be solved by adding a clean() method to the >> LayerManager >> > that simply calls the existing filterCleanup. >> > It would be easier to access this method if the LayeredBloomFilter had a >> > method to return the LayerManager that was passed in the constructor. >> > >> > Does anyone see any issues with this approach? Are there other >> solutions >> > to be had? >> > >> > Questions and comments welcomed. >> > -- >> > LinkedIn: http://www.linkedin.com/in/claudewarren >> > >> > > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren > -- LinkedIn: http://www.linkedin.com/in/claudewarren