So what we have then is:

public interface BloomFilter {
    int andCardinality(BloomFilter other);
    int cardinality();
    boolean contains(BloomFilter other);
    boolean contains(Hasher hasher);
    long[] getBits();
    // Change
    PrimitiveIterator.OfInt iterator();
    Shape getShape();
    // Change
    boolean merge(BloomFilter other);
    // Change
    boolean merge(Hasher hasher);
    int orCardinality(BloomFilter other);
    int xorCardinality(BloomFilter other);
}

public interface BitCountConsumer {
    boolean accept(int index, int count);
}

public interface CountingBloomFilter extends BloomFilter {
    // Change
    boolean remove(BloomFilter other);
    // Change
    boolean remove(Hasher hasher);
    // Change
    boolean add(CountingBloomFilter other);
    // Change
    boolean subtract(CountingBloomFilter other);
    // Change
    void getCounts(BitCountConsumer consumer);
}

public final class StaticHasher implements Hasher {
    public StaticHasher(final Hasher hasher, final Shape shape) { ... }
    public StaticHasher(final Iterator<Integer> iter, final Shape shape) { ... }
    public StaticHasher(final Collection<Integer> collection, final Shape shape) { ... }
    public Shape getShape() { ... }
    public int size() { ... }
    // Change
    public int max() { ... }
}

public final class FixedHasher implements Hasher {
    public FixedHasher(final Hasher hasher, final Shape shape) { ... }
    public FixedHasher(final Iterator<Integer> iter, final Shape shape) { ... }
    public FixedHasher(final Collection<Integer> collection, final Shape shape) { ... }
    public Shape getShape() { ... }
    public int size() { ... }
    public int max() { ... }
}

I have a pull request in to add a CachingHasher. While we are at it, I think we should change Hasher.getBits(Shape) to Hasher.iterator(Shape) to match BloomFilter.iterator() and to limit confusion between Hasher.getBits(Shape) (which returns an iterator) and BloomFilter.getBits() (which returns a long[]).

The interface changes look good to me. I agree with your implementation ideas. I do have one use case where I need the BloomFilter.iterator() results in reverse order (I am estimating the log2 of the filter), but I can do that with the methods here.
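For example, something like this rough sketch (untested; it assumes the canonical layout where bit b of word w maps to index 64 * w + b):

    // Walk the set bit indices in descending order using the canonical long[].
    final long[] words = filter.getBits();
    for (int w = words.length - 1; w >= 0; w--) {
        long word = words[w];
        while (word != 0) {
            final int b = 63 - Long.numberOfLeadingZeros(word);
            // Highest remaining set bit; feed it into the log2 estimate.
            final int index = w * 64 + b;
            word &= ~(1L << b); // clear it and continue downwards
        }
    }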
I don't think any of these changes significantly impacts the uses that I currently have. Shall we move forward?

Claude

On Mon, Mar 2, 2020 at 6:02 PM Alex Herbert <alex.d.herb...@gmail.com> wrote:

> On 02/03/2020 16:12, Claude Warren wrote:
>> Does getCounts() return a snapshot of the values when the call was made, or does it return values that may be updated during the retrieval? If there are 2 threads (one reading counts and one doing a merge) it seems to me that the "iterate over the data without constructing objects" approach means that the data may not be internally consistent. But then we don't really check that issue in the other methods either. So let's go with the fail-fast BitCountConsumer approach.
>
> My assumption on this is that no filter is safe for concurrent use. Allowing a merge while reading the state is a concurrency issue. I don't think we want to go down the route of having fail-fast concurrency checking using a 'modified count' strategy, i.e. all operations save the current 'modified count' at the start and increment it at the end; if it changes before the end then there has been concurrent modification. It would be simpler to state that no filters are designed to be used concurrently.
>
>> On the StaticHasher: it is used for two purposes:
>>
>> 1) getting a list of enabled bits.
>> 2) creating a bloom filter from a list of enabled bits. There was a request early on for the ability to test membership in a Bloom filter without having to construct a bloom filter. Basically, the developer has a list of values and wants to check membership.
>
> I think the issue is that for 1 and 2 there are a few scenarios with different requirements since the list of enabled bits can be small or big.
>
> One feature of the StaticHasher is the requirement to remove duplicates and return a sorted order. This allows for an efficient storage form factor with the intention that it may be around for a while. But it is an overhead when creating a large one from a known set of non-duplicate bits, i.e. a Bloom filter.
>
> For (2), if you want to do a dynamic check to test membership then you use BloomFilter.contains(Hasher). I would pass an on-the-fly Hasher for this and not construct the entire set of indices into a StaticHasher anyway.
>
> This would be done using the DynamicHasher to build hashes on-the-fly, which may have duplicates. For the use case where you already know the bit indices but do not want to create a BloomFilter or a StaticHasher then perhaps we need another Hasher that is a wrapper around an int[] that allows duplicates.
>
>     // Returns an unordered set of values that may contain duplicates.
>     static Hasher wrap(final int[] values, final Shape shape) {
>         Objects.requireNonNull(values);
>         Objects.requireNonNull(shape);
>         return new Hasher() {
>             @Override
>             public OfInt getBits(Shape sameShape) {
>                 if (!shape.equals(sameShape)) {
>                     throw new IllegalArgumentException();
>                 }
>                 return Arrays.stream(values).iterator();
>             }
>
>             @Override
>             public HashFunctionIdentity getHashFunctionIdentity() {
>                 return shape.getHashFunctionIdentity();
>             }
>
>             @Override
>             public boolean isEmpty() {
>                 return values.length == 0;
>             }
>         };
>     }
>
>> It might make more sense to reimplement the StaticHasher using a long[] internally (like the BloomFilter.getBits() method returns). We have the shape so we know the maximum length and we could trim it after loading to remove trailing zero value longs. This would remove the necessity of converting the iterator into a set of some sort. We can do the limit checks quickly as we turn bits on.
>
> For small numbers of indices the long[] format is likely to be less space efficient.
>
>> Makes me think we might need to implement StandardBloomFilter to use long[] as well.
>
> That is just the BitSetBloomFilter. So if you use this format in the StaticHasher then you have no advantage over actually creating a BitSetBloomFilter.
>
> The StaticHasher is returned by the BloomFilter interface. You also have a method to return the bits using the long[] format. So there is redundancy here. To drop the redundancy I think you separate the uses for the canonical representation of the entire filter as a long[] and the uses for the set bits. If the latter is satisfied by the StaticHasher then it should be replaced by an iterator instead.
>
> Looking at the code, the getHasher() method in BloomFilter is only used in HasherBloomFilter. This is to add more enabled bits to the current set it contains. It seems that this use case is better served by using a Set as the backing storage.
>
> This leaves the StaticHasher with no usage in the current library. It is not as fast as wrapping a provided array. It offers no expansion of storage for use in the HasherBloomFilter. It does however provide the smallest form for a compact BloomFilter representation. So it could be renamed to ImmutableHasher.
>
> I would think it best to consolidate all of this:
>
> - StaticHasher to stay as an immutable instance of sorted bit indices. The intention is an efficient compact form for a few bits of a BloomFilter that can be merged efficiently. It could be used with pre-existing indices but would have to eliminate duplicates and sort them.
> - Remove the redundant StaticHasher constructor accepting a StaticHasher instance since they are immutable.
> - Possibly add a method to StaticHasher to merge another StaticHasher to create a new instance.
> - Possibly add a method to StaticHasher to get the maximum index in the set (it already has size()). This helps convert it to a long[] or any other class that will call getBits() and use the iterator output.
>
> - BloomFilter to remove the getHasher() method. The long[] getBits() method to be documented as returning the canonical format of the filter. The getHasher() method is just duplicating this information in another format.
> - BloomFilter to have an iterator() method to get a PrimitiveIterator.OfInt. This replaces the getHasher() method, which just provided access to the Shape and the PrimitiveIterator.OfInt of the Hasher. BloomFilter already has getShape() so it just needs an iterator.
>
> - HasherBloomFilter to use the iterator() method when merging a generic BloomFilter. It can detect instances of HasherBloomFilter and switch to a merge operation of its internal representation.
> - HasherBloomFilter to use different backing storage (e.g. a Set<Integer>) or adapt StaticHasher to be more efficient for merge.
>
> - Add a new class akin to StaticHasher that allows duplicates. Document it as a fast way to create a few bits of a BloomFilter.
>
> You then have:
>
> Hasher
>
> - StaticHasher - immutable set of ordered non-duplicate bits. Can be merged with another instance.
> - DynamicHasher - generates bits on the fly using a hashing function.
> - FixedHasher - an array of indices. Can wrap an input array or be created using constructors similar to StaticHasher but without the guarantees of non-duplicates.
>
> Lots to think about there.
>
>> Claude
>>
>> On Mon, Mar 2, 2020 at 1:12 PM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>
>>> On 02/03/2020 11:32, Claude Warren wrote:
>>>> my thought on changing the BloomFilter.merge() to return a boolean is along the lines you had: return true on successful merge (even if there are no changes in the enabled bits). And yes, for most cases the standard bloom filter will return true, but the change is really to support extensions to Bloom filter so I think it is reasonable.
>>>>
>>>> As for the getCounts(), suppose we split it into two methods:
>>>>
>>>>     // return the indexes that are enabled. This is equivalent to
>>>>     // CountingBloomFilter.getHasher().getBits( CountingBloomFilter.getShape() );
>>>>     Iterator<Integer> getIndexes()
>>>>     // get the count for the specific index.
>>>>     int getCount( int index );
>>>>
>>>> With these methods it becomes possible to construct an iterator of int[] or Map.Entry<Integer,Integer> or whatever else the developer wants.
>>>
>>> Having to call getCount() for every index is likely to be slower than all the garbage collection involved with iterating a disposable <index, count> pair. For an array backed storage the access will be order(1). But other storage may be a lot slower. This method ideally should traverse the storage once and make each <index, count> pair available. Ideally this would be without creating each <index, count> pair as a disposable object.
>>>
>>> Following on from previously, another disadvantage of the consumer approach is the lack of fast fail. You can remedy this by adding a boolean return from the consumer to indicate if you want to consume more items:
>>>
>>>     interface BitCountConsumer {
>>>         boolean accept(int index, int count);
>>>     }
>>>
>>> A CountingBloomFilter implementation then has something like:
>>>
>>>     void getCounts(BitCountConsumer consumer) {
>>>         while (hasNext()) {
>>>             next();
>>>             if (!consumer.accept(index(), count())) {
>>>                 break;
>>>             }
>>>         }
>>>     }
>>> Looking at the getHasher().getBits(...) idea I noticed that StaticHasher is constructed using:
>>>
>>>     Iterator<Integer> iter
>>>
>>> not:
>>>
>>>     PrimitiveIterator.OfInt
>>>
>>> I'd prefer the primitive specialisation in this constructor. It would break the HasherBloomFilter merge function but that can be fixed. However it may be redundant. The StaticHasher has some oddities with the available constructors:
>>>
>>>     public StaticHasher(final StaticHasher hasher, final Shape shape)
>>>
>>> - why have a shape parameter? The constructor checks the shape is the same as that for the hasher. So why not just a plain copy constructor:
>>>
>>>     public StaticHasher(final StaticHasher hasher)
>>>
>>> This constructor is only used in the unit tests. Given the StaticHasher is immutable then a copy constructor seems odd. I think it should be dropped.
>>>
>>> Here is a constructor that is missing IMO:
>>>
>>>     public StaticHasher(final Set<Integer> indices, final Shape shape)
>>>
>>> - Could be used to generically create a static hasher. The indices are ensured to be unique by the Set but should be checked to be within the shape.
>>>
>>> With these constructors the public StaticHasher(final Iterator<Integer> iter, final Shape shape) is only used by the BitSetBloomFilter:
>>>
>>>     public StaticHasher getHasher() {
>>>         return new StaticHasher(bitSet.stream().iterator(), getShape());
>>>     }
>>>
>>> Becomes:
>>>
>>>     public StaticHasher getHasher() {
>>>         return new StaticHasher(
>>>             bitSet.stream().boxed().collect(Collectors.toCollection(TreeSet::new)),
>>>             getShape());
>>>     }
>>>
>>> This is a very roundabout way to construct a StaticHasher. Perhaps a constructor accepting a BitSet would be useful?
>>>
>>> So the API only requires:
>>>
>>>     public StaticHasher(final Hasher hasher, final Shape shape)
>>>     public StaticHasher(final Set<Integer> indices, final Shape shape)
>>>     // Maybe
>>>     public StaticHasher(final BitSet indices, final Shape shape)
>>>
>>> I wondered about a merge function for the StaticHasher:
>>>
>>>     public StaticHasher merge(final StaticHasher other)
>>>
>>> The merge function can make some assumptions about the two arrays to merge as the class is final and the values are verified to be sorted and within the Shape. However there is not a use for it other than in the merge of two HasherBloomFilters, which is not a recommended operation anyway.
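>>> For illustration, the core of such a merge is just a union of two sorted, duplicate-free arrays (this sketch assumes the internal storage is a sorted int[] of indices; the names are illustrative):
>>>
>>>     // Merge two sorted, duplicate-free index arrays into one.
>>>     static int[] mergeIndices(final int[] a, final int[] b) {
>>>         final int[] out = new int[a.length + b.length];
>>>         int i = 0, j = 0, n = 0;
>>>         while (i < a.length && j < b.length) {
>>>             if (a[i] < b[j]) {
>>>                 out[n++] = a[i++];
>>>             } else if (a[i] > b[j]) {
>>>                 out[n++] = b[j++];
>>>             } else {
>>>                 // Equal: keep one copy to preserve the no-duplicates invariant.
>>>                 out[n++] = a[i++];
>>>                 j++;
>>>             }
>>>         }
>>>         while (i < a.length) {
>>>             out[n++] = a[i++];
>>>         }
>>>         while (j < b.length) {
>>>             out[n++] = b[j++];
>>>         }
>>>         return java.util.Arrays.copyOf(out, n);
>>>     }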
>>>> Claude
>>>>
>>>> On Mon, Mar 2, 2020 at 10:48 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>>
>>>>> On 02/03/2020 09:38, Claude Warren wrote:
>>>>>> It is not too late to update the BloomFilter interface to have the merge return a boolean. The incorrect Shape would still throw an exception, so the return value would only come into play if the bits could not be set. Thoughts?
>>>>>
>>>>> I don't see the harm in it. But what would the return value be for?
>>>>>
>>>>> For a standard collection it would be if the collection was changed by the operation:
>>>>>
>>>>> Collection.add/remove return "true if this collection changed as a result of the call"
>>>>>
>>>>> So here is the equivalent:
>>>>>
>>>>> return "true if this filter was changed as a result of the call"
>>>>>
>>>>> This is computationally slow to track. It is also confusing if the filter was successfully merged but no bits were changed to then return false because the filter was actually incorporated. So it would go along the lines that we discussed for the counting Bloom filter:
>>>>>
>>>>> return "true if this filter was successfully merged as a result of the call"
>>>>>
>>>>> For most cases in the current library it would be true when an exception is not thrown. However the merge of the counting Bloom filter may have reason to return false, e.g. overflow.
>>>>>
>>>>>> On Mon, Mar 2, 2020 at 7:56 AM Claude Warren <cla...@xenei.com> wrote:
>>>>>>
>>>>>>> for the remove(), add(), and subtract() methods I agree that void is not correct and it should be boolean and be the same as the value you would get from calling isValid().
>>>>>>>
>>>>>>> You are correct that getCounts() should return an iterator of some type on int[]; I don't know why I thought long[]. I am happy with a plain Iterator<int[]> as the return.
>>>>>
>>>>> For the getCounts() method I am still looking for a way around having to create an <index, count> pair for everything in the filter. An alternative to an iterator is to use the consumer idea. Given there is no primitive specialisation of BiConsumer<T, U> in JDK 8 functions we define our own:
>>>>>
>>>>>     interface BitCountConsumer {
>>>>>         void accept(int index, int count);
>>>>>     }
>>>>>
>>>>> The CountingBloomFilter then has:
>>>>>
>>>>>     void forEachCount(BitCountConsumer consumer);
>>>>>     // Or
>>>>>     void getCounts(BitCountConsumer consumer);
>>>>>
>>>>> You can then directly pass the counts from the backing storage to the destination.
>>>>>
>>>>> Advantages:
>>>>> - No object creation
>>>>> Disadvantages:
>>>>> - The counts cannot be streamed
>>>>>
>>>>> An alternative is to provide an Iterator of long with the index and count packed into a long with methods to extract them:
>>>>>
>>>>>     PrimitiveIterator.OfLong getCounts();
>>>>>
>>>>>     default static int getIndex(long indexCountPair) {
>>>>>         return (int) (indexCountPair >>> 32);
>>>>>     }
>>>>>
>>>>>     default static int getCount(long indexCountPair) {
>>>>>         return (int) indexCountPair;
>>>>>     }
>>>>>
>>>>> This will almost certainly be a cause for bugs/issues from users.
>>>>>
>>>>> I believe that the counts will be used for 2 main use cases:
>>>>>
>>>>> 1. Storage
>>>>> 2. Adding to another counting Bloom filter
>>>>>
>>>>> Both cases are likely to be done serially and not in parallel. So providing a consumer based API to receive the counts would work.
>>>>>
>>>>> WDYT?
>>>>>
>>>>>>> Claude
>>>>>>>
>>>>>>> On Mon, Mar 2, 2020 at 1:02 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On 1 Mar 2020, at 15:39, Claude Warren <cla...@xenei.com> wrote:
>>>>>>>>
>>>>>>>>> I think the CountingBloomFilter interface needs to extend BloomFilter.
>>>>>>>>
>>>>>>>> I said that but did not write it, sorry.
>>>>>>>>
>>>>>>>>> I think I am confused.
>>>>>>>>>
>>>>>>>>> I would expect CountingBloomFilter to have
>>>>>>>>>
>>>>>>>>>     interface CountingBloomFilter extends BloomFilter {
>>>>>>>>>         // these 2 methods are the reverse of merge()
>>>>>>>>>         void remove( BloomFilter );
>>>>>>>>>         void remove( Hasher );
>>>>>>>>
>>>>>>>> Fine. Same intention but different method names. But why void? It forces you to check if the remove was valid with a second call. On the plus side it matches the void merge(…) methods and in most cases a user would not care to check anyway. If they are controlling the filter then leave it to them to make sure they do not remove something they did not add.
>>>>>>>>
>>>>>>>>>         // these 2 methods are the addition/subtraction of counts
>>>>>>>>>         void add( CountingBloomFilter )
>>>>>>>>>         void subtract( CountingBloomFilter );
>>>>>>>>
>>>>>>>> Fine. But same comment as above with the void return.
>>>>>>>>
>>>>>>>>>         // 2 methods to retrieve data
>>>>>>>>>         Stream<long[]> getCounts();
>>>>>>>>
>>>>>>>> I don't like this use of long[]. In my previous post I argued that if you were to ever want to store more than max integer items in a filter then the Bloom filter would have more bit indices than max integer. So we never have to support long counts. A filter that exceeds max integer for a count is highly likely to be saturated and no use as a filter anyway.
>>>>>>>>
>>>>>>>> For most backing implementations the object type of the stream will be different so you will have to write a Spliterator<T> implementation or wrap some iterator anyway. So why not return the Spliterator:
>>>>>>>>
>>>>>>>>     Spliterator<int[]> getCounts();
>>>>>>>>
>>>>>>>> Since the backing implementation will likely not store int[] pairs then this will have a lot of object creation and garbage collection overhead to go through the counts. This does not seem to be a big concern here if the purpose is the same as for the BloomFilter long[] getBits(), i.e. to get a canonical representation for storage.
>>>>>>>>
>>>>>>>> Note: The Spliterator may not have a known size (number of non-zero bits) at creation, for example if the counts are stored in a fixed size array. Thus efficient parallel traversal by binary tree splitting is limited by how evenly the counts are distributed. For a backing implementation using a collection then the size should be known. In this case a Spliterator would be of more use than a plain Iterator. You can convert one to the other using java.util.Spliterators:
>>>>>>>>
>>>>>>>>     public static <T> Iterator<T> iterator(Spliterator<? extends T> spliterator)
>>>>>>>>     public static <T> Spliterator<T> spliterator(Iterator<? extends T> iterator,
>>>>>>>>                                                  long size,
>>>>>>>>                                                  int characteristics)
>>>>>>>>
>>>>>>>> So which to choose for the API?
>>>>>>>>
>>>>>>>> The Hasher currently uses an Iterator:
>>>>>>>>
>>>>>>>>     PrimitiveIterator.OfInt getBits(Shape shape);
>>>>>>>>
>>>>>>>> In the case of a StaticHasher this could return a spliterator. But the DynamicHasher needs a reworking of the internal Iterator class. It could be a Spliterator to use the new IntConsumer API, but in most (all) cases splitting would not be possible for dynamic hashing as the parts are produced in order. It is likely that they will be consumed sequentially too.
>>>>>>>>
>>>>>>>> I would suggest that Spliterator is the more modern implementation, despite not always being applicable to parallelisation in a stream. Currently the iterator from the Hasher is used in forEachRemaining() and while loops in approximately equal measure. The while loops are for a fast exit and would be uglier if rewritten for a Spliterator.tryAdvance(IntConsumer) syntax.
>>>>>>>>
>>>>>>>> There is a use of the IteratorChain in HasherBloomFilter that would need a rethink for spliterators.
>>>>>>>>
>>>>>>>> The path of least resistance is to use Iterator<int[]> for the API of CountingBloomFilter to be consistent with Hasher's use of Iterator.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>>>         boolean isValid()
>>>>>>>>
>>>>>>>> Fine. Allows some level of feedback that the counts are corrupt.
>>>>>>>>
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> Claude
>>>>>>>>>
>>>>>>>>> On Sun, Mar 1, 2020 at 2:48 PM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 1 Mar 2020, at 09:28, Claude Warren <cla...@xenei.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The idea of a backing array is fine and the only problem I see with it is in very large filters (on the order of 10^8 bits and larger), but document the size calculation and let the developer worry about it.
>>>>>>>>>>
>>>>>>>>>> Let us look at the use case where we max out the array. Using the Bloom filter calculator:
>>>>>>>>>>
>>>>>>>>>>     n = 149,363,281
>>>>>>>>>>     p = 0.001000025 (1 in 1000)
>>>>>>>>>>     m = 2147483647 (256MiB)
>>>>>>>>>>     k = 10
>>>>>>>>>>
>>>>>>>>>>     n = 74,681,641
>>>>>>>>>>     p = 0.000001 (1 in 999950)
>>>>>>>>>>     m = 2147483647 (256MiB)
>>>>>>>>>>     k = 20
>>>>>>>>>>
>>>>>>>>>>     n = 49,787,761
>>>>>>>>>>     p = 0.000000001 (1 in 999924899)
>>>>>>>>>>     m = 2147483647 (256MiB)
>>>>>>>>>>     k = 30
>>>>>>>>>>
>>>>>>>>>> So you will be able to put somewhere in the order of 10^8 or 10^7 items into the filter. I would say that anyone putting more than that into the filter has an unusual use case. The CountingBloomFilter can throw an exception if m is too large and will throw an OutOfMemoryError if you cannot allocate an array large enough.
>>>>>>>>>>
>>>>>>>>>> One clear point here is that you cannot use a short as a 16-bit count would easily overflow. So you have to use an integer array for the counts. A maximum length int[] is roughly 8GB.
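>>>>>>>>>> As a minimal sketch, the int[] backing makes the update trivial (the names are illustrative; a real implementation would also track a status/valid flag):
>>>>>>>>>>
>>>>>>>>>>     private final int[] counts = new int[shape.getNumberOfBits()];
>>>>>>>>>>
>>>>>>>>>>     // Returns false if the count would overflow an int.
>>>>>>>>>>     private boolean increment(final int index) {
>>>>>>>>>>         final int updated = counts[index] + 1;
>>>>>>>>>>         if (updated < 0) {
>>>>>>>>>>             return false; // overflow
>>>>>>>>>>         }
>>>>>>>>>>         counts[index] = updated;
>>>>>>>>>>         return true;
>>>>>>>>>>     }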
>>>>>>>>>> What would another implementation cost in terms of memory? The TreeMap<Integer, Integer> was the most space efficient. In the previous e-mail the saturation of a Bloom filter's bits was approximately 50% when at the intended capacity. So we have to estimate the size of a TreeMap containing Integer.MAX_VALUE/2 indices ~ 2^30. The memory test shows the TreeMap memory scales linearly with entries:
>>>>>>>>>>
>>>>>>>>>>      32768 /  65536 (0.500) : TreeMap<Integer, Integer> = 1834061 bytes
>>>>>>>>>>      65536 / 131072 (0.500) : TreeMap<Integer, Integer> = 3669080 bytes
>>>>>>>>>>     131072 / 262144 (0.500) : TreeMap<Integer, Integer> = 7339090 bytes
>>>>>>>>>>
>>>>>>>>>> So what is the memory for a TreeMap with 2^30 indices? I make it about:
>>>>>>>>>>
>>>>>>>>>>     (2^30 / 131,072) * 7,339,090 bytes ~ 6e10 bytes = 55.99 GiB
>>>>>>>>>>
>>>>>>>>>> I would say that this amount of RAM is unusual. It is definitely not as efficient as using an array. So very large counting Bloom filters are going to require some thought as to the hardware they run on. This may not be the case in 10 years time.
>>>>>>>>>>
>>>>>>>>>> I would say that we try an int[] backing array for the storage implementation and document its limitations. A different implementation could be provided in future if required.
>>>>>>>>>>
>>>>>>>>>> This could be done by making CountingBloomFilter an interface that extends BloomFilter with the methods:
>>>>>>>>>>
>>>>>>>>>>     subtract(BloomFilter filter)
>>>>>>>>>>     subtract(Hasher filter)
>>>>>>>>>>
>>>>>>>>>> These will negate the effect of the corresponding merge(BloomFilter) operation.
>>>>>>>>>>
>>>>>>>>>> Do we also need access to the counts and add/subtract of another CountingBloomFilter?:
>>>>>>>>>>
>>>>>>>>>>     add(CountingBloomFilter filter);
>>>>>>>>>>     subtract(CountingBloomFilter filter);
>>>>>>>>>>
>>>>>>>>>>     Iterator<int[]> getCounts();
>>>>>>>>>>     int getSize(); // Number of items added
>>>>>>>>>>
>>>>>>>>>> The CountingBloomFilter is then an interface that defines how to reverse the merge of some bits into the filter.
>>>>>>>>>>
>>>>>>>>>> My concern is the inefficiency of the creation of objects in any method that provides access to the counts (e.g. above using an iterator as for Hasher.getBits()). I presume this method would be to allow some type of storage/serialisation of the filter akin to the long[] getBits() method of BloomFilter. So it may be better done using a method:
>>>>>>>>>>
>>>>>>>>>>     int getCount(int index);
>>>>>>>>>>
>>>>>>>>>> The caller can then use long[] getBits() to get the indices set in the filter and then for each non-zero bit index call getCount(index). Or just not include the method as the counts are only of concern when storing the filter. This functionality is cleaner pushed into an implementation.
>>>>>>>>>>
>>>>>>>>>> In a previous post we discussed whether to throw an exception on overflow/underflow or raise an invalid flag. Using the invalid flag idea the interface would be:
>>>>>>>>>>
>>>>>>>>>>     interface CountingBloomFilter {
>>>>>>>>>>         int add(CountingBloomFilter filter);
>>>>>>>>>>         int subtract(BloomFilter filter);
>>>>>>>>>>         int subtract(Hasher filter);
>>>>>>>>>>         int subtract(CountingBloomFilter filter);
>>>>>>>>>>         int getStatus();
>>>>>>>>>>
>>>>>>>>>>         // Maybe
>>>>>>>>>>         int getSize();
>>>>>>>>>>         int getCount(int index);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> The status would be negative if any operation overflowed/underflowed, or zero if OK. The current status is returned by the add/subtract methods.
>>>>>>>>>>
>>>>>>>>>> However I note that overflow may not be a concern. The number of items to add to a filter to create overflow would require a filter with a number of bits that is unrealistic to store in memory:
>>>>>>>>>>
>>>>>>>>>>     n = 2147483647
>>>>>>>>>>     p = 0.001000025 (1 in 1000)
>>>>>>>>>>     m = 30875634182 (3.59GiB)
>>>>>>>>>>     k = 10
>>>>>>>>>>
>>>>>>>>>> If you want to add 2 billion items (and overflow an integer count) then your filter would be so big it would break the rest of the API that uses a 32-bit int for the bit index.
>>>>>>>>>>
>>>>>>>>>> Thus only underflow is a realistic concern. This could be documented as handled in an implementation specific manner (i.e. throw or ignore). The API is then simplified to:
>>>>>>>>>>
>>>>>>>>>>     interface CountingBloomFilter {
>>>>>>>>>>         boolean add(CountingBloomFilter filter);
>>>>>>>>>>         boolean subtract(BloomFilter filter);
>>>>>>>>>>         boolean subtract(Hasher filter);
>>>>>>>>>>         boolean subtract(CountingBloomFilter filter);
>>>>>>>>>>         int getStatus();
>>>>>>>>>>
>>>>>>>>>>         // Maybe
>>>>>>>>>>         int getSize();
>>>>>>>>>>         int getCount(int index);
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> The boolean is used to state that add/subtract did not over/underflow. Implementations can throw if they require it.
>>>>>>>>>>
>>>>>>>>>> The question then becomes what getSize() represents if an add/subtract did not execute cleanly. Under this scheme it would be the number of (add - subtract) operations. The status flag would be used to indicate if the size is valid, or any of the counts from getCount(). The simpler API is to not allow access to counts/size or adding/subtracting counts:
>>>>>>>>>>
>>>>>>>>>>     interface CountingBloomFilter {
>>>>>>>>>>         boolean subtract(BloomFilter filter);
>>>>>>>>>>         boolean subtract(Hasher filter);
>>>>>>>>>>         int getStatus();
>>>>>>>>>>         // Or something like ...
>>>>>>>>>>         boolean isValid();
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> This filter is then only concerned with reversing the merge of Bloom filters, with a valid status flag to indicate that the current state is consistent (i.e. all filters have been cleanly merged/subtracted).
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>>> As for the merge question: merge is a standard bloom filter operation. It is well defined in the literature. Merging a bloom filter into a counting bloom filter means incrementing the bit counts. I think that merge/remove should continue to operate as though the parameter were a standard bloom filter.
>>>>>>>>>>
>>>>>>>>>> OK. So the count is to represent the number of filters that had a bit set at that index. This makes it more clear.
>>>>>>>>>>
>>>>>>>>>>> We had spoken of implementing an adding/deleting method pair that would operate on CountingBloomFilters and would add/subtract the counts (e.g. add(CountingBloomFilter) and subtract(CountingBloomFilter)).
>>>>>>>>>>>
>>>>>>>>>>> I disagree with your proposal for the merge(Hasher) implementation, and I am not certain that an add(Hasher) makes sense. First consider that the Hasher returns the bits that are to be enabled in the Bloom filter, so collisions are expected. In the normal course of events a Hasher is used to create a normal Bloom filter where all the duplicates are removed. That filter is then merged into a CountingBloomFilter. So in some sense the Hasher and the normal Bloom filter are the same. So I would expect the merge of the Hasher and the merge of the normal Bloom filter created from that hasher into a CountingBloomFilter to yield the same result. If you wanted to add an add(Hasher)/delete(Hasher) pair of functions to a CountingBloomFilter you could implement with duplicate counting, but I am not certain of the validity of such a count and I fear that it muddies the waters with respect to what the CountingBloomFilter is counting.
>>>>>>>>>>
>>>>>>>>>> Agreed.
>>>>>>>>>>
>>>>>>>>>>> Claude
>>>>>>>>>>>
>>>>>>>>>>> On Sat, Feb 29, 2020 at 2:13 PM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 29 Feb 2020, at 07:46, Claude Warren <cla...@xenei.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Alex, would you take a look at pull request 131 [1]? It adds a new hasher implementation and makes the HashFunctionValidator available for public use.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/commons-collections/pull/131
>>>>>>>>>>>>
>>>>>>>>>>>> OK. I'll take a look.
>>>>>>>>>>>>
>>>>>>>>>>>> I've been thinking about the counting Bloom filter and the backing storage. In summary:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. The backing storage should be a fixed array.
>>>>>>>>>>>> 2. Merging a Hasher should count duplicate indices, not flatten them all to a single count (see the sketch below).
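>>>>>>>>>>>> For point 2, a minimal sketch of the merge (assuming an int[] of counts per bit index; the names are illustrative):
>>>>>>>>>>>>
>>>>>>>>>>>>     // Each duplicate index from the hasher increments its count again,
>>>>>>>>>>>>     // rather than being collapsed into a Set first.
>>>>>>>>>>>>     void merge(final Hasher hasher) {
>>>>>>>>>>>>         final PrimitiveIterator.OfInt bits = hasher.getBits(getShape());
>>>>>>>>>>>>         while (bits.hasNext()) {
>>>>>>>>>>>>             counts[bits.nextInt()]++;
>>>>>>>>>>>>         }
>>>>>>>>>>>>     }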
>>>>>>>>>>>>
>>>>>>>>>>>> For background I've used the standard formulas to estimate the number of indices that will be non-zero in a Bloom filter. The wikipedia page gives this formula for the expected proportion of bits set to 0 (E(q)) if you have inserted i elements into a filter of size m using k hash functions:
>>>>>>>>>>>>
>>>>>>>>>>>>     E(q) = (1 - 1/m)^(ki)
>>>>>>>>>>>>
>>>>>>>>>>>> So a rough guess of the proportion of indices (bits) used by a filter is 1 - E(q).
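>>>>>>>>>>>> (As a quick sanity check against the table below, the estimate is a two-liner:)
>>>>>>>>>>>>
>>>>>>>>>>>>     // Expected fraction of zero bits after i insertions of k hashes into m bits.
>>>>>>>>>>>>     final double eq = Math.pow(1.0 - 1.0 / m, (double) k * i);
>>>>>>>>>>>>     final double usedBits = 1.0 - eq; // approximate fraction of non-zero indices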
>>>>>>>>>>>>
>>>>>>>>>>>> Here is a table of Bloom filters with different collision probabilities and the proportion of bits that will be set when 1%, 10%, 100% of the capacity of the filter has been met:
>>>>>>>>>>>>
>>>>>>>>>>>>     n     p     m      k  i     E(q)   bits
>>>>>>>>>>>>     1000  1E-04 19171  13 10    0.9932 0.0068
>>>>>>>>>>>>     1000  1E-04 19171  13 100   0.9344 0.0656
>>>>>>>>>>>>     1000  1E-04 19171  13 1000  0.5076 0.4924
>>>>>>>>>>>>     1000  1E-05 23963  17 10    0.9929 0.0071
>>>>>>>>>>>>     1000  1E-05 23963  17 100   0.9315 0.0685
>>>>>>>>>>>>     1000  1E-05 23963  17 1000  0.4919 0.5081
>>>>>>>>>>>>     1000  1E-06 28756  20 10    0.9931 0.0069
>>>>>>>>>>>>     1000  1E-06 28756  20 100   0.9328 0.0672
>>>>>>>>>>>>     1000  1E-06 28756  20 1000  0.4988 0.5012
>>>>>>>>>>>>     10000 1E-06 287552 20 100   0.9931 0.0069
>>>>>>>>>>>>     10000 1E-06 287552 20 1000  0.9328 0.0672
>>>>>>>>>>>>     10000 1E-06 287552 20 10000 0.4988 0.5012
>>>>>>>>>>>>
>>>>>>>>>>>> The point is that if you create a Bloom filter and fill it to 10% of the intended capacity the number of indices used will be about 6-7% of the filter bits.
>>>>>>>>>>>>
>>>>>>>>>>>> So how to store the counts? Currently the counting bloom filter uses a TreeMap<Integer, Integer>. I tried:
>>>>>>>>>>>>
>>>>>>>>>>>>     TreeMap<Integer, Integer>
>>>>>>>>>>>>     HashMap<Integer, Integer>
>>>>>>>>>>>>     TreeSet<MutableCount>
>>>>>>>>>>>>     HashSet<MutableCount>
>>>>>>>>>>>>     int[]
>>>>>>>>>>>>
>>>>>>>>>>>> The MutableCount is a custom object that stores the bit index, uses it for a hash code, and then has a mutable integer count field. It allows the count to be incremented/decremented if the object is in the set:
>>>>>>>>>>>>
>>>>>>>>>>>>     static final class MutableCount implements Comparable<MutableCount> {
>>>>>>>>>>>>         final int key;
>>>>>>>>>>>>         int count;
>>>>>>>>>>>>         // etc
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>> This is adapted from the Bag<T> collection which stores an item count with a MutableInteger. Here the mutable count is part of the same object T inserted into the Set. So you can find the object, change the count and not have to put it back into the set. This is more efficient than changing the Integer stored in a Map.
>>>>>>>>>>>>
>>>>>>>>>>>> I've estimated memory usage using an idea based on this article from JavaWorld: Java Tip 130: Do you know your data size? [1]
>>>>>>>>>>>>
>>>>>>>>>>>> Basically you:
>>>>>>>>>>>>
>>>>>>>>>>>> - create an object and throw it away. All classes are then initialised.
>>>>>>>>>>>> - Then you free memory (run garbage collection) and get the current memory usage
>>>>>>>>>>>> - Then create a lot of your object (held in an array)
>>>>>>>>>>>> - Then measure memory usage again
>>>>>>>>>>>> - memory = (after - before) / count
>>>>>>>>>>>>
>>>>>>>>>>>> Here is some example output for n bits set in size m:
>>>>>>>>>>>>
>>>>>>>>>>>>     13107 / 262144 (0.050) : TreeMap<Integer, Integer> =  733947 bytes
>>>>>>>>>>>>     26214 / 262144 (0.100) : TreeMap<Integer, Integer> = 1467866 bytes
>>>>>>>>>>>>     13107 / 262144 (0.050) : TreeSet<MutableCount>     =  838928 bytes
>>>>>>>>>>>>     26214 / 262144 (0.100) : TreeSet<MutableCount>     = 1677776 bytes
>>>>>>>>>>>>     13107 / 262144 (0.050) : HashMap<Integer, Integer> = 1677712 bytes
>>>>>>>>>>>>     26214 / 262144 (0.100) : HashMap<Integer, Integer> = 2306739 bytes
>>>>>>>>>>>>     13107 / 262144 (0.050) : HashSet<MutableCount>     = 1782664 bytes
>>>>>>>>>>>>     26214 / 262144 (0.100) : HashSet<MutableCount>     = 2516656 bytes
>>>>>>>>>>>>         0 / 262144 (0.000) : int[]                     = 1048608 bytes
>>>>>>>>>>>>         0 / 262144 (0.000) : short[]                   =  524320 bytes
>>>>>>>>>>>>
>>>>>>>>>>>> The estimate is accurate to 0.0001% for the arrays so the method is working. The HashMap was created with the capacity set to the expected capacity of the filter (m).
>>>>>>>>>>>>
>>>>>>>>>>>> I've chosen these sizes because at 5% full a HashSet is less memory efficient than using a fixed size array, and at 10% the TreeSet is also less efficient.
>>>>>>>>>>>>
>>>>>>>>>>>> Note that the java.util.Tree/HashSet versions just wrap a Map and insert a dummy object for all keys in the Map. So here a Set is not as efficient as a Map because in the Map test I always inserted the same Integer object representing 1. This would be the same as using a Set with an Integer key, but here the Set had to contain the MutableCount which has an extra int field and is larger than an Integer.
>>>>>>>>>>>>
>>>>>>>>>>>> These data lead me to think that a counting Bloom filter should just use a fixed size backing array:
>>>>>>>>>>>>
>>>>>>>>>>>> - If created using the same Shape as a standard Bloom filter it uses a fixed size. This has a high memory cost when the filter is empty, but when it exceeds 10% of the intended capacity it is more efficient than a dynamic backing storage.
>>>>>>>>>>>> - All operations will operate in order(n) time for an operation with another filter with n indices. Each individual index count in the filter will have order(1) time for access/update. Performance will be limited by the memory cache of the entire array.
>>>>>>>>>>>>
>>>>>>>>>>>> The issue is that a counting Bloom filter with a very low number of inserted items will be memory inefficient. But under what circumstance will such a filter be used in a short-term lifecycle? If it is simply to merge into another filter then this can be done using a merge with a Hasher. If counts are to be merged then perhaps we provide a method to merge counts using the same data structure returned by the CountingBloomFilter getCounts() method, e.g. using a stream of <index,count> pairs:
>>>>>>>>>>>>
>>>>>>>>>>>>     Stream<int[]> getCounts();
>>>>>>>>>>>>     void add(Stream<int[]> counts);
>>>>>>>>>>>>
>>>>>>>>>>>> The issue here is the Shape and HashFunctionIdentity of the origin of the merge cannot be validated. So just leave it out and use the merge with a Hasher.
>>>>>>>>>>>>
>>>>>>>>>>>> Thus the next issue with the counting Bloom filter implementation. Currently when it merges with a Hasher it puts all the indices into a Set and so will only increment the count by 1 for each index identified by the Hasher. This appears to miss the entire point of the counting Bloom filter. If I hash an object to generate k indices I would hope that I do get k indices. But the hash may not be perfect and I may get [1, k] indices with some duplications. This is part of the signature of that object with the given hash. So surely a counting Bloom filter should accommodate this. If my Hasher generates the same index 20 times this should result in the count of that index incrementing by 20.
>>>>>>>>>>>>
>>>>>>>>>>>> The result is that if an object is added directly to a counting Bloom filter using a Hasher it will have a different result than if added to a standard Bloom filter and then that filter added to the counting Bloom filter.
>>>>>>>>>>>>
>>>>>>>>>>>> Opinions on this?
>>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>> [1] http://www.javaworld.com/javaworld/javatips/jw-javatip130.html
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 25, 2020 at 12:35 AM Alex Herbert <alex.d.herb...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have created a PR that contains most of the changes discussed in this thread (see [1]).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please review the changes and comment here or on GitHub.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have left the CountingBloomFilter alone. I will reimplement the add/subtract functionality as discussed into another PR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://github.com/apache/commons-collections/pull/133

--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren