@Gilles I am happy to rename the package without the plural if that is the standard, I will also fix the indent issue. Is there a definition that can be quickly imported into Eclipse to do the proper formatting?
I am adding/updating all comments in the code. FilterConfig contains a main method to make it easy to assess how the various settings influence the length and number of functions in the filter. This could go into another class or a Utility class but neither of those seem like a better solution. I agree with your assessment of final and immutable for methods, variables and classes. I am removing the FilterConfig constructor for BloomFilter and ensuring the BloomFilter makes a defensive copy. The difference between match() and equals() is that equals() is a bit vector equality and match is the standard bloom filter comparison T&C = T. Documentation is being added to the match() and inverseMatch() to explain this. STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed). It identifies how many bytes the FilterConfig will use when written to a DataOuptut or read from a DataInput. I think the Utils class can be removed once the other code is moved to Java8 streams and not extended iterators. Hash function is changed to Objects.hash() Constructors are/will be private where factories make sense. For some the simple constructor makes more sense. (e.g. BloomFilter(BitSet) ) I have no problem removing the Serializable on FilterConfig, but can you explain (or point to a post) why it should be avoided? The update/build methods are being documented and an explanation of the difference included. getLog() is also being documented The asBitSet() method was to extract the bitset from the bloom filter so that it could be stored otherwise used with BitSet based processes. The name of the method is being chagned to getBitSet() and it makes a defensive copy. The ProtoBloomFilter is not a factory. It is really a bean and there are times when it makes sense to use the ProtoBloomFilter. The name is important as it identifies what it is. It contains the necessary information to build any configuration of BloomFilter. There is a proposal to use ProtoBloomFilters in service calls. It could be defined as an interface with a concrete implementation in the ProtoBloomFilterBuilder. As an example of ProtoBloomFilter usage. Say you have a Person object that has the user's first name, last name, eye color, hair color. The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to update() followed by a build() to build the protoBloomFilter. ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude" ).update( "Warren" ).update( "brown" ).update( "silver" ).build(); I can create a bloom filter configuration that has 1x10^7 items and accepts 1 / 1x10^6 collisions. FilterConfig config1 = new FilterConfig( 10000000, 1000000) and build the BloomFilter BloomFilter filter1 = proto.create( config1 ) I can use that filter to determine if the person is in a collection of 1x10^7 items. The implementation of that collection could be multiple buckets of 1x10^4 items with a collision rate of 5x10^3 items. To check that I create an appropriate FilterConfig FilterConfig config2 = new FilterConfig( 10000, 5000 ) and then create the proper bloom filter BloomFilter filter2 = proto.create( config2 ) Claude On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gillese...@gmail.com> wrote: > > > [...] > > > > Gilles, > > > > Take a look at the PR, I added comments to allow the PR to have 0 deps. > > > > Gary > > > > > > IMO, the package should be named "bloomfilter" (without "s"). > Naming seems inconsistent in [Collections]: Some (package) > names are singular, other plural. > > * Indent must be 4 spaces. > * All fields and methods must be commented. > * "FilterConfig" contains a "main" method. > * Local variables and fields must be declared "final" whenever constant. > * Some methods are declared "final", others not. > * "BloomFilter" should be made immutable. > * "BloomFilter" could do without a second constructor: "FilterConfig" > should have a method that returns a "BitSet". > * "BloomFilter" must make a defensive copy of its "bitSet" argument. > * What's the difference between "match" and "equals"? > * Why is "STORED_SIZE" public? > * "Utils" classes should be avoided. > * Hash function should be imported/shaded from "Commons Codec" > (and defined there if missing). > * Constructor(s) should be private (instantiation through factory > methods (cf. ValJO). > * "Serializable" should be avoided. > * The "update" methods are missing explanations (or reference) about > their purpose. Also, it seems that "update" and "build" are redundant. > * Does "getLog" return a standard value? If so, the reference should > appear in the Javadoc (not just as code comment). > * What is the purpose of method "asBitSet()"? > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory". > "ProtoBloomFilterBuilder" should be a static inner class of that > factory. > > Regards, > Gilles > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > -- I like: Like Like - The likeliest place on the web <http://like-like.xenei.com> LinkedIn: http://www.linkedin.com/in/claudewarren