Re: Class name H1Config

2019-06-07 Thread Claude Warren
+1 I think that Gary has a good point. On Fri, Jun 7, 2019 at 2:15 PM Gary Gregory wrote: > Hi All: > > I find the abbreviation of "HTTP" to "H" in "H1Config" pretty odd. > > Can't we just call this class Http1Config? > > Gary > -- I like: Like Like - The likeliest place on the web

Configuration diff

2019-07-11 Thread Claude Warren
Greetings, I was just working on an issue for work and discovered that we really needed a configuration diff. Something that will compare key values between two configurations. The result would be an object that would tell you all the differences for either configuration and perhaps a pretty pr
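
The key/value comparison described in this message could be sketched roughly as follows. This is only an illustration under the assumption that both configurations can be viewed as Map<String, String>; the class name ConfigDiff and its fields are hypothetical and not part of any Commons API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/** Hypothetical illustration only: a diff of two configurations viewed as key/value maps. */
public final class ConfigDiff {
    /** Keys that exist only in the left configuration. */
    public final Map<String, String> onlyInLeft = new TreeMap<>();
    /** Keys that exist only in the right configuration. */
    public final Map<String, String> onlyInRight = new TreeMap<>();
    /** Keys present in both with different values: key -> {leftValue, rightValue}. */
    public final Map<String, String[]> changed = new TreeMap<>();

    public ConfigDiff(Map<String, String> left, Map<String, String> right) {
        for (Map.Entry<String, String> e : left.entrySet()) {
            String key = e.getKey();
            if (!right.containsKey(key)) {
                onlyInLeft.put(key, e.getValue());
            } else if (!Objects.equals(e.getValue(), right.get(key))) {
                changed.put(key, new String[] {e.getValue(), right.get(key)});
            }
        }
        for (Map.Entry<String, String> e : right.entrySet()) {
            if (!left.containsKey(e.getKey())) {
                onlyInRight.put(e.getKey(), e.getValue());
            }
        }
    }

    /** True when the two configurations have identical keys and values. */
    public boolean isEmpty() {
        return onlyInLeft.isEmpty() && onlyInRight.isEmpty() && changed.isEmpty();
    }

    /** A simple "pretty print": one line per difference, sorted by key within each group. */
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        onlyInLeft.forEach((k, v) -> sb.append("- ").append(k).append('=').append(v).append('\n'));
        onlyInRight.forEach((k, v) -> sb.append("+ ").append(k).append('=').append(v).append('\n'));
        changed.forEach((k, v) -> sb.append("~ ").append(k).append(": ")
                .append(v[0]).append(" -> ").append(v[1]).append('\n'));
        return sb.toString();
    }
}
```

Keeping the three groups separate lets a caller report the differences from the point of view of either configuration.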

Re: [ALL] POM file standardisation of layout

2019-08-15 Thread Claude Warren
Since the POM is an XML document, how about a simple XSLT that will convert them all to the same format? Alternatively, an XML diff could be performed where each leaf node is contextualized by generating the path from the root to the leaf; these can be sorted and a standard diff performed to deter
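
The leaf-path idea in this message might look something like the sketch below, which uses only the JDK's built-in DOM parser. The class name LeafPaths and the "path=value" line format are assumptions made for illustration; attributes and namespaces are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical illustration only: flattens an XML document into sorted "path=value" lines. */
public final class LeafPaths {

    /** Returns one sorted line per leaf element, e.g. "/project/groupId=g". */
    public static List<String> of(String xml) {
        try {
            Node root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            List<String> lines = new ArrayList<>();
            collect(root, "/" + root.getNodeName(), lines);
            Collections.sort(lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Walks the element tree; an element without element children counts as a leaf. */
    private static void collect(Node node, String path, List<String> lines) {
        NodeList children = node.getChildNodes();
        boolean hasElementChild = false;
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasElementChild = true;
                collect(child, path + "/" + child.getNodeName(), lines);
            }
        }
        if (!hasElementChild) {
            lines.add(path + "=" + node.getTextContent().trim());
        }
    }
}
```

Two documents that differ only in element order or indentation produce the same sorted list, so an ordinary line diff of the two lists shows only the real differences. Repeated siblings (several dependency elements, for example) share a path, so this sketch cannot tell which sibling a value came from.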

New Sub-project Proposal.

2019-09-10 Thread Claude Warren
Having spoken with several people at ApacheCon, I would like to see a Bloom filter sub-project. I have code that is already under the Apache License that I am willing to contribute as the basis. The goal of the sub-project would be to produce a reference implementation that could be used by other proje

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
> > having read, what a bloom filter is, a subproject sounds unnecessary > > to me. I'd recommend, that you contribute your code to Commons > > Collections, which seems to me to be a logical target. > > > > Jochen > > > > On Tue, Sep 10, 2019 at 8:45 PM C

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
t a bloom filter is, a subproject sounds unnecessary > > > to me. I'd recommend, that you contribute your code to Commons > > > Collections, which seems to me to be a logical target. > > > > > > Jochen > > > > > > On Tue, Sep 10, 201

Re: New Sub-project Proposal.

2019-09-11 Thread Claude Warren
@stain. You have correctly identified the code in my repository. The code could be refactored to use streams or we could bring the jena iterator extensions into commons. I had suggested that at one time but there were concerns about conflicts with existing code. Duplication of functionality

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
> > > > We would need to do IP clearance to bring in the code formally to ASF. It > > should be easy if it is just you who made it under Apache license. > > > > On Wed, 11 Sep 2019, 18:44 Claude Warren, wrote: > > > > > @stain. You have correctly identi

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
de, so accepting them is optional. Claude On Thu, Sep 12, 2019 at 9:28 AM Stian Soiland-Reyes wrote: > On Thu, 12 Sep 2019 08:06:59 +0100, Claude Warren > wrote: > > Actually the code I was thinking of is the multi-filter branch. It > cleans > > up some names and simplifie

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
@Gilles Missed your suggestion about modularity. Can you point me to the original message or paraphrase it here? Claude On Thu, Sep 12, 2019 at 11:03 AM Gilles Sadowski wrote: > Le jeu. 12 sept. 2019 à 10:28, Stian Soiland-Reyes a > écrit : > > > > On Thu, 12 Sep 2019 08:0

Re: New Sub-project Proposal.

2019-09-12 Thread Claude Warren
The base code depended on commons-lang3 for building hashes. Is this acceptable or should the hash generation code from lang3 be cut and pasted into the classes? Not sure what the standard is in this project. On Thu, Sep 12, 2019 at 4:14 PM Claude Warren wrote: > @Gilles > > Mi

Re: New Sub-project Proposal.

2019-09-13 Thread Claude Warren
@Gilles I am happy to rename the package without the plural if that is the standard, I will also fix the indent issue. Is there a definition that can be quickly imported into Eclipse to do the proper formatting? I am adding/updating all comments in the code. FilterConfig contains a main method

Re: New Sub-project Proposal.

2019-09-15 Thread Claude Warren
. Claude On Mon, Sep 16, 2019 at 2:01 AM Gary Gregory wrote: > On Sun, Sep 15, 2019 at 8:17 PM sebb wrote: > > > On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski > > wrote: > > > > > > Hi. > > > > > > Le sam. 14 sept. 2019 à 08:15,

Re: [Collections] Bring Abstract* tests into it's own test jar

2019-09-20 Thread Claude Warren
If you are looking for tests for interfaces take a look at contract testing (https://github.com/Claudenw/junit-contracts) On Fri, Sep 20, 2019 at 9:40 AM Rohan Suri wrote: > Apologies I wasn't aware the test jars are being released as well. > Adding "classifier: tests" when specifying the depe

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
For the style issues is there an Eclipse style package that meets the commons style or some other tool that will correctly configure the format and style options in Eclipse? On Mon, Sep 23, 2019 at 10:54 AM Gilles Sadowski wrote: > Hello. > > Here are a few comment from a quick browse of tod

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
/MessageDigest.html [2] https://www.dictionary.com/browse/proto?s=t On Mon, Sep 23, 2019 at 11:13 AM Claude Warren wrote: > For the style issues is there an Eclipse style package that meets the > commons style or some other tool that will correctly configure the format > and style options in Eclips

Re: New Sub-project Proposal.

2019-09-23 Thread Claude Warren
oom filters is not serializable, actually bloom filters themselves should be serializable. How does one accomplish that if serialization is left to later? Claude On Mon, Sep 23, 2019 at 8:00 PM Gilles Sadowski wrote: > Hi. > > Le lun. 23 sept. 2019 à 12:59, Claude Warren a écrit :

Re: New Sub-project Proposal.

2019-09-25 Thread Claude Warren
There is no associated JIRA report. Not even sure how to generate or build one. Claude On Wed, Sep 25, 2019 at 5:42 PM Gilles Sadowski wrote: > Hi. > > Is there a JIRA report associated with the proposal? > > It would help review if there were several PRs that > differentiates between "core" f

Re: New Sub-project Proposal.

2019-09-25 Thread Claude Warren
Created COLLECTIONS-728 ( https://issues.apache.org/jira/browse/COLLECTIONS-728). I hope that was appropriate. On Wed, Sep 25, 2019 at 6:41 PM Gilles Sadowski wrote: > Hello. > > 2019-09-25 19:13 UTC+02:00, Claude Warren : > > There is no associated JIRA report. Not even sure

[collections] ProtoBloomFilter.Builder -- opinions sought

2019-10-07 Thread Claude Warren
Greetings, I am preparing a pull request to bring Bloom Filters into the collections package. The pull request is at https://github.com/apache/commons-collections/pull/83 and the Jira ticket is https://issues.apache.org/jira/browse/COLLECTIONS-728 The package has a builder that creates Proto

[collections] BloomFilter or BitSet functions?

2019-10-07 Thread Claude Warren
As noted earlier I am preparing a contribution of Bloom Filter classes to the collections module. As part of this submission there are several methods that operate on BitSets that are used as part of Bloom Filter manipulation and analysis. My question is, should these be contributed as Bloom Fil

Re: [collections] BloomFilter or BitSet functions?

2019-10-12 Thread Claude Warren
the BitSet functionality should wait until someone wants it. Right now I can proceed without it and provide the semantically sound methods for the BloomFilters. Claude On Fri, Oct 11, 2019 at 12:36 AM Gilles Sadowski wrote: > Hello. > > Le lun. 7 oct. 2019 à 19:42, Claude Warren

Re: [collections] BloomFilter or BitSet functions?

2019-10-13 Thread Claude Warren
ntribution WIP. If anyone objects or thinks that this change is not proper please let me know soonest. Thanks, Claude On Sun, Oct 13, 2019 at 7:07 AM Claude Warren wrote: > I believe the functions should be in a separate class as it increases the > separation of concerns. > > The met

[collections] BloomFilter package architecture discussion

2019-10-14 Thread Claude Warren
Greetings, I feel like I have been beating my head against a wall trying to get this contribution accepted; this is not a complaint about process or personalities, just a statement of how I feel. I realize that I have made progress but I also realize that there is a long way to go. Furthermore, t

Re: [ALL] Update to commons security page

2019-10-15 Thread Claude Warren
If the style is to rely on external code to do input validation, then I think that should be in the javadocs as well as on the page you mention. Claude On Tue, Oct 15, 2019 at 10:59 AM sebb wrote: > It might be useful to add a note to the commons security page about > automated vulnerability ch

Re: [collections] BloomFilter package architecture discussion

2019-10-15 Thread Claude Warren
On Tue, Oct 15, 2019 at 1:46 AM Gilles Sadowski wrote: > Hello. > > > > > Furthermore, > > the other potential users and supporters have not responded to any > > communication about this proposal so I am floundering on that front too. > > Who are they? > Developers I have worked with or know of

Re: [collections] BloomFilter package architecture discussion

2019-10-17 Thread Claude Warren
> 2019-10-15 20:05 UTC+02:00, Claude Warren : > > On Tue, Oct 15, 2019 at 1:46 AM Gilles Sadowski > > wrote: > > > >> Hello. > >> > >> > >> > >> > Furthermore, > >> > the other potential users and supporters have not resp

Re: [collections] BloomFilter package architecture discussion

2019-10-17 Thread Claude Warren
On Wed, Oct 16, 2019 at 2:08 AM Gilles Sadowski wrote: > Hi. > > 2019-10-15 20:05 UTC+02:00, Claude Warren : > > On Tue, Oct 15, 2019 at 1:46 AM Gilles Sadowski > > wrote: > > > >> Hello. > >> > >> > >> > >> > Further

Re: strange change to src/main/java/org/apache/bcel/generic/FieldGenOrMethodGen.java

2019-10-18 Thread Claude Warren
The change from public to private would indicate a major version change as it changes the API. Though I suppose this could also be done if code were being contributed to a project from outside. In which case the minor (middle) number would have to have changed. In either case changing from a pro

Re: [All] Using "SemVer"?

2019-10-18 Thread Claude Warren
+1 ensures interoperability for our users for minimal pain on our side. On Fri, Oct 18, 2019 at 12:01 PM Emmanuel Bourg wrote: > Le 18/10/2019 à 12:46, Gilles Sadowski a écrit : > > > Why not state it explicitly (and make it a formal requirement for > > a release)? > > -1, it restricts our freed

[collections] Bloom filter - Discussion of Shape

2019-10-18 Thread Claude Warren
I think the other discussion is getting a bit long so I thought we could start this discussion here and see if we can close out the other discussion with agreement on the remaining topics. The “Shape” of a bloom filter (excluding the hash algo) is defined mathematically by Number of Items (AKA:

Re: [collections] BloomFilter package architecture discussion

2019-10-19 Thread Claude Warren
On Fri, Oct 18, 2019 at 3:22 PM Gilles Sadowski wrote: > Hi. > > >>> [...] > > > > > > Maybe I was not clear enough: I'm not saying that we should prefer > > > some representation (of the state) over another; only that the how > > > the state is represented externally need not be the same as the

Re: [collections] BloomFilter package architecture discussion

2019-10-19 Thread Claude Warren
-10-19 16:20 UTC+02:00, Claude Warren : > > On Fri, Oct 18, 2019 at 3:22 PM Gilles Sadowski > > wrote: > > > >> Hi. > >> > >> >>> [...] > >> > > > >> > > Maybe I was not clear enough: I'm not saying

Re: [collections] BloomFilter package architecture discussion

2019-10-20 Thread Claude Warren
, say a1, a2, ..., ad. Finally, all d bits addressed by a1 through ad are set to 1." -- Burton Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors"[1] Claude [1] http://crystal.uta.edu/~mcguigan/cse6350/papers/Bloom.pdf On Sat, Oct 19, 2019 at 11:40 PM Claude War
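
The step quoted from Bloom's paper (set every bit addressed by the d hash values) can be sketched with java.util.BitSet. This is an illustration only and not the Commons Collections implementation; the class name BloomSketch is hypothetical, and the caller is assumed to supply the hash values.

```java
import java.util.BitSet;

/** Hypothetical illustration only: the hash area as N individually addressable bits. */
public final class BloomSketch {
    private final BitSet bits;
    private final int numberOfBits;

    public BloomSketch(int numberOfBits) {
        if (numberOfBits <= 0) {
            throw new IllegalArgumentException("numberOfBits must be positive");
        }
        this.numberOfBits = numberOfBits;
        this.bits = new BitSet(numberOfBits);
    }

    /** Sets the bit addressed by each hash value; values are reduced to 0..N-1. */
    public void add(int... hashes) {
        for (int h : hashes) {
            bits.set(Math.floorMod(h, numberOfBits));
        }
    }

    /** Returns false if any addressed bit is 0, which means the item was never added. */
    public boolean mightContain(int... hashes) {
        for (int h : hashes) {
            if (!bits.get(Math.floorMod(h, numberOfBits))) {
                return false;
            }
        }
        return true;
    }
}
```

A true result from mightContain only means "possibly present", because different items can address the same bits.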

Re: [All] Using "SemVer"?

2019-10-21 Thread Claude Warren
I think there is a general belief among users that Apache follows SemVer for most Java code. I think that it would be best if each module specified if it follows SemVer or not -- just to clear up any confusion on the part of the user. Claude On Fri, Oct 18, 2019 at 4:08 PM Stefan Bodewig

Re: [collections] BloomFilter package architecture discussion

2019-10-28 Thread Claude Warren
m. 20 oct. 2019 à 14:20, Claude Warren a écrit : > > > > A Bloom filter is a set of bits: > > It is not, according to the quote here: > > > "The hash area is considered as N individual addressable bits, with > > addresses 0 through N - 1. It is assumed that all

Re: [lang] immutable BitSet

2019-10-28 Thread Claude Warren
Having "ImmutableBitSet" inherit from "BitSet" breaks the latter's contract. > > no more so than the ImmutableSet breaks the Set contract. Yes it does but the pattern is well established.

[CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
There is an error in the current Murmur3 code introduced by sign extension errors. This is documented in CODEC-264.[1] I have created a pull request to fix it.[2] While the code changes did not change any of the existing Murmur3 tests, I did add new tests that failed until the changes were appli

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
> > > > byte b = -1; > > (int) b != (b & 0xff); > > b << 8 != (b & 0xff) << 8; > > b << 16 != (b & 0xff) << 16; > > > > The original code has the use of the 0xff mask for most of the murmur3 > algorithm. It has been misse
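
The inequalities quoted in this message can be checked with a small stand-alone program. This is a sketch of the general sign-extension pitfall, not the Murmur3 code itself; the class and method names are made up for the example.

```java
/** Hypothetical illustration only: why a byte must be masked with 0xff before shifting. */
public final class SignExtensionDemo {

    /** Shifts without masking: a negative byte is sign-extended to int first. */
    public static int unmasked(byte b, int shift) {
        return b << shift;
    }

    /** Masks to the low 8 bits first, so the byte is treated as 0..255. */
    public static int masked(byte b, int shift) {
        return (b & 0xff) << shift;
    }

    public static void main(String[] args) {
        byte b = -1;
        System.out.println((int) b);         // -1
        System.out.println(b & 0xff);        // 255
        System.out.println(unmasked(b, 8));  // -256
        System.out.println(masked(b, 8));    // 65280
    }
}
```

For non-negative bytes the two methods agree, which is why tests that only use ASCII input would not expose the difference.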

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-03 Thread Claude Warren
As a third option, as @melloware said in the pull request comments, the original implementation came from Apache Hive. The current hashes could be named hash32Hive and hash128Hive On Mon, Nov 4, 2019 at 12:53 AM Claude Warren wrote: > I think the way to prove they behave correctly is to test

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-04 Thread Claude Warren
the issue. I think in general we should follow the same action with any digest defect: New methods and Javadoc old. Claude On Mon, Nov 4, 2019 at 8:08 AM Alex Herbert wrote: > > > > On 4 Nov 2019, at 02:13, sebb wrote: > > > > On Mon, 4 Nov 2019 at 00:53, Claude

Re: [IO] Don't deprecate IOUtils#closeQuietly()

2019-11-05 Thread Claude Warren
+1 I have similar use cases. On Tue, Nov 5, 2019 at 3:12 PM Gary Gregory wrote: > In this vein, I'd also like to add a null-safe close(Closeable) method. > > Gary > > On Tue, Nov 5, 2019 at 10:10 AM Gary Gregory > wrote: > > > Hi All: > > > > I propose that we do NOT deprecate IOUtils#closeQuie

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-11 Thread Claude Warren
does not impact the code in the wild. Claude On Mon, Nov 4, 2019 at 10:50 AM Claude Warren wrote: > There are a number of issues with the format and potential bugs in the > Codec Murmur3 code. ( See spotbugs, PMD, and codestyle reports) The one > that tripped me up was the mix of tab/sp

Re: [CODEC] Sign Extension Error in Murmur3

2019-11-11 Thread Claude Warren
https://github.com/apache/commons-codec/pull/27 On Mon, Nov 11, 2019 at 4:25 PM Claude Warren wrote: > I took the approach that I would leave the original code there and add new > methods hash128_x64 and hash32_x86. I also marked the older methods as > deprecated with a note

[commons-codec] is there a commons-codec2

2019-11-24 Thread Claude Warren
Is there a commons-codec2? I see it listed as group ID: org.apache.commons artifact ID: commons-codec2 version: 2.0-SNAPSHOT in the apache snapshot repository: https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-codec2/2.0-SNAPSHOT/ But I don't see any way to

[Collection] adding Codec dependency?

2019-11-30 Thread Claude Warren
Greetings, I have a contribution[1] that requires a Murmur128 hash. The options are to add the hash and associated tests to the Collections classes or to use the Murmur hash found in the Commons Codec project. In the contribution I elected to go with adding the dependency on the codec 1.14 version.[2]

Re: [Collection] adding Codec dependency?

2019-12-01 Thread Claude Warren
Hasher implementations rather than the name and function that is currently used. Then it might be much easier for integrators to implement other hash functions. I could make it an optional dependency. On Sat, Nov 30, 2019 at 4:13 PM sebb wrote: > On Sat, 30 Nov 2019 at 12:48, Claude Warren wr

Re: [codec] release soon

2019-12-26 Thread Claude Warren
For the contributions and issues I was involved in, the javadoc appears to be correct. Claude On Thu, Dec 26, 2019 at 1:30 PM Gary Gregory wrote: > It looks like we will need a new version of Commons Codec out before we can > use new code there from Commons Collections. So now's the time to poli

Re: [collections] bloom filters comments

2019-12-27 Thread Claude Warren
On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory wrote: > Hi Claude and all: > > Here are a couple of comments on the bloom filter PR. > > 10,100 ft level comment we do not have to worry about today: Before the > release, we might want to split Commons Collection into a multi-module > project and hav

Re: [collections] bloom filters comments

2019-12-28 Thread Claude Warren
AbstractBloomFilter class. Claude On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory wrote: > On Fri, Dec 27, 2019 at 11:36 AM Claude Warren wrote: > > > On Fri, Dec 27, 2019 at 1:02 AM Gary Gregory > > wrote: > > > > > Hi Claude and all: > > > > > > H

Re: [commons-codec] branch master updated: Change AssertionError to IllegalStateException

2019-12-28 Thread Claude Warren
Wherever the note is found the javadoc should include * This implementation contains a sign-extension bug in the seed initialization. * This manifests if the seed is negative. On Sat, Dec 28, 2019 at 1:45 AM Gary Gregory wrote: > On Fri, Dec 27, 2019 at 8:17 PM wrote: > > > This is a

Re: [collections] bloom filters comments

2019-12-29 Thread Claude Warren
e the Set operations into their own class > with static methods? The set operations all operate on 2 Bloom filters. > Moving them would clarify the AbstractBloomFilter class. > > Claude > > > On Sat, Dec 28, 2019 at 2:01 AM Gary Gregory > wrote: > >> On Fri, Dec 2

Re: [collections] bloom filters comments

2019-12-29 Thread Claude Warren
It is currently a sub-class. There was a suggestion to move it outside of the BloomFilter interface. On Sun, Dec 29, 2019 at 3:47 PM Gilles Sadowski wrote: > Le dim. 29 déc. 2019 à 15:30, Claude Warren a écrit : > > > > If the Shape class (BloomFilter.Shape) is extracted from

Re: [codec] release soon

2019-12-29 Thread Claude Warren
. It seems to be to add a default > >>> block > >>> > for > >>> > >> the switch statement. > >>> > >> > >>> > > > >>> > > I'm OK to drop the code, or replace the AssertionErro

Re: [VOTE] Release Apache Commons Codec 1.14 based on RC1

2020-01-03 Thread Claude Warren
I am not a PMC member, but I'll report anyway: +0 (non binding) *FindBug* issues show: MurmurHash3 case fall through issues. I believe these are expected and can be fixed with an annotation. Suggest release and fix in next update *CPD* issues shows private static long getLittleEndianLong(final

Re: [collections] bloom filters comments

2020-01-08 Thread Claude Warren
I believe the issue (I think history is at https://issues.apache.org/jira/browse/COLLECTIONS-728?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=17003600) is about the identification of hash implementations. Currently there are a couple of classes involved:

Re: [collections] bloom filters comments

2020-01-08 Thread Claude Warren
ter if > there's some code to read along. And I am used to GitHub/GitLab diff > interface. > So I agree with Gary that this could be a good time for a PR (maybe a > draft one). > Bruno > > > On Thursday, 9 January 2020, 6:32:09 am NZDT, Claude Warren < > cla...@x

Re: [collections] bloom filters comments

2020-01-13 Thread Claude Warren
. Kinoshita wrote: > Sorry, I'd read Gary's reply as if there was no PR yet. Reviewed it a bit > now, lots of tests! Will play with the code and read the comments and > finish the review by the end of the week. > > Thanks Claude > > On Thursday, 9 January 2020,

[collections] Bloom filter signature calculation

2020-01-23 Thread Claude Warren
The HashFunctionIdentity.getSignature() method is intended to be used in a quick comparison of HashFunctionIdentity instances. As such it is supposed to encompass the name, signedness and process as well as some indication that the function implementation is the same as any other implementation of th

Re: [Collections] Bloom filters are in

2020-01-25 Thread Claude Warren
Now to get some documentation done! On Sat, Jan 25, 2020 at 1:58 PM Gary Gregory wrote: > ... git master. Thank you Claude! > > Gary >

[collections] example code.

2020-01-26 Thread Claude Warren
I see that there is no example code directory in the collections project. I was thinking of contributing an example of how to construct a Bloom filter that operates like the Hadoop Bloom filter but this seems like something that we may not want to include in the library. In the Jena project we hav

FOSDEM 2020

2020-01-26 Thread Claude Warren
Is anyone on this list (besides me) planning on attending FOSDEM 2020[1]? If so, would you be interested in hosting the Apache table? By "hosting" I mean we stand there, talk to people, and promote Apache Commons. Claude [1] https://cwiki.apache.org/confluence/display/COMDEV/FOSDEM+2020

Re: [collections] example code.

2020-01-26 Thread Claude Warren
> > garydgreg...@gmail.com> wrote: > > > > I think the simplest is to create an examples package under src/test > which > > also let you put example data under src/resources. > > > > This way, that code would get processed just like any other test code >

Re: [collections] Bloom filters

2020-02-17 Thread Claude Warren
Alex, Thank you for your comments. See comments inline. On Mon, Feb 17, 2020 at 3:20 PM Alex Herbert wrote: > I had a look through all the BloomFilter code. Thanks Claude for the > contribution. > > Some items that need clarifying: > > > 1. HashFunctionIdentity.Signedness > > This is not ful

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
On Mon, Feb 17, 2020 at 9:52 PM Alex Herbert wrote: > > > > On 17 Feb 2020, at 20:30, Claude Warren wrote: > > > > Alex, > > > > Thank you for your comments. > > > > See comments inline. > > > > > > > > On Mon, Feb 17, 20

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
On Tue, Feb 18, 2020 at 9:12 AM Alex Herbert wrote: > > My maths is rusty. If A=0xF000ABCD as interpreted as an unsigned and > > B=0xF000ABCD but interpreted as a signed does (A mod N) = (B mod N) for > all > > positive values of N? If so then you are correct and Signedness does not > > matte
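
The question quoted here (does the same 32-bit pattern give the same remainder whether it is read as signed or unsigned) can be tried directly. The sketch below uses Integer.remainderUnsigned for the unsigned reading and Math.floorMod for the signed one; the class and method names are made up for the example, and it is offered as an experiment, not as the conclusion the thread reached.

```java
/** Hypothetical illustration only: one bit pattern, two interpretations, remainder mod n. */
public final class SignednessDemo {

    /** Treats the 32 bits as an unsigned value in 0..2^32-1. */
    public static int unsignedMod(int bits, int n) {
        return Integer.remainderUnsigned(bits, n);
    }

    /** Treats the 32 bits as a signed int; floorMod keeps the result in 0..n-1. */
    public static int signedMod(int bits, int n) {
        return Math.floorMod(bits, n);
    }

    public static void main(String[] args) {
        int bits = 0xF000ABCD; // 4026575821 unsigned, -268391475 signed
        System.out.println(unsignedMod(bits, 8));   // 5
        System.out.println(signedMod(bits, 8));     // 5
        System.out.println(unsignedMod(bits, 10));  // 1
        System.out.println(signedMod(bits, 10));    // 5
    }
}
```

The two readings differ by exactly 2^32, so the remainders agree when n divides 2^32 (a power of two such as 8) and can disagree otherwise (such as 10).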

Re: [collections] Bloom filters

2020-02-18 Thread Claude Warren
? Is the order of the indices important? > > Or do you have some benchmarks to show that the TreeMap handles lots of > growth and shrinkage better than a HashMap. There are situations where > each one would be a better choice and so perhaps this is a case for > having a

Re: [collections] Bloom filters

2020-02-19 Thread Claude Warren
18 Feb 2020, at 22:34, Gary Gregory wrote: > > > > On Tue, Feb 18, 2020 at 5:32 PM Claude Warren wrote: > > > >> Last one first, why a tree map? I think it is a holdover from an > earlier > >> implementation. It can be any reasonable Map (e.g. HashMap).

Re: [collections] Bloom filters

2020-02-19 Thread Claude Warren
; } among other changes. Those were the changes I was referring to. Claude On Wed, Feb 19, 2020 at 11:33 PM Alex Herbert wrote: > > > > On 19 Feb 2020, at 21:14, Claude Warren wrote: > > > > I think the compromise solution of removing the thrown exception and

Re: [collections] Bloom filters

2020-02-28 Thread Claude Warren
Alex, would you take a look at pull request 131 [1]? It adds a new hasher implementation and makes the HashFunctionValidator available for public use. [1] https://github.com/apache/commons-collections/pull/131 On Tue, Feb 25, 2020 at 12:35 AM Alex Herbert wrote: > I have created a PR that contains

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
duplicate counting, but I am not certain of the validity of such a count and I fear that it muddies the waters with respect to what the CountingBloomFilter is counting. Claude On Sat, Feb 29, 2020 at 2:13 PM Alex Herbert wrote: > > > > On 29 Feb 2020, at 07:46, Claude Warren wrote:

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
> > On 1 Mar 2020, at 09:28, Claude Warren wrote: > > > > The idea of a backing array is fine and the only problem I see with it is > > in very large filters (on the order of 10^8 bits and larger) but document > > the size calculation and let the developer worry about

Re: [collections] Bloom filters

2020-03-01 Thread Claude Warren
I am happy with a plain Iterator as the return. Claude On Mon, Mar 2, 2020 at 1:02 AM Alex Herbert wrote: > > > > On 1 Mar 2020, at 15:39, Claude Warren wrote: > > > > I think the CountingBloomFilter interface needs to extend BloomFilter. > > I said that but d

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
It is not too late to update the BloomFilter interface to have the merge return a boolean. The incorrect Shape would still throw an exception, so the return value would only come into play if the bits could not be set. Thoughts? On Mon, Mar 2, 2020 at 7:56 AM Claude Warren wrote: > for

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
the specific index. int getCount( int index ); With these methods it becomes possible to construct an iterator of int[] or Map.Entry or whatever else the developer wants. Claude On Mon, Mar 2, 2020 at 10:48 AM Alex Herbert wrote: > On 02/03/2020 09:38, Claude Warren wrote: > > It i

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
limit checks quickly as we turn bits on. Makes me think we might need to implement StandardBloomFilter to use long[] as well. Claude On Mon, Mar 2, 2020 at 1:12 PM Alex Herbert wrote: > > On 02/03/2020 11:32, Claude Warren wrote: > > my thought on changing the BloomFilter.merge(

Re: [collections] Bloom filters

2020-03-02 Thread Claude Warren
antly impacts the uses that I currently have. Shall we move forward? Claude On Mon, Mar 2, 2020 at 6:02 PM Alex Herbert wrote: > > On 02/03/2020 16:12, Claude Warren wrote: > > Does getCounts() return a snapshot of the values when the call was made > or > > does it return

Re: [collections] Bloom filters

2020-03-03 Thread Claude Warren
:54 AM Alex Herbert wrote: > > On 02/03/2020 22:34, Claude Warren wrote: > > So what we have then is: > > > > public interface BloomFilter { > > > > int andCardinality(BloomFilter other); > > > > int cardinality();

Re: [collections] Bloom filters

2020-03-08 Thread Claude Warren
With the upcoming change the StaticHash usage model has changed. It was serving two purposes: 1. as a mechanism to preserve the list of integers from the BloomFilter as well as the shape. 2. as a way to construct a Hasher from a collection of integers and a shape so that they could be

Restart Build?

2020-03-08 Thread Claude Warren
I have a pull request ( https://github.com/apache/commons-collections/pull/131) that failed due to an external connection being reset. Is there a way to restart the build without creating a new pull request or pushing to git? Claude

Re: [collections] Bloom filters

2020-03-14 Thread Claude Warren
Shape is not intended to "Perform the standard computations using some of n, m, k, p to produce optimal values for the other values of n, m, k, p:"; that is left to the developer to determine, possibly with the help of https://hur.st/bloomfilter/ as referenced in the class javadoc. However, writing t
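
For a developer who does want the standard computations mentioned here, the usual textbook formulas can be sketched as below. The reading of the symbols is an assumption (n = number of items, m = number of bits, k = number of hash functions, p = false-positive probability), and the class ShapeMath is hypothetical; it is not the Commons Collections Shape class.

```java
/** Hypothetical illustration only: textbook Bloom filter sizing formulas. */
public final class ShapeMath {
    private static final double LN2 = Math.log(2);

    /** m = ceil(-n * ln(p) / (ln 2)^2): bits needed for n items at probability p. */
    public static int numberOfBits(int n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (LN2 * LN2));
    }

    /** k = round((m / n) * ln 2), with at least one hash function. */
    public static int numberOfHashFunctions(int n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * LN2));
    }

    /** p = (1 - e^(-k * n / m))^k: expected false-positive probability. */
    public static double probability(int n, int m, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        int n = 1000;
        int m = numberOfBits(n, 0.01);
        int k = numberOfHashFunctions(n, m);
        System.out.println("m=" + m + " k=" + k + " p=" + probability(n, m, k));
    }
}
```

Because m and k are rounded to integers, the probability computed back from them is close to, but not exactly, the requested p.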

Re: [collections] Bloom filters

2020-03-14 Thread Claude Warren
of bits. But bundling > the hash function identity and number of hash functions saves you having to > pass that separately to any Bloom filter and removes the requirement to > specify these separately in the Bloom filter interface. > > > > On Sat, 14 Mar 2020, 09:31 Claude Warr

[BloomFilters] changes to BloomFilter

2020-03-15 Thread Claude Warren
We have spoken elsewhere about removing getHasher() and adding iterator(). In addition, should we add forEachBit( IntConsumer )?

Re: [BloomFilters] changes to BloomFilter

2020-03-16 Thread Claude Warren
es an Iterator. On Sun, Mar 15, 2020 at 6:08 PM Alex Herbert wrote: > On Sun, 15 Mar 2020, 17:27 Claude Warren, wrote: > > > We have spoken elsewhere about removing getHasher() and adding iterator() > > In addition should we add forEachBit( IntConsumer )? > > > I was thi

Re: [BloomFilters] changes to BloomFilter

2020-03-16 Thread Claude Warren
Shape) will return the same values. Did I misunderstand something? Claude On Mon, Mar 16, 2020 at 6:34 PM Alex Herbert wrote: > > On 16/03/2020 07:57, Claude Warren wrote: > > I made a quick pass at changing getHasher() to iterator(). > > A look at the feasibility or have you s

Re: [BloomFilters] changes to BloomFilter

2020-03-17 Thread Claude Warren
> In summary: > > 1. change Hasher getBits to iterator > agree > 2. improve documentation of Hasher and the contract that it should fulfil > with respect to items and a Shape > absolutely > 3. potentially drop Hasher.Builder unless there is a way to reset the > Builder or

Re: [BloomFilters] changes to BloomFilter

2020-03-17 Thread Claude Warren
/pull/131 get merged so that we can have more than one example of a hasher that actually hashes. On Tue, Mar 17, 2020 at 1:53 PM Alex Herbert wrote: > > > > On 17 Mar 2020, at 11:08, Claude Warren wrote: > > > > On Tue, Mar 17, 2020 at 12:28 AM Alex Herbert <mail

Re: [BloomFilters] changes to BloomFilter

2020-03-17 Thread Claude Warren
On Tue, Mar 17, 2020 at 4:38 PM Alex Herbert wrote: > > > > On 17 Mar 2020, at 15:41, Claude Warren wrote: > > > > I agree with the HashFunction changes. > > OK, but which ones? > DOH! this one... > > Changing HashFunction to have two methods: > >

Re: [BloomFilters] changes to BloomFilter

2020-03-17 Thread Claude Warren
n used with a Hasher, remove the duplicates, and perform the same test. I see no reason not to add them. On Tue, Mar 17, 2020 at 6:23 PM Alex Herbert wrote: > > > > On 17 Mar 2020, at 17:06, Claude Warren wrote: > > > > On Tue, Mar 17, 2020 at 4:38 PM Alex Herbert >

Re: [BloomFilters] changes to BloomFilter

2020-03-18 Thread Claude Warren
pairs somehow. On Tue, Mar 17, 2020 at 10:34 PM Claude Warren wrote: > Builder discussion: > > Let's go with > > >> Builder with(CharSequence, Charset); > >> Builder withUnencoded(CharSequence); > > Shape Discussion: > > as for getNumberOfBytes() it sho

Re: [BloomFilters] changes to BloomFilter

2020-03-18 Thread Claude Warren
at 11:50 AM Alex Herbert wrote: > > > > On 18 Mar 2020, at 11:14, Claude Warren wrote: > > > > On a slightly different note. CountingBloomFilters have no way to > perform > > a reload. All other bloom filters you can dump the bits and reload > > (trivial)

Re: [BloomFilters] changes to BloomFilter

2020-03-18 Thread Claude Warren
Store values in long blocks or as integers in a list, that sort of thing. Perhaps in a month or so when we really have some idea. On Wed, Mar 18, 2020 at 2:16 PM Claude Warren wrote: > You don't need Iterator iterator() as we have forEachCount( > BitCountConsumer ) > > I guess

Re: [BloomFilters] changes to BloomFilter

2020-03-18 Thread Claude Warren
y Shape. It also may be longer. If you want to create a copy of the byte[] you have to know how long it should be. The only way to determine that is from Shape, and currently only if you do the Ceil() method noted above. There is a convenience in knowing how long (in bytes) the buffer can be. On W
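
The buffer-length arithmetic discussed in this message can be sketched as follows. The helper names are hypothetical and only illustrate the rounding: a filter stored in long blocks occupies a whole number of 8-byte words, which can be more than the ceiling of numberOfBits / 8.

```java
/** Hypothetical illustration only: buffer sizes for a filter of a given number of bits. */
public final class BitBufferSize {

    /** Smallest number of bytes that holds numberOfBits bits: ceil(bits / 8). */
    public static int bytesFor(int numberOfBits) {
        return numberOfBits / 8 + (numberOfBits % 8 == 0 ? 0 : 1);
    }

    /** Smallest number of longs that holds numberOfBits bits: ceil(bits / 64). */
    public static int longsFor(int numberOfBits) {
        return numberOfBits / 64 + (numberOfBits % 64 == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        int bits = 72;
        System.out.println(bytesFor(bits));               // 9
        System.out.println(longsFor(bits) * Long.BYTES);  // 16
    }
}
```

With 72 bits the long-based buffer is 16 bytes while only 9 bytes carry data, so code that copies the buffer needs the Shape to know where the meaningful bytes end.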

Re: [BloomFilters] changes to BloomFilter

2020-03-18 Thread Claude Warren
bit indexes (via OfInt) and there are ways to reconstruct a BloomFilter if you were to write that out and read it back. On Wed, Mar 18, 2020 at 4:07 PM Alex Herbert wrote: > > > > On 18 Mar 2020, at 14:39, Claude Warren wrote: > > > >>> Shape Discussion: > >

Re: [BloomFilters] changes to BloomFilter

2020-03-22 Thread Claude Warren
move the BloomFilter > API forward that consolidates the current functionality but makes it > simpler to use for the common case. > > > On 18 Mar 2020, at 17:12, Claude Warren wrote: > > > > bf.getBits() * Long.BYTES may be as long as Math.Ceil( > > Shape.getNumberOf

Re: [BloomFilters] changes to BloomFilter

2020-04-22 Thread Claude Warren
Bloom filters should not use generics. That has been my stated opinion. They are not like other collections in that you don't get out what you put in. They are collections of hashes so the idea that generics should be used to somehow define what goes in is misleading. If commons-collections is s

Re: what became of beanshell in Apache commons?

2020-04-25 Thread Claude Warren
I don't know for certain but here is what I pieced together. The project currently resides at: https://github.com/beanshell/beanshell The documentation there says: BeanShell was proposed as an incubator project > to move to Apache > Softwar

Re: [BloomFilters] changes to BloomFilter

2020-05-10 Thread Claude Warren
I keep wondering if Bloom filters belong in Collections. They are not a collection in the standard sense of the word. Would it make more sense to split it out as a new Commons project? How does one even go about that? On Wed, Apr 22, 2020 at 5:37 PM Alex Herbert wrote: > On Wed, 22 Apr 2020 at

Re: [all] Thoughts on build system maven -> gradle??

2020-07-17 Thread Claude Warren
-1 from me. I have a philosophical objection. Much like HTTP's mod_rewrite[1], Gradle's greatest strength is that it allows the developer to do so much in so many ways. But its greatest weakness is that it allows the developer to do so much in so many ways. My experience with Ant and Gradle is t

[Commons-Collections] remove bloom filters?

2021-08-30 Thread Claude Warren
Greetings, I see that the Bloom filter implementation has not been released. It would be in V4.5. I have not had time to come back and clean it up as it should be to make it simpler and faster. I am concerned that there may be an upcoming release of 4.5 which would lock the implementation and
