Gary and Alex, Any thoughts on this?
Claude On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote: > Good suggestions. > > short-circuit. We could make this distinction by including it in the name: >> forEachUntil(Predicate ...), forEachUnless, ... > > > We need the unit name in the method name. All Bloom filters implement > IndexProducer and BitmapProducer and since they use Predicate method > parameters they will conflict. > > > I have opened a ticket [1] with the list of tasks, which I think is now: > > - Be clear that producers are like interruptible iterators with > predicate tests acting as a switch to short-circuit the iteration. > - Rename classes: > - CellConsumer to CellPredicate (?) > - Rename BitMap to BitMaps. > - Rename methods: > - Producer forEachX() to forEachUntil() > - The semantic nomenclature: > - Bitmaps are arrays of bits not a BitMaps object. > - Indexes are ints and not an instance of a Collection object. > - Cells are pairs of ints representing an index and a value. They > are not Pair<> objects. > - Producers iterate over collections of the object (Bitmap, Index, > Cell) applying a predicate to do work and stop the iteration early if > necessary. They are carriers/transporters of Bloom filter enabled bits. > They allow us to query the contents of the Bloom filter in an > implementation agnostic way. > > > In thinking about the term Producer, other terms could be used > Interrogator (sounds like you can add a query), Extractor might work. But > it has also come to mind that there is a "compute" series of methods in the > ConcurrentMap class. Perhaps the term we want is not "forEach", but > "process". The current form of usage is something like: > > IndexProducer ip = .... > ip.forEachIndex(idx -> someIntPredicate) > > We could change the name from XProducer to XProcessor, or XExtractor; and > the method to processXs. So the above code would look like: > > IndexExtractor ix = .... > ix.processIndexs(idx -> someIntPredicate) > > another example > > BitMapExtractor bx = ..... > bx.processBitMaps(bitmap -> someBitMapPredicate) > > Claude > > [1] https://issues.apache.org/jira/browse/COLLECTIONS-854 > > > On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org> > wrote: > >> >> >> On 2024/04/30 14:33:47 Alex Herbert wrote: >> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org> >> wrote: >> > >> > > Hi Claude, >> > > >> > > Thank you for the detailed reply :-) A few comments below. >> > > >> > > On 2024/04/30 06:29:38 Claude Warren wrote: >> > > > I will see if I can clarify the javadocs and make things clearer. >> > > > >> > > > What I think I specifically heard is: >> > > > >> > > > - Be clear that producers are fast fail iterators with predicate >> > > tests. >> > > > - Rename CellConsumer to CellPredicate (?) >> > > >> > > Agreed (as suggested by Albert) >> > > >> > > > - The semantic nomenclature: >> > > > - Bitmaps are arrays of bits not a BitMap object. >> > > > - Indexes are ints and not an instance of a Collection object. >> > > > - Cells are pairs of ints representing an index and a value. >> They >> > > > are not Pair<> objects. >> > > > - Producers iterate over collections of the object (Bitmap, >> Index, >> > > > Cell) applying a predicate to do work and stop the iteration >> early >> > > if >> > > > necessary. They are carriers/transporters of Bloom filter >> enabled >> > > bits. >> > > > They allow us to query the contents of the Bloom filter in an >> > > > implementation agnostic way. >> > > >> > > As you say naming is hard. The above is a great example and a good >> > > exercise I've gone through at work and in other FOSS projects: >> "Producers >> > > iterate over collections of the object...". In general when I see or >> write >> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run", >> you >> > > get the idea ;-) I know that either the class (or method) name is bad >> or >> > > the Javadoc/documentation is bad; not _wrong_, just bad in the sense >> that >> > > it's confusing (to me). >> > > >> > > I am not advocating for a specific change ATM but I want to discuss >> the >> > > option because it is possible the current name is not as good as it >> could >> > > be. It could end up as an acceptable compromise if we cannot use more >> Java >> > > friendly terms though. >> > > >> > > Whenever I see a class that implements a "forEach"-kind of method, I >> think >> > > "Iterable". >> > > >> > >> > Here we should think "Collection", or generally more than 1. In the Java >> > sense an Iterable is something you can walk through to the >> > end, possibly removing elements as you go using the Iterator interface. >> We >> > would not require supporting removal, and we want to control a >> > short-circuit. We could make this distinction by including it in the >> name: >> > forEachUntil(Predicate ...), forEachUnless, ... >> >> I really like the idea of have the method name reflect the short-circuit >> aspect of the operation! >> >> We do not have to invent a new method name though, Stream uses >> "takeWhile(Predicate)" in Java 9: >> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate- >> >> The question becomes whether this name make the class too close to a >> Stream and introduces some kind of confusion. If that's the case, then >> forEachUntil() is good. >> >> > >> > >> > > >> > > Note the difference with "Iterator", and I had to lookup the >> difference >> > > since the former implements "forEach" and the later >> "forEachRemaining"! >> > > "Iterable" is also a factory of "Iterator"s. >> > > >> > > Should the Producers ever be implementations of Iterable or Iterator? >> > > Right now, the answer is no because of the short-circuit aspect of >> using a >> > > predicate. I'm not using the term fail-fast here because I don't >> think of >> > > the iteration being in error (please tell me if I'm wrong). >> > > >> > > If not iterable, then we should not use that name as part of the class >> > > name. Generally, the short-circuit aspect of Producers do not make a >> bad >> > > candidates for implementations of Iterable since it can throw >> (unchecked) >> > > exceptions. Different for call sites granted, but I'm just mentioning >> it >> > > for fun. >> > > >> > >> > I already mentioned throwing runtime exceptions for the short-circuit >> > functionality, and that it was ruled out on the basis of performance >> (given >> > a lot of short-circuiting is expected) and convenience for the caller. I >> > don't think we should go there. Design the API for the intended purpose, >> > and not push it into a box that is easily recognisable. >> >> Sounds good. >> >> > >> > >> > > >> > > So maybe there's nothing to do. I just want to be clear about it. For >> > > example, I think of "factory" and "producer" as synonyms but in this >> case, >> > > this is not a traditional application of the factory pattern. >> > > >> > > As an aside I can see that Producers would not be Streams out of the >> box >> > > because Stream#filter(Predicate) filters but does not short-circuit >> > > iteration. While Stream#findAny() and #findFirst() don't fit the >> > > short-circuit bill, we could implement a #findLast() in a Stream >> > > implementation. What I do not know is if Streams otherwise fit the >> bill of >> > > Bloom filters. >> > > >> > >> > In the case of small loops with few instructions per loop, the overhead >> of >> > Streams is significant. Unfortunately we don't have any performance >> tests >> > for this library, but I wouldn't change to Streams without knowing it >> does >> > not impact performance. Performance is a key feature of Bloom filters. >> > Otherwise you can achieve some of their functionality with conventional >> > collections. >> >> OK. >> >> > >> > >> > > >> > > > >> > > > Does that basically cover the confusion? If there are better >> terms, >> > > let's >> > > > hash them out now before I update the javadocs. >> > > > >> > > > As an aside, Cells and Bitmaps are referenced in the literature. >> For the >> > > > most part the rest is made up out of whole cloth. So we could >> change >> > > > "Producer" to something else but we would need a good name. >> > > >> > > We have a class called BitMap and methods that use "BitMap" in the >> same >> > > but I think I am more comfortable with the term reuse now. >> > > >> > > The question that remains is must it be public? Since the Javadoc >> mentions >> > > it is about indices and bit positions, could all these methods be >> moved to >> > > the package-private IndexUtils? My concern is to reduce the public and >> > > protected API surface we will have to support and keep for the >> lifetime of >> > > the 4.x code base. >> > > >> > >> > It was originally package-private and Claude requested it be made public >> > for general re-use. >> > >> > We could use the name BitMaps, in keeping with the pluralisation of >> classes >> > seen in the JDK when working with elements of a given type: Executors, >> > Collections, etc. >> >> Agreed! I really like the plural convention for these static utility >> classes. >> >> Gary >> >> > >> > Alex >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren > -- LinkedIn: http://www.linkedin.com/in/claudewarren