A couple of messages back, I proposed a some documentation updates and a few names changes:
In thinking about the term Producer, other terms could be used Interrogator > (sounds like you can add a query), Extractor might work. But it has also > come to mind that there is a "compute" series of methods in the > ConcurrentMap class. Perhaps the term we want is not "forEach", but > "process". The current form of usage is something like: > IndexProducer ip = .... > ip.forEachIndex(idx -> someIntPredicate) > We could change the name from XProducer to XProcessor, or XExtractor; and > the method to processXs. So the above code would look like: > IndexExtractor ix = .... > ix.processIndexs(idx -> someIntPredicate) > another example > BitMapExtractor bx = ..... > bx.processBitMaps(bitmap -> someBitMapPredicate) So unless there is an objection I will open a ticket to - rename XProducer to XExtractor - rename the XExtractor forEachX method to processX method that takes a Predicate<X> argument - Add the documentation described previously. Are we agreed? Claude On Fri, May 3, 2024 at 5:49 PM Gary Gregory <garydgreg...@gmail.com> wrote: > LGTM. Maybe the current PR (LGTM) should be merged first, Alex, how does > that PR look to you? > > Gary > > On Fri, May 3, 2024, 11:44 AM Claude Warren <cla...@xenei.com> wrote: > >> Gary and Alex, >> >> Any thoughts on this? >> >> Claude >> >> On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote: >> >>> Good suggestions. >>> >>> short-circuit. We could make this distinction by including it in the >>>> name: >>>> forEachUntil(Predicate ...), forEachUnless, ... >>> >>> >>> We need the unit name in the method name. All Bloom filters implement >>> IndexProducer and BitmapProducer and since they use Predicate method >>> parameters they will conflict. >>> >>> >>> I have opened a ticket [1] with the list of tasks, which I think is now: >>> >>> - Be clear that producers are like interruptible iterators with >>> predicate tests acting as a switch to short-circuit the iteration. >>> - Rename classes: >>> - CellConsumer to CellPredicate (?) >>> - Rename BitMap to BitMaps. >>> - Rename methods: >>> - Producer forEachX() to forEachUntil() >>> - The semantic nomenclature: >>> - Bitmaps are arrays of bits not a BitMaps object. >>> - Indexes are ints and not an instance of a Collection object. >>> - Cells are pairs of ints representing an index and a value. >>> They are not Pair<> objects. >>> - Producers iterate over collections of the object (Bitmap, >>> Index, Cell) applying a predicate to do work and stop the iteration >>> early >>> if necessary. They are carriers/transporters of Bloom filter enabled >>> bits. They allow us to query the contents of the Bloom filter in an >>> implementation agnostic way. >>> >>> >>> In thinking about the term Producer, other terms could be used >>> Interrogator (sounds like you can add a query), Extractor might work. But >>> it has also come to mind that there is a "compute" series of methods in the >>> ConcurrentMap class. Perhaps the term we want is not "forEach", but >>> "process". The current form of usage is something like: >>> >>> IndexProducer ip = .... >>> ip.forEachIndex(idx -> someIntPredicate) >>> >>> We could change the name from XProducer to XProcessor, or XExtractor; >>> and the method to processXs. So the above code would look like: >>> >>> IndexExtractor ix = .... >>> ix.processIndexs(idx -> someIntPredicate) >>> >>> another example >>> >>> BitMapExtractor bx = ..... >>> bx.processBitMaps(bitmap -> someBitMapPredicate) >>> >>> Claude >>> >>> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854 >>> >>> >>> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org> >>> wrote: >>> >>>> >>>> >>>> On 2024/04/30 14:33:47 Alex Herbert wrote: >>>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org> >>>> wrote: >>>> > >>>> > > Hi Claude, >>>> > > >>>> > > Thank you for the detailed reply :-) A few comments below. >>>> > > >>>> > > On 2024/04/30 06:29:38 Claude Warren wrote: >>>> > > > I will see if I can clarify the javadocs and make things clearer. >>>> > > > >>>> > > > What I think I specifically heard is: >>>> > > > >>>> > > > - Be clear that producers are fast fail iterators with >>>> predicate >>>> > > tests. >>>> > > > - Rename CellConsumer to CellPredicate (?) >>>> > > >>>> > > Agreed (as suggested by Albert) >>>> > > >>>> > > > - The semantic nomenclature: >>>> > > > - Bitmaps are arrays of bits not a BitMap object. >>>> > > > - Indexes are ints and not an instance of a Collection >>>> object. >>>> > > > - Cells are pairs of ints representing an index and a >>>> value. They >>>> > > > are not Pair<> objects. >>>> > > > - Producers iterate over collections of the object (Bitmap, >>>> Index, >>>> > > > Cell) applying a predicate to do work and stop the >>>> iteration early >>>> > > if >>>> > > > necessary. They are carriers/transporters of Bloom filter >>>> enabled >>>> > > bits. >>>> > > > They allow us to query the contents of the Bloom filter in >>>> an >>>> > > > implementation agnostic way. >>>> > > >>>> > > As you say naming is hard. The above is a great example and a good >>>> > > exercise I've gone through at work and in other FOSS projects: >>>> "Producers >>>> > > iterate over collections of the object...". In general when I see >>>> or write >>>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers >>>> run", you >>>> > > get the idea ;-) I know that either the class (or method) name is >>>> bad or >>>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the >>>> sense that >>>> > > it's confusing (to me). >>>> > > >>>> > > I am not advocating for a specific change ATM but I want to discuss >>>> the >>>> > > option because it is possible the current name is not as good as it >>>> could >>>> > > be. It could end up as an acceptable compromise if we cannot use >>>> more Java >>>> > > friendly terms though. >>>> > > >>>> > > Whenever I see a class that implements a "forEach"-kind of method, >>>> I think >>>> > > "Iterable". >>>> > > >>>> > >>>> > Here we should think "Collection", or generally more than 1. In the >>>> Java >>>> > sense an Iterable is something you can walk through to the >>>> > end, possibly removing elements as you go using the Iterator >>>> interface. We >>>> > would not require supporting removal, and we want to control a >>>> > short-circuit. We could make this distinction by including it in the >>>> name: >>>> > forEachUntil(Predicate ...), forEachUnless, ... >>>> >>>> I really like the idea of have the method name reflect the >>>> short-circuit aspect of the operation! >>>> >>>> We do not have to invent a new method name though, Stream uses >>>> "takeWhile(Predicate)" in Java 9: >>>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate- >>>> >>>> The question becomes whether this name make the class too close to a >>>> Stream and introduces some kind of confusion. If that's the case, then >>>> forEachUntil() is good. >>>> >>>> > >>>> > >>>> > > >>>> > > Note the difference with "Iterator", and I had to lookup the >>>> difference >>>> > > since the former implements "forEach" and the later >>>> "forEachRemaining"! >>>> > > "Iterable" is also a factory of "Iterator"s. >>>> > > >>>> > > Should the Producers ever be implementations of Iterable or >>>> Iterator? >>>> > > Right now, the answer is no because of the short-circuit aspect of >>>> using a >>>> > > predicate. I'm not using the term fail-fast here because I don't >>>> think of >>>> > > the iteration being in error (please tell me if I'm wrong). >>>> > > >>>> > > If not iterable, then we should not use that name as part of the >>>> class >>>> > > name. Generally, the short-circuit aspect of Producers do not make >>>> a bad >>>> > > candidates for implementations of Iterable since it can throw >>>> (unchecked) >>>> > > exceptions. Different for call sites granted, but I'm just >>>> mentioning it >>>> > > for fun. >>>> > > >>>> > >>>> > I already mentioned throwing runtime exceptions for the short-circuit >>>> > functionality, and that it was ruled out on the basis of performance >>>> (given >>>> > a lot of short-circuiting is expected) and convenience for the >>>> caller. I >>>> > don't think we should go there. Design the API for the intended >>>> purpose, >>>> > and not push it into a box that is easily recognisable. >>>> >>>> Sounds good. >>>> >>>> > >>>> > >>>> > > >>>> > > So maybe there's nothing to do. I just want to be clear about it. >>>> For >>>> > > example, I think of "factory" and "producer" as synonyms but in >>>> this case, >>>> > > this is not a traditional application of the factory pattern. >>>> > > >>>> > > As an aside I can see that Producers would not be Streams out of >>>> the box >>>> > > because Stream#filter(Predicate) filters but does not short-circuit >>>> > > iteration. While Stream#findAny() and #findFirst() don't fit the >>>> > > short-circuit bill, we could implement a #findLast() in a Stream >>>> > > implementation. What I do not know is if Streams otherwise fit the >>>> bill of >>>> > > Bloom filters. >>>> > > >>>> > >>>> > In the case of small loops with few instructions per loop, the >>>> overhead of >>>> > Streams is significant. Unfortunately we don't have any performance >>>> tests >>>> > for this library, but I wouldn't change to Streams without knowing it >>>> does >>>> > not impact performance. Performance is a key feature of Bloom filters. >>>> > Otherwise you can achieve some of their functionality with >>>> conventional >>>> > collections. >>>> >>>> OK. >>>> >>>> > >>>> > >>>> > > >>>> > > > >>>> > > > Does that basically cover the confusion? If there are better >>>> terms, >>>> > > let's >>>> > > > hash them out now before I update the javadocs. >>>> > > > >>>> > > > As an aside, Cells and Bitmaps are referenced in the literature. >>>> For the >>>> > > > most part the rest is made up out of whole cloth. So we could >>>> change >>>> > > > "Producer" to something else but we would need a good name. >>>> > > >>>> > > We have a class called BitMap and methods that use "BitMap" in the >>>> same >>>> > > but I think I am more comfortable with the term reuse now. >>>> > > >>>> > > The question that remains is must it be public? Since the Javadoc >>>> mentions >>>> > > it is about indices and bit positions, could all these methods be >>>> moved to >>>> > > the package-private IndexUtils? My concern is to reduce the public >>>> and >>>> > > protected API surface we will have to support and keep for the >>>> lifetime of >>>> > > the 4.x code base. >>>> > > >>>> > >>>> > It was originally package-private and Claude requested it be made >>>> public >>>> > for general re-use. >>>> > >>>> > We could use the name BitMaps, in keeping with the pluralisation of >>>> classes >>>> > seen in the JDK when working with elements of a given type: Executors, >>>> > Collections, etc. >>>> >>>> Agreed! I really like the plural convention for these static utility >>>> classes. >>>> >>>> Gary >>>> >>>> > >>>> > Alex >>>> > >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> >>> >>> -- >>> LinkedIn: http://www.linkedin.com/in/claudewarren >>> >> >> >> -- >> LinkedIn: http://www.linkedin.com/in/claudewarren >> > -- LinkedIn: http://www.linkedin.com/in/claudewarren