I have updated Collections-854 [1] to reflect the naming that we have been talking about and will start on the refactoring soon. Please start watching that ticket.
Claude [1] https://issues.apache.org/jira/browse/COLLECTIONS-854 On Mon, May 13, 2024 at 12:33 PM Claude Warren <cla...@xenei.com> wrote: > A couple of messages back, I proposed a some documentation updates and a > few names changes: > > In thinking about the term Producer, other terms could be used >> Interrogator (sounds like you can add a query), Extractor might work. But >> it has also come to mind that there is a "compute" series of methods in the >> ConcurrentMap class. Perhaps the term we want is not "forEach", but >> "process". The current form of usage is something like: >> IndexProducer ip = .... >> ip.forEachIndex(idx -> someIntPredicate) >> We could change the name from XProducer to XProcessor, or XExtractor; and >> the method to processXs. So the above code would look like: >> IndexExtractor ix = .... >> ix.processIndexs(idx -> someIntPredicate) >> another example >> BitMapExtractor bx = ..... >> bx.processBitMaps(bitmap -> someBitMapPredicate) > > > So unless there is an objection I will open a ticket to > > - rename XProducer to XExtractor > - rename the XExtractor forEachX method to processX method that takes > a Predicate<X> argument > - Add the documentation described previously. > > Are we agreed? > Claude > > On Fri, May 3, 2024 at 5:49 PM Gary Gregory <garydgreg...@gmail.com> > wrote: > >> LGTM. Maybe the current PR (LGTM) should be merged first, Alex, how does >> that PR look to you? >> >> Gary >> >> On Fri, May 3, 2024, 11:44 AM Claude Warren <cla...@xenei.com> wrote: >> >>> Gary and Alex, >>> >>> Any thoughts on this? >>> >>> Claude >>> >>> On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote: >>> >>>> Good suggestions. >>>> >>>> short-circuit. We could make this distinction by including it in the >>>>> name: >>>>> forEachUntil(Predicate ...), forEachUnless, ... >>>> >>>> >>>> We need the unit name in the method name. All Bloom filters implement >>>> IndexProducer and BitmapProducer and since they use Predicate method >>>> parameters they will conflict. >>>> >>>> >>>> I have opened a ticket [1] with the list of tasks, which I think is now: >>>> >>>> - Be clear that producers are like interruptible iterators with >>>> predicate tests acting as a switch to short-circuit the iteration. >>>> - Rename classes: >>>> - CellConsumer to CellPredicate (?) >>>> - Rename BitMap to BitMaps. >>>> - Rename methods: >>>> - Producer forEachX() to forEachUntil() >>>> - The semantic nomenclature: >>>> - Bitmaps are arrays of bits not a BitMaps object. >>>> - Indexes are ints and not an instance of a Collection object. >>>> - Cells are pairs of ints representing an index and a value. >>>> They are not Pair<> objects. >>>> - Producers iterate over collections of the object (Bitmap, >>>> Index, Cell) applying a predicate to do work and stop the iteration >>>> early >>>> if necessary. They are carriers/transporters of Bloom filter enabled >>>> bits. They allow us to query the contents of the Bloom filter in an >>>> implementation agnostic way. >>>> >>>> >>>> In thinking about the term Producer, other terms could be used >>>> Interrogator (sounds like you can add a query), Extractor might work. But >>>> it has also come to mind that there is a "compute" series of methods in the >>>> ConcurrentMap class. Perhaps the term we want is not "forEach", but >>>> "process". The current form of usage is something like: >>>> >>>> IndexProducer ip = .... >>>> ip.forEachIndex(idx -> someIntPredicate) >>>> >>>> We could change the name from XProducer to XProcessor, or XExtractor; >>>> and the method to processXs. So the above code would look like: >>>> >>>> IndexExtractor ix = .... >>>> ix.processIndexs(idx -> someIntPredicate) >>>> >>>> another example >>>> >>>> BitMapExtractor bx = ..... >>>> bx.processBitMaps(bitmap -> someBitMapPredicate) >>>> >>>> Claude >>>> >>>> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854 >>>> >>>> >>>> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org> >>>> wrote: >>>> >>>>> >>>>> >>>>> On 2024/04/30 14:33:47 Alex Herbert wrote: >>>>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org> >>>>> wrote: >>>>> > >>>>> > > Hi Claude, >>>>> > > >>>>> > > Thank you for the detailed reply :-) A few comments below. >>>>> > > >>>>> > > On 2024/04/30 06:29:38 Claude Warren wrote: >>>>> > > > I will see if I can clarify the javadocs and make things clearer. >>>>> > > > >>>>> > > > What I think I specifically heard is: >>>>> > > > >>>>> > > > - Be clear that producers are fast fail iterators with >>>>> predicate >>>>> > > tests. >>>>> > > > - Rename CellConsumer to CellPredicate (?) >>>>> > > >>>>> > > Agreed (as suggested by Albert) >>>>> > > >>>>> > > > - The semantic nomenclature: >>>>> > > > - Bitmaps are arrays of bits not a BitMap object. >>>>> > > > - Indexes are ints and not an instance of a Collection >>>>> object. >>>>> > > > - Cells are pairs of ints representing an index and a >>>>> value. They >>>>> > > > are not Pair<> objects. >>>>> > > > - Producers iterate over collections of the object >>>>> (Bitmap, Index, >>>>> > > > Cell) applying a predicate to do work and stop the >>>>> iteration early >>>>> > > if >>>>> > > > necessary. They are carriers/transporters of Bloom filter >>>>> enabled >>>>> > > bits. >>>>> > > > They allow us to query the contents of the Bloom filter in >>>>> an >>>>> > > > implementation agnostic way. >>>>> > > >>>>> > > As you say naming is hard. The above is a great example and a good >>>>> > > exercise I've gone through at work and in other FOSS projects: >>>>> "Producers >>>>> > > iterate over collections of the object...". In general when I see >>>>> or write >>>>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers >>>>> run", you >>>>> > > get the idea ;-) I know that either the class (or method) name is >>>>> bad or >>>>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the >>>>> sense that >>>>> > > it's confusing (to me). >>>>> > > >>>>> > > I am not advocating for a specific change ATM but I want to >>>>> discuss the >>>>> > > option because it is possible the current name is not as good as >>>>> it could >>>>> > > be. It could end up as an acceptable compromise if we cannot use >>>>> more Java >>>>> > > friendly terms though. >>>>> > > >>>>> > > Whenever I see a class that implements a "forEach"-kind of method, >>>>> I think >>>>> > > "Iterable". >>>>> > > >>>>> > >>>>> > Here we should think "Collection", or generally more than 1. In the >>>>> Java >>>>> > sense an Iterable is something you can walk through to the >>>>> > end, possibly removing elements as you go using the Iterator >>>>> interface. We >>>>> > would not require supporting removal, and we want to control a >>>>> > short-circuit. We could make this distinction by including it in the >>>>> name: >>>>> > forEachUntil(Predicate ...), forEachUnless, ... >>>>> >>>>> I really like the idea of have the method name reflect the >>>>> short-circuit aspect of the operation! >>>>> >>>>> We do not have to invent a new method name though, Stream uses >>>>> "takeWhile(Predicate)" in Java 9: >>>>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate- >>>>> >>>>> The question becomes whether this name make the class too close to a >>>>> Stream and introduces some kind of confusion. If that's the case, then >>>>> forEachUntil() is good. >>>>> >>>>> > >>>>> > >>>>> > > >>>>> > > Note the difference with "Iterator", and I had to lookup the >>>>> difference >>>>> > > since the former implements "forEach" and the later >>>>> "forEachRemaining"! >>>>> > > "Iterable" is also a factory of "Iterator"s. >>>>> > > >>>>> > > Should the Producers ever be implementations of Iterable or >>>>> Iterator? >>>>> > > Right now, the answer is no because of the short-circuit aspect of >>>>> using a >>>>> > > predicate. I'm not using the term fail-fast here because I don't >>>>> think of >>>>> > > the iteration being in error (please tell me if I'm wrong). >>>>> > > >>>>> > > If not iterable, then we should not use that name as part of the >>>>> class >>>>> > > name. Generally, the short-circuit aspect of Producers do not make >>>>> a bad >>>>> > > candidates for implementations of Iterable since it can throw >>>>> (unchecked) >>>>> > > exceptions. Different for call sites granted, but I'm just >>>>> mentioning it >>>>> > > for fun. >>>>> > > >>>>> > >>>>> > I already mentioned throwing runtime exceptions for the short-circuit >>>>> > functionality, and that it was ruled out on the basis of performance >>>>> (given >>>>> > a lot of short-circuiting is expected) and convenience for the >>>>> caller. I >>>>> > don't think we should go there. Design the API for the intended >>>>> purpose, >>>>> > and not push it into a box that is easily recognisable. >>>>> >>>>> Sounds good. >>>>> >>>>> > >>>>> > >>>>> > > >>>>> > > So maybe there's nothing to do. I just want to be clear about it. >>>>> For >>>>> > > example, I think of "factory" and "producer" as synonyms but in >>>>> this case, >>>>> > > this is not a traditional application of the factory pattern. >>>>> > > >>>>> > > As an aside I can see that Producers would not be Streams out of >>>>> the box >>>>> > > because Stream#filter(Predicate) filters but does not short-circuit >>>>> > > iteration. While Stream#findAny() and #findFirst() don't fit the >>>>> > > short-circuit bill, we could implement a #findLast() in a Stream >>>>> > > implementation. What I do not know is if Streams otherwise fit the >>>>> bill of >>>>> > > Bloom filters. >>>>> > > >>>>> > >>>>> > In the case of small loops with few instructions per loop, the >>>>> overhead of >>>>> > Streams is significant. Unfortunately we don't have any performance >>>>> tests >>>>> > for this library, but I wouldn't change to Streams without knowing >>>>> it does >>>>> > not impact performance. Performance is a key feature of Bloom >>>>> filters. >>>>> > Otherwise you can achieve some of their functionality with >>>>> conventional >>>>> > collections. >>>>> >>>>> OK. >>>>> >>>>> > >>>>> > >>>>> > > >>>>> > > > >>>>> > > > Does that basically cover the confusion? If there are better >>>>> terms, >>>>> > > let's >>>>> > > > hash them out now before I update the javadocs. >>>>> > > > >>>>> > > > As an aside, Cells and Bitmaps are referenced in the >>>>> literature. For the >>>>> > > > most part the rest is made up out of whole cloth. So we could >>>>> change >>>>> > > > "Producer" to something else but we would need a good name. >>>>> > > >>>>> > > We have a class called BitMap and methods that use "BitMap" in the >>>>> same >>>>> > > but I think I am more comfortable with the term reuse now. >>>>> > > >>>>> > > The question that remains is must it be public? Since the Javadoc >>>>> mentions >>>>> > > it is about indices and bit positions, could all these methods be >>>>> moved to >>>>> > > the package-private IndexUtils? My concern is to reduce the public >>>>> and >>>>> > > protected API surface we will have to support and keep for the >>>>> lifetime of >>>>> > > the 4.x code base. >>>>> > > >>>>> > >>>>> > It was originally package-private and Claude requested it be made >>>>> public >>>>> > for general re-use. >>>>> > >>>>> > We could use the name BitMaps, in keeping with the pluralisation of >>>>> classes >>>>> > seen in the JDK when working with elements of a given type: >>>>> Executors, >>>>> > Collections, etc. >>>>> >>>>> Agreed! I really like the plural convention for these static utility >>>>> classes. >>>>> >>>>> Gary >>>>> >>>>> > >>>>> > Alex >>>>> > >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>>> >>>> >>>> -- >>>> LinkedIn: http://www.linkedin.com/in/claudewarren >>>> >>> >>> >>> -- >>> LinkedIn: http://www.linkedin.com/in/claudewarren >>> >> > > -- > LinkedIn: http://www.linkedin.com/in/claudewarren > -- LinkedIn: http://www.linkedin.com/in/claudewarren