A couple of messages back, I proposed a some documentation updates and a
few names changes:

In thinking about the term Producer, other terms could be used Interrogator
> (sounds like you can add a query), Extractor might work.  But it has also
> come to mind that there is a "compute" series of methods in the
> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
> "process".  The current form of usage is something like:
> IndexProducer ip = ....
> ip.forEachIndex(idx -> someIntPredicate)
> We could change the name from XProducer to XProcessor, or XExtractor; and
> the method to processXs.  So the above code would look like:
> IndexExtractor ix = ....
> ix.processIndexs(idx -> someIntPredicate)
> another example
> BitMapExtractor bx = .....
> bx.processBitMaps(bitmap -> someBitMapPredicate)


So unless there is an objection I will open a ticket to

   - rename XProducer to XExtractor
   - rename the XExtractor forEachX method to processX method that takes a
   Predicate<X> argument
   - Add the documentation described previously.

Are we agreed?
Claude

On Fri, May 3, 2024 at 5:49 PM Gary Gregory <garydgreg...@gmail.com> wrote:

> LGTM. Maybe the current PR (LGTM) should be merged first, Alex, how does
> that PR look to you?
>
> Gary
>
> On Fri, May 3, 2024, 11:44 AM Claude Warren <cla...@xenei.com> wrote:
>
>> Gary and Alex,
>>
>> Any thoughts on this?
>>
>> Claude
>>
>> On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote:
>>
>>> Good suggestions.
>>>
>>> short-circuit. We could make this distinction by including it in the
>>>> name:
>>>> forEachUntil(Predicate ...), forEachUnless, ...
>>>
>>>
>>> We need the unit name in the method name.  All Bloom filters implement
>>> IndexProducer and BitmapProducer and since they use Predicate method
>>> parameters they will conflict.
>>>
>>>
>>> I have opened a ticket [1] with the list of tasks, which I think is now:
>>>
>>>    - Be clear that producers are like interruptible iterators with
>>>    predicate tests acting as a switch to short-circuit the iteration.
>>>    - Rename classes:
>>>       - CellConsumer to CellPredicate (?)
>>>       - Rename BitMap to BitMaps.
>>>    - Rename methods:
>>>       - Producer forEachX() to forEachUntil()
>>>       - The semantic nomenclature:
>>>       - Bitmaps are arrays of bits not a BitMaps object.
>>>       - Indexes are ints and not an instance of a Collection object.
>>>       - Cells are pairs of ints representing an index and a value.
>>>       They are not Pair<> objects.
>>>       - Producers iterate over collections of the object (Bitmap,
>>>       Index, Cell) applying a predicate to do work and stop the iteration 
>>> early
>>>       if necessary.  They are carriers/transporters of Bloom filter enabled
>>>       bits.  They allow us to query the contents of the Bloom filter in an
>>>       implementation agnostic way.
>>>
>>>
>>> In thinking about the term Producer, other terms could be used
>>> Interrogator (sounds like you can add a query), Extractor might work.  But
>>> it has also come to mind that there is a "compute" series of methods in the
>>> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
>>> "process".  The current form of usage is something like:
>>>
>>> IndexProducer ip = ....
>>> ip.forEachIndex(idx -> someIntPredicate)
>>>
>>> We could change the name from XProducer to XProcessor, or XExtractor;
>>> and the method to processXs.  So the above code would look like:
>>>
>>> IndexExtractor ix = ....
>>> ix.processIndexs(idx -> someIntPredicate)
>>>
>>> another example
>>>
>>> BitMapExtractor bx = .....
>>> bx.processBitMaps(bitmap -> someBitMapPredicate)
>>>
>>> Claude
>>>
>>> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854
>>>
>>>
>>> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org>
>>> wrote:
>>>
>>>>
>>>>
>>>> On 2024/04/30 14:33:47 Alex Herbert wrote:
>>>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org>
>>>> wrote:
>>>> >
>>>> > > Hi Claude,
>>>> > >
>>>> > > Thank you for the detailed reply :-) A few comments below.
>>>> > >
>>>> > > On 2024/04/30 06:29:38 Claude Warren wrote:
>>>> > > > I will see if I can clarify the javadocs and make things clearer.
>>>> > > >
>>>> > > > What I think I specifically heard is:
>>>> > > >
>>>> > > >    - Be clear that producers are fast fail iterators with
>>>> predicate
>>>> > > tests.
>>>> > > >    - Rename CellConsumer to CellPredicate (?)
>>>> > >
>>>> > > Agreed (as suggested by Albert)
>>>> > >
>>>> > > >    - The semantic nomenclature:
>>>> > > >       - Bitmaps are arrays of bits not a BitMap object.
>>>> > > >       - Indexes are ints and not an instance of a Collection
>>>> object.
>>>> > > >       - Cells are pairs of ints representing an index and a
>>>> value.  They
>>>> > > >       are not Pair<> objects.
>>>> > > >       - Producers iterate over collections of the object (Bitmap,
>>>> Index,
>>>> > > >       Cell) applying a predicate to do work and stop the
>>>> iteration early
>>>> > > if
>>>> > > >       necessary.  They are carriers/transporters of Bloom filter
>>>> enabled
>>>> > > bits.
>>>> > > >       They allow us to query the contents of the Bloom filter in
>>>> an
>>>> > > >       implementation agnostic way.
>>>> > >
>>>> > > As you say naming is hard. The above is a great example and a good
>>>> > > exercise I've gone through at work and in other FOSS projects:
>>>> "Producers
>>>> > > iterate over collections of the object...". In general when I see
>>>> or write
>>>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers
>>>> run", you
>>>> > > get the idea ;-) I know that either the class (or method) name is
>>>> bad or
>>>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the
>>>> sense that
>>>> > > it's confusing (to me).
>>>> > >
>>>> > > I am not advocating for a specific change ATM but I want to discuss
>>>> the
>>>> > > option because it is possible the current name is not as good as it
>>>> could
>>>> > > be. It could end up as an acceptable compromise if we cannot use
>>>> more Java
>>>> > > friendly terms though.
>>>> > >
>>>> > > Whenever I see a class that implements a "forEach"-kind of method,
>>>> I think
>>>> > > "Iterable".
>>>> > >
>>>> >
>>>> > Here we should think "Collection", or generally more than 1. In the
>>>> Java
>>>> > sense an Iterable is something you can walk through to the
>>>> > end, possibly removing elements as you go using the Iterator
>>>> interface. We
>>>> > would not require supporting removal, and we want to control a
>>>> > short-circuit. We could make this distinction by including it in the
>>>> name:
>>>> > forEachUntil(Predicate ...), forEachUnless, ...
>>>>
>>>> I really like the idea of have the method name reflect the
>>>> short-circuit aspect of the operation!
>>>>
>>>> We do not have to invent a new method name though, Stream uses
>>>> "takeWhile(Predicate)" in Java 9:
>>>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-
>>>>
>>>> The question becomes whether this name make the class too close to a
>>>> Stream and introduces some kind of confusion. If that's the case, then
>>>> forEachUntil() is good.
>>>>
>>>> >
>>>> >
>>>> > >
>>>> > > Note the difference with "Iterator", and I had to lookup the
>>>> difference
>>>> > > since the former implements "forEach" and the  later
>>>> "forEachRemaining"!
>>>> > > "Iterable" is also a factory of "Iterator"s.
>>>> > >
>>>> > > Should the Producers ever be implementations of Iterable or
>>>> Iterator?
>>>> > > Right now, the answer is no because of the short-circuit aspect of
>>>> using a
>>>> > > predicate. I'm not using the term fail-fast here because I don't
>>>> think of
>>>> > > the iteration being in error (please tell me if I'm wrong).
>>>> > >
>>>> > > If not iterable, then we should not use that name as part of the
>>>> class
>>>> > > name. Generally, the short-circuit aspect of Producers do not make
>>>> a bad
>>>> > > candidates for implementations of Iterable since it can throw
>>>> (unchecked)
>>>> > > exceptions. Different for call sites granted, but I'm just
>>>> mentioning it
>>>> > > for fun.
>>>> > >
>>>> >
>>>> > I already mentioned throwing runtime exceptions for the short-circuit
>>>> > functionality, and that it was ruled out on the basis of performance
>>>> (given
>>>> > a lot of short-circuiting is expected) and convenience for the
>>>> caller. I
>>>> > don't think we should go there. Design the API for the intended
>>>> purpose,
>>>> > and not push it into a box that is easily recognisable.
>>>>
>>>> Sounds good.
>>>>
>>>> >
>>>> >
>>>> > >
>>>> > > So maybe there's nothing to do. I just want to be clear about it.
>>>> For
>>>> > > example, I think of "factory" and "producer" as synonyms but in
>>>> this case,
>>>> > > this is not a traditional application of the factory pattern.
>>>> > >
>>>> > > As an aside I can see that Producers would not be Streams out of
>>>> the box
>>>> > > because Stream#filter(Predicate) filters but does not short-circuit
>>>> > > iteration. While Stream#findAny() and #findFirst() don't fit the
>>>> > > short-circuit bill, we could implement a #findLast() in a Stream
>>>> > > implementation. What I do not know is if Streams otherwise fit the
>>>> bill of
>>>> > > Bloom filters.
>>>> > >
>>>> >
>>>> > In the case of small loops with few instructions per loop, the
>>>> overhead of
>>>> > Streams is significant. Unfortunately we don't have any performance
>>>> tests
>>>> > for this library, but I wouldn't change to Streams without knowing it
>>>> does
>>>> > not impact performance. Performance is a key feature of Bloom filters.
>>>> > Otherwise you can achieve some of their functionality with
>>>> conventional
>>>> > collections.
>>>>
>>>> OK.
>>>>
>>>> >
>>>> >
>>>> > >
>>>> > > >
>>>> > > > Does that basically cover the confusion?   If there are better
>>>> terms,
>>>> > > let's
>>>> > > > hash them out now before I update the javadocs.
>>>> > > >
>>>> > > > As an aside, Cells and Bitmaps are referenced in the literature.
>>>> For the
>>>> > > > most part the rest is made up out of whole cloth.  So we could
>>>> change
>>>> > > > "Producer" to something else but we would need a good name.
>>>> > >
>>>> > > We have a class called BitMap and methods that use "BitMap" in the
>>>> same
>>>> > > but I think I am more comfortable with the term reuse now.
>>>> > >
>>>> > > The question that remains is must it be public? Since the Javadoc
>>>> mentions
>>>> > > it is about indices and bit positions, could all these methods be
>>>> moved to
>>>> > > the package-private IndexUtils? My concern is to reduce the public
>>>> and
>>>> > > protected API surface we will have to support and keep for the
>>>> lifetime of
>>>> > > the 4.x code base.
>>>> > >
>>>> >
>>>> > It was originally package-private and Claude requested it be made
>>>> public
>>>> > for general re-use.
>>>> >
>>>> > We could use the name BitMaps, in keeping with the pluralisation of
>>>> classes
>>>> > seen in the JDK when working with elements of a given type: Executors,
>>>> > Collections, etc.
>>>>
>>>> Agreed! I really like the plural convention for these static utility
>>>> classes.
>>>>
>>>> Gary
>>>>
>>>> >
>>>> > Alex
>>>> >
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>>
>>>>
>>>
>>> --
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>
>>
>> --
>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>
>

-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to