I have updated Collections-854 [1] to reflect the naming that we have been
talking about and will start on the refactoring soon.  Please start
watching that ticket.

Claude

[1]  https://issues.apache.org/jira/browse/COLLECTIONS-854


On Mon, May 13, 2024 at 12:33 PM Claude Warren <cla...@xenei.com> wrote:

> A couple of messages back, I proposed a some documentation updates and a
> few names changes:
>
> In thinking about the term Producer, other terms could be used
>> Interrogator (sounds like you can add a query), Extractor might work.  But
>> it has also come to mind that there is a "compute" series of methods in the
>> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
>> "process".  The current form of usage is something like:
>> IndexProducer ip = ....
>> ip.forEachIndex(idx -> someIntPredicate)
>> We could change the name from XProducer to XProcessor, or XExtractor; and
>> the method to processXs.  So the above code would look like:
>> IndexExtractor ix = ....
>> ix.processIndexs(idx -> someIntPredicate)
>> another example
>> BitMapExtractor bx = .....
>> bx.processBitMaps(bitmap -> someBitMapPredicate)
>
>
> So unless there is an objection I will open a ticket to
>
>    - rename XProducer to XExtractor
>    - rename the XExtractor forEachX method to processX method that takes
>    a Predicate<X> argument
>    - Add the documentation described previously.
>
> Are we agreed?
> Claude
>
> On Fri, May 3, 2024 at 5:49 PM Gary Gregory <garydgreg...@gmail.com>
> wrote:
>
>> LGTM. Maybe the current PR (LGTM) should be merged first, Alex, how does
>> that PR look to you?
>>
>> Gary
>>
>> On Fri, May 3, 2024, 11:44 AM Claude Warren <cla...@xenei.com> wrote:
>>
>>> Gary and Alex,
>>>
>>> Any thoughts on this?
>>>
>>> Claude
>>>
>>> On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote:
>>>
>>>> Good suggestions.
>>>>
>>>> short-circuit. We could make this distinction by including it in the
>>>>> name:
>>>>> forEachUntil(Predicate ...), forEachUnless, ...
>>>>
>>>>
>>>> We need the unit name in the method name.  All Bloom filters implement
>>>> IndexProducer and BitmapProducer and since they use Predicate method
>>>> parameters they will conflict.
>>>>
>>>>
>>>> I have opened a ticket [1] with the list of tasks, which I think is now:
>>>>
>>>>    - Be clear that producers are like interruptible iterators with
>>>>    predicate tests acting as a switch to short-circuit the iteration.
>>>>    - Rename classes:
>>>>       - CellConsumer to CellPredicate (?)
>>>>       - Rename BitMap to BitMaps.
>>>>    - Rename methods:
>>>>       - Producer forEachX() to forEachUntil()
>>>>       - The semantic nomenclature:
>>>>       - Bitmaps are arrays of bits not a BitMaps object.
>>>>       - Indexes are ints and not an instance of a Collection object.
>>>>       - Cells are pairs of ints representing an index and a value.
>>>>       They are not Pair<> objects.
>>>>       - Producers iterate over collections of the object (Bitmap,
>>>>       Index, Cell) applying a predicate to do work and stop the iteration 
>>>> early
>>>>       if necessary.  They are carriers/transporters of Bloom filter enabled
>>>>       bits.  They allow us to query the contents of the Bloom filter in an
>>>>       implementation agnostic way.
>>>>
>>>>
>>>> In thinking about the term Producer, other terms could be used
>>>> Interrogator (sounds like you can add a query), Extractor might work.  But
>>>> it has also come to mind that there is a "compute" series of methods in the
>>>> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
>>>> "process".  The current form of usage is something like:
>>>>
>>>> IndexProducer ip = ....
>>>> ip.forEachIndex(idx -> someIntPredicate)
>>>>
>>>> We could change the name from XProducer to XProcessor, or XExtractor;
>>>> and the method to processXs.  So the above code would look like:
>>>>
>>>> IndexExtractor ix = ....
>>>> ix.processIndexs(idx -> someIntPredicate)
>>>>
>>>> another example
>>>>
>>>> BitMapExtractor bx = .....
>>>> bx.processBitMaps(bitmap -> someBitMapPredicate)
>>>>
>>>> Claude
>>>>
>>>> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854
>>>>
>>>>
>>>> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 2024/04/30 14:33:47 Alex Herbert wrote:
>>>>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org>
>>>>> wrote:
>>>>> >
>>>>> > > Hi Claude,
>>>>> > >
>>>>> > > Thank you for the detailed reply :-) A few comments below.
>>>>> > >
>>>>> > > On 2024/04/30 06:29:38 Claude Warren wrote:
>>>>> > > > I will see if I can clarify the javadocs and make things clearer.
>>>>> > > >
>>>>> > > > What I think I specifically heard is:
>>>>> > > >
>>>>> > > >    - Be clear that producers are fast fail iterators with
>>>>> predicate
>>>>> > > tests.
>>>>> > > >    - Rename CellConsumer to CellPredicate (?)
>>>>> > >
>>>>> > > Agreed (as suggested by Albert)
>>>>> > >
>>>>> > > >    - The semantic nomenclature:
>>>>> > > >       - Bitmaps are arrays of bits not a BitMap object.
>>>>> > > >       - Indexes are ints and not an instance of a Collection
>>>>> object.
>>>>> > > >       - Cells are pairs of ints representing an index and a
>>>>> value.  They
>>>>> > > >       are not Pair<> objects.
>>>>> > > >       - Producers iterate over collections of the object
>>>>> (Bitmap, Index,
>>>>> > > >       Cell) applying a predicate to do work and stop the
>>>>> iteration early
>>>>> > > if
>>>>> > > >       necessary.  They are carriers/transporters of Bloom filter
>>>>> enabled
>>>>> > > bits.
>>>>> > > >       They allow us to query the contents of the Bloom filter in
>>>>> an
>>>>> > > >       implementation agnostic way.
>>>>> > >
>>>>> > > As you say naming is hard. The above is a great example and a good
>>>>> > > exercise I've gone through at work and in other FOSS projects:
>>>>> "Producers
>>>>> > > iterate over collections of the object...". In general when I see
>>>>> or write
>>>>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers
>>>>> run", you
>>>>> > > get the idea ;-) I know that either the class (or method) name is
>>>>> bad or
>>>>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the
>>>>> sense that
>>>>> > > it's confusing (to me).
>>>>> > >
>>>>> > > I am not advocating for a specific change ATM but I want to
>>>>> discuss the
>>>>> > > option because it is possible the current name is not as good as
>>>>> it could
>>>>> > > be. It could end up as an acceptable compromise if we cannot use
>>>>> more Java
>>>>> > > friendly terms though.
>>>>> > >
>>>>> > > Whenever I see a class that implements a "forEach"-kind of method,
>>>>> I think
>>>>> > > "Iterable".
>>>>> > >
>>>>> >
>>>>> > Here we should think "Collection", or generally more than 1. In the
>>>>> Java
>>>>> > sense an Iterable is something you can walk through to the
>>>>> > end, possibly removing elements as you go using the Iterator
>>>>> interface. We
>>>>> > would not require supporting removal, and we want to control a
>>>>> > short-circuit. We could make this distinction by including it in the
>>>>> name:
>>>>> > forEachUntil(Predicate ...), forEachUnless, ...
>>>>>
>>>>> I really like the idea of have the method name reflect the
>>>>> short-circuit aspect of the operation!
>>>>>
>>>>> We do not have to invent a new method name though, Stream uses
>>>>> "takeWhile(Predicate)" in Java 9:
>>>>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-
>>>>>
>>>>> The question becomes whether this name make the class too close to a
>>>>> Stream and introduces some kind of confusion. If that's the case, then
>>>>> forEachUntil() is good.
>>>>>
>>>>> >
>>>>> >
>>>>> > >
>>>>> > > Note the difference with "Iterator", and I had to lookup the
>>>>> difference
>>>>> > > since the former implements "forEach" and the  later
>>>>> "forEachRemaining"!
>>>>> > > "Iterable" is also a factory of "Iterator"s.
>>>>> > >
>>>>> > > Should the Producers ever be implementations of Iterable or
>>>>> Iterator?
>>>>> > > Right now, the answer is no because of the short-circuit aspect of
>>>>> using a
>>>>> > > predicate. I'm not using the term fail-fast here because I don't
>>>>> think of
>>>>> > > the iteration being in error (please tell me if I'm wrong).
>>>>> > >
>>>>> > > If not iterable, then we should not use that name as part of the
>>>>> class
>>>>> > > name. Generally, the short-circuit aspect of Producers do not make
>>>>> a bad
>>>>> > > candidates for implementations of Iterable since it can throw
>>>>> (unchecked)
>>>>> > > exceptions. Different for call sites granted, but I'm just
>>>>> mentioning it
>>>>> > > for fun.
>>>>> > >
>>>>> >
>>>>> > I already mentioned throwing runtime exceptions for the short-circuit
>>>>> > functionality, and that it was ruled out on the basis of performance
>>>>> (given
>>>>> > a lot of short-circuiting is expected) and convenience for the
>>>>> caller. I
>>>>> > don't think we should go there. Design the API for the intended
>>>>> purpose,
>>>>> > and not push it into a box that is easily recognisable.
>>>>>
>>>>> Sounds good.
>>>>>
>>>>> >
>>>>> >
>>>>> > >
>>>>> > > So maybe there's nothing to do. I just want to be clear about it.
>>>>> For
>>>>> > > example, I think of "factory" and "producer" as synonyms but in
>>>>> this case,
>>>>> > > this is not a traditional application of the factory pattern.
>>>>> > >
>>>>> > > As an aside I can see that Producers would not be Streams out of
>>>>> the box
>>>>> > > because Stream#filter(Predicate) filters but does not short-circuit
>>>>> > > iteration. While Stream#findAny() and #findFirst() don't fit the
>>>>> > > short-circuit bill, we could implement a #findLast() in a Stream
>>>>> > > implementation. What I do not know is if Streams otherwise fit the
>>>>> bill of
>>>>> > > Bloom filters.
>>>>> > >
>>>>> >
>>>>> > In the case of small loops with few instructions per loop, the
>>>>> overhead of
>>>>> > Streams is significant. Unfortunately we don't have any performance
>>>>> tests
>>>>> > for this library, but I wouldn't change to Streams without knowing
>>>>> it does
>>>>> > not impact performance. Performance is a key feature of Bloom
>>>>> filters.
>>>>> > Otherwise you can achieve some of their functionality with
>>>>> conventional
>>>>> > collections.
>>>>>
>>>>> OK.
>>>>>
>>>>> >
>>>>> >
>>>>> > >
>>>>> > > >
>>>>> > > > Does that basically cover the confusion?   If there are better
>>>>> terms,
>>>>> > > let's
>>>>> > > > hash them out now before I update the javadocs.
>>>>> > > >
>>>>> > > > As an aside, Cells and Bitmaps are referenced in the
>>>>> literature.  For the
>>>>> > > > most part the rest is made up out of whole cloth.  So we could
>>>>> change
>>>>> > > > "Producer" to something else but we would need a good name.
>>>>> > >
>>>>> > > We have a class called BitMap and methods that use "BitMap" in the
>>>>> same
>>>>> > > but I think I am more comfortable with the term reuse now.
>>>>> > >
>>>>> > > The question that remains is must it be public? Since the Javadoc
>>>>> mentions
>>>>> > > it is about indices and bit positions, could all these methods be
>>>>> moved to
>>>>> > > the package-private IndexUtils? My concern is to reduce the public
>>>>> and
>>>>> > > protected API surface we will have to support and keep for the
>>>>> lifetime of
>>>>> > > the 4.x code base.
>>>>> > >
>>>>> >
>>>>> > It was originally package-private and Claude requested it be made
>>>>> public
>>>>> > for general re-use.
>>>>> >
>>>>> > We could use the name BitMaps, in keeping with the pluralisation of
>>>>> classes
>>>>> > seen in the JDK when working with elements of a given type:
>>>>> Executors,
>>>>> > Collections, etc.
>>>>>
>>>>> Agreed! I really like the plural convention for these static utility
>>>>> classes.
>>>>>
>>>>> Gary
>>>>>
>>>>> >
>>>>> > Alex
>>>>> >
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>>>
>>>>>
>>>>
>>>> --
>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>
>>>
>>>
>>> --
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to