Gary and Alex,

Any thoughts on this?

Claude

On Wed, May 1, 2024 at 7:55 AM Claude Warren <cla...@xenei.com> wrote:

> Good suggestions.
>
> short-circuit. We could make this distinction by including it in the name:
>> forEachUntil(Predicate ...), forEachUnless, ...
>
>
> We need the unit name in the method name.  All Bloom filters implement
> IndexProducer and BitmapProducer and since they use Predicate method
> parameters they will conflict.
>
>
> I have opened a ticket [1] with the list of tasks, which I think is now:
>
>    - Be clear that producers are like interruptible iterators with
>    predicate tests acting as a switch to short-circuit the iteration.
>    - Rename classes:
>       - CellConsumer to CellPredicate (?)
>       - Rename BitMap to BitMaps.
>    - Rename methods:
>       - Producer forEachX() to forEachUntil()
>       - The semantic nomenclature:
>       - Bitmaps are arrays of bits not a BitMaps object.
>       - Indexes are ints and not an instance of a Collection object.
>       - Cells are pairs of ints representing an index and a value.  They
>       are not Pair<> objects.
>       - Producers iterate over collections of the object (Bitmap, Index,
>       Cell) applying a predicate to do work and stop the iteration early if
>       necessary.  They are carriers/transporters of Bloom filter enabled bits.
>       They allow us to query the contents of the Bloom filter in an
>       implementation agnostic way.
>
>
> In thinking about the term Producer, other terms could be used
> Interrogator (sounds like you can add a query), Extractor might work.  But
> it has also come to mind that there is a "compute" series of methods in the
> ConcurrentMap class.  Perhaps the term we want is not "forEach", but
> "process".  The current form of usage is something like:
>
> IndexProducer ip = ....
> ip.forEachIndex(idx -> someIntPredicate)
>
> We could change the name from XProducer to XProcessor, or XExtractor; and
> the method to processXs.  So the above code would look like:
>
> IndexExtractor ix = ....
> ix.processIndexs(idx -> someIntPredicate)
>
> another example
>
> BitMapExtractor bx = .....
> bx.processBitMaps(bitmap -> someBitMapPredicate)
>
> Claude
>
> [1] https://issues.apache.org/jira/browse/COLLECTIONS-854
>
>
> On Tue, Apr 30, 2024 at 4:51 PM Gary D. Gregory <ggreg...@apache.org>
> wrote:
>
>>
>>
>> On 2024/04/30 14:33:47 Alex Herbert wrote:
>> > On Tue, 30 Apr 2024 at 14:45, Gary D. Gregory <ggreg...@apache.org>
>> wrote:
>> >
>> > > Hi Claude,
>> > >
>> > > Thank you for the detailed reply :-) A few comments below.
>> > >
>> > > On 2024/04/30 06:29:38 Claude Warren wrote:
>> > > > I will see if I can clarify the javadocs and make things clearer.
>> > > >
>> > > > What I think I specifically heard is:
>> > > >
>> > > >    - Be clear that producers are fast fail iterators with predicate
>> > > tests.
>> > > >    - Rename CellConsumer to CellPredicate (?)
>> > >
>> > > Agreed (as suggested by Albert)
>> > >
>> > > >    - The semantic nomenclature:
>> > > >       - Bitmaps are arrays of bits not a BitMap object.
>> > > >       - Indexes are ints and not an instance of a Collection object.
>> > > >       - Cells are pairs of ints representing an index and a value.
>> They
>> > > >       are not Pair<> objects.
>> > > >       - Producers iterate over collections of the object (Bitmap,
>> Index,
>> > > >       Cell) applying a predicate to do work and stop the iteration
>> early
>> > > if
>> > > >       necessary.  They are carriers/transporters of Bloom filter
>> enabled
>> > > bits.
>> > > >       They allow us to query the contents of the Bloom filter in an
>> > > >       implementation agnostic way.
>> > >
>> > > As you say naming is hard. The above is a great example and a good
>> > > exercise I've gone through at work and in other FOSS projects:
>> "Producers
>> > > iterate over collections of the object...". In general when I see or
>> write
>> > > a Javadoc of the form "Foo bars" or "Runners walk" or "Walkers run",
>> you
>> > > get the idea ;-) I know that either the class (or method) name is bad
>> or
>> > > the Javadoc/documentation is bad; not _wrong_, just bad in the sense
>> that
>> > > it's confusing (to me).
>> > >
>> > > I am not advocating for a specific change ATM but I want to discuss
>> the
>> > > option because it is possible the current name is not as good as it
>> could
>> > > be. It could end up as an acceptable compromise if we cannot use more
>> Java
>> > > friendly terms though.
>> > >
>> > > Whenever I see a class that implements a "forEach"-kind of method, I
>> think
>> > > "Iterable".
>> > >
>> >
>> > Here we should think "Collection", or generally more than 1. In the Java
>> > sense an Iterable is something you can walk through to the
>> > end, possibly removing elements as you go using the Iterator interface.
>> We
>> > would not require supporting removal, and we want to control a
>> > short-circuit. We could make this distinction by including it in the
>> name:
>> > forEachUntil(Predicate ...), forEachUnless, ...
>>
>> I really like the idea of have the method name reflect the short-circuit
>> aspect of the operation!
>>
>> We do not have to invent a new method name though, Stream uses
>> "takeWhile(Predicate)" in Java 9:
>> https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#takeWhile-java.util.function.Predicate-
>>
>> The question becomes whether this name make the class too close to a
>> Stream and introduces some kind of confusion. If that's the case, then
>> forEachUntil() is good.
>>
>> >
>> >
>> > >
>> > > Note the difference with "Iterator", and I had to lookup the
>> difference
>> > > since the former implements "forEach" and the  later
>> "forEachRemaining"!
>> > > "Iterable" is also a factory of "Iterator"s.
>> > >
>> > > Should the Producers ever be implementations of Iterable or Iterator?
>> > > Right now, the answer is no because of the short-circuit aspect of
>> using a
>> > > predicate. I'm not using the term fail-fast here because I don't
>> think of
>> > > the iteration being in error (please tell me if I'm wrong).
>> > >
>> > > If not iterable, then we should not use that name as part of the class
>> > > name. Generally, the short-circuit aspect of Producers do not make a
>> bad
>> > > candidates for implementations of Iterable since it can throw
>> (unchecked)
>> > > exceptions. Different for call sites granted, but I'm just mentioning
>> it
>> > > for fun.
>> > >
>> >
>> > I already mentioned throwing runtime exceptions for the short-circuit
>> > functionality, and that it was ruled out on the basis of performance
>> (given
>> > a lot of short-circuiting is expected) and convenience for the caller. I
>> > don't think we should go there. Design the API for the intended purpose,
>> > and not push it into a box that is easily recognisable.
>>
>> Sounds good.
>>
>> >
>> >
>> > >
>> > > So maybe there's nothing to do. I just want to be clear about it. For
>> > > example, I think of "factory" and "producer" as synonyms but in this
>> case,
>> > > this is not a traditional application of the factory pattern.
>> > >
>> > > As an aside I can see that Producers would not be Streams out of the
>> box
>> > > because Stream#filter(Predicate) filters but does not short-circuit
>> > > iteration. While Stream#findAny() and #findFirst() don't fit the
>> > > short-circuit bill, we could implement a #findLast() in a Stream
>> > > implementation. What I do not know is if Streams otherwise fit the
>> bill of
>> > > Bloom filters.
>> > >
>> >
>> > In the case of small loops with few instructions per loop, the overhead
>> of
>> > Streams is significant. Unfortunately we don't have any performance
>> tests
>> > for this library, but I wouldn't change to Streams without knowing it
>> does
>> > not impact performance. Performance is a key feature of Bloom filters.
>> > Otherwise you can achieve some of their functionality with conventional
>> > collections.
>>
>> OK.
>>
>> >
>> >
>> > >
>> > > >
>> > > > Does that basically cover the confusion?   If there are better
>> terms,
>> > > let's
>> > > > hash them out now before I update the javadocs.
>> > > >
>> > > > As an aside, Cells and Bitmaps are referenced in the literature.
>> For the
>> > > > most part the rest is made up out of whole cloth.  So we could
>> change
>> > > > "Producer" to something else but we would need a good name.
>> > >
>> > > We have a class called BitMap and methods that use "BitMap" in the
>> same
>> > > but I think I am more comfortable with the term reuse now.
>> > >
>> > > The question that remains is must it be public? Since the Javadoc
>> mentions
>> > > it is about indices and bit positions, could all these methods be
>> moved to
>> > > the package-private IndexUtils? My concern is to reduce the public and
>> > > protected API surface we will have to support and keep for the
>> lifetime of
>> > > the 4.x code base.
>> > >
>> >
>> > It was originally package-private and Claude requested it be made public
>> > for general re-use.
>> >
>> > We could use the name BitMaps, in keeping with the pluralisation of
>> classes
>> > seen in the JDK when working with elements of a given type: Executors,
>> > Collections, etc.
>>
>> Agreed! I really like the plural convention for these static utility
>> classes.
>>
>> Gary
>>
>> >
>> > Alex
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>
>
> --
> LinkedIn: http://www.linkedin.com/in/claudewarren
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to