Re: [DISCUSSs] new Iterator

Claude Warren Sun, 20 Oct 2024 01:03:35 -0700

@Gary Gregory <garydgreg...@gmail.com>

In response to your comments on "ofStream": I was thinking of renaming it
to "fromStream", but then realized that it is really just another
implementation of "create", so ExtendedIterator<T> create(Stream<T> stream)
is what I am proposing now.
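
A minimal sketch of how that overload could look, assuming the Stream is
simply adapted through its own iterator() method.  The interface body below
is a hypothetical stand-in for illustration only; it is not the code from the
pull request, and the create(Iterator<T>) overload is an assumption.

```java
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical, minimal stand-in for the proposed interface (not the real API).
interface ExtendedIterator<T> extends Iterator<T> {

    // Assumed base factory: wraps any existing Iterator.
    static <T> ExtendedIterator<T> create(Iterator<T> it) {
        return new ExtendedIterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return it.next();
            }
        };
    }

    // The Stream variant is then just another "create": it delegates to the
    // Iterator overload via Stream.iterator().
    static <T> ExtendedIterator<T> create(Stream<T> stream) {
        return create(stream.iterator());
    }
}
```

With this sketch, ExtendedIterator.create(Stream.of("a", "b")) returns an
iterator that yields "a" and then "b".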


In response to the comments on `addTo`: a Stream's collect builds a new
instance of whatever container the objects are put into.  This method adds
the objects to an existing collection.  I have often found that when working
with Streams I need to collect the objects into a container and then add
them all to another container.  This shortens that process.
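
To make the difference concrete, here is a rough sketch of the idea as a
standalone helper.  The class name AddToSketch and the exact signature are
assumptions made for illustration; the proposed method may well look
different.

```java
import java.util.Collection;
import java.util.Iterator;

// Hypothetical helper illustrating the addTo idea; name and signature are assumptions.
final class AddToSketch {
    private AddToSketch() {
    }

    // Drains the iterator into an EXISTING collection and returns that same
    // collection, so no intermediate container is built.
    static <T, C extends Collection<? super T>> C addTo(Iterator<? extends T> it, C target) {
        while (it.hasNext()) {
            target.add(it.next());
        }
        return target;
    }
}
```

The two-step stream version described above would be something like
existing.addAll(stream.collect(Collectors.toList())), which builds a
temporary list first; the helper skips that temporary list.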

Finally, why not just use streams?

   - All instances of Streams that I have seen require a collection of the
   object being streamed.  Thus all instances have to be in memory before the
   stream starts.
   - There are numerous performance tests that show that streams have a
   very high overhead for creation.  Several projects prohibit the use of
   streams on the hot path for performance reasons.  However, iterators are
   allowed.
   - There are cases where you just don't know how many of something there
   are, or there may be an unbounded number of objects (think streaming
   systems).  My current example is reading from something like an S3 bucket
   where you make a REST call to get a list of objects.  The status from the
   REST call tells you if there are more objects to process.  Wrapping the
   code to create an iterator over the complete collection of objects is
   fairly trivial (lazy collection of iterator approach).  Now you have an
   iterator that just handles getting all the objects from storage.  An
   added bonus (or problem, if you are a stickler for iterators failing when
   the underlying collection changes) is that you can continue to poll the
   iterator after it has returned false from "hasNext()", and if a new object
   is placed in storage it will find it and retrieve it.
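
The paged-listing case in the last point can be sketched roughly as follows.
PageSource and PagedIterator are hypothetical names, not part of any real
S3 client or of commons-collections.  As a simplification, the sketch assumes
the source tracks its own continuation state and returns an empty list when
nothing is available right now, instead of exposing a "more results" status.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a REST "list objects" call.
interface PageSource<T> {
    // Returns the next batch; an empty list means nothing is available right now.
    List<T> nextPage();
}

// Lazily walks the pages; only the current page is held by the iterator.
final class PagedIterator<T> implements Iterator<T> {
    private final PageSource<T> source;
    private Iterator<T> current = Collections.emptyIterator();

    PagedIterator(PageSource<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (!current.hasNext()) {
            // Ask the source again on every call, so objects that arrive
            // after an earlier "false" are still picked up later.
            current = source.nextPage().iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

Because hasNext() re-queries the source whenever the current page is
exhausted, the iterator can report false and later report true again once
new objects show up, which is the behaviour described in the last point.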





On Sat, Oct 19, 2024 at 11:19 PM Claude Warren <cla...@xenei.com> wrote:

> @mdrob
>
> The reason to use an Iterator is that, from my experience, Stream requires
> all the base objects to be in existence.  In addition, there are times when
> you don't know how many objects there will be, or when you have an iterator
> or iterable to start with and not a stream.  An example would be reading
> from something like an S3 bucket where you make a call to get a list of
> object summaries.  The list of object summaries may not be complete and so
> you have to make more calls to the API to get more and more object
> summaries.  This is very easy to wrap with an iterator that will allow you
> to get all the object summaries and then you can do stream-like processing
> on the object summaries to process all the files in the S3 bucket.
>
> @Gary Gregory <garydgreg...@gmail.com>
>
> I have an implementation of this class in a pull request for Rat
> 0.17.  It is effectively a wrapper on some of the other iterator classes in
> commons-collections.  With your example of how to do the
> unwinding/flattening iterator I can now rewrite it so that it does not need
> any other new classes in commons-collections.
>
> I am open to renaming methods as appropriate.  I will put together a pull
> request on Sunday that shows how to implement the methods and will make
> method name changes as per your recommendations above.
>
> Claude
>


-- 
LinkedIn: http://www.linkedin.com/in/claudewarren
