I have a task (a script) that processes hundreds of thousands of files, which I
read from a Files.walk() stream. To make it perform well, the processing part is
multi-threaded.

Could any of these stream gatherers simplify the code? Because of the very large
number of files, I can't put them all into a collection, and I suspect that
creating a virtual thread for each file would exceed limits too.
Here each thread calls a 'synchronized' method to grab the next file from the
stream. I'm not sure this is the most appropriate usage pattern, though it works
quite well. I tried GPars too, but I couldn't see how to share the stream
effectively. This is a heavily trimmed version of what I do:

['opts' holds the command-line parameters parsed by CliBuilder]

pListStream = Files.walk(opts.i.toPath())
        .filter(Files::isRegularFile)
        .filter(p -> p.fileName.toString() ==~ fileMatch)
        .iterator()

synchronized File nextF() {
    // hand out the next matching file, or null to signal the end of the walk
    pListStream.hasNext() ? pListStream.next().toFile() : null
}

// Start the threads to process and fill the queue
def futures = (1..opts.t).collect { threadNum ->
    def t = new Thread(new convertFiles(this, threadNum - 1, opts))
    t.start()
    t
}
// detect ctrl-c so we can print the stats so far before stopping
def withInteruptionListener = { Closure cloj, Closure onInterrupt ->
    def thread = { onInterrupt?.call() } as Thread
    Runtime.runtime.addShutdownHook (thread)
    cloj();
    Runtime.runtime.removeShutdownHook (thread)
}

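@groovy.transform.TupleConstructor
class convertFiles implements Runnable {
    def parent       // the script instance that owns the synchronized nextF()
    int threadNum
    def opts

    void run() {
        File tnefFile
        // nextF() is synchronized, so each file is handed to exactly one thread
        while ((tnefFile = parent.nextF()) != null) {
            // PROCESS FILE..
        }
    }
}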
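One alternative to the synchronized nextF() is sketched below in Java, because the JDK APIs are the same ones the Groovy script uses. It is only a sketch under stated assumptions: the class BoundedFileProcessor, its process() method and the "2 pending tasks per worker" bound are hypothetical. The main thread walks the tree lazily and submits each matching file to a fixed pool. A Semaphore caps how many submitted tasks are still unfinished, so the walk never runs far ahead of the workers and the file list is never held in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper class: not part of the original script or any library.
final class BoundedFileProcessor {

    // Walks `root` lazily and hands each matching regular file to a worker pool.
    // Returns the number of files whose action completed without throwing.
    static long process(Path root, Pattern fileMatch, int threads, Consumer<Path> action)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Semaphore inFlight = new Semaphore(threads * 2); // assumed bound: ~2 pending tasks per worker
        AtomicLong processed = new AtomicLong();
        // Files.walk holds open directory handles, so close it with try-with-resources
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths
                    .filter(Files::isRegularFile)
                    // full-string match, like Groovy's ==~ operator
                    .filter(p -> fileMatch.matcher(p.getFileName().toString()).matches())
                    .iterator();
            while (it.hasNext()) {
                Path file = it.next();
                inFlight.acquire(); // blocks the walker while too many tasks are pending
                pool.execute(() -> {
                    try {
                        action.accept(file);
                        processed.incrementAndGet();
                    } finally {
                        inFlight.release(); // free a slot even if the action throws
                    }
                });
            }
        } finally {
            pool.shutdown(); // accept no new tasks; let the queued ones finish
            pool.awaitTermination(1, TimeUnit.DAYS);
        }
        return processed.get();
    }
}
```

Only one thread ever touches the walk's iterator, so no synchronized method is needed. On JDK 21+ the fixed pool could be replaced by Executors.newVirtualThreadPerTaskExecutor(). The semaphore would then limit how many virtual threads are alive at once, rather than creating one per file up front.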


Merlin Beedell 
-----Original Message-----
From: Paul King <pa...@asert.com.au> 
Sent: 24 March 2025 03:11
To: users@groovy.apache.org
Subject: Re: withIndex for streams

For interest, the code using Gatherers4J looks like:

    assert names.stream()
        .gather(Gatherers4j.filterIndexed {index, element -> index == 3 }) // JDK24
        .findFirst().get() == 'arne'

You can also do it with vanilla streams:

    assert names.stream().skip(3).limit(1).findFirst().get() == 'arne'
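For the general case (any predicate on the index, not just "skip to position 3"), a plain-JDK helper can pair each element of a sequential stream with its position. That is roughly what a stream withIndex would provide. The sketch below is hypothetical: StreamIndex.withIndex is not part of the JDK, Groovy, Gatherers4j or groovy-stream. It is shown in Java and wraps the source's iterator.

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, for illustration only.
final class StreamIndex {

    // Pairs each element with its 0-based position as an (element, index) entry.
    // The result is sequential and ordered; indices follow the source's encounter order.
    static <T> Stream<Map.Entry<T, Integer>> withIndex(Stream<T> source) {
        Iterator<T> it = source.iterator();
        Iterator<Map.Entry<T, Integer>> indexed = new Iterator<Map.Entry<T, Integer>>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public Map.Entry<T, Integer> next() {
                // it.next() is evaluated before index++, so a failed next() does not skip an index
                return new AbstractMap.SimpleImmutableEntry<>(it.next(), index++);
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(indexed, Spliterator.ORDERED), false)
                .onClose(source::close); // closing the result also closes the source
    }
}
```

For example, withIndex(names.stream()).filter(e -> e.getValue() == 3).map(Map.Entry::getKey).findFirst() yields Optional[arne]. Unlike the AtomicInteger counter, the index travels with the element, so later filter or map steps cannot shift it.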

I also forgot about Tim Yates' library:

    https://timyates.github.io/groovy-stream/

It has various "withIndex" methods, e.g. mapWithIndex, zipWithIndex, 
filterWithIndex, flatMapWithIndex, untilWithIndex, tapWithIndex, 
tapEveryWithIndex.
With groovy-stream, you'd do:

    assert Stream.from(names).filterWithIndex{ n, i -> i == 3 }.toList()[0] == 'arne'

You pose a good question though about whether functionality like this should be 
brought into the main Groovy modules.

Cheers, Paul.

On Sun, Mar 23, 2025 at 1:52 PM Paul King <pa...@asert.com.au> wrote:
>
> It might be worth exploring this. I'll note that gatherers (JDK 24) 
> provide a hook for adding such functionality in Java. Gatherers4j has 
> withIndex (though we'd likely implement it differently):
>
> https://tginsberg.github.io/gatherers4j/gatherers/sequence-operations/withindex/
>
> It also has a bunch of other "index" operations.
>
> Paul.
>
> On Fri, Mar 21, 2025 at 2:00 AM Per Nyfelt <per.nyf...@nordnet.se> wrote:
> >
> > Hi,
> >
> > I suggest that the withIndex method in DefaultGroovyMethods be
> > overloaded to support streams as well.
> >
> > Given
> >
> > names = ['per', 'karin', 'tage', 'arne', 'sixten', 'ulrik']
> >
> > I can find the 4th element with
> >
> > println names[3]
> >
> > or, if I only have an iterator, with
> >
> > println names.iterator().withIndex().find { it, idx -> 3 == idx }[0]
> >
> > For a stream I can do it by using a
> > java.util.concurrent.atomic.AtomicInteger:
> >
> > AtomicInteger index = new AtomicInteger()
> >
> > println names.stream().find(n -> 3 == index.getAndIncrement())
> >
> > I have noticed that libraries increasingly expose a Stream API rather
> > than a Collection or an Iterator, so it would be very nice if I could
> > just do
> >
> > println names.stream().withIndex().find { it, idx -> 3 == idx}[0]
> >
> > or, alternatively, add a findWithIndex so that this would be possible:
> >
> > println names.stream().findWithIndex { it, idx -> 3 == idx}
> >
> > What do you think?
> >
> > Regards,
> >
> > Per
