On Saturday, December 16, 2017 at 2:39:14 PM UTC-6, Matan wrote:
>
> Hi, 
>
> As this thread seems to have been going down this path, I am joining it 
> after having spent some time fiddling with the source code of some 
> clojure.core transducers and familiarizing myself with how to create, 
> compose and use transducers in transducing processes. By the way, I think 
> the reference <https://clojure.org/reference/transducers> could be more 
> explicit about the relationship between transducers, transducing 
> processes, and contexts for applying transducers (as it is, IMO a lot of 
> ambiguity arises, causing a lot of confusion when getting started). So, it 
> was noted earlier in this thread by Alex Miller:
>
> You're starting from a lazy sequence, not a self-reducible collection. 
>> That's not wrong, but it's removing a key transduce/reduce power to work 
>> with reducible colls.
>
>
> I think that's also the case with applying any transducer to a file input 
> (?!), and I am therefore wondering about:
>
>    1. I didn't fully grasp the difference between self-reducible 
>    collections vs. other ones (in this context, and in general). 
>    Can you please delineate?
>    
> I'm referring primarily to collections that implement their own reduce() 
method (like vectors and lists) vs seqs.
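To make that concrete, here is a quick REPL sketch (checked against a 
recent Clojure, 1.7+): self-reducible collections implement 
clojure.lang.IReduceInit, while lazy seqs do not, so reducing a lazy seq 
falls back to the generic seq path.

(instance? clojure.lang.IReduceInit [1 2 3])            ;=> true  (vector reduces itself)
(instance? clojure.lang.IReduceInit '(1 2 3))           ;=> true  (list reduces itself)
(instance? clojure.lang.IReduceInit (map inc [1 2 3]))  ;=> false (lazy seq does not)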

>
>    2. Roughly how much performance lag do we get when not driving a 
>    transduction from a (self-)reducible collection, and more so, why exactly? 
>    
> Vectors and lists are concrete, have all their own data available, and can 
directly iterate through the data in a tight loop. Seqs must be realized 
and this entails object creation, synchronization, and object destruction 
overhead per element (or for chunked seqs, per chunk). 

Some collections can be iterated like a seq OR reduce themselves (vectors, 
lists, seqs on arrays, and the collections produced by range, cycle, repeat, 
and iterate).
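A rough way to see the gap yourself (a sketch; v and s are just names for 
this illustration, and the timings depend on your machine and JIT warmup, 
so read them as relative, not absolute):

(def v (vec (range 1000000)))        ; self-reducible vector
(def s (doall (map identity v)))     ; the same data as a realized (chunked) lazy seq

(time (transduce (map inc) + 0 v))   ; goes through the vector's own internal reduce
(time (transduce (map inc) + 0 s))   ; walks the seq, paying the per-chunk/per-element overhead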

>
>    3. Should we typically choose a different vehicle for stream 
>    processing from large files, over using transducers? My current use 
>    case is stream-processing from large files.
>    
Stream processing is just another means of producing values. The question 
is really how you represent the stream. Seqs have some inherent overhead, 
and presumably you don't want to read the entire stream into a collection. 
The trick, then, is to create an object that is reducible, not a seq, and 
that reads the stream. Probably the easiest way is to use something 
Iterable that can provide an iterator over the stream. The CollReduce 
protocol is extended to Iterable, so this is already built in. Then 
reduce/transduce over the iterable.
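A minimal sketch of that approach for a line-oriented text file (the file 
name, the transducer, and the line-iterable helper are all made up for 
illustration):

(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn line-iterable
  "Wraps an open BufferedReader in an Iterable whose iterator yields its
  lines. The caller must keep rdr open until the reduction finishes."
  [^java.io.BufferedReader rdr]
  (reify Iterable
    (iterator [_]
      (.iterator (.lines rdr)))))  ; BufferedReader.lines() returns a java.util.stream.Stream

(with-open [rdr (io/reader "big-file.txt")]
  ;; CollReduce is extended to Iterable, so transduce consumes this directly,
  ;; one line at a time, with no intermediate seq or collection.
  (transduce (comp (filter #(str/includes? % "ERROR"))
                   (map count))
             + 0
             (line-iterable rdr)))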

An eduction combines a reducible collection and a transformation 
(transducer) into a collection that delays its execution until the point 
where you reduce it (this has some of the same utility as a lazy sequence 
in delaying execution). 
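For instance (a small self-contained sketch using a range as the reducible 
source):

(def xs (eduction (map inc) (filter odd?) (range 10)))
;; Nothing has run yet; the transformation is applied only when xs is reduced:
(reduce + 0 xs)   ;=> 25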

How exactly you want to iterate over the stream depends on what you're 
doing (Java provides streams, readers, and channels for a variety of 
different use cases). In any case, you want an Iterator implementation 
(hasNext() and next()) that can provide the "next" item. Things like 
Apache Commons IOUtils can give you line iterators over a reader, for 
example. 
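A sketch of that route, assuming commons-io is on the classpath: 
IOUtils/lineIterator returns a LineIterator, which implements 
java.util.Iterator, so it drops straight into the Iterable-wrapping 
approach above.

(import '(org.apache.commons.io IOUtils))

(defn commons-line-iterable
  "Like line-iterable above, but built on Apache Commons IO."
  [^java.io.Reader rdr]
  (reify Iterable
    (iterator [_] (IOUtils/lineIterator rdr))))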
