The iota library implements a reducible object on top of files. It may be worth 
trying out for your use-case.
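
For comparison, here is a minimal sketch of the "reducible, not a seq" idea from
the quoted message below, using only clojure.core and clojure.java.io. It is not
how iota works internally. It also implements clojure.lang.IReduceInit directly
rather than going through Iterable as the quoted message suggests, which is my
own choice so that the reader can be closed when the reduction ends. The name
lines-reducible and the path "big.log" are made up for illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])
(import '(java.io BufferedReader))

(defn lines-reducible
  "Hypothetical helper: returns a reducible (not a seq) over the lines of
  `source` (anything clojure.java.io/reader accepts). The reader is opened
  when a reduction starts and closed when it finishes or stops early."
  [source]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^BufferedReader rdr (io/reader source)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                              ; honour early termination
            (if-let [line (.readLine rdr)]    ; nil signals end of input
              (recur (f acc line))
              acc)))))))

(comment
  ;; Sum the lengths of all non-blank lines without building a seq.
  (transduce (comp (remove str/blank?) (map count))
             + 0
             (lines-reducible "big.log"))

  ;; An eduction pairs the reducible with a transducer and does no work
  ;; until it is reduced; each reduction re-reads the file.
  (def non-blank (eduction (remove str/blank?) (lines-reducible "big.log")))
  (reduce (fn [n _] (inc n)) 0 non-blank))
```

Because the object only supports reduce with an initial value, use it with
transduce, into, or the three-argument reduce; calling seq or first on it will
not work.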

> On 17 Dec 2017, at 00:32, Alex Miller <a...@puredanger.com> wrote:
> 
> 
> 
>> On Saturday, December 16, 2017 at 2:39:14 PM UTC-6, Matan wrote:
>> Hi, 
>> 
>> As this thread seems to have been going down this path, I am joining it
>> after having spent some time fiddling with the source code of some
>> clojure.core transducers and familiarizing myself with how to create,
>> compose and use transducers in transducing processes. By the way, I think
>> the reference could be more explicit about the relationship between
>> transducers, transducing processes and contexts for applying transducers
>> (as is, IMO, a lot of ambiguity arises, causing a lot of confusion in
>> getting started). So, it was noted earlier in this thread by Alex Miller:
>> 
>>> You're starting from a lazy sequence, not a self-reducible collection. 
>>> That's not wrong, but it's removing a key transduce/reduce power to work 
>>> with reducible colls.
>> 
>> I think that's also the case with applying any transducer to a file input 
>> (?!) and I am therefore wondering about:
>> I didn't fully grasp the difference between self-reducible collections
>> vs. other ones (in this context, and in general).
>> Can you please delineate?
> I'm referring primarily to collections that implement their own reduce() 
> method (like vectors and lists) vs seqs.
>> Roughly how much of a performance penalty do we incur when a transduction
>> does not start from a (self-)reducible collection, and, more to the point,
>> why exactly?
> Vectors and lists are concrete, have all their own data available, and can 
> directly iterate through the data in a tight loop. Seqs must be realized and 
> this entails object creation, synchronization, and object destruction 
> overhead per element (or for chunked seqs, per chunk). 
> 
> Some collections can be iterated like a seq OR reduce themselves (vectors,
> lists, seqs on arrays, and the collections produced by range, cycle, repeat,
> and iterate).
>> Should we typically choose a different vehicle for stream processing from 
>> large files, over using transducers? My current use case is 
>> stream-processing from large files.
> Stream processing is just another means of producing values. The question is 
> really in how you represent the stream. Seqs have some inherent overhead. 
> Presumably you don't want to read the entire stream and put it in a 
> collection. The trick then is to create an object that is reducible, not a 
> seq, and reads the stream. Probably the easiest way is to use something 
> Iterable that can provide an iterator over the stream. The CollReduce 
> protocol is extended to Iterable so this is already built in. Then 
> reduce/transduce over the iterable.
> 
> An eduction combines a reducible collection and a transformation (transducer) 
> into a collection that delays its execution until the point where you reduce 
> it (this has some of the same utility as a lazy sequence in delaying 
> execution). 
> 
> How exactly you want to iterate over reading the stream depends on what 
> you're doing (Java provides streams, readers, and channels for a variety of 
> different use cases). In any case you want to have an Iterator implementation 
> (hasNext() and next()) that can provide the "next" item. Things like Apache 
> Commons IOUtils can give you line iterators over a reader for example. 
> -- 
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your 
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
