The iota library implements a reducible object on top of files. It may be worth trying out for your use-case.
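For a sense of what such a reducible looks like without pulling in a dependency, here is a minimal sketch of the Iterable approach Alex describes in the quoted message. It is not iota's API: `line-iterable` and `count-long-lines` are names I made up for illustration. The iterator is single-pass and only valid while the reader is open.

```clojure
(require '[clojure.java.io :as io])
(import '(java.io BufferedReader)
        '(java.util Iterator NoSuchElementException))

(defn line-iterable
  "Hypothetical helper: wraps an open BufferedReader in an Iterable whose
  iterator yields one line at a time. reduce/transduce accept it because
  CollReduce is extended to Iterable. Single pass only."
  [^BufferedReader rdr]
  (reify Iterable
    (iterator [_]
      ;; Read one line ahead so hasNext can answer without consuming input.
      (let [next-line (volatile! (.readLine rdr))]
        (reify Iterator
          (hasNext [_] (some? @next-line))
          (next [_]
            (let [line @next-line]
              (when (nil? line)
                (throw (NoSuchElementException.)))
              (vreset! next-line (.readLine rdr))
              line)))))))

(defn count-long-lines
  "Hypothetical example: counts lines longer than n characters.
  No lazy seq of lines is created; the transducer stack runs directly
  over the iterator while the reader is open."
  [path n]
  (with-open [rdr (io/reader path)]
    (transduce (comp (filter #(> (count %) n))
                     (map (constantly 1)))
               +
               0
               (line-iterable rdr))))
```

Something like `(count-long-lines "big.log" 120)` would then stream through the file (the file name is made up). The same iterable can be passed to `into` or wrapped in `eduction`, as long as the reduction happens inside the `with-open`.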
> On 17 Dec 2017, at 00:32, Alex Miller <a...@puredanger.com> wrote:
>
>> On Saturday, December 16, 2017 at 2:39:14 PM UTC-6, Matan wrote:
>> Hi,
>>
>> As this thread seems to have been going down this path, I am joining it
>> after having spent some time fiddling the source code of some clojure.core
>> transducers and familiarizing with how to create, compose and use
>> transducers in transducing processes. By the way I think the reference could
>> be more explicit about the relationship between transducers, transducing
>> processes and contexts for applying transducers (as is, IMO a lot of
>> ambiguity arises, causing a lot of confusion in getting started). So, it was
>> noted earlier in this thread by Alex Miller:
>>
>>> You're starting from a lazy sequence, not a self-reducible collection.
>>> That's not wrong, but it's removing a key transduce/reduce power to work
>>> with reducible colls.
>>
>> I think that's also the case with applying any transducer to a file input
>> (?!) and I am therefore wondering about:
>> I didn't fully grasp the difference between self-reducible collections v.s.
>> other ones (in this context, and in general).
>> Can you please delineate?
>
> I'm referring primarily to collections that implement their own reduce()
> method (like vectors and lists) vs seqs.
>
>> Roughly how much performance lag do we get when not working a transduction
>> from a (self) reducible collection, and moreso why exactly?
>
> Vectors and lists are concrete, have all their own data available, and can
> directly iterate through the data in a tight loop. Seqs must be realized and
> this entails object creation, synchronization, and object destruction
> overhead per element (or for chunked seqs, per chunk).
>
> Some collections can be iterated like a seq OR reduce themselves (vectors,
> lists, seqs on arrays, and the collection produced by range, cycle, repeat,
> and iterate).
>
>> Should we typically choose a different vehicle for stream processing from
>> large files, over using transducers? My current use case is
>> stream-processing from large files.
>
> Stream processing is just another means of producing values. The question is
> really in how you represent the stream. Seqs have some inherent overhead.
> Presumably you don't want to read the entire stream and put it in a
> collection. The trick then is to create an object that is reducible, not a
> seq, and reads the stream. Probably the easiest way is to use something
> Iterable that can provide an iterator over the stream. The CollReduce
> protocol is extended to Iterable so this is already built in. Then
> reduce/transduce over the iterable.
>
> An eduction combines a reducible collection and a transformation (transducer)
> into a collection that delays its execution until the point where you reduce
> it (this has some of the same utility as a lazy sequence in delaying
> execution).
>
> How exactly you want to iterate over reading the stream depends on what
> you're doing (Java provides streams, readers, and channels for a variety of
> different use cases). In any case you want to have an Iterator implementation
> (hasNext() and next()) that can provide the "next" item. Things like Apache
> Commons IOUtils can give you line iterators over a reader for example.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with your
> first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.