Hi,

As this thread seems to be heading down this path, I am joining it after spending some time fiddling with the source code of some clojure.core transducers and familiarizing myself with how to create, compose and use transducers in transducing processes. By the way, I think the reference <https://clojure.org/reference/transducers> could be more explicit about the relationship between transducers, transducing processes and the contexts in which transducers are applied (as it stands, IMO, it leaves a lot of ambiguity, which causes a lot of confusion when getting started). So, it was noted earlier in this thread by Alex Miller:
> You're starting from a lazy sequence, not a self-reducible collection.
> That's not wrong, but it's removing a key transduce/reduce power to work
> with reducible colls.

I think that's also the case when applying any transducer to file input (?!), and I am therefore wondering:

1. I didn't fully grasp the difference between self-reducible collections and other ones (in this context, and in general). Can you please delineate?
2. Roughly how much of a performance penalty do we incur when a transduction does not run over a (self-)reducible collection, and, more importantly, why exactly?
3. Should we typically choose a different vehicle than transducers for stream processing of large files?

My current use case is stream processing of large files.

Thanks in advance for your reply,
Matan

On Tuesday, November 28, 2017 at 1:53:42 PM UTC+2, Renzo Borgatti wrote:
>
> > On 28 Nov 2017, at 02:54, Alex Miller <al...@puredanger.com> wrote:
> >
> > I would say transducers are preferable when:
> >
> > 1) you have reducible collections
> > 2) you have a lot of pipelined transformations (transducers handle these in one pass with no intermediate data)
> > 3) the number of elements is "large" (this amplifies the memory and perf savings from #2)
> > 4) you want to produce a concrete output collection (seqs need an extra step to pour the seq into a collection; transducers can create it directly)
> > 5) you want a reusable transformation that can be used in multiple contexts (reduction, sequence, core.async, etc)
>
> I agree with the above, Alex, although I think that is the kind of checklist I'd look at if performance optimization is my primary goal. In any other case, I'd reach for transducers as the default. There are then several corner cases to understand, but that's true for normal sequential processing too.
>
> Renzo
>
> > On Monday, November 27, 2017 at 8:33:50 PM UTC-6, Jiacai Liu wrote:
> > > Also, most of the performance boost from transducers is due to less garbage being created, and sometimes the heap of the JVM is so large you'll never see much change from switching to transducers.
> >
> > Thanks for this tip. I seldom use transducers in my daily work, and after reading some articles I was convinced that transducers are the better choice in every situation. But the test shows it isn't an easy choice: transducers only make more sense when working with something reducible.
> >
> > On Tuesday, November 28, 2017 at 5:07:10 AM UTC+8, tbc++ wrote:
> > >> Also, I think the transducer version should always be faster, no matter the size of the source collection (no threshold).
> >
> > It's a bit more complicated than that, mostly because transducer pipelines require about 2 allocations per step during creation. Also, most of the performance boost from transducers is due to less garbage being created, and sometimes the heap of the JVM is so large you'll never see much change from switching to transducers.
> >
> > Don't get me wrong, transducers are great and I often default to them over seqs, but in micro-benchmarks like this there's too much in play to always see a 100% performance boost.
> >
> > On Mon, Nov 27, 2017 at 12:55 PM, David Bürgin <dbue...@gluet.ch> wrote:
> > Jiacai –
> >
> > I saw you updated the gist. Just in case it passed you by: performance profits from the source collection being reducible. So pouring ‘dataset’ into a vector beforehand should speed up the processing quite a bit.
> >
> > Also, I think the transducer version should always be faster, no matter the size of the source collection (no threshold).
> >
> > --
> > David
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Clojure" group.
> > To post to this group, send email to clo...@googlegroups.com
> > Note that posts from new members are moderated - please be patient with your first post.
> > To unsubscribe from this group, send email to
> > clojure+u...@googlegroups.com
> > For more options, visit this group at
> > http://groups.google.com/group/clojure?hl=en
> > ---
> > You received this message because you are subscribed to the Google Groups "Clojure" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > “One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
> > (Robert Firth)
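A minimal sketch of the distinction behind questions 1 and 3, assuming Clojure 1.7+, where `transduce` lets any source implementing `clojure.lang.IReduceInit` (a vector, for example) run the reduction loop itself, and otherwise falls back to the generic `coll-reduce` path that lazy seqs go through. The transducer `xf` and the helper `lines-reducible` are illustrative names, not library API; `lines-reducible` is a hypothetical helper that reifies `IReduceInit` so a file's lines are fed to the reducing function directly from a `BufferedReader`.

```clojure
(ns example.reducible-sources
  (:require [clojure.java.io :as io]))

;; One transducer, defined independently of any input or output:
;; parse each string, keep the even numbers, increment them.
(def xf
  (comp (map #(Long/parseLong %))
        (filter even?)
        (map inc)))

;; Self-reducible source: the vector runs the reduction itself.
(transduce xf + 0 ["1" "2" "3" "4"])      ;; => 8

;; Lazy-seq source: same result, but the reduction walks the lazy
;; seq, realizing its cells along the way.
(transduce xf + 0 (map str (range 1 5)))  ;; => 8

;; The same xf in other contexts: building a collection, or a lazy seq.
(into [] xf ["1" "2" "3" "4"])            ;; => [3 5]
(sequence xf ["1" "2" "3" "4"])           ;; => (3 5)

;; Hypothetical helper (not part of clojure.core): a reducible view of
;; a file's lines. Each reduction opens the source, feeds lines to the
;; reducing function straight from the BufferedReader, and closes the
;; reader when the reduction finishes or stops early via `reduced`.
(defn lines-reducible
  [source]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^java.io.BufferedReader rdr (io/reader source)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                          ; early termination, e.g. from (take n)
            (if-let [line (.readLine rdr)]
              (recur (f acc line))
              acc)))))))
```

For instance, `(transduce xf + 0 (lines-reducible "data.txt"))` would sum over a hypothetical file `data.txt` without building an intermediate seq of its lines. Compared with reducing `(line-seq rdr)` inside `with-open`, this ties the reader's lifetime to the reduction; the trade-off is that the result is only reducible, so it fits `transduce`, `reduce` and `into`, but not seq functions such as `sequence` or `first`.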