First, I love this discussion! Great questions, great thinking. Up top, I wanted to mention a couple of things that have changed in alpha6:

- Eduction is no longer Seqable, and thus the return from eduction is not seqable (but it is reducible and iterable). You can use iterator-seq to get a chunked seq over the top if you need one.
- Prior to alpha6, sequence with transformations returned a LazyTransformer. That's gone; transformations are now done using TransformerIterator, which is subsequently wrapped with a (now chunked) iterator-seq.
- There are lots of performance implications due to those changes, and I would recommend re-running any perf test related to sequence or eduction on alpha6 to get a fresh picture, as anything pre-alpha6 is not comparable.
- eduction now takes multiple transformations, not just one, and composes them. This is designed for mechanical rewriting (hello, tool developers!!) of ->> chains like this:
    (->> s (interpose 5) (partition-all 2))

to this:

    (->> s (eduction (interpose 5) (partition-all 2)))

That particular example has timings in CLJ-1669 and shows a reduction from 501 microseconds to 108 microseconds on a source vector, with comparable savings on a source sequence. You do need to be careful not to include non-transducer functions in that chain, and to stop the rewrite when you hit a materializing function at the end (for example, sort).

On Wednesday, April 1, 2015 at 2:13:07 AM UTC-5, Tassilo Horn wrote:

> Hi all,
>
> I've switched many nested filter/map/mapcat applications in my code to
> using transducers. That brought a moderate speedup in certain cases,
> and the deeper the nesting was before, the clearer the transducer code
> is in comparison, so yay! :-)

Depending on what you're transducing over, you may see some additional gains in alpha6.

> However, I'm still quite unsure about the difference between `sequence`
> and `eduction`. From the docs and experimentation, I came to the
> assumptions below and I'd be grateful if someone with more knowledge
> could verify/falsify/add:
>
> - Return types differ: sequence returns a standard lazy seq, eduction
>   an instance of Eduction.

Yes, although I think the actual return type of eduction is not important; rather, the implemented interfaces of Iterable and IReduceInit are the key thing.

> - Eductions are reducible/seqable/iterable, i.e., basically I can use
>   them wherever a (lazy) seq would also do, so sequence and eduction
>   are quite interchangeable except when poking at internals, e.g.,
>   (.contains (sequence ...) x) works whereas (.contains (eduction ...) x)
>   doesn't.

As of alpha6, eductions are no longer explicitly seqable, although seq will automatically wrap a chunked iterator-seq around one if it is passed to a sequence function. No guarantees are made on those internals, only on what is promised in the docstrings (reducible/iterable for eduction, a sequence for sequence). :)

> - Both compute their contents lazily.
eduction is not lazy by itself. If used in a non-seq context, it is eagerly recomputed on each use. If you use it in a seq context, then it will be computed and cached as needed.

> - Lazy seqs cache their already realized contents, eductions compute
>   them over and over again on each iteration.

Yes.

> Because of that, I came to the conclusion that whenever I ask myself if
> one of my functions should return a lazy seq or an eduction, I should
> use these rules:
>
> 1. If the function is likely to be used like
>
>        (let [xs (seq-producing-fn args)]
>          (or (do-stuff-with xs)
>              (do-other-stuff-with xs)
>              ...))
>
>    that is, the resulting seq is likely to be bound to a variable
>    which is then used multiple times (and thus lazy seq caching is
>    beneficial), then use sequence.

Yes, it makes sense to take advantage of seq caching this way.

> 2. If it is a private function only used internally and never with the
>    usage pattern of point 1, then definitely use eduction.
>
> 3. If it's a public function which usually isn't used with a pattern as
>    in point 1, then I'm unsure. eduction is probably more efficient,
>    but sequence fits better in the original "almost everything returns
>    a lazy seq" design. Also, the latter has the benefit that users of
>    the library don't need to know anything about transducers.
>
> Is that sensible? Or am I completely wrong with my assumptions about
> sequence and eduction?

You're definitely on the right track. How the result is going to be used is pretty important: eduction is designed to be consumed entirely for a result and thus gives up the caching of sequences. If you are going to consume the entire thing once, particularly in the context of a reduction, then eduction is far more efficient. Eduction does not need to allocate or cache intermediate sequence objects, so it is much more memory- and allocation-efficient when used in this way (where the entire transformation and use can be iterable/reducible).
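To make the recompute-versus-cache point concrete, here is a small REPL sketch; the `hits` atom and `spy` fn are just illustrative instrumentation I've added, not anything in core:

```clojure
;; Count how many times the transformation actually runs.
(def hits (atom 0))

(defn spy [x]
  (swap! hits inc)
  (inc x))

;; An eduction is re-run on every reduce over it:
(def e (eduction (map spy) [1 2 3]))
(reduce + 0 e)
(reduce + 0 e)
@hits
;; => 6  (three elements, transformed twice)

;; A sequence is computed once, then cached:
(reset! hits 0)
(def s (sequence (map spy) [1 2 3]))
(doall s)
(doall s)
@hits
;; => 3  (cached after the first realization)
```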
> On a related note, could someone please clarify the statement from the
> transducers docs for `sequence`?
>
> ,----[ Docs of sequence at http://clojure.org/transducers ]
> | The resulting sequence elements are incrementally computed. These
> | sequences will consume input incrementally as needed and fully realize
> | intermediate operations. This behavior differs from the equivalent
> | operations on lazy sequences.
> `----
>
> I'm curious about the "fully realize intermediate operations" part.

Transducers are a pull model, pulled from the output. As you need each output element of the sequence returned from sequence, you must fully complete a "step", so there is *no* laziness inside transducer functions. If an expanding transformation (like mapcat) produces a large number of intermediate elements (or, in the worst case, an infinite number), then the transducer will eagerly realize the entire expansion at the point where that first element is needed. That is a case where lazy seqs will probably be better. I think this case is actually relatively rare, though. The new chunked sequence return may actually make this particular case worse too (well, worse in doing more eager bunches of work, better in amortizing when that happens over a large source).

> Does it mean that in a "traditional"
>
>     (mapcat #(range %) (range 10000))
>
> the inner range is also evaluated lazily, but with
>
>     (sequence (mapcat #(range %)) (range 10000))
>
> it is not? It seems so.

Yes, exactly.

> At least dorun-ning these two expressions
> shows that the "traditional" version is more than twice as fast as the
> transducer version. Also, the same seems to hold for
>
>     (eduction (mapcat #(range %)) (range 10000))
>
> which is exactly as fast (or rather slow) as the sequence version.

You might want to re-time these with alpha6 to see the impact of chunking (I'm not sure whether I'd expect it to be better or worse, actually.)
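A quick way to see that eagerness, using the worst case mentioned above, an infinite inner expansion:

```clojure
;; Lazy-seq mapcat: the inner expansion stays lazy, so an infinite
;; inner seq is fine as long as you only consume a prefix:
(first (mapcat (fn [_] (range)) [1 2 3]))
;; => 0

;; Transducer mapcat: producing even the first output element tries to
;; fully realize the current inner expansion, so this never returns
;; (don't evaluate it):
;; (first (sequence (mapcat (fn [_] (range))) [1 2 3]))
```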
:)

> But wouldn't that mean that transducers with mapcat, where the mapcatted
> function isn't super-cheap, are a bad idea in general, at least from a
> performance POV?

It depends on the use case. As a wise man named Rich once told me, "Why guess when you can measure?" :)

> Bye,
> Tassilo

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en