Good question.

Clojure's evaluation semantics dictate that arguments are evaluated 
(computed) *before* the function is called. So (set coll) is computed before 
being passed to `partial`. `partial` receives a function (a value) and 
arguments (also values) and returns a new function that saves those 
original arguments (which happen to be stuffed away in Java final fields).

All three of your filters return a transducer. In all three, the argument 
to `remove` is a function built by `partial`, and the argument to `partial` 
is a set. They are all essentially equivalent and should perform the same. 
(Except that filter-contains3 happens to create the set twice: once in the 
let, whose binding is never used, and once as the arg to partial.) So even 
over a billion-item collection, the set in your examples will be computed 
only once, once, and twice respectively.
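
If you want to check this yourself, here is a rough sketch that counts how 
often the set gets built. `counting-set` and `set-builds` are hypothetical 
helpers I made up for the demonstration; they are not part of your code or 
of clojure.core.

```clojure
;; Counter for how many times a set is constructed (hypothetical helper).
(def set-builds (atom 0))

(defn counting-set
  "Stand-in for `set` that records every call in `set-builds`."
  [coll]
  (swap! set-builds inc)
  (set coll))

;; Same shape as filter-contains2: the set is an argument to `partial`,
;; so it is evaluated once, when the transducer is created.
(defn filter-contains2 [coll]
  (remove (partial contains? (counting-set coll))))

;; Same shape as filter-contains3 as posted: the let binding is unused
;; and the set is built a second time for `partial`.
(defn filter-contains3 [coll]
  (let [coll-as-set (counting-set coll)]
    (remove (partial contains? (counting-set coll)))))

(reset! set-builds 0)
(def result2 (into [] (filter-contains2 [1 2 3]) (range 1000)))
(def builds2 @set-builds)   ; 1, even though 1000 items were filtered

(reset! set-builds 0)
(def result3 (into [] (filter-contains3 [1 2 3]) (range 1000)))
(def builds3 @set-builds)   ; 2, once in the let and once for partial
```

The counts do not depend on the size of the input, because the set is built 
when the transducer is created, not when it is applied to an item.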

Note however that sets *are* functions: calling a set with an argument 
returns that argument if it is in the set, and nil otherwise. As long as the 
set does not contain nil or false, this means you could remove the call to 
partial and shorten to:

(defn filter-contains1 [edn-file]
  (remove (set (read-edn-file edn-file))))
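
A small sketch of the difference between using the set directly and going 
through `contains?`. The `banned` set and the input vector are made-up 
illustration data.

```clojure
;; Made-up data: a set that deliberately includes nil and false.
(def banned #{1 2 nil false})

;; Using the set itself as the predicate. (banned nil) returns nil and
;; (banned false) returns false, both falsey, so those items are kept.
(def via-set
  (into [] (remove banned) [0 1 2 3 nil false]))
;; => [0 3 nil false]

;; Using contains? asks about membership and returns true or false,
;; so nil and false are removed as well.
(def via-contains
  (into [] (remove (partial contains? banned)) [0 1 2 3 nil false]))
;; => [0 3]
```

If your EDN data can never contain nil or false, the two forms behave the 
same and the shorter one is fine.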


Tangentially:
(remove even?)
will be slightly faster than
(remove (fn [i] (even? i)))
because in the first case the var `even?` is dereferenced only once, and 
the value inside the var is passed to `remove` at the outset. In the second 
example the var dereference happens for every single item (though it's very 
cheap). In that respect the second example is equivalent to writing
(remove #'even?)
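
One way to observe when the var is dereferenced is to rebind it after the 
transducer has been created. This is only a sketch: `small?` is a 
hypothetical predicate, and it assumes default compiler settings (with 
direct linking turned on for your own code, the results could differ).

```clojure
;; Hypothetical predicate, used instead of a clojure.core function.
(defn small? [n] (< n 3))

(def by-value (remove small?))               ; var dereferenced once, right here
(def by-fn    (remove (fn [i] (small? i))))  ; var dereferenced on every call
(def by-var   (remove #'small?))             ; var object passed, looked up per call

;; Temporarily swap the root binding, then run all three transducers.
(def results
  (with-redefs [small? (fn [n] (> n 3))]
    [(into [] by-value [1 2 3 4 5])
     (into [] by-fn    [1 2 3 4 5])
     (into [] by-var   [1 2 3 4 5])]))
;; => [[3 4 5] [1 2 3] [1 2 3]]
```

Only `by-value` still uses the original function, because it captured the 
value at the outset; the other two see the redefinition.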

On Tuesday, June 23, 2015 at 6:07:06 PM UTC-4, Sam Raker wrote:
>
> Let's say that, as part of an xf, I want to filter out everything in a 
> sequence that's also in some other sequence. Here are some ways of doing 
> that:
>
> (defn filter-contains1 [edn-file]
>   (remove (partial contains? (set (read-edn-file edn-file)))))
>
> (defn filter-contains2 [coll] (remove (partial contains? (set coll))))
>
> (defn filter-contains3 [coll]
>   (let [coll-as-set (set coll)]
>     (remove (partial contains? (set coll)))))
>
> I have the strong suspicion that `filter-contains3` is the best of the 3, 
> and `filter-contains1` the worst. The internal mechanics of transduce are a 
> bit of a mystery to me, however: if `filter-contains2` were to be used on a 
> collection of, say, a million items, would `coll` be cast to a set a 
> million times, or is Clojure/the JVM smarter than that? I'm also wondering 
> if anyone has any "best practices" (or whatever) they can share relating to 
> this kind of intersection of transducers/xfs and closures. It seems to me, 
> for example, that something like
>
> (defn my-thing [coll & stuff]
>   (let [s (set coll)]
>   ...
>   (comp
>     ...
>    (map foo)
>    (filter bar)
>    (remove (partial contains? s))
>    ...
>
> is awkward, but that a lot of limited-use transducer factory functions 
> (like the ones above) aren't exactly optimal, either.
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en