On Fri, 25 Jul 2014, Christopher Elwell wrote:

> New to Clojure, how is this function that I wrote? Any suggestions for
> improvement; is it too complicated?

'filter' is a great tool to reach for in many cases, but (as you found)
it's not ideal if you need your test predicate to change as it processes
the sequence. This results in the least idiomatic part of your code:
the use of the atom.

Three typical ways to build up state via one or more accumulators as you
work your way through a seq are:

  (1) reduce, supplying empty collections for your accumulators' bases;
  (2) recursion via loop/recur, introducing the accumulators in the loop
      declaration; or
  (3) a lazy sequence using a similar recursive strategy.

> It filters a sequence, leaving only the first occurrence of each item
> in the seq that has a matching prefix (get-form-id-without-timestamp
> gets just the id prefix).
>
> (defn only-keep-unique-ids [ids]
>   (let [seen-ids (atom #{})
>         filter-fn #(let [raw-form-id (get-form-id-without-timestamp %)
>                          is-unique (not (contains? @seen-ids raw-form-id))]
>                      (do (swap! seen-ids conj raw-form-id)
>                          is-unique))]
>     (filter filter-fn ids)))

Here's an example of strategy #1: 'seen' collects the keys encountered (as
determined by a passed-in function 'f') and is also used as a predicate in
the 'if', while 'ret' builds up a vector containing the full items as they
are encountered.

user> (defn unique-by [f coll]
        (second
         (reduce (fn [[seen ret] val]
                  (let [key (f val)]
                    (if (seen key)
                        [seen ret]
                        [(conj seen key) (conj ret val)])))
                 [#{} []]
                 coll)))
#'user/unique-by


Since reduce is not lazy, this will process the entire sequence immediately
and then return a vector of the results:

user> (defn prefix [st]
        (subs st 0 3))
#'user/prefix
user> (unique-by prefix ["foobar" "foobaz" "wombat" "womble"])
["foobar" "wombat"]
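
As a rough sketch of how this could slot into the original function: the
definition of 'get-form-id-without-timestamp' wasn't posted, so the version
below is a hypothetical stand-in that assumes ids look like
"<prefix>-<timestamp>". The sample ids are made up as well. 'unique-by' is
repeated from above so the snippet runs on its own.

```clojure
;; Hypothetical stand-in for the poster's helper (the real one was not
;; shown). Assumes an id has the shape "<prefix>-<timestamp>".
(defn get-form-id-without-timestamp [id]
  (re-find #"^[^-]+" id))

;; Same reduce-based definition as above.
(defn unique-by [f coll]
  (second
   (reduce (fn [[seen ret] val]
             (let [key (f val)]
               (if (seen key)
                 [seen ret]
                 [(conj seen key) (conj ret val)])))
           [#{} []]
           coll)))

;; The original function reduces to a single call, with no atom needed.
(defn only-keep-unique-ids [ids]
  (unique-by get-form-id-without-timestamp ids))

(only-keep-unique-ids ["a-100" "b-100" "a-200"])
;; => ["a-100" "b-100"]
```

Passing the key function in as an argument keeps the "what counts as a
duplicate" decision separate from the bookkeeping.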


For comparison, here's a lazy implementation (strategy #3):

user> (defn unique-by
        ([f coll]
           ((fn step [f coll seen]
              (lazy-seq
               (when-let [[val & vals] (seq coll)]
                 (let [key (f val)]
                   (if (seen key)
                     (step f vals seen)
                     (cons val (step f vals (conj seen key))))))))
            f coll #{})))
#'user/unique-by
user> (unique-by prefix ["foobar" "foobaz" "wombat" "womble"])
("foobar" "wombat")

Since this is fully lazy, you can use it on infinite sequences...but the
example below will never terminate if you try to take 3 or more items from
the resulting sequence, since the cycle contains only two distinct prefixes!

user> (take 2 (unique-by prefix (cycle ["foobar" "foobaz" "wombat" "womble"])))
("foobar" "wombat")


Note that the version in your message is also lazy, since it relies on 'filter'
to do the processing.

Hope this helps! I'd recommend trying to implement strategy #2 yourself
using the above as models, as this pattern is one that you can expect to
encounter frequently. Clojure idioms tend to favor 'reduce' over 'loop/recur',
but I've found that there are situations where the latter can result in more
readable code.

Paul

P.S.: I saw Ben's suggestion using 'group-by' right before sending this...it's
a great example of how far you can get using just the standard sequence
functions.
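
For reference, here is one way a 'group-by' approach might look. This is my
own sketch and not necessarily what Ben posted; the name 'unique-by-groups'
is made up. It assumes you want the results in first-occurrence order, which
is why it walks the distinct keys rather than calling 'vals' on the map
(map ordering isn't guaranteed once it grows).

```clojure
(defn unique-by-groups [f coll]
  (let [groups (group-by f coll)]     ; key -> vector of items, in encounter order
    (map (comp first groups)          ; first item recorded for each key
         (distinct (map f coll)))))   ; keys in first-occurrence order

(defn prefix [st]
  (subs st 0 3))

(unique-by-groups prefix ["foobar" "foobaz" "wombat" "womble"])
;; => ("foobar" "wombat")
```

Note that 'group-by' is eager, so this won't work on infinite sequences, and
it calls 'f' twice per item.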

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en