Re: Performance of seq on empty collections

Alan Mon, 15 Nov 2010 13:41:57 -0800

Yes, the API *does* suggest using seq to check for emptiness.
(empty? x) is implemented as (not (seq x)), so you won't ever get
improved performance by using empty?: at best you break even, and most
of the time you lose. For example:


(if (empty? x)
  :empty       ; empty branch
  :not-empty)  ; not-empty branch

can be replaced with:

(if (seq x)
  :not-empty   ; not-empty branch
  :empty)      ; empty branch

This is usually more readable, since the empty case tends to be less
interesting. Further, you usually have to seq the object anyway if
it's not empty, so the seq call becomes free except in the case where
the collection is in fact empty:
(if-let [x (seq x)]
  (first x) ; do stuff with x, e.g. take its first element
  nil)      ; give up

compared to:
(if (not (seq x)) ; this is what empty? does
  nil ; give up
  (let [useful-var (seq x)]
    (first useful-var))) ; do stuff with the useful, seq'd version of x

Of course performance isn't usually the main driver, so if you feel
empty? really is more expressive in your case, go for it. But the OP
seems to care about performance, and suggesting empty? is off the
mark.
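
To make the comparison concrete, here is a minimal sketch of both
patterns as complete functions. The names sum-with-seq and
sum-with-empty are made up for illustration, and the comments assume
empty? is defined as (not (seq coll)), as described above:

```clojure
;; Hypothetical helper names, for illustration only.

;; seq once per step and reuse the seq'd value in the non-empty branch.
(defn sum-with-seq [coll]
  (loop [xs coll, acc 0]
    (if-let [s (seq xs)]
      (recur (rest s) (+ acc (first s)))
      acc)))

;; Assuming empty? is (not (seq c)), each step seqs c for the test,
;; and first/rest then have to seq it again in the non-empty branch.
(defn sum-with-empty [coll]
  (loop [c coll, acc 0]
    (if (empty? c)
      acc
      (recur (rest c) (+ acc (first c))))))

(sum-with-seq [1 2 3])   ; => 6
(sum-with-empty [1 2 3]) ; => 6
(sum-with-seq [])        ; => 0
```

Both return the same results; the difference is only in how many
times the collection gets seq'd along the way.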

On Nov 14, 11:42 am, David Sletten <da...@bosatsu.net> wrote:
> On Nov 14, 2010, at 2:16 PM, Eric Kobrin wrote:
>
> > In the API it is suggested to use `seq` to check if coll is empty.
>
> Your timing results raise some interesting questions, however, the API 
> doesn't suggest using 'seq' to check if a collection is empty. That's what 
> 'empty?' is for. The documentation note suggests (for style purposes 
> apparently) that you use 'seq' to test that the collection is not empty. So 
> to be precise you are testing two different things below. For instance, 
> (identical? coll []) is true when coll is an empty vector. (seq coll) is true 
> when coll is not empty. The correct equivalent would be to test (empty? coll).
>
> Of course, this doesn't change the results. I get similar timings with empty?:
> user=> (let [iterations 100000000]
>          (time (dotimes [_ iterations] (identical? [] [])))
>          (time (dotimes [_ iterations] (empty? []))))
> "Elapsed time: 2.294 msecs"
> "Elapsed time: 2191.256 msecs"
> nil
> user=> (let [iterations 100000000]
>          (time (dotimes [_ iterations] (identical? "" "")))
>          (time (dotimes [_ iterations] (empty? ""))))
> "Elapsed time: 2.657 msecs"
> "Elapsed time: 4654.622 msecs"
> nil
> user=> (let [iterations 100000000]
>          (time (dotimes [_ iterations] (identical? () ())))
>          (time (dotimes [_ iterations] (empty? ()))))
> "Elapsed time: 2.608 msecs"
> "Elapsed time: 2144.142 msecs"
> nil
>
> This isn't so surprising though, considering that 'identical?' is the 
> simplest possible test you could try--do two references point to the same 
> object in memory? It can't get any more efficient than that.
>
> Have all good days,
> David Sletten
>
>
>
> > I was working on some code recently and found that my biggest performance
> > bottleneck was calling `seq` to check for emptiness. The calls to
> > `seq` were causing lots of object allocation and taking noticeable CPU
> > time. I switched to using `identical?` to explicitly compare against
> > the empty vector and was rewarded with a drastic reduction in
> > execution time.
>
> > Here are some hasty tests showing just how big the difference can be:
>
> > user=> (let [iterations 100000000] (time (dotimes [_ iterations]
> > (identical? [] []))) (time (dotimes [_ iterations] (seq []))))
> > "Elapsed time: 3.512 msecs"
> > "Elapsed time: 2512.366 msecs"
> > nil
> > user=> (let [iterations 100000000] (time (dotimes [_ iterations]
> > (identical? "" ""))) (time (dotimes [_ iterations] (seq ""))))
> > "Elapsed time: 3.898 msecs"
> > "Elapsed time: 5607.865 msecs"
> > nil
> > user=> (let [iterations 100000000] (time (dotimes [_ iterations]
> > (identical? () ()))) (time (dotimes [_ iterations] (seq ()))))
> > "Elapsed time: 3.768 msecs"
> > "Elapsed time: 2258.095 msecs"
> > nil
>
> > Has any thought been given to providing a faster `empty?` that is not
> > based on seq?
>
> > Thanks,
> > Eric Kobrin
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Clojure" group.
> > To post to this group, send email to clojure@googlegroups.com
> > Note that posts from new members are moderated - please be patient with 
> > your first post.
> > To unsubscribe from this group, send email to
> > clojure+unsubscr...@googlegroups.com
> > For more options, visit this group at
> >http://groups.google.com/group/clojure?hl=en
>
>

