Running a program like that with cake run is awful; AOT-compile it instead:

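(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  ;; :gen-class emits a named class with a static main when AOT-compiled
  (:gen-class))

;; Print "word<TAB>1" for every whitespace-separated word of every line.
(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1")))))

;; Entry point of the compiled class: read lines from standard input.
(defn -main
  []
  (mapper (line-seq (reader *in*))))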

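After compiling the namespace (for example by evaluating (compile 'clj-play.mapper) with the source directory and ./classes on the classpath), run it with something like: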

time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.mapper < input.txt

With 20 copies of the text concatenated into one input file, this takes around
16 s; the Python version takes about 13 s. Use some lower-level Java facilities
and you'll likely trounce the Python.
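The reply doesn't spell out which lower-level Java facilities it means, so the following is only a minimal sketch under that assumption, not a measured result. One plausible cost in the version above is that println flushes *out* after every line while *flush-on-newline* is true (its default). The sketch therefore reads with BufferedReader.readLine, writes through a java.io.BufferedWriter, and flushes once at the end. It also tokenizes with (re-seq #"\S+" line) instead of split: (split "" #"\s+") returns [""], so the original prints a bare "\t1" for blank lines and for lines with leading whitespace, which Python's str.split() does not do. The namespace clj-play.fast-mapper and the function fast-mapper are hypothetical names.

```clojure
(ns clj-play.fast-mapper
  (:import (java.io BufferedReader BufferedWriter InputStreamReader
                    OutputStreamWriter Writer))
  (:gen-class))

;; Hypothetical helper: write "word\t1\n" for each whitespace-delimited
;; word read from `in`, sending output through `out`.
(defn fast-mapper [^BufferedReader in ^Writer out]
  (loop []
    (when-let [line (.readLine in)]
      ;; re-seq #"\S+" yields no empty tokens, like Python's str.split()
      (doseq [^String word (re-seq #"\S+" line)]
        (.write out word)
        (.write out "\t1\n"))
      (recur)))
  ;; Flush once at the end rather than after every line.
  (.flush out))

;; Entry point: wrap stdin and stdout in buffered java.io streams.
(defn -main [& args]
  (fast-mapper (BufferedReader. (InputStreamReader. System/in))
               (BufferedWriter. (OutputStreamWriter. System/out))))
```

Compiled ahead of time like the version above, it could be run with something like time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.fast_mapper < input.txt (hyphens in namespace names become underscores in class names). Taking the reader and writer as arguments keeps fast-mapper easy to exercise with a StringReader and StringWriter.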

David

On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:

> Hi all,
>
> I have recently been watching a set of videos from O'Reilly on
> MapReduce. The author of the series uses Python for all of the
> examples, but, in an effort to use Clojure more, I've been following
> along and writing my code in Clojure. When I implemented the mapper
> function he described in both languages, I noticed that the
> Python version ran quite a bit faster, and I was wondering if
> you could help me understand why. I've pasted the
> code for each solution below. I am using cake to run the Clojure
> code; since it keeps a JVM running at all times, I assumed that
> JVM startup time would be removed from the equation. The
> input file I am using is The Hound of the Baskervilles from
> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
> I've also noticed that with a longer input (for example, the text
> of input.txt copied 10 times into one file) the Clojure code slows
> down much more than the Python code. In some cases I had to just
> stop it with Ctrl-C. Any ideas on what could be
> causing this would be great. I'm not trying to start any battles
> between Python and Clojure, as I love them both; I'm strictly trying
> to learn how to be a better programmer in Clojure.
>
> Thanks ahead of time for any help you all can give.
>
> Christopher
>
> ;; mapper.clj
>
> (use ['clojure.java.io :only '(reader)])
> (use ['clojure.string :only '(split)])
>
> (defn mapper [lines]
>  (doseq [line lines]
>    (doseq [word (split line #"\s+")]
>      (println (str word "\t1")))))
>
> (mapper (line-seq (reader *in*)))
>
>
> I am running the code above with the following command and get the
> output below:
>
> % time cake run mapper.clj < input.txt
> real    0m3.573s
> user    0m2.031s
> sys     0m1.528s
>
>
> # mapper.py
>
> #!/usr/bin/env python
>
> import sys
>
> def mapper(lines):
>    for line in lines:
>        words = line.split()
>        for word in words:
>            print "{0}\t1".format(word)
>
> def main():
>    mapper(sys.stdin)
>
> if __name__ == '__main__':
>    main()
>
> % time mapper.py < input.txt
> real    0m0.661s
> user    0m0.105s
> sys     0m0.083s
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
