Running a program like that with cake run is awful; compile it ahead of time (AOT) instead:
(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1")))))

(defn -main
  []
  (mapper (line-seq (reader *in*))))
Run with something like:
time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.mapper < input.txt
With 20 copies of the text as input, this takes around 16s for me; the
Python version takes 13s. Use some lower-level Java facilities and you'll
likely trounce the Python.
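
A rough sketch of what "lower-level Java facilities" could look like, not
something I have timed: read with a BufferedReader, tokenize with
java.util.StringTokenizer, and write through a BufferedWriter so the output
is not flushed on every println. The namespace name clj-play.fast-mapper is
made up for illustration. Note that StringTokenizer skips empty tokens, so
blank lines and leading whitespace produce no output (like Python's
line.split(), unlike clojure.string/split).

```clojure
(ns clj-play.fast-mapper  ; hypothetical namespace, for illustration only
  (:import [java.io BufferedReader BufferedWriter
            InputStreamReader OutputStreamWriter]
           [java.util StringTokenizer])
  (:gen-class))

(defn mapper
  "Reads lines from rdr and writes one word<TAB>1 line per token to out."
  [^BufferedReader rdr ^BufferedWriter out]
  (loop [line (.readLine rdr)]
    (when line
      ;; StringTokenizer splits on whitespace and never yields empty tokens.
      (let [tokens (StringTokenizer. ^String line)]
        (while (.hasMoreTokens tokens)
          (.write out ^String (.nextToken tokens))
          (.write out "\t1\n")))
      (recur (.readLine rdr))))
  ;; Flush once at the end instead of after every line.
  (.flush out))

(defn -main []
  (mapper (BufferedReader. (InputStreamReader. System/in))
          (BufferedWriter. (OutputStreamWriter. System/out))))
```

The type hints are there to keep the interop calls from going through
reflection, which would otherwise eat much of the gain.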
David
On Fri, Jul 8, 2011 at 7:05 PM, Christopher <[email protected]> wrote:
> Hi all,
>
> I have recently been watching a set of videos from O'Reilly on
> MapReduce. The author of the series is using Python for all of the
> examples, but, in an effort to use Clojure more, I've been following
> along and writing my code in Clojure. When I implemented the mapper
> function that he described in both languages, I noticed that the
> Python version was running quite a bit faster and I was wondering if
> you all could help me understand why that is the case. I've pasted the
> code for each solution below. Also, I am using cake to run the Clojure
> code so my thoughts are, since it keeps a JVM up and running at all
> times, that should remove the JVM startup time from the equation. The
> input file that I am using is the Hound of the Baskervilles from
> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
> I've also noticed that with an even longer text as input
> (for example, I copied the text of the input.txt 10 times into a file)
> the Clojure code slows significantly more. In some cases I had to just
> stop the code with a Ctrl-c. Any ideas you all have on what could be
> causing this would be great. I'm not trying to start any battles
> between Python and Clojure, as I love them both, I'm strictly trying
> to learn how to be a better programmer in Clojure.
>
> Thanks ahead of time for any help you all can give.
>
> Christopher
>
> ;; mapper.clj
>
> (use ['clojure.java.io :only '(reader)])
> (use ['clojure.string :only '(split)])
>
> (defn mapper [lines]
>   (doseq [line lines]
>     (doseq [word (split line #"\s+")]
>       (println (str word "\t1")))))
>
> (mapper (line-seq (reader *in*)))
>
>
> I am running the code above with the following command and I get the
> output below
>
> % time cake run mapper.clj < input.txt
> real 0m3.573s
> user 0m2.031s
> sys 0m1.528s
>
>
> # mapper.py
>
> #!/usr/bin/env python
>
> import sys
>
> def mapper(lines):
>     for line in lines:
>         words = line.split()
>         for word in words:
>             print "{0}\t1".format(word)
>
> def main():
>     mapper(sys.stdin)
>
> if __name__ == '__main__':
>     main()
>
> % time mapper.py < input.txt
> real 0m0.661s
> user 0m0.105s
> sys 0m0.083s
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to [email protected]
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en