Running a program like that with cake run is awful; use AOT compilation instead:

(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1")))))

(defn -main []
  (mapper (line-seq (reader *in*))))

Run with something like:

time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.mapper < input.txt

I see that this takes around 16s with 20 copies of the text. Python takes 13s. Use some lower-level Java facilities and you'll likely trounce the Python.

David

On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:
> Hi all,
>
> I have recently been watching a set of videos from O'Reilly on
> MapReduce. The author of the series is using Python for all of the
> examples, but, in an effort to use Clojure more, I've been following
> along and writing my code in Clojure. When I implemented the mapper
> function that he described in both languages, I noticed that the
> Python version was running quite a bit faster and I was wondering if
> you all could help me understand why that is the case. I've pasted the
> code for each solution below. Also, I am using cake to run the Clojure
> code so my thoughts are, since it keeps a JVM up and running at all
> times, that should remove the JVM startup time from the equation. The
> input file that I am using is the Hound of the Baskervilles from
> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
> I've also noticed that with an even longer text as input
> (for example, I copied the text of the input.txt 10 times into a file)
> the Clojure code slows significantly more. In some cases I had to just
> stop the code with a Ctrl-c. Any ideas you all have on what could be
> causing this would be great. I'm not trying to start any battles
> between Python and Clojure, as I love them both; I'm strictly trying
> to learn how to be a better programmer in Clojure.
>
> Thanks ahead of time for any help you all can give.
>
> Christopher
>
> ;; mapper.clj
>
> (use ['clojure.java.io :only '(reader)])
> (use ['clojure.string :only '(split)])
>
> (defn mapper [lines]
>   (doseq [line lines]
>     (doseq [word (split line #"\s+")]
>       (println (str word "\t1")))))
>
> (mapper (line-seq (reader *in*)))
>
>
> I am running the code above with the following command and I get the
> output below
>
> % time cake run mapper.clj < input.txt
> real 0m3.573s
> user 0m2.031s
> sys 0m1.528s
>
>
> # mapper.py
>
> #!/usr/bin/env python
>
> import sys
>
> def mapper(lines):
>     for line in lines:
>         words = line.split()
>         for word in words:
>             print "{0}\t1".format(word)
>
> def main():
>     mapper(sys.stdin)
>
> if __name__ == '__main__':
>     main()
>
> % time mapper.py < input.txt
> real 0m0.661s
> user 0m0.105s
> sys 0m0.083s
>
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
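
On the "lower-level Java facilities" suggestion in the reply: the following is only a minimal sketch of one way that could look, not David's code and not a measured result. It assumes a hypothetical namespace clj-play.fast-mapper. It reads through java.io.BufferedReader, splits with java.util.StringTokenizer instead of a regex, and writes through java.io.BufferedWriter with a single flush at the end (println, as far as I know, flushes on every newline by default). The type hints are there to avoid reflective method calls.

```clojure
(ns clj-play.fast-mapper ; hypothetical namespace, chosen for illustration
  (:import [java.io BufferedReader BufferedWriter InputStreamReader
            OutputStreamWriter Reader Writer]
           [java.util StringTokenizer])
  (:gen-class))

(defn mapper
  "Writes each whitespace-separated token read from rdr to wtr,
  followed by a tab, the digit 1, and a newline."
  [^Reader rdr ^Writer wtr]
  (let [br (BufferedReader. rdr)
        bw (BufferedWriter. wtr)]
    (loop [line (.readLine br)]
      (when line
        ;; StringTokenizer splits on space, tab, newline, CR and form feed
        ;; by default and never returns empty tokens.
        (let [tokens (StringTokenizer. line)]
          (while (.hasMoreTokens tokens)
            (.write bw (.nextToken tokens))
            ;; Hard-coded "\n" rather than the platform line separator.
            (.write bw "\t1\n")))
        (recur (.readLine br))))
    ;; Flush once at the end instead of once per line.
    (.flush bw)))

(defn -main [& _]
  (mapper (InputStreamReader. System/in)
          (OutputStreamWriter. System/out)))
```

AOT-compiled the same way as the version above, the class would be clj_play.fast_mapper. One behavioral difference to be aware of: because StringTokenizer skips empty tokens, blank lines produce no output, which is closer to Python's line.split() than to (split line #"\s+"), which returns a single empty string for an empty line.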