Here's a very ugly low-level version just to show that it can be done:

(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

;; Warn about any interop call that still needs reflection.
(set! *warn-on-reflection* true)

;; Read lines straight from the BufferedReader and write through the Writer
;; via Java interop; the type hints let these calls resolve without reflection.
(defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
  (loop [^String line (.readLine r)]
    (when line                          ; .readLine returns nil at EOF
      (doseq [^String word (.split line "\\s+")]
        (.append out (.concat word "\t1\n"))
        (.flush out))                   ; flushes after every word
      (recur (.readLine r)))))

(defn -main []
  (mapper (reader *in*) *out*))

I see that the Python version and the Clojure version both take about 14.7-14.8s for 20 copies of the text, so this looks like it's pretty much IO bound at this point.

David

On Fri, Jul 8, 2011 at 9:04 PM, David Nolen <dnolen.li...@gmail.com> wrote:
> Running a program like that with cake run is awful, use AOT:
>
> (ns clj-play.mapper
>   (:use [clojure.java.io :only [reader]])
>   (:use [clojure.string :only [split]])
>   (:gen-class))
>
> (defn mapper [lines]
>   (doseq [line lines]
>     (doseq [word (split line #"\s+")]
>       (println (str word "\t1")))))
>
> (defn -main
>   []
>   (mapper (line-seq (reader *in*))))
>
> Run with something like:
>
> time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar foo.mapper < input.txt
>
> I see that this takes around 16s w/ 20 copies of the text. Python is 13
> seconds. Use some lower level Java facilities and you'll likely trounce the
> Python.
>
> David
>
> On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have recently been watching a set of videos from O'Reilly on
>> MapReduce. The author of the series is using Python for all of the
>> examples, but, in an effort to use Clojure more, I've been following
>> along and writing my code in Clojure. When I implemented the mapper
>> function that he described in both languages, I noticed that the
>> Python version was running quite a bit faster and I was wondering if
>> you all could help me understand why that is the case. I've pasted the
>> code for each solution below.
>> Also, I am using cake to run the Clojure
>> code, so my thinking is that, since it keeps a JVM up and running at all
>> times, the JVM startup time should be out of the equation. The
>> input file that I am using is The Hound of the Baskervilles from
>> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
>> I've also noticed that with an even longer text as input
>> (for example, I copied the text of input.txt 10 times into a file)
>> the Clojure code slows down significantly more. In some cases I had to
>> just stop the code with Ctrl-C. Any ideas you all have on what could be
>> causing this would be great. I'm not trying to start any battles
>> between Python and Clojure, as I love them both; I'm strictly trying
>> to learn how to be a better programmer in Clojure.
>>
>> Thanks ahead of time for any help you all can give.
>>
>> Christopher
>>
>> ;; mapper.clj
>>
>> (use ['clojure.java.io :only '(reader)])
>> (use ['clojure.string :only '(split)])
>>
>> (defn mapper [lines]
>>   (doseq [line lines]
>>     (doseq [word (split line #"\s+")]
>>       (println (str word "\t1")))))
>>
>> (mapper (line-seq (reader *in*)))
>>
>>
>> I am running the code above with the following command and I get the
>> output below:
>>
>> % time cake run mapper.clj < input.txt
>> real 0m3.573s
>> user 0m2.031s
>> sys 0m1.528s
>>
>>
>> # mapper.py
>>
>> #!/usr/bin/env python
>>
>> import sys
>>
>> def mapper(lines):
>>     for line in lines:
>>         words = line.split()
>>         for word in words:
>>             print "{0}\t1".format(word)
>>
>> def main():
>>     mapper(sys.stdin)
>>
>> if __name__ == '__main__':
>>     main()
>>
>> % time mapper.py < input.txt
>> real 0m0.661s
>> user 0m0.105s
>> sys 0m0.083s
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
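One caveat when comparing the mappers in this thread: they don't tokenize lines the same way. Python's `str.split()` with no argument never returns empty tokens, whereas `clojure.string/split` with `#"\s+"` (and the `.split` interop call in the low-level version) goes through `java.util.regex.Pattern.split` with a limit of zero. That keeps a leading empty string when a line starts with whitespace and returns `[""]` for an empty line, so on a text with blank or indented lines the Clojure mappers print bare `\t1` lines that mapper.py does not. The Python sketch below approximates Java's rules to show the effect. It is an illustration only: `java_style_split`, `python_mapper_lines` and `clojure_mapper_lines` are hypothetical helpers, it assumes ASCII input (where Java's and Python's `\s` agree), and it assumes `line-seq`/`.readLine` have already stripped the line terminators.

```python
import re

# On ASCII text, Python's \s and Java's \s match the same characters.
_WS = re.compile(r"\s+")


def java_style_split(line):
    """Approximate java.util.regex.Pattern.split(line) with limit 0 (hypothetical helper).

    Mimicked rules: if the pattern never matches, the whole line comes back as a
    one-element list (so "" -> [""]); a line starting with whitespace gets a
    leading ""; trailing "" entries are dropped.
    """
    if not _WS.search(line):
        return [line]
    parts = _WS.split(line)
    while parts and parts[-1] == "":
        parts.pop()
    return parts


def python_mapper_lines(lines):
    # Like mapper.py: str.split() with no argument yields no empty tokens.
    return ["{0}\t1".format(word) for line in lines for word in line.split()]


def clojure_mapper_lines(lines):
    # Like the Clojure mappers: split each terminator-free line on #"\s+".
    return ["{0}\t1".format(word) for line in lines for word in java_style_split(line)]


if __name__ == "__main__":
    sample = ["The Hound", "", "  of the Baskervilles"]  # made-up sample lines
    print(len(python_mapper_lines(sample)), len(clojure_mapper_lines(sample)))  # 5 7
```

On those three made-up lines the Python-style split gives five output lines and the Java-style split gives seven; the two extras are bare `\t1` entries from the empty line and the indented line. This doesn't by itself explain the timings above, but it means the two programs are not producing identical output on the Gutenberg file.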