Hi David,

Thanks for the comments and the code rewrite. This is excellent information. I just tried it out on my own system and got the same results. This is a really great example of how to optimize Clojure code. I'm considering using Clojure for some more research-oriented work where I will need to analyze large chunks of data, and insight like this into how to properly optimize the code is invaluable.

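For anyone reading along later, here is a minimal sketch of one more variation on the low-level version quoted below. It is an assumption layered on top of David's code, not something from his post: it wraps the output in a java.io.BufferedWriter and flushes once at the end instead of after every word, and it skips the empty string that .split returns for blank lines. The namespace name clj-play.mapper-buffered is made up for illustration, and this variant has not been timed against the numbers in this thread.

```clojure
(ns clj-play.mapper-buffered            ; hypothetical namespace name
  (:require [clojure.java.io :as io])
  (:import (java.io BufferedReader BufferedWriter))
  (:gen-class))

(set! *warn-on-reflection* true)

(defn mapper
  "Reads lines from r and writes one \"word<TAB>1\" record per word to out.
  Flushes a single time after all input is consumed (assumption: the
  consumer does not need each record immediately)."
  [^BufferedReader r ^BufferedWriter out]
  (loop [^String line (.readLine r)]
    (when line
      (doseq [^String word (.split line "\\s+")]
        ;; .split yields an empty string for blank lines and for
        ;; leading whitespace, so skip those tokens.
        (when-not (.isEmpty word)
          (.write out word)
          (.write out "\t1\n")))
      (recur (.readLine r))))
  (.flush out))

(defn -main
  []
  ;; io/reader and io/writer wrap *in* and *out* in buffered streams.
  (mapper (io/reader *in*) (io/writer *out*)))
```

It can be AOT-compiled and run the same way as the version below, or exercised at the REPL by passing a BufferedReader over a java.io.StringReader and a BufferedWriter over a java.io.StringWriter.
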
Thanks a bunch, you guys, for all the help. I really appreciate it, and I learned quite a bit.

Christopher

On Jul 8, 6:23 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> Here's a very ugly low-level version just to show that it can be done:
>
> (ns clj-play.mapper
>   (:use [clojure.java.io :only [reader]])
>   (:use [clojure.string :only [split]])
>   (:gen-class))
>
> (set! *warn-on-reflection* true)
>
> (defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
>   (loop [^String line (.readLine r)]
>     (when line
>       (doseq [^String word (.split line "\\s+")]
>         (.append out (.concat word "\t1\n"))
>         (.flush out))
>       (recur (.readLine r)))))
>
> (defn -main
>   []
>   (mapper (reader *in*) *out*))
>
> I see that the Python version and the Clojure version are identical ~14.7-8s
> for 20 copies of the text so this looks like it's pretty much IO bound at
> this point.
>
> David
>
> On Fri, Jul 8, 2011 at 9:04 PM, David Nolen <dnolen.li...@gmail.com> wrote:
> > Running a program like that with cake run is awful, use AOT:
> >
> > (ns clj-play.mapper
> >   (:use [clojure.java.io :only [reader]])
> >   (:use [clojure.string :only [split]])
> >   (:gen-class))
> >
> > (defn mapper [lines]
> >   (doseq [line lines]
> >     (doseq [word (split line #"\s+")]
> >       (println (str word "\t1")))))
> >
> > (defn -main
> >   []
> >   (mapper (line-seq (reader *in*))))
> >
> > Run with something like:
> >
> > time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar foo.mapper < input.txt
> >
> > I see that this takes around 16s w/ 20 copies of the text. Python is 13
> > seconds. Use some lower level Java facilities and you'll likely trounce the
> > Python.
> >
> > David
> >
> > On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I have recently been watching a set of videos from O'Reilly on
> >> MapReduce. The author of the series is using Python for all of the
> >> examples, but, in an effort to use Clojure more, I've been following
> >> along and writing my code in Clojure. When I implemented the mapper
> >> function that he described in both languages, I noticed that the
> >> Python version was running quite a bit faster and I was wondering if
> >> you all could help me understand why that is the case. I've pasted the
> >> code for each solution below. Also, I am using cake to run the Clojure
> >> code so my thoughts are, since it keeps a JVM up and running at all
> >> times, that should remove the JVM startup time from the equation. The
> >> input file that I am using is the Hound of the Baskervilles from
> >> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
> >> I've also noticed that with an even longer text as input
> >> (for example, I copied the text of the input.txt 10 times into a file)
> >> the Clojure code slows significantly more. In some cases I had to just
> >> stop the code with a Ctrl-c. Any ideas you all have on what could be
> >> causing this would be great. I'm not trying to start any battles
> >> between Python and Clojure, as I love them both, I'm strictly trying
> >> to learn how to be a better programmer in Clojure.
> >>
> >> Thanks ahead of time for any help you all can give.
> >>
> >> Christopher
> >>
> >> ;; mapper.clj
> >>
> >> (use ['clojure.java.io :only '(reader)])
> >> (use ['clojure.string :only '(split)])
> >>
> >> (defn mapper [lines]
> >>   (doseq [line lines]
> >>     (doseq [word (split line #"\s+")]
> >>       (println (str word "\t1")))))
> >>
> >> (mapper (line-seq (reader *in*)))
> >>
> >> I am running the code above with the following command and I get the
> >> output below
> >>
> >> % time cake run mapper.clj < input.txt
> >> real 0m3.573s
> >> user 0m2.031s
> >> sys 0m1.528s
> >>
> >> # mapper.py
> >>
> >> #!/usr/bin/env python
> >>
> >> import sys
> >>
> >> def mapper(lines):
> >>     for line in lines:
> >>         words = line.split()
> >>         for word in words:
> >>             print "{0}\t1".format(word)
> >>
> >> def main():
> >>     mapper(sys.stdin)
> >>
> >> if __name__ == '__main__':
> >>     main()
> >>
> >> % time mapper.py < input.txt
> >> real 0m0.661s
> >> user 0m0.105s
> >> sys 0m0.083s

--
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en