Hi David,

Thanks for the comments and the code rewrite. This is excellent
information. I just tried it out on my own system and got the same
results. This is a really great example of how to optimize Clojure
code. I'm considering using Clojure for some more research-oriented
work where I will need to analyze large chunks of data, and getting
insight like this into how to properly optimize the code is
invaluable.
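
Since the low-level version looks IO bound, one variation that might be
worth trying is to buffer the output and flush once at the end instead of
after every word. What follows is only an untimed sketch under a few
assumptions, not David's code: the namespace name clj-play.mapper-buffered
is made up, *out* is wrapped with clojure.java.io/writer to get a
BufferedWriter, and empty tokens (from blank lines or leading whitespace)
are skipped so the output matches what Python's line.split() would give.

```clojure
(ns clj-play.mapper-buffered          ; hypothetical namespace name
  (:require [clojure.java.io :as io])
  (:gen-class))

(set! *warn-on-reflection* true)

(defn mapper
  "Reads lines from r and writes one \"word<TAB>1\" line per word to out.
  Flushes once after all input has been consumed."
  [^java.io.BufferedReader r ^java.io.Writer out]
  (loop [^String line (.readLine r)]
    (when line
      ;; Trim first so leading whitespace does not produce an empty token;
      ;; the :when guard drops the single \"\" that a blank line yields.
      (doseq [^String word (.split (.trim line) "\\s+")
              :when (not (.isEmpty word))]
        (.write out word)
        (.write out "\t1\n"))
      (recur (.readLine r))))
  (.flush out))

(defn -main
  []
  ;; io/writer wraps *out* in a BufferedWriter (assumption: default *out*).
  (mapper (io/reader *in*) (io/writer *out*)))
```

Because mapper only needs a BufferedReader and a Writer, it can be
checked at the REPL with a java.io.StringReader wrapped in a
BufferedReader and a java.io.StringWriter, without touching stdin or
stdout.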

Thanks a bunch, you guys, for all the help. I really appreciate it, and I
learned quite a bit.

Christopher

On Jul 8, 6:23 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> Here's a very ugly low-level version just to show that it can be done:
>
> (ns clj-play.mapper
>   (:use [clojure.java.io :only [reader]])
>   (:use [clojure.string :only [split]])
>   (:gen-class))
>
> (set! *warn-on-reflection* true)
>
> (defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
>   (loop [^String line (.readLine r)]
>     (when line
>       (doseq [^String word (.split line "\\s+")]
>         (.append out (.concat word "\t1\n"))
>         (.flush out))
>       (recur (.readLine r)))))
>
> (defn -main
>   []
>   (mapper (reader *in*) *out*))
>
> I see that the Python version and the Clojure version are essentially
> identical, ~14.7-14.8s for 20 copies of the text, so this looks like it's
> pretty much IO bound at this point.
>
> David
>
>
>
> On Fri, Jul 8, 2011 at 9:04 PM, David Nolen <dnolen.li...@gmail.com> wrote:
> > Running a program like that with cake run is awful, use AOT:
>
> > (ns clj-play.mapper
> >   (:use [clojure.java.io :only [reader]])
> >   (:use [clojure.string :only [split]])
> >   (:gen-class))
>
> > (defn mapper [lines]
> >   (doseq [line lines]
> >     (doseq [word (split line #"\s+")]
> >            (println (str word "\t1")))))
>
> > (defn -main
> >   []
> >   (mapper (line-seq (reader *in*))))
>
> > Run with something like:
>
> > time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.mapper <
> > input.txt
>
> > I see that this takes around 16s w/ 20 copies of the text. Python is 13s.
> > Use some lower level Java facilities and you'll likely trounce the
> > Python.
>
> > David
>
> > On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:
>
> >> Hi all,
>
> >> I have recently been watching a set of videos from O'Reilly on
> >> MapReduce. The author of the series is using Python for all of the
> >> examples, but, in an effort to use Clojure more, I've been following
> >> along and writing my code in Clojure. When I implemented the mapper
> >> function that he described in both languages, I noticed that the
> >> Python version was running quite a bit faster and I was wondering if
> >> you all could help me understand why that is the case. I've pasted the
> >> code for each solution below. Also, I am using cake to run the Clojure
> >> code so my thoughts are, since it keeps a JVM up and running at all
> >> times, that should remove the JVM startup time from the equation. The
> >> input file that I am using is the Hound of the Baskervilles from
> >> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
> >> I've also noticed that with an even longer text as input
> >> (for example, I copied the text of the input.txt 10 times into a file)
> >> the Clojure code slows significantly more. In some cases I had to just
> >> stop the code with a Ctrl-c. Any ideas you all have on what could be
> >> causing this would be great. I'm not trying to start any battles
> >> between Python and Clojure, as I love them both, I'm strictly trying
> >> to learn how to be a better programmer in Clojure.
>
> >> Thanks ahead of time for any help you all can give.
>
> >> Christopher
>
> >> ;; mapper.clj
>
> >> (use ['clojure.java.io :only '(reader)])
> >> (use ['clojure.string :only '(split)])
>
> >> (defn mapper [lines]
> >>  (doseq [line lines]
> >>    (doseq [word (split line #"\s+")]
> >>      (println (str word "\t1")))))
>
> >> (mapper (line-seq (reader *in*)))
>
> >> I am running the code above with the following command and I get the
> >> output below
>
> >> % time cake run mapper.clj < input.txt
> >> real    0m3.573s
> >> user    0m2.031s
> >> sys     0m1.528s
>
> >> # mapper.py
>
> >> #!/usr/bin/env python
>
> >> import sys
>
> >> def mapper(lines):
> >>    for line in lines:
> >>        words = line.split()
> >>        for word in words:
> >>            print "{0}\t1".format(word)
>
> >> def main():
> >>    mapper(sys.stdin)
>
> >> if __name__ == '__main__':
> >>    main()
>
> >> % time mapper.py < input.txt
> >> real    0m0.661s
> >> user    0m0.105s
> >> sys     0m0.083s
>
> >> --
> >> You received this message because you are subscribed to the Google
> >> Groups "Clojure" group.
> >> To post to this group, send email to clojure@googlegroups.com
> >> Note that posts from new members are moderated - please be patient with
> >> your first post.
> >> To unsubscribe from this group, send email to
> >> clojure+unsubscr...@googlegroups.com
> >> For more options, visit this group at
> >> http://groups.google.com/group/clojure?hl=en
