Here's a very ugly low-level version just to show that it can be done:

(ns clj-play.mapper
  (:use [clojure.java.io :only [reader]])
  (:use [clojure.string :only [split]])
  (:gen-class))

(set! *warn-on-reflection* true)

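;; Type hints keep the interop calls below free of reflection.
(defn mapper [^java.io.BufferedReader r ^java.io.OutputStreamWriter out]
  (loop [^String line (.readLine r)]
    (when line ; .readLine returns nil at end of input
      (doseq [^String word (.split line "\\s+")]
        (.append out (.concat word "\t1\n"))
        (.flush out)) ; flushes after every word
      (recur (.readLine r)))))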

(defn -main
  []
  (mapper (reader *in*) *out*))

I see that the Python version and the Clojure version take about the same
time, ~14.7-14.8s for 20 copies of the text, so this looks like it's pretty
much IO bound at this point.
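
For anyone who wants to probe the IO-bound guess, here is a minimal sketch
(Python 3, not the code timed above). It does the same mapping but takes the
input and output streams as parameters and flushes once at the end instead of
after every word. The name map_words and its parameters are made up for
illustration.

```python
import io


def map_words(lines, out):
    """Write "<word>\t1" for every whitespace-separated word in lines."""
    write = out.write  # hoist the attribute lookup out of the inner loop
    for line in lines:
        for word in line.split():
            write(word + "\t1\n")
    out.flush()  # one flush at the end rather than one per word


# Small in-memory demonstration; `demo` is a stand-in for a real output stream.
demo = io.StringIO()
map_words(["the hound  of\n", "\n", "baskervilles\n"], demo)
```

Calling map_words(sys.stdin, sys.stdout) instead gives the command-line
behaviour; timing that against a variant that flushes per word on the same
input would give a rough idea of how much of the runtime is output overhead.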

David

On Fri, Jul 8, 2011 at 9:04 PM, David Nolen <dnolen.li...@gmail.com> wrote:

> Running a program like that with cake run is awful; use AOT compilation:
>
> (ns clj-play.mapper
>   (:use [clojure.java.io :only [reader]])
>   (:use [clojure.string :only [split]])
>   (:gen-class))
>
> (defn mapper [lines]
>   (doseq [line lines]
>     (doseq [word (split line #"\s+")]
>            (println (str word "\t1")))))
>
> (defn -main
>   []
>   (mapper (line-seq (reader *in*))))
>
> Run with something like:
>
> time java -server -cp ./classes:lib/clojure-1.3.0-beta1.jar clj_play.mapper < input.txt
>
> I see that this takes around 16s w/ 20 copies of the text. Python takes
> 13s. Use some lower-level Java facilities and you'll likely trounce the
> Python.
>
> David
>
> On Fri, Jul 8, 2011 at 7:05 PM, Christopher <vth...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have recently been watching a set of videos from O'Reilly on
>> MapReduce. The author of the series is using Python for all of the
>> examples, but, in an effort to use Clojure more, I've been following
>> along and writing my code in Clojure. When I implemented the mapper
>> function that he described in both languages, I noticed that the
>> Python version was running quite a bit faster and I was wondering if
>> you all could help me understand why that is the case. I've pasted the
>> code for each solution below. Also, I am using cake to run the Clojure
>> code so my thoughts are, since it keeps a JVM up and running at all
>> times, that should remove the JVM startup time from the equation. The
>> input file that I am using is The Hound of the Baskervilles from
>> Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
>> I've also noticed that with an even longer text as input
>> (for example, I copied the text of the input.txt 10 times into a file)
>> the Clojure code slows significantly more. In some cases I had to just
>> stop the code with a Ctrl-c. Any ideas you all have on what could be
>> causing this would be great. I'm not trying to start any battles
>> between Python and Clojure, as I love them both, I'm strictly trying
>> to learn how to be a better programmer in Clojure.
>>
>> Thanks ahead of time for any help you all can give.
>>
>> Christopher
>>
>> ;; mapper.clj
>>
>> (use ['clojure.java.io :only '(reader)])
>> (use ['clojure.string :only '(split)])
>>
>> (defn mapper [lines]
>>  (doseq [line lines]
>>    (doseq [word (split line #"\s+")]
>>      (println (str word "\t1")))))
>>
>> (mapper (line-seq (reader *in*)))
>>
>>
>> I am running the code above with the following command and I get the
>> output below
>>
>> % time cake run mapper.clj < input.txt
>> real    0m3.573s
>> user    0m2.031s
>> sys     0m1.528s
>>
>>
>> # mapper.py
>>
>> #!/usr/bin/env python
>>
>> import sys
>>
>> def mapper(lines):
>>    for line in lines:
>>        words = line.split()
>>        for word in words:
>>            print "{0}\t1".format(word)
>>
>> def main():
>>    mapper(sys.stdin)
>>
>> if __name__ == '__main__':
>>    main()
>>
>> % time mapper.py < input.txt
>> real    0m0.661s
>> user    0m0.105s
>> sys     0m0.083s
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clojure@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+unsubscr...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>
>
>
