Hi all,
I have recently been watching a set of videos from O'Reilly on
MapReduce. The author of the series is using Python for all of the
examples, but, in an effort to use Clojure more, I've been following
along and writing my code in Clojure. When I implemented the mapper
function that he described in both languages, I noticed that the
Python version was running quite a bit faster and I was wondering if
you all could help me understand why that is the case. I've pasted the
code for each solution below. I am running the Clojure code with cake;
since cake keeps a JVM up and running at all times, I assumed that
removes the JVM startup time from the equation. The
input file that I am using is The Hound of the Baskervilles from
Project Gutenberg (http://www.gutenberg.org/cache/epub/2852/pg2852.txt).
I've also noticed that with an even longer text as input
(for example, I copied the text of the input.txt 10 times into a file)
the Clojure code slows down even more. In some cases I had to just
stop it with Ctrl-C. Any ideas you all have on what could be
causing this would be great. I'm not trying to start any battles
between Python and Clojure, as I love them both; I'm strictly trying
to learn how to be a better programmer in Clojure.
Thanks ahead of time for any help you all can give.
Christopher
;; mapper.clj
(use '[clojure.java.io :only (reader)])
(use '[clojure.string :only (split)])

(defn mapper [lines]
  (doseq [line lines]
    (doseq [word (split line #"\s+")]
      (println (str word "\t1")))))

(mapper (line-seq (reader *in*)))
I am running the code above with the following command, and I get the
output below:
% time cake run mapper.clj < input.txt
real 0m3.573s
user 0m2.031s
sys 0m1.528s
#!/usr/bin/env python
# mapper.py
import sys

def mapper(lines):
    for line in lines:
        words = line.split()
        for word in words:
            # Emit "<word>\t1"; print(...) with a single argument
            # behaves the same on Python 2 and Python 3.
            print("{0}\t1".format(word))

def main():
    mapper(sys.stdin)

if __name__ == '__main__':
    main()
% time mapper.py < input.txt
real 0m0.661s
user 0m0.105s
sys 0m0.083s
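
For anyone who wants to reproduce the longer input mentioned above (the text of input.txt copied 10 times into one file), here is a minimal sketch of one way to build it. The function name repeat_file and the output name input10.txt are assumptions for illustration only; they are not part of the original setup.

```python
def repeat_file(src_path, dst_path, times=10):
    """Write `times` back-to-back copies of src_path into dst_path.

    Returns the number of bytes written. Reads the whole source into
    memory, which is fine for a single novel-sized text file.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        for _ in range(times):
            dst.write(data)
    return len(data) * times
```

Calling repeat_file("input.txt", "input10.txt") would produce the larger file, which can then be fed to either mapper on stdin in the same way as input.txt.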