Hi Gary,

Thanks for the correction. That was a spur-of-the-moment statement.

I also found that reusing the JVM needs some care on the Clojure side. For instance, I define an atom in the mapper namespace to use as a counter. It turns out that the next mapper task to run in the same JVM inherits its value, so I need to reset! it in the mapper-setup function.
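Roughly, the reset looks like the sketch below. The namespace, counter, and mapper-map are made-up names for illustration, and the direct calls at the bottom only simulate two tasks sharing one JVM; in the real job mapper-setup is invoked by the Hadoop wiring, which is not shown here.

```clojure
(ns example.mapper) ; hypothetical namespace, for illustration only

;; Namespace-level state lives as long as the JVM does, so with JVM reuse
;; it survives from one mapper task to the next.
(def counter (atom 0))

(defn mapper-setup
  "Assumed to run once at the start of each task; clears leftover state."
  [& _]
  (reset! counter 0))

(defn mapper-map
  "Hypothetical per-record function: counts the records seen by this task."
  [_key _value]
  (swap! counter inc))

;; Simulate two tasks running back to back in one reused JVM.
(mapper-setup)
(dotimes [_ 3] (mapper-map nil nil))
(def first-task-count @counter)

(mapper-setup) ; without this reset the second task would start counting from 3
(dotimes [_ 2] (mapper-map nil nil))
(def second-task-count @counter)
```

Without the reset! in mapper-setup, the second simulated task would finish with a count of 5 instead of 2.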
I also tried :aot :all, but it doesn't help. My guess is that loading the clojure.core namespace accounts for most of the startup time.

The short running time of each mapper task is a real issue. Our Hadoop team seems to cut the input into pieces that are too small, and the number of mapper tasks matches the number of splits. Reusing the JVM or using CombineFileInputFormat may help with this.

After all, Clojure's not slow. I'm happy again~

On Saturday, April 27, 2013 3:58:01 AM UTC+8, Gary Trakhman wrote:
>
> this doesn't quite make sense: "Since Clojure is well-known for its
> concurrency feature, running in the same JVM should be out of question."
>
> All the concurrency features built in to clojure are concerned with things
> that happen in the same process, unless you consider things like 'making it
> easier to use queues' to be those kinds of features that affect
> multi-process stuff.
>
> A concurrency focus doesn't say anything about philosophy of
> processes/threads or anything like that. MapReduce is dictating the
> process model here which goes against the grain of what java's good at
> doing, with the assumption that you'll be working with a *lot* of data and
> won't care. Something that takes 21 seconds isn't optimal use of
> mapreduce. Try it with something that takes a few minutes at least.
>
> However, you can help startup time in a number of ways, AOT compilation
> can help a bit, as well as judicious use of third-party code and keeping
> class dependencies low.
>
>
> On Fri, Apr 26, 2013 at 12:02 PM, Ji Zhang <zhan...@gmail.com> wrote:
>
>> Hi,
>>
>> I believe I can confirm that it's the startup time issue.
>>
>> I set mapred.job.reuse.jvm.num.tasks=-1 and the overall time is
>> approximate to the pure java one. The best mapper task is the same time,
>> 3sec. Before it is 10sec. The lowest task's difference is still 12sec
>> (21sec - 9 sec).
>>
>> Since Clojure is well-known for its concurrency feature, running in the
>> same JVM should be out of question.
>>
>> Besides, my company's hadoop team seems to cut the files into too small
>> pieces, which result in too many mapper tasks. So the reuse is crucial for
>> clojure.
>>
>>
>> On Friday, April 26, 2013 6:05:33 PM UTC+8, Ji Zhang wrote:
>>>
>>> Hi,
>>>
>>> I'm writing map-reduce job with Clojure, yet to find that it seems to be
>>> much slower than a Java job.
>>>
>>> So I write a simple test case, and upload to gist:
>>> https://gist.github.com/jizhang/5466149
>>>
>>> At the end of code, there is execution outputs, here are some
>>> significant stats:
>>>
>>> Average time taken by Map tasks: Java 7sec, Clojure 19sec
>>> CPU time spent (ms): Java 244,000, Clojure 1,145,440
>>>
>>> I'm wondering what slows down the Clojure written map-reduce job. Am I
>>> using it wrong, or it's just an inappropriate scenario.
>>>
>>> Any thoughts will be great. Thanks!
>>>
>>> Jerry
>>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com
>> Note that posts from new members are moderated - please be patient with
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+u...@googlegroups.com
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en