On Jul 22, 2:23 pm, Casey Webster <casey...@gmail.com> wrote:
> I can't answer your question, but I would like to better understand the
> problem you are trying to solve. The Apache Hadoop/MapReduce java
> application isn't really that "large" by modern standards, although it
> is generally run with large heap sizes for performance (-Xmx1024m or
> larger for the mapred.child.java.opts parameter).
>
> MapReduce is designed to do extremely fast parallel data set processing
> on terabytes of data over hundreds of physical nodes; what advantage
> would a pure Python approach have here?
We're always taught that it is a good idea to reduce the number of dependencies for a project. So you could use Disco, or Dumbo, or even Skynet, but then you've introduced another system you have to manage. Each new system has a learning curve, which is lessened if the system is written in an environment/language you already work with and understand. And since much of the cost of environments like Erlang and Java lies in understanding them, any out-of-the-ordinary issue can be a killer. Changing the Java heap size, as you mentioned above, is exactly the kind of issue a non-Java developer trying to use Hadoop might run into.

With the advent of cloud computing and the new semi-structured/document databases coming to the fore, sometimes you need to use MapReduce on smaller/fewer systems to the same effect. Also, you may only need to ensure that a job is done in a timely fashion without taking up too many resources, rather than lightning-fast. In these situations Dumbo/Disco may be considered overkill. Implementations like BashReduce <http://blog.last.fm/2009/04/06/mapreduce-bash-script> are perfect for such scenarios.

I'm simply wondering if there's another simpler/smaller implementation of MapReduce that plays nicely with Python but doesn't require the setup/knowledge overhead of more "robust" implementations such as Hadoop and Disco... maybe similar to Ruby's Skynet.

-- 
http://mail.python.org/mailman/listinfo/python-list
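For a sense of how small a "simpler/smaller" MapReduce can be when a job only needs to finish in a timely fashion on one machine, here is a minimal sketch using only the standard library. It is hypothetical and is not how Disco, Dumbo, Skynet, or BashReduce work. It assumes the whole input fits in memory on a single host, runs the map and reduce phases in parallel with `multiprocessing.Pool`, and does the shuffle (grouping values by key) in the parent process. The names `mapreduce`, `map_words`, and `reduce_counts` are illustrative only.

```python
from collections import defaultdict
from itertools import chain
from multiprocessing import Pool


def map_words(line):
    """Map step: emit (word, 1) for every whitespace-separated word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(item):
    """Reduce step: sum all the counts collected for one key."""
    key, values = item
    return key, sum(values)


def mapreduce(inputs, mapper, reducer, processes=4):
    """Map inputs in parallel, group emitted pairs by key, then reduce each group in parallel."""
    with Pool(processes) as pool:
        # Map phase: each worker returns a list of (key, value) pairs for one input.
        mapped = pool.map(mapper, inputs)
        # Shuffle phase (in the parent): collect every value under its key.
        groups = defaultdict(list)
        for key, value in chain.from_iterable(mapped):
            groups[key].append(value)
        # Reduce phase: each worker turns one (key, [values]) group into (key, result).
        return dict(pool.map(reducer, groups.items()))


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The end"]
    print(mapreduce(lines, map_words, reduce_counts))
    # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1, 'end': 1}
```

The mapper and reducer must be module-level functions so `multiprocessing` can pickle them. Scaling past one host would need a real job distributor, which is where the setup overhead of Hadoop or Disco comes back in.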