Phillip,

We've had great success writing simple, project-specific algorithms to split content into chunks suited to ETL-type, Python-based processing in a hosted cloud environment like Amazon EC2 or the recently launched Rackspace Cloud Servers. Since we purchase our cloud hosting time in 1-hour blocks, we divide our data into much larger chunks than a traditional map-reduce approach would use. For many of our projects, data transfer to and from the cloud accounts for most of the wall-clock time.
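
To illustrate the idea of sizing chunks around a billed hour, here is a minimal sketch. It is not the project code described above. The function name split_into_chunks, the bytes_per_hour throughput estimate, and the fill safety margin are all hypothetical, and it assumes processing time scales roughly with input size.

```python
def split_into_chunks(file_sizes, bytes_per_hour, fill=0.9):
    """Greedily group (name, size) pairs into chunks of about one billed hour.

    bytes_per_hour is an assumed, measured-elsewhere estimate of how much
    input one server processes per hour; fill leaves headroom under it.
    """
    budget = bytes_per_hour * fill
    chunks, current, current_size = [], [], 0
    for name, size in file_sizes:
        # Start a new chunk when adding this file would exceed the budget.
        if current and current_size + size > budget:
            chunks.append(current)
            current, current_size = [], 0
        # A single oversized file still gets a chunk of its own.
        current.append(name)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be shipped to one server. Since transfer time often dominates, a real splitter would probably want to budget for upload and download time as well as processing time.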
Malcolm