Re: how to set one map task for each input key-value pair

2009-11-26 Thread Jason Venner
The only thing that comes immediately to mind is to write your own custom input format that knows how to tell where the boundaries are in your data set, and uses those to specify the beginning and end of the input splits. You can also tell the framework not to split your individual input files by
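A minimal sketch of the boundary-aware split idea from the reply above, written in plain Python rather than against the Hadoop API. It assumes a hypothetical length-prefixed record layout (4-byte big-endian key length, 4-byte big-endian value length, then the key and value bytes). The names `write_records`, `compute_splits`, and `read_split` are illustrative only and are not part of Hadoop. Each split covers exactly one record, which is the property a custom input format would need in order to hand one key-value pair to each map task.

```python
import io
import struct

# Assumed record header: key length and value length, both unsigned 32-bit big-endian.
HEADER = struct.Struct(">II")


def write_records(stream, records):
    """Write (key, value) byte pairs using the assumed length-prefixed layout."""
    for key, value in records:
        stream.write(HEADER.pack(len(key), len(value)))
        stream.write(key)
        stream.write(value)


def compute_splits(stream):
    """Return one (offset, length) split per record, aligned to record boundaries."""
    splits = []
    stream.seek(0)
    while True:
        offset = stream.tell()
        header = stream.read(HEADER.size)
        if not header:
            break  # clean end of data
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        key_len, value_len = HEADER.unpack(header)
        length = HEADER.size + key_len + value_len
        # Skip over the payload without reading it; only the boundaries matter here.
        stream.seek(offset + length)
        splits.append((offset, length))
    return splits


def read_split(stream, split):
    """Read back the single (key, value) pair that a split covers."""
    offset, _length = split
    stream.seek(offset)
    key_len, value_len = HEADER.unpack(stream.read(HEADER.size))
    key = stream.read(key_len)
    value = stream.read(value_len)
    return key, value


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records(buf, [(b"a", b"x" * 10), (b"bb", b"y" * 3)])
    for split in compute_splits(buf):
        key, value = read_split(buf, split)
        print(split, key, len(value))
```

In Hadoop itself, the analogous logic would live in a custom input format's split computation. The sketch only shows how record boundaries, rather than block boundaries, can define where each split begins and ends.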

mapreduce with non-text data

2009-11-26 Thread Upendra Dadi
Hi, Are there any use cases or examples of Hadoop MapReduce being used for non-text data? The only examples I see on the web are for text data. Any pointers in that direction are greatly appreciated. Thanks. Regards, Upendra

how to set one map task for each input key-value pair

2009-11-26 Thread Upendra Dadi
Hi, I am trying to use MapReduce with some scientific data. I have key-value pairs such that the size of the value can range from a few megabytes to several hundred megabytes. What happens when the size of the value exceeds the block size? How do I set it up so that each key-value pair is assoc

RE: Environment for Hadoop Proto Typing - Amazon Web Services

2009-11-26 Thread Sirota, Peter
You can use Amazon Elastic MapReduce. It is a hosted Hadoop service which charges by the hour. Here is more information: http://aws.amazon.com/elasticmapreduce/ There are also a bunch of sample applications and tutorials available here: http://developer.amazonwebservices.com/connect/kbcategory.

Re: Environment for Hadoop Proto Typing - Amazon Web Services

2009-11-26 Thread Jeff Zhang
Amazon EC2 charges by the hour, so I think it will fit your requirement. Jeff Zhang On Thu, Nov 26, 2009 at 1:42 PM, Palikala, Rajendra (CCL) < rpalik...@carnival.com> wrote: > > Hi, > > I am planning to develop some prototypes on Hadoop for ETL to a > data warehouse. But I don't have

Environment for Hadoop Proto Typing - Amazon Web Services

2009-11-26 Thread Palikala, Rajendra (CCL)
Hi, I am planning to develop some prototypes on Hadoop for ETL to a data warehouse. But I don't have enough nodes (hardware/computers) to test the performance of Hadoop. I want to give a demo on performance. I have heard that Amazon Web Services provides services like this. But I am not