The only thing that comes immediately to mind is to write your own custom
input format that knows how to tell where the boundaries are in your data
set, and uses those to specify the beginning and end of the input splits.
You can also tell the framework not to split your individual input files by
overriding isSplitable() in your InputFormat to return false.
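For example, something along these lines (just a rough sketch against the new
org.apache.hadoop.mapreduce API; the class name WholeFileInputFormat and the
NullWritable/BytesWritable types are my own choices for illustration, not
anything standard):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  // Returning false here is what tells the framework not to split
  // individual files: each file becomes exactly one input split.
  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }

  // Presents the entire file as a single key-value pair.
  public static class WholeFileRecordReader
      extends RecordReader<NullWritable, BytesWritable> {

    private FileSplit fileSplit;
    private Configuration conf;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
      this.fileSplit = (FileSplit) split;
      this.conf = context.getConfiguration();
    }

    @Override
    public boolean nextKeyValue() throws IOException {
      if (processed) {
        return false;
      }
      // Read the whole file into the value buffer.
      byte[] contents = new byte[(int) fileSplit.getLength()];
      Path file = fileSplit.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }

    @Override
    public NullWritable getCurrentKey() {
      return NullWritable.get();
    }

    @Override
    public BytesWritable getCurrentValue() {
      return value;
    }

    @Override
    public float getProgress() {
      return processed ? 1.0f : 0.0f;
    }

    @Override
    public void close() {
      // Nothing to close; the stream is closed in nextKeyValue().
    }
  }
}

One caveat: slurping a several-hundred-megabyte file into memory like this
means your map tasks need a correspondingly large heap, so for values at the
top of your size range you would probably want a record reader that streams
instead.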
Hi,
Are there any use cases or examples of using Hadoop MapReduce on non-text
data? The only examples I see on the web are for text data. Any pointers in
that direction are greatly appreciated. Thanks.
Regards,
Upendra
Hi,
I am trying to use MapReduce with some scientific data. I have key-value
pairs in which the size of the value can range from a few megabytes to
several hundred megabytes. What happens when the size of the value
exceeds the block size? How do I set it up so that each key-value pair is
associated with a single map task?
You can use Amazon Elastic MapReduce. It is a hosted Hadoop service which
charges by the hour. Here is more information:
http://aws.amazon.com/elasticmapreduce/
There are also a bunch of sample applications and tutorials available here:
http://developer.amazonwebservices.com/connect/kbcategory.
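If you would rather script it than use the web console, a job flow can also
be started programmatically with the AWS SDK for Java. A rough sketch (the
bucket names, jar path, credentials, and instance settings below are all
made-up placeholders):

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;

public class StartJobFlow {
  public static void main(String[] args) {
    // Placeholder credentials -- substitute your own access keys.
    AmazonElasticMapReduce emr = new AmazonElasticMapReduceClient(
        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

    // One Hadoop step: run a job jar you have uploaded to S3.
    StepConfig step = new StepConfig()
        .withName("my-etl-step")
        .withHadoopJarStep(new HadoopJarStepConfig()
            .withJar("s3://my-bucket/my-job.jar")      // placeholder path
            .withArgs("s3://my-bucket/input", "s3://my-bucket/output"));

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("etl-prototype")
        .withLogUri("s3://my-bucket/logs")
        .withSteps(step)
        .withInstances(new JobFlowInstancesConfig()
            .withInstanceCount(4)                      // 1 master + 3 slaves
            .withMasterInstanceType("m1.small")
            .withSlaveInstanceType("m1.small"));

    RunJobFlowResult result = emr.runJobFlow(request);
    System.out.println("Started job flow: " + result.getJobFlowId());
  }
}

The cluster is billed per instance-hour while it runs; by default the job
flow terminates once its steps finish, so you only pay for the demo itself.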
Amazon EC2 will charge you by the hour, so I think it will fit your
requirement.
Jeff Zhang
On Thu, Nov 26, 2009 at 1:42 PM, Palikala, Rajendra (CCL) <
rpalik...@carnival.com> wrote:
>
> Hi,
>
> I am planning to develop some prototypes on Hadoop for ETL to a data
> warehouse. But I don't have
Hi,
I am planning to develop some prototypes on Hadoop for ETL to a data warehouse.
But I don't have enough nodes (hardware/computers) to test the performance of
Hadoop, and I want to give a demo of its performance. I have heard that Amazon
Web Services provides something like this, but I am not sure how to get
started with it.