Thanks very much for the prompt reply! It makes perfect sense. I'll give
it a try.
Grace
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, August 18, 2011 10:03 AM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input format
Grace,
In
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Wednesday, August 17, 2011 9:36 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: question about file input format
>
> Zhixuan,
>
> You'll require two things here, as you've deduced correctly:
>
> Under InputFormat
>
o read the file to memory right? How should I implement the next
function accordingly?
Thanks again,
Grace
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, August 17, 2011 9:36 PM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input format
Zhixuan,
You'll require two things here, as you've deduced correctly:
Under InputFormat
- isSplitable -> False
- getRecordReader -> A simple implementation that reads the whole
file's bytes to an array/your-construct and passes it (as part of
next(), etc.).
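
To illustrate the "one record per file" contract described above, here is a
minimal, dependency-free Java sketch. It is not Hadoop code: WholeFileReader
is a hypothetical stand-in that only mimics the shape of a record reader
(a next() that succeeds once, plus a progress value), so that the control
flow can be run without a cluster. A real implementation would instead
implement Hadoop's RecordReader interface and read the file through the
FileSystem API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical stand-in for a whole-file record reader.
 * It is NOT part of the Hadoop API; it only models the contract:
 * the first call to next() loads the entire file, later calls return false.
 */
final class WholeFileReader {
    private final Path file;
    private boolean processed = false;   // true once the single record was emitted
    private byte[] value = new byte[0];  // holds the whole file's bytes

    WholeFileReader(Path file) {
        this.file = file;
    }

    /** Returns true exactly once, after reading the whole file into memory. */
    boolean next() throws IOException {
        if (processed) {
            return false;                // no more records for this file
        }
        value = Files.readAllBytes(file);
        processed = true;
        return true;
    }

    /** The bytes of the single record (empty until next() has succeeded). */
    byte[] getValue() {
        return value;
    }

    /** Progress is all-or-nothing because there is only one record. */
    float getProgress() {
        return processed ? 1.0f : 0.0f;
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file to act as one input file.
        Path tmp = Files.createTempFile("whole-file", ".txt");
        Files.write(tmp, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));

        WholeFileReader reader = new WholeFileReader(tmp);
        int records = 0;
        while (reader.next()) {          // the loop body runs once per file
            records++;
            System.out.println("record bytes: " + reader.getValue().length);
        }
        System.out.println("records: " + records);
        Files.delete(tmp);
    }
}
```

The key design point is the processed flag: since the file is never split,
the reader hands the whole content to the mapper as a single value and then
reports that it has no more records.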
For example, here's a simple record re
What file format do you want to use?
If it's Text or SequenceFile, or any other existing derivative of
FileInputFormat, just override isSplitable to return false and rely on the
existing RecordReader.
Arun
On Aug 17, 2011, at 3:58 PM, Zhixuan Zhu wrote:
> I'm new Hadoop and currently using Hadoop 0.20.2 to tr
I'm new to Hadoop and currently using Hadoop 0.20.2 to try out some simple
tasks. I'm trying to send each whole file of the input directory to the
mapper without splitting them line by line. How should I set the input
format class? I know I could derive a customized FileInputFormat class
and override