RE: question about file input format

2011-08-18 Thread Zhixuan Zhu
Thanks very much for the prompt reply! It makes perfect sense. I'll give it a try.

Grace

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Thursday, August 18, 2011 10:03 AM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input format

Grace, In

Re: question about file input format

2011-08-18 Thread Harsh J
ailto:ha...@cloudera.com]
> Sent: Wednesday, August 17, 2011 9:36 PM
> To: common-dev@hadoop.apache.org
> Subject: Re: question about file input format
>
> Zhixuan,
>
> You'll require two things here, as you've deduced correctly:
>
> Under InputFormat
>

RE: question about file input format

2011-08-18 Thread Zhixuan Zhu
o read the file into memory, right? How should I implement the next() function accordingly?

Thanks again,
Grace

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, August 17, 2011 9:36 PM
To: common-dev@hadoop.apache.org
Subject: Re: question about file input forma

Re: question about file input format

2011-08-17 Thread Harsh J
Zhixuan,

You'll require two things here, as you've deduced correctly:

Under InputFormat
- isSplitable -> False
- getRecordReader -> A simple implementation that reads the whole file's bytes to an array/your-construct and passes it (as part of next(), etc.).

For example, here's a simple record re
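
The two points in the message above (never split the file, and hand the whole file over in a single next() call) can be sketched roughly as follows. This is a plain-Java stand-in rather than Hadoop code, and it is not the record reader the message goes on to quote: the class name WholeFileReaderSketch, the StringBuilder/ByteArrayOutputStream holders, and the no-argument isSplitable() are all hypothetical. In a real job the same logic would presumably live in a FileInputFormat subclass and its RecordReader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Plain-Java stand-in, NOT the Hadoop API: it only mirrors the contract of a
// record reader whose next() returns true while records remain.
class WholeFileReaderSketch {
    private final Path file;          // the single, unsplit input file
    private boolean consumed = false; // becomes true once the one record is handed out

    WholeFileReaderSketch(Path file) {
        this.file = file;
    }

    // Stand-in for an isSplitable override that returns false:
    // the file is never divided among several readers.
    static boolean isSplitable() {
        return false;
    }

    // Stand-in for next(key, value): the first call fills the holders with the
    // file name and the file's complete bytes; every later call returns false.
    boolean next(StringBuilder key, ByteArrayOutputStream value) throws IOException {
        if (consumed) {
            return false;
        }
        key.setLength(0);
        key.append(file.getFileName().toString());
        value.reset();
        value.write(Files.readAllBytes(file));
        consumed = true;
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Create a small two-line file to read back as one record.
        Path tmp = Files.createTempFile("whole-file", ".txt");
        Files.write(tmp, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));

        WholeFileReaderSketch reader = new WholeFileReaderSketch(tmp);
        StringBuilder key = new StringBuilder();
        ByteArrayOutputStream value = new ByteArrayOutputStream();

        int records = 0;
        while (reader.next(key, value)) {
            records++; // a mapper would receive (key, value) here
        }
        System.out.println(records + " record(s), " + value.size() + " bytes");
        Files.delete(tmp);
    }
}
```

Because next() succeeds exactly once per file, a mapper fed by such a reader would see one call per input file, with the whole content as the value, instead of one call per line.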

Re: question about file input format

2011-08-17 Thread Arun C Murthy
What file format do you want to use? If it's Text or SequenceFile, or any other existing derivative of FileInputFormat, just override isSplitable and rely on the actual RecordReader.

Arun

On Aug 17, 2011, at 3:58 PM, Zhixuan Zhu wrote:
> I'm new to Hadoop and currently using Hadoop 0.20.2 to tr

question about file input format

2011-08-17 Thread Zhixuan Zhu
I'm new to Hadoop and currently using Hadoop 0.20.2 to try out some simple tasks. I'm trying to send each file in the input directory to the mapper whole, without splitting it line by line. How should I set the input format class? I know I could derive a customized FileInputFormat class and override