Re: Question about

2012-09-13 Thread Harsh J
MR does not read the files in the front-end (unless a partitioner such as the TotalOrderPartitioner (TOP) demands it). The actual block-level read is done via the DFSClient class (through its helper classes DFSInputStream and DFSOutputStream; the first is likely where your interest lies). All MR cares about is scheduling the d

RE: Question about

2012-09-12 Thread Charles Baker
Hi Vivian. Take a look at TextInputFormat and the RecordReader classes. The input format is set via JobConf.setInputFormat(). -Chuck -Original Message- From: Vivi Lang [mailto:sqlxwei...@gmail.com] Sent: Wednesday, September 12, 2012 5:10 PM To: hdfs-dev@hadoop.apache.org Subject: Question about Hi
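
As a rough illustration of what the reply above points to: TextInputFormat hands each mapper records whose key is the byte offset of a line and whose value is the line's text. The sketch below mimics that record shape in plain Python. It is not Hadoop code; the function name read_records and the sample data are made up for illustration, and input splits are ignored.

```python
import io

def read_records(stream):
    """Yield (byte_offset, line_text) pairs from a binary stream.

    Hypothetical helper: it only imitates the key/value shape that a
    line-oriented record reader produces; it does not handle splits.
    """
    offset = 0
    for raw in stream:
        # The key is the offset at which this line starts; the value is
        # the line with its trailing newline removed.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)

# Made-up sample input standing in for a file stored in HDFS.
records = list(read_records(io.BytesIO(b"alpha\nbeta\ngamma\n")))
for key, value in records:
    print(key, value)
```

In a real job the framework drives the record reader; the mapper only sees the resulting key/value pairs.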

Re: Question about pig & HDFS

2011-08-26 Thread Daniel Dai
Pig by default uses plain text files as input/output, unless you write a custom LoadFunc/StoreFunc. There is no specific Pig storage format. You can copy the file to local using copyToLocal. If you want to export directly to a SQL table, you need to write a StoreFunc. Pig works on tuples rather than K,V
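
To make the tuple model mentioned above a little more concrete, here is a minimal sketch that turns lines of plain text into tuples of fields. It assumes tab-delimited input, which is the default delimiter of Pig's PigStorage loader. This is plain Python, not Pig code, and the function name line_to_tuple and the sample rows are hypothetical.

```python
def line_to_tuple(line, delimiter="\t"):
    """Split one line of text into a tuple of string fields.

    Hypothetical helper: the tab delimiter is an assumption; a custom
    LoadFunc could parse lines in any other way.
    """
    return tuple(line.rstrip("\n").split(delimiter))

# Made-up sample lines standing in for a plain-text input file.
rows = [line_to_tuple(l) for l in ["alice\t30\tNYC\n", "bob\t25\tSF\n"]]
for row in rows:
    print(row)
```

Each line becomes one tuple with as many fields as the line has delimited values, rather than a single key/value pair.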

Re: Question about hadoop namenode -format -clusterid

2011-05-11 Thread Bharath Mundlapudi
The correct way to format a namenode: /bin/hdfs namenode -format -clusterid PS: Set up your environment correctly first (common home, etc.). The cluster id is required only the first time; from the second time onwards it will remember the cluster id and prompt you to format that particular cluster id. I have filed a Jira