MR does not read the files in the front-end (unless a partitioner such
as TOP, the TotalOrderPartitioner, demands it). The actual block-level
read is done via the DFSClient class and its sub-classes DFSInputStream
and DFSOutputStream; the first one should be where your interest lies.
All MR cares about is scheduling the d
Hi Vivian. Take a look at TextInputFormat and the RecordReader classes. The
input format for a job is set via JobConf.setInputFormat().
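
To make the RecordReader idea concrete: with TextInputFormat, each record handed to the mapper is a (key, value) pair where the key is the byte offset of the line in the file and the value is the line's text. Below is a minimal plain-Java sketch of that behaviour. It does not use any Hadoop classes; LineRecordSketch, Record and readRecords are hypothetical names made up for illustration, and it only splits on '\n', which is simpler than what the real reader does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics the (byte offset, line) records that a
// line-oriented record reader produces, without depending on Hadoop.
public class LineRecordSketch {

    /** One record: byte offset of the line start (key) and the line text (value). */
    public static final class Record {
        public final long offset;
        public final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }
    }

    /** Splits the raw bytes on '\n' and returns one record per line. */
    public static List<Record> readRecords(byte[] data) {
        List<Record> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                String line = new String(data, start, i - start, StandardCharsets.UTF_8);
                records.add(new Record(start, line));
                start = i + 1; // next line begins right after the newline
            }
        }
        // A final line without a trailing newline is still a record.
        if (start < data.length) {
            String line = new String(data, start, data.length - start, StandardCharsets.UTF_8);
            records.add(new Record(start, line));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        for (Record r : readRecords(data)) {
            System.out.println(r.offset + "\t" + r.line);
        }
    }
}
```

For the three-line sample in main, the keys come out as 0, 6 and 11, since each key counts the bytes (including newlines) that precede the line.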
-Chuck
-Original Message-
From: Vivi Lang [mailto:sqlxwei...@gmail.com]
Sent: Wednesday, September 12, 2012 5:10 PM
To: hdfs-dev@hadoop.apache.org
Subject: Question about
Hi
Pig by default uses plain text files as input/output, unless you write a
custom LoadFunc/StoreFunc. There is no specific Pig storage format.
You can copy the file to local using copyToLocal. If you want to
export directly to a SQL table, you need to write a StoreFunc. Pig works
on tuples rather than K,V
The correct way to format a namenode:
/bin/hdfs namenode -format -clusterid <cluster_id>
PS: Set up your environment correctly first (common home etc.).
The cluster id is only required the first time. From the second time onwards
it remembers the cluster id and prompts you to format that particular cluster.
I have filed a Jira