Hi Shengjie,

This question is specific to CDH, so it does not belong on the Apache HDFS development list (which is for HDFS project developers). I've moved your question to CDH's own user list, cdh-u...@cloudera.org (https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user).

My answers are inline.

On Fri, Dec 7, 2012 at 6:57 PM, Shengjie Min <kelvin....@gmail.com> wrote:
> Hi,
>
> Is there any instructions or documents covering migration from hadoop hdfs
> cdh3 to cdh4 since all the docs I found are talking about in place
> upgrading ONLY?

You are correct that at present there is no migration guide. I'll reach out to the docs team behind the site to add one, as it may be helpful to others too.

> I have two hadoop clusters, My target is to use hadoop -cp to copy all the
> hdfs files from *cluster1* to *cluster2*
>
> *Cluster1:* Hadoop 0.20.2-cdh3u4
>
> *Cluster2:* Hadoop 2.0.0-cdh4.1.1
>
> Now, even just running dfs -ls command against *cluster1* remotely on
> *cluster2* as below:
>
> hadoop fs -ls hdfs://cluster1-namenode:8020/hbase

Regular FS commands (using the hdfs:// scheme) will not work between CDH3 and CDH4, because the two use different protocol versions that are incompatible with one another over regular RPC calls. The exception you got is expected when you attempt this.

> I think it's due to the hadoop version difference. In my case, cdh3 cluster
> doesn't have mapred deployed which rules out all the distcp, hbase
> copytable options. And the hbase replication ability is not available on
> cdh3 cluster either. I am struggling to think of a way to migrate the hdfs
> data from *cluster1* to *cluster2.*

HDFS provides a DistCp tool that lets you do this. It uses MapReduce to copy quickly, and it copies the provided paths completely. DistCp can also read from the HFTP file system (hftp://), which HDFS exposes over its web server (simple HTTP-based HDFS access).

Run the following on your CDH4 cluster to see the available options:

$ hadoop distcp

What you probably need is:

$ hadoop distcp hftp://cdh3-namenode:50070/<path to copy> <destination on CDH4>

> --
> All the best,
> Shengjie Min

--
Harsh J
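
P.S. As a rough sketch of the DistCp-over-HFTP invocation described above, the small script below only assembles and prints the command so you can review it before running it from a CDH4 node. The destination host name (cdh4-namenode), its port 8020, and the /hbase path are placeholders I am assuming for illustration; substitute the values for your own clusters.

```shell
#!/bin/sh
# Sketch only: host names, ports, and paths below are placeholders.
SRC_NN="cdh3-namenode"    # CDH3 NameNode host (placeholder)
SRC_HTTP_PORT=50070       # NameNode web port that serves HFTP
DST_NN="cdh4-namenode"    # CDH4 NameNode host (hypothetical)
DST_RPC_PORT=8020         # CDH4 NameNode RPC port (assumed)
COPY_PATH="/hbase"        # path to copy (assumed)

build_distcp_cmd() {
  # The source is read via hftp:// (HTTP), so the RPC protocol
  # mismatch between CDH3 and CDH4 is not hit on the read side.
  printf 'hadoop distcp hftp://%s:%s%s hdfs://%s:%s%s\n' \
    "$SRC_NN" "$SRC_HTTP_PORT" "$COPY_PATH" \
    "$DST_NN" "$DST_RPC_PORT" "$COPY_PATH"
}

# Print the command for review; nothing is copied by this script.
build_distcp_cmd
```

Since the job reads from the CDH3 side over HTTP and writes with the CDH4 client, it should be launched on the CDH4 cluster, where MapReduce is available.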