Dear members,

I am trying to copy data between two HDFS clusters. The first cluster (nn1) runs:

Hadoop 0.20.1+169.127
Subversion -r 2157de3c7179c7e244c907fb9c8804e1c076f050
Compiled by root on Sun Jan 16 19:29:48 UTC 2011
From source with checksum e28f0ec421b292b8d07210057a756bc8
The second cluster (nn2) runs:

Hadoop 2.0.0-cdh4.1.1
Subversion file:///data/1/jenkins/workspace/generic-package-rhel64-6-0/topdir/BUILD/hadoop-2.0.0-cdh4.1.1/src/hadoop-common-project/hadoop-common -r 581959ba23e4af85afd8db98b7687662fe9c5f20
Compiled by jenkins on Tue Oct 16 11:19:12 PDT 2012
From source with checksum 95f5c7f30b4030f1f327758e7b2bd61f

When I try to copy the data I get the error below. I run this command on a server with Hadoop 2.0.0:

hadoop distcp -p -i hftp://nn1:50070/test/100m hdfs:///testdata/

12/10/29 12:03:25 INFO tools.DistCp: srcPaths=[hftp://css-st-heartbeat.scartel.dc:50070/test/100m]
12/10/29 12:03:25 INFO tools.DistCp: destPath=hdfs:/testdata
12/10/29 12:03:26 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
12/10/29 12:03:26 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/10/29 12:03:28 INFO tools.DistCp: sourcePathsCount=1
12/10/29 12:03:28 INFO tools.DistCp: filesToCopyCount=1
12/10/29 12:03:28 INFO tools.DistCp: bytesToCopyCount=100.0m
12/10/29 12:03:28 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/10/29 12:03:28 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/10/29 12:03:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/10/29 12:03:28 INFO mapred.LocalJobRunner: OutputCommitter set in config null
12/10/29 12:03:28 INFO mapred.JobClient: Running job: job_local_0001
12/10/29 12:03:28 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
12/10/29 12:03:28 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
12/10/29 12:03:28 INFO util.ProcessTree: setsid exited with exit code 0
12/10/29 12:03:28 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@6c8b058b
12/10/29 12:03:28 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead
12/10/29 12:03:28 INFO mapred.MapTask: numReduceTasks: 0
12/10/29 12:03:28 INFO tools.DistCp: FAIL 100m : java.io.IOException: HTTP_OK expected, received 400
        at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderUrlOpener.connect(HftpFileSystem.java:365)
        at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:119)
        at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
        at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:187)
        at java.io.DataInputStream.read(DataInputStream.java:83)
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:424)
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:393)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:327)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:263)
12/10/29 12:03:28 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/29 12:03:28 INFO mapred.LocalJobRunner:
12/10/29 12:03:28 INFO mapred.Task: Task attempt_local_0001_m_000000_0 is allowed to commit now
12/10/29 12:03:28 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to hdfs://test2.video.scartel.dc:8020/testdata/_distcp_logs_svpizr
12/10/29 12:03:28 INFO mapred.LocalJobRunner: Copied: 0 Skipped: 0 Failed: 1
12/10/29 12:03:28 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/29 12:03:29 INFO mapred.JobClient: map 100% reduce 0%
12/10/29 12:03:29 INFO mapred.JobClient: Job complete: job_local_0001
12/10/29 12:03:29 INFO mapred.JobClient: Counters: 26
12/10/29 12:03:29 INFO mapred.JobClient:   File System Counters
12/10/29 12:03:29 INFO mapred.JobClient:     FILE: Number of bytes read=175990
12/10/29 12:03:29 INFO mapred.JobClient:     FILE: Number of bytes written=263692
12/10/29 12:03:29 INFO mapred.JobClient:     FILE: Number of read operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     FILE: Number of large read operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     FILE: Number of write operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     HDFS: Number of bytes read=0
12/10/29 12:03:29 INFO mapred.JobClient:     HDFS: Number of bytes written=975
12/10/29 12:03:29 INFO mapred.JobClient:     HDFS: Number of read operations=7
12/10/29 12:03:29 INFO mapred.JobClient:     HDFS: Number of large read operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     HDFS: Number of write operations=6
12/10/29 12:03:29 INFO mapred.JobClient:     HFTP: Number of bytes read=0
12/10/29 12:03:29 INFO mapred.JobClient:     HFTP: Number of bytes written=0
12/10/29 12:03:29 INFO mapred.JobClient:     HFTP: Number of read operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     HFTP: Number of large read operations=0
12/10/29 12:03:29 INFO mapred.JobClient:     HFTP: Number of write operations=0
12/10/29 12:03:29 INFO mapred.JobClient:   Map-Reduce Framework
12/10/29 12:03:29 INFO mapred.JobClient:     Map input records=1
12/10/29 12:03:29 INFO mapred.JobClient:     Map output records=1
12/10/29 12:03:29 INFO mapred.JobClient:     Input split bytes=145
12/10/29 12:03:29 INFO mapred.JobClient:     Spilled Records=0
12/10/29 12:03:29 INFO mapred.JobClient:     CPU time spent (ms)=0
12/10/29 12:03:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/10/29 12:03:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/10/29 12:03:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=250413056
12/10/29 12:03:29 INFO mapred.JobClient:   org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
12/10/29 12:03:29 INFO mapred.JobClient:     BYTES_READ=128
12/10/29 12:03:29 INFO mapred.JobClient:   distcp
12/10/29 12:03:29 INFO mapred.JobClient:     Bytes expected=104857600
12/10/29 12:03:29 INFO mapred.JobClient:     Files failed=1

I don't understand why this error occurs. Could it be a bug in Hadoop 0.20.1 or in Hadoop 2.0.0?
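To narrow it down, the hftp read path can be tested without distcp from the same CDH4 node. hadoop fs -cat goes through the same HftpFileSystem/ByteRangeInputStream code that failed above, so hitting the same 400 here would point at the hftp client/server pair rather than at distcp itself (host and path are the ones from the command above):

  # listing only talks to the namenode and may succeed even when data reads fail
  hadoop fs -ls hftp://nn1:50070/test/
  # cat exercises the byte-range data read that distcp failed on
  hadoop fs -cat hftp://nn1:50070/test/100m > /dev/null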
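The raw HTTP exchange can also be checked with curl against the 0.20 namenode. The servlet paths below (/listPaths and /data on port 50070) are the hftp defaults I would expect on a non-secure 0.20 cluster, and the ugi=user,group parameter format is only my assumption, so treat this as a sketch to adapt rather than an exact recipe:

  # metadata request: should come back 200 with an XML listing of /test
  curl -i "http://nn1:50070/listPaths/test?ugi=hdfs,hdfs"
  # data request: the namenode should answer with a redirect to a datanode streamFile URL;
  # a 400 here would reproduce the failure outside of Hadoop entirely
  curl -i "http://nn1:50070/data/test/100m?ugi=hdfs,hdfs"

If the data request already fails, capturing what the CDH4 client actually sends (for example with tcpdump on port 50070) and comparing it with these requests should show which parameter or header the old servlet rejects.

--
Best regards,
Evgeniy Selyavka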