I could execute:

- hadoop fs -ls ftp://ftpuser:ftpuser@hostname/tmp/testdir
- hadoop fs -lsr ftp://ftpuser:ftpuser@hostname/tmp/testdir

Is there any special requirement on the FTP configuration for running the distcp tool? In my environment, if I issue 'hadoop fs -lsr ftp://ftpuser:ftpuser@hostname', it returns the root path of my Linux file system.
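[On the configuration question: rather than embedding credentials in every URI, FTPFileSystem can read its connection settings from fs.ftp.* properties. Below is a minimal sketch, assuming the fs.ftp.host, fs.ftp.user.<host>, and fs.ftp.password.<host> keys; the host name, user, masked password, and test path are placeholders taken from this thread.]

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FtpConfigSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FTPFileSystem picks up host and credentials from fs.ftp.* keys,
        // so they need not appear in the URI. All values below are
        // placeholders from this thread, not real settings.
        conf.set("fs.ftp.host", "ftphostname");
        conf.set("fs.ftp.user.ftphostname", "hadoopadm");
        conf.set("fs.ftp.password.ftphostname", "xxxxxxxx");

        // List a directory over FTP, equivalent to 'hadoop fs -ls ftp://...'.
        FileSystem ftpFs = FileSystem.get(URI.create("ftp://ftphostname/"), conf);
        for (FileStatus stat : ftpFs.listStatus(new Path("/tmp/testdir"))) {
          System.out.println(stat.getPath());
        }
      }
    }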
2013/4/24 Daryn Sharp <da...@yahoo-inc.com>

> Listing the root is a bit of a special case that is different than N-many
> directories deep. Can you list
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir/file or
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir? I suspect the ftp fs has a
> bug, so they will fail too.
>
> On Apr 23, 2013, at 8:03 PM, sam liu wrote:
>
> I can successfully execute "hadoop fs -ls
> ftp://hadoopadm:xxxxxxxx@ftphostname"; it returns the root path of the
> Linux system.
>
> But I failed to execute "hadoop fs -rm
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", and it returns:
>
> rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here
>
> 2013/4/24 Daryn Sharp <da...@yahoo-inc.com>
>
>> The ftp fs is listing the contents of the given path's parent directory,
>> and then trying to match the basename of each child path returned against
>> the basename of the given path – quite inefficient… [see the sketch at the
>> end of this thread]. The FNF is that it didn't find a match for the
>> basename. It may be that the ftp server isn't returning a listing in
>> exactly the expected format, so it's being parsed incorrectly.
>>
>> Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"
>> work? Or "hadoop fs -rm
>> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"? Those cmds should
>> exercise the same code paths where you are experiencing errors.
>>
>> Daryn
>>
>> On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>>
>> I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>> attempt_201304160910_2135_m_000000_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory
>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary
>> doesn't exist!
>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>>     at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(AccessController.java:310)
>>     at javax.security.auth.Subject.doAs(Subject.java:573)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>> ... ...
>>
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304160910_2135_m_000000
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.FileNotFoundException: File
>> ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
>>     at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
>>     at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>>
>> 2013/4/23 sam liu <samliuhad...@gmail.com>
>>
>>> 2013/4/23 Daryn Sharp <da...@yahoo-inc.com>
>>>
>>>> I believe it should work… What error message did you receive?
>>>>
>>>> Daryn
>>>>
>>>> On Apr 22, 2013, at 3:45 AM, sam liu wrote:
>>>>
>>>> > Hi Experts,
>>>> >
>>>> > I failed to execute the following command. Does distcp not support
>>>> > the FTP protocol?
>>>> >
>>>> > hadoop distcp ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/file1.txt
>>>> > hdfs:///tmp/file1.txt
>>>> >
>>>> > Thanks!
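[For reference, here is a rough, hypothetical sketch of the lookup strategy Daryn describes above: resolving a path's status by listing its parent directory and scanning the children for a matching basename. It is modeled on the behavior attributed to org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus, not its actual source; the FileNotFoundException message mirrors the trace in this thread. If the server's LIST output is not parsed the way the client expects, no child name matches and the lookup fails even though the file exists, which would explain both the failed -rm and the distcp FNF.]

    import java.io.FileNotFoundException;
    import java.io.IOException;

    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.commons.net.ftp.FTPFile;

    public class BasenameLookupSketch {
      // Resolve a path by listing its PARENT directory and matching on the
      // basename -- inefficient, and fragile if the listing parses badly.
      static FTPFile getFileStatus(FTPClient client, String path)
          throws IOException {
        int slash = path.lastIndexOf('/');
        String parent = (slash <= 0) ? "/" : path.substring(0, slash);
        String basename = path.substring(slash + 1);
        // List everything in the parent, then look for the requested child.
        for (FTPFile child : client.listFiles(parent)) {
          if (basename.equals(child.getName())) {
            return child;
          }
        }
        // No basename matched: the "does not exist" failure seen above,
        // even when the file is actually present on the server.
        throw new FileNotFoundException("File " + path + " does not exist.");
      }
    }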