Re: Why does DistCp fail over the FTP protocol?

2013-04-22 Thread sam liu
I encountered IOException and FileNotFoundException:

13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
attempt_201304160910_2135_m_00_0, Status : FAILED
java.io.IOException: The temporary job-output directory
ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary
doesn't exist!
at
org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
at
org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
at
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
at
org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.(MapTask.java:820)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at
java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


... ...

13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
13/04/17 17:11:42 INFO mapred.JobClient: Failed map tasks=1
13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33785
13/04/17 17:11:42 INFO mapred.JobClient: Launched map tasks=4
13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=6436
13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
exceeded allowed limit. FailedCount: 1. LastFailedTask:
task_201304160910_2135_m_00
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.FileNotFoundException: File
ftp://hadoopadm:@ftphostname/tmp/_distcp_tmp_i74spu does not exist.
at
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
at org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
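
For reference, the paths DistCp complains about can be probed directly through
Hadoop's FileSystem API, which goes through the same FTPFileSystem code that
throws the FileNotFoundException above. A minimal sketch, assuming the same
host and empty-password credentials shown in the log:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FtpPathProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.ftp.impl maps ftp:// to FTPFileSystem by default.
        FileSystem fs = FileSystem.get(
            URI.create("ftp://hadoopadm:@ftphostname/"), conf);
        Path tmp = new Path("/tmp/_distcp_tmp_i74spu");
        // exists()/getFileStatus() exercise FTPFileSystem.getFileStatus(),
        // the method that raises the FileNotFoundException above.
        System.out.println(tmp + " exists? " + fs.exists(tmp));
        if (fs.exists(tmp)) {
          System.out.println(fs.getFileStatus(tmp));
        }
        fs.close();
      }
    }

If exists() reports false here while the directory is visible in a plain FTP
client, the problem is likely in how FTPFileSystem interprets the server's
listing rather than in DistCp itself.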


2013/4/23 sam liu 

> I encountered IOException and FileNotFoundException:
>
> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
> attempt_201304160910_2135_m_00_0, Status : FAILED
> java.io.IOException: The temporary job-output directory
> ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary
> doesn't exist!
> at
> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> at
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
> at
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.(MapTask.java:820)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at
> java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> ... ...
>
> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
> job_201304160910_2135
> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
> 13/04/17 17:11:42 INFO mapred.JobClient: Failed map tasks=1
> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33785
> 13/04/17 17:11:42 INFO mapred.JobClient: Launched map tasks=4
> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=6436
> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
> exceeded allowed limit. FailedCoun

For release 2.0.X, when will there be a stable release?

2013-04-22 Thread sam liu
Hi,

The current release in the 2.0.X line is 2.0.3-alpha; when will there be a
stable release?

Sam Liu

Thanks!


Encountered 'error: possibly undefined macro: AC_PROG_LIBTOOL' when building the Hadoop project on SUSE 11 (x86_64)

2013-04-22 Thread sam liu
Hi Experts,

I failed to build the Hadoop 1.1.1 source code project on SUSE 11 (x86_64) and
encountered the following issue:

 [exec] configure.ac:48: error: possibly undefined macro:
AC_PROG_LIBTOOL
 [exec]   If this token and others are legitimate, please use
m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] autoreconf: /usr/local/bin/autoconf failed with exit status: 1

Even after installing libtool.x86_64 2.2.6b-13.16.1 on it, the issue still
exists.

Does anyone know about this issue?

Thanks!

Sam Liu


Re: Encountered 'error: possibly undefined macro: AC_PROG_LIBTOOL' when building the Hadoop project on SUSE 11 (x86_64)

2013-04-23 Thread sam liu
autoconf.noarch 2.68-4.1


2013/4/23 Harsh J 

> What version of autoconf are you using?
>
> On Tue, Apr 23, 2013 at 12:18 PM, sam liu  wrote:
> > Hi Experts,
> >
> > I failed to build Hadoop 1.1.1 source code project in SUSE 11(x86_64),
> and
> > encounter a issue:
> >
> >  [exec] configure.ac:48: error: possibly undefined macro:
> > AC_PROG_LIBTOOL
> >  [exec]   If this token and others are legitimate, please use
> > m4_pattern_allow.
> >  [exec]   See the Autoconf documentation.
> >  [exec] autoreconf: /usr/local/bin/autoconf failed with exit status:
> 1
> >
> > Even after installing libtool.x86_64 2.2.6b-13.16.1 on it, the issue
> still
> > exists.
> >
> > Anyone knows this issue?
> >
> > Thanks!
> >
> > Sam Liu
>
>
>
> --
> Harsh J
>


Re: Why does DistCp fail over the FTP protocol?

2013-04-23 Thread sam liu
I can successfully execute "hadoop fs -ls ftp://hadoopadm:@ftphostname";
it returns the root path of the Linux system.

But I failed to execute "hadoop fs -rm
ftp://hadoopadm:@ftphostname/some/path/here", and it returns:
rm: Delete failed ftp://hadoopadm:@ftphostname/some/path/here
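
One way to test the listing-format hypothesis raised in Daryn's reply (quoted
below) is to fetch the directory listing with commons-net, the FTP client
library FTPFileSystem is built on, and compare the parsed entries with the
plain names. A rough sketch with hypothetical host, credentials and path:

    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.commons.net.ftp.FTPFile;

    public class RawFtpListing {
      public static void main(String[] args) throws Exception {
        FTPClient client = new FTPClient();
        client.connect("ftphostname");     // hypothetical host
        client.login("hadoopadm", "");     // empty password, as in the URIs above
        client.enterLocalPassiveMode();
        // NLST: plain names exactly as the server reports them.
        String[] names = client.listNames("/some/path");
        if (names != null) {
          for (String name : names) {
            System.out.println("name:   " + name);
          }
        }
        // LIST entries as parsed by commons-net -- the view FTPFileSystem uses.
        for (FTPFile f : client.listFiles("/some/path")) {
          System.out.println("parsed: " + f.getName() + " dir=" + f.isDirectory());
        }
        client.logout();
        client.disconnect();
      }
    }

If the parsed entries come back empty, or their names do not match the plain
NLST names, the basename matching described in the quoted reply would fail and
produce exactly this kind of "Delete failed" / FileNotFoundException behavior.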


2013/4/24 Daryn Sharp 

>  The ftp fs is listing the contents of the given path's parent directory,
> and then trying to match the basename of each child path returned against
> the basename of the given path – quite inefficient…  The FNF is it didn't
> find a match for the basename.  It may be that the ftp server isn't
> returning a listing in exactly the expected format so it's being parsed
> incorrectly.
>
>  Does "hadoop fs -ls ftp://hadoopadm:@ftphostname/some/path/here";
> work?  Or "hadoop fs -rm
> ftp://hadoopadm:@ftphostname/some/path/here";?  Those cmds should
> exercise the same code paths where you are experiencing errors.
>
>  Daryn
>
>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>
>  I encountered IOException and FileNotFoundException:
>
> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
> attempt_201304160910_2135_m_00_0, Status : FAILED
> java.io.IOException: The temporary job-output directory
> ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't
> exist!
> at
> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
> at
> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
> at
> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
> at
> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.(MapTask.java:820)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at
> java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> ... ...
>
> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
> job_201304160910_2135
> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
> 13/04/17 17:11:42 INFO mapred.JobClient: Failed map tasks=1
> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33785
> 13/04/17 17:11:42 INFO mapred.JobClient: Launched map tasks=4
> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=6436
> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks
> exceeded allowed limit. FailedCount: 1. LastFailedTask:
> task_201304160910_2135_m_00
> With failures, global counters are inaccurate; consider running with -i
> Copy failed: java.io.FileNotFoundException: File
> ftp://hadoopadm:@ftphostname/tmp/_distcp_tmp_i74spu does not
> exist.
> at
> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
> at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
> at
> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:279)
> at org.apache.hadoop.tools.DistCp.fullyDelete(DistCp.java:963)
> at org.apache.hadoop.tools.DistCp.copy(DistCp.java:672)
> at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
>
>
> 2013/4/23 sam liu 
>
>> I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>> attempt_201304160910_2135_m_00_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory
>> ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't
>> exist!
>> at
>> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>> at
>> org.apache.hadoop.mapred.File

Re: Why does DistCp fail over the FTP protocol?

2013-04-23 Thread sam liu
Now I can successfully run "hadoop distcp
ftp://ftpuser:ftpuser@hostname/tmp/test1.txt
hdfs:///tmp/test1.txt".

But it fails on "hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/test1.txt.v1", which returns an issue like:
attempt_20130440_0005_m_00_1: log4j:ERROR Could not connect to
remote log4j server at [localhost]. We will try again later.
13/04/23 18:59:05 INFO mapred.JobClient: Task Id :
attempt_20130440_0005_m_00_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at
org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at
java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


2013/4/24 sam liu 

> I can success execute "hadoop fs -ls ftp://hadoopadm:@ftphostname",
> it returns the root path of linux system.
>
> But failed to execute "hadoop fs -rm
> ftp://hadoopadm:@ftphostname/some/path/here", and it returns:
> rm: Delete failed ftp://hadoopadm:@ftphostname/some/path/here
>
>
> 2013/4/24 Daryn Sharp 
>
>>  The ftp fs is listing the contents of the given path's parent directory,
>> and then trying to match the basename of each child path returned against
>> the basename of the given path – quite inefficient…  The FNF is it didn't
>> find a match for the basename.  It may be that the ftp server isn't
>> returning a listing in exactly the expected format so it's being parsed
>> incorrectly.
>>
>>  Does "hadoop fs -ls ftp://hadoopadm:@ftphostname/some/path/here";
>> work?  Or "hadoop fs -rm
>> ftp://hadoopadm:@ftphostname/some/path/here";?  Those cmds should
>> exercise the same code paths where you are experiencing errors.
>>
>>  Daryn
>>
>>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>>
>>  I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>> attempt_201304160910_2135_m_00_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory
>> ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't
>> exist!
>> at
>> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>> at
>> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>> at
>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>> at
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.(MapTask.java:820)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> at
>> java.security.AccessController.doPrivileged(AccessController.java:310)
>> at javax.security.auth.Subject.doAs(Subject.java:573)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>>
>> ... ...
>>
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
>> job_201304160910_2135
>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>> 13/04/17 17:11:42 INFO mapred.JobClient: Failed map tasks=1
>> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33785
>> 13/04/17 17:11:42 INFO mapred.JobClient: Launched map tasks=4
>> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all
>> reduces waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=6436
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map
>> Tasks exceeded allowed limit. FailedCount

Re: Why does DistCp fail over the FTP protocol?

2013-04-23 Thread sam liu
If I execute 'hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/', the exception will be:
attempt_20130440_0006_m_00_1: log4j:ERROR Could not connect to
remote log4j server at [localhost]. We will try again later.
13/04/23 19:31:33 INFO mapred.JobClient: Task Id :
attempt_20130440_0006_m_00_2, Status : FAILED
java.io.IOException: Cannot rename parent(source):
ftp://ftpuser:ftpuser@hostname/tmp/_distcp_logs_o6gzfy/_temporary/_attempt_20130440_0006_m_00_2,
parent(destination):
ftp://ftpuser:ftpu...@bdvm104.svl.ibm.com/tmp/_distcp_logs_o6gzfy
at
org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:547)
at
org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:512)
at
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:154)
at
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
at
org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
at
org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
at org.apache.hadoop.mapred.Task.commit(Task.java:1019)
at org.apache.hadoop.mapred.Task.done(Task.java:889)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:373)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at
java.security.AccessController.doPrivileged(AccessController.java:310)
at javax.security.auth.Subject.doAs(Subject.java:573)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
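
The message above comes from FTPFileSystem.rename, which refuses to rename
across different parent directories (hence the "Cannot rename parent(source):
..., parent(destination): ..." wording); the task-commit step is trying to move
output from the _temporary/_attempt directory up into the log directory, so the
parents necessarily differ. A minimal sketch to reproduce the rename behavior
outside DistCp, with the same credentials and a hypothetical scratch directory:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FtpRenameCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
            URI.create("ftp://ftpuser:ftpuser@hostname/"), conf);

        Path dir = new Path("/tmp/renametest");   // hypothetical scratch dir
        fs.mkdirs(new Path(dir, "sub"));
        fs.create(new Path(dir, "sub/a.txt")).close();

        // Rename within the same parent directory.
        boolean sameParent = fs.rename(new Path(dir, "sub/a.txt"),
                                       new Path(dir, "sub/b.txt"));
        System.out.println("same-parent rename: " + sameParent);

        // Rename across parent directories -- the same shape of move the
        // commit step above attempts.
        try {
          fs.rename(new Path(dir, "sub/b.txt"), new Path(dir, "b.txt"));
          System.out.println("cross-parent rename succeeded");
        } catch (Exception e) {
          System.out.println("cross-parent rename failed: " + e.getMessage());
        }
        fs.close();
      }
    }

If only the cross-parent rename fails, the distcp failure would point to a
limitation of the ftp filesystem's commit path rather than anything specific
to the file being copied.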



2013/4/24 sam liu 

> Now,  I can successfully run "hadoop distcp 
> ftp://ftpuser:ftpuser@hostname/tmp/test1.txt
> hdfs:///tmp/test1.txt"
>
> But failed on "hadoop distcp hdfs:///tmp/test1.txt
> ftp://ftpuser:ftpuser@hostname/tmp/test1.txt.v1";, it returns issue like:
> attempt_20130440_0005_m_00_1: log4j:ERROR Could not connect to
> remote log4j server at [localhost]. We will try again later.
> 13/04/23 18:59:05 INFO mapred.JobClient: Task Id :
> attempt_20130440_0005_m_00_2, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
> at
> org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at
> java.security.AccessController.doPrivileged(AccessController.java:310)
> at javax.security.auth.Subject.doAs(Subject.java:573)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> 2013/4/24 sam liu 
>
>> I can success execute "hadoop fs -ls ftp://hadoopadm:@ftphostname",
>> it returns the root path of linux system.
>>
>> But failed to execute "hadoop fs -rm
>> ftp://hadoopadm:@ftphostname/some/path/here", and it returns:
>> rm: Delete failed ftp://hadoopadm:@ftphostname/some/path/here
>>
>>
>> 2013/4/24 Daryn Sharp 
>>
>>>  The ftp fs is listing the contents of the given path's parent
>>> directory, and then trying to match the basename of each child path
>>> returned against the basename of the given path – quite inefficient…  The
>>> FNF is it didn't find a match for the basename.  It may be that the ftp
>>> server isn't returning a listing in exactly the expected format so it's
>>> being parsed incorrectly.
>>>
>>>  Does "hadoop fs -ls ftp://hadoopadm:@ftphostname/some/path/here";
>>> work?  Or "hadoop fs -rm
>>> ftp://hadoopadm:@ftphostname/some/path/here";?  Those cmds
>>> should exercise the same code paths where you are experiencing errors.
>>>
>>>  Daryn
>>>
>>>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>>>
>>>  I encountered IOException and FileNotFoundException:
>>>
>>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>>> attempt_201304160910_2135_m_00_0, Status : FAILED
>>> java.io.IOException: The temporary job-output directory
>>> ftp://hadoopadm:

Re: Why does DistCp fail over the FTP protocol?

2013-04-24 Thread sam liu
I could execute:
- hadoop fs -ls ftp://ftpuser:ftpuser@hostname/tmp/testdir
- hadoop fs -lsr ftp://ftpuser:ftpuser@hostname/tmp/testdir

Is there any special FTP configuration requirement for running the distcp
tool? In my environment, if I issue 'hadoop fs -lsr ftp://ftpuser:ftpuser@hostname',
it returns the root path of my Linux file system.
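
One thing worth checking is which directory the ftp filesystem treats as the
home/working directory, since relative ftp:// paths are resolved against the
FTP login home; the fact that listing the bare host returns the Linux root
suggests the ftpuser home is '/'. A small sketch, reusing the ftpuser
credentials and /tmp/testdir path above with a placeholder hostname:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FtpHomeDirCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(
            URI.create("ftp://ftpuser:ftpuser@hostname/"), conf);
        System.out.println("working dir: " + fs.getWorkingDirectory());
        System.out.println("home dir:    " + fs.getHomeDirectory());
        // List a nested directory, as Daryn suggests below, to see whether
        // going deeper than the root changes the behavior.
        for (FileStatus st : fs.listStatus(new Path("/tmp/testdir"))) {
          System.out.println(st.getPath() + (st.isDir() ? " (dir)" : ""));
        }
        fs.close();
      }
    }

If listStatus on the nested directory comes back empty or throws, that matches
the listing-format suspicion raised earlier in the thread.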


2013/4/24 Daryn Sharp 

>  Listing the root is a bit of a special case that is different than N-many
> directories deep.  Can you list
> ftp://hadoopadm:@ftphostname/some/dir/file or
> ftp://hadoopadm:@ftphostname/some/dir?  I suspect ftp fs has a
> bug, so they will fail too.
>
>  On Apr 23, 2013, at 8:03 PM, sam liu wrote:
>
> I can success execute "hadoop fs -ls ftp://hadoopadm:@ftphostname",
> it returns the root path of linux system.
>
> But failed to execute "hadoop fs -rm
> ftp://hadoopadm:@ftphostname/some/path/here", and it returns:
> rm: Delete failed ftp://hadoopadm:@ftphostname/some/path/here
>
>
> 2013/4/24 Daryn Sharp 
>
>> The ftp fs is listing the contents of the given path's parent directory,
>> and then trying to match the basename of each child path returned against
>> the basename of the given path – quite inefficient…  The FNF is it didn't
>> find a match for the basename.  It may be that the ftp server isn't
>> returning a listing in exactly the expected format so it's being parsed
>> incorrectly.
>>
>>  Does "hadoop fs -ls ftp://hadoopadm:@ftphostname/some/path/here";
>> work?  Or "hadoop fs -rm
>> ftp://hadoopadm:@ftphostname/some/path/here";?  Those cmds should
>> exercise the same code paths where you are experiencing errors.
>>
>>  Daryn
>>
>>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>>
>>  I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id :
>> attempt_201304160910_2135_m_00_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory
>> ftp://hadoopadm:@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't
>> exist!
>> at
>> org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>> at
>> org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>> at
>> org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>> at
>> org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.(MapTask.java:820)
>> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> at
>> java.security.AccessController.doPrivileged(AccessController.java:310)
>> at javax.security.auth.Subject.doAs(Subject.java:573)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>>
>> ... ...
>>
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete:
>> job_201304160910_2135
>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>> 13/04/17 17:11:42 INFO mapred.JobClient: Failed map tasks=1
>> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=33785
>> 13/04/17 17:11:42 INFO mapred.JobClient: Launched map tasks=4
>> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all
>> reduces waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient: Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=6436
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map
>> Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
>> task_201304160910_2135_m_00
>> With failures, global counters are inaccurate; consider running with -i
>> Copy failed: java.io.FileNotFoundException: File
>> ftp://hadoopadm:@ftphostname/tmp/_distcp_tmp_i74spu does not
>> exist.
>> at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:419)
>> at
>> org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPFileSystem.java:302)
>> a

How to build the hadoop-2.0.3-alpha-src project to get a layout like hadoop-2.0.3-alpha?

2013-04-25 Thread sam liu
Hi,

I got hadoop-2.0.3-alpha-src.tar.gz and hadoop-2.0.3-alpha.tar.gz, but
found they have different structures as below:
- hadoop-2.0.3-alpha  contains folder/file:
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin
share
- hadoop-2.0.3-alpha-src.tar.gz contains folder/file:
BUILDING.txt   hadoop-client  hadoop-hdfs-project
hadoop-project   hadoop-yarn-project
pom.xml   releasenotes.HDFS.2.0.3-alpha.html
dev-supporthadoop-common-project  hadoop-mapreduce-project
hadoop-project-dist  LICENSE.txt
README.txt
releasenotes.MAPREDUCE.2.0.3-alpha.html
hadoop-assemblies  hadoop-disthadoop-minicluster
hadoop-tools NOTICE.txt
releasenotes.HADOOP.2.0.3-alpha.html  releasenotes.YARN.2.0.3-alpha.html

Then, in hadoop-2.0.3-alpha-src, I successfully ran 'mvn package -Pdist
-DskipTests -Dtar', but I do not know how to get a build output with a
folder/file structure similar to the downloaded 'hadoop-2.0.3-alpha'
package. Any suggestions?

Thanks!

Sam Liu


Failed to install openssl-devel 1.0.0-20.el6 on OS RHELS 6.3 x86_64

2013-04-26 Thread sam liu
Hi,

To build Hadoop on RHEL Server 6.3 x86_64, I tried to install
openssl-devel, but it failed with the error below. The required version
of glibc-common is 2.12-1.47.el6, but the installed one is 2.12-1.80.el6,
which is newer. Why does it fail, and how can I resolve this issue?

---> Package nss-softokn-freebl.i686 0:3.12.9-11.el6 will be installed
--> Finished Dependency Resolution
Error: Package: glibc-2.12-1.47.el6.i686 (rhel-cd)
   Requires: glibc-common = 2.12-1.47.el6
   Installed: glibc-common-2.12-1.80.el6.x86_64
(@anaconda-RedHatEnterpriseLinux-201206132210.x86_64/6.3)
   glibc-common = 2.12-1.80.el6
   Available: glibc-common-2.12-1.47.el6.x86_64 (rhel-cd)
   glibc-common = 2.12-1.47.el6
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest


Sam Liu

Thanks!


Why can't I find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-01 Thread sam liu
Hi,

I launched yarn and its webapp on port 18088, and then successfully
launched and executed some test MR jobs like 'hadoop jar
share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar pi 2 30'.

But when I log in to the web console in a browser, I cannot find any finished
jobs in the 'FINISHED Applications' tab. Why?


Thanks!

Sam Liu


Re: Why can't I find finished jobs in yarn.resourcemanager.webapp.address?

2013-05-02 Thread sam liu
Can anyone help this issue? Thanks!


2013/5/2 sam liu 

> Hi,
>
> I launched yarn and its webapp on port 18088, and then successfully
> launched and executed some test MR jobs like 'hadoop jar
> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.3-alpha.jar pi 2 30'.
>
> But, when login the web console in browser, I could find any finished jobs
> in the 'FINISHED Applications' tab. Why?
>
>
> Thanks!
>
> Sam Liu
>


What's the difference between release 1.1.1, 1.2.0 and 3.0.0?

2012-11-27 Thread sam liu
Hi Experts,

Could someone answer the following questions? We want to know which release is
suitable for us. Thanks a lot!

- What's the difference between releases 1.1.1, 1.2.0 and 3.0.0?
- What are their release dates?

Sam Liu


Re: What's the difference between release 1.1.1, 1.2.0 and 3.0.0?

2012-11-27 Thread sam liu
Hi Harsh,

Thanks very much for your detailed explanation!

For the 1.x line, we really want to know which release we could use, so we
have further questions:
- Is 1.2.0 more advanced than 1.1.1?
- Do we have a general release time for the above two releases?

For the 2.x line:
- Will its stable release contain all fixes and features of the 1.x line?
- Can we know the general release time of the coming stable release of the 2.x
line?

Sam Liu

2012/11/28 Harsh J 

> Hi,
>
> [Speaking with HDFS in mind]
>
> The 1.x line is the current stable/maintenance line that has features
> similar to that of 0.20.x before it, with append+sync and security features
> added on top of the pre-existing HDFS.
>
> The 2.x line carries several fixes and brand-new features (high
> availability, protobuf RPCs, federated namenodes, etc.) for HDFS, along
> with several performance optimizations, and is quite a big improvement over
> the 1.x line. The last release of 2.x was 2.0.2, released a couple of
> months ago IIRC. This branch is very new, and is approaching full stability
> soon (Although, there's been no blocker kinda problems with HDFS at least,
> AFAICT).
>
> 3.x is a placeholder value for "trunk"; it has not been branched for any
> release yet. We are currently focused on improving the 2.x line further.
>
>
> On Wed, Nov 28, 2012 at 9:01 AM, sam liu  wrote:
>
> > Hi Experts,
> >
> > Who can answer my following questions? We want to know which release is
> > suitable to us.Thanks a lot!
> >
> > - What's the difference between release 1.1.1, 1.2.0 and 3.0.0?
> > - What are their release time?
> >
> > Sam Liu
> >
>
>
>
> --
> Harsh J
>


[jira] [Created] (HDFS-4527) For shortening the time of TaskTracker heartbeat, decouple the statics collection operations

2013-02-23 Thread sam liu (JIRA)
sam liu created HDFS-4527:
-

 Summary: For shortening the time of TaskTracker heartbeat, 
decouple the statics collection operations
 Key: HDFS-4527
 URL: https://issues.apache.org/jira/browse/HDFS-4527
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: performance
Affects Versions: 1.1.1
Reporter: sam liu


In each TaskTracker heartbeat, the TaskTracker calculates some system statistics,
such as free disk space, available virtual/physical memory, CPU usage, etc.
However, it is not necessary to calculate all of these statistics in every
heartbeat; doing so consumes system resources and impacts the performance of the
TaskTracker heartbeat. Furthermore, the characteristics of the system properties
(disk, memory, CPU) differ, so it is better to collect their statistics at
different intervals.

To reduce the latency of the TaskTracker heartbeat, one solution is to decouple
all of the system statistics collection operations from it and start separate
threads to do the collection work when the TaskTracker starts. There could be
three threads: the first collects CPU-related statistics at a short interval,
the second collects memory-related statistics at a normal interval, and the
third collects disk-related statistics at a long interval. All of the intervals
could be customized via the parameter "mapred.stats.collection.interval" in
mapred-site.xml. Finally, the heartbeat could read the statistics values
directly from memory.
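
A minimal sketch of the proposed decoupling, assuming a
ScheduledExecutorService-based collector; the class, method and interval
parameters below are illustrative only, not part of any actual patch:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    // Background collector: each statistic family is refreshed on its own
    // schedule, and the heartbeat only reads the cached values.
    public class ResourceStatsCollector {
      private final AtomicLong cpuUsagePercent = new AtomicLong();
      private final AtomicLong availableMemBytes = new AtomicLong();
      private final AtomicLong freeDiskBytes = new AtomicLong();
      private final ScheduledExecutorService pool =
          Executors.newScheduledThreadPool(3);

      // Intervals would come from configuration, e.g. the proposed
      // "mapred.stats.collection.interval" key (exact semantics to be defined).
      public void start(long cpuMs, long memMs, long diskMs) {
        pool.scheduleAtFixedRate(new Runnable() {
          public void run() { cpuUsagePercent.set(sampleCpuUsage()); }
        }, 0, cpuMs, TimeUnit.MILLISECONDS);
        pool.scheduleAtFixedRate(new Runnable() {
          public void run() { availableMemBytes.set(sampleAvailableMemory()); }
        }, 0, memMs, TimeUnit.MILLISECONDS);
        pool.scheduleAtFixedRate(new Runnable() {
          public void run() { freeDiskBytes.set(sampleFreeDisk()); }
        }, 0, diskMs, TimeUnit.MILLISECONDS);
      }

      // What the heartbeat would call instead of sampling the OS directly.
      public long cachedCpuUsagePercent()   { return cpuUsagePercent.get(); }
      public long cachedAvailableMemBytes() { return availableMemBytes.get(); }
      public long cachedFreeDiskBytes()     { return freeDiskBytes.get(); }

      // Placeholder sampling methods; real code would reuse whatever the
      // TaskTracker already uses to obtain these values.
      private long sampleCpuUsage()        { return 0L; }
      private long sampleAvailableMemory() { return Runtime.getRuntime().freeMemory(); }
      private long sampleFreeDisk()        { return new java.io.File("/").getUsableSpace(); }

      public void stop() { pool.shutdownNow(); }
    }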

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5046) Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-30 Thread sam liu (JIRA)
sam liu created HDFS-5046:
-

 Summary: Hang when add/remove a datanode into/from a 2 datanode 
cluster
 Key: HDFS-5046
 URL: https://issues.apache.org/jira/browse/HDFS-5046
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 1.1.1
 Environment: Red Hat Enterprise Linux Server release 5.3, 64 bit
Reporter: sam liu


1. Install a Hadoop 1.1.1 cluster with 2 datanodes, dn1 and dn2, and in 
hdfs-site.xml set 'dfs.replication' to 2.
2. Add node dn3 into the cluster as a new datanode, without changing the 
'dfs.replication' value in hdfs-site.xml (keep it as 2).
Note: step 2 passed.
3. Decommission dn3 from the cluster.
Expected result: dn3 can be decommissioned successfully.
Actual result:
a). The decommission progress hangs and the status is always 'Waiting DataNode 
status: Decommissioned'. But if I execute 'hadoop dfs -setrep -R 2 /', the 
decommission continues and eventually completes.
b). However, if the initial cluster includes >= 3 datanodes, this issue is not 
encountered when adding/removing another datanode. For example, if I set up a 
cluster with 3 datanodes, I can successfully add a 4th datanode into it, and 
then also successfully remove the 4th datanode from the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-7002) Failed to rolling upgrade hdfs from 2.2.0 to 2.4.1

2014-09-05 Thread sam liu (JIRA)
sam liu created HDFS-7002:
-

 Summary: Failed to rolling upgrade hdfs from 2.2.0 to 2.4.1
 Key: HDFS-7002
 URL: https://issues.apache.org/jira/browse/HDFS-7002
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: journal-node, namenode, qjm
Affects Versions: 2.4.1, 2.2.0
Reporter: sam liu
Priority: Blocker






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7053) Failed to rollback hdfs version from 2.4.1 to 2.2.0

2014-09-11 Thread sam liu (JIRA)
sam liu created HDFS-7053:
-

 Summary: Failed to rollback hdfs version from 2.4.1 to 2.2.0
 Key: HDFS-7053
 URL: https://issues.apache.org/jira/browse/HDFS-7053
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, namenode
Affects Versions: 2.4.1
Reporter: sam liu
Priority: Blocker


I can successfully upgrade from 2.2.0 to 2.4.1 with QJM HA enabled and with 
downtime, but failed to roll back from 2.4.1 to 2.2.0. The error message:
 2014-09-10 16:50:29,599 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Exception in namenode join
 org.apache.hadoop.HadoopIllegalArgumentException: Invalid startup option. 
Cannot perform DFS upgrade with HA enabled.
  at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1207)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
 2014-09-10 16:50:29,601 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7114) Secondary NameNode failed to rollback from 2.4.1 to 2.2.0

2014-09-21 Thread sam liu (JIRA)
sam liu created HDFS-7114:
-

 Summary: Secondary NameNode failed to rollback from 2.4.1 to 2.2.0
 Key: HDFS-7114
 URL: https://issues.apache.org/jira/browse/HDFS-7114
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.2.0
Reporter: sam liu
Priority: Blocker


I can upgrade from 2.2.0 to 2.4.1, but failed to roll back the secondary namenode 
with the following issue.

2014-09-22 10:41:28,358 FATAL 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start 
secondary namenode
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
version of storage directory /var/hadoop/tmp/hdfs/dfs/namesecondary. Reported: 
-56. Expecting = -47.
at 
org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1082)
at 
org.apache.hadoop.hdfs.server.common.Storage.setFieldsFromProperties(Storage.java:890)
at 
org.apache.hadoop.hdfs.server.namenode.NNStorage.setFieldsFromProperties(NNStorage.java:585)
at 
org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:921)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.recoverCreate(SecondaryNameNode.java:913)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:249)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:652)
2014-09-22 10:41:28,360 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
2014-09-22 10:41:28,363 INFO 
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: SHUTDOWN_MSG:



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7585) TestEnhancedByteBufferAccess hard code the block size

2015-01-05 Thread sam liu (JIRA)
sam liu created HDFS-7585:
-

 Summary: TestEnhancedByteBufferAccess hard code the block size
 Key: HDFS-7585
 URL: https://issues.apache.org/jira/browse/HDFS-7585
 Project: Hadoop HDFS
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: sam liu
Assignee: sam liu
Priority: Blocker


The test TestEnhancedByteBufferAccess hardcodes the block size, and it fails 
with exceptions on POWER Linux.
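
For illustration only (not the actual fix): one way to avoid the hardcoded
constant is to round the block size used by the test up to a multiple of the
platform page size, which is 4096 on x86 but 65536 on POWER. The helper below
obtains the page size via sun.misc.Unsafe; the class and method names are
hypothetical:

    import java.lang.reflect.Field;
    import sun.misc.Unsafe;

    public final class TestBlockSizes {
      private TestBlockSizes() {}

      // Round the desired block size up to a multiple of the OS page size so
      // that mmap-based tests also work on 64 KB-page platforms such as POWER.
      public static long blockSizeForPlatform(long desired) {
        long page = pageSize();
        return ((desired + page - 1) / page) * page;
      }

      private static long pageSize() {
        try {
          Field f = Unsafe.class.getDeclaredField("theUnsafe");
          f.setAccessible(true);
          return ((Unsafe) f.get(null)).pageSize();
        } catch (Exception e) {
          return 4096;  // conservative fallback
        }
      }
    }

For example, blockSizeForPlatform(4096) yields 4096 on a 4 KB-page kernel and
65536 on a 64 KB-page kernel.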



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7624) TestFileAppendRestart hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7624:
-

 Summary: TestFileAppendRestart hardcode block size without 
considering native OS
 Key: HDFS-7624
 URL: https://issues.apache.org/jira/browse/HDFS-7624
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestFileAppendRestart hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7625) TestPersistBlocks hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7625:
-

 Summary: TestPersistBlocks hardcode block size without considering 
native OS
 Key: HDFS-7625
 URL: https://issues.apache.org/jira/browse/HDFS-7625
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestPersistBlocks hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7626) TestPipelinesFailover hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7626:
-

 Summary: TestPipelinesFailover hardcode block size without 
considering native OS
 Key: HDFS-7626
 URL: https://issues.apache.org/jira/browse/HDFS-7626
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestPipelinesFailover hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7627) TestCacheDirectives hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7627:
-

 Summary: TestCacheDirectives hardcode block size without 
considering native OS
 Key: HDFS-7627
 URL: https://issues.apache.org/jira/browse/HDFS-7627
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestCacheDirectives hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7628) TestNameEditsConfigs hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7628:
-

 Summary: TestNameEditsConfigs hardcode block size without 
considering native OS
 Key: HDFS-7628
 URL: https://issues.apache.org/jira/browse/HDFS-7628
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestNameEditsConfigs hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7629) TestDisableConnCache hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7629:
-

 Summary: TestDisableConnCache hardcode block size without 
considering native OS
 Key: HDFS-7629
 URL: https://issues.apache.org/jira/browse/HDFS-7629
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu


TestDisableConnCache hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7630) TestConnCache hardcode block size without considering native OS

2015-01-15 Thread sam liu (JIRA)
sam liu created HDFS-7630:
-

 Summary: TestConnCache hardcode block size without considering 
native OS
 Key: HDFS-7630
 URL: https://issues.apache.org/jira/browse/HDFS-7630
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: sam liu
Assignee: sam liu
 Attachments: HDFS-7630.001.patch

TestConnCache hardcodes the block size with 'BLOCK_SIZE = 4096'; however, 
this is incorrect on some platforms. For example, on the POWER platform, the 
correct value is 65536.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)