[jira] [Created] (HDFS-6063) TestAclCLI fails intermittently

2014-03-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6063:
--

 Summary: TestAclCLI fails intermittently
 Key: HDFS-6063
 URL: https://issues.apache.org/jira/browse/HDFS-6063
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe


TestAclCLI seems to fail intermittently when running Test ID: [24]: 
copyFromLocal: copying file into a directory with a default ACL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6064) DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT is not updated with latest block report interval of 6 hrs

2014-03-06 Thread Vinayakumar B (JIRA)
Vinayakumar B created HDFS-6064:
---

 Summary: DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT is 
not updated with latest block report interval of 6 hrs
 Key: HDFS-6064
 URL: https://issues.apache.org/jira/browse/HDFS-6064
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Minor


DFSConfigKeys.DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT has not been updated to 
the latest block report interval of 6 hours, whereas hdfs-default.xml 
already has the 6-hour value.
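For illustration, a minimal sketch of the reported mismatch. The constant name comes from the report, but the stale code-side value shown here is a hypothetical placeholder; only the 6-hour figure is from hdfs-default.xml.

```java
// Sketch of the reported mismatch, not actual Hadoop source.
public class BlockReportIntervalMismatch {
    // Hypothetical stale default on the code side (e.g. 1 hour in ms)
    static final long DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT = 60 * 60 * 1000L;
    // hdfs-default.xml documents 6 hours
    static final long XML_DEFAULT_MSEC = 6 * 60 * 60 * 1000L; // 21,600,000 ms

    public static void main(String[] args) {
        System.out.println("code default (ms): " + DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT);
        System.out.println("xml default (ms):  " + XML_DEFAULT_MSEC);
        // The bug: the two defaults disagree until the constant is updated
        System.out.println("consistent? "
            + (DFS_BLOCKREPORT_INTERVAL_MSEC_DEFAULT == XML_DEFAULT_MSEC));
    }
}
```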





[jira] [Resolved] (HDFS-6063) TestAclCLI fails intermittently when running test 24: copyFromLocal

2014-03-06 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HDFS-6063.
--

Resolution: Duplicate

This jira should be a duplicate of HDFS-6058.

> TestAclCLI fails intermittently when running test 24: copyFromLocal
> ---
>
> Key: HDFS-6063
> URL: https://issues.apache.org/jira/browse/HDFS-6063
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>
> TestAclCLI seems to fail intermittently when running Test ID: [24]: 
> copyFromLocal: copying file into a directory with a default ACL.





[jira] [Created] (HDFS-6065) HDFS zero-copy reads should return empty buffer on EOF

2014-03-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6065:
--

 Summary: HDFS zero-copy reads should return empty buffer on EOF 
 Key: HDFS-6065
 URL: https://issues.apache.org/jira/browse/HDFS-6065
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


It would be nice if the HDFS zero-copy reads mechanism returned an empty buffer 
on EOF.  Currently, it throws UnsupportedOperationException on EOF when using 
zero-copy reads, which seems confusing.
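A plain-Java sketch of the proposed semantics (invented names, not the actual hdfs-client API): a zero-copy-style read that yields an empty buffer at EOF instead of throwing.

```java
import java.nio.ByteBuffer;

// Sketch only: models the proposed EOF behavior for a zero-copy read path.
public class ZeroCopyEofSketch {
    private final byte[] data;
    private int pos = 0;

    ZeroCopyEofSketch(byte[] data) { this.data = data; }

    ByteBuffer read(int maxLength) {
        if (pos >= data.length) {
            // Proposed behavior: an empty buffer signals EOF,
            // rather than throwing UnsupportedOperationException.
            return ByteBuffer.allocate(0);
        }
        int n = Math.min(maxLength, data.length - pos);
        ByteBuffer buf = ByteBuffer.wrap(data, pos, n).slice();
        pos += n;
        return buf;
    }

    public static void main(String[] args) {
        ZeroCopyEofSketch s = new ZeroCopyEofSketch(new byte[]{1, 2, 3});
        System.out.println(s.read(2).remaining()); // 2
        System.out.println(s.read(2).remaining()); // 1
        System.out.println(s.read(2).remaining()); // 0 -> EOF as empty buffer
    }
}
```

The caller can then treat `remaining() == 0` as EOF uniformly, with no exception handling on the hot read path.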





[jira] [Created] (HDFS-6066) logGenerationStamp is not needed to reduce editlog size

2014-03-06 Thread chenping (JIRA)
chenping created HDFS-6066:
--

 Summary: logGenerationStamp is not needed to reduce editlog size
 Key: HDFS-6066
 URL: https://issues.apache.org/jira/browse/HDFS-6066
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: chenping
Priority: Minor


Almost every logGenerationStamp is accompanied by a logAddBlock, so we can 
obtain the newest generation stamp indirectly from the logAddBlock 
operation. This will reduce the edit log size.





Re: FileSystem and FileContext Janitor, at your service !

2014-03-06 Thread Steve Loughran
On 5 March 2014 19:07, Jay Vyas  wrote:

> Hi HCFS Community :)
>
> This is Jay...  Some of you know me; I hack on a broad range of file
> system and Hadoop ecosystem interoperability stuff.  I just wanted to
> introduce myself and let you folks know I'm going to be working to help
> clean up the existing unit testing frameworks for the FileSystem and
> FileContext APIs.  I've listed some bullets below.
>
> - byte-code-inspection-based code coverage for file system APIs with a
> tool such as Cobertura.
>
> - HADOOP-9361 points out that there are many different types of file
> systems.
>
>
It adds a lot more structure to the tests, with an XML declaration of each
FS (in the -test JAR).

It's pretty much complete except for some discrepancies between file:// and
HDFS that I need to fix in file:
- handling of mkdirs() if the destination exists and is a file (currently
returns 0)
- seek() on a closed stream: currently appears to work, at least on OS X.
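The first discrepancy can be probed with plain java.io (a local-FS illustration only; the Hadoop FileSystem contract tests would exercise fs.mkdirs() instead):

```java
import java.io.File;
import java.io.IOException;

// Probes the "mkdirs over an existing file" question on the local FS.
public class MkdirsOverFileProbe {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("probe", ".tmp");
        f.deleteOnExit();
        // java.io.File.mkdirs() returns false when the path already
        // exists as a file: you cannot make a directory there.
        System.out.println("mkdirs over existing file: " + f.mkdirs());
    }
}
```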


> - Creating mock file systems which can be used to validate API tests, which
> emulate different FS semantics (atomic directory creation, eventual
> consistency, strict consistency, POSIX compliance, append support, etc...)
>

That's an interesting thought: adding some inconsistency semantics on top
of an existing FS to emulate blobstore behaviour. How would you do this? An
in-memory RAM FS could do some of this, but to test YARN it has to be
visible across processes. We'd really need an in-RAM simulation of the
semantics that also offered an RPC API of some form.
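One deterministic, single-process way to emulate eventual consistency (a stdlib-only sketch with invented names; a cross-process variant would still need an RPC layer): writes sit in a pending map until an explicit propagate() call, so a test can interleave reads with a simulated replication delay.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: an eventually-consistent key/value store for tests.
public class EventuallyConsistentStore {
    private final Map<String, byte[]> visible = new HashMap<>();
    private final Map<String, byte[]> pending = new HashMap<>();

    /** Write lands in the pending set; not yet readable. */
    public void put(String path, byte[] data) { pending.put(path, data); }

    /** Returns null until the write has propagated. */
    public byte[] get(String path) { return visible.get(path); }

    /** Simulates the replication delay elapsing. */
    public void propagate() {
        visible.putAll(pending);
        pending.clear();
    }
}
```

Driving propagation explicitly (rather than with wall-clock timers) keeps tests of read-after-write behaviour reproducible.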



>
> Is anyone interested in the above issues, or have any opinions on how /
> where I should get started?
>
> Our end goal is to have a more transparent and portable set of test APIs
> for the Hadoop file system implementors, across the board, so that we can
> all test our individual implementations confidently.
>
> So, anywhere I can lend a hand - let me know.  I think this effort will
> require all of us in the file system community to join forces, and it will
> benefit us all immensely in the long run as well.
>
>
I should do another '9361 patch, once I get those final quirks in file://
sorted out so that it is consistent with HDFS.
1. HDFS is, and continues to be, the definition of the semantics of all
filesystem interfaces.
2. It'd be good if we understood more about which accidental features of
the FS other code depends on. E.g. does anything rely on mkdirs() being
atomic? On 0x00 being a valid char in a filename? How do programs fail when
the blocksize is too small (try setting it to 1 and see how Pig reacts)?
How much code depends on close() being near-instantaneous and never
failing? Blobstores do their write at that point, and can break both of
these requirements -which is something a mock FS could add atop file:

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Build failed in Jenkins: Hadoop-Hdfs-trunk #1693

2014-03-06 Thread Apache Jenkins Server
See 

Changes:

[cmccabe] HDFS-6061. Allow dfs.datanode.shared.file.descriptor.path to contain 
multiple entries and fall back when needed (cmccabe)

[wheat9] HDFS-6058. Fix TestHDFSCLI failures after HADOOP-8691 change. 
Contributed by Akira Ajisaka and Haohui Mai.

[cmccabe] HDFS-6057. DomainSocketWatcher.watcherThread should be marked as a 
daemon thread (cmccabe)

[wheat9] HADOOP-10386. Log proxy hostname in various exceptions being thrown in 
a HA setup. Contributed by Haohui Mai.

[arp] HADOOP-10211. Enable RPC protocol to negotiate SASL-QOP values between 
clients and servers. (Contributed by Benoy Antony)

[atm] HDFS-5898. Allow NFS gateway to login/relogin from its kerberos keytab. 
Contributed by Abin Shahab.

[wheat9] HDFS-5857. TestWebHDFS#testNamenodeRestart fails intermittently with 
NPE. Contributed By Mit Desai.

[vinodkv] YARN-1761. Modified RMAdmin CLI to check whether HA is enabled or not 
before it executes any of the HA admin related commands. Contributed by Xuan 
Gong.

[jianhe] YARN-1752. Fixed ApplicationMasterService to reject unregister request 
if AM did not register before. Contributed by Rohith Sharma.

[brandonli] HDFS-6044. Add property for setting the NFS look up time for users. 
Contributed by Brandon Li

[brandonli] HDFS-6043. Give HDFS daemons NFS3 and Portmap their own OPTS. 
Contributed by Brandon Li

[kasha] YARN-1785. FairScheduler treats app lookup failures as ERRORs. (bc Wong 
via kasha)

[jing9] HDFS-5167. Add metrics about the NameNode retry cache. Contributed by 
Tsuyoshi OZAWA.

--
[...truncated 7722 lines...]
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestInputPath.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestReduceFetch.java
AU
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/NotificationTestCase.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRChildTask.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRClientClusterFactory.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/EmptyInputFormat.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMRCJCFileInputFormat.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFileStreams.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/MiniMRYarnClusterAdapter.java
AU
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestIFile.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestReduceTask.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/jobcontrol
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/jobcontrol/TestLocalJobControl.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/jobcontrol/TestJobControl.java
AU
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/jobcontrol/JobControlTestUtils.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestMiniMRClientCluster.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestCommandLineJobSubmission.java
AU
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestConcatenatedCompressedInput.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientRedirect.java
A 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop

Hadoop-Hdfs-trunk - Build # 1693 - Still Failing

2014-03-06 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1693/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 7915 lines...]
[Hadoop-Hdfs-trunk] $ /bin/bash -x /tmp/hudson8617401815617124843.sh
+ source 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/nightly/hudsonEnv.sh
++ export JAVA_HOME=/home/jenkins/tools/java/latest
++ JAVA_HOME=/home/jenkins/tools/java/latest
++ export ANT_HOME=/home/jenkins/tools/ant/latest
++ ANT_HOME=/home/jenkins/tools/ant/latest
++ export XERCES_HOME=/home/jenkins/tools/xerces/latest
++ XERCES_HOME=/home/jenkins/tools/xerces/latest
++ export ECLIPSE_HOME=/home/jenkins/tools/eclipse/latest
++ ECLIPSE_HOME=/home/jenkins/tools/eclipse/latest
++ export FORREST_HOME=/home/jenkins/tools/forrest/latest
++ FORREST_HOME=/home/jenkins/tools/forrest/latest
++ export JAVA5_HOME=/home/jenkins/tools/java5/latest
++ JAVA5_HOME=/home/jenkins/tools/java5/latest
++ export FINDBUGS_HOME=/home/jenkins/tools/findbugs/latest
++ FINDBUGS_HOME=/home/jenkins/tools/findbugs/latest
++ export CLOVER_HOME=/home/jenkins/tools/clover/latest
++ CLOVER_HOME=/home/jenkins/tools/clover/latest
++ export MAVEN_HOME=/home/jenkins/tools/maven/latest
++ MAVEN_HOME=/home/jenkins/tools/maven/latest
++ export 
PATH=/home/hudson/tools/java/latest1.6/bin:/home/hudson/tools/java/latest1.6/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/jenkins/tools/java/latest/bin:/home/jenkins/tools/ant/latest/bin:
++ 
PATH=/home/hudson/tools/java/latest1.6/bin:/home/hudson/tools/java/latest1.6/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/jenkins/tools/java/latest/bin:/home/jenkins/tools/ant/latest/bin:
++ export ANT_OPTS=-Xmx2048m
++ ANT_OPTS=-Xmx2048m
++ export MAVEN_OPTS=-Xmx2048m
++ MAVEN_OPTS=-Xmx2048m
++ TRUNK=/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk
++ ulimit -n
32768
+ export MAVEN_OPTS=-Xmx2048m
+ MAVEN_OPTS=-Xmx2048m
+ cd trunk/
+ /home/jenkins/tools/maven/latest/bin/mvn clean install -DskipTests 
-Drequire.test.libhadoop -Pnative
Error occurred during initialization of VM
Cannot create VM thread. Out of system resources.
+ cd hadoop-hdfs-project
+ /home/jenkins/tools/maven/latest/bin/mvn clean verify checkstyle:checkstyle 
findbugs:findbugs -Drequire.test.libhadoop -Pdist -Pnative -Dtar -Pdocs -fae
Error occurred during initialization of VM
Cannot create VM thread. Out of system resources.
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-6044
Updating HADOOP-10211
Updating HDFS-6043
Updating HDFS-6061
Updating HDFS-5167
Updating YARN-1752
Updating YARN-1761
Updating HADOOP-8691
Updating HADOOP-10386
Updating HDFS-6058
Updating HDFS-5898
Updating HDFS-6057
Updating YARN-1785
Updating HDFS-5857
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Failure
Sending email for trigger: Failure



###
## FAILED TESTS (if any) 
##
No tests ran.

Re: FileSystem and FileContext Janitor, at your service !

2014-03-06 Thread Jay Vyas
Steve, you mentioned:

>> but to test YARN it has to be visible across processes.

What do you mean by "test YARN"?  I think for FileSystem API unit
testing, we don't care about YARN, do we?







-- 
Jay Vyas
http://jayunit100.blogspot.com


In-Memory Reference FS implementations

2014-03-06 Thread Jay Vyas
As part of HADOOP-9361, I'm envisioning this.

1) We create in-memory FS implementations of different reference
FileSystems, each of which specifies appropriate tests, and passes those
tests, i.e.

   InMemStrictlyConsistentFS (i.e. HDFS)
   InMemEventuallyConsistentFS (blob stores)
   InMemMinimalFS (a very minimal-guarantee FS, for maybe

The beauty of this is that it gives us simple, easily testable reference
implementations that we can base our complex, real-world file system unit
tests on.

2) Then, downstream vendors can just "pick" which of these file systems
they are closest to, and modify their particular file system to declare
semantics using the matching FS as a template.
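A hypothetical sketch of what "declaring semantics" could look like in code (all names invented; HADOOP-9361 itself uses per-FS XML declarations): a small contract descriptor that a shared test suite could consult to decide which expectations apply to a given FileSystem.

```java
// Invented contract descriptor; flags chosen for illustration only.
public enum FsSemantics {
    STRICTLY_CONSISTENT(true, true),     // HDFS-like
    EVENTUALLY_CONSISTENT(false, false), // blob stores
    MINIMAL(false, false);               // minimal-guarantee FS

    private final boolean consistentListing;
    private final boolean atomicMkdirs;

    FsSemantics(boolean consistentListing, boolean atomicMkdirs) {
        this.consistentListing = consistentListing;
        this.atomicMkdirs = atomicMkdirs;
    }

    public boolean hasConsistentListing() { return consistentListing; }
    public boolean hasAtomicMkdirs() { return atomicMkdirs; }
}
```

A test harness could then skip or enable individual cases based on the declared contract of the FS under test, instead of hard-coding HDFS assumptions.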



-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: FileSystem and FileContext Janitor, at your service !

2014-03-06 Thread Steve Loughran
I was thinking that to test YARN-hosted apps like MapReduce, we need to see
how they handle filesystems with different consistency/atomicity models,
and YARN -even MiniYARNCluster- forks things off.

If the MR commit logic is isolated, it could be tested in the JUnit JVM.
But for other applications -Tez, for example- it's probably too complex to
mock.






Re: In-Memory Reference FS implementations

2014-03-06 Thread Steve Loughran
On 6 March 2014 16:37, Jay Vyas  wrote:

> As part of HADOOP-9361, im visioning this.
>
> 1) - We create In Memory FS implementation of different Reference
> FileSystems, each of which specifies appropriate tests , and passes those
> tests , i.e.
>
>InMemStrictlyConsistentFS (i.e. hdfs)
>

HDFS defines the filesystem semantics expected by applications -indeed, it
is actually stricter than NFS in terms of its consistency model.

MiniHDFSCluster implements this today -and provides the RPC needed for
forked apps to access it.

For example, here's a test that uses YARN to bring up a forked process
bonded to an HDFS mini cluster -a process that then starts HBase instances
talking to HDFS:

https://github.com/hortonworks/hoya/blob/develop/hoya-core/src/test/groovy/org/apache/hoya/yarn/cluster/live/TestHBaseMasterOnHDFS.groovy



>InMemEventuallyConsistentFS (blob stores)
>InMemMinimalFS (a very minimal-guarantee FS, for maybe
>
> The beauty of this is - it gives us simple, easily testable reference
> implementations that we can base our complex real world file system unit
> tests off of.
>
>
I can see the merit of the blobstore one, if only to demonstrate its
failings.

Thinking about it, we are mostly there already, because there's a mock impl
of the org.apache.hadoop.fs.s3native.NativeFileSystemStore interface used
behind the s3n:// class

hadoop-trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/s3native/InMemoryNativeFileSystemStore.java

We could enhance this to give it lower guarantees (AWS US-East: no
guarantees; US-West: create consistency), and allow a period of time before
new actions become visible, where actions are: create, delete, overwrite.

We could also allow its methods to take time and occasionally fail,
emulating the storeFile() operation, amongst others. Failure simulation
would be nice.
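A stdlib-only sketch of the failure-simulation idea (the wrapper and its names are invented; the real candidate would wrap NativeFileSystemStore operations): each call takes a configurable amount of time and fails with a configurable, seeded probability, so flakiness stays reproducible across test runs.

```java
import java.io.IOException;
import java.util.Random;
import java.util.concurrent.Callable;

// Sketch: injects latency and seeded random failures around store calls.
public class FlakyStore {
    private final Random random;
    private final double failureRate;   // 0.0 = never fail, 1.0 = always fail
    private final long latencyMillis;

    public FlakyStore(long seed, double failureRate, long latencyMillis) {
        this.random = new Random(seed);  // fixed seed => reproducible runs
        this.failureRate = failureRate;
        this.latencyMillis = latencyMillis;
    }

    /** Runs op after a simulated delay, failing at the configured rate. */
    public <T> T call(Callable<T> op) throws Exception {
        Thread.sleep(latencyMillis);
        if (random.nextDouble() < failureRate) {
            throw new IOException("injected store failure");
        }
        return op.call();
    }
}
```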




> 2) Then, downstream vendors can just "pick" which of these file systems
> they are most close to, and modify their particular file system to declare
> semantics using the matching FS as a template.
>
>
they get to implement an FS that works like HDFS. If the semantics << HDFS,
well, that's not a filesystem, irrespective of what methods it implements.


The blobstore marker interface is intended to cover that, to warn that
"this is not a real filesystem" -a marker applications can use to assert
that it isn't a "FileSystem" by the standard definition of one -and that
all guarantees are lost.



Re: FileSystem and FileContext Janitor, at your service !

2014-03-06 Thread Jay Vyas
I think that is the purpose of the bigtop smoke tests, not the filesystem
smoke tests. right?


On Thu, Mar 6, 2014 at 12:51 PM, Steve Loughran wrote:

> I was thinking to test YARN-hosted apps like MapReduce, we need to see how
> they handle filesystems with different consistency/atomicity models, and
> YARN -even MiniYARNCluster -forks things off.
>
> If the MR commit logic is isolated, that could be tested in the JUnit JVM.
> But for other applications -example: Tez, its probably too complex to mock
>
>
>
>
> On 6 March 2014 16:17, Jay Vyas  wrote:
>
> > steve you mentioned:
> >
> > >> but to test YARN it has to be visible across processes.
> >
> > What do you mean by "test yarn"?   I think for the FileSystem APIs unit
> > testing, we dont care about YARN, do we?
> >
> >
> >
> >
> >
> > On Thu, Mar 6, 2014 at 6:02 AM, Steve Loughran  > >wrote:
> >
> > > On 5 March 2014 19:07, Jay Vyas  wrote:
> > >
> > > > Hi HCFS Community :)
> > > >
> > > > This is Jay...  Some of you know me I hack on a broad range of
> file
> > > > system and hadoop ecosystem interoperability stuff.  I just wanted to
> > > > introduce myself and let you folks know im going to be working to
> help
> > > > clean up the existing unit testing frameworks for the FileSystem and
> > > > FileContext APIs.  I've listed some bullets below .
> > > >
> > > > - byte code inspection based code coverage for file system APIs with
> a
> > > tool
> > > > such as corbertura.
> > > >
> > > > - HADOOP-9361 points out that there are many different types of file
> > > > systems.
> > > >
> > > >
> > > It adds a lot more structure to the tests with an XML declaration of
> each
> > > FS (in the -test) JAR.
> > >
> > > It's pretty much complete except for some discrepancies between file://
> > and
> > > hdfs that I need to fix in file:
> > > -handling of mkdirs if the destination exists and is a file (currently:
> > > returns 0)
> > > -seek() on a closed stream. Currently appears to work,  at least on
> OS/X.
> > >
> > >
> > > > - Creating mock file systems which can be used to validate API tests,
> > > which
> > > > emulate different FS semantics (atomic directory creation, eventual
> > > > consistency, strict consistency, POSIX compliance, append support,
> > > etc...)
> > > >
> > >
> > > That's an interesting thought, adding some inconsistency semantics on
> top
> > > of an existing FS to emulate blobstore
> > > behaviour. How would you do this? A in-memory RAM FS could do some of
> > this,
> > > but to test YARN it has to be visible across processes.
> > > We'd really need an in-ram simulation of semantics that also offered an
> > RPC
> > > API of some form.
> > >
> > >
> > >
> > > >
> > > > Is anyone interested in the above issues or have any opinions on how
> /
> > > > where i should get started?
> > > >
> > > > Our end goal is to have a more transparent and portable set of test
> > APIs
> > > > for the hadoop file system implementors, across the board : so that
> we
> > > can
> > > > all test our individual implementations confidently.
> > > >
> > > > So, anywhere i can lend a hand - let me know.  I think this effort
> will
> > > > require all of us in the file system community to join forces, and it
> > > will
> > > > benefit us all immensly in the long run as well.
> > > >
> > > >
> > > I should do another '9361 patch, once I get those final quirks in
> file://
> > > sorted out so that it is consistent with HDFS.
> > > 1. HDFS is and continues to be, the definition of the semantics of all
> > > filesystem interfaces.
> > > 2. It'd be good if we understood more about what accidental features of
> > the
> > > FS code depends on. e.g. does anything rely on mkdirs() being atomic?
> Of
> > > 0x00 being a valid char in a filename? How do programs fail when
> > blocksize
> > > is too small (try setting it to 1 and see how pig reacts)? How much
> code
> > > depends on close() being near-instantaneous and never failing?
> Blobstores
> > > do their write then, and can break both these requirements -which is
> > > something a mock FS could add atop file:
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> entity
> > to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the
> reader
> > > of this message is not the intended recipient, you are hereby notified
> > that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender
> > immediately
> > > and delete it from your system. Thank You.
> > >
> >
> >
> >
> > --
> > Jay Vyas
> > http://jayunit100.blogspot.com
> >
>

[jira] [Created] (HDFS-6067) TestPread.testMaxOutHedgedReadPool is flaky

2014-03-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6067:
--

 Summary: TestPread.testMaxOutHedgedReadPool is flaky
 Key: HDFS-6067
 URL: https://issues.apache.org/jira/browse/HDFS-6067
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


TestPread.testMaxOutHedgedReadPool is flaky, giving assertions like this:

{code}
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:92)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertTrue(Assert.java:54)
at 
org.apache.hadoop.hdfs.TestPread.testMaxOutHedgedReadPool(TestPread.java:284)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: In-Memory Reference FS implementations

2014-03-06 Thread Jay Vyas
Do you consider the native S3 FS a real "reference implementation" for
blob stores, or just something that, by mere chance, we are able to use as
a ref. impl.?


Re: In-Memory Reference FS implementations

2014-03-06 Thread Colin McCabe
NetFlix's Apache-licensed S3mper system provides consistency for an
S3-backed store.
http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html

It would be nice to see this or something like it integrated with
Hadoop.  I fear that a lot of applications are not ready for eventual
consistency, and may never be, leading to the feeling that Hadoop on
S3 is buggy.

Colin

On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas  wrote:
> do you consider that native S3 FS  a real "reference implementation" for
> blob stores? or just something that , by mere chance, we are able to use as
> a ref. impl.


Re: In-Memory Reference FS implementations

2014-03-06 Thread Jay Vyas
Thanks Colin: that's a good example of why we want to unify the HCFS test 
profile.  So how can HCFS implementations use the current hadoop-common tests?

To my mind there are three ways.

- One solution is to manually cobble together and copy tests, running them one 
by one and seeing which ones apply to their FS.  This is what I think we do now 
(extending the base contract, main operations tests, overriding some methods, ...).

- Another solution is that all Hadoop filesystems should conform to one exact 
contract.  Is that a pipe dream, or is it possible?

- A third solution is that we could use a declarative API where filesystem 
implementations declare which tests or groups of tests they don't want to run.  
 That is basically HADOOP-9361.

- The third approach could be complemented by barebones, curated in-memory 
reference implementations that exemplify distilled filesystems with 
certain salient properties (i.e. non-atomic mkdirs) 

 
> On Mar 6, 2014, at 1:47 PM, Colin McCabe  wrote:
> 
> NetFlix's Apache-licensed S3mper system provides consistency for an
> S3-backed store.
> http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> 
> It would be nice to see this or something like it integrated with
> Hadoop.  I fear that a lot of applications are not ready for eventual
> consistency, and may never be, leading to the feeling that Hadoop on
> S3 is buggy.
> 
> Colin
> 
>> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas  wrote:
>> do you consider that native S3 FS  a real "reference implementation" for
>> blob stores? or just something that , by mere chance, we are able to use as
>> a ref. impl.


Re: In-Memory Reference FS implementations

2014-03-06 Thread Steve Loughran
EMR's S3 does extra things, which is why Netflix used injection tricks to
add theirs on top.

For blobstores, key use cases are

   1. general source of low-rate-of-change artifacts
   2. input for analysis jobs
   3. output from them
   4. chained operations
   5. storage of data to outlive the EMR cluster

#1 isn't a problem assuming the velocity of the artifacts is pretty low.

#2 -OK for data written "a while" earlier, provided there isn't an ongoing
partition.

#3 -speculation relies on atomic rename that fails if the dest dir exists.
Blobstores don't have this and do rename as
   (i) check
   (ii) create root path
   (iii) copy of individual items below path
   (iv) delete of source.

The race between (i) and (ii) exists, and is worse if the object store doesn't
even do create consistency (e.g. AWS S3 US-East, but not the others [1]). This
means there's a risk of two committing reducers mixing outputs (risk low, as it
requires both processes to commit simultaneously).
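As a toy model of those four steps (a plain dict standing in for the key/value store; illustrative only, not any real client library):

```python
def blobstore_rename(objects, src_prefix, dest_prefix):
    """Emulates rename on a flat key/value store as the four steps above:
    (i) check dest, (ii) create dest path, (iii) copy children, (iv) delete src.
    Nothing makes the sequence atomic: a second writer can pass the same
    check in step (i) before this caller reaches step (ii)."""
    # (i) check -- fail if the destination already exists
    if any(k.startswith(dest_prefix) for k in objects):
        return False
    # (ii) create the destination "directory" marker
    objects[dest_prefix] = b""
    # (iii) copy each child object individually (observable half-done)
    for k in [k for k in objects if k.startswith(src_prefix) and k != src_prefix]:
        objects[dest_prefix + k[len(src_prefix):]] = objects[k]
    # (iv) delete the source tree
    for k in [k for k in objects if k.startswith(src_prefix)]:
        del objects[k]
    return True

store = {"tmp/attempt_0/": b"", "tmp/attempt_0/part-0": b"data"}
assert blobstore_rename(store, "tmp/attempt_0/", "out/")
assert store == {"out/": b"", "out/part-0": b"data"}
# a second committer now fails the check -- but only because it ran late
# enough to observe the first commit; between (i) and (ii) it would not
assert not blobstore_rename(store, "tmp/attempt_1/", "out/")
```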

#4 is trouble -anything waiting for one MR job to finish may start when it
finishes, but when job #2 kicks off and does any of the dir/path listing
methods, it may get an incomplete list of children -and hence, incomplete
list of output files.
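A toy model of #4, where listings lag behind writes (illustrative Python, not any real store):

```python
class EventuallyConsistentStore:
    """Toy model: writes land immediately, but list() serves a stale snapshot
    for `lag` subsequent calls, so a chained job that lists right after job #1
    commits may see an incomplete set of output files."""
    def __init__(self, lag=1):
        self.objects = set()
        self.snapshot = set()
        self.lag = lag

    def put(self, key):
        self.objects.add(key)

    def list(self, prefix):
        if self.lag > 0:
            self.lag -= 1          # still serving the stale view
            visible = self.snapshot
        else:
            self.snapshot = set(self.objects)
            visible = self.snapshot
        return sorted(k for k in visible if k.startswith(prefix))

store = EventuallyConsistentStore(lag=1)
store.put("out/part-0")
store.put("out/part-1")
assert store.list("out/") == []    # job #2 lists too early: incomplete input
assert store.list("out/") == ["out/part-0", "out/part-1"]
```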

That's the trouble. If people follow the best practice -HDFS for
intermediate work, S3 for final output- all is well. Netflix use S3 as the
output of all work, so they can schedule analytics on any Hadoop cluster
they have, and at the scale they run at they hit this problem. Other people
may have hit it too -just not noticed.

" I fear that a lot of applications are not ready for eventual
consistency, and may never be"

Exactly: I have code that uses HDFS to co-ordinate, and will never work on
an object store that doesn't have atomic/consistent ops

"leading to the feeling that Hadoop on S3 is buggy"

https://issues.apache.org/jira/browse/HADOOP-9577  -filed by someone @amazon


-Steve

HADOOP-9565 says "add a marker" :
https://issues.apache.org/jira/browse/HADOOP-9565

HADOOP-10373 goes further and says "move the s3 & s3n code into
hadoop-tools/hadoop-aws".

https://issues.apache.org/jira/browse/HADOOP-10373

This will make it possible to swap in versions compiled against the same
Hadoop release, without having to build your own hadoop JARs

steve

(who learned too much about object stores  and the FileSystem class while
doing the swift:// coding)


[1] http://aws.amazon.com/s3/faqs/


On 6 March 2014 18:47, Colin McCabe  wrote:

> NetFlix's Apache-licensed S3mper system provides consistency for an
> S3-backed store.
> http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
>
> It would be nice to see this or something like it integrated with
> Hadoop.  I fear that a lot of applications are not ready for eventual
> consistency, and may never be, leading to the feeling that Hadoop on
> S3 is buggy.
>
> Colin
>
> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas  wrote:
> > do you consider that native S3 FS  a real "reference implementation" for
> > blob stores? or just something that , by mere chance, we are able to use
> as
> > a ref. impl.
>



Re: In-Memory Reference FS implementations

2014-03-06 Thread Steve Loughran
Let's get the HADOOP-9361 stuff in (it lives alongside
FileSystemContractBaseTest) and you can work off that.


On 6 March 2014 18:57, Jay Vyas  wrote:

> Thanks Colin: that's a good example of why we want To unify the hcfs test
> profile.  So how can  hcfs implementations use current hadoop-common tests?
>
> In mind there are three ways.
>
> - one solution is to manually cobble together and copy tests , running
> them one by one and seeing which ones apply to their fs.  this is what I
> think we do now (extending base contract, main operations tests, overriding
> some methods, ..).
>

Yes it is. Start there.


>
> - another solution is that all hadoop filesystems should conform to one
> exact contract.  Is that a pipe dream? Or is it possible?
>


No, as the native FS and Hadoop FS:
-throw different exceptions
-raise exceptions on seek past end of file at different times (HDFS: on
seek; file://: on read)
-have different illegal filenames (HDFS 2.3+: ".snapshot"; NTFS: "COM1" to
"COM9", unless you use the \\.\ unicode prefix)
-have different limits on dir size, depth, filename length
-have different case sensitivity

None of these are explicit in the FileSystem and FileContract APIs, nor
can they be.
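For instance, the seek-past-EOF divergence means a cross-filesystem test can only assert that the failure surfaces somewhere in the seek()+read() pair, not at one fixed call site. A toy model of the two behaviours (plain Python, not Hadoop code):

```python
class StrictSeekFile:
    """HDFS-like: seeking past EOF fails immediately."""
    def __init__(self, length):
        self.length, self.pos = length, 0
    def seek(self, pos):
        if pos > self.length:
            raise IOError("cannot seek past end of file")
        self.pos = pos
    def read(self):
        return -1 if self.pos >= self.length else 0

class LazySeekFile(StrictSeekFile):
    """file://-like: the seek succeeds; the subsequent read fails."""
    def seek(self, pos):
        self.pos = pos
    def read(self):
        if self.pos > self.length:
            raise IOError("attempt to read past end of file")
        return super().read()

def seek_then_read_raises(f, pos):
    """What a portable contract test can actually check: the failure
    surfaces somewhere in seek()+read()."""
    try:
        f.seek(pos)
        f.read()
        return False
    except IOError:
        return True

assert seek_then_read_raises(StrictSeekFile(10), 20)  # fails in seek()
assert seek_then_read_raises(LazySeekFile(10), 20)    # fails in read()
assert not seek_then_read_raises(StrictSeekFile(10), 5)
```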



>
> - a third solution. Is that we could use a declarative API where file
> system implementations declare which tests or groups of tests they don't
> want to run.   That is basically hadoop-9361
>
>
It does more:

1. It lets filesystems declare strict vs. lax exceptions. Strict: detailed
exceptions, like EOFException. Lax: IOException.
2. By declaring behaviours in an XML file in each filesystem's -test.jar,
downstream tests in, say, Bigtop, can read in the same details.
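Such a per-filesystem declaration could be an ordinary Hadoop configuration resource. The sketch below is illustrative only; the option names are hypothetical, not the final HADOOP-9361 schema:

```xml
<configuration>
  <!-- Declared capabilities of this FileSystem, readable both by the
       contract tests in hadoop-common and by downstream suites. -->
  <property>
    <name>fs.contract.is-case-sensitive</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.contract.supports-atomic-rename</name>
    <value>false</value>
  </property>
  <property>
    <!-- strict: detailed exceptions such as EOFException; lax: IOException -->
    <name>fs.contract.test.strict-exceptions</name>
    <value>false</value>
  </property>
</configuration>
```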


> - The third approach could be complimented by barebones, simple in-memory
> curated reference implementations that exemplify distilled filesystems with
> certain salient properties (I.e. Non atomic mkdirs)
>
>
> > On Mar 6, 2014, at 1:47 PM, Colin McCabe  wrote:
> >
> > NetFlix's Apache-licensed S3mper system provides consistency for an
> > S3-backed store.
> > http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> >
> > It would be nice to see this or something like it integrated with
> > Hadoop.  I fear that a lot of applications are not ready for eventual
> > consistency, and may never be, leading to the feeling that Hadoop on
> > S3 is buggy.
> >
> > Colin
> >
> >> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas  wrote:
> >> do you consider that native S3 FS  a real "reference implementation" for
> >> blob stores? or just something that , by mere chance, we are able to
> use as
> >> a ref. impl.
>



[jira] [Reopened] (HDFS-6063) TestAclCLI fails intermittently when running test 24: copyFromLocal

2014-03-06 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth reopened HDFS-6063:
-

  Assignee: Chris Nauroth

I'm going to reopen this.  I'm still seeing a few failures that look to be 
different from the HDFS-6058 problem.  For example:

https://builds.apache.org/job/PreCommit-HDFS-Build/6325/


> TestAclCLI fails intermittently when running test 24: copyFromLocal
> ---
>
> Key: HDFS-6063
> URL: https://issues.apache.org/jira/browse/HDFS-6063
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Chris Nauroth
>
> TestAclCLI seems to fail intermittently when running Test ID: \[24\]: 
> copyFromLocal: copying file into a directory with a default ACL.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Thinking ahead to 2.4

2014-03-06 Thread Arun C Murthy
Gang,

 Most of the big-ticket items are already in, awesome!

 I'm thinking we could roll out a 2.4 RC in the next 2-3 weeks after we get 
through the list of blockers. Here is a handy link: 
http://s.apache.org/hadoop-2.4-blockers

 If you find more, please set Target Version to 2.4.0 and mark it a blocker. 
I'll try nudging people to start closing these soon, appreciate any help!

thanks,
Arun

 
On Feb 20, 2014, at 3:45 PM, Arun C Murthy  wrote:

> Thanks Azuryy & Suresh. I've updated the roadmap wiki to reflect this.
> 
> Arun
> 
> On Feb 20, 2014, at 2:01 PM, Suresh Srinivas  wrote:
> 
>> Arun,
>> 
>> Some of the previously 2.4 targeted features were made available in 2.3:
>> - Heterogeneous storage support
>> - Datanode cache
>> 
>> The following are being targeted for 2.4:
>> - Use protobuf for fsimge (already in)
>> - ACLs (in trunk. In a week or so, this will be merged to branch-2.4)
>> - Rolling upgrades (last bunch of jiras being worked in feature branch.
>> Will be in 2.4 in around two weeks. Currently testing is in progress)
>> 
>> So HDFS features should be ready in two weeks.
>> 
>> 
>> On Sat, Feb 15, 2014 at 4:47 PM, Azuryy  wrote:
>> 
>>> Hi,
>>> I think you omit some key pieces in 2.4
>>> 
>>> Protobuf fsimage, rolling upgrade are also targeting 2.4
>>> 
>>> 
>>> 
>>> Sent from my iPhone5s
>>> 
 On Feb 16, 2014, at 6:59, Arun C Murthy  wrote:
 
 Folks,
 
 With hadoop-2.3 nearly done, I think it's time to think ahead to
>>> hadoop-2.4. I think it was a good idea to expedite release of 2.3 while we
>>> finished up pieces that didn't make it in such as HDFS Caching & Support
>>> for Heterogenous Storage.
 
 Now, most of the key pieces incl. Resource Manager Automatic Failover
>>> (YARN-149), Application History Server (YARN-321) & Application Timeline
>>> Server (YARN-1530) are either complete or very close to done, and I think
>>> we will benefit with an extended test-cycle for 2.4 - similar to what
>>> happened with 2.2. To provide some context: 2.2 went through nearly 6 weeks
>>> of extended testing and it really helped us push out a very stable release.
 
 I think it will be good to create a 2.4 branch ASAP and start testing.
>>> As such, I plan to cut the branch early next week. With this, we should be
>>> good shape sometime to release 2.4 in mid-March.
 
 I've updated https://wiki.apache.org/hadoop/Roadmap to reflect this.
 
 Also, we should start thinking ahead to 2.5 and what folks would like to
>>> see in it. If we continue our 6-week cycles, we could shoot to get that out
>>> in April.
 
 Thoughts?
 
 thanks,
 Arun
 
 
 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/
 
 
 
>>> 
>> 
>> 
>> 
>> -- 
>> http://hortonworks.com/download/
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





[jira] [Created] (HDFS-6068) Disallow snapshot names that are also invalid directory names

2014-03-06 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-6068:
-

 Summary: Disallow snapshot names that are also invalid directory 
names
 Key: HDFS-6068
 URL: https://issues.apache.org/jira/browse/HDFS-6068
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Affects Versions: 2.4.0
Reporter: Andrew Wang


There are a number of restrictions on valid names in HDFS. For example, you 
can't name a directory "." or "..", or something containing a ":".

However, I can happily create a snapshot named "a:b:c", resulting in this:

{code}
-> % hdfs dfs -createSnapshot /home/andrew a:b:c
Created snapshot /home/andrew/.snapshot/a:b:c
-> % hadoop fs -ls /home/andrew/.snapshot
-ls: java.net.URISyntaxException: Relative path in absolute URI: a:b:c
Usage: hadoop fs [generic options] -ls [-d] [-h] [-R] [<path> ...]
{code}
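A fix would presumably run snapshot names through the same component checks applied to ordinary path names. A toy sketch of that rule (illustrative Python, not the actual HDFS validator):

```python
def is_valid_snapshot_name(name):
    """Apply the usual path-component restrictions to a snapshot name:
    non-empty, not "." or "..", and no "/" or ":" characters (the ":" is
    what breaks relative-path URI parsing in the example above)."""
    if name in ("", ".", ".."):
        return False
    return not any(c in name for c in "/:")

assert not is_valid_snapshot_name("a:b:c")       # the failing case reported
assert not is_valid_snapshot_name("..")
assert is_valid_snapshot_name("snap-2014-03-06")
```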



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6069) Quash stack traces when ACLs are disabled

2014-03-06 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-6069:
-

 Summary: Quash stack traces when ACLs are disabled
 Key: HDFS-6069
 URL: https://issues.apache.org/jira/browse/HDFS-6069
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.0
Reporter: Andrew Wang


When ACLs are disabled, I get a lot of stack traces in the namenode log. It'd 
be nice to quash them for less spew.

{noformat}
14/03/06 13:56:53 INFO ipc.Server: IPC Server handler 9 on 8020, call 
org.apache.hadoop.hdfs.protocol.ClientProtocol.getAclStatus from 
127.0.0.1:54988 Call#2 Retry#0: error: 
org.apache.hadoop.hdfs.protocol.AclException: The ACL operation has been 
rejected.  Support for ACLs has been disabled by setting 
dfs.namenode.acls.enabled to false.
org.apache.hadoop.hdfs.protocol.AclException: The ACL operation has been 
rejected.  Support for ACLs has been disabled by setting 
dfs.namenode.acls.enabled to false.
at 
org.apache.hadoop.hdfs.server.namenode.AclConfigFlag.checkForApiCall(AclConfigFlag.java:50)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAclStatus(FSNamesystem.java:7666)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAclStatus(NameNodeRpcServer.java:1341)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAclStatus(ClientNamenodeProtocolServerSideTranslatorPB.java:1259)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6070) Cleanup use of ReadStatistics in DFSInputStream

2014-03-06 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-6070:
-

 Summary: Cleanup use of ReadStatistics in DFSInputStream
 Key: HDFS-6070
 URL: https://issues.apache.org/jira/browse/HDFS-6070
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Trivial
 Attachments: hdfs-6070.patch

Trivial little code cleanup related to DFSInputStream#ReadStatistics to use 
update methods rather than reaching in directly.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6071) BlockReaderLocal doesn't return -1 on EOF when doing a zero-length read on a short file

2014-03-06 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-6071:
--

 Summary: BlockReaderLocal doesn't return -1 on EOF when doing a 
zero-length read on a short file
 Key: HDFS-6071
 URL: https://issues.apache.org/jira/browse/HDFS-6071
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


BlockReaderLocal doesn't return -1 on EOF when doing a zero-length read on a 
short file.  Specifically, if the file is shorter than the readahead buffer, or 
if the position is nearer to the end than the length of the readahead buffer, 
this may happen.  This is mainly a concern because libhdfs relies on this to 
determine whether it should use direct reads.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6072) Clean up dead code of FSImage

2014-03-06 Thread Haohui Mai (JIRA)
Haohui Mai created HDFS-6072:


 Summary: Clean up dead code of FSImage
 Key: HDFS-6072
 URL: https://issues.apache.org/jira/browse/HDFS-6072
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


After HDFS-5698, HDFS stores the FSImage in protobuf format. The old code for 
saving the FSImage is now dead and should be removed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6073) NameNodeResourceChecker prints 'null' mount point to the log

2014-03-06 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-6073:
---

 Summary: NameNodeResourceChecker prints 'null' mount point to the 
log
 Key: HDFS-6073
 URL: https://issues.apache.org/jira/browse/HDFS-6073
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.3.0
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA


If the available space on the volume used for saving the fsimage is less than 
100MB (the default), NameNodeResourceChecker logs the following:
{code}
Space available on volume 'null' is 92274688, which is below the configured 
reserved amount 104857600
{code}
It should print an appropriate mount point instead of null.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6074) Under replicated blocks with ONE replica should get replication priority over blocks with more than one replica.

2014-03-06 Thread Mike George (JIRA)
Mike George created HDFS-6074:
-

 Summary: Under replicated blocks with ONE replica should get 
replication priority over blocks with more than one replica.
 Key: HDFS-6074
 URL: https://issues.apache.org/jira/browse/HDFS-6074
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.2.0
Reporter: Mike George
Priority: Minor


We had two nodes fail at the same time.  There were over 2000 blocks at higher 
risk of causing corrupt files since they were single-replica.  There were over 
4 under-replicated blocks, most having two replicas, and the namenode's 
prioritization for recreating missing replicas clearly placed no priority on 
single-replica blocks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: In-Memory Reference FS implementations

2014-03-06 Thread Jay Vyas
Thanks Steve.  So I guess the conclusion is:

1) Wait on HADOOP-9361.

2) There definitively cannot be a single strict contract for all HCFS
implementations, based on the examples you've shown.

In the meantime I'll audit existing test coverage; let me know if I can
lend a hand in the cleanup process.




On Thu, Mar 6, 2014 at 4:01 PM, Steve Loughran wrote:

> Lets get the HADOOP-9361 stuff in (it lives alongside
> FileSystemContractBaseTest) and you can work off that.
>
>
> On 6 March 2014 18:57, Jay Vyas  wrote:
>
> > Thanks Colin: that's a good example of why we want To unify the hcfs test
> > profile.  So how can  hcfs implementations use current hadoop-common
> tests?
> >
> > In mind there are three ways.
> >
> > - one solution is to manually cobble together and copy tests , running
> > them one by one and seeing which ones apply to their fs.  this is what I
> > think we do now (extending base contract, main operations tests,
> overriding
> > some methods, ..).
> >
>
> Yes it is. Start there.
>
>
> >
> > - another solution is that all hadoop filesystems should conform to one
> > exact contract.  Is that a pipe dream? Or is it possible?
> >
>
>
> No as the nativeFS and hadoop FS
> -throw different exceptions
> -raise exceptions on seek past end of file at different times (HDFS: on
> seek, file:// on read)
> -have different illegal filenames (hdfs 2.3+ ".snapshot"). NTFS: "COM1" to
> COM9, unless you use the \\.\ unicode prefix
> -have different limits on dir size, depth, filename length
> -have different case sensitivity
>
> None of these are explicitly in the FileSystem and FileContract APIs, and
> nor can they be.
>
>
>
> >
> > - a third solution. Is that we could use a declarative API where file
> > system implementations declare which tests or groups of tests they don't
> > want to run.   That is basically hadoop-9361
> >
> >
> it does more,
>
> 1.it lets filesystems declare strict vs lax exceptions. Strict: detailed
> exceptions, like EOFException. Lax: IOException.
> 2. by declaring behaviours in an XML file in each filesystems -test.jar,
> downstream tests in, say, bigtop, can read in the same details
>
>
> > - The third approach could be complimented by barebones, simple in-memory
> > curated reference implementations that exemplify distilled filesystems
> with
> > certain salient properties (I.e. Non atomic mkdirs)
> >
> >
> > > On Mar 6, 2014, at 1:47 PM, Colin McCabe 
> wrote:
> > >
> > > NetFlix's Apache-licensed S3mper system provides consistency for an
> > > S3-backed store.
> > > http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
> > >
> > > It would be nice to see this or something like it integrated with
> > > Hadoop.  I fear that a lot of applications are not ready for eventual
> > > consistency, and may never be, leading to the feeling that Hadoop on
> > > S3 is buggy.
> > >
> > > Colin
> > >
> > >> On Thu, Mar 6, 2014 at 10:42 AM, Jay Vyas 
> wrote:
> > >> do you consider that native S3 FS  a real "reference implementation"
> for
> > >> blob stores? or just something that , by mere chance, we are able to
> > use as
> > >> a ref. impl.
> >
>
>



-- 
Jay Vyas
http://jayunit100.blogspot.com


Re: Thinking ahead to 2.4

2014-03-06 Thread Azuryy Yu
Hi Arun,

I just advise removing some sub-tasks from the blockers, then adding the
umbrella tasks, such as:

RM HA - YARN-149
YARN Generic Application Timeline - YARN-1530
Generic application history service - YARN-321

Rolling upgrade (will be ready; no blockers currently, I just mention it) -
HDFS-5535
New ACL - HDFS-3685


I also want to know which version the Heterogeneous Storage client API is
targeting. Nobody can use this feature without a client API (configuration,
FileSystem API).




On Fri, Mar 7, 2014 at 5:40 AM, Arun C Murthy  wrote:

> Gang,
>
>  Most of the big-ticket items are already in, awesome!
>
>  I'm thinking we could roll out a 2.4 RC in the next 2-3 weeks after we
> get through the list of blockers. Here is a handy link:
> http://s.apache.org/hadoop-2.4-blockers
>
>  If you find more, please set Target Version to 2.4.0 and mark it a
> blocker. I'll try nudging people to start closing these soon, appreciate
> any help!
>
> thanks,
> Arun
>
>
> On Feb 20, 2014, at 3:45 PM, Arun C Murthy  wrote:
>
> > Thanks Azuryy & Suresh. I've updated the roadmap wiki to reflect this.
> >
> > Arun
> >
> > On Feb 20, 2014, at 2:01 PM, Suresh Srinivas 
> wrote:
> >
> >> Arun,
> >>
> >> Some of the previously 2.4 targeted features were made available in 2.3:
> >> - Heterogeneous storage support
> >> - Datanode cache
> >>
> >> The following are being targeted for 2.4:
> >> - Use protobuf for fsimge (already in)
> >> - ACLs (in trunk. In a week or so, this will be merged to branch-2.4)
> >> - Rolling upgrades (last bunch of jiras being worked in feature branch.
> >> Will be in 2.4 in around two weeks. Currently testing is in progress)
> >>
> >> So HDFS features should be ready in two weeks.
> >>
> >>
> >> On Sat, Feb 15, 2014 at 4:47 PM, Azuryy  wrote:
> >>
> >>> Hi,
> >>> I think you omit some key pieces in 2.4
> >>>
> >>> Protobuf fsimage, rolling upgrade are also targeting 2.4
> >>>
> >>>
> >>>
> >>> Sent from my iPhone5s
> >>>
>  On Feb 16, 2014, at 6:59, Arun C Murthy  wrote:
> 
>  Folks,
> 
>  With hadoop-2.3 nearly done, I think it's time to think ahead to
> >>> hadoop-2.4. I think it was a good idea to expedite release of 2.3
> while we
> >>> finished up pieces that didn't make it in such as HDFS Caching &
> Support
> >>> for Heterogenous Storage.
> 
>  Now, most of the key pieces incl. Resource Manager Automatic Failover
> >>> (YARN-149), Application History Server (YARN-321) & Application
> Timeline
> >>> Server (YARN-1530) are either complete or very close to done, and I
> think
> >>> we will benefit with an extended test-cycle for 2.4 - similar to what
> >>> happened with 2.2. To provide some context: 2.2 went through nearly 6
> weeks
> >>> of extended testing and it really helped us push out a very stable
> release.
> 
>  I think it will be good to create a 2.4 branch ASAP and start testing.
> >>> As such, I plan to cut the branch early next week. With this, we
> should be
> >>> good shape sometime to release 2.4 in mid-March.
> 
>  I've updated https://wiki.apache.org/hadoop/Roadmap to reflect this.
> 
>  Also, we should start thinking ahead to 2.5 and what folks would like
> to
> >>> see in it. If we continue our 6-week cycles, we could shoot to get
> that out
> >>> in April.
> 
>  Thoughts?
> 
>  thanks,
>  Arun
> 
> 
>  --
>  Arun C. Murthy
>  Hortonworks Inc.
>  http://hortonworks.com/
> 
> 
> 
> >>>
> >>
> >>
> >>
> >> --
> >> http://hortonworks.com/download/
> >>
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.

Re: Thinking ahead to 2.4

2014-03-06 Thread Azuryy Yu
Hi,
Sorry, ignore my inputs.

Just keep the sub-tasks as blockers, because the umbrella tasks as a whole may
not block the release.


On Fri, Mar 7, 2014 at 9:44 AM, Azuryy Yu  wrote:

> Hi Arun,
>
> I just advise removing some sub-tasks from the blockers and adding the
> umbrella tasks instead, such as:
>
> RM HA - YARN-149
> Application Timeline Server - YARN-1530
> Generic Application History Service - YARN-321
>
> Rolling upgrade (will be ready; no blockers currently, I just mention it)
> - HDFS-5535
> New ACLs - HDFS-3685
>
>
> I also want to know which release the Heterogeneous Storage client API is
> targeting. Nobody can use this feature without a client API (configuration,
> FileSystem API).
>
>
>
>
> On Fri, Mar 7, 2014 at 5:40 AM, Arun C Murthy  wrote:
>
>> Gang,
>>
>>  Most of the big-ticket items are already in, awesome!
>>
>>  I'm thinking we could roll out a 2.4 RC in the next 2-3 weeks after we
>> get through the list of blockers. Here is a handy link:
>> http://s.apache.org/hadoop-2.4-blockers
>>
>>  If you find more, please set Target Version to 2.4.0 and mark it a
>> blocker. I'll try nudging people to start closing these soon, appreciate
>> any help!
>>
>> thanks,
>> Arun
>>
>>
>> On Feb 20, 2014, at 3:45 PM, Arun C Murthy  wrote:
>>
>> > Thanks Azuryy & Suresh. I've updated the roadmap wiki to reflect this.
>> >
>> > Arun
>> >
>> > On Feb 20, 2014, at 2:01 PM, Suresh Srinivas 
>> wrote:
>> >
>> >> Arun,
>> >>
>> >> Some of the previously 2.4 targeted features were made available in
>> 2.3:
>> >> - Heterogeneous storage support
>> >> - Datanode cache
>> >>
>> >> The following are being targeted for 2.4:
>> >> - Use protobuf for fsimage (already in)
>> >> - ACLs (in trunk. In a week or so, this will be merged to branch-2.4)
>> >> - Rolling upgrades (last bunch of jiras being worked in feature branch.
>> >> Will be in 2.4 in around two weeks. Currently testing is in progress)
>> >>
>> >> So HDFS features should be ready in two weeks.
>> >>
>> >>
>> >> On Sat, Feb 15, 2014 at 4:47 PM, Azuryy  wrote:
>> >>
>> >>> Hi,
>> >>> I think you omitted some key pieces in 2.4.
>> >>>
>> >>> Protobuf fsimage and rolling upgrade are also targeting 2.4.
>> >>>
>> >>>
>> >>>
>> >>> Sent from my iPhone5s
>> >>>
>>  On Feb 16, 2014, at 6:59, Arun C Murthy  wrote:
>> 
>>  Folks,
>> 
>>  With hadoop-2.3 nearly done, I think it's time to think ahead to
>> >>> hadoop-2.4. I think it was a good idea to expedite release of 2.3
>> while we
>> >>> finished up pieces that didn't make it in, such as HDFS Caching &
>> Support
>> >>> for Heterogeneous Storage.
>> 
>>  Now, most of the key pieces incl. Resource Manager Automatic Failover
>> >>> (YARN-149), Application History Server (YARN-321) & Application
>> Timeline
>> >>> Server (YARN-1530) are either complete or very close to done, and I
>> think
>> >>> we will benefit from an extended test cycle for 2.4 - similar to what
>> >>> happened with 2.2. To provide some context: 2.2 went through nearly 6
>> weeks
>> >>> of extended testing and it really helped us push out a very stable
>> release.
>> 
>>  I think it will be good to create a 2.4 branch ASAP and start
>> testing.
>> >>> As such, I plan to cut the branch early next week. With this, we
>> >>> should be in good shape to release 2.4 in mid-March.
>> 
>>  I've updated https://wiki.apache.org/hadoop/Roadmap to reflect this.
>> 
>>  Also, we should start thinking ahead to 2.5 and what folks would
>> like to
>> >>> see in it. If we continue our 6-week cycles, we could shoot to get
>> that out
>> >>> in April.
>> 
>>  Thoughts?
>> 
>>  thanks,
>>  Arun
>> 
>> 
>>  --
>>  Arun C. Murthy
>>  Hortonworks Inc.
>>  http://hortonworks.com/
>> 
>> 
>> 
>>  --
>>  CONFIDENTIALITY NOTICE
>>  NOTICE: This message is intended for the use of the individual or
>> entity
>> >>> to
>>  which it is addressed and may contain information that is
>> confidential,
>>  privileged and exempt from disclosure under applicable law. If the
>> reader
>>  of this message is not the intended recipient, you are hereby
>> notified
>> >>> that
>>  any printing, copying, dissemination, distribution, disclosure or
>>  forwarding of this communication is strictly prohibited. If you have
>>  received this communication in error, please contact the sender
>> >>> immediately
>>  and delete it from your system. Thank You.
>> >>>
>> >>
>> >>
>> >>