Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2016-07-20 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/100/

[Jul 19, 2016 10:33:28 AM] (varunsaxena) YARN-4996. Make 
TestNMReconnect.testCompareRMNodeAfterReconnect()
[Jul 19, 2016 2:17:58 PM] (junping_du) YARN-5213. Fix a bug in LogCLIHelpers 
which cause
[Jul 19, 2016 5:43:19 PM] (Arun Suresh) Revert "YARN=5181. ClusterNodeTracker: 
add method to get list of nodes
[Jul 19, 2016 5:43:37 PM] (Arun Suresh) YARN-5181. ClusterNodeTracker: add 
method to get list of nodes matching
[Jul 19, 2016 8:49:24 PM] (aajisaka) HDFS-10603. Fix flaky tests in
[Jul 19, 2016 9:46:07 PM] (aajisaka) HDFS-10647. Add a link to HDFS disk 
balancer document in site.xml.
[Jul 19, 2016 10:13:01 PM] (aajisaka) HDFS-10620. StringBuilder created and 
appended even if logging is
[Jul 19, 2016 11:05:48 PM] (aajisaka) HADOOP-12991. Conflicting default ports 
in DelegateToFileSystem.
[Jul 20, 2016 3:15:37 AM] (sjlee) MAPREDUCE-6365. Refactor 
JobResourceUploader#uploadFilesInternal (Chris
[Jul 20, 2016 6:03:58 AM] (Arun Suresh) YARN-5350. Distributed Scheduling: 
Ensure sort order of allocatable





[jira] [Created] (HDFS-10657) testAclCLI.xml inherit default ACL to dir test should expect mask r-x

2016-07-20 Thread John Zhuge (JIRA)
John Zhuge created HDFS-10657:
-

 Summary: testAclCLI.xml inherit default ACL to dir test should 
expect mask r-x
 Key: HDFS-10657
 URL: https://issues.apache.org/jira/browse/HDFS-10657
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: John Zhuge
Assignee: John Zhuge
Priority: Minor


The following test case should expect the {{mask::r-x}} ACL entry instead of 
{{mask::rwx}}:
{code:xml}
<test>
  <description>setfacl : check inherit default ACL to dir</description>
  <test-commands>
    <command>-fs NAMENODE -mkdir /dir1</command>
    <command>-fs NAMENODE -setfacl -m default:user:charlie:r-x,default:group:admin:rwx /dir1</command>
    <command>-fs NAMENODE -mkdir /dir1/dir2</command>
    <command>-fs NAMENODE -getfacl /dir1/dir2</command>
  </test-commands>
  ...
  <comparator>
    <type>SubstringComparator</type>
    <expected-output>mask::rwx</expected-output>
  </comparator>
</test>
{code}

But why does it pass? Because the comparator type is {{SubstringComparator}} 
and it matches the wrong line {{default:mask::rwx}} in the output of 
{{getfacl}}:
{noformat}
# file: /dir1/dir2
# owner: jzhuge
# group: supergroup
user::rwx
user:charlie:r-x
group::r-x
group:admin:rwx #effective:r-x
mask::r-x
other::r-x
default:user::rwx
default:user:charlie:r-x
default:group::r-x
default:group:admin:rwx
default:mask::rwx
default:other::r-x
{noformat}

The comparator should match the entire line instead of just a substring. Other 
comparators in {{testAclCLI.xml}} have the same problem. One possible fix is 
sketched below.
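
One possible fix, as a rough sketch (assuming the CLI test framework's 
{{RegexpComparator}} is usable here; the actual patch may differ): anchor the 
expectation so it can only match the entire {{mask::r-x}} line.

{code:xml}
<!-- Sketch only: the anchored regex must match a whole line, so it can no
     longer be satisfied by the default:mask::rwx line further down. -->
<comparator>
  <type>RegexpComparator</type>
  <expected-output>^mask::r-x$</expected-output>
</comparator>
{code}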






[jira] [Created] (HDFS-10658) Reduce JsonFactory instance allocation in StartupProgressServlet

2016-07-20 Thread Yiqun Lin (JIRA)
Yiqun Lin created HDFS-10658:


 Summary: Reduce JsonFactory instance allocation in 
StartupProgressServlet
 Key: HDFS-10658
 URL: https://issues.apache.org/jira/browse/HDFS-10658
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yiqun Lin
Assignee: Yiqun Lin


Currently, {{StartupProgressServlet}} creates a new {{JsonFactory}} instance 
every time it needs a {{JsonGenerator}}. The code:
{code}
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    resp.setContentType("application/json; charset=UTF-8");
    StartupProgress prog = NameNodeHttpServer.getStartupProgressFromContext(
        getServletContext());
    StartupProgressView view = prog.createView();
    JsonGenerator json = new JsonFactory().createJsonGenerator(resp.getWriter());
    try {
      json.writeStartObject();
      json.writeNumberField(ELAPSED_TIME, view.getElapsedTime());
      json.writeNumberField(PERCENT_COMPLETE, view.getPercentComplete());
      json.writeArrayFieldStart(PHASES);
      ...
{code}
We can reuse a single {{JsonFactory}} instance and avoid the repeated 
allocation; a sketch follows.
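
A minimal sketch of the reuse, assuming a shared factory is acceptable here 
(Jackson's {{JsonFactory}} is documented as thread-safe once configured):

{code}
  // Sketch only: allocate the factory once and share it across requests,
  // instead of constructing a new JsonFactory on every doGet() call.
  private static final JsonFactory JSON_FACTORY = new JsonFactory();

  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    resp.setContentType("application/json; charset=UTF-8");
    // ... unchanged setup ...
    JsonGenerator json = JSON_FACTORY.createJsonGenerator(resp.getWriter());
    // ... unchanged JSON writing ...
  }
{code}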






[jira] [Created] (HDFS-10659) Namenode crashes after Journalnode re-installation in an HA cluster

2016-07-20 Thread Amit Anand (JIRA)
Amit Anand created HDFS-10659:
-

 Summary: Namenode crashes after Journalnode re-installation in an 
HA cluster
 Key: HDFS-10659
 URL: https://issues.apache.org/jira/browse/HDFS-10659
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha, journal-node
Affects Versions: 2.7.1
Reporter: Amit Anand


In my environment I am seeing {{Namenodes}} crash after {{Journalnodes}} are 
re-installed. We manage multiple clusters and do rolling upgrades followed by 
rolling re-installs of each node, including master (NN, JN, RM, ZK) nodes. When 
a journal node is re-installed or moved to a new disk/host, instead of running 
the {{"initializeSharedEdits"}} command, I copy the {{VERSION}} file from one of 
the other {{Journalnodes}}, which allows my {{NN}} to start writing data to the 
newly installed {{Journalnode}}.

To achieve quorum for the JNs and recover unfinalized segments, the NN during 
startup creates .tmp files under the {{"/jn/current/paxos"}} directory. In the 
current implementation the "paxos" directory is only created during the 
{{"initializeSharedEdits"}} command; if a JN is re-installed, the "paxos" 
directory is not recreated upon JN startup, nor by the NN while writing the 
.tmp files, which causes the NN to crash with the following error message:

{code}
192.168.100.16:8485: /disk/1/dfs/jn/Test-Laptop/current/paxos/64044.tmp (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
at org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java:58)
at org.apache.hadoop.hdfs.qjournal.server.Journal.persistPaxosData(Journal.java:971)
at org.apache.hadoop.hdfs.qjournal.server.Journal.acceptRecovery(Journal.java:846)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.acceptRecovery(JournalNodeRpcServer.java:205)
at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.acceptRecovery(QJournalProtocolServerSideTranslatorPB.java:249)
at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25435)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
{code}

The current 
[getPaxosFile|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JNStorage.java#L128-L130]
 method simply returns a path to a file under the "paxos" directory without 
verifying its existence. Since the "paxos" directory holds files that are 
required for NN recovery and for achieving JN quorum, my proposed solution is to 
add a check to "getPaxosFile" and create the "paxos" directory if it is missing, 
roughly as sketched below.






[jira] [Created] (HDFS-10660) Expose storage policy apis via HDFSAdmin interface

2016-07-20 Thread Rakesh R (JIRA)
Rakesh R created HDFS-10660:
---

 Summary: Expose storage policy apis via HDFSAdmin interface
 Key: HDFS-10660
 URL: https://issues.apache.org/jira/browse/HDFS-10660
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Rakesh R
Assignee: Rakesh R


Presently, the {{org.apache.hadoop.hdfs.client.HdfsAdmin.java}} interface has 
only the {{#setStoragePolicy()}} API exposed. This jira is to add the following 
set of APIs into HdfsAdmin (hypothetical signatures are sketched below the 
list).

{code}
HdfsAdmin#unsetStoragePolicy
HdfsAdmin#getStoragePolicy
HdfsAdmin#getAllStoragePolicies
{code}
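
As a hedged sketch, these could mirror the existing storage-policy calls on 
{{DistributedFileSystem}}; the return types below are assumptions based on 
those calls:

{code}
// Hypothetical signatures; assumed to delegate to DistributedFileSystem the
// same way the existing HdfsAdmin#setStoragePolicy(Path, String) does.
public void unsetStoragePolicy(final Path src) throws IOException;
public BlockStoragePolicySpi getStoragePolicy(final Path src) throws IOException;
public Collection<? extends BlockStoragePolicySpi> getAllStoragePolicies()
    throws IOException;
{code}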

Thanks [~arpitagarwal] for the offline discussions.






[jira] [Resolved] (HDFS-8914) Document HA support in the HDFS HdfsDesign.md

2016-07-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved HDFS-8914.
---
Resolution: Fixed

Closing this again.

> Document HA support in the HDFS HdfsDesign.md
> -
>
> Key: HDFS-8914
> URL: https://issues.apache.org/jira/browse/HDFS-8914
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.7.1
> Environment: Documentation page in live
>Reporter: Ravindra Babu
>Assignee: Lars Francke
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-8914.1.patch, HDFS-8914.2.patch
>
>
> Please refer to these two links and correct one of them.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
> The NameNode machine is a single point of failure for an HDFS cluster. If the 
> NameNode machine fails, manual intervention is necessary. Currently, 
> automatic restart and failover of the NameNode software to another machine is 
> not supported.
> http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
> The HDFS High Availability feature addresses the above problems by providing 
> the option of running two redundant NameNodes in the same cluster in an 
> Active/Passive configuration with a hot standby. This allows a fast failover 
> to a new NameNode in the case that a machine crashes, or a graceful 
> administrator-initiated failover for the purpose of planned maintenance.
> Please update the HdfsDesign article with the same facts to avoid confusion 
> in the reader's mind.






[jira] [Created] (HDFS-10661) Make MiniDFSCluster AutoCloseable

2016-07-20 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created HDFS-10661:


 Summary: Make MiniDFSCluster AutoCloseable
 Key: HDFS-10661
 URL: https://issues.apache.org/jira/browse/HDFS-10661
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Akira Ajisaka


If we make MiniDFSCluster AutoCloseable, we can create a MiniDFSCluster instance 
using a try-with-resources statement. That way we don't have to shut down the 
cluster in a finally clause every time; see the sketch below.
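
A minimal sketch of what a test could then look like, assuming {{close()}} 
simply delegates to {{shutdown()}}:

{code}
// Sketch only: the cluster is shut down automatically when the try block
// exits, even if an assertion throws.
Configuration conf = new HdfsConfiguration();
try (MiniDFSCluster cluster =
    new MiniDFSCluster.Builder(conf).numDataNodes(1).build()) {
  cluster.waitActive();
  // ... exercise cluster.getFileSystem() here ...
}
{code}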






[jira] [Created] (HDFS-10662) Optimize UTF8 string/byte conversions

2016-07-20 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-10662:
--

 Summary: Optimize UTF8 string/byte conversions
 Key: HDFS-10662
 URL: https://issues.apache.org/jira/browse/HDFS-10662
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: hdfs
Reporter: Daryn Sharp
Assignee: Daryn Sharp


String/byte conversions may take either a Charset instance or its canonical 
name.  One might think a Charset instance would be faster due to avoiding a 
lookup and instantiation of a Charset, but it's not.  The canonical string name 
variants will cache the string encoder/decoder (obtained from a Charset) 
resulting in better performance.

LOG4J2-935 describes a real-world performance boost.  I micro-benchmarked a 
marginal runtime improvement on JDK 7/8.  However, for a 16-byte path, using the 
canonical name generated 50% less garbage, and for a 64-byte path, 25% less.  
Given the sheer number of times that paths are (re)parsed, the cost adds up 
quickly.
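
For illustration, the two variants look like this (on JDK 7/8 the string-name 
overload goes through {{StringCoding}}'s cached encoder, while the {{Charset}} 
overload allocates a fresh one per call):

{code}
// Assuming a String 'path'. Both lines produce identical bytes; on JDK 7/8
// the String-name overload reuses a cached encoder, while the Charset
// overload allocates a new encoder on every call.
byte[] viaCharset = path.getBytes(StandardCharsets.UTF_8);
byte[] viaName = path.getBytes("UTF-8"); // declares UnsupportedEncodingException
{code}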








[jira] [Created] (HDFS-10663) Comparison of two System.nanoTime return values goes against standard Java recommendations

2016-07-20 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created HDFS-10663:
-

 Summary: Comparison of two System.nanoTime return values goes 
against standard Java recommendations
 Key: HDFS-10663
 URL: https://issues.apache.org/jira/browse/HDFS-10663
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


I was chasing a bug where the namenode didn't declare a datanode dead even 
though the last contact time was 2.5 hours earlier.
Before I could debug, the datanode was re-imaged (all the logs were deleted) 
and the namenode was upgraded to new software.
While debugging, I came across this heartbeat check code, where the comparison 
of two System.nanoTime values goes against the Java-recommended way.
Here is the hadoop code:
{code:title=DatanodeManager.java|borderStyle=solid}

  /** Is the datanode dead? */
  boolean isDatanodeDead(DatanodeDescriptor node) {
return (node.getLastUpdateMonotonic() <
(monotonicNow() - heartbeatExpireInterval));
  }
{code}

{{monotonicNow()}} is calculated as:
{code:title=Time.java|borderStyle=solid}
  public static long monotonicNow() {
    final long NANOSECONDS_PER_MILLISECOND = 1000000;

    return System.nanoTime() / NANOSECONDS_PER_MILLISECOND;
  }
{code}

As per the javadoc of System.nanoTime, it is clearly stated that one should 
compare two nanoTime outputs by subtracting them:
{noformat}
To compare two nanoTime values

 long t0 = System.nanoTime();
 ...
 long t1 = System.nanoTime();
one should use t1 - t0 < 0, not t1 < t0, because of the possibility of 
numerical overflow.
{noformat}
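
For comparison, an overflow-safe form of the check would subtract first; a 
rough sketch:

{code}
  /** Sketch only: subtract first, so clock wrap-around cannot invert the test. */
  boolean isDatanodeDead(DatanodeDescriptor node) {
    return monotonicNow() - node.getLastUpdateMonotonic()
        > heartbeatExpireInterval;
  }
{code}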







[jira] [Resolved] (HDFS-10661) Make MiniDFSCluster AutoCloseable

2016-07-20 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge resolved HDFS-10661.
---
Resolution: Duplicate

[~ajisakaa] Looks like a dup of HDFS-10287. Please re-open if you think 
otherwise.

> Make MiniDFSCluster AutoCloseable
> -
>
> Key: HDFS-10661
> URL: https://issues.apache.org/jira/browse/HDFS-10661
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Reporter: Akira Ajisaka
>
> If we make MiniDFSCluster AutoCloseable, we can create a MiniDFSCluster 
> instance using a try-with-resources statement. That way we don't have to 
> shut down the cluster in a finally clause every time.






Re: Apache MSDN Offer is Back

2016-07-20 Thread Ravi Prakash
Thanks Chris!

I did avail myself of the offer a few months ago, and wasn't able to figure out
if a Windows license was also available. I want to run Windows inside a
virtual machine on my Linux laptop, for the rare cases where there are
patches that may affect that. Any clue if that is possible?

Thanks
Ravi

On Tue, Jul 19, 2016 at 4:09 PM, Chris Nauroth wrote:

> A few months ago, we learned that the offer for ASF committers to get an
> MSDN license had gone away.  I'm happy to report that as of a few weeks
> ago, that offer is back in place.  For more details, committers can check
> out https://svn.apache.org/repos/private/committers and read
> donated-licenses/msdn.txt.
>
> --Chris Nauroth
>


Re: Apache MSDN Offer is Back

2016-07-20 Thread Chris Nauroth
That definitely was possible under the old deal.  You could go through the MSDN 
site and download an ISO for various versions of Windows and run it under 
VirtualBox.  The MSDN site would also furnish a license key that you could use 
to activate the machine.

I haven't yet gone through this new process to see if anything has changed in 
the benefits.

--Chris Nauroth

From: Ravi Prakash <ravihad...@gmail.com>
Date: Wednesday, July 20, 2016 at 12:04 PM
To: Chris Nauroth <cnaur...@hortonworks.com>
Cc: "common-...@hadoop.apache.org" <common-...@hadoop.apache.org>, 
"hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>, 
"yarn-...@hadoop.apache.org" <yarn-...@hadoop.apache.org>, 
"mapreduce-...@hadoop.apache.org" <mapreduce-...@hadoop.apache.org>
Subject: Re: Apache MSDN Offer is Back
Subject: Re: Apache MSDN Offer is Back

Thanks Chris!

I did avail myself of the offer a few months ago, and wasn't able to figure out 
if a Windows license was also available. I want to run Windows inside a virtual 
machine on my Linux laptop, for the rare cases where there are patches that may 
affect that. Any clue if that is possible?

Thanks
Ravi

On Tue, Jul 19, 2016 at 4:09 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
A few months ago, we learned that the offer for ASF committers to get an MSDN 
license had gone away.  I'm happy to report that as of a few weeks ago, that 
offer is back in place.  For more details, committers can check out 
https://svn.apache.org/repos/private/committers and read 
donated-licenses/msdn.txt.

--Chris Nauroth



[jira] [Created] (HDFS-10664) layoutVersion mismatch between Namenode VERSION file and Journalnode VERSION file after cluster upgrade

2016-07-20 Thread Amit Anand (JIRA)
Amit Anand created HDFS-10664:
-

 Summary: layoutVersion mismatch between Namenode VERSION file and 
Journalnode VERSION file after cluster upgrade
 Key: HDFS-10664
 URL: https://issues.apache.org/jira/browse/HDFS-10664
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, hdfs
Affects Versions: 2.7.1
Reporter: Amit Anand


After a cluster is upgraded, I see a mismatch in {{layoutVersion}} between the 
NN VERSION file and the JN VERSION file.

Here is what I see:

Before cluster upgrade:
{code}
## Version file from NN current directory
namespaceID=109645726
clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de
cTime=0
storageType=NAME_NODE
blockpoolID=BP-786201894-10.0.100.11-1466026941507
layoutVersion=-60
{code}

{code}
## Version file from JN current directory
namespaceID=109645726
clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-60
{code}

After cluster upgrade:
{code}
## Version file from NN current directory
namespaceID=109645726
clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de
cTime=0
storageType=NAME_NODE
blockpoolID=BP-786201894-10.0.100.11-1466026941507
layoutVersion=-63
{code}

{code}
## Version file from JN current directory
namespaceID=109645726
clusterID=CID-edcb62c5-bc1f-49f5-addb-37827340b5de
cTime=0
storageType=JOURNAL_NODE
layoutVersion=-60
{code}

Since the {{Namenode}} is what creates the {{Journalnode}} {{VERSION}} file 
during {{initializeSharedEdits}}, it should also update that file with the 
correct information after the cluster is upgraded and {{hdfs dfsadmin 
-finalizeUpgrade}} has been executed.







[jira] [Created] (HDFS-10665) Provide a way to add a new Journalnode to an existing quorum

2016-07-20 Thread Amit Anand (JIRA)
Amit Anand created HDFS-10665:
-

 Summary: Provide a way to add a new Journalnode to an existing 
quorum
 Key: HDFS-10665
 URL: https://issues.apache.org/jira/browse/HDFS-10665
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ha, hdfs, journal-node
Reporter: Amit Anand


In the current implementation of {{HDFS}} {{HA}} using {{QJOURNAL}}, there is no 
way to add a new {{Journalnode (JN)}} to an existing {{JN}} quorum or to 
reinstall a failed {{JN}} machine.

The current process to populate {{JN}} directories is:
* Start {{JN}} daemons on multiple machines (usually an odd number, 3 or 5)
* Shut down the {{Namenode}}
* Issue {{hdfs namenode -initializeSharedEdits}} - this will populate the {{JN}} directories

After the {{JN}} directories are populated, if a machine is reinstalled after a 
hardware failure, or a new set of machines is added to expand the {{JN}} 
quorum, the new {{JN}} machines will not be populated by the {{NameNode}} 
without repeating the process described above.

This causes downtime on a 24x7 production cluster whenever a {{JN}} needs any 
maintenance.

One can follow the steps below to work around the issue described above:
1. Install a new {{JN}} or reinstall an existing {{JN}} machine.
2. Create the required {{JN}} directory structure.
3. Copy the {{VERSION}} file from an existing {{JN}} to the new {{JN's}} 
{{current}} directory.
4. Manually create the {{paxos}} directory under the {{JN's}} {{current}} 
directory.
5. Start the {{JN}} daemon.
6. Add the new set of {{JNs}} to {{hdfs-site.xml}} and restart the {{NN}}.








Re: [DISCUSS] Official Docker Image at release time

2016-07-20 Thread Tsuyoshi Ozawa
Forwarding this discussion to Klaus.

- Tsuyoshi

On Tue, Jul 19, 2016 at 4:46 PM, Tsuyoshi Ozawa  wrote:
> Hi developers,
>
> Klaus mentioned the availability of an official Docker image of Apache
> Hadoop. Is it time that we start to distribute an official Docker
> image at release time?
>
> http://mail-archives.apache.org/mod_mbox/hadoop-user/201607.mbox/%3CSG2PR04MB162977CFE150444FA022510FB6370%40SG2PR04MB1629.apcprd04.prod.outlook.com%3E
>
> Thoughts?
>
> Thanks,
> - Tsuyoshi




[jira] [Resolved] (HDFS-10587) Incorrect offset/length calculation in pipeline recovery causes block corruption

2016-07-20 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved HDFS-10587.
--
   Resolution: Duplicate
Fix Version/s: 2.7.1, 2.6.4

Closing this jira as a duplicate of HDFS-4660.


> Incorrect offset/length calculation in pipeline recovery causes block 
> corruption
> 
>
> Key: HDFS-10587
> URL: https://issues.apache.org/jira/browse/HDFS-10587
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Fix For: 2.7.1, 2.6.4
>
> Attachments: HDFS-10587-test.patch, HDFS-10587.001.patch
>
>
> We found that incorrect offset and length calculation in pipeline recovery may 
> cause block corruption and result in missing blocks under a very unfortunate 
> scenario. 
> (1) A client established pipeline and started writing data to the pipeline.
> (2) One of the data nodes in the pipeline restarted, closing the socket, and 
> some written data was unacknowledged.
> (3) Client replaced the failed data node with a new one, initiating block 
> transfer to copy existing data in the block to the new datanode.
> (4) The block is transferred to the new node. Crucially, the entire block, 
> including the unacknowledged data, was transferred.
> (5) The last chunk (512 bytes) was not a full chunk, but the destination 
> still reserved the whole chunk in its buffer, and wrote the entire buffer to 
> disk, therefore some written data is garbage.
> (6) When the transfer was done, the destination data node converted the 
> replica from temporary to rbw, which made its visible length the length of 
> bytes on disk. That is to say, it thought whatever was transferred was 
> acknowledged. However, the visible length of the replica differed (rounded 
> up to the next multiple of 512) from the source of the transfer. [1]
> (7) Client then truncated the block in the attempt to remove unacknowledged 
> data. However, because the visible length is equivalent of the bytes on disk, 
> it did not truncate unacknowledged data.
> (8) When new data was appended to the destination, it skipped the bytes 
> already on disk. Therefore, whatever was written as garbage was not replaced.
> (9) the volume scanner detected corrupt replica, but due to HDFS-10512, it 
> wouldn’t tell NameNode to mark the replica as corrupt, so the client 
> continued to form a pipeline using the corrupt replica.
> (10) Finally the DN that had the only healthy replica was restarted. NameNode 
> then updated the pipeline to only contain the corrupt replica.
> (11) The client continued to write to the corrupt replica, because neither the 
> client nor the data node itself knew the replica was corrupt. When the 
> restarted datanodes came back, their replicas were stale, though not corrupt. 
> Therefore, none of the replicas was good and up to date.
> The sequence of events was reconstructed based on DataNode/NameNode log and 
> my understanding of code.
> Incidentally, we have observed the same sequence of events on two independent 
> clusters.
> [1]
> The sender has the replica as follows:
> 2016-04-15 22:03:05,066 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes() = 41381376
>   getBytesOnDisk()  = 41381376
>   getVisibleLength()= 41186444
>   getVolume()   = /hadoop-i/data/current
>   getBlockFile()= 
> /hadoop-i/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186444
>   bytesOnDisk=41381376
> while the receiver has the replica as follows:
> 2016-04-15 22:03:05,068 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_1556997324_1100153495099, RBW
>   getNumBytes() = 41186816
>   getBytesOnDisk()  = 41186816
>   getVisibleLength()= 41186816
>   getVolume()   = /hadoop-g/data/current
>   getBlockFile()= 
> /hadoop-g/data/current/BP-1043567091-10.1.1.1-1343682168507/current/rbw/blk_1556997324
>   bytesAcked=41186816
>   bytesOnDisk=41186816






[jira] [Created] (HDFS-10666) Über-jira: Unit tests should not depend on nondeterministic behavior using fixed sleep interval

2016-07-20 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-10666:


 Summary: Über-jira: Unit tests should not depend on 
nondeterministic behavior using fixed sleep interval
 Key: HDFS-10666
 URL: https://issues.apache.org/jira/browse/HDFS-10666
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0-alpha2
Reporter: Mingliang Liu
Assignee: Mingliang Liu


There have been dozens of intermittently failing unit tests because they depend 
on fixed-interval sleeps to wait for conditions to hold before asserting. This 
umbrella jira is to replace these sleep statements with:
* {{GenericTestUtils.waitFor()}} to retry the conditions/assertions (see the 
sketch after this list)
* Triggering internal state changes of {{MiniDFSCluster}}, e.g. 
{{trigger\{BlockReports,HeartBeats,DeletionReports\}}}
* Failing fast if specific exceptions are caught
* _ad-hoc fixes_ (TBD)

P.S. I don't know how Java 8 closures come into play here, but I'd like to see 
any effort.
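
As a rough sketch of the first bullet, a fixed {{Thread.sleep()}} can become a 
polling wait (the condition shown is only an example):

{code}
// Sketch only: poll every 100 ms and fail after 60 s, instead of sleeping a
// fixed interval and hoping the condition has already been reached.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return cluster.getNamesystem().getNumLiveDataNodes() == expectedLive;
  }
}, 100, 60000);
{code}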






[jira] [Created] (HDFS-10667) Report more accurate info about data corruption location

2016-07-20 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-10667:


 Summary: Report more accurate info about data corruption location
 Key: HDFS-10667
 URL: https://issues.apache.org/jira/browse/HDFS-10667
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, hdfs
Reporter: Yongjun Zhang


Per 

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15376897&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15376897

the DataNode at 10.6.129.77 reported:

{code}
2016-07-13 11:49:01,512 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving blk_1116167880_42906656 src: /10.6.134.229:43844 dest: /10.6.129.77:5080
2016-07-13 11:49:01,543 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Checksum error in block blk_1116167880_42906656 from /10.6.134.229:43844
org.apache.hadoop.fs.ChecksumException: Checksum error: DFSClient_NONMAPREDUCE_2019484565_1 at 81920 exp: 1352119728 got: -1012279895
at org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(Native Method)
at org.apache.hadoop.util.NativeCrc32.verifyChunkedSumsByteArray(NativeCrc32.java:69)
at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:347)
at org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:294)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:421)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:558)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
2016-07-13 11:49:01,543 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception for blk_1116167880_42906656
java.io.IOException: Terminating due to a checksum error.java.io.IOException: Unexpected checksum mismatch while writing blk_1116167880_42906656 from /10.6.134.229:43844
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:571)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:789)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:917)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:174)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:80)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:244)
at java.lang.Thread.run(Thread.java:745)
{code}

and

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15378879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15378879

{quote}
While verifying only packet, the position mentioned in the checksum exception, 
is relative to packet buffer offset, not the block offset. So 81920 is the 
offset in the exception.
{quote}

Creating this jira to report more accurate corruption-location information: the 
offset in the file, the offset in the block, and the offset in the packet.

See 

https://issues.apache.org/jira/browse/HDFS-10587?focusedCommentId=15387083&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15387083







[jira] [Created] (HDFS-10668) Fix intermittently failing UT TestDataNodeMXBean#testDataNodeMXBeanBlockCount

2016-07-20 Thread Mingliang Liu (JIRA)
Mingliang Liu created HDFS-10668:


 Summary: Fix intermittently failing UT 
TestDataNodeMXBean#testDataNodeMXBeanBlockCount
 Key: HDFS-10668
 URL: https://issues.apache.org/jira/browse/HDFS-10668
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mingliang Liu
Assignee: Mingliang Liu


h6.Error Message
{code}
After delete one file expected:<4> but was:<5>
{code}

h6. Stacktrace
{code}
java.lang.AssertionError: After delete one file expected:<4> but was:<5>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.apache.hadoop.hdfs.server.datanode.TestDataNodeMXBean.testDataNodeMXBeanBlockCount(TestDataNodeMXBean.java:124)
{code}

For a sample failing Jenkins pre-commit build, see 
[here|https://builds.apache.org/job/PreCommit-HDFS-Build/16094/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMXBean/testDataNodeMXBeanBlockCount/].






[jira] [Created] (HDFS-10669) error while creating collection in solr

2016-07-20 Thread SHIVADEEP GUNDOJU (JIRA)
SHIVADEEP GUNDOJU created HDFS-10669:


 Summary: error while creating collection in solr
 Key: HDFS-10669
 URL: https://issues.apache.org/jira/browse/HDFS-10669
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: solr cloud mode on apache hadoop
Reporter: SHIVADEEP GUNDOJU


Hello Team,
I have configured Solr in cloud mode on my 4-node Apache Hadoop cluster. I have 
created a collection named "tweets" and am able to use it without any issues.

When I try to create a new collection, I get the error below, but the directory 
still gets created under solr in HDFS. Please help.

user@Hadoop3:/usr/local/solr_download/solr-5.5.2$ sudo ./bin/solr create -c 
tweets1  -d data_driven_schema_configs

Connecting to ZooKeeper at localhost:9983 ...
Re-using existing configuration directory tweets1

Creating new collection 'tweets1' using command:
http://172.16.16.129:8983/solr/admin/collections?action=CREATE&name=tweets1&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=tweets1


ERROR: Failed to create collection 'tweets1' due to: {172.16.16.129:8983_solr=org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:Error from server at http://172.16.16.129:8983/solr: Error CREATEing SolrCore 'tweets1_shard1_replica1': Unable to create core [tweets1_shard1_replica1] Caused by: Illegal pattern component: T}




