[jira] [Created] (HADOOP-15094) FileSystem

2017-12-06 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15094:
---

 Summary: FileSystem
 Key: HADOOP-15094
 URL: https://issues.apache.org/jira/browse/HADOOP-15094
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Steve Loughran


Discussion around SPARK-22587 highlights how per-filesystem notions of a canonical 
URI make it hard to determine if a file is on a specific filesystem, or, put 
differently, if two filesystem instances are equivalent.

You can't reliably use this.getUri == that.getUri, as it doesn't handle an FQDN 
matching an unqualified domain name, but you can't do an nslookup either, as HDFS 
HA doesn't use hostnames.

If {{FileSystem.getCanonicalUri()}} were public, then it could be used to 
compare filesystems consistently.

Needs: filesystem.md coverage; a contract test (two instances of the same 
filesystem are equal, different filesystems aren't). Or at least: this method 
never returns null.
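
A minimal sketch of the comparison this would enable, assuming 
{{getCanonicalUri()}} were made public; the helper below is hypothetical, not 
an existing API:

{code}
// Hypothetical helper: compare two FileSystem instances by canonical URI.
// Assumes FileSystem.getCanonicalUri() has been made public; today it is
// protected, so this is a sketch, not working code against current Hadoop.
public static boolean sameFileSystem(FileSystem a, FileSystem b) {
  // Canonicalization qualifies the authority (FQDN vs. short name),
  // so equal canonical URIs imply the same underlying filesystem.
  return a.getCanonicalUri().equals(b.getCanonicalUri());
}
{code}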








[jira] [Resolved] (HADOOP-13967) S3ABlockOutputStream to support plugin point for different multipart strategies

2017-12-06 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13967.
-
   Resolution: Duplicate
Fix Version/s: 3.1.0

> S3ABlockOutputStream to support plugin point for different multipart 
> strategies
> ---
>
> Key: HADOOP-13967
> URL: https://issues.apache.org/jira/browse/HADOOP-13967
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 3.1.0
>
>
> For zero-rename commits, we need to delay the final commit of a multipart PUT, 
> instead saving the data needed to build that commit into the S3 bucket.
> This means changing {{S3ABlockOutputStream}} so that it can support different 
> policies on how to do this: "classic" and "delayed commit".
> Having this self-contained means we can test it in isolation from anything else.
> I'm ignoring the old output stream; we will switch to the fast output stream 
> whenever a special destination path is encountered.
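
For illustration, a hypothetical shape for that plugin point; the interface 
name and signatures below are assumptions, not the API that was eventually 
committed:

{code}
import java.io.IOException;
import java.util.List;
import com.amazonaws.services.s3.model.PartETag;

// Hypothetical plugin point sketched from the description above.
interface MultipartCommitStrategy {
  // "classic": the stream completes the multipart upload itself on close().
  boolean completeOnClose();

  // "delayed commit": instead of completing the PUT, persist the upload ID
  // and part ETags into the bucket so a later commit phase can finish it.
  void onStreamClose(String key, String uploadId, List<PartETag> parts)
      throws IOException;
}
{code}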






[jira] [Resolved] (HADOOP-13969) S3A to support commit(path) operation, which commits all pending put commits in a path

2017-12-06 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13969.
-
   Resolution: Duplicate
 Assignee: Steve Loughran
Fix Version/s: 3.1.0

> S3A to support commit(path) operation, which commits all pending put commits 
> in a path
> --
>
> Key: HADOOP-13969
> URL: https://issues.apache.org/jira/browse/HADOOP-13969
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 3.1.0
>
>
> As well as creating and saving data with a pending commit, S3A needs to add 
> the actual commit operation.
> This would scan a directory, take its pending commits, read them in, and 
> execute them.
> Issue: what to do on failures?
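
A rough sketch of that scan-and-commit loop; the {{PendingCommit}} format and 
the {{completeMultipartUpload}} helper are illustrative assumptions:

{code}
// Illustrative only: PendingCommit and completeMultipartUpload() are
// assumed names, not the committed API.
void commitPendingUploads(FileSystem fs, Path pendingDir) throws IOException {
  for (FileStatus st : fs.listStatus(pendingDir)) {
    // Each pending file records the destination key, upload ID and ETags.
    PendingCommit pc = PendingCommit.load(fs, st.getPath());
    // Open question from this JIRA: on failure, abort the upload, retry,
    // or leave the pending file in place for a later attempt?
    completeMultipartUpload(pc.getDestKey(), pc.getUploadId(), pc.getParts());
    fs.delete(st.getPath(), false);
  }
}
{code}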






[jira] [Resolved] (HADOOP-13968) S3a FS to support "__magic" path for the special "unmaterialized" writes

2017-12-06 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13968.
-
   Resolution: Duplicate
 Assignee: Steve Loughran
Fix Version/s: 3.1.0

> S3a FS to support "__magic" path for the special "unmaterialized" writes
> 
>
> Key: HADOOP-13968
> URL: https://issues.apache.org/jira/browse/HADOOP-13968
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Fix For: 3.1.0
>
>
> S3AFileSystem to add support for a special path, such as 
> {{.temp_pending_put/}} or similar, which, when used as the base of a path, 
> indicates that the file is actually to be saved to the parent dir, but only 
> via a delayed put-commit operation.
> At the same time, we may need to block some normal file IO operations under 
> these dirs, especially rename and delete, as these could cause serious 
> problems, including data loss and large bills for pending data.
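
A minimal sketch of the path check implied here, assuming a single reserved 
directory name (the constant is an assumption for illustration):

{code}
// Sketch: detect whether a path lies under the reserved "magic" directory,
// in which case writes become delayed put-commits to the parent dir and
// rename/delete under it should be restricted.
static final String MAGIC_DIR = "__magic";

static boolean isMagicPath(Path p) {
  for (Path cur = p; cur != null; cur = cur.getParent()) {
    if (MAGIC_DIR.equals(cur.getName())) {
      return true;
    }
  }
  return false;
}
{code}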






[jira] [Created] (HADOOP-15095) S3a committer factory to warn when default FileOutputFormat committer is created

2017-12-06 Thread Steve Loughran (JIRA)
Steve Loughran created HADOOP-15095:
---

 Summary: S3a committer factory to warn when default 
FileOutputFormat committer is created
 Key: HADOOP-15095
 URL: https://issues.apache.org/jira/browse/HADOOP-15095
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Steve Loughran
Priority: Minor


The S3ACommitterFactory should warn when the classic FileOutputCommitter is 
used (i.e. the client is not configured to use one of the new committers). 
Something like:

"This committer is neither fast nor guaranteed to be correct. See $URL", where 
$URL is a pointer to something (wiki? hadoop docs?).
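
A sketch of what that warning might look like in the factory, assuming an 
SLF4J logger; the method shown and the {{DOC_URL}} constant are illustrative:

{code}
// Illustrative only: the actual factory method and wiring may differ.
private static final Logger LOG =
    LoggerFactory.getLogger(S3ACommitterFactory.class);

public PathOutputCommitter createFallbackCommitter(Path dest,
    TaskAttemptContext context) throws IOException {
  LOG.warn("Using the classic FileOutputCommitter for {}: this committer is"
      + " neither fast nor guaranteed to be correct against S3. See {}",
      dest, DOC_URL);  // DOC_URL: placeholder for the wiki/docs pointer
  return new FileOutputCommitter(dest, context);
}
{code}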






Re: [VOTE] Release Apache Hadoop 2.7.5 (RC0)

2017-12-06 Thread Erik Krogen
+1 (non-binding)

- Verified signatures, MD5, RMD160, SHA* for bin and src tarballs
- Built from source on macOS 10.12.6 and RHEL 6.6
- Ran local HDFS cluster, ran basic commands, verified read and write 
capability.
- Ran a 3000-node cluster via Dynamometer and did not see significant 
performance variation from 2.7.4 expectations

@Brahma, I was able to find HDFS-12831, HADOOP-14881, and HADOOP-14827 in 
CHANGES.txt, but agree with you on the others listed. I was, however, able to 
find all of them in the linked releasenotes.html.

Thanks Konstantin!

Erik

On 12/4/17, 10:50 PM, "Brahma Reddy Battula"  
wrote:

+1  (non-binding), thanks Konstantin for driving this.


--Built from the source
--Installed 3 Node HA Cluster
--Ran basic shell commands
--Verified append/snapshot/truncate
--Ran sample jobs like pi,wordcount


Looks like the following commits are missing from CHANGES.txt.

MAPREDUCE-6975
HADOOP-14919
HDFS-12596
YARN-7084
HADOOP-14881
HADOOP-14827
HDFS-12832


--Brahma Reddy Battula

-Original Message-
From: Konstantin Shvachko [mailto:shv.had...@gmail.com] 
Sent: 02 December 2017 10:13
To: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: [VOTE] Release Apache Hadoop 2.7.5 (RC0)

Hi everybody,

This is the next dot release of the Apache Hadoop 2.7 line. The previous one,
2.7.4, was released on August 4, 2017.
Release 2.7.5 includes critical bug fixes and optimizations. See more 
details in the release notes:
http://home.apache.org/~shv/hadoop-2.7.5-RC0/releasenotes.html

The RC0 is available at: http://home.apache.org/~shv/hadoop-2.7.5-RC0/

Please give it a try and vote on this thread. The vote will run for 5 days 
ending 12/08/2017.

My up-to-date public key is available from:
https://dist.apache.org/repos/dist/release/hadoop/common/KEYS

Thanks,
--Konstantin




Re: [VOTE] Release Apache Hadoop 2.7.5 (RC0)

2017-12-06 Thread Naganarasimha Garla
Thanks for the release Konstantin.

Verified the following:
- Downloaded the tar on Ubuntu and verified the signatures
- Deployed pseudo cluster
- Sanity checks
- Basic hdfs operations
- Spark PyWordcount & a few MR jobs
- Accessed most of the web UI's

When accessing the docs (from the tar) I was able to notice:
-  Release Notes, Common, HDFS, and MapReduce Changes show "file not found"
-  Changes for all components were not available for 2.7.4 either (
http://hadoop.apache.org/docs/r2.7.4/hadoop-project-dist/hadoop-common/CHANGES.txt
)

So I'm not sure whether it's missing or just not required; everything else is
fine.

Regards,
+ Naga




Re: [VOTE] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

2017-12-06 Thread Subramaniam V K
+1.

Skimmed through the design doc and uber patch; they seem reasonable.

This is a welcome addition, especially w.r.t. cloud deployments, so thanks to
everyone who worked on this.
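
For anyone curious what the absolute-resource configuration looks like: roughly 
the following in capacity-scheduler.xml (the exact syntax is documented under 
YARN-7533, so treat this snippet as an assumption rather than the documented form):

{code}
<!-- Assumed syntax: give root.default fixed memory and vcores instead of
     a percentage of the parent queue's capacity. -->
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>[memory=10240,vcores=12]</value>
</property>
{code}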

On Mon, Dec 4, 2017 at 8:18 PM, Rohith Sharma K S  wrote:

> +1
>
> On Nov 30, 2017 7:26 AM, "Sunil G"  wrote:
>
> > Hi All,
> >
> >
> > Based on the discussion at [1], I'd like to start a vote to merge feature
> > branch
> >
> > YARN-5881 to trunk. Vote will run for 7 days, ending Wednesday Dec 6 at
> > 6:00PM PDT.
> >
> >
> > This branch adds support for configuring queue capacity as absolute
> > resources in the capacity scheduler. This will help admins who want fine
> > control of queue resources.
> >
> >
> > Feature development is done at YARN-5881 [2], jenkins build is here
> > (YARN-7510 [3]).
> >
> > All required tasks for this feature are committed. This feature changes
> > RM’s Capacity Scheduler only,
> >
> > and we did extensive tests for the feature in the last couple of months
> > including performance tests.
> >
> >
> > Key points:
> >
> > - The feature is turned off by default; absolute resources have to be
> > configured to enable it.
> >
> > - Detailed documentation about how to use this feature is done as part of
> > [4].
> >
> > - No major performance degradation was observed with this branch work. SLS
> > and UT performance tests were done.
> >
> >
> > There were 11 subtasks completed for this feature.
> >
> >
> > Huge thanks to everyone who helped with reviews, commits, guidance, and
> >
> > technical discussion/design, including Wangda Tan, Vinod Vavilapalli,
> > Rohith Sharma K S, and Eric Payne.
> >
> >
> > [1] :
> > http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%
> > 3CCACYiTuhKhF1JCtR7ZFuZSEKQ4sBvN_n_tV5GHsbJ3YeyJP%2BP4Q%
> > 40mail.gmail.com%3E
> >
> > [2] : https://issues.apache.org/jira/browse/YARN-5881
> >
> > [3] : https://issues.apache.org/jira/browse/YARN-7510
> >
> > [4] : https://issues.apache.org/jira/browse/YARN-7533
> >
> >
> > Regards
> >
> > Sunil and Wangda
> >
>


Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-12-06 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/

[Dec 4, 2017 6:40:11 PM] (xiao) HDFS-12396. Webhdfs file system should get 
delegation token from kms
[Dec 4, 2017 8:11:00 PM] (eyang) YARN-6669.  Implemented Kerberos security for 
YARN service framework. 
[Dec 4, 2017 9:14:55 PM] (rkanter) YARN-5594. Handle old RMDelegationToken 
format when recovering RM
[Dec 4, 2017 10:39:43 PM] (mackrorysd) HADOOP-15058. create-release site build 
outputs dummy shaded jars due to
[Dec 5, 2017 5:02:04 AM] (arp) HADOOP-14976. Set HADOOP_SHELL_EXECNAME 
explicitly in scripts.
[Dec 5, 2017 5:30:46 AM] (aajisaka) HADOOP-14985. Remove subversion related 
code from VersionInfoMojo.java.
[Dec 5, 2017 12:58:31 PM] (sunilg) YARN-7586. Application Placement should be 
done before ACL checks in
[Dec 5, 2017 2:11:07 PM] (sunilg) YARN-7092. Render application specific log 
under application tab in new
[Dec 5, 2017 2:23:46 PM] (brahma) HDFS-11751. DFSZKFailoverController daemon 
exits with wrong status code.
[Dec 5, 2017 3:05:41 PM] (stevel) HADOOP-15071 S3a troubleshooting docs to add 
a couple more failure
[Dec 5, 2017 5:20:07 PM] (sunilg) YARN-7438. Additional changes to make 
SchedulingPlacementSet agnostic to
[Dec 5, 2017 7:06:32 PM] (fabbri) HADOOP-14475 Metrics of S3A don't print out 
when enabled. Contributed by
[Dec 5, 2017 9:09:49 PM] (wangda) YARN-7381. Enable the configuration:
[Dec 6, 2017 2:40:33 AM] (aajisaka) HDFS-12889. Router UI is missing robots.txt 
file. Contributed by Bharat
[Dec 6, 2017 4:01:36 AM] (zhengkai.zk) HADOOP-15039. Move 
SemaphoredDelegatingExecutor to hadoop-common.
[Dec 6, 2017 4:21:52 AM] (wwei) YARN-7611. Node manager web UI should display 
container type in
[Dec 6, 2017 4:48:16 AM] (xiao) HDFS-12872. EC Checksum broken when 
BlockAccessToken is enabled.
[Dec 6, 2017 9:52:41 AM] (wwei) YARN-7610. Extend Distributed Shell to support 
launching job with




-1 overall


The following subsystems voted -1:
asflicense findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
   org.apache.hadoop.yarn.api.records.Resource.getResources() may expose 
   internal representation by returning Resource.resources At 
   Resource.java:[line 234] 

Failed junit tests :

   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure170 
   hadoop.hdfs.server.balancer.TestBalancerWithSaslDataTransfer 
   hadoop.hdfs.TestFileChecksum 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure030 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 
   hadoop.fs.TestUnbuffer 
   hadoop.hdfs.server.balancer.TestBalancerRPCDelay 
   hadoop.hdfs.TestErasureCodingPolicies 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.server.namenode.TestDecommissioningStatus 
   hadoop.hdfs.TestReconstructStripedFile 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure140 
   
hadoop.yarn.server.nodemanager.containermanager.launcher.TestContainerLaunch 
   
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestNodeLabelContainerAllocation
 
   hadoop.yarn.client.api.impl.TestAMRMClientOnRMRestart 
   hadoop.mapreduce.v2.app.rm.TestRMContainerAllocator 
   hadoop.mapreduce.v2.TestUberAM 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-compile-javac-root.txt
  [280K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/whitespace-eol.txt
  [8.8M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/whitespace-tabs.txt
  [288K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/614/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-

[jira] [Created] (HADOOP-15096) start-build-env.sh can create a docker image that fills up disk

2017-12-06 Thread Addison Higham (JIRA)
Addison Higham created HADOOP-15096:
---

 Summary: start-build-env.sh can create a docker image that fills 
up disk
 Key: HADOOP-15096
 URL: https://issues.apache.org/jira/browse/HADOOP-15096
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 3.1.0
Reporter: Addison Higham


start-build-env.sh has the potential to build an image that can fill up the 
root disk by exploding a sparse file.

In my case, the ingredients are:
- Ubuntu 17.04
- Docker 17.09.0
- AUFS storage driver
- a user ID and group ID with high numbers

This happens when building the hadoop-build-${USER_ID} image, specifically in 
the 

{code}
RUN useradd -g ${GROUP_ID} -u ${USER_ID} -k /root -m ${USER_NAME}
{code}

command.

The reason for this:
/var/log/lastlog is a sparse file that pre-reserves space based on the highest 
seen UID and GID; in my case, those numbers are very high (above 1 billion). 
Locally, this results in a sparse file that reports as 443 GB. However, under 
Docker, and specifically AUFS, it appears that this file *isn't* kept sparse, 
and it tries to allocate the whole file.

If you start this script and walk away to wait for it to finish, you come back 
to a computer with a completely full disk.

Luckily, the fix is quite easy: simply add the `-l` option to useradd, which 
won't create entries in those files. See the sketch below.
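
The fixed line, for reference ({{-l}} is useradd's {{--no-log-init}}, which 
skips adding the user to lastlog and faillog):

{code}
# -l (--no-log-init): don't add the user to lastlog/faillog, avoiding the
# huge sparse-file allocation under AUFS.
RUN useradd -g ${GROUP_ID} -u ${USER_ID} -l -k /root -m ${USER_NAME}
{code}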






[jira] [Reopened] (HADOOP-15012) Add readahead, dropbehind, and unbuffer to StreamCapabilities

2017-12-06 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen reopened HADOOP-15012:


> Add readahead, dropbehind, and unbuffer to StreamCapabilities
> -
>
> Key: HADOOP-15012
> URL: https://issues.apache.org/jira/browse/HADOOP-15012
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.9.0
>Reporter: John Zhuge
>Assignee: John Zhuge
> Fix For: 3.1.0
>
> Attachments: HADOOP-15012.branch-2.01.patch
>
>
> A split from HADOOP-14872 to track changes that enhance the 
> StreamCapabilities class with the READAHEAD, DROPBEHIND, and UNBUFFER 
> capabilities.
> Discussions and code reviews are done in HADOOP-14872.
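
A short sketch of how a client could probe for one of these capabilities 
before using it; the exact constant names are assumptions to be checked 
against the class:

{code}
// Sketch: probe a stream for a capability before relying on it.
try (FSDataInputStream in = fs.open(path)) {
  if (in.hasCapability(StreamCapabilities.UNBUFFER)) {
    in.unbuffer();  // safe: the wrapped stream can free its buffers
  }
}
{code}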


