Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

2015-09-30 Thread Andrew Wang
overhead, HDFS-EC provides data durability through parity data
> blocks.
> > > > >With most EC configurations, the storage overhead is no more than
> 50%.
> > > > >Based on profiling results of production clusters, we decided to
> > > > >support EC with the striped block layout in the first phase, so
> > > > >that small files can be better handled. This means dividing each
> > > > >logical HDFS file block into smaller units (striping cells) and
> > > > >spreading them on a set of DataNodes in round-robin fashion. Parity
> > > > >cells are generated for each stripe of original data cells. We have
> > > > >made changes to NameNode, client, and DataNode to generalize the
> > > > >block concept and handle the mapping between a logical file block
> > > > >and its internal storage blocks. For further details please see the
> > > > >design doc on HDFS-7285.
> > > > >HADOOP-11264 focuses on providing flexible and high-performance
> > > > >codec calculation support.
> > > > >
> > > > >The nightly Jenkins job of the branch has reported several
> > > > >successful runs, and doesn't show new flaky tests compared with
> > > > >trunk. We have posted several versions of the test plan including
> > > > >both unit testing and cluster testing, and have executed most tests
> > > > >in the plan. The most basic functionalities have been extensively
> > > > >tested and verified in several real clusters with different
> > > > >hardware configurations; results have been very stable. We have
> > > > >created follow-on tasks for more advanced error handling and
> > optimization under the umbrella HDFS-8031.
> > > > >We also plan to implement or harden the integration of EC with
> > > > >existing features such as WebHDFS, snapshot, append, truncate,
> > > > >hflush, hsync, and so forth.
> > > > >
> > > > >Development of this feature has been a collaboration across many
> > > > >companies and institutions. I'd like to thank J. Andreina, Takanobu
> > > > >Asanuma, Vinayakumar B, Li Bo, Takuya Fukudome, Uma Maheswara Rao
> > > > >G, Rui Li, Yi Liu, Colin McCabe, Xinwei Qin, Rakesh R, Gao Rui, Kai
> > > > >Sasaki, Walter Su, Tsz Wo Nicholas Sze, Andrew Wang, Yong Zhang,
> > > > >Jing Zhao, Hui Zheng and Kai Zheng for their code contributions and
> > reviews.
> > > > >Andrew and Kai Zheng also made fundamental contributions to the
> > > > >initial design. Rui Li, Gao Rui, Kai Sasaki, Kai Zheng and many
> > > > >other contributors have made great efforts in system testing. Many
> > > > >thanks go to Weihua Jiang for proposing the JIRA, and ATM, Todd
> > > > >Lipcon, Silvius Rus, Suresh, as well as many others for providing
> > helpful feedback.
> > > > >
> > > > >Following the community convention, this vote will last for 7 days
> > > > >(ending September 29th). Votes from Hadoop committers are binding
> > > > >but non-binding votes are very welcome as well. And here's my
> > > > >non-binding
> > > +1.
> > > > >
> > > > >Thanks,
> > > > >---
> > > > >Zhe Zhang
> > > >
> > > >
> > >
> >
>
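
The striping description in the vote message is easy to make concrete: each logical block is cut into fixed-size cells, the cells are dealt out to the block group's DataNodes in round-robin order, and each stripe of data cells gets parity cells. Below is a minimal Java sketch of that mapping, assuming a hypothetical RS(6,3) schema and 64 KiB cells; the real policies and cell size are defined by the HDFS-7285 design doc, not here.

public class StripeMapping {
  static final int DATA_UNITS = 6;          // data cells per stripe (assumed RS(6,3))
  static final int PARITY_UNITS = 3;        // parity cells per stripe (assumed)
  static final long CELL_SIZE = 64 * 1024;  // assumed striping cell size

  // Which data block (DataNode slot) in the block group holds the byte
  // at this logical file offset, under round-robin cell placement?
  static int dataBlockIndex(long logicalOffset) {
    long cellIndex = logicalOffset / CELL_SIZE;
    return (int) (cellIndex % DATA_UNITS);
  }

  public static void main(String[] args) {
    for (long off : new long[] {0, CELL_SIZE, 6 * CELL_SIZE, 9 * CELL_SIZE}) {
      System.out.println("logical offset " + off + " -> data block #" + dataBlockIndex(off));
    }
    // Every full stripe of 6 data cells also gets PARITY_UNITS parity cells,
    // written to 3 further DataNodes in the same block group.
  }
}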


Re: Questions on HDFS-8880

2015-10-06 Thread Andrew Wang
If it's a duplicate we should probably back it out, but taking a step back,
is the issue that there isn't good documentation about configuring Metrics2
/ FileSink? I see the API docs, but a user-focused guide on how to
configure Metrics2 would probably be a welcome addition.

HBase has a blog at https://blogs.apache.org/hbase/; this could also be good
content for a blog post.

Best,
Andrew

On Tue, Oct 6, 2015 at 11:12 AM, Allen Wittenauer  wrote:

>
> Folks,
>
> I’ve been looking over HDFS-8880 and its various follow-on
> JIRAs.  The intentions are good, but the implementation is
> mostly/effectively a duplicate of the FileSink that’s already part of the
> Hadoop metrics subsystem (which therefore means it works with all daemons
> out of the box already).  Reading through HDFS-9114, it’s pretty obvious
> now that users are going to get *very* confused as to just what happens when
> they set the “hadoop.metrics.log.file” property.  It’s opening a Pandora’s
> box of work, since that property only partially works with one sub-project,
> will show up on the command line of every daemon, and isn’t documented...
>
> I’d like to see this series of patches reverted (they haven’t
> shipped yet, so now is the time!) and effort placed into updating the
> metrics2 FileSink to have whatever functionality is missing.
>
> Thoughts?
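
For readers hunting for the existing sink Allen refers to, here is a hedged sketch of the hadoop-metrics2.properties wiring for FileSink; the instance name "file" and the output filenames are arbitrary choices for illustration, not mandated names.

# "file" is an arbitrary sink instance name; one output file per daemon.
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
namenode.sink.file.filename=namenode-metrics.out
datanode.sink.file.filename=datanode-metrics.out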


Re: [DISCUSS] About the details of JDK-8 support

2015-10-07 Thread Andrew Wang
We've been supporting JDK8 as a runtime for CDH5 for a while now (meaning
the full stack including HBase), so I agree that we're good there.

I'm against dropping JDK7 support though in branch-2. Even bumping
dependency versions scares me, since it often leads to downstream pain. Any
comment about the compatibility of said bump? We need to have very high
confidence if it's targeted for branch-2.

Best,
Andrew

On Wed, Oct 7, 2015 at 2:27 AM, Steve Loughran 
wrote:

>
> > On 7 Oct 2015, at 07:29, Masatake Iwasaki 
> wrote:
> >
> > Thanks for clear summary, Tsuyoshi.
> >
> > I read some related past discussions.
> >
> >  https://wiki.apache.org/hadoop/MovingToJdk7and8
> >  http://search-hadoop.com/m/uOzYtGSiCs1acRnh
> >  http://search-hadoop.com/m/uOzYthdWJqpGdSZ1
> >
> > Though there seems to be no consensus about when to drop java 7 support
> yet,
> > it would not be 2.8, for which the preparation has already started.
> > If the work to make the source compatible with Java 8 does not result in
> > dropping Java 7 support, it would be nice and easy to backport to
> branch-2.
> >
> >
> > > we need to upgrade grizzly to 2.2.16 to use
> > > jersey-test-framework-grizzly2. I’d like to discuss which version we
> > > will target this change. Can we do this in branch-2?
> >
> > At least, the newest grizzly, jersey and asm seem to support Java 7 too
> > and HADOOP-11993 may work in branch-2.
> >
>
> Certainly for trunk, I'm +1 for making the leap. For branch 2, how
> backwards compatible/incompatible is the change?
>
> I think we'd have to test it downstream; I can use slider & spark as test
> builds locally —YARN apps are the failure points. Someone else would have
> to try HBase.
>
> In that world, we could think of having a short-lived branch-2-java-8
> branch, which cherry-picks the grizzly changes from trunk, and which we can
> then use for that downstream testing.
>
> >
> > Masatake Iwasaki
> >
> >
> > On 10/6/15 09:35, Tsuyoshi Ozawa wrote:
> > > Hi committers and users of the Hadoop stack,
> > >
> > > I’ll share the current status of JDK-8 support here. We can take a
> > > two-step approach to support JDK-8 - runtime-level support and
> > > source-level support.
> > >
> > > About runtime-level support, I’ve tested the Hadoop stack with JDK-8, e.g.
> > > MapReduce, Spark, Tez, Flink on YARN and HDFS, for a few months. As
> > > far as I tested, it works completely, since JDK-8 doesn’t have
> > > any incompatibilities at the binary level. We can say Hadoop has supported
> > > the JDK8 runtime already. Do you have any concern about this? I’ve not
> > > tested with HBase yet. I need the help of the HBase community. I think the
> > > only problem with the runtime is HADOOP-11364, the default value of the
> > > container-killer of YARN. After fixing the issue, we can declare the
> > > support of JDK-8 on the Hadoop Wiki to make it clear for users.
> > > https://wiki.apache.org/hadoop/HadoopJavaVersions
> > >
> > > About source-level, however, we have one big problem - upgrading
> > > the dependencies on asm and cglib. We need to upgrade all libraries that
> > > depend on asm to support the new bytecode introduced in JDK8[1]. The
> > > dependencies which use asm are jersey-server for compile and provided
> > > scope, and cglib for test scope (I checked it with the mvn dependency:tree
> > > command). HADOOP-9613 is addressing the problem.
> > >
> > > One complex problem I’ve faced is that Jersey depends on grizzly: to
> > > upgrade jersey to 1.19, which supports JDK8,
> > >  we need to upgrade grizzly to 2.2.16 to use
> > > jersey-test-framework-grizzly2. I’d like to discuss which version we
> > > will target this change. Can we do this in branch-2? Should we take
> > > care of HADOOP-11656 and HADOOP-11993 at the same time? I’d also like to
> > > confirm whether HADOOP-11993 means to remove Jersey, which depends on
> > > asm, or not. I think we can collaborate with Yetus community here.
> > >
> > > Also, another simple problem is that the source code cannot be compiled
> > > because the javadoc format or variable identifiers are illegal (e.g.
> > > HADOOP-12457, HADOOP-11875). I think this can be solved
> > > straightforwardly.
> > >
> > > Please share any concern I’ve missed. The opinions of users are also
> welcome :-)
> > > I'd like to move forward with this step by step to make Hadoop user-friendly.
> > >
> > > Thanks Steve, Sean, Allen, Robert, Brahma, Akira, Larry, Allen, Andrew
> > > Purtell, Tsz-wo Sze, Sethen and other guys for doing lots of work on
> > > JDK-8.
> > >
> > > Best regards,
> > > - Tsuyoshi
> > >
> > > [1] http://product.hubspot.com/blog/upgrading-to-java-8-at-scale
> > > [2] http://ispras.linuxbase.org/index.php/Java_API_Compliance_Checker
> >
> >
>
>


Re: [DISCUSS] About the details of JDK-8 support

2015-10-07 Thread Andrew Wang
>
> > On 7 Oct 2015, at 17:23, Andrew Wang  wrote:
> >
> > We've been supporting JDK8 as a runtime for CDH5 for a while now (meaning
> > the full stack including HBase), so I agree that we're good there.
> >
>
>
> with Kerberos on?
>
Yea, I haven't been that involved with our internal JDK validation
efforts, but I know there has been an assortment of JDK8 bugs related to
Kerberos. Our latest docs currently recommend 1.8.0_40 or above:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_req_supported_versions.html#concept_pdd_kzf_vp_unique_1


Re: DFSClient got deadlock when close file and failed to renew lease

2015-10-19 Thread Andrew Wang
Hi daniedeng,

Please file a JIRA at https://issues.apache.org/jira/browse/HDFS with
details about your issue, and someone will take a look.

Best,
Andrew

On Sun, Oct 18, 2015 at 6:43 PM, daniedeng(邓飞) 
wrote:

>
>
> --
> daniedeng(邓飞)
>
>
> *发件人:* daniedeng(邓飞) 
> *发送时间:* 2015-10-16 15:44
> *收件人:* hdfs-issues ; u...@hadoop.apache.org
> *主题:* DFSClient got deadlock when close file and failed to renew lease
> Hi,All
> We found a deadlock on our HBase (0.98) cluster (the Hadoop version
> is 2.2.0), and it should be an HDFS bug; at the time our network was not stable.
> Below is the stack:
>
>
> *
> Found one Java-level deadlock:
> =
> "MemStoreFlusher.1":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
>   waiting to lock monitor 0x7ff2e67e16a8 (object 0x000486ce6620, a
> org.apache.hadoop.hdfs.DFSOutputStream),
>   which is held by "MemStoreFlusher.0"
> "MemStoreFlusher.0":
>   waiting to lock monitor 0x7ff27cfa5218 (object 0x0002fae5ebe0, a
> org.apache.hadoop.hdfs.LeaseRenewer),
>   which is held by "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel"
>
> Java stack information for the threads listed above:
> ===
> "MemStoreFlusher.1":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x0002fae5ebe0> (a
> org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x00055b606cb0> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at
> org.apache.hadoop.hbase.io.hfile.AbstractHFileWriter.finishClose(AbstractHFileWriter.java:250)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:402)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:78)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
> - locked <0x00059869eed8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:812)
> at
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1974)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1795)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1678)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1591)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:472)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:211)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:66)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:238)
> at java.lang.Thread.run(Thread.java:744)
> "LeaseRenewer:hbaseadmin@hbase-ns-gdt-sh-marvel":
> at org.apache.hadoop.hdfs.DFSOutputStream.abort(DFSOutputStream.java:1822)
> - waiting to lock <0x000486ce6620> (a
> org.apache.hadoop.hdfs.DFSOutputStream)
> at
> org.apache.hadoop.hdfs.DFSClient.closeAllFilesBeingWritten(DFSClient.java:780)
> at org.apache.hadoop.hdfs.DFSClient.abort(DFSClient.java:753)
> at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:453)
> - locked <0x0002fae5ebe0> (a org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
> at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:298)
> at java.lang.Thread.run(Thread.java:744)
> "MemStoreFlusher.0":
> at org.apache.hadoop.hdfs.LeaseRenewer.addClient(LeaseRenewer.java:216)
> - waiting to lock <0x0002fae5ebe0> (a
> org.apache.hadoop.hdfs.LeaseRenewer)
> at org.apache.hadoop.hdfs.LeaseRenewer.getInstance(LeaseRenewer.java:81)
> at org.apache.hadoop.hdfs.DFSClient.getLeaseRenewer(DFSClient.java:648)
> at org.apache.hadoop.hdfs.DFSClient.endFileLease(DFSClient.java:659)
> at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1882)
> - locked <0x000486ce6620> (a org.apache.hadoop.hdfs.DFSOutputStream)
> at
> org.apache.hadoop.fs.FSDataOutp
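
The report is truncated above, but the cycle is already visible: a flusher thread holds the DFSOutputStream monitor and waits for the LeaseRenewer monitor, while the renewer thread holds its own monitor and waits for the stream's. A minimal, self-contained Java sketch of that inverted lock order, using stand-in objects rather than the real HDFS classes:

public class LockOrderDeadlock {
  static final Object renewer = new Object();  // stands in for LeaseRenewer
  static final Object stream = new Object();   // stands in for DFSOutputStream

  public static void main(String[] args) {
    // Flusher: close() locks the stream, then endFileLease() wants the renewer.
    new Thread(() -> {
      synchronized (stream) {
        pause();
        synchronized (renewer) { }
      }
    }, "MemStoreFlusher").start();
    // Renewer: run() locks itself, then abort() wants the stream.
    new Thread(() -> {
      synchronized (renewer) {
        pause();
        synchronized (stream) { }
      }
    }, "LeaseRenewer").start();
    // Run this and jstack reports the same kind of Java-level deadlock.
  }

  static void pause() {
    try { Thread.sleep(100); } catch (InterruptedException ignored) { }
  }
}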

Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk

2015-10-19 Thread Andrew Wang
I think our plan thus far has been to target this for 3.0. I'm okay with
putting it in branch-2 if we've given a hard look at compatibility, but
I'll note though that 2.8 is already looking like quite a large release,
and our release bandwidth has been focused on the 2.6 and 2.7 maintenance
releases. Adding another multi-hundred JIRAs to 2.8 might make it too
unwieldy to get out the door. If we bump EC past that, 3.0 might very well
be our next release vehicle. I do plan to revive the 3.0 schedule some time
next year. With EC and JDK8 in a good spot, the only big feature remaining
is classpath isolation.

EC is also a pretty fundamental change to HDFS. Even if it's compatible, in
terms of size and impact it might best belong in a new major release.

Best,
Andrew

On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B <
vinayakumarb.apa...@gmail.com> wrote:

> Does anyone else also think that the feature is ready to go to branch-2 as well?
>
> It's been > 2 weeks since EC landed on trunk. IMO it looks quite stable since
> then and ready to go into branch-2.
>
> -Vinay
> On Oct 6, 2015 12:51 AM, "Zhe Zhang"  wrote:
>
> > Thanks Vinay for capturing the issue and Uma for offering the help.
> >
> > ---
> > Zhe Zhang
> >
> > On Mon, Oct 5, 2015 at 12:19 PM, Gangumalla, Uma <
> uma.ganguma...@intel.com
> > >
> > wrote:
> >
> > > Vinay,
> > >
> > >
> > >  I would merge them as part of HDFS-9182.
> > >
> > > Thanks,
> > > Uma
> > >
> > >
> > >
> > > On 10/5/15, 12:48 AM, "Vinayakumar B"  wrote:
> > >
> > > >Hi Andrew,
> > > > I see CHANGES.txt entries not yet merged from
> CHANGES-HDFS-EC-7285.txt.
> > > >
> > > > Was this intentional?
> > > >
> > > >Regards,
> > > >Vinay
> > > >
> > > >On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <
> andrew.w...@cloudera.com>
> > > >wrote:
> > > >
> > > >> Branch has been merged to trunk, thanks again to everyone who worked
> > on
> > > >>the
> > > >> feature!
> > > >>
> > > >> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang 
> > > >>wrote:
> > > >>
> > > >> > Thanks everyone who has participated in this discussion.
> > > >> >
> > > >> > With 7 +1's (5 binding and 2 non-binding), and no -1, this vote
> has
> > > >> passed.
> > > >> > I will do a final 'git merge' with trunk and work with Andrew to
> > merge
> > > >> the
> > > >> > branch to trunk. I'll update on this thread when the merge is
> done.
> > > >> >
> > > >> > ---
> > > >> > Zhe Zhang
> > > >> >
> > > >> > On Thu, Sep 24, 2015 at 11:08 PM, Liu, Yi A 
> > > >>wrote:
> > > >> >
> > > >> > > (Change it to binding.)
> > > >> > >
> > > >> > > +1
> > > >> > > I have been involved in the development and code review on the
> > > >>feature
> > > >> > > branch. It's a great feature and I think it's ready to merge it
> > into
> > > >> > trunk.
> > > >> > >
> > > >> > > Thanks all for the contribution.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Yi Liu
> > > >> > >
> > > >> > >
> > > >> > > -Original Message-
> > > >> > > From: Liu, Yi A
> > > >> > > Sent: Friday, September 25, 2015 1:51 PM
> > > >> > > To: hdfs-dev@hadoop.apache.org
> > > >> > > Subject: RE: [VOTE] Merge HDFS-7285 (erasure coding) branch to
> > trunk
> > > >> > >
> > > >> > > +1 (non-binding)
> > > >> > > I have been involved in the development and code review on the
> > > >>feature
> > > >> > > branch. It's a great feature and I think it's ready to merge it
> > into
> > > >> > trunk.
> > > >> > >
> > > >> > > Thanks all for the contribution.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Yi Liu
> > > >> > >
> > > >> > >
> > > >

Re: [VOTE] Release Apache Hadoop 2.6.2

2015-10-26 Thread Andrew Wang
Thanks all for the quick action on the KEYS file.

Steps:
* Reviewed list of changed JIRAs, all in YARN/MR.
* Release notes and CHANGES.txt looked good
* Verified checksums and signature via `gpg --verify`
* Ran `mvn apache-rat:check` on source tarball, passed
* Tarball size looks reasonable
* Built with "-Pdist" from source tarball
* Did some basic filesystem operations
* Ran a pi job, successfully returned a value of 4

Issues:
* Do we have a clean full test run?
* Maybe I'm using GPG wrong, but Sangjin's key isn't hooked into my web of
trust:

-> % gpg --verify hadoop-2.6.2-RC0-src.tar.gz.asc
hadoop-2.6.2-RC0-src.tar.gz
gpg: Signature made Wed 21 Oct 2015 09:16:07 PM PDT using RSA key ID
90348D47
gpg: Good signature from "Sangjin Lee "
gpg: WARNING: This key is not certified with a trusted signature!
gpg:  There is no indication that the signature belongs to the
owner.
Primary key fingerprint: 8B44 A05C 3089 55D1 9195  6559 A5CE E20A 9034 8D47

Overall I'm +1 (binding), but it'd be good to see these two things
addressed. Maybe it's time for an SF key signing party.

Best,
Andrew

On Mon, Oct 26, 2015 at 2:35 PM, Li Lu  wrote:

> Thanks Sangjin for the work! I downloaded the version, built it, and
> successfully ran a few jobs on my local machine in a single node setup. +1
> non-binding.
>
> Li Lu
>
> On Oct 26, 2015, at 14:08, Sangjin Lee <sj...@apache.org> wrote:
>
> Thanks Vinod! Thanks for the correction on the vote Andrew. Making some
> rookie mistakes here. :)
>
> On Mon, Oct 26, 2015 at 2:06 PM, Vinod Vavilapalli <vino...@hortonworks.com>
> wrote:
>
> I was helping Sangjin offline with the release.
>
> We briefly discussed the KEYS problem before, but it missed my attention.
>
> I will get his KEYS committed right away; the release is testable right
> away though.
>
> Regarding the voting period, let’s continue voting for two more days, the
> period also had the weekend, during which a lot of people (at least myself
> and team) didn’t pay attention to this vote.
>
> Thanks
> +Vinod
>
>
> On Oct 26, 2015, at 1:50 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
> Hey Sangjin, did you add your release signing keys to the KEYS file? I
> don't see it here:
>
> https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
>
> Also only PMC votes are binding on releases, so I think we currently
> still
> stand at 0 binding +1s.
>
> On Mon, Oct 26, 2015 at 1:28 PM, Sangjin Lee <sjl...@gmail.com> wrote:
>
> That makes sense. Thanks for pointing that out. The git commit id is
> 0cfd050febe4a30b1ee1551dcc527589509fb681.
>
> On Mon, Oct 26, 2015 at 12:25 PM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>
> On 22 Oct 2015, at 22:14, Sangjin Lee <sj...@apache.org> wrote:
>
> Hi all,
>
> I have created a release candidate (RC0) for Hadoop 2.6.2.
>
> The RC is available at:
> http://people.apache.org/~sjlee/hadoop-2.6.2-RC0/
>
> The RC tag in git is: release-2.6.2-RC0
>
>
> Tags can move; we should *never* vote for a release of one.
>
> What is the git commit #?
>
> The list of JIRAs committed for 2.6.2:
>
>
>
>
> https://issues.apache.org/jira/browse/YARN-4101?jql=project%20in%20(HADOOP%2C%20HDFS%2C%20YARN%2C%20MAPREDUCE)%20AND%20fixVersion%20%3D%202.6.2
>
> The maven artifacts are staged at
>
>
> https://repository.apache.org/content/repositories/orgapachehadoop-1022/
>
> Please try out the release candidate and vote. The vote will run for 5
> days.
>
> Thanks,
> Sangjin
>
>
>
>
>
>
>


Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]

2015-11-02 Thread Andrew Wang
Thanks for forking the thread, Vinod,

SGTM, though I really do recommend waiting for 2.9 given the current size
of 2.8. I'm not a fan of an "off by default" half-measure, since it doesn't
change our compatibility requirements, and there's some major NN surgery
that can't really be disabled.

If we do find a major user who's backported this to their own branch-2
fork, I agree that's motivation to get it in an upstream release quicker. I
haven't heard anything along these lines though.

On Mon, Nov 2, 2015 at 11:49 AM, Vinod Vavilapalli 
wrote:

> Forking the thread. Started looking at the 2.8 list, various features’
> status and arrived here.
>
> While I understand the pervasive nature of EC and a need for a significant
> bake-in, moving this to a 3.x release is not a good idea. We will surely
> get a 2.8 out this year and, as needed, I can even spend time getting
> started on a 2.9. OTOH, 3.x is a long way off, and given all the
> incompatibilities there, it would be a while before users can get their
> hands on EC if it were to be only on 3.x. At best, this may force sites
> that want EC to backport the entire EC feature to older releases, at worst
> this will repeat the mess of 0.20 security release forks.
>
> If we think adding this to 2.8 (even if it's switched off) is too much risk
> per our original plan, let’s move this to 2.9, thereby leaving enough time
> for stability, integration testing and bake-in, and a realistic chance of
> having it end up on users’ clusters soonish.
>
> +Vinod
>
> > On Oct 19, 2015, at 1:44 PM, Andrew Wang 
> wrote:
> >
> > I think our plan thus far has been to target this for 3.0. I'm okay with
> > putting it in branch-2 if we've given a hard look at compatibility, but
> > I'll note though that 2.8 is already looking like quite a large release,
> > and our release bandwidth has been focused on the 2.6 and 2.7 maintenance
> > releases. Adding another multi-hundred JIRAs to 2.8 might make it too
> > unwieldy to get out the door. If we bump EC past that, 3.0 might very
> well
> > be our next release vehicle. I do plan to revive the 3.0 schedule some
> time
> > next year. With EC and JDK8 in a good spot, the only big feature
> remaining
> > is classpath isolation.
> >
> > EC is also a pretty fundamental change to HDFS. Even if it's compatible,
> in
> > terms of size and impact it might best belong in a new major release.
> >
> > Best,
> > Andrew
> >
> > On Fri, Oct 16, 2015 at 7:04 PM, Vinayakumar B <
> > vinayakumarb.apa...@gmail.com> wrote:
> >
> >> Does anyone else also think that the feature is ready to go to branch-2 as
> well?
> >>
> >> It's been > 2 weeks since EC landed on trunk. IMO it looks quite stable
> >> since then and ready to go into branch-2.
> >>
> >> -Vinay
> >> On Oct 6, 2015 12:51 AM, "Zhe Zhang"  wrote:
> >>
> >>> Thanks Vinay for capturing the issue and Uma for offering the help.
> >>>
> >>> ---
> >>> Zhe Zhang
> >>>
> >>> On Mon, Oct 5, 2015 at 12:19 PM, Gangumalla, Uma <
> >> uma.ganguma...@intel.com
> >>>>
> >>> wrote:
> >>>
> >>>> Vinay,
> >>>>
> >>>>
> >>>> I would merge them as part of HDFS-9182.
> >>>>
> >>>> Thanks,
> >>>> Uma
> >>>>
> >>>>
> >>>>
> >>>> On 10/5/15, 12:48 AM, "Vinayakumar B" 
> wrote:
> >>>>
> >>>>> Hi Andrew,
> >>>>> I see CHANGES.txt entries not yet merged from
> >> CHANGES-HDFS-EC-7285.txt.
> >>>>>
> >>>>> Was this intentional?
> >>>>>
> >>>>> Regards,
> >>>>> Vinay
> >>>>>
> >>>>> On Wed, Sep 30, 2015 at 9:15 PM, Andrew Wang <
> >> andrew.w...@cloudera.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Branch has been merged to trunk, thanks again to everyone who worked
> >>> on
> >>>>>> the
> >>>>>> feature!
> >>>>>>
> >>>>>> On Tue, Sep 29, 2015 at 10:44 PM, Zhe Zhang 
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks everyone who has participated in this discussion.
> >>>>>>>
> >>>>>>> With 7 +1's (5 binding and 2 non-binding), and no -1, this v

Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]

2015-11-02 Thread Andrew Wang
If we use an umbrella JIRA to categorize all the ongoing EC work, that will
let us more easily change the target version later. For instance, if we
decide to bump Phase II out of 2.9, then we just need to change the target
version of the Phase II umbrella rather than all the subtasks.

On Mon, Nov 2, 2015 at 4:26 PM, Zheng, Kai  wrote:

> Yeah, so for the issues we recently resolved on trunk and are addressing
> as follow-on tasks in Phase I, we would label them with "erasure coding"
> and maybe also set the target version as "2.9" for convenience?
>
> -Original Message-
> From: Jing Zhao [mailto:ji...@apache.org]
> Sent: Tuesday, November 03, 2015 8:04 AM
> To: hdfs-dev@hadoop.apache.org
> Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285
> (erasure coding) branch to trunk]
>
> +1 for the plan about Phase I & II.
>
> BTW, maybe out of the scope of this thread, just want to mention we should
> either move the jira under HDFS-8031 or update the jira component as
> "erasure-coding" when making further improvement or fixing bugs in EC. In
> this way it will be easier for later backporting EC to 2.9.
>
> On Mon, Nov 2, 2015 at 3:48 PM, Vinayakumar B <
> vinayakumarb.apa...@gmail.com
> > wrote:
>
> > +1 for the idea.
> > On Nov 3, 2015 07:22, "Zheng, Kai"  wrote:
> >
> > > Sounds good to me. When it's determined to include EC in the 2.9
> > > release, it may be good to have a rough release date as Zhe asked,
> > > so the scope of EC can be worked out accordingly. We still have
> > > quite a few things as Phase I follow-on tasks to do before EC can
> > > be deployed in a production system. Phase II to develop non-striping
> > > EC for cold data would possibly
> > be
> > > started after that. We might consider to include only Phase I and
> > > leave Phase II for next release according to the rough release date.
> > >
> > > Regards,
> > > Kai
> > >
> > > -Original Message-
> > > From: Gangumalla, Uma [mailto:uma.ganguma...@intel.com]
> > > Sent: Tuesday, November 03, 2015 5:41 AM
> > > To: hdfs-dev@hadoop.apache.org
> > > Subject: Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge
> > > HDFS-7285 (erasure coding) branch to trunk]
> > >
> > > +1 for EC to go into 2.9. Yes, 3.x would be a long way off when we plan to
> > > have 2.8 and 2.9 releases.
> > >
> > > Regards,
> > > Uma
> > >
> > > On 11/2/15, 11:49 AM, "Vinod Vavilapalli" 
> > wrote:
> > >
> > > >Forking the thread. Started looking at the 2.8 list, various
> > > >features' status and arrived here.
> > > >
> > > >While I understand the pervasive nature of EC and a need for a
> > > >significant bake-in, moving this to a 3.x release is not a good idea.
> > > >We will surely get a 2.8 out this year and, as needed, I can even
> > > >spend time getting started on a 2.9. OTOH, 3.x is a long way off,
> > > >and given all the incompatibilities there, it would be a while
> > > >before users can get their hands on EC if it were to be only on
> > > >3.x. At best, this may force sites that want EC to backport the
> > > >entire EC feature to older releases, at worst this will repeat
> > > >the mess of 0.20 security release
> > > forks.
> > > >
> > > >If we think adding this to 2.8 (even if it's switched off) is too
> > > >much risk per our original plan, let's move this to 2.9, thereby
> > > >leaving enough time for stability, integration testing and bake-in,
> > > >and a realistic chance of having it end up on users' clusters soonish.
> > > >
> > > >+Vinod
> > > >
> > > >> On Oct 19, 2015, at 1:44 PM, Andrew Wang
> > > >>
> > > >>wrote:
> > > >>
> > > >> I think our plan thus far has been to target this for 3.0. I'm
> > > >>okay with  putting it in branch-2 if we've given a hard look at
> > > >>compatibility, but  I'll note though that 2.8 is already looking
> > > >>like quite a large release,  and our release bandwidth has been
> > > >>focused on the 2.6 and 2.7 maintenance  releases. Adding another
> > > >>multi-hundred JIRAs to 2.8 might make it too  unwieldy to get out
> > > >>the door. If we bump EC past that, 3.0 might very well  be our
> > > >>next release vehicle. I

Re: Erasure coding in branch-2 [Was Re: [VOTE] Merge HDFS-7285 (erasure coding) branch to trunk]

2015-11-04 Thread Andrew Wang
We used to get help from Bigtop when it comes to integration testing. Do we
think that's possible for 2.8?

On Wed, Nov 4, 2015 at 10:08 AM, Steve Loughran 
wrote:

>
> > On 2 Nov 2015, at 23:11, Vinod Vavilapalli 
> wrote:
> >
> > Yes, I’ve already started looking at 2.8.0, that is exactly how I ended
> up with this discussion on the state of EC.
> >
> > +Vinod
> >
> >
> > On Nov 2, 2015, at 3:02 PM, Haohui Mai <ricet...@gmail.com> wrote:
> >
> > Is it a good time to start the discussion on the issues of releasing 2.8?
> >
>
> Before rushing to release 2.8, people should be trying it in downstream apps
> today. As well as identifying hdfs-client related issues, I've just
> discovered that the MiniYARNCluster has added lots of stack traces
> (YARN-4330), and I'm sure there are other regressions.
>
> It's generally not that hard to take a downstream project and try to build
> with a hadoop version of 2.8.0-SNAPSHOT; compilation and classpath problems
> will show up immediately; unit test regressions can be at least identified
> by switching between 2.7.1 and 2.8.0-SNAPSHOT.


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Andrew Wang
Hey Vinod,

I'm fine with the idea of alpha/beta marking in the abstract, but had a
question: do we define these terms in our compatibility policy or
elsewhere? I think it's commonly understood among us developers (alpha
means not fully tested and API unstable, beta means it's not fully tested
but is API stable), but it'd be good to have it written down.

Also I think we've only done alpha/beta tagging at the release-level
previously which is a simpler story to tell users. So it's important for
this release that alpha features set their interface stability annotations
to "evolving". There isn't a corresponding annotation for "interface
quality", but IMO that's overkill.

Thanks,
Andrew
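
For reference, marking an alpha feature's public API this way looks roughly like the sketch below; the two annotation classes are the real ones from Hadoop's org.apache.hadoop.classification package, while the annotated class is purely hypothetical.

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public
@InterfaceStability.Evolving  // alpha: the API may still change across minor releases
public class ShinyNewFeatureApi {
  // feature methods would go here
}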

On Wed, Nov 25, 2015 at 11:08 AM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> This is the current state from the feedback I gathered.
>  - Support priorities across applications within the same queue YARN-1963
> — Can push as an alpha / beta feature per Sunil
>  - YARN-1197 Support changing resources of an allocated container:
> — Can push as an alpha/beta feature per Wangda
>  - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> most of it anyways.
> — Can push as an alpha feature.
>  - YARN Timeline Service v1.5 - YARN-4233
> — Should include per Li Lu
>  - YARN Timeline Service Next generation: YARN-2928
> — Per analysis from Sangjin, drop this from 2.8.
>
> One open feature status
>  - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
>
> Updated the Roadmap wiki with the same.
>
> Thanks
> +Vinod
>
> > On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
> >
> > I reviewed the current state of the YARN-2928 changes regarding its
> impact
> > if the timeline service v.2 is disabled. It does appear that there are a
> > lot of things that still do get created and enabled unconditionally
> > regardless of configuration. While this is understandable when we were
> > working to implement the feature, this clearly needs to be cleaned up so
> > that when disabled the timeline service v.2 doesn't impact other things.
> >
> > I filed a JIRA for that work:
> > https://issues.apache.org/jira/browse/YARN-4356
> >
> > We need to complete it before we can merge.
> >
> > Somewhat related is the status of the configuration and what it means in
> > various contexts (client/app-side vs. server-side, v.1 vs. v.2, etc.). I
> > know there is an ongoing discussion regarding YARN-4183. We'll need to
> > reflect the outcome of that discussion.
> >
> > My overall impression of whether this can be done for 2.8 is that it
> looks
> > rather challenging given the suggested timeframe. We also need to
> complete
> > several major tasks before it is ready.
> >
> > Sangjin
> >
> >
> > On Wed, Nov 11, 2015 at 5:49 PM, Sangjin Lee  wrote:
> >
> >>
> >> On Wed, Nov 11, 2015 at 12:13 PM, Vinod Vavilapalli <
> >> vino...@hortonworks.com> wrote:
> >>
> >>>— YARN Timeline Service Next generation: YARN-2928: Lots of
> momentum,
> >>> but clearly a work in progress. Two options here
> >>>— If it is safe to ship it into 2.8 in a disable manner, we can
> >>> get the early code into trunk and all the way int o2.8.
> >>>— If it is not safe, it organically rolls over into 2.9
> >>>
> >>
> >> I'll review the changes on YARN-2928 to see what impact it has (if any)
> if
> >> the timeline service v.2 is disabled.
> >>
> >> Another condition for it to make 2.8 is whether the branch will be in a
> >> shape in a couple of weeks such that it adds value for folks that want
> to
> >> test it. Hopefully it will become clearer soon.
> >>
> >> Sangjin
> >>
>
>


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-25 Thread Andrew Wang
SGTM, thanks Vinod! LMK if you need reviews on any of that.

Regarding the release checklist, another item I'd add is updating the
release notes in the project documentation, we've forgotten in the past.

On Wed, Nov 25, 2015 at 2:01 PM, Vinod Kumar Vavilapalli  wrote:

> Tx for your comments, Andrew!
>
> I did talk about it in a few discussions in the past related to this but
> yes, we never codified the feature-level alpha/beta tags. Part of the
> reason why I never pushed for such a codification is that (a) it is a
> subjective decision that the feature contributors usually have the best say
> on and (b) voting on the alpha-ness / beta-ness may not be a productive
> exercise in non-trivial number of cases (as I have seen with the
> release-level tags, some users think an alpha release is of production
> quality enough for _their_ use-cases).
>
> That said, I agree about noting down our general recommendations on what
> an alpha feature means, what a beta feature means etc. Let me file a JIRA
> for this.
>
> The second point you made is absolutely true. Atleast on YARN / MR side, I
> usually end up traversing (some if not all of) alpha features and making
> sure the corresponding APIs are explicitly marked private or public
> unstable / evolving. I do think that there is a lot of value in us  getting
> more systematic with this - how about we do this for the feature list of
> 2.8 and evolve the process?
>
> In general, may be we could have a list of ‘check-list’ JIRAs that we
> always address before every release. Few things already come to my mind:
>  - Mark which features are alpha / beta and make sure the corresponding
> APIs, public interfaces reflect the state
>  - Revise all newly added configuration properties to make sure they
> follow our general naming patterns. New contributors sometimes create
> non-standard properties that we come to regret supporting.
>  - Generate a list of newly added public entry-points and validate that
> they are all indeed meant to be public
>  - [...]
>
> Thoughts?
>
> +Vinod
>
>
> > On Nov 25, 2015, at 11:47 AM, Andrew Wang 
> wrote:
> >
> > Hey Vinod,
> >
> > I'm fine with the idea of alpha/beta marking in the abstract, but had a
> > question: do we define these terms in our compatibility policy or
> > elsewhere? I think it's commonly understood among us developers (alpha
> > means not fully tested and API unstable, beta means it's not fully tested
> > but is API stable), but it'd be good to have it written down.
> >
> > Also I think we've only done alpha/beta tagging at the release-level
> > previously which is a simpler story to tell users. So it's important for
> > this release that alpha features set their interface stability
> annotations
> > to "evolving". There isn't a corresponding annotation for "interface
> > quality", but IMO that's overkill.
> >
> > Thanks,
> > Andrew
> >
> > On Wed, Nov 25, 2015 at 11:08 AM, Vinod Kumar Vavilapalli <
> > vino...@apache.org> wrote:
> >
> >> This is the current state from the feedback I gathered.
> >> - Support priorities across applications within the same queue YARN-1963
> >>— Can push as an alpha / beta feature per Sunil
> >> - YARN-1197 Support changing resources of an allocated container:
> >>— Can push as an alpha/beta feature per Wangda
> >> - YARN-3611 Support Docker Containers In LinuxContainerExecutor: Well
> >> most of it anyways.
> >>— Can push as an alpha feature.
> >> - YARN Timeline Service v1.5 - YARN-4233
> >>— Should include per Li Lu
> >> - YARN Timeline Service Next generation: YARN-2928
> >>— Per analysis from Sangjin, drop this from 2.8.
> >>
> >> One open feature status
> >> - HDFS-8155 Support OAuth2 in WebHDFS: Alpha / Early feature?
> >>
> >> Updated the Roadmap wiki with the same.
> >>
> >> Thanks
> >> +Vinod
> >>
> >>> On Nov 13, 2015, at 12:12 PM, Sangjin Lee  wrote:
> >>>
> >>> I reviewed the current state of the YARN-2928 changes regarding its
> >> impact
> >>> if the timeline service v.2 is disabled. It does appear that there are
> a
> >>> lot of things that still do get created and enabled unconditionally
> >>> regardless of configuration. While this is understandable when we were
> >>> working to implement the feature, this clearly needs to be cleaned up
> so
> >>> that when disabled the timeline service v.2 doesn't impact other
> things

Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-30 Thread Andrew Wang
>
>
> maybe discuss having a list @ release time. As an example, s3 and
> encryption at rest shipped in beta stage... what's in 2.8 that "we don't
> yet trust ourselves?".  Me, I'd put erasure coding in there just because
> I've no familiarity with it
>
Quick clarification, EC isn't scheduled for 2.8. IMO it's still an open
question whether we want to include in any branch-2 release. Elliot
(wearing his Facebook hat) said he'd be hesitant to deploy it because of
the significant NN changes. This might apply to our other big users like
Yahoo or Twitter.


Re: TestDirectoryScanner.testThrottle() Failures

2015-12-16 Thread Andrew Wang
Done

On Wed, Dec 16, 2015 at 4:17 PM, Daniel Templeton 
wrote:

> Would someone please review and commit HDFS-9300 so that the
> testThrottle() test will stop failing.  It's a 2-line patch.
>
> Thanks,
> Daniel
>


Re: [VOTE] Release Apache Hadoop 2.7.2 RC1

2015-12-24 Thread Andrew Wang
My 2c is that we should have monotonicity in releases. That way no
"upgrade" is a regression.

On Wed, Dec 23, 2015 at 10:00 PM, Tsuyoshi Ozawa  wrote:

> Hi Vinod,
>
> thank you for the clarification.
>
> >  - Pull these 16 tickets into 2.7.2 and roll a new RC
> > > What do people think? Do folks expect “any fix in 2.6.3 to be there in
> all releases that get out after 2.6.3 release date (December 16th)”?
>
> I personally prefer to pull these tickets into 2.7.2 since it's
> intuitive for me. I can help to cherrypick these tickets into 2.7.2
> once we decide to do so.
>
> This conflict happened since the timings of cutting branches and
> the actual releases crossed. We may face these situations often in
> the future since we have 2 or more branches for stable releases.
> Hence, it's a good time to decide basic policy now.
>
> BTW, should we start to discuss on a new thread or continue to discuss here?
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Dec 24, 2015 at 9:47 AM, Vinod Kumar Vavilapalli
>  wrote:
> > I retract my -1. I think we will need to discuss this a bit more.
> >
> > Beyond those two tickets, there are a bunch more (totaling to 16) that
> are in 2.6.3 but *not* in 2.7.2. See this:
> https://issues.apache.org/jira/issues/?jql=key%20in%20%28HADOOP-12526%2CHADOOP-12413%2CHADOOP-11267%2CHADOOP-10668%2CHADOOP-10134%2CYARN-4434%2CYARN-4365%2CYARN-4348%2CYARN-4344%2CYARN-4326%2CYARN-4241%2CYARN-2859%2CMAPREDUCE-6549%2CMAPREDUCE-6540%2CMAPREDUCE-6377%2CMAPREDUCE-5883%2CHDFS-9431%2CHDFS-9289%2CHDFS-8615%29%20and%20fixVersion%20!%3D%202.7.0
> <
> https://issues.apache.org/jira/issues/?jql=key%20in%20(HADOOP-12526,HADOOP-12413,HADOOP-11267,HADOOP-10668,HADOOP-10134,YARN-4434,YARN-4365,YARN-4348,YARN-4344,YARN-4326,YARN-4241,YARN-2859,MAPREDUCE-6549,MAPREDUCE-6540,MAPREDUCE-6377,MAPREDUCE-5883,HDFS-9431,HDFS-9289,HDFS-8615)%20and%20fixVersion%20!=%202.7.0
> >
> >
> > Two options here, depending on the importance of ‘causality' between
> 2.6.x and 2.7.x lines.
> >  - Ship 2.7.2 as we voted on here
> >  - Pull these 16 tickets into 2.7.2 and roll a new RC
> >
> > What do people think? Do folks expect “any fix in 2.6.3 to be there in
> all releases that get out after 2.6.3 release date (December 16th)”?
> >
> > Thanks
> > +Vinod
> >
> >> On Dec 23, 2015, at 12:37 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
> >>
> >> Sigh. Missed this.
> >>
> >> To retain causality ("any fix in 2.6.3 will be there in all releases
> that got out after 2.6.3”), I’ll get these patches in.
> >>
> >> Reverting my +1, and casting -1 for the RC myself.
> >>
> >> Will spin a new RC, this voting thread is marked dead.
> >>
> >> Thanks
> >> +Vinod
> >>
> >>> On Dec 22, 2015, at 8:24 AM, Junping Du <j...@hortonworks.com> wrote:
> >>>
> >>> However, when I look at our commit log and CHANGES.txt, I found
> something we are missing:
> >>> 1. HDFS-9470 and YARN-4424 are missing from the 2.7.2 branch and RC1
> tag.
> >>> 2. HADOOP-5323, HDFS-8767 are missing in CHANGES.txt
> >>
> >
>


Re: [VOTE] Release Apache Hadoop 2.7.2 RC1

2016-01-08 Thread Andrew Wang
I like monotonic releases since it's simple for users to understand. Is it
difficult to backport to 2.7.x if you're already backporting to 2.6.x? I
don't follow why special casing some class of fixes is desirable.

Also for maintenance releases, aren't all included fixes supposed to be for
serious bugs? Minor JIRAs can wait for the next minor release. If there are
strong reasons to include a minor JIRA in a maintenance release, then maybe
it's not really a minor JIRA.

Best,
Andrew

On Fri, Jan 8, 2016 at 8:43 AM, Akira AJISAKA 
wrote:

> The general rule sounds good to me.
>
> > "any fix in 2.x.y to be there in all 2.b.c releases (while b>=x) that
> get out after 2.x.y release date"
>
> +1
>
> > I would prefer this rule only apply to critical/blocker fixes, and not
> apply to minor/trivial issues.
>
> +1
>
> Thanks,
> Akira
>
>
> On 12/29/15 23:50, Junping Du wrote:
>
>> I am +1 with pulling all of these tickets into 2.7.2.
>>
>> - For “any fix in 2.6.3 to be there in all releases that get out after
>> 2.6.3 release date”
>>
>> Shall we conclude this as a general rule - "any fix in 2.x.y to be there
>> in all 2.b.c releases (while b>=x) that get out after 2.x.y release date"?
>> I am generally fine with this, but I just feel it sets too strong a
>> restriction among branches. Some fixes could be trivial enough (test case
>> fixes, etc.) to deserve more flexibility. I would prefer this rule only
>> apply to critical/blocker fixes, and not apply to minor/trivial issues.
>>
>> Just 2 cents.
>>
>>
>> Thanks,
>>
>>
>> Junping
>>
>>
>> 
>> From: Vinod Kumar Vavilapalli 
>> Sent: Thursday, December 24, 2015 12:47 AM
>> To: Junping Du
>> Cc: mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org;
>> common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org
>> Subject: Re: [VOTE] Release Apache Hadoop 2.7.2 RC1
>>
>> I retract my -1. I think we will need to discuss this a bit more.
>>
>> Beyond those two tickets, there are a bunch more (totaling to 16) that
>> are in 2.6.3 but *not* in 2.7.2. See this:
>> https://issues.apache.org/jira/issues/?jql=key%20in%20%28HADOOP-12526%2CHADOOP-12413%2CHADOOP-11267%2CHADOOP-10668%2CHADOOP-10134%2CYARN-4434%2CYARN-4365%2CYARN-4348%2CYARN-4344%2CYARN-4326%2CYARN-4241%2CYARN-2859%2CMAPREDUCE-6549%2CMAPREDUCE-6540%2CMAPREDUCE-6377%2CMAPREDUCE-5883%2CHDFS-9431%2CHDFS-9289%2CHDFS-8615%29%20and%20fixVersion%20!%3D%202.7.0
>> <
>> https://issues.apache.org/jira/issues/?jql=key%20in%20(HADOOP-12526,HADOOP-12413,HADOOP-11267,HADOOP-10668,HADOOP-10134,YARN-4434,YARN-4365,YARN-4348,YARN-4344,YARN-4326,YARN-4241,YARN-2859,MAPREDUCE-6549,MAPREDUCE-6540,MAPREDUCE-6377,MAPREDUCE-5883,HDFS-9431,HDFS-9289,HDFS-8615)%20and%20fixVersion%20!=%202.7.0
>> >
>>
>> Two options here, depending on the importance of ‘causality' between
>> 2.6.x and 2.7.x lines.
>>   - Ship 2.7.2 as we voted on here
>>   - Pull these 16 tickets into 2.7.2 and roll a new RC
>>
>> What do people think? Do folks expect “any fix in 2.6.3 to be there in
>> all releases that get out after 2.6.3 release date (December 16th)”?
>>
>> Thanks
>> +Vinod
>>
>> On Dec 23, 2015, at 12:37 PM, Vinod Kumar Vavilapalli <vino...@apache.org> wrote:
>>
>> Sigh. Missed this.
>>
>> To retain causality ("any fix in 2.6.3 will be there in all releases that
>> got out after 2.6.3”), I’ll get these patches in.
>>
>> Reverting my +1, and casting -1 for the RC myself.
>>
>> Will spin a new RC, this voting thread is marked dead.
>>
>> Thanks
>> +Vinod
>>
>> On Dec 22, 2015, at 8:24 AM, Junping Du <j...@hortonworks.com> wrote:
>>
>> However, when I look at our commit log and CHANGES.txt, I found something
>> we are missing:
>> 1. HDFS-9470 and YARN-4424 are missing from the 2.7.2 branch and RC1 tag.
>> 2. HADOOP-5323, HDFS-8767 are missing in CHANGES.txt
>>
>>
>>
>


Re: [VOTE] Release Apache Hadoop 2.7.2 RC1

2016-01-11 Thread Andrew Wang
On Mon, Jan 11, 2016 at 7:22 AM, Junping Du  wrote:

> bq.  Is it difficult to backport to 2.7.x if you're already backporting to
> 2.6.x? I don't follow why special casing some class of fixes is desirable.
> It is not difficult to backport the commits between 2.6.x and 2.7.x.
> However, it *is* difficult to track exactly hundreds of commits between
> them. Taking HDFS-9470 as an example, the committer totally forgot to merge
> the commit into 2.7.2 when it was resolved as fixed in 2.7.2. The commit was
> merged into 2.6.3 later but got missed in 2.7.2 RC1. If this is not a
> critical fix, I don't think 2.7.2 should get a new RC to wait for this commit
> to land. That's why classifying fixes by priority is important and
> desirable when we are facing this situation.
>
Gotcha, so in this case it is the exception and not the rule? I'd
still rather the rule be simple, and then exceptions like this addressed on
a case-by-case basis.

Colin also wrote a branch-diff tool that looks at git log, which makes
tracking easier. You can do things like diff 2.6.0 with 2.6.3, 2.7.0 with
2.7.2, and then make sure that the 2.7 diff is a superset of 2.6.

https://github.com/cmccabe/cmccabe-hbin/blob/master/jirafun.go

Wouldn't be the worst idea to make this part of our release validation
process. The report could be automated as a Jenkins job.


> bq. Also for maintenance releases, aren't all included fixes supposed to
> be for serious bugs? Minor JIRAs can wait for the next minor release. If
> there are strong reasons to include a minor JIRA in a maintenance release,
> then maybe it's not really a minor JIRA.
> If a committer commits a major/minor priority patch on a maintenance
> release, what should the RM do? Revert it, or upgrade the priority to critical
> even if it doesn't belong there?
> I believe committing only critical/blocker patches to a maintenance release can
> only be a general guideline for maintenance releases, not a strict rule
> for all committers in practice. RMs should obey this guideline strictly when
> cherry-picking commits, but there are more commits that get committed by other
> committers. The committer chooses the fix branch not only by priority but
> also by the target branch proposed by the patch contributor, who may only work on
> that branch release for a long time. I think this target/fix-branch
> negotiation mechanism is going on well and we shouldn't break it.
>
This sounds like another reminder for everyone to:

- Please be judicious about what gets backported to maintenance releases.
- When backporting, please backport to all intermediate maintenance
branches.

Based on what I've seen, the RMs have been very responsive, so the safest
thing is to ping them about inclusion before backporting. I'd be in favor
of a guideline like "get an RM to +1 before backporting."

Best,
Andrew


Re: [VOTE] Release hadoop-2.0.3-alpha

2013-02-07 Thread Andrew Wang
Verified the tarball checksums. Ran a couple example jobs on a 3 node
cluster successfully, with the same WARN caveat as Bobby.

+1 (non-binding).

On Thu, Feb 7, 2013 at 7:33 AM, Robert Evans  wrote:
> I downloaded the binary package and ran a few example jobs on a 3 node
> cluster.  Everything seems to be working OK on it; I did see
>
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable
>
> For every shell command, but just like with 0.23.6 I don't think it is a
> blocker.
>
> +1 (Binding)
>
> --Bobby
>
> On 2/6/13 9:59 PM, "Arun C Murthy"  wrote:
>
>>Folks,
>>
>>I've created a release candidate (rc0) for hadoop-2.0.3-alpha that I
>>would like to release.
>>
>>This release contains several major enhancements such as QJM for HDFS HA,
>>multi-resource scheduling for YARN, YARN ResourceManager restart etc.
>>Also YARN has achieved significant stability at scale (more details from
>>Y! folks here: http://s.apache.org/VYO).
>>
>>The RC is available at:
>>http://people.apache.org/~acmurthy/hadoop-2.0.3-alpha-rc0/
>>The RC tag in svn is here:
>>http://svn.apache.org/viewvc/hadoop/common/tags/release-2.0.3-alpha-rc0/
>>
>>The maven artifacts are available via repository.apache.org.
>>
>>Please try the release and vote; the vote will run for the usual 7 days.
>>
>>thanks,
>>Arun
>>
>>
>>
>>--
>>Arun C. Murthy
>>Hortonworks Inc.
>>http://hortonworks.com/
>>
>>
>


are the HDFS javadocs published on the website?

2013-02-14 Thread Andrew Wang
Hi all,

I think something changed recently regarding the online HDFS javadocs. I'm
fairly sure they used to be available online, since they're indexed by Google:

https://www.google.com/?q=inurl:distributedfilesystem++site%3Ahadoop.apache.org

However, all of those results 404 now.

Going to the current API doc page (
http://hadoop.apache.org/docs/current/api/), the "Hadoop Distributed
FileSystem 
(HDFS)"
link also 404's:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/hdfs/package-summary.html

Is this an intended change? I at least found it really handy to have this
stuff indexed and available online, even if they aren't user-facing APIs.

Best,
Andrew


Re: are the HDFS javadocs published on the website?

2013-02-23 Thread Andrew Wang
Taking silence here to mean we aren't that concerned with the javadocs.

Are we okay with Doug's proposed fix for the broken links?

Thanks,
Andrew


On Thu, Feb 14, 2013 at 4:06 PM, Doug Cutting  wrote:

> All of Hadoop's javadocs were recently lost from our website when it
> was converted to svnpubsub.  These were historically not stored in
> subversion but manually added to the website by release managers.
> When the site was converted to svnpubsub no one had first copied the
> docs tree into subversion so it was lost.  (It could perhaps be
> recovered from tape archives, but that would be a pain.)
>
> Yesterday, on seeing this, I reconstructed what I could.  I extracted
> documentation from the release tarballs of recent releases and pushed
> it into subversion.  Those release tarballs did not seem to include
> HDFS javadocs.
>
> You've found two links to HDFS javadocs in what I restored, and those
> links, as you note, are broken.  If someone has those javadocs or
> wants to build them then they can be restored by committing them to
> subversion under:
>
>
> https://svn.apache.org/repos/asf/hadoop/common/site/main/publish/docs/r1.1.1/
>
> https://svn.apache.org/repos/asf/hadoop/common/site/main/publish/docs/r1.0.4/
>
> I've not seen (broken) links to HDFS documentation in the other more
> recent releases whose documentation I restored.
>
> An alternative might be to put a redirect in to the HDFS user guide to
> fix those two broken links.  If folks prefer that approach I'd be
> happy to implement it.
>
> Doug
>
> On Thu, Feb 14, 2013 at 3:48 PM, Andrew Wang 
> wrote:
> > Hi all,
> >
> > I think something changed recently regarding the online HDFS javadocs.
> I'm
> > fairly sure they used to be available online, since it's indexed by
> google:
> >
> >
> https://www.google.com/?q=inurl:distributedfilesystem++site%3Ahadoop.apache.org
> >
> > However, all of those results 404 now.
> >
> > Going to the current API doc page (
> > http://hadoop.apache.org/docs/current/api/), the "Hadoop Distributed
> > FileSystem (HDFS)<
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/hdfs/package-summary.html
> >"
> > link also 404's:
> >
> >
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/hdfs/package-summary.html
> >
> > Is this an intended change? I at least found it really handy to have this
> > stuff indexed and available online, even if they aren't user-facing APIs.
> >
> > Best,
> > Andrew
>


Re: Unable to delete symlinks in HDFS via FileContext

2013-04-25 Thread Andrew Wang
Hi Dia,

That's definitely a weird one, let's try to figure out what's going on.

Can you possibly share the complete FileContext snippet you're using to do
this test? You could also try using fully-qualified URIs everywhere
(including for symlink creation), to remove any possible ambiguity.

Also, can you clarify the version of Hadoop you're using? CDH4.x? Apache
2.0.0? If you're using CDH, let's move this off hdfs-dev and over to
cdh-u...@cloudera.org.

Best,
Andrew
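
For concreteness, here is a minimal Java sketch of the documented link-vs-target behavior via FileContext, reusing Dia's fully-qualified paths from the message below; it assumes a cluster where symlinks are enabled.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class SymlinkCheck {
  public static void main(String[] args) throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());
    Path target = new Path("hdfs://nameservice1:8020/tmp/target.file");
    Path link = new Path("hdfs://nameservice1:8020/tmp/symlink.file");
    fc.createSymlink(target, link, false);

    // Per the docs, getFileLinkStatus() should describe the link itself.
    FileStatus st = fc.getFileLinkStatus(link);
    System.out.println(st.getPath() + " isSymlink=" + st.isSymlink());

    // delete(link) should remove only the link; the target must survive.
    fc.delete(link, false);
    System.out.println("target still exists: " + fc.util().exists(target));
  }
}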


On Wed, Apr 24, 2013 at 8:05 PM, Dia Kharrat  wrote:

> Hi,
>
> We have an HDFS setup (v2.0.0) managed by Cloudera. I'm having trouble
> getting the FileStatus of a symlink or deleting it.
>
> According to the documentation, FileContext#getFileLinkStatus() or
> FileContext#delete() should operate on the symlink itself if the provided
> path is a symlink.
>
> However, what happens instead is, FileContext#getFileLinkStatus() or
> delete() resolve the symlink and act on the target path. So, for example,
> if "/tmp/symlink.file" points to "/tmp/target.file", doing this:
>
> Path path = new Path("hdfs://nameservice1:8020/tmp/symlink.file");
> fileContext.getFileLinkStatus(path).getPath();
>
> returns a path of:
>
> Path("hdfs://nameservice1:8020/tmp/target.file")
>
> Similarly, fileContext.delete(path, true) deletes the target file
> ("/tmp/target.file") instead of the symlink itself. So, this behavior does
> not match with the documentation.
>
> What's interesting is, locally in pseudo-mode, the above works as expected.
>
> Any ideas or pointers as to why FileContext#delete and
> FileContext#getFileLinkStatus() are not operating correctly on the
> symlinks?
>
> Thanks,
> Dia
>


Re: transfer -> CreateSocketForPipeline : hardcoded length of 2?

2013-04-29 Thread Andrew Wang
Hi Jay,

Actually, my question on seeing that code is why it's hardcoded to 2
rather than targets.length. The pipeline length is supposed to be the
number of datanodes in the pipeline. This might be a bug.

Regarding the timeout, it makes sense to boost the timeout based on the
length of the pipeline. Longer pipelines can experience more delays, since
data needs to flow down and then get ack'd back up.
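
To put rough numbers on it (assuming the shipped defaults of
READ_TIMEOUT_EXTENSION = 5 seconds and a 60 second client socket timeout;
check HdfsServerConstants and your dfs.client.socket-timeout setting in
case your build differs):

// numNodes = 2 (the hardcoded value):
//   5,000 ms * 2 + 60,000 ms = 70,000 ms
// numNodes = 1 (what a single-block transfer arguably warrants):
//   5,000 ms * 1 + 60,000 ms = 65,000 ms

So the practical effect of the hardcoded 2 is an extra 5 seconds of
cushion on the read timeout, nothing more.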

Best,
Andrew


On Tue, Apr 23, 2013 at 6:25 AM, Jay Vyas  wrote:

> Hi guys: I noticed that in the call to createSocketForPipeline, there is a
> hardcoded length of "2".
>
> //from
> sock = createSocketForPipeline(src, 2, dfsClient);
>
> This cascades down to the "getDataNodeReadTimeout" method, resulting in a
> multiplier of 2.
>
> //from DFSClient.java
> int getDatanodeReadTimeout(int numNodes) {
> return dfsClientConf.socketTimeout > 0 ?
> (HdfsServerConstants.READ_TIMEOUT_EXTENSION * numNodes +
> dfsClientConf.socketTimeout) : 0;
>   }
>
> I wonder why the pipeline length is "2" as opposed to "1" ?  It seems that
> transferring a single block should have a pipeline length of 1?
>
> ///for example: in the createBlockOutputStream method, we have
>s = createSocketForPipeline(nodes[0], nodes.length, dfsClient);
>
> Is the "2", then, just used to add some cushion to the timeout? Or is
> something expected to be happening during a block transfer which makes the
> pipeline a 2 node, rather than 1 node one?
>
> Maybe I'm misunderstanding something about the way the pipeline works so
> thanks for helping and apologies if this question is a little silly.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>


Re: Why is FileSystem.createNonRecursive deprecated?

2013-06-11 Thread Andrew Wang
Hi Ravi,

I wasn't around for HADOOP-6840, but I'm guessing it's deprecated for the
same reasons as primitiveCreate: FileSystem is eventually supposed to be
supplanted by FileContext.

FileContext#create also has a more manageable number of method signatures
through the use of flags, and in fact defaults to not creating parent
directories. I believe MR2 also uses FileContext over FileSystem, so this
might be your best bet.
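
As a quick, untested sketch of what that looks like (the path here is
hypothetical):

import java.util.EnumSet;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

FileContext fc = FileContext.getFileContext();
// Unlike FileSystem#create, this fails if /existing/dir is missing;
// recursive parent creation is opt-in via Options.CreateOpts.
fc.create(new Path("/existing/dir/part-00000"),
    EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE)).close();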

HTH,
Andrew


On Tue, Jun 11, 2013 at 3:18 PM, Ravi Prakash  wrote:

> Hi folks,
>
> I am trying to fix MAPREDUCE-5317. I noticed that the only way through
> FileSystem to NOT recursively create directories is through the deprecated
> method
>
> @deprecated API only for 0.20-append
> FileSystem.createNonRecursive.
>
>
> This has been marked deprecated ever since it was put in by HADOOP-6840.
> Do we know if we ever expect to un-deprecate this method? I am trying to
> find the rationale behind checking it in as a deprecated method, but
> haven't been able to find any written record. Does anyone know?
> Thanks
> Ravi


Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Andrew Wang
I don't think exposing DFSClient and DistributedFileSystem members is
necessary to achieve what you're trying to do. We've got wrapper
FileSystems like FilterFileSystem and ViewFileSystem which you might be
able to use for inspiration, and the HCFS wiki lists some third-party
FileSystems that might also be helpful.


On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour  wrote:

> Hello Jeff
>
> Is it something that could go under HCFS project?
> http://wiki.apache.org/hadoop/HCFS
> (I might be wrong?)
>
> Joe
>
>
> On 8/7/13 10:59 AM, "Jeff Dost"  wrote:
>
> >Hello,
> >
> >We work in a software development team at the UCSD CMS Tier2 Center.  We
> >would like to propose a mechanism to allow one to subclass the
> >DFSInputStream in a clean way from an external package.  First I'd like
> >to give some motivation on why and then will proceed with the details.
> >
> >We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
> >at CERN.  There are other T2 centers worldwide that contain mirrors of
> >the same data we host.  We are working on an extension to Hadoop that,
> >on reading a file, if it is found that there are no available replicas
> >of a block, we use an external interface to retrieve this block of the
> >file from another data center.  The external interface is necessary
> >because not all T2 centers involved in CMS are running a Hadoop cluster
> >as their storage backend.
> >
> >In order to implement this functionality, we need to subclass the
> >DFSInputStream and override the read method, so we can catch
> >IOExceptions that occur on client reads at the block level.
> >
> >The basic steps required:
> >1. Invent a new URI scheme for the customized "FileSystem" in
> >core-site.xml:
> >   
> > fs.foofs.impl
> > my.package.FooFileSystem
> > My Extended FileSystem for foofs: uris.
> >   
> >
> >2. Write new classes included in the external package that subclass the
> >following:
> >FooFileSystem subclasses DistributedFileSystem
> >FooFSClient subclasses DFSClient
> >FooFSInputStream subclasses DFSInputStream
> >
> >Now any client commands that explicitly use the foofs:// scheme in paths
> >to access the hadoop cluster can open files with a customized
> >InputStream that extends functionality of the default hadoop client
> >DFSInputStream.  In order to make this happen for our use case, we had
> >to change some access modifiers in the DistributedFileSystem, DFSClient,
> >and DFSInputStream classes provided by Hadoop.  In addition, we had to
> >comment out the check in the namenode code that only allows for URI
> >schemes of the form "hdfs://".
> >
> >Attached is a patch file we apply to hadoop.  Note that we derived this
> >patch by modding the Cloudera release hadoop-2.0.0-cdh4.1.1 which can be
> >found at:
> >http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.1.tar.gz
> >
> >We would greatly appreciate any advise on whether or not this approach
> >sounds reasonable, and if you would consider accepting these
> >modifications into the official Hadoop code base.
> >
> >Thank you,
> >Jeff, Alja & Matevz
> >UCSD Physics
>
>


Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-07 Thread Andrew Wang
Blocks are supposed to be an internal abstraction within HDFS, and aren't
an inherent part of FileSystem (the user-visible class used to access all
Hadoop filesystems).

Is it possible to instead deal with files and offsets? On a read failure,
you could open a stream to the same file on the backup filesystem, seek to
the old file position, and retry the read. This feels like it's possible
via wrapping.
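
As a rough sketch of the wrapping idea (the class and method names here
are hypothetical, and error handling is elided):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class FailoverReader {
  private final FileSystem backup;   // e.g. the external data source
  private final Path path;
  private FSDataInputStream in;
  private long pos;

  FailoverReader(FileSystem primary, FileSystem backup, Path path)
      throws IOException {
    this.backup = backup;
    this.path = path;
    this.in = primary.open(path);
  }

  int read(byte[] buf, int off, int len) throws IOException {
    try {
      return advance(in.read(buf, off, len));
    } catch (IOException e) {  // e.g. BlockMissingException
      in.close();
      in = backup.open(path);  // same file on the backup filesystem
      in.seek(pos);            // resume at the old file position
      return advance(in.read(buf, off, len));
    }
  }

  private int advance(int n) {
    if (n > 0) pos += n;
    return n;
  }
}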


On Wed, Aug 7, 2013 at 3:29 PM, Jeff Dost  wrote:

> Thank you for the suggestion, but we don't see how simply wrapping a
> FileSystem object would be sufficient in our use case.  The reason why is
> we need to catch and handle read exceptions at the block level.  There
> aren't any public methods available in the high level FileSystem
> abstraction layer that would give us the fine grained control we need at
> block level read failures.
>
> Perhaps if I outline the steps more clearly it will help explain what we
> are trying to do.  Without our enhancements, suppose a user opens a file
> stream and starts reading the file from Hadoop. After some time, at some
> position into the file, if there happen to be no replicas available for a
> particular block for whatever reason, datanodes have gone down due to disk
> issues, etc. the stream will throw an IOException (BlockMissingException or
> similar) and the read will fail.
>
> What we are doing is rather than letting the stream fail, we have another
> stream queued up that knows how to fetch the blocks elsewhere outside of
> our Hadoop cluster that couldn't be retrieved.  So we need to be able to
> catch the exception at this point, and these externally fetched bytes then
> get read into the user supplied read buffer.  Now Hadoop can proceed to
> read in the stream the next blocks in the file.
>
> So as you can see this method of fail over on demand allows an input
> stream to keep reading data, without having to start it all over again if a
> failure occurs (assuming the remote bytes were successfully fetched).
>
> As a final note I would like to mention that we will be providing our
> failover module to the Open Science Grid.  Since we hope to provide this as
> a benefit to all OSG users running at participating T2 computing clusters,
> we will be committed to maintaining this software and any changes to Hadoop
> needed to make it work.  In other words we will be willing to maintain any
> implementation changes that may become necessary as Hadoop internals change
> in future releases.
>
> Thanks,
> Jeff
>
>
> On 8/7/13 11:30 AM, Andrew Wang wrote:
>
>> I don't think exposing DFSClient and DistributedFileSystem members is
>> necessary to achieve what you're trying to do. We've got wrapper
>> FileSystems like FilterFileSystem and ViewFileSystem which you might be
>> able to use for inspiration, and the HCFS wiki lists some third-party
>> FileSystems that might also be helpful too.
>>
>>
>> On Wed, Aug 7, 2013 at 11:11 AM, Joe Bounour  wrote:
>>
>>  Hello Jeff
>>>
>>> Is it something that could go under HCFS project?
>>> http://wiki.apache.org/hadoop/**HCFS<http://wiki.apache.org/hadoop/HCFS>
>>> (I might be wrong?)
>>>
>>> Joe
>>>
>>>
>>> On 8/7/13 10:59 AM, "Jeff Dost"  wrote:
>>>
>>>  Hello,
>>>>
>>>> We work in a software development team at the UCSD CMS Tier2 Center.  We
>>>> would like to propose a mechanism to allow one to subclass the
>>>> DFSInputStream in a clean way from an external package.  First I'd like
>>>> to give some motivation on why and then will proceed with the details.
>>>>
>>>> We have a 3 Petabyte Hadoop cluster we maintain for the LHC experiment
>>>> at CERN.  There are other T2 centers worldwide that contain mirrors of
>>>> the same data we host.  We are working on an extension to Hadoop that,
>>>> on reading a file, if it is found that there are no available replicas
>>>> of a block, we use an external interface to retrieve this block of the
>>>> file from another data center.  The external interface is necessary
>>>> because not all T2 centers involved in CMS are running a Hadoop cluster
>>>> as their storage backend.
>>>>
>>>> In order to implement this functionality, we need to subclass the
>>>> DFSInputStream and override the read method, so we can catch
>>>> IOExceptions that occur on client reads at the block level.
>>>>
>>>> The basic steps required:
>>>> 1. Invent a new URI scheme for the customized "FileSystem" in
>&

Re: Secure deletion of blocks

2013-08-15 Thread Andrew Wang
Hi Matt,

Here are some code pointers:

- When doing a file deletion, the NameNode turns the file into a set of
blocks that need to be deleted.
- When datanodes heartbeat in to the NN (see BPServiceActor#offerService),
the NN replies with blocks to be invalidated (see BlockCommand and
DatanodeProtocol.DNA_INVALIDATE).
- The DN processes these invalidates in
BPServiceActor#processCommandFromActive (look for DNA_INVALIDATE again).
- The magic lines you're looking for are probably in
FsDatasetAsyncDiskService#run, since we delete blocks in the background
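
For the overwrite itself, the low-level step could look something like
this (purely an illustrative sketch -- the method name is made up, the
real hook would live near FsDatasetAsyncDiskService, and the block's
.meta checksum file needs the same treatment):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

static void shredAndDelete(File blockFile) throws IOException {
  long len = blockFile.length();
  byte[] pass = new byte[64 * 1024];
  // Cyclic overwrite passes: zeros, ones, then a fixed pattern.
  for (byte fill : new byte[] { 0x00, (byte) 0xFF, 0x55 }) {
    Arrays.fill(pass, fill);
    try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rw")) {
      long written = 0;
      while (written < len) {
        int n = (int) Math.min(pass.length, len - written);
        raf.write(pass, 0, n);
        written += n;
      }
      raf.getFD().sync();  // force each pass to disk before the next
    }
  }
  if (!blockFile.delete()) {
    throw new IOException("failed to delete " + blockFile);
  }
}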

Best,
Andrew


On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
matt.fell...@bespokesoftware.com> wrote:

> Hi,
> I'm looking into writing a patch for HDFS which will provide a new method
> within HDFS which can securely delete the contents of a block on all the
> nodes upon which it exists. By securely delete I mean, overwrite with
> 1's/0's/random data cyclically such that the data could not be recovered
> forensically.
>
> I'm not currently aware of any existing code / methods which provide this,
> so was going to implement this myself.
>
> I figured the DataNode.java was probably the place to start looking into
> how this could be done, so I've read the source for this, but it's not
> really enlightened me a massive amount.
>
> I'm assuming I need to tell the NameServer that all DataNodes with a
> particular block id would be required to be deleted, then as each DataNode
> calls home, the DataNode would be instructed to securely delete the
> relevant block, and it would oblige.
>
> Unfortunately I have no idea where to begin and was looking for some
> pointers?
>
> I guess specifically I'd like to know:
>
> 1. Where the hdfs CLI commands are implemented
> 2. How a DataNode identifies a block / how a NameServer could inform a
> DataNode to delete a block
> 3. Where the existing "delete" is implemented so I can make sure my secure
> delete makes use of it after successfully blanking the block contents
> 4. If I've got the right idea about this at all?
>
> Kind regards,
> Matt Fellows
>
> --
>  First Option Software Ltd
> Signal House
> Jacklyns Lane
> Alresford
> SO24 9JJ
> Tel: +44 (0)1962 738232
> Mob: +44 (0)7710 160458
> Fax: +44 (0)1962 600112
> Web: www.bespokesoftware.com
>
> ____
>
> This is confidential, non-binding and not company endorsed - see full
> terms at www.fosolutions.co.uk/emailpolicy.html
>
> First Option Software Ltd Registered No. 06340261
> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> ____
>
>


Re: hsync is too slower than hflush

2013-08-25 Thread Andrew Wang
50ms is believable. hsync makes each DN call fsync and wait for acks, so
you'd expect at least a disk seek time (~10ms) with some extra time
depending on how much unsync'd data is being written.

So, just as some back of the envelope math, assuming a disk that can write
at 100MB/s:

50ms - 10ms seek = 40ms writing time
100 MB/s * 40ms = 4MB

If you're hsync'ing every 4MB, 50ms would be exactly what I'd expect.

Best,
Andrew


On Sat, Aug 24, 2013 at 10:11 PM, haosdent  wrote:

> Hi, all. Hadoop has supported hsync, which calls the system fsync, since
> 2.0.2. I have tested the performance of hsync() and hflush() again and
> again, and found that every hsync() call spends nearly 50ms while an
> hflush() call spends just 2ms. In this slide deck (
> http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage
> page 18), the author mentions that hsync() is 2x slower than hflush(). So,
> is anything wrong? Thank you very much and looking forward to your help.
>
> --
> Best Regards,
> Haosong Huang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>


Re: hsync is too slower than hflush

2013-08-25 Thread Andrew Wang
Ah, I forgot the checksum fsync, so two seeks. Even with 4k writes, 50ms
still feels in the right ballpark. Best case it's ~20ms, still way slower
than hflush.

It's also worth asking if there's other dirty data waiting for writeback,
since I believe it can also get written out on an fsync.

hflush doesn't durably write to disk, so you're still in danger of losing
data if there's a cluster-wide power outage. Because HDFS writes to two
different racks, hflush still protects you from single-rack outages. Most
people think this is good enough (I believe HBase by default runs with just
hflush), but if you *really* want to be sure, pay the cost of hsync and do
durable writes.
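
In code the choice looks like this (a minimal sketch; hflush() and
hsync() are the Syncable methods on FSDataOutputStream, and the path is
hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
FSDataOutputStream out = fs.create(new Path("/tmp/wal"));
out.write("log entry\n".getBytes());
// hflush: visible to new readers and safe against DN process death,
// but the bytes may still sit in the OS page cache -- cheap (~ms).
out.hflush();
// hsync: additionally fsyncs on each DN in the pipeline -- survives a
// cluster-wide power outage, at the cost of disk seeks (~tens of ms).
out.hsync();
out.close();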


On Sun, Aug 25, 2013 at 7:44 PM, haosdent  wrote:

> In fact, I just write 4k on every hsync. The DataNode writes a checksum
> file and a data file when I hsync data to it. Each of these writes spends
> nearly 25ms, so one hsync call spends nearly 50ms. But hflush is very
> fast, spending about 1ms each on the checksum and the data. If an hsync
> costs 50ms, what is the point of using it? Or is my test methodology wrong?
>
> --
> Best Regards,
> Haosong Huang
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Monday, August 26, 2013 at 7:07 AM, Andrew Wang wrote:
>
> > 50ms is believable. hsync makes each DN call fsync and wait for acks, so
> > you'd expect at least a disk seek time (~10ms) with some extra time
> > depending on how much unsync'd data is being written.
> >
> > So, just as some back of the envelope math, assuming a disk that can
> write
> > at 100MB/s:
> >
> > 50ms - 10ms seek = 40ms writing time
> > 100 MB/s * 40ms = 4MB
> >
> > If you're hsync'ing every 4MB, 50ms would be exactly what I'd expect.
> >
> > Best,
> > Andrew
> >
> >
On Sat, Aug 24, 2013 at 10:11 PM, haosdent <haosd...@gmail.com> wrote:
> >
> > > Hi, all. Hadoop support hsync which would call fsync of system after
> > > 2.0.2. I have tested the performance of hsync() and hflush() again and
> > > again, but I found that the hsync call() everytime would spent nearly
> 50ms
> > > while the hflush call() just spent 2ms. In this slide(
> > >
> http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage
> (page 18), the author mentions that hsync() is 2x slower than hflush(). So,
>  the author mentions that hsync() is 2x slower than hflush(). So,
> > > is anything wrong? Thank you very much and looking forward to your
> help.
> > >
> > > --
> > > Best Regards,
> > > Haosong Huang
> > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > >
> >
> >
> >
>
>
>


Re: hsync is too slower than hflush

2013-08-26 Thread Andrew Wang
It's syncing the checksum file, so the disk head very likely has to move.
There are rotational seek delays too.


On Mon, Aug 26, 2013 at 7:30 AM, lei liu  wrote:

> Hi all,
>
> The DataNode writes files sequentially, so I think the disk seek time
> should be very small. Why is the disk seek time 10ms? I think that is too
> long. Can we optimize the Linux system configuration to reduce the disk
> seek time?
>
>
> 2013/8/26 haosdent 
>
> > haha, thank you very much, I get it now.
> >
> > --
> > Best Regards,
> > Haosong Huang
> > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> >
> >
> > On Monday, August 26, 2013 at 11:18 AM, Andrew Wang wrote:
> >
> > > Ah, I forgot the checksum fsync, so two seeks. Even with 4k writes,
> 50ms
> > > still feels in the right ballpark. Best case it's ~20ms, still way
> slower
> > > than hflush.
> > >
> > > It's also worth asking if there's other dirty data waiting for
> writeback,
> > > since I believe it can also get written out on an fsync.
> > >
> > > hflush doesn't durably write to disk, so you're still in danger of
> losing
> > > data if there's a cluster-wide power outage. Because HDFS writes to two
> > > different racks, hflush still protects you from single-rack outages.
> Most
> > > people think this is good enough (I believe HBase by default runs with
> > just
> > > hflush), but if you *really* want to be sure, pay the cost of hsync and
> > do
> > > durable writes.
> > >
> > >
> > > On Sun, Aug 25, 2013 at 7:44 PM, haosdent <haosd...@gmail.com> wrote:
> > >
> > > > In fact, I just write 4k in every hsync. Datenode would write
> checksum
> > > > file and data file when I hsync data to datanode. Each of them would
> > spent
> > > > nearly 25ms, so a hsync call would spent nearly 50ms. But hflush is
> > very
> > > > fast, which spent both 1ms in write checksum and data. If a hsync
> would
> > > > spent 50ms, what meanings we use it? Or my test way is wrong?
> > > >
> > > > --
> > > > Best Regards,
> > > > Haosong Huang
> > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > >
> > > >
> > > > On Monday, August 26, 2013 at 7:07 AM, Andrew Wang wrote:
> > > >
> > > > > 50ms is believable. hsync makes each DN call fsync and wait for
> > acks, so
> > > > > you'd expect at least a disk seek time (~10ms) with some extra time
> > > > > depending on how much unsync'd data is being written.
> > > > >
> > > > > So, just as some back of the envelope math, assuming a disk that
> can
> > > > write
> > > > > at 100MB/s:
> > > > >
> > > > > 50ms - 10ms seek = 40ms writing time
> > > > > 100 MB/s * 40ms = 4MB
> > > > >
> > > > > If you're hsync'ing every 4MB, 50ms would be exactly what I'd
> expect.
> > > > >
> > > > > Best,
> > > > > Andrew
> > > > >
> > > > >
> > > > > On Sat, Aug 24, 2013 at 10:11 PM, haosdent <haosd...@gmail.com> wrote:
> > > > >
> > > > > > Hi, all. Hadoop support hsync which would call fsync of system
> > after
> > > > > > 2.0.2. I have tested the performance of hsync() and hflush()
> again
> > and
> > > > > > again, but I found that the hsync call() everytime would spent
> > nearly
> > > > > >
> > > > >
> > > > >
> > > >
> > > > 50ms
> > > > > > while the hflush call() just spent 2ms. In this slide(
> > > > >
> > > >
> > > >
> >
> http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage
> (page 18),
> > the author mentions that hsync() is 2x slower than hflush(). So,
> > > > > > is anything wrong? Thank you very much and looking forward to
> your
> > > > >
> > > >
> > > > help.
> > > > > >
> > > > > > --
> > > > > > Best Regards,
> > > > > > Haosong Huang
> > > > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>


symlink support in Hadoop 2 GA

2013-09-16 Thread Andrew Wang
Hi all,

I wanted to broadcast plans for putting the FileSystem symlinks work
(HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
it's pretty important we get it in since it's not a compatible change; if
it misses the GA train, we're not going to have symlinks until the next
major release.

However, we're still dealing with ongoing issues revealed via testing.
There's user-code out there that only handles files and directories and
will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
for a nice example where globStatus returning symlinks broke Pig; some of
us had a conference call to talk it through, and one definite conclusion
was that this wasn't solvable in a generally compatible manner.
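
To make the hazard concrete, here's the kind of pre-symlink pattern that
breaks (a hypothetical sketch, not the actual Pig code; fs, processFile
and processDir stand in for user code):

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

for (FileStatus st : fs.globStatus(new Path("/data/*"))) {
  if (st.isFile()) {
    processFile(st.getPath());
  } else {
    // Pre-symlink assumption: "not a file" means "directory". A symlink
    // (possibly a dangling one) now lands here and breaks this branch.
    processDir(st.getPath());
  }
}
// Symlink-aware code has to check FileStatus#isSymlink() explicitly.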

There are also still some gaps in symlink support right now. For example,
the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
resolution, and tooling like the FsShell and Distcp still need to be
updated as well.

So, there's definitely work to be done, but there are a lot of users
interested in the feature, and symlinks really should be in GA. Would
appreciate any thoughts/input on the matter.

Thanks,
Andrew


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Andrew Wang
Hey all,

Sorry to hijack the vote thread, but it'd be good to get some input on my
email from yesterday re: symlink support in branch-2.1. I think it really
should be in GA one way or the other.

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201309.mbox/%3CCAGB5D2ZDjqt69oFfv_HOsWEH18T9GanuuF1Y%3DaKG-JptvV3ViA%40mail.gmail.com%3E

Thanks,
Andrew


On Tue, Sep 17, 2013 at 2:23 AM, Alejandro Abdelnur wrote:

> Thanks Arun.
>
> +1
>
> * Downloaded source tarball.
> * Verified MD5
> * Verified signature
> * run apache-rat:check ok after minor tweak (see NIT1 below)
> * checked CHANGES.txt headers (see NIT2 below)
> * built DIST from source
> * verified hadoop version of Hadoop JARs
> * configured pseudo cluster
> * tested HttpFS
> * run a few MR examples
> * run a few unmanaged AM app examples
>
> The following NITs should be addressed if there is a new RC or in the next
> release
>
> --
> NIT1, empty files that make apache-rat:check to fail, these files should be
> removed:
>
> *
>
> /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextSymlinkBaseTest.java
>
> *
>
> /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFSFileContextSymlink.java
>
> *
>
> /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestFcHdfsSymlink.java
> --
> NIT2, common/hdfs/mapreduce/yarn CHANGES.txt have 2.2.0 header, they should
> not
> --
>
>
>
> On Tue, Sep 17, 2013 at 8:38 AM, Arun C Murthy 
> wrote:
>
> > Folks,
> >
> > I've created a release candidate (rc0) for hadoop-2.1.1-beta that I would
> > like to get released - this release fixes a number of bugs on top of
> > hadoop-2.1.0-beta as a result of significant amounts of testing.
> >
> > If things go well, this might be the last of the *beta* releases of
> > hadoop-2.x.
> >
> > The RC is available at:
> > http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
> > The RC tag in svn is here:
> >
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
> >
> > The maven artifacts are available via repository.apache.org.
> >
> > Please try the release and vote; the vote will run for the usual 7 days.
> >
> > thanks,
> > Arun
> >
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
> >
> >
>
>
>
> --
> Alejandro
>


Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Andrew Wang
I encourage interested parties to read through HADOOP-9912 to get a feel
for the issues. There really is no way to add symlink support without
changing the behavior of existing APIs. Ultimately, anything that returns a
FileStatus is going to be different. Even if we default to resolving
symlinks, resolving can lead to FileNotFound or permission errors. Thus, we
have to choose whether to prune the bad links, show the bad links as
dangling, or throw an exception. None of these options is compatible.

I'm really concerned about putting this in a minor release like 2.3 since
it has the potential to break a lot of user code. HADOOP-9912 is an example
from within our own ecosystem, but think of all the custom user code out
there written against FileSystem. 2.2 GA is basically our last chance to
make this kind of change before Hadoop 3.

Thanks,
Andrew


On Tue, Sep 17, 2013 at 9:10 AM, Colin McCabe wrote:

> The issue is not modifying existing APIs.  The issue is that code has
> been written that makes assumptions that are incompatible with the
> existence of things that are not files or directories.  For example,
> there is a lot of code out there that looks at FileStatus#isFile, and
> if it returns false, assumes that what it is looking at is a
> directory.  In the case of a symlink, this assumption is incorrect.
>
> Faced with this, we have considered making the default behavior of
> listStatus and globStatus to be fully resolving symlinks, and simply
> not listing dangling symlinks. Code which is prepared to deal with symlinks
> can use newer versions of the listStatus and globStatus functions
> which do return symlinks as symlinks.
>
> We might consider defaulting FileSystem#listStatus and
> FileSystem#globStatus to "fully resolving symlinks by default" and
> defaulting FileContext#listStatus and FileContext#Util#globStatus to
> the opposite.  This seems like the maximally compatible solution that
> we're going to get.  I think this makes sense.
>
> The alternative is kicking the can down the road to Hadoop 3, and
> letting vendors of alternative (including some proprietary
> alternative) systems continue to claim that "Hadoop doesn't support
> symlinks yet" (with some justice).
>
> P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
> more appropriate.
>
> sincerely,
> Colin
>
> On Tue, Sep 17, 2013 at 8:23 AM, Suresh Srinivas 
> wrote:
> > I agree that this is an important change. However, 2.2.0 GA is getting
> > ready to rollout in weeks. I am concerned that these changes will add not
> > only incompatible changes late in the game, but also possibly
> instability.
> > Java API incompatibility is something we have avoided for the most part
> > and I am concerned that this is adding such incompatibility in FileSystem
> > APIs. We should find workarounds by adding possibly newer APIs and
> leaving
> > existing APIs as is. If this can be done, my vote is to enable this
> feature
> > in 2.3. Even if it cannot be done, I am concerned that this is coming
> quite
> > late and we should see if we could allow some incompatible changes into 2.3
> > for this feature.
> >
> >
> > On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang  >wrote:
> >
> >> Hi all,
> >>
> >> I wanted to broadcast plans for putting the FileSystem symlinks work
> >> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I
> think
> >> it's pretty important we get it in since it's not a compatible change;
> if
> >> it misses the GA train, we're not going to have symlinks until the next
> >> major release.
> >>
> >> However, we're still dealing with ongoing issues revealed via testing.
> >> There's user-code out there that only handles files and directories and
> >> will barf when given a symlink (perhaps a dangling one!). See
> HADOOP-9912
> >> for a nice example where globStatus returning symlinks broke Pig; some
> of
> >> us had a conference call to talk it through, and one definite conclusion
> >> was that this wasn't solvable in a generally compatible manner.
> >>
> >> There are also still some gaps in symlink support right now. For
> example,
> >> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need
> symlink
> >> resolution, and tooling like the FsShell and Distcp still need to be
> >> updated as well.
> >>
> >> So, there's definitely work to be done, but there are a lot of users
> >> interested in the feature, and symlinks really should be in GA. Would
> >> appreciate any thoughts/input on the matter.
> >>
> >

Re: symlink support in Hadoop 2 GA

2013-09-18 Thread Andrew Wang
It's an incompatible change. Existing APIs like listStatus and globStatus
need to be symlink aware now, which can break assumptions of user code.
We've had FileStatus#isSymlink() since the early days, but lots of user
code hasn't been updated to use it.

I think Eli's earlier email did a good job at laying out the current state
and our options. I didn't realize this before, but most of HADOOP-8040 is
already in branch-2.1-beta, but many of the subsequent changes are not
(e.g. HADOOP-9417, HADOOP-9817, HADOOP-9652). This means the current state
of symlink support in branch-2.1-beta is half-baked, which is why "do
nothing" is not a good option.

With that in mind, perhaps Eli's proposals (abbreviated here) make more
sense:

1) Delay 2.2 GA and put in some more effort to fix API issues like
HADOOP-9912 / HADOOP-9972. Undoubtedly, more issues will still fall out of
this post-GA, but we can do our best to fix these issues compatibly in 2.3.
2) Revert symlinks from branch-2.1-beta and leave it all for 2.3, but that
makes 2.3 a pretty big jump from GA. Since symlinks have already appeared
in the 2.1.0 release, it'd also technically make 2.2 a regression from
2.1.0.
3) Wait for 3.0, which I don't think anyone wants.




On Wed, Sep 18, 2013 at 10:05 AM, Steve Loughran wrote:

> the main change is whatever APIs are going to be provided (and implicitly:
> supported for a long time) to handle symlinks separately from directories
>
>
> On 18 September 2013 17:24, Eli Collins  wrote:
>
> > On Wed, Sep 18, 2013 at 5:45 AM, Steve Loughran  > >wrote:
> >
> > > On 18 September 2013 12:53, Alejandro Abdelnur 
> > wrote:
> > >
> > > > On Wed, Sep 18, 2013 at 11:29 AM, Steve Loughran <
> > ste...@hortonworks.com
> > > > >wrote:
> > > >
> > > > > I'm reluctant for this as while delaying the release, because we
> are
> > > > going
> > > > > to find problems all the way up the stack -which will require a
> > > > > choreographed set of changes. Given the grief of the protbuf
> update,
> > I
> > > > > don't want to go near that just before the final release.
> > > > >
> > > >
> > > > Well, I would use the exact same argument used for protobuf (which
> only
> > > > complication was getting protoc 2.5.0 in the jenkins boxes and
> > > communicate
> > > > developers to do the same, other than that we didn't hit any other
> > issue
> > > > AFAIK) ...
> > > >
> > >
> > > protobuf was traumatic at build time, as I recall because it was
> neither
> > > forwards nor backwards compatible. Those of us trying to build different
> > > branches had to choose which version to have on the path, or set up
> > scripts
> > > to do the switching. HBase needed rebuilding, so did other things. And
> I
> > > still have the pain of downloading and installing protoc on all Linux
> > VMs I
> > > build up going forward, until apt-get and yum have protoc 2.5
> artifacts.
> > >
> > > This means it was very painful for developer, added a lot of late
> > breaking
> > > pain to the developers, but it had one key feature that gave it an
> edge:
> > it
> > > was immediately obvious where you had a problem as things didn't
> compile
> > or
> > > classload without linkage problems. No latent bugs, unless protobuf 2.5
> > has
> > > them internally -for which we have to rely on google's release testing
> to
> > > have found.
> > >
> > > That is a lot simpler to regression test than adding any new feature to
> > > HDFS and seeing what breaks -as that is something that only surfaces
> out
> > in
> > > the field. Which is why I think it's too late in the 2.1 release
> > timetable
> > > to add symlinks. We've had a 2.1-beta out there, we've got feedback.
> Fix
> > > those problems that are show stoppers, but don't add more stuff. Which
> is
> > > precisely why I have not been pushing in any of my recent changes. I
> may
> > > seem ruthless arguing against symlinks -but I'm not being inconsistent
> > with
> > > my own commit history. The only two things I've put in branch-2.1 since
> > > beta-1 were a separate log for the Configuration deprecation warnings
> > and a
> > > patch to the POM for a java7 build on OSX: and they weren't even my
> > > patches.
> > >
> > >
> > > -Steve
> > >
> > > (One of these days I should volunteer to be the release manager and
> it'll
> > > be obvious that Arun is being quite amenable to all the other
> developers)
> > >
> > >
> > >
> > > >
> > > > IMO, it makes more sense to do this change during the beta rather
> than
> > > when
> > > > GA. That gives us more flexibility to iron out things if necessary.
> > > >
> > > >
> > > I'm arguing this change can go into the beta of the successor to 2.1
> -not
> > > GA.
> > >
> > >
> > What does "this change" refer to?  Symlinks are already in 2.1, and the
> > existing semantics create problems for programs (eg see the pig
> > example in HADOOP-9912)
> > that we need to resolve.  I don't think do nothing is an option for 2.2.
> > GA.
> >
> > Thanks,
> > Eli
> >
> >
> >
> >
> >
> >
> >

Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-23 Thread Andrew Wang
We still need to resolve some symlink issues; are we planning to spin a new
RC? Leaving it as-is is not a good option.


On Sun, Sep 22, 2013 at 11:23 PM, Roman Shaposhnik  wrote:

> On Mon, Sep 16, 2013 at 11:38 PM, Arun C Murthy 
> wrote:
> > Folks,
> >
> > I've created a release candidate (rc0) for hadoop-2.1.1-beta that I
> would like to get
> > released - this release fixes a number of bugs on top of
> hadoop-2.1.0-beta as a result of significant amounts of testing.
> >
> > If things go well, this might be the last of the *beta* releases of
> hadoop-2.x.
> >
> > The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
> > The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
> >
> > The maven artifacts are available via repository.apache.org.
> >
> > Please try the release and vote; the vote will run for the usual 7 days.
>
> Short of HDFS-5225 from the Bigtop perspective this RC gets a +1.
>
> All tests passed in both secure and unsecure modes in 4 nodes
> pseudo distributed cluster with all the members of Hadoop
> ecosystem running smoke tests.
>
> Thanks,
> Roman.
>


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-24 Thread Andrew Wang
Hey Arun,

That plan sounds good to me, thanks for being on top of things. What's the
new fix version we should be using (2.1.2 or 2.2.0)? Would be good to get
the same clarification regarding which branches should be receiving
commits. I think a 2.1.2 would be nice to get the symlinks changes in a
beta release pre-GA.

I'd also like to add HADOOP-9761 to tucu's list of JIRAs, a symlink+viewfs
regression that's mistakenly only in branch-2.

Thanks,
Andrew


On Tue, Sep 24, 2013 at 1:39 PM, Arun C Murthy  wrote:

> Rather than spin another RC, let's get this out and follow up with the
> next release - especially since it's not clear how long it will take for
> the symlink stuff to sort itself out.
>
> Getting this out will help downstream projects, even if it does so in
> a small way.
>
> Arun
>
> On Sep 23, 2013, at 5:36 PM, Alejandro Abdelnur  wrote:
>
> > Vote for the 2.1.1-beta release is closing tonight, while we had quite a
> > few +1s, it seems we need to address the following before doing a
> release:
> >
> > symlink discussion: get a concrete and explicit understanding on what we
> > will do and  in what release(s).
> >
> > Also, the following JIRAs seem nasty enough to require a new RC:
> >
> > https://issues.apache.org/jira/browse/HDFS-5225 (no patch avail)
> > https://issues.apache.org/jira/browse/HDFS-5228 (patch avail)
> > https://issues.apache.org/jira/browse/YARN-1089 (patch avail)
> > https://issues.apache.org/jira/browse/MAPREDUCE-5529 (patch avail)
> >
> > I won't -1 the release but I'm un-casting my vote as I think we should
> > address these things before.
> >
> > Thanks.
> >
> > Alejandro
> >
> >
> > On Tue, Sep 24, 2013 at 1:49 AM, Suresh Srinivas  >wrote:
> >
> >> +1 (binding)
> >>
> >>
> >> Verified the signatures and hashes for both src and binary tars. Built
> from
> >> the source, the binary distribution and the documentation. Started a
> single
> >> node cluster and tested the following:
> >>
> >> # Started HDFS cluster, verified the hdfs CLI commands such ls, copying
> >> data back and forth, verified namenode webUI etc.
> >>
> >> # Ran some tests such as sleep job, TestDFSIO, NNBench etc.
> >>
> >>
> >>
> >>
> >> On Mon, Sep 16, 2013 at 11:38 PM, Arun C Murthy 
> >> wrote:
> >>
> >>> Folks,
> >>>
> >>> I've created a release candidate (rc0) for hadoop-2.1.1-beta that I
> would
> >>> like to get released - this release fixes a number of bugs on top of
> >>> hadoop-2.1.0-beta as a result of significant amounts of testing.
> >>>
> >>> If things go well, this might be the last of the *beta* releases of
> >>> hadoop-2.x.
> >>>
> >>> The RC is available at:
> >>> http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
> >>> The RC tag in svn is here:
> >>>
> >>
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
> >>>
> >>> The maven artifacts are available via repository.apache.org.
> >>>
> >>> Please try the release and vote; the vote will run for the usual 7
> days.
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>>
> >>> --
> >>> Arun C. Murthy
> >>> Hortonworks Inc.
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> http://hortonworks.com/download/
> >>
> >>
> >
> >
> >
> > --
> > Alejandro
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-01 Thread Andrew Wang
HADOOP-9984 is going to break interface compatibility for out-of-tree
FileSystems. It'd also be good to let downstream components do some testing
before GA.

Thanks,
Andrew


On Tue, Oct 1, 2013 at 5:18 PM, Jagane Sundar  wrote:

> +1
> Makes good sense.
>
> Jagane
>
> -Original Message-
> From: Arun C Murthy [mailto:a...@hortonworks.com]
> Sent: Tuesday, October 01, 2013 4:15 PM
> To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> Subject: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)
>
> Guys,
>
>  I took a look at the content in 2.1.2-beta so far; other than the
> critical fixes such as HADOOP-9984 (symlinks) and a few others in YARN/MR,
> there is fairly little content (unit test fixes etc.)
>
>  Furthermore, it's standing up well in testing too. Plus, the protocols
> look good for now (I wrote a gohadoop to try to convince myself), so
> let's lock them in.
>
>  Given that, I'm thinking we can just go ahead rename it 2.2.0 rather than
> make another 2.1.x release.
>
>  This will drop a short-lived release (2.1.2) and help us move forward on
> 2.3 which has a fair bunch of content already...
>
>  Thoughts?
>
> thanks,
> Arun
>
>
> On Sep 24, 2013, at 4:24 PM, Zhijie Shen  wrote:
>
> > I've added MAPREDUCE-5531 to the blocker list. - Zhijie
> >
> >
> > On Tue, Sep 24, 2013 at 3:41 PM, Arun C Murthy 
> wrote:
> >
> >> With 4 +1s (3 binding) and no -1s the vote passes. I'll push it out...
> >> I'll make it clear on the release page, that there are some known
> >> issues and that we will follow up very shortly with another release.
> >>
> >> Meanwhile, let's fix the remaining blockers (please mark them as such
> >> with Target Version 2.1.2-beta).
> >> The current blockers are here:
> >> http://s.apache.org/hadoop-2.1.2-beta-blockers
> >>
> >> thanks,
> >> Arun
> >>
> >> On Sep 16, 2013, at 11:38 PM, Arun C Murthy 
> wrote:
> >>
> >>> Folks,
> >>>
> >>> I've created a release candidate (rc0) for hadoop-2.1.1-beta that I
> >> would like to get released - this release fixes a number of bugs on
> >> top of hadoop-2.1.0-beta as a result of significant amounts of testing.
> >>>
> >>> If things go well, this might be the last of the *beta* releases of
> >> hadoop-2.x.
> >>>
> >>> The RC is available at:
> >> http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
> >>> The RC tag in svn is here:
> >> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta
> >> -rc0
> >>>
> >>> The maven artifacts are available via repository.apache.org.
> >>>
> >>> Please try the release and vote; the vote will run for the usual 7
> days.
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>>
> >>> --
> >>> Arun C. Murthy
> >>> Hortonworks Inc.
> >>> http://hortonworks.com/
> >>>
> >>>
> >>
> >> --
> >> Arun C. Murthy
> >> Hortonworks Inc.
> >> http://hortonworks.com/
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Zhijie Shen
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
>


Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Andrew Wang
If we're serious about not breaking compatibility after GA, then we need to
slow down and make sure we get these new APIs right, or can add them in a
compatible fashion.

HADOOP-9984 ended up being a bigger change than initially expected, and we
need to break compatibility with out-of-tree FileSystems to do it properly.
I would like to see HADOOP-9972 in as well (globLinkStatus), and there are
open questions on HADOOP-9984 about changing PathFilter and
FileStatus.getPath() semantics (which would be incompatible). Yes, we could
just +1 HADOOP-9984 and stamp 2.2.0 on it, but I think it looks bad to then
immediately turn around and release an incompatible 2.3.

My preference is still for a 2.1.2 with the above API questions resolved,
then an actual API-stable 2.2.0 GA. This is already punting out all the
other related FS/tooling changes that we think can be done compatibly but
are still pretty crucial: shell, distcp, webhdfs, hftp; it'd be great to
get help on any of these.

Thanks,
Andrew


On Wed, Oct 2, 2013 at 2:56 PM, Roman Shaposhnik  wrote:

> On Tue, Oct 1, 2013 at 5:15 PM, Vinod Kumar Vavilapalli
>  wrote:
> > +1. We should get an RC as soon as possible so that we can get all the
> downstream components to sign off.
> > The earlier the better.
>
> On this very note -- would there be any interest in joining efforts
> with the Bigtop integration aimed at Hadoop 2.2.x based release
> of all the Hadoop ecosystem projects?
>
> Our current plan is to release Bigtop 0.7.0 within a couple of weeks.
> That will be the last stable 2.0.x-based release. Bigtop 0.8.0 is supposed
> to
> be based on Hadoop 2.x that gets us (Bigtop community) as close as possible
> to the Hadoop's GA. Here's more on what we'll be doing with Bigtop 0.8.0:
>
> http://comments.gmane.org/gmane.comp.apache.incubator.bigtop.devel/10769
>
> Of course, on the Bigtop side of things we're stuck with all the necessary
> integration work anyway, but if there's anything at all that folks are
> willing
> to help us and the bigger Hadoop community with that would be very
> much appreciated. I think both communities will benefit from this type
> of collaboration.
>
> On a practical side of things, as soon as the branch for 2.2.0 gets cut
> Bigtop can start publishing a complete set of Hadoop ecosystem
> artifacts built against that particular version and easily install-able
> on all of our supported systems. We can also start publishing VMs
> so that folks on OSes other than Linux can help us with testing.
>
> Thanks,
> Roman.
>


Re: symlink support in Hadoop 2 GA

2013-10-04 Thread Andrew Wang
Colin posted a summary of our phone call yesterday (attendees: myself,
Colin, Daryn, Nathan, Jason, Chris, Suresh, Sanjay) on HADOOP-9984:

https://issues.apache.org/jira/browse/HADOOP-9984?focusedCommentId=13785701&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13785701

Pasted here:


   - We discussed alternatives to HADOOP-9984, but concluded that they
   weren't workable.
   - We agreed that doing the symlink resolution in each Filesystem
   subclass is what we ought to do in 9984, in order to keep compatibility
   with out-of-tree filesystems.
   - We agreed to disable symlink resolution in Hadoop 2 GA. We will spend
   a few weeks ironing out all the bugs and enable it in Hadoop 2.3. However,
   we would like to make all backwards-incompatible API changes prior to
   Hadoop 2 GA.
   - We agreed that HADOOP-9972 (a new symlink-aware API for globStatus)
   should get into Hadoop 2 GA.
   - We discussed the issue of returning resolved paths versus unresolved
   paths, but were unable to come to any conclusion. Everyone agreed that
   there would be serious performance problems if we returned unresolved
   paths, but some claimed that programs would break when encountering
   resolved paths.


There's also a new umbrella issue at HADOOP-10019 tracking on-going
symlinks changes.

Best,
Andrew


On Thu, Oct 3, 2013 at 2:08 PM, Daryn Sharp  wrote:

> I reluctantly agree that we should disable symlinks in 2.2 until we can
> sort out the compatibility issues.  I'm reluctant in the sense that its a
> feature users have long wanted, and it's something we'd like to use from an
> administrative view.  However I don't see all the issues being sorted out
> in the very near future.
>
> I filed some jiras today that have led me to believe that the current
> implementation of fs symlinks is irreparably flawed.  Adding optional
> primitives to filesystems to make them symlink capable is ok.  However,
> adding symlink resolution to individual filesystems is fundamentally
> broken.  It doesn't work for stacked filesystems (viewfs, chroots, filters,
> etc) because the resolution must occur at the highest level, not within an
> individual filesystem itself.  Otherwise the abstraction of the top-level
> filesystem is violated and all kinds of unexpected behavior like walking
> out of chroots becomes possible.
>
> Daryn
>
> On Oct 3, 2013, at 1:39 PM, sanjay Radia wrote:
>
> > There are a number of issues (some minor, some more than minor).
> > GA is close and we are still in discussion on some of them;
> while I believe we will close on these very shortly, a code change like
> this so close to GA is dangerous.
> >
> > I suggest we do the following:
> > 1) Disable symlinks in 2.2 GA - throw an unsupported exception on
> createSymlink in both FileSystem and FileContext.
> > 2) Deal with isDir() in 2.2 GA in preparation for item 3 coming
> after GA:
> >   a) Deprecate isDir()
> >   b) Add a new API that returns an enum (see FileContext).
> > 3) Fix symlinks in a future release, hopefully the very next one after
> 2.2 GA:
> >   a) Change the stack to use the new API replacing isDir().
> >   b) Fix isDir() to do something smarter (we can detail this later, but
> there is a solution that has been discussed). This helps customer
> applications that call isDir().
> >   c) Remove isDir() in a future release when customers have had sufficient
> time to migrate.
> >
> > sanjay
> >
> > PS. J Rottinghuis expressed a similar sentiment in a previous email in
> this thread:
> >
> >
> >
> > On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:
> >
> >> I like symlink functionality, but in our migration to Hadoop 2.x this
> is a
> >> total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
> >> a) Not uprev until symlink support is figured out up and down the stack,
> >> and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
> >> (equivalent). Or
> >> b) rip out the API altogether. Or
> >> c) change the implementation to throw an UnsupportedOperationException
> >> I'm not sure yet which of these I like least.
> >
> >
>
>


Re: About Block related classes in hdfs package

2013-10-16 Thread Andrew Wang
Hey Yoonmin,

Unfortunately I agree it's a bit complex, especially because "Block" is
sometimes used where "Replica" might be more accurate. If you find any
ambiguities like this, I think we'd happily take patches with clarifying
comments / javadoc.

The best way to learn is to read the code, but maybe this will help a bit:

- The NameNode uses the BlocksMap to store the block -> datanode locations
mapping. This is done by the BlockInfo class, which actually holds the
locations of the block's replicas in the triplets array. The map is
appropriately managed by the BlockManager.
- BlockInfo is also a GSet.Element, which is used to get the set of
BlockInfo on a particular datanode. This is primarily useful when
processing block reports.
- LocatedBlock and LocatedBlocks are used in
ClientProtocol#getBlockLocations, which clients use to query the block ->
datanode mapping. It makes sense to have separate client and server Block
representations here, though they aren't the purest.
- INodes are pretty separate from Blocks. BlockInfo has a pointer back to
the containing BlockCollection, which can be some type of INode, but that's
about all the BlockManager worries about.
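
If it helps, here's the client-facing end of that mapping (an untested
sketch using the public API; the path is hypothetical):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

FileSystem fs = FileSystem.get(new Configuration());
FileStatus stat = fs.getFileStatus(new Path("/user/me/data"));
// Backed by ClientProtocol#getBlockLocations: the NameNode consults the
// BlocksMap and replies with LocatedBlocks, surfaced here as
// BlockLocation[].
for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
  System.out.println(loc.getOffset() + "+" + loc.getLength()
      + " on " + Arrays.toString(loc.getHosts()));
}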

Best,
Andrew


On Tue, Oct 15, 2013 at 11:18 PM, Yoonmin Nam  wrote:

> When reading the HDFS source code, especially FSNamesystem, there are so
> many block-related types in use, such as Block, LocatedBlocks, and
> BlocksWithLocations, that the system is very unclear to me.
>
> In addition, BlocksMap just maps Block to BlockInfo, but a Block becomes a
> LocatedBlock when paired with DatanodeInfo, and several LocatedBlock
> entries become a LocatedBlocks.
>
> Also, combining INode-related classes with Block-related classes makes me
> unhappy.
>
> Can anyone explain the motivation behind this kind of complex structure
> for HDFS block management and give more specific and detailed
> information?
>
> Thanks!
>
>
>
>


[VOTE] Merge HDFS-4949 to trunk

2013-10-17 Thread Andrew Wang
Hello all,

I'd like to call a vote to merge the HDFS-4949 branch (in-memory caching)
to trunk. Colin McCabe and I have been hard at work the last 3.5 months
implementing this feature, and feel that it's reached a level of stability
and utility where it's ready for broader testing and integration.

I'd also like to thank Chris Nauroth at Hortonworks for code reviews and
bug fixes, and everyone who's reviewed the HDFS-4949 design doc and left
comments.

Obviously, I am +1 for the merge. The vote will run the standard 7 days,
closing on October 24 at 11:59PM.

Thanks,
Andrew


Re: Managing docs with hadoop-1 & hadoop-2

2013-10-23 Thread Andrew Wang
Hey folks,

I've been seeing some reports about search results for Hadoop being broken
because stable now points to the v2 docs, where a lot of stuff has moved
around.

e.g.

http://hadoop.apache.org/docs/stable/fair_scheduler.html (404, first result
on google for "hadoop fair scheduler")
http://hadoop.apache.org/docs/stable1/fair_scheduler.html (works)
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
(location of new YARN page)

Any chance we can reconsider this? It's kind of a pain for anyone who's
linked to Hadoop docs. I agree that semantically, v2 should have the stable
pointer, but Cool URIs Don't Change [1].

Thanks,
Andrew

[1] http://www.w3.org/Provider/Style/URI.html


On Tue, Oct 22, 2013 at 8:22 AM, Eli Collins  wrote:

> On Mon, Oct 21, 2013 at 9:50 PM, Arun C Murthy 
> wrote:
> >
> > On Oct 18, 2013, at 2:17 PM, Eli Collins  wrote:
> >
> > On Fri, Oct 18, 2013 at 2:10 PM, Arun C Murthy 
> wrote:
> >
> > Folks,
> >
> > Currently http://hadoop.apache.org/docs/stable/ points to hadoop-1. With
> > hadoop-2 going GA, should we just point that to hadoop-2?
> >
> > Couple of options:
> > # Have stable1/stable2 links:
> >   http://hadoop.apache.org/docs/stable1 -> hadoop-1.x
> >   http://hadoop.apache.org/docs/stable2 -> hadoop-2.x
> >
> >
> > +1, would also make:
> > current -> stable2 (since v2 is the latest)
> > stable -> stable1 (for compatibility)
> >
> >
> > Let's point stable -> stable2 & current to current2 (for e.g. 2.3 in
> > future).
> >
> > This way we all look ahead. Makes sense?
> >
>
> Sure, I don't feel strongly.
>


Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-23 Thread Andrew Wang
e from each other.
> >> > - HDFS-5385: Caching RPCs are AtMostOnce, but do not persist client ID
> >> and
> >> > call ID to edit log.
> >> > - HDFS-5386: Add feature documentation for datanode caching.
> >> > - Standard clean-ups to satisfy Jenkins pre-commit on the merge patch.
> >> >  (For example, I know we've introduced some Javadoc warnings.)
> >> > - Full test suite run on Windows.  (The feature is not yet implemented
> >> on
> >> > Windows.  This is just intended to catch regressions.)
> >> > - Test plan posted to HDFS-4949, similar in scope to the snapshot test
> >> plan
> >> > that was posted to HDFS-2802.  For my own part, I've run the new unit
> >> > tests, and I've tested end-to-end in a pseudo-distributed deployment.
> >>  It's
> >> > unlikely that I'll get a chance to test fully distributed before the
> >> vote
> >> > closes, so I'm curious to hear if you've done this on your side yet.
> >> >
> >> > Also, I want to confirm that this vote only covers trunk.  I don't see
> >> > branch-2 mentioned, so I assume that we're not voting on merge to
> >> branch-2
> >> > yet.
> >> >
> >> > Before I cast my vote, can you please discuss whether or not it's
> >> feasible
> >> > to complete all of the above in the next 7 days?  For the issues
> >> assigned
> >> > to me, I do expect to complete them.
> >> >
> >> > Thanks again for all of your hard work!
> >> >
> >> > Chris Nauroth
> >> > Hortonworks
> >> > http://hortonworks.com/
> >> >
> >> >
> >> >
> >> > On Thu, Oct 17, 2013 at 3:07 PM, Colin McCabe  >> >wrote:
> >> >
> >> >> +1.  Thanks, guys.
> >> >>
> >> >> best,
> >> >> Colin
> >> >>
> >> >> On Thu, Oct 17, 2013 at 3:01 PM, Andrew Wang <
> andrew.w...@cloudera.com
> >> >
> >> >> wrote:
> >> >> > Hello all,
> >> >> >
> >> >> > I'd like to call a vote to merge the HDFS-4949 branch (in-memory
> >> caching)
> >> >> > to trunk. Colin McCabe and I have been hard at work the last 3.5
> >> months
> >> >> > implementing this feature, and feel that it's reached a level of
> >> >> stability
> >> >> > and utility where it's ready for broader testing and integration.
> >> >> >
> >> >> > I'd also like to thank Chris Nauroth at Hortonworks for code
> reviews
> >> and
> >> >> > bug fixes, and everyone who's reviewed the HDFS-4949 design doc and
> >> left
> >> >> > comments.
> >> >> >
> >> >> > Obviously, I am +1 for the merge. The vote will run the standard 7
> >> days,
> >> >> > closing on October 24 at 11:59PM.
> >> >> >
> >> >> > Thanks,
> >> >> > Andrew
> >> >>
> >> >
> >>
> >
> >
>
>


Re: Replacing the JSP web UIs to HTML 5 applications

2013-10-30 Thread Andrew Wang
I'm also not convinced that a Javascript-based approach is the way to go.
We shouldn't switch the default UI until (at a minimum) we have the
command-line tools that Colin requested, and even then I'd still want to
retain support for text-based browsers like elinks unless there are
compelling technical reasons not to.

Haohui, I'm sympathetic since you've already done all this work on a
pure-JS version, but it's also true that the existing JSP pages could be
cleaned up to achieve basically the same visual effect while also still
working in text-only browsers.

Thanks,
Andrew


On Wed, Oct 30, 2013 at 12:34 AM, Luke Lu  wrote:

> I don't think that we have reached a consensus that the new javascript only
> UI is the right direction to go. Most people considered it "interesting". I
> personally think it's inappropriate for core Hadoop UI, as it increases
> attack surface of the UI and taking away existing mitigation options from
> users unnecessarily. See my latest comments on HDFS-5333 for "concrete"
> examples.
>
> __Luke
>
>
> On Tue, Oct 29, 2013 at 11:28 AM, Haohui Mai  wrote:
>
> > I would like to summarize the discussions so far. It seems that we have
> > reached two consensus:
> >
> > 1. The new JavaScript-based UI is the right direction to go.
> > 2. For now we should keep the old JSP pages around for compatibility
> > reasons.
> >
> > There're some debates on the usages of the JMX / JSON APIs, but this is
> > orthogonal to switching the UI, thus I consider it as a technical detail.
> > We can continue the discussions in the public jira.
> >
> > The new UI has already landed in the trunk, based on the consensus it
> seems
> > that we can switch the default UI to the new one shortly. The user can
> > still access the old web UI using the same URLs.
> >
> > The only question remaining is who is going to maintain the old web UI.
> > My answer is that we should leave them as deprecated and focus the effort
> > on the new web UI.
> >
> > Thanks,
> > Haohui
> >
> >
> >
> > On Tue, Oct 29, 2013 at 5:22 AM, Zheng, Kai  wrote:
> >
> > > > having /JMX for monitoring integration and a /JSON end point for the
> UI
> > > IMHO, this makes sense, especially for the long term. The JMX interface
> > > serves as a management console from the admin perspective, while the
> > > web UI serves as the end-user interface. Both might share the same
> > > underlying code, but that does not mean we should couple them together.
> > >
> > > Thanks & regards,
> > > Kai
> > >
> > > -Original Message-
> > > From: Alejandro Abdelnur [mailto:t...@cloudera.com]
> > > Sent: Tuesday, October 29, 2013 8:14 AM
> > > To: hdfs-dev@hadoop.apache.org
> > > Subject: Re: Replacing the JSP web UIs to HTML 5 applications
> > >
> > > Isn't using JMX to expose JSON for the web UI misusing JMX?
> > >
> > > I would think a more appropriate approach would be having /JMX for
> > > monitoring integration and a /JSON end point for the UI data.
> > >
> > > Thanks.
> > >
> > >
> > > On Mon, Oct 28, 2013 at 4:58 PM, Haohui Mai 
> > wrote:
> > >
> > > > Alejandro,
> > > >
> > > > If I understand correctly, that is the exact approach that the new
> web
> > > > UI is taking. The new web UI takes the output from JMX and renders
> > > > them as HTML at the client side.
> > > >
> > > > ~Haohui
> > > >
> > > >
> > > > On Mon, Oct 28, 2013 at 4:18 PM, Alejandro Abdelnur <
> t...@cloudera.com
> > > > >wrote:
> > > >
> > > > > Haohui,
> > > > >
> > > > > If you have NN and DNs producing JSON instead HTML, then you can
> > > > > build JS based web UIs. Take for example Oozie, Oozie produces
> JSON,
> > > > > it has a
> > > > built
> > > > > in JS web ui that consumes JSON and Hue has built an external web
> UI
> > > > > that also consumes JSON. In the case of Hue UI, Oozie didn't have
> to
> > > > > change anything to get that UI and improvements on the Hue UI don't
> > > > > require changes in Oozie unless it is to produce additional
> > > information.
> > > > >
> > > > > hope this clarifies.
> > > > >
> > > > > Thx
> > > > >
> > > > >
> > > > > On Mon, Oct 28, 2013 at 4:06 PM, Haohui Mai 
> > > > wrote:
> > > > >
> > > > > > Echo my comments on HDFS-5402:
> > > > > >
> > > > > > bq. If we're going to remove the old web UI, I think the new web
> > > > > > UI has to have the same level of unit testing. We shouldn't go
> > > > > > backwards in terms of unit testing.
> > > > > >
> > > > > > I take a look at TestNamenodeJspHelper / TestDatanodeJspHelper /
> > > > > > TestClusterJspHelper. It seems to me that we can merge these
> tests
> > > > > > with
> > > > > the
> > > > > > unit tests on JMX.
> > > > > >
> > > > > > bq. If we are going to
> > > > > > remove this capability, we need to add some other command-line
> > > > > > tools to get the same functionality. These tools could use REST
> if
> > > > > > we have that, or JMX, but they need to exist before we can
> > > > > > consider removing the old UI.
> > > > > >
> > > > > > This is a good point. Since all information is available

Re: HDFS read/write data throttling

2013-11-11 Thread Andrew Wang
Hey Lohit,

This is an interesting topic, and something I actually worked on in grad
school before coming to Cloudera. It'd help if you could outline some of
your usecases and how per-FileSystem throttling would help. For what I was
doing, it made more sense to throttle on the DN side since you have a
better view over all the I/O happening on the system, and you have
knowledge of different volumes so you can set limits per-disk. This still
isn't 100% reliable though since normally a portion of each disk is used
for MR scratch space, which the DN doesn't have control over. I tried
playing with thread I/O priorities here, but didn't see much improvement.
Maybe the newer cgroups stuff can help out.

I'm sure per-FileSystem throttling will have some benefits (and probably be
easier than some DN-side implementation) but again, it'd help to better
understand the problem you are trying to solve.
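
Just to make the client-side flavor concrete, here's a rough average-rate
sketch in the spirit of the distcp ThrottledInputStream Haosong mentions
below (illustrative only, not a proposal for the actual FileSystem change):

  import java.io.IOException;
  import java.io.InputStream;

  public class SimpleThrottledInputStream extends InputStream {
    private final InputStream in;
    private final long maxBytesPerSec;
    private final long startTime = System.currentTimeMillis();
    private long bytesRead = 0;

    public SimpleThrottledInputStream(InputStream in, long maxBytesPerSec) {
      this.in = in;
      this.maxBytesPerSec = maxBytesPerSec;
    }

    @Override
    public int read() throws IOException {
      throttle();
      int b = in.read();
      if (b >= 0) {
        bytesRead++;
      }
      return b;
    }

    // Sleep until the observed average rate drops back under the limit.
    private void throttle() throws IOException {
      while (averageRate() > maxBytesPerSec) {
        try {
          Thread.sleep(50);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IOException("Interrupted while throttling", e);
        }
      }
    }

    private double averageRate() {
      long elapsedMs = Math.max(1, System.currentTimeMillis() - startTime);
      return bytesRead * 1000.0 / elapsedMs;
    }
  }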

Best,
Andrew


On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang  wrote:

> Hi, lohit. There is a Class named
> ThrottledInputStream<
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
> >
>  in hadoop-distcp, you could check it out and find more details.
>
> In addition to this, I am working on trying to achieve resource control
> (including CPU, network, and disk IO) in the JVM. But my implementation
> depends on cgroups, which only run on Linux. I will push my library
> (java-cgroup) to GitHub in the next several months. If you are interested
> in it, please give me advice and help me improve it. :-)
>
>
> On Tue, Nov 12, 2013 at 3:47 AM, lohit  wrote:
>
> > Hi Adam,
> >
> > Thanks for the reply. The changes I was referring to were in the
> > FileSystem.java layer, which should not affect HDFS replication/NameNode
> > operations. To give a better idea, this would affect clients something
> > like this:
> >
> > Configuration conf = new Configuration();
> > conf.setInt("read.bandwidth.mbpersec", 20); // 20MB/s
> > FileSystem fs = FileSystem.get(conf);
> >
> > FSDataInputStream fis = fs.open("/path/to/file.txt");
> > fis.read(); // <-- This would be capped at 20MB/s
> >
> >
> >
> >
> > 2013/11/11 Adam Muise 
> >
> > > See https://issues.apache.org/jira/browse/HDFS-3475
> > >
> > > Please note that this has met with many unexpected impacts on workload.
> > Be
> > > careful and be mindful of your Datanode memory and network capacity.
> > >
> > >
> > >
> > >
> > > On Mon, Nov 11, 2013 at 1:59 PM, lohit 
> > wrote:
> > >
> > > > Hello Devs,
> > > >
> > > > Wanted to reach out and see if anyone has thought about the ability
> > > > to throttle data transfer within HDFS. One option we have been
> > > > thinking about is to throttle on a per-FileSystem basis, similar to
> > > > Statistics in FileSystem. This would mean anyone with a handle to
> > > > HDFS/Hftp would be throttled globally within the JVM. The right value
> > > > for this would depend on the type of hardware we use and how many
> > > > tasks/clients we allow.
> > > >
> > > > On the other hand, doing something like this at the FileSystem layer
> > > > would mean many other tasks, such as job jar copies, DistributedCache
> > > > copies, and any hidden data movement, would also be throttled. We
> > > > wanted to know if anyone has had such a requirement on their clusters
> > > > in the past and what the thinking around it was. Appreciate your
> > > > inputs/comments.
> > > >
> > > > --
> > > > Have a Nice Day!
> > > > Lohit
> > > >
> > >
> > >
> > >
> > > --

Re: HDFS read/write data throttling

2013-11-12 Thread Andrew Wang
such as Google Omega: do you want max cluster
> utilisation vs max determinism of workload.
>
> If someone were to do IOP throttling in the 3.x+ timeline,
>
>1. It needs clear use cases, YARN containers being #1 for me
>2. We'd have to look at all the research done on this in the past to see
>what works and what doesn't
>
> Andrew, what citations of relevance do you have?
>
> -steve
>
>
> On 12 November 2013 04:24, lohit  wrote:
>
> > 2013/11/11 Andrew Wang 
> >
> > > Hey Lohit,
> > >
> > > This is an interesting topic, and something I actually worked on in
> grad
> > > school before coming to Cloudera. It'd help if you could outline some
> of
> > > your usecases and how per-FileSystem throttling would help. For what I
> > was
> > > doing, it made more sense to throttle on the DN side since you have a
> > > better view over all the I/O happening on the system, and you have
> > > knowledge of different volumes so you can set limits per-disk. This
> still
> > > isn't 100% reliable though since normally a portion of each disk is
> used
> > > for MR scratch space, which the DN doesn't have control over. I tried
> > > playing with thread I/O priorities here, but didn't see much
> improvement.
> > > Maybe the newer cgroups stuff can help out.
> > >
> >
> > Thanks. Yes, we also thought about having something on DataNode. This
> would
> > also mean one could easily throttle client who access from outside the
> > cluster, for example distcp or hftp copies. Clients need not worry about
> > throttle configs and each cluster can control how much much throughput
> can
> > be achieved. We do want to have something like this.
> >
> > >
> > > I'm sure per-FileSystem throttling will have some benefits (and
> probably
> > be
> > > easier than some DN-side implementation) but again, it'd help to better
> > > understand the problem you are trying to solve.
> > >
> >
> > One idea was flexibility for client to override and have value they can
> > set. For on trusted cluster we could allow clients to go beyond default
> > value for some usecases. Alternatively we also thought about having
> default
> > value and max value where clients could change default, but not go beyond
> > default. Another problem with DN side config is having different values
> for
> > different clients and easily changing those for selective clients.
> >
> > As, Haosong also suggested we could wrap FSDataOutputStream/FSDataInput
> > stream with ThrottleInputStream. But we might have to be careful of any
> > code which uses FileSystem APIs and accidentally throttling itself. (like
> > reducer copy,  distributed cache and such...)
> >
> >
> >
> > > Best,
> > > Andrew
> > >
> > >
> > > On Mon, Nov 11, 2013 at 6:16 PM, Haosong Huang 
> > wrote:
> > >
> > > > Hi, lohit. There is a Class named
> > > > ThrottledInputStream<
> > > >
> > >
> >
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/ThrottledInputStream.java
> > > > >
> > > >  in hadoop-distcp, you could check it out and find more details.
> > > >
> > > > In addition to this, I am working on this and try to achieve
> resources
> > > > control(include CPU, Network, Disk IO) in JVM. But my implementation
> is
> > > > depends on cgroup, which only could run in Linux. I would push my
> > > > library(java-cgroup) to github in the next several months. If you are
> > > > interested at it, give my any advices and help me improve it please.
> > :-)
> > > >
> > > >
> > > > On Tue, Nov 12, 2013 at 3:47 AM, lohit 
> > > wrote:
> > > >
> > > > > Hi Adam,
> > > > >
> > > > > Thanks for the reply. The changes I was referring was in
> > > FileSystem.java
> > > > > layer which should not affect HDFS Replication/NameNode operations.
> > > > > To give better idea this would affect clients something like this
> > > > >
> > > > > Configuration conf = new Configuration();
> > > > > conf.setInt("read.bandwidth.mbpersec", 20); // 20MB/s
> > > > > FileSystem fs = FileSystem.get(conf);
> > > > >
> > > > > FSDataInputStream fis = fs.open("/path/to/file.txt");

Re: HDFS read/write data throttling

2013-11-18 Thread Andrew Wang
Thanks for asking, here's a link:

http://www.umbrant.com/papers/socc12-cake.pdf

I don't think there's a recording of my talk unfortunately.

I'll also copy my comments over to the JIRA, though I'd like to not
distract too much from what Lohit's trying to do.


On Wed, Nov 13, 2013 at 2:54 AM, Steve Loughran wrote:

> this is interesting -I've moved my comments over to the JIRA and it would
> be good for yours to go there too.
>
> is there a URL for your paper?
>
>
> On 13 November 2013 06:27, Andrew Wang  wrote:
>
> > Hey Steve,
> >
> > My research project (Cake, published at SoCC '12) was trying to provide
> > SLAs for mixed workloads of latency-sensitive and throughput-bound
> > applications, e.g. HBase running alongside MR. This was challenging
> because
> > seeks are a real killer. Basically, we had to strongly limit MR I/O to
> keep
> > worst-case seek latency down, and did so by putting schedulers on the RPC
> > queues in HBase and HDFS to restrict queuing in the OS and disk where we
> > lacked preemption.
> >
> > Regarding citations of note, most academics consider throughput-sharing
> to
> > be a solved problem. It's not dissimilar from normal time slicing, you
> try
> > to ensure fairness over some coarse timescale. I think cgroups [1] and
> > ioprio_set [2] essentially provide this.
> >
> > Mixing throughput and latency though is difficult, and my conclusion is
> > that there isn't a really great solution for spinning disks besides
> > physical isolation. As we all know, you can get either IOPS or bandwidth,
> > but not both, and it's not a linear tradeoff between the two. If you're
> > interested in this though, I can dig up some related work from my Cake
> > paper.
> >
> > However, since it seems that we're more concerned with throughput-bound
> > apps, we might be okay just using cgroups and ioprio_set to do
> > time-slicing. I actually hacked up some code a while ago which passed a
> > client-provided priority byte to the DN, which used it to set the I/O
> > priority of the handling DataXceiver accordingly. This isn't the most
> > outlandish idea, since we've put QoS fields in our RPC protocol for
> > instance; this would just be another byte. Short-circuit reads are
> outside
> > this paradigm, but then you can use cgroup controls instead.
> >
> > My casual conversations with Googlers indicate that there isn't any
> special
> > Borg/Omega sauce either, just that they heavily prioritize DFS I/O over
> > non-DFS. Maybe that's another approach: if we can separate block
> management
> > in HDFS, MR tasks could just write their output to a raw HDFS block, thus
> > bringing a lot of I/O back into the fold of "datanode as I/O manager"
> for a
> > machine.
> >
> > Overall, I strongly agree with you that it's important to first define
> what
> > our goals are regarding I/O QoS. The general case is a tarpit, so it'd be
> > good to carve off useful things that can be done now (like Lohit's
> > direction of per-stream/FS throughput throttling with trusted clients)
> and
> > then carefully grow the scope as we find more usecases we can confidently
> > solve.
> >
> > Best,
> > Andrew
> >
> > [1] cgroups blkio controller
> > https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
> > [2] ioprio_set http://man7.org/linux/man-pages/man2/ioprio_set.2.html
> >
> >
> > On Tue, Nov 12, 2013 at 1:38 AM, Steve Loughran  > >wrote:
> >
> > > I've looked at it a bit within the context of YARN.
> > >
> > > YARN containers are where this would be ideal, as then you'd be able to
> > > request IO capacity as well as CPU and RAM. For that to work, the
> > > throttling would have to be outside the App, as you are trying to limit
> > > code whether or not it wants to be, and because you probably (*) want
> to
> > > give it more bandwidth if the system is otherwise idle. Self-throttling
> > > doesn't pick up spare IO
> > >
> > >
> > >1. you can use cgroups in YARN to throttle local disk IO through the
> > >file:// URLs or the java filesystem APIs -such as for MR temp data
> > >2. you can't c-group throttle HDFS per YARN container, which would
> be
> > >the ideal use case for it. The IO is taking place in the DN, and
> > cgroups
> > >only limits IO in the throttled process group.
> > >3. implementing it in the DN would 

Re: HDFS read/write data throttling

2013-11-18 Thread Andrew Wang
https://issues.apache.org/jira/browse/HDFS-5499


On Mon, Nov 18, 2013 at 10:46 AM, Jay Vyas  wrote:

> Where is the jira for this?
>
> Sent from my iPhone
>
> > On Nov 18, 2013, at 1:25 PM, Andrew Wang 
> wrote:
> >
> > Thanks for asking, here's a link:
> >
> > http://www.umbrant.com/papers/socc12-cake.pdf
> >
> > I don't think there's a recording of my talk unfortunately.
> >
> > I'll also copy my comments over to the JIRA, though I'd like to not
> > distract too much from what Lohit's trying to do.
> >
> >
> > On Wed, Nov 13, 2013 at 2:54 AM, Steve Loughran  >wrote:
> >
> >> this is interesting -I've moved my comments over to the JIRA and it
> would
> >> be good for yours to go there too.
> >>
> >> is there a URL for your paper?
> >>
> >>
> >>> On 13 November 2013 06:27, Andrew Wang 
> wrote:
> >>>
> >>> Hey Steve,
> >>>
> >>> My research project (Cake, published at SoCC '12) was trying to provide
> >>> SLAs for mixed workloads of latency-sensitive and throughput-bound
> >>> applications, e.g. HBase running alongside MR. This was challenging
> >> because
> >>> seeks are a real killer. Basically, we had to strongly limit MR I/O to
> >> keep
> >>> worst-case seek latency down, and did so by putting schedulers on the
> RPC
> >>> queues in HBase and HDFS to restrict queuing in the OS and disk where
> we
> >>> lacked preemption.
> >>>
> >>> Regarding citations of note, most academics consider throughput-sharing
> >> to
> >>> be a solved problem. It's not dissimilar from normal time slicing, you
> >> try
> >>> to ensure fairness over some coarse timescale. I think cgroups [1] and
> >>> ioprio_set [2] essentially provide this.
> >>>
> >>> Mixing throughput and latency though is difficult, and my conclusion is
> >>> that there isn't a really great solution for spinning disks besides
> >>> physical isolation. As we all know, you can get either IOPS or
> bandwidth,
> >>> but not both, and it's not a linear tradeoff between the two. If you're
> >>> interested in this though, I can dig up some related work from my Cake
> >>> paper.
> >>>
> >>> However, since it seems that we're more concerned with throughput-bound
> >>> apps, we might be okay just using cgroups and ioprio_set to do
> >>> time-slicing. I actually hacked up some code a while ago which passed a
> >>> client-provided priority byte to the DN, which used it to set the I/O
> >>> priority of the handling DataXceiver accordingly. This isn't the most
> >>> outlandish idea, since we've put QoS fields in our RPC protocol for
> >>> instance; this would just be another byte. Short-circuit reads are
> >> outside
> >>> this paradigm, but then you can use cgroup controls instead.
> >>>
> >>> My casual conversations with Googlers indicate that there isn't any
> >> special
> >>> Borg/Omega sauce either, just that they heavily prioritize DFS I/O over
> >>> non-DFS. Maybe that's another approach: if we can separate block
> >> management
> >>> in HDFS, MR tasks could just write their output to a raw HDFS block,
> thus
> >>> bringing a lot of I/O back into the fold of "datanode as I/O manager"
> >> for a
> >>> machine.
> >>>
> >>> Overall, I strongly agree with you that it's important to first define
> >> what
> >>> our goals are regarding I/O QoS. The general case is a tarpit, so it'd
> be
> >>> good to carve off useful things that can be done now (like Lohit's
> >>> direction of per-stream/FS throughput throttling with trusted clients)
> >> and
> >>> then carefully grow the scope as we find more usecases we can
> confidently
> >>> solve.
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>> [1] cgroups blkio controller
> >>> https://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt
> >>> [2] ioprio_set http://man7.org/linux/man-pages/man2/ioprio_set.2.html
> >>>
> >>>
> >>> On Tue, Nov 12, 2013 at 1:38 AM, Steve Loughran <
> ste...@hortonworks.com
> >>>> wrote:
> >>>
> >>>> I've lo

Re: when datanode will delete these invalidate blocks?

2013-11-18 Thread Andrew Wang
Try looking in the heartbeat code on the NN and DN, it should clear things
up. The namenode sends these block invalidations to the DN on the DN
heartbeat response. The DN then deletes the blocks and on the next
heartbeat reports to the NN that it invalidated the blocks. The NN then
removes the invalidated blocks from the blockmap.
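
To sketch the round trip in toy code (class and method names here are
schematic, not the real ones -- the actual path goes through the
DatanodeDescriptor invalidateBlocks set, a DNA_INVALIDATE BlockCommand in
the heartbeat response, and async deletion on the DN):

  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  class ToyNameNode {
    // Per-datanode queue of replicas scheduled for deletion.
    private final Map<String, List<Long>> invalidateQueues =
        new HashMap<String, List<Long>>();

    // The replication monitor fills the per-DN queue...
    void scheduleInvalidation(String dnId, long blockId) {
      List<Long> q = invalidateQueues.get(dnId);
      if (q == null) {
        q = new ArrayList<Long>();
        invalidateQueues.put(dnId, q);
      }
      q.add(blockId);
    }

    // ...and the heartbeat handler drains it into the response.
    List<Long> heartbeat(String dnId) {
      List<Long> pending = invalidateQueues.remove(dnId);
      return pending == null ? new ArrayList<Long>() : pending;
    }
  }

  class ToyDataNode {
    void onHeartbeatResponse(List<Long> toInvalidate) {
      for (long blockId : toInvalidate) {
        // The real DN deletes block + meta files asynchronously; the next
        // report back to the NN confirms the replicas are gone, and the NN
        // then drops them from its blocks map.
        deleteBlockFiles(blockId);
      }
    }
    private void deleteBlockFiles(long blockId) { /* unlink the files */ }
  }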


On Mon, Nov 18, 2013 at 9:19 PM, ch huang  wrote:

> hi all:
>  I read the replication monitor code. In the invalidateWork call it just
> adds the invalidated block to the invalidateBlocks list of the
> DatanodeDescriptor class, but I do not see any remove operation in the
> replication monitor code. My question is: when will these invalidated
> blocks be removed from the DNs that host them?
>


Re: issue about rpc activity metrics

2013-11-20 Thread Andrew Wang
The metrics system generates a number of different entries per in-code
metrics object. For instance, the "SendHeartbeat" MutableRate will generate
both a "SendHeartbeatNumOps" and a "SendHeartbeatAvgTime" entry. Look in
NameNodeMetrics.java for where these are updated.

Best,
Andrew


On Tue, Nov 19, 2013 at 10:52 PM, ch huang  wrote:

> hi all:
> I get the RPC metrics from the NN's 50070 port, and I tried searching the
> code to see how these metrics are calculated.
> I tried to use grep, but got nothing. Why?
> [root@CH124 hadoop-2.0.0-cdh4.3.0]# grep -R 'DeleteNumOps' *
>  {
> "name" : "Hadoop:service=NameNode,name=RpcDetailedActivityForPort8020",
> "modelerType" : "RpcDetailedActivityForPort8020",
> "tag.port" : "8020",
> "tag.Context" : "rpcdetailed",
> "tag.Hostname" : "CHBM220",
> "SendHeartbeatNumOps" : 106434,
> "SendHeartbeatAvgTime" : 0.05366726296958853,
> "VersionRequestNumOps" : 9,
> "VersionRequestAvgTime" : 0.,
> "RegisterDatanodeNumOps" : 9,
> "RegisterDatanodeAvgTime" : 2.2223,
> "BlockReportNumOps" : 24,
> "BlockReportAvgTime" : 3.0,
> "GetServiceStatusNumOps" : 63811,
> "GetServiceStatusAvgTime" : 0.05970149253731349,
> "MonitorHealthNumOps" : 63811,
> "MonitorHealthAvgTime" : 0.0686567164179105,
> "TransitionToStandbyNumOps" : 3,
> "TransitionToStandbyAvgTime" : 27.336,
> "TransitionToActiveNumOps" : 1,
> "TransitionToActiveAvgTime" : 8026.0,
> "RollEditLogNumOps" : 210,
> "RollEditLogAvgTime" : 306.7428571428572,
> "GetListingNumOps" : 516,
> "GetListingAvgTime" : 0.18798449612403115,
> "GetFileInfoNumOps" : 507,
> "GetFileInfoAvgTime" : 0.12228796844181453,
> "CreateNumOps" : 4,
> "CreateAvgTime" : 53.5,
> "CompleteNumOps" : 4,
> "CompleteAvgTime" : 45.0,
> "SetOwnerNumOps" : 4,
> "SetOwnerAvgTime" : 43.0,
> "DeleteNumOps" : 4,
> "DeleteAvgTime" : 44.75
>   }
>


Re: Metrics2 code

2013-11-20 Thread Andrew Wang
Hey LiuLei,

Gauges can go up and down, counters only go up. Snapshot doesn't actually
reset anything, it's just a way for the metrics system to get an updated
value. There aren't any time-based rolling metrics to my knowledge besides
MutableQuantiles.
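
To illustrate the difference with a toy source (the source name is
invented; the classes are the real metrics2 lib ones):

  import org.apache.hadoop.metrics2.annotation.Metric;
  import org.apache.hadoop.metrics2.annotation.Metrics;
  import org.apache.hadoop.metrics2.lib.MutableCounterLong;
  import org.apache.hadoop.metrics2.lib.MutableGaugeInt;

  @Metrics(name = "CounterVsGauge", context = "example")
  public class CounterVsGauge {
    // Counter: monotonically increasing, never reset by a snapshot.
    @Metric("Total bytes written since startup")
    MutableCounterLong bytesWritten;
    // Gauge: point-in-time value that can move in both directions.
    @Metric("Currently active xceivers") MutableGaugeInt activeXceivers;

    void onWrite(int len)  { bytesWritten.incr(len); }  // only ever grows
    void onXceiverStart()  { activeXceivers.incr(); }   // up...
    void onXceiverEnd()    { activeXceivers.decr(); }   // ...and down
  }

Per-interval rates for a counter like bytesWritten are computed downstream
(e.g. by Ganglia or a dashboard) by diffing successive snapshots; the
counter itself is never divided or reset. If you want windowed statistics
computed inside Hadoop itself, MutableQuantiles (see
MetricsRegistry#newQuantiles) is the one that rolls over on a configured
interval.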

Best,
Andrew


On Wed, Nov 20, 2013 at 7:34 PM, lei liu  wrote:

> I am using CDH 4.3.1 and am reading the metrics2 code.
>
> There are COUNTER and GAUGE metric types in metrics v2. What is the
> difference between the two?
>
>
> There is an @Metric MutableCounterLong bytesWritten attribute in
> DataNodeMetrics, which is used to track bytes written per second on the
> DataNode. So I think the value of the MutableCounterLong should be divided
> by 10 and reset to zero every ten seconds in the
> MutableCounterLong.snapshot method, is that right? But the
> MutableCounterLong.snapshot method doesn't do that. If I'm missing
> anything, please tell me.
>
> Thanks,
>
> LiuLei
>


Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk

2013-12-06 Thread Andrew Wang
Hi everyone,

I'm still getting up to speed on the changes here (my fault for not
following development more closely, other priorities etc etc), but the
branch thus far is already quite impressive. It's quite an undertaking to
turn the DN into a collection of Storages, along with the corresponding
datastructure, tracking, and other changes in the NN and DN.

Correct me if I'm wrong though, but this still leaves a substantial part of
the design doc to be implemented. Looking at the list of remaining
subtasks, it seems like we still can't specify a storage type for a file
(HDFS-5229) or write a file to a given storage type (HDFS-5391), along with
the corresponding client protocol changes. This leads me to two questions:

- If this is merged, what can I do with the new code? Without client
changes or the ability to create a file on a different storage type, I
don't know how (for example) I could hand this to our QA team to test. I'm
wondering why we want to merge now rather than when the branch is more
feature complete.
- What's the plan for the implementation of the remaining features? How
many phases? What's the timeline for these phases? Particularly, related to
the use cases presented in section 2 of the design doc.

I'm also going to post some design doc questions to the JIRA, there are a
few technical q's I'd like to get clarification on.

Thanks,
Andrew


On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric wrote:

> +1
>
> My team has been developing and testing against the HDFS-2832 branch for
> the past month.  It has proven to be quite stable.
>
> Eric
>
> -Original Message-
> From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
> Sent: Monday, December 02, 2013 7:07 PM
> To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
> Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk
>
> Hello all,
>
> I would like to call a vote to merge phase 1 of the Heterogeneous Storage
> feature into trunk.
>
> *Scope of the changes:*
> The changes allow exposing the DataNode as a collection of storages and set
> the foundation for subsequent work to present Heterogeneous Storages to
> applications. This allows DataNodes to send block and storage reports
> per-storage. In addition this change introduces the ability to add a
> 'storage type' tag to the storage directories. This enables supporting
> different types of storages in addition to disk storage.
>
> Development of the feature is tracked in the jira
> https://issues.apache.org/jira/browse/HDFS-2832.
>
> *Details of development and testing:*
> Development has been done in a separate branch -
> https://svn.apache.org/repos/asf/hadoop/common/branches/HDFS-2832. The
> updated design is posted at -
>
> https://issues.apache.org/jira/secure/attachment/12615761/20131125-HeterogeneousStorage.pdf
> .
> The changes involve ~6K changed lines of code, with a third of those
> changes being to tests.
>
> Please see the test plan
>
> https://issues.apache.org/jira/secure/attachment/12616642/20131202-HeterogeneousStorage-TestPlan.pdffor
> the details. Once the feature is
> merged into trunk, we will continue to test and fix any bugs that may be
> found on trunk as well as add further tests as outlined in the test plan.
>
> The bulk of the design and implementation was done by Suresh Srinivas,
> Sanjay Radia, Nicholas Sze, Junping Du and me. Also, thanks to Eric
> Sirianni, Chris Nauroth, Steve Loughran, Bikas Saha, Andrew Wang and Todd
> Lipcon for providing feedback on the Jiras and in discussions.
>
> This vote runs for a week and closes on 12/9/2013 at 11:59 pm PT.
>
> Thanks,
> Arpit
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>


Re: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to trunk

2013-12-09 Thread Andrew Wang
Thanks for clarifying that Arpit. I'm a +0.9 since I haven't reviewed
enough to +1, but everything thus far looks great.

Andrew


On Fri, Dec 6, 2013 at 5:35 PM, Chen He  wrote:

> +1 nice feature for HDFS
>
>
> On Fri, Dec 6, 2013 at 7:32 PM, Arpit Agarwal  >wrote:
>
> > Hi Andrew,
> >
> > Our plan as stated back in August was to do this work principally in two
> > phases.
> >
> https://issues.apache.org/jira/browse/HDFS-2832?focusedCommentId=13739041
> >
> > For the second phase which includes API support, we also need quota
> > management. For changes of this scope, to do all the work at once while
> > keeping the feature branch in sync with ongoing development in trunk is
> > unmanageable. Hence we'd like to stick with the initial plan and develop
> in
> > phases.
> >
> > Even for datanode caching the initial merge did not include the quota
> > management changes which are happening subsequently.
> >
> > Going forward, we will stabilize the current changes in trunk in the 2.4
> > time frame. Next we will add quota management and API support which can
> > align with the 2.5 time frame, with the second merge potentially in
> > March/April.
> >
> > Arpit
> >
> >
> > On Fri, Dec 6, 2013 at 3:15 PM, Andrew Wang  > >wrote:
> >
> > > Hi everyone,
> > >
> > > I'm still getting up to speed on the changes here (my fault for not
> > > following development more closely, other priorities etc etc), but the
> > > branch thus far is already quite impressive. It's quite an undertaking
> to
> > > turn the DN into a collection of Storages, along with the corresponding
> > > datastructure, tracking, and other changes in the NN and DN.
> > >
> > > Correct me if I'm wrong though, but this still leaves a substantial
> part
> > of
> > > the design doc to be implemented. Looking at the list of remaining
> > > subtasks, it seems like we still can't specify a storage type for a
> file
> > > (HDFS-5229) or write a file to a given storage type (HDFS-5391), along
> > with
> > > the corresponding client protocol changes. This leads me to two
> > questions:
> > >
> > > - If this is merged, what can I do with the new code? Without client
> > > changes or the ability to create a file on a different storage type, I
> > > don't know how (for example) I could hand this to our QA team to test.
> > I'm
> > > wondering why we want to merge now rather than when the branch is more
> > > feature complete.
> > > - What's the plan for the implementation of the remaining features? How
> > > many phases? What's the timeline for these phases? Particularly,
> related
> > to
> > > the use cases presented in section 2 of the design doc.
> > >
> > > I'm also going to post some design doc questions to the JIRA, there
> are a
> > > few technical q's I'd like to get clarification on.
> > >
> > > Thanks,
> > > Andrew
> > >
> > >
> > > On Wed, Dec 4, 2013 at 7:21 AM, Sirianni, Eric <
> eric.siria...@netapp.com
> > > >wrote:
> > >
> > > > +1
> > > >
> > > > My team has been developing and testing against the HDFS-2832 branch
> > for
> > > > the past month.  It has proven to be quite stable.
> > > >
> > > > Eric
> > > >
> > > > -Original Message-
> > > > From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
> > > > Sent: Monday, December 02, 2013 7:07 PM
> > > > To: hdfs-dev@hadoop.apache.org; common-...@hadoop.apache.org
> > > > Subject: [VOTE] Merge HDFS-2832 Heterogeneous Storage Phase 1 to
> trunk
> > > >
> > > > Hello all,
> > > >
> > > > I would like to call a vote to merge phase 1 of the Heterogeneous
> > Storage
> > > > feature into trunk.
> > > >
> > > > *Scope of the changes:*
> > > > The changes allow exposing the DataNode as a collection of storages
> and
> > > set
> > > > the foundation for subsequent work to present Heterogeneous Storages
> to
> > > > applications. This allows DataNodes to send block and storage reports
> > > > per-storage. In addition this change introduces the ability to add a
> > > > 'storage type' tag to the storage directories. This enables
> supporting
> > > > different types of storage

Re: persistent under-replicated blocks

2014-01-09 Thread Andrew Wang
Hi Chris,

BCC'ing hdfs-dev@ since you're using CDH, moving us to cdh-user@.

You should be able to manually copy the under-replicated blocks and md5
files to a different datanode and restart it. I'm curious that you're
having this issue though, I haven't encountered it before. Can you send
your NN logs to me, either as an attachment or a file drop? Also, what
version of CDH are you using?

Here are also a few ideas for things you can check:

* There are a number of block replication stats available in the NN /jmx
webui, e.g. PendingReplicationBlocks, UnderReplicatedBlocks,
ScheduledReplicationBlocks. This will let you know if the NN is at least
attempting to replicate your blocks (pending and scheduled); there's a
small polling snippet after this list.
* Look in the NN log for BlockPlacementPolicy errors. It'll help to enable
DEBUG level output here.
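
For the first bullet, here's a quick way to poll those values from the /jmx
servlet (this assumes the default NN web port 50070 and uses crude string
matching instead of a JSON parser to keep it short; swap in your NN host):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class ReplicationStats {
    public static void main(String[] args) throws Exception {
      URL url = new URL("http://namenode:50070/jmx"
          + "?qry=Hadoop:service=NameNode,name=FSNamesystemState");
      BufferedReader r =
          new BufferedReader(new InputStreamReader(url.openStream()));
      try {
        String line;
        while ((line = r.readLine()) != null) {
          if (line.contains("UnderReplicatedBlocks")
              || line.contains("PendingReplicationBlocks")
              || line.contains("ScheduledReplicationBlocks")) {
            System.out.println(line.trim());
          }
        }
      } finally {
        r.close();
      }
    }
  }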

Best,
Andrew


On Thu, Jan 9, 2014 at 10:46 AM, Cooper Bethea wrote:

> I have only 9 under-replicated blocks on the cluster, and it is very
> important that I restore my cluster to a fully-replicated state. Is there a
> way I can manually copy these blocks to other datanodes, or perhaps new
> datanodes?
>
>
> On Thu, Jan 9, 2014 at 10:34 AM, Cooper Bethea  >wrote:
>
> > Chris, Steve, thanks for responding.
> >
> > Overnight I ran a script to bump replication, then lower it, as Chris
> > suggested. There has been no effect--all underreplicated blocks still
> have
> > only 1 replica.
> >
> > Steve, I am running the rebalancer.
> >
> >
> > On Thu, Jan 9, 2014 at 1:33 AM, Steve Loughran  >wrote:
> >
> >> are you  running the rebalancer?
> >>
> >>
> >> On 9 January 2014 04:40, Chris Embree  wrote:
> >>
> >> > It's too bad that this hasn't been corrected in HDFS 2.0.  I have a
> >> > script that I run several times a day to ensure that blocks are
> >> replicated
> >> > correctly.  Here's a link to an article about it:
> >> > http://dataforprofit.com/?p=427
> >> >
> >> >
> >> > On Wed, Jan 8, 2014 at 9:00 PM, Cooper Bethea 
> >> > wrote:
> >> >
> >> > > Following on--is there a way that I can forcibly replicate these
> >> blocks,
> >> > > perhaps by rsyncing the underlying files to other datanodes? As you
> >> might
> >> > > imagine under-replicated data makes me very uneasy.
> >> > >
> >> > >
> >> > > On Wed, Jan 8, 2014 at 12:00 PM, Cooper Bethea <
> co...@siftscience.com
> >> > > >wrote:
> >> > >
> >> > > > Hi HDFS developers,
> >> > > >
> >> > > > I have a worrying problem in a 2.0.0-cdh4.4.0 HDFS cluster I am
> >> > running.
> >> > > 9
> >> > > > blocks in the cluster are persistently reported to be
> >> under-replicated
> >> > > per
> >> > > > "hdfs fsck".
> >> > > >
> >> > > > I am able to fetch the files that contain these blocks, so I know
> >> that
> >> > > the
> >> > > > data is there, but for some reason replication is not taking
> >> effect. In
> >> > > > hopes of getting the cluster to notice that there were
> >> under-replicated
> >> > > > blocks I tried using "hdfs dfs -setrep" to raise the replication
> >> > factor,
> >> > > > but the cluster continues to report a single replica for each of
> >> these
> >> > > > blocks. When viewing master logs I see that the replication factor
> >> > change
> >> > > > is respected, but there are no messages that refer to the
> >> > > under-replicated
> >> > > > blocks.
> >> > > >
> >> > > > Thanks for your time. Please let me know what I can do to
> >> investigate
> >> > > > further.
> >> > > >
> >> > >
> >> >
> >>
> >>
> >
> >
>


Re: Re-swizzle 2.3

2014-01-29 Thread Andrew Wang
I just finished tuning up branch-2.3 and fixing up the HDFS and Common
CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a few
JIRAs committed between the swizzle and now where the fix version was 2.3
but weren't in branch-2.3.

I think the only two HDFS and Common JIRAs that are marked for 2.4 are
these:

HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi and a
doAs on a secure cluster
HDFS-5781 Use an array to record the mapping between FSEditLogOpCode and
the corresponding byte value

Jing, these both look safe to me if you want to merge them back, or I can
just do it.

Thanks,
Andrew

On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting  wrote:
>
> On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe  wrote:
> >  It is a bit concerning that the JIRA history showed that the target
version
> > was set at some point in the past but no record of it being cleared.
>
> Perhaps the version itself was renamed?
>
> Doug


Re: Re-swizzle 2.3

2014-01-31 Thread Andrew Wang
Thanks for the link Arun, I went ahead and punted one HADOOP blocker, and
the remaining two HADOOP/HDFS looks like they're under active review.

Post-swizzle, it seems like most blockers for 2.4 would also apply to 2.3,
so I looked at that list too:

https://issues.apache.org/jira/issues/?filter=12326375&jql=project%20in%20(HADOOP%2C%20YARN%2C%20HDFS%2C%20MAPREDUCE)%20AND%20priority%20%3D%20Blocker%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20%22Target%20Version%2Fs%22%20%3D%20%222.4.0%22

YARN-1673 IIUC relates to the AHS, so is actually only in branch-2 and not
branch-2.3.

HADOOP-10048, Jason's comment says he's okay with it not being a blocker.

HDFS-5796 hasn't seen much action. Kihwal or Haohui, could you comment on
the importance/status? I don't have much context in this area.

Best,
Andrew

On Fri, Jan 31, 2014 at 4:29 PM, Arun C Murthy  wrote:

> Thanks Vinod, appreciate it!
>
> I think we are very close.
>
> Here is a handy ref. to the list of blockers:
> http://s.apache.org/hadoop-2.3.0-blockers
>
> I'd appreciate if folks can help expedite these fixes, and, equally
> importantly bring up others they feel should be blockers for 2.3.0.
>
> thanks,
> Arun
>
> On Jan 30, 2014, at 12:42 PM, Vinod Kumar Vavilapalli 
> wrote:
>
> > That was quite some exercise, but I'm done with it now. Updated YARN's
> and MAPREDUCE's CHANGES.txt on trunk, branch-2 and branch-2.3. Let me know
> if you find some inaccuracies.
> >
> > Thanks,
> > +Vinod
> >
> > On Jan 29, 2014, at 10:49 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org> wrote:
> >
> >>
> >> Okay, I'll look at YARN and MR CHANGES.txt problems. Seems like they
> aren't addressed yet.
> >>
> >> +Vinod
> >>
> >>
> >> On Jan 29, 2014, at 3:24 PM, Andrew Wang 
> wrote:
> >>
> >>> I just finished tuning up branch-2.3 and fixing up the HDFS and Common
> >>> CHANGES.txt in trunk, branch-2, and branch-2.3. I had to merge back a
> few
> >>> JIRAs committed between the swizzle and now where the fix version was
> 2.3
> >>> but weren't in branch-2.3.
> >>>
> >>> I think the only two HDFS and Common JIRAs that are marked for 2.4 are
> >>> these:
> >>>
> >>> HDFS-5842 Cannot create hftp filesystem when using a proxy user ugi
> and a
> >>> doAs on a secure cluster
> >>> HDFS-5781 Use an array to record the mapping between FSEditLogOpCode
> and
> >>> the corresponding byte value
> >>>
> >>> Jing, these both look safe to me if you want to merge them back, or I
> can
> >>> just do it.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>> On Wed, Jan 29, 2014 at 1:21 PM, Doug Cutting 
> wrote:
> >>>>
> >>>> On Wed, Jan 29, 2014 at 12:30 PM, Jason Lowe 
> wrote:
> >>>>> It is a bit concerning that the JIRA history showed that the target
> >>> version
> >>>>> was set at some point in the past but no record of it being cleared.
> >>>>
> >>>> Perhaps the version itself was renamed?
> >>>>
> >>>> Doug
> >>
> >
> >
> > 
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
>


Re: DISCUSS: Hadoop Compatability Guidelines

2017-09-07 Thread Andrew Wang
There's also the DataNode data directory layout. FS edit logs should also
be included if we're including the fsimage.

Historically we've bumped these in minor and major releases, though I'm not
sure whether precedent supports the practice. It means you can't downgrade,
and features that need metadata changes are often also destabilizing. DN
layout version upgrades are also very time-intensive, since they need to
hardlink all the blocks.

I don't think we can change this policy in the next week, but it's
something to consider post-beta1. Now that we have xattrs, there's less
need for metadata layout changes. If we revive the feature flags effort,
then there's even less need.

Cheers,
Andrew

On Thu, Sep 7, 2017 at 11:13 AM, Daniel Templeton 
wrote:

> Good point.  I think it would be valuable to enumerate the policies around
> the versioned state stores.  We have the three you listed. We should
> probably include the HDFS fsimage in that list.  Any others?
>
> I also want to add a section that clarifies when it's OK to change the
> visibility or audience of an API.
>
> Daniel
>
>
> On 9/5/17 11:04 AM, Arun Suresh wrote:
>
>> Thanks for starting this Daniel.
>>
>> I think we should also add a section for store compatibility (all state
>> stores including RM, NM, Federation etc.). Essentially an explicit policy
>> detailing when is it ok to change the major and minor versions and how it
>> should relate to the hadoop release version.
>> Thoughts ?
>>
>> Cheers
>> -Arun
>>
>>
>> On Tue, Sep 5, 2017 at 10:38 AM, Daniel Templeton 
>> wrote:
>>
>> Good idea.  I should have thought of that. :)  Done.
>>>
>>> Daniel
>>>
>>>
>>> On 9/5/17 10:33 AM, Anu Engineer wrote:
>>>
>>> Could you please attach the PDFs to the JIRA. I think the mailer is
 stripping them off from the mail.

 Thanks
 Anu





 On 9/5/17, 9:44 AM, "Daniel Templeton"  wrote:

 Resending with a broader audience, and reattaching the PDFs.

> Daniel
>
> On 9/4/17 9:01 AM, Daniel Templeton wrote:
>
> All, in prep for Hadoop 3 beta 1 I've been working on updating the
>> compatibility guidelines on HADOOP-13714.  I think the initial doc is
>> more or less complete, so I'd like to open the discussion up to the
>> broader Hadoop community.
>>
>> In the new guidelines, I have drawn some lines in the sand regarding
>> compatibility between releases.  In some cases these lines are more
>> restrictive than the current practices.  The intent with the new
>> guidelines is not to limit progress by restricting what goes into a
>> release, but rather to drive release numbering to keep in line with
>> the reality of the code.
>>
>> Please have a read and provide feedback on the JIRA.  I'm sure there
>> are more than a couple of areas that could be improved.  If you'd
>> rather not read markdown from a diff patch, I've attached PDFs of the
>> two modified docs.
>>
>> Thanks!
>> Daniel
>>
>>


Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-07 Thread Andrew Wang
Hi folks,

This vote closes today. I see a -1 from Allen on inclusion in beta1. I see
there's active fixing going on, but given that we're one week out from RC0,
I think we should drop this from beta1.

Allen, Jian, others, is this reasonable? What release should we retarget
this for? I don't have a sense for how much work there is left to do, but
as a reminder, we're planning GA for Nov 1st, and 3.1.0 for January.

Best,
Andrew

On Wed, Sep 6, 2017 at 10:19 AM, Jian He  wrote:

> >   Please correct me if I’m wrong, but the current summary of the
> branch, post these changes, looks like:
> Sorry for the confusion, I was actively writing the formal documentation
> for how to use it / how it works, etc., and will post it in a few hours.
>
>
> > On Sep 6, 2017, at 10:15 AM, Allen Wittenauer 
> wrote:
> >
> >
> >> On Sep 5, 2017, at 6:23 PM, Jian He  wrote:
> >>
> >>> If it doesn’t have all the bells and whistles, then it shouldn’t
> be on port 53 by default.
> >> Sure, I’ll change the default port to not use 53 and document it.
> >>> *how* is it getting launched on a privileged port? It sounds like
> the expectation is to run “command” as root.   *ALL* of the previous
> daemons in Hadoop that needed a privileged port used jsvc.  Why isn’t this
> one? These questions matter from a security standpoint.
> >> Yes, it is running as “root” to be able to use the privileged port. The
> DNS server is not yet integrated with the hadoop script.
> >>
> >>> Check the output.  It’s pretty obviously borked:
> >> Thanks for pointing out. Missed this when rebasing onto trunk.
> >
> >
> >   Please correct me if I’m wrong, but the current summary of the
> branch, post these changes, looks like:
> >
> >   * A bunch of mostly new Java code that may or may not have
> javadocs (post-revert YARN-6877, still working out HADOOP-14835)
> >   * ~1/3 of the docs are roadmap/TBD
> >   * ~1/3 of the docs are for an optional DNS daemon that has
> no end user hook to start it
> >   * ~1/3 of the docs are for a REST API that comes from some
> undefined daemon (apiserver?)
> >   * Two new, but undocumented, subcommands to yarn
> >   * There are no docs for admins or users on how to actually
> start or use this completely new/separate/optional feature
> >
> >   How are outside people (e.g., non-branch committers) supposed to
> test this new feature under these conditions?
> >
>
>


2017-09-07 Hadoop 3 release status update

2017-09-07 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-07

Slightly early update since I'll be out tomorrow. We're one week out, and
focus is on blocker burndown.

Highlights:

   - 3.1.0 release planning is underway, led by Wangda. Target release date
   is in January.

Red flags:

   - YARN native services merge vote got a -1 for beta1, I recommended we
   drop it from beta1 and retarget for a later release.
   - 11 blockers on the dashboard, one more than last week

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14826 was duped to HADOOP-14738.
   - YARN-5536  (Multiple
   format support (JSON, etc.) for exclude node file in NM graceful
   decommission with timeout): Downgraded in priority in favor of YARN-7162
   which Robert has posted a patch for.
   - MAPREDUCE-6941 (The default setting doesn't work for MapReduce job): I
   resolved this and Junping confirmed this is fine.


beta1 blockers:

   - HADOOP-14738  (Remove
   S3N and obsolete bits of S3A; rework docs): Steve has been actively revving
   this with our new committer Aaron Fabbri ready to review. The scope has
   expanded from HADOOP-14826, so it's not just a doc update.
   - HADOOP-14284  (Shade
   Guava everywhere): No change since last week. This is an umbrella JIRA.
   - HADOOP-14771 (hadoop-client does not include hadoop-yarn-client):
   Patch up, needs review, still waiting on Busbey. Bharat gave it a review.
   - YARN-7162  (Remove
   XML excludes file format): Robert has posted a patch and is waiting for a
   review.
   - HADOOP-14238 (Rechecking Guava's object is not exposed to user-facing
   API): Bharat took this up and turned it into an umbrella.
  - HADOOP-14847 (Remove Guava Supplier and change to java Supplier in
  AMRMClient and AMRMClientAsync): Bharat posted a patch on a subtask to
  fix the known Guava Supplier issue in AMRMClient. Needs a review.
   - HADOOP-14835  (mvn
   site build throws SAX errors): I'm working on this. Debugged it and have a
   proposed patch up, discussing with Allen.
   - HDFS-12218  (Rename
   split EC / replicated block metrics in BlockManager): I'm working on this,
   just need to commit it, already have a +1 from Eddy.


beta1 features:

   - Erasure coding
  - There are three must-dos, all being actively worked on.
  - HDFS-7859 is being actively reviewed and revved by Sammi and Kai
  and Eddy.
  - HDFS-12395 was split out of HDFS-7859 to do the edit log changes.
  - HDFS-12218 is discussed above.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray and Allen reviewed Sean's HDFS rolling upgrade scripts.
  - Sean did a run through of the HDFS JACC report and it looked fine.
   - Classpath isolation (HADOOP-11656)
      - Sean has retriaged the subtasks and has been posting patches.
   - Compat guide (HADOOP-13714)
      - Daniel has been collecting feedback on dev lists, but still needs a
      detailed review of the patch.
   - YARN native services
      - Jian sent out the merge vote, but it's been -1'd for beta1 by
      Allen. I propose we drop this from beta1 scope and retarget.
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.


Re: [VOTE] Merge yarn-native-services branch into trunk

2017-09-11 Thread Andrew Wang
Thanks for your consideration Jian, let's track this for GA then.

Best,
Andrew

On Fri, Sep 8, 2017 at 3:02 PM, Jian He  wrote:

> Hi Andrew,
>
> At this point, there are no more release blockers including documentations
> from our side - all work done.
> But I agree it is too close to the release, after talking with other team
> members, we are fine to drop this from beta,
>
> And we want to target this for GA.
> I’m withdrawing this vote and will start afresh vote later for GA.
> Thanks all who voted this effort !
>
> Thanks,
> Jian
>
>
> > On Sep 7, 2017, at 3:59 PM, Andrew Wang 
> wrote:
> >
> > Hi folks,
> >
> > This vote closes today. I see a -1 from Allen on inclusion in beta1. I
> see
> > there's active fixing going on, but given that we're one week out from
> RC0,
> > I think we should drop this from beta1.
> >
> > Allen, Jian, others, is this reasonable? What release should we retarget
> > this for? I don't have a sense for how much work there is left to do, but
> > as a reminder, we're planning GA for Nov 1st, and 3.1.0 for January.
> >
> > Best,
> > Andrew
> >
> > On Wed, Sep 6, 2017 at 10:19 AM, Jian He  wrote:
> >
> >>>  Please correct me if I’m wrong, but the current summary of the
> >> branch, post these changes, looks like:
> >> Sorry for confusion, I was actively writing the formal documentation for
> >> how to use/how it works etc. and will post soon in a few hours.
> >>
> >>
> >>> On Sep 6, 2017, at 10:15 AM, Allen Wittenauer <
> a...@effectivemachines.com>
> >> wrote:
> >>>
> >>>
> >>>> On Sep 5, 2017, at 6:23 PM, Jian He  wrote:
> >>>>
> >>>>>If it doesn’t have all the bells and whistles, then it shouldn’t
> >> be on port 53 by default.
> >>>> Sure, I’ll change the default port to not use 53 and document it.
> >>>>>*how* is it getting launched on a privileged port? It sounds like
> >> the expectation is to run “command” as root.   *ALL* of the previous
> >> daemons in Hadoop that needed a privileged port used jsvc.  Why isn’t
> this
> >> one? These questions matter from a security standpoint.
> >>>> Yes, it is running as “root” to be able to use the privileged port.
> The
> >> DNS server is not yet integrated with the hadoop script.
> >>>>
> >>>>> Check the output.  It’s pretty obviously borked:
> >>>> Thanks for pointing out. Missed this when rebasing onto trunk.
> >>>
> >>>
> >>>  Please correct me if I’m wrong, but the current summary of the
> >> branch, post these changes, looks like:
> >>>
> >>>  * A bunch of mostly new Java code that may or may not have
> >> javadocs (post-revert YARN-6877, still working out HADOOP-14835)
> >>>  * ~1/3 of the docs are roadmap/TBD
> >>>  * ~1/3 of the docs are for an optional DNS daemon that has
> >> no end user hook to start it
> >>>  * ~1/3 of the docs are for a REST API that comes from some
> >> undefined daemon (apiserver?)
> >>>  * Two new, but undocumented, subcommands to yarn
> >>>  * There are no docs for admins or users on how to actually
> >> start or use this completely new/separate/optional feature
> >>>
> >>>  How are outside people (e.g., non-branch committers) supposed to
> >> test this new feature under these conditions?
> >>>
> >>
> >>
> >> -
> >> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> >> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
> >>
> >>
>
>


2017-09-19 Hadoop 3 release status update

2017-09-19 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-19

Sorry for the late update. We're down to one blocker and one EC must-do!
Made great progress over the last week and a bit.

We will likely cut RC0 this week.

Highlights:

   - Down to just two blocker issues!

Red flags:

   - HDFS unit tests are quite flaky. Some blockers were filed and then
   resolved or downgraded. More work to do here.

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14738  (Remove
   S3N and obsolete bits of S3A; rework docs): Committed!
   - HADOOP-14284  (Shade
   Guava everywhere): We resolved this since we decided it was unnecessary for
   beta1.
   - YARN-7162  (Remove
   XML excludes file format): Robert committed after review from Junping.
   - HADOOP-14847 (Remove Guava Supplier and change to java Supplier in
   AMRMClient and AMRMClientAsync): Committed!
   - HADOOP-14238 (Rechecking Guava's object is not exposed to user-facing
   API): We dropped this off the blocker list in the absence of other known
   issues
   - HADOOP-14835  (mvn
   site build throws SAX errors): I committed after further discussion and
   review with Sean Mackrory and Allen. Planning to switch to japicmp for
   later releases.
   - HDFS-12218  (Rename
   split EC / replicated block metrics in BlockManager): Committed.


beta1 blockers:

   - HADOOP-14771 (hadoop-client does not include hadoop-yarn-client): This
   was committed but then reverted since it broke the build. Haibo and Sean
   are actively pressing towards a correct fix.


beta1 features:

   - Erasure coding
  - Resolved a number of must-dos
 - HDFS-7859 (fsimage changes) was committed!
 - HDFS-12395 (edit log changes) was also committed!
 - HDFS-12218 is discussed above.
  - Remaining blockers:
 - HDFS-12447 is to refactor some of the fsimage code, Andrew needs
 to review
  - There has also been progress cleaning up the flaky unit tests, still
  more to do
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray has gone through almost all the YARN protos and thinks we're okay
  to move forwards.
  - I think we'll move forward without this committed, given that Sean
  has run it successfully.
   - Classpath isolation (HADOOP-11656)
      - We have just HADOOP-14771 left.
   - Compat guide (HADOOP-13714)
      - This was committed! Some follow-on work filed for GA.
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.
   - YARN native services
  - Still not 100% clear when this will land.


Re: [DISCUSS] moving to Apache Yetus Audience Annotations

2017-09-22 Thread Andrew Wang
Is this itself an incompatible change? I imagine the bytecode will be
different.

I think we're too late to do this for beta1 given that I want to cut an RC0
today.

On Fri, Sep 22, 2017 at 7:03 AM, Sean Busbey  wrote:

> When Apache Yetus formed, it started with several key pieces of Hadoop that
> looked reusable. In addition to our contribution testing infra, the project
> also stood up a version of our audience annotations for delineating the
> public facing API[1].
>
> I recently got the Apache HBase community onto the Yetus version of those
> annotations rather than their internal fork of the Hadoop ones[2]. It
> wasn't pretty, mostly a lot of blind sed followed by spot checking and
> reliance on automated tests.
>
> What do folks think about making the jump ourselves? I'd be happy to work
> through things, either as one unreviewable monster or per-module
> transitions (though a piece-meal approach might complicate our javadoc
> situation).
>
>
> [1]: http://yetus.apache.org/documentation/0.5.0/interface-classification/
> [2]: https://issues.apache.org/jira/browse/HBASE-17823
>
> --
> busbey
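
For anyone curious about the mechanical part of such a migration, a
minimal sketch of the "blind sed" pass Sean describes, assuming the
Yetus annotations live under org.apache.yetus.audience as in the
HBASE-17823 change (verify the package names against the
yetus-annotations artifact before running anything like this):

    # Rewrite the Hadoop annotation packages to their Apache Yetus
    # equivalents across the tree, then let compilation and the
    # automated tests catch any stragglers.
    find . -name '*.java' -print0 | xargs -0 sed -i \
      -e 's/org\.apache\.hadoop\.classification\.InterfaceAudience/org.apache.yetus.audience.InterfaceAudience/g' \
      -e 's/org\.apache\.hadoop\.classification\.InterfaceStability/org.apache.yetus.audience.InterfaceStability/g'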
>


Re: [DISCUSS] moving to Apache Yetus Audience Annotations

2017-09-22 Thread Andrew Wang
Yea, unfortunately I'd say backburner it. This would have been perfect
during alpha.

On Fri, Sep 22, 2017 at 11:14 AM, Sean Busbey  wrote:

> I'd refer to it as an incompatible change; we expressly label the
> annotations as IA.Public.
>
> If you think it's too late to get in for 3.0, I can make a jira and put it
> on the back burner for when trunk goes to 4.0?
>
> On Fri, Sep 22, 2017 at 12:49 PM, Andrew Wang 
> wrote:
>
>> Is this itself an incompatible change? I imagine the bytecode will be
>> different.
>>
>> I think we're too late to do this for beta1 given that I want to cut an
>> RC0 today.
>>
>> On Fri, Sep 22, 2017 at 7:03 AM, Sean Busbey  wrote:
>>
>>> When Apache Yetus formed, it started with several key pieces of Hadoop
>>> that
>>> looked reusable. In addition to our contribution testing infra, the
>>> project
>>> also stood up a version of our audience annotations for delineating the
>>> public facing API[1].
>>>
>>> I recently got the Apache HBase community onto the Yetus version of those
>>> annotations rather than their internal fork of the Hadoop ones[2]. It
>>> wasn't pretty, mostly a lot of blind sed followed by spot checking and
>>> reliance on automated tests.
>>>
>>> What do folks think about making the jump ourselves? I'd be happy to work
>>> through things, either as one unreviewable monster or per-module
>>> transitions (though a piece-meal approach might complicate our javadoc
>>> situation).
>>>
>>>
>>> [1]: http://yetus.apache.org/documentation/0.5.0/interface-classi
>>> fication/
>>> [2]: https://issues.apache.org/jira/browse/HBASE-17823
>>>
>>> --
>>> busbey
>>>
>>
>>
>
>
> --
> busbey
>


2017-09-22 Hadoop 3 release status update

2017-09-22 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-22

We've had some late breaking blockers related to Docker support that are
delaying the release. We're on a day-by-day slip at this point.



Highlights:

   - I did a successful test create-release earlier this week.

Red flags:

   - Docker work resulted in some last minute blockers

Previously tracked beta1 blockers that have been resolved or dropped:

   - HADOOP-14771 (hadoop-client does not include hadoop-yarn-client):
   Dropped this from the blocker list as it's mainly for documentation
   purposes
   - HDFS-12247 (Rename AddECPolicyResponse to
   AddErasureCodingPolicyResponse) was committed.

beta1 blockers:

   - YARN-6623  (Add
   support to turn off launching privileged containers in the
   container-executor): This is a newly escalated blocker related to the
   Docker work in YARN. Patch is up but we're still waiting on a commit.
   - HADOOP-14897  (Loosen
   compatibility guidelines for native dependencies): Raised by Chris Douglas,
   Daniel will post a patch soon.

beta1 features:

   - Erasure coding
  - Resolved last must-do for beta1!
  - People are looking more at the flaky tests and nice-to-haves
  - Eddy continues to make improvements to block reconstruction
  codepaths
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
   - Ray has gone through almost all the YARN protos and thinks we're okay
  to move forwards.
  - I think we'll move forward without this committed, given that Sean
  has run it successfully.
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13917 (Ensure nightly builds run the integration tests for
      the shaded client): Sean wants to get this in before beta1 if there's
      time, it's already catching issues. Relies on YETUS-543 which I
      reviewed, waiting on Allen.
      - HADOOP-14771 might be squeezed in if there's time.
   - Compat guide (HADOOP-13714)
      - HADOOP-14897: Above mentioned blocker filed by Chris Douglas.
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

GA features:

   - Resource profiles (Wangda Tan)
  - Merge vote was sent out. Since branch-3.0 has been cut, this can be
  merged to trunk (3.1.0) and then backported once we've completed testing.
   - HDFS router-based federation (Chris Douglas)
   - This is like YARN federation, very separate and doesn't add new APIs,
  run in production at MSFT.
  - If it passes Cloudera internal integration testing, I'm fine
  putting this in for GA.
   - API-based scheduler configuration (Jonathan Hung)
  - Jonathan mentioned that his main goal is to get this in for 2.9.0,
  which seems likely to go out after 3.0.0 GA since there hasn't been any
  serious release planning yet. Jonathan said that delaying this
until 3.1.0
  is fine.
   - YARN native services
  - Still not 100% clear when this will land.


Heads up: branching branch-3.0.0-beta1 off of branch-3.0

2017-09-28 Thread Andrew Wang
Hi folks,

We've driven the blocker count down to 0, and I went through and made sure
the fix versions and release notes and so on are all lined up.

I'm going to cut branch-3.0.0-beta1 off branch-3.0 and try and get RC0 out
today.

Cheers,
Andrew


Re: Heads up: branching branch-3.0.0-beta1 off of branch-3.0

2017-09-28 Thread Andrew Wang
Branch has been cut, branch-3.0 is now open for commits for 3.0.0 GA.

HEAD of branch-3.0.0-beta1 is 2223393ad1d5ffdd62da79e1546de79c6259dc12.

On Thu, Sep 28, 2017 at 10:52 AM, Andrew Wang 
wrote:

> Hi folks,
>
> We've driven the blocker count down to 0, and I went through and made sure
> the fix versions and release notes and so on are all lined up.
>
> I'm going to cut branch-3.0.0-beta1 off branch-3.0 and try and get RC0 out
> today.
>
> Cheers,
> Andrew
>


[VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-09-28 Thread Andrew Wang
Hi all,

Let me start, as always, by thanking the many, many contributors who helped
with this release! I've prepared an RC0 for 3.0.0-beta1:

http://home.apache.org/~wang/3.0.0-beta1-RC0/

This vote will run five days, ending on Oct 3rd at 5PM Pacific.

beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
improvements, and feature enhancements. Notable additions include the
addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
shaded client, and HDFS erasure coding pluggable policy support.

I've done the traditional testing of running a Pi job on a pseudo cluster.
My +1 to start.

We're working internally on getting this run through our integration test
rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.

Best,
Andrew
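
For anyone who wants to add a vote, the usual verification flow is only
a few commands; a minimal sketch (the artifact and checksum file names
are assumptions based on convention, so check the RC directory for the
actual names):

    # fetch the source tarball, its signature, and the Hadoop KEYS file
    $ wget http://home.apache.org/~wang/3.0.0-beta1-RC0/hadoop-3.0.0-beta1-src.tar.gz
    $ wget http://home.apache.org/~wang/3.0.0-beta1-RC0/hadoop-3.0.0-beta1-src.tar.gz.asc
    $ wget https://dist.apache.org/repos/dist/release/hadoop/common/KEYS
    # verify the signature, and compare the published checksum by hand
    $ gpg --import KEYS
    $ gpg --verify hadoop-3.0.0-beta1-src.tar.gz.asc hadoop-3.0.0-beta1-src.tar.gz
    $ shasum -a 256 hadoop-3.0.0-beta1-src.tar.gz
    # build and smoke-test
    $ tar xzf hadoop-3.0.0-beta1-src.tar.gz && cd hadoop-3.0.0-beta1-src
    $ mvn package -Pdist -DskipTests -Dtar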


Re: [DISCUSS] Merging API-based scheduler configuration to trunk/branch-2

2017-09-29 Thread Andrew Wang
Hi Jonathan,

I'm okay with putting this into branch-3.0 for GA if it can be merged
within the next two weeks. Even though beta1 has slipped by a month, I want
to stick to the targeted GA data of Nov 1st as much as possible. Of course,
let's not sacrifice quality or stability for speed; if something's not
ready, let's defer it to 3.1.0.

Subru, have you been able to review this feature from the 2.9.0
perspective? It'd add confidence if you think it's immediately ready for
merging to branch-2 for 2.9.0.

Thanks,
Andrew

On Thu, Sep 28, 2017 at 11:32 AM, Jonathan Hung 
wrote:

> Hi everyone,
>
> Starting this thread to discuss merging API-based scheduler configuration
> to trunk/branch-2. The feature adds the framework for allowing users to
> modify scheduler configuration via REST or CLI using a configurable backend
> (leveldb/zk are currently supported), and adds capacity scheduler support
> for this. The umbrella JIRA is YARN-5734. All the required work for this
> feature is done and committed to branch YARN-5734, and a full diff has been
> generated at YARN-7241.
>
> Regarding compatibility, this feature is configurable and turned off by
> default.
>
> The feature has been tested locally on a couple RMs (since it is an RM
> only change), with queue addition/removal/updates tested on single RM
> (leveldb) and two RMs (zk). Also we verified the original configuration
> update mechanism (via refreshQueues) is unaffected when the feature is
> off/not configured.
>
> Our original plan was to merge this to trunk (which is what the YARN-7241
> diff is based on), and port to branch-2 before the 2.9 release. @Andrew,
> what are your thoughts on also merging this to branch-3.0?
>
> Thanks!
>
> Jonathan Hung
>
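
As a rough illustration of what the configurable backend means in
practice, a sketch of enabling the feature and pushing a queue change
through the CLI (the property and command names here are assumptions
based on the YARN-5734 design and should be checked against the merged
documentation):

    # yarn-site.xml: switch the scheduler configuration store to
    # ZooKeeper (leveldb is the other persistent backend; leaving the
    # default keeps the classic refreshQueues behavior):
    #   yarn.scheduler.configuration.store.class = zk
    #
    # Queue updates then go through the REST API or the CLI instead of
    # hand-editing capacity-scheduler.xml:
    $ yarn schedulerconf -update "root.a:capacity=30,maximum-capacity=50"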


2017-09-29 Hadoop 3 release status update

2017-09-29 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-09-29

After about a month of slip, RC0 has been sent out for a VOTE. Focus now
turns to GA, where we will attempt to keep the original beta1 target date
(early November).

Highlights:

   - RC0 vote was sent out on Thursday, two binding +1's so far.

Red flags:

   - Resource profiles still has a number of pending subtasks, which is
   concerning from a schedule perspective. I emailed Wangda about this, and we
   need to discuss with other key contributors.
   - Native services has one pending subtask but we haven't gotten
   follow-on reviews from Allen (who -1'd the earlier merge vote). Need to
   confirm that we've satisfied his feedback.

Previously tracked beta1 blockers that have been resolved or dropped:

   - YARN-6623 was pushed out of beta1 to GA, has been committed so we can
   drop it from tracking.
   - HADOOP-14897  (Loosen
   compatibility guidelines for native dependencies): Patch committed!

beta1 blockers:

   - None, RC0 is out

GA blockers:

   - YARN-7134 (AppSchedulingInfo has a dependency on capacity scheduler)
   OPEN: this one popped out of nowhere, I don't have an update yet.
   - YARN-7178 (Add documentation for Container Update API) OPEN: this also
   popped out of nowhere, no update yet.
   - YARN-7275 (NM Statestore cleanup for Container updates) OPEN: Ditto
   - YARN-4859 ([Bug] Unable to submit a job to a reservation when using
   FairScheduler) OPEN: Ditto
   - YARN-4827 (Document configuration of ReservationSystem for
   FairScheduler) OPEN: Ditto

Features merged for GA:

   - Erasure coding
      - People are looking more at the flaky tests and nice-to-haves
      - Some bugs reported and being fixed based on testing at Cloudera
      - Need to finish the 3.0 must-do's.
   - Addressing incompatible changes (YARN-6142 and HDFS-11096)
      - Sean has posted a new rev of the rolling upgrade script
      - Some YARN PB backward compat issues that we decided weren't
      blockers and are scheduled for GA
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13917 (Ensure nightly builds run the integration tests for
      the shaded client): Resolved, Sean retriggered and determined that
      this works.
      - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714)
      - A few subtasks are targeted at GA
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

Unmerged features:

   - Resource profiles (YARN-3926 and YARN-7069) (Wangda Tan)
      - This has been merged for 3.1.0, YARN-7069 tracks follow-on work
      - ~7 patch available subtasks, I asked Wangda to set up a JIRA query
      for tracking this
   - HDFS router-based federation (HDFS-10467) (Inigo Goiri and Chris
   Douglas)
      - Inigo sent out the merge vote
   - API-based scheduler configuration (Jonathan Hung)
      - Jonathan sent out a discuss thread for merge, thinking is early
      next week. Larry did a security-oriented review.
   - YARN native services (YARN-5079) (Jian He)
      - Subtasks were filed to address Allen's review comments from the
      previous merge vote, only one pending
      - We need to confirm with Allen that this is ready to go, he hasn't
      been reviewing


Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-03 Thread Andrew Wang
Thanks for all the votes thus far! We've gotten the binding +1's to close
the release, though are there contributors who could kick the tires on
S3Guard and YARN TSv2 alpha2? These are the two new features merged since
alpha4, so it'd be good to get some coverage.



On Tue, Oct 3, 2017 at 9:45 AM, Brahma Reddy Battula 
wrote:

>
> Thanks Andrew.
>
> +1 (non binding)
>
> --Built from source
> --installed 3 node HA cluster
> --Verified shell commands and UI
> --Ran wordcount/pic jobs
>
>
>
>
> On Fri, 29 Sep 2017 at 5:34 AM, Andrew Wang 
> wrote:
>
>> Hi all,
>>
>> Let me start, as always, by thanking the many, many contributors who
>> helped
>> with this release! I've prepared an RC0 for 3.0.0-beta1:
>>
>> http://home.apache.org/~wang/3.0.0-beta1-RC0/
>>
>> This vote will run five days, ending on Oct 3rd at 5PM Pacific.
>>
>> beta1 contains 576 fixed JIRA issues comprising a number of bug fixes,
>> improvements, and feature enhancements. Notable additions include the
>> addition of YARN Timeline Service v2 alpha2, S3Guard, completion of the
>> shaded client, and HDFS erasure coding pluggable policy support.
>>
>> I've done the traditional testing of running a Pi job on a pseudo cluster.
>> My +1 to start.
>>
>> We're working internally on getting this run through our integration test
>> rig. I'm hoping Vijay or Ray can ring in with a +1 once that's complete.
>>
>> Best,
>> Andrew
>>
> --
>
>
>
> --Brahma Reddy Battula
>


Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-03 Thread Andrew Wang
Thanks everyone for voting! With 4 binding +1s and 7 non-binding +1s, the
vote passes.

I'll get started on pushing out the release.

Best,
Andrew

On Tue, Oct 3, 2017 at 3:45 PM, Aaron Fabbri  wrote:

> +1
>
> Built from source.  Ran S3A integration tests in us-west-2 with S3Guard
> (both Local and Dynamo metadatastore).
>
> Everything worked fine except I hit one integration test failure.  It is a
> minor test issue IMO and I've filed HADOOP-14927
>
> Failed tests:
>   ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
> Expected an exception, got 0
>   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestroyNoBucket:228
> Expected an exception, got 0
>
>
>
> On Tue, Oct 3, 2017 at 2:45 PM, Ajay Kumar 
> wrote:
>
>> +1 (non-binding)
>>
>> - built from source
>> - deployed on single node cluster
>> - Basic hdfs operations
>> - Run wordcount on a text file
>> Thanks,
>> Ajay
>>
>>
>> On 10/3/17, 1:04 PM, "Eric Badger"  wrote:
>>
>> +1 (non-binding)
>>
>> - Verified all checksums and signatures
>> - Built native from source on macOS 10.12.6 and RHEL 7.1
>> - Deployed a single node pseudo cluster
>> - Ran pi and sleep jobs
>> - Verified Docker was marked as experimental
>>
>> Thanks,
>>
>> Eric
>>
>> On Tue, Oct 3, 2017 at 1:41 PM, John Zhuge 
>> wrote:
>>
>> > +1 (binding)
>> >
>> >- Verified checksums and signatures of all tarballs
>> >- Built source with native, Java 1.8.0_131-b11 on Mac OS X
>> 10.12.6
>> >- Verified cloud connectors:
>> >   - All S3A integration tests
>> >   - All ADL live unit tests
>> >- Deployed both binary and built source to a pseudo cluster,
>> passed the
>> >following sanity tests in insecure, SSL, and SSL+Kerberos mode:
>> >   - HDFS basic and ACL
>> >   - DistCp basic
>> >   - MapReduce wordcount (only failed in SSL+Kerberos mode for
>> binary
>> >   tarball, probably unrelated)
>> >   - KMS and HttpFS basic
>> >   - Balancer start/stop
>> >
>> > Hit the following errors but they don't seem to be blocking:
>> >
>> > == Missing dependencies during build ==
>> >
>> > > ERROR: hadoop-aliyun has missing dependencies: json-lib-jdk15.jar
>> > > ERROR: hadoop-azure has missing dependencies:
>> jetty-util-ajax-9.3.19.
>> > > v20170502.jar
>> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> okhttp-2.4.0.jar
>> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> okio-1.4.0.jar
>> >
>> >
>> > Filed HADOOP-14923, HADOOP-14924, and HADOOP-14925.
>> >
>> > == Unit tests failed in Kerberos+SSL mode for KMS and HttpFs
>> default HTTP
>> > servlet /conf, /stacks, and /logLevel ==
>> >
>> > One example below:
>> >
>> > >Connecting to
>> > > https://localhost:14000/logLevel?log=org.apache.hadoop.fs.
>> http.server.
>> > HttpFSServer
>> > >Exception in thread "main"
>> > > org.apache.hadoop.security.authentication.client.
>> > AuthenticationException:
>> > > Authentication failed, URL:
>> > > https://localhost:14000/logLevel?log=org.apache.hadoop.fs.
>> http.server.
>> > HttpFSServer&user.name=jzhuge,
>> > > status: 403, message: GSSException: Failure unspecified at
>> GSS-API level
>> > > (Mechanism level: Request is a replay (34))
>> >
>> >
>> > The /logLevel failure will affect command "hadoop daemonlog".
>> >
>> >
>> > On Tue, Oct 3, 2017 at 10:56 AM, Andrew Wang <
>> andrew.w...@cloudera.com>
>>     > wrote:
>> >
>> > > Thanks for all the votes thus far! We've gotten the binding +1's
>> to close
>> > > the release, though are there contributors who could kick the
>> tires on
>> > > S3Guard and YARN TSv2 alpha2? These are the two new features
>> merged since
>> > > alpha4, so it'd be good to get some coverage.
>> > >
>> > >
>> > >
>> > > 

Re: [VOTE] Release Apache Hadoop 3.0.0-beta1 RC0

2017-10-04 Thread Andrew Wang
Thanks for the additional review Rohith, much appreciated!

On Wed, Oct 4, 2017 at 12:14 AM, Rohith Sharma K S <
rohithsharm...@apache.org> wrote:

> +1 (binding)
>
> Built from source and deployed YARN HA cluster with ATSv2 enabled in
> non-secured cluster.
> - tested for RM HA/work-preservring-restart/ NM-work-preserving restart
> for ATSv2 entities.
> - verified all ATSv2 REST end points to retrieve the entities
> - ran sample MR jobs and distributed jobs
>
> Thanks & Regards
> Rohith Sharma K S
>
> On 4 October 2017 at 05:31, Andrew Wang  wrote:
>
>> Thanks everyone for voting! With 4 binding +1s and 7 non-binding +1s, the
>> vote passes.
>>
>> I'll get started on pushing out the release.
>>
>> Best,
>> Andrew
>>
>> On Tue, Oct 3, 2017 at 3:45 PM, Aaron Fabbri  wrote:
>>
>> > +1
>> >
>> > Built from source.  Ran S3A integration tests in us-west-2 with S3Guard
>> > (both Local and Dynamo metadatastore).
>> >
>> > Everything worked fine except I hit one integration test failure.  It
>> is a
>> > minor test issue IMO and I've filed HADOOP-14927
>> >
>> > Failed tests:
>> >   ITestS3GuardToolDynamoDB>AbstractS3GuardToolTestBase.testDe
>> stroyNoBucket:228
>> > Expected an exception, got 0
>> >   ITestS3GuardToolLocal>AbstractS3GuardToolTestBase.testDestr
>> oyNoBucket:228
>> > Expected an exception, got 0
>> >
>> >
>> >
>> > On Tue, Oct 3, 2017 at 2:45 PM, Ajay Kumar 
>> > wrote:
>> >
>> >> +1 (non-binding)
>> >>
>> >> - built from source
>> >> - deployed on single node cluster
>> >> - Basic hdfs operations
>> >> - Run wordcount on a text file
>> >> Thanks,
>> >> Ajay
>> >>
>> >>
>> >> On 10/3/17, 1:04 PM, "Eric Badger"  wrote:
>> >>
>> >> +1 (non-binding)
>> >>
>> >> - Verified all checksums and signatures
>> >> - Built native from source on macOS 10.12.6 and RHEL 7.1
>> >> - Deployed a single node pseudo cluster
>> >> - Ran pi and sleep jobs
>> >> - Verified Docker was marked as experimental
>> >>
>> >> Thanks,
>> >>
>> >> Eric
>> >>
>> >> On Tue, Oct 3, 2017 at 1:41 PM, John Zhuge 
>> >> wrote:
>> >>
>> >> > +1 (binding)
>> >> >
>> >> >- Verified checksums and signatures of all tarballs
>> >> >- Built source with native, Java 1.8.0_131-b11 on Mac OS X
>> >> 10.12.6
>> >> >- Verified cloud connectors:
>> >> >   - All S3A integration tests
>> >> >   - All ADL live unit tests
>> >> >- Deployed both binary and built source to a pseudo cluster,
>> >> passed the
>> >> >following sanity tests in insecure, SSL, and SSL+Kerberos
>> mode:
>> >> >   - HDFS basic and ACL
>> >> >   - DistCp basic
>> >> >   - MapReduce wordcount (only failed in SSL+Kerberos mode for
>> >> binary
>> >> >   tarball, probably unrelated)
>> >> >   - KMS and HttpFS basic
>> >> >   - Balancer start/stop
>> >> >
>> >> > Hit the following errors but they don't seem to be blocking:
>> >> >
>> >> > == Missing dependencies during build ==
>> >> >
>> >> > > ERROR: hadoop-aliyun has missing dependencies:
>> json-lib-jdk15.jar
>> >> > > ERROR: hadoop-azure has missing dependencies:
>> >> jetty-util-ajax-9.3.19.
>> >> > > v20170502.jar
>> >> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> >> okhttp-2.4.0.jar
>> >> > > ERROR: hadoop-azure-datalake has missing dependencies:
>> >> okio-1.4.0.jar
>> >> >
>> >> >
>> >> > Filed HADOOP-14923, HADOOP-14924, and HADOOP-14925.
>> >> >
>> >> > == Unit tests failed in Kerberos+SSL mode for KMS and HttpFs
>> >> default HTTP
>> >> > servlet /conf, /stacks, and /logLevel ==
>> >> >
>> >> > One example below:
>> >> >
>> &

2017-10-06 Hadoop 3 release status update

2017-10-06 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-06

The beta1 RC0 vote passed, and beta1 is out! Now tracking GA features.

Highlights:

   - 3.0.0-beta1 has been released!
   - Router-based federation merge vote should be about to pass
   - API-based scheduler configuration merge vote is out, has the votes so
   far

Red flags:

   - Still need to nail down whether we're going to try and merge resource
   profiles. I've been emailing with Wangda and Daniel about this, we need to
   reach a decision ASAP (might already be too late).
   - Still waiting on Allen to review YARN native services feature.

Previously tracked GA blockers that have been resolved or dropped:

   - YARN-7134 (AppSchedulingInfo has a dependency on capacity scheduler)
   OPEN: Wangda downgraded this to "Major", dropping it from the list.

GA blockers:

   - YARN-6623 (Add support to turn off launching privileged containers in
   the container-executor) PATCH AVAILABLE: Actively being reviewed
   - Change of ExecutionType
      - YARN-7275 (NM Statestore cleanup for Container updates) PATCH
      AVAILABLE: Kartheek has posted a patch, waiting for review
      - YARN-7178 (Add documentation for Container Update API) OPEN: No
      update from Arun, though it's just a docs patch
   - ReservationSystem
      - YARN-4859 ([Bug] Unable to submit a job to a reservation when using
      FairScheduler) OPEN: Yufei has picked this up
      - YARN-4827 (Document configuration of ReservationSystem for
      FairScheduler) OPEN: Yufei has picked this up, just a docs patch
   - Rolling upgrade
      - YARN-6142 (Support rolling upgrade between 2.x and 3.x) OPEN: Ray
      is still going through JACC and proto output
      - HDFS-11096 (Support rolling upgrade between 2.x and 3.x) PATCH
      AVAILABLE: Sean has revved the patch and is waiting on reviews from
      Ray, Allen

Features merged for GA:

   - Erasure coding
      - Continued bug reporting and fixing based on testing at Cloudera.
      - Still need to finish the 3.0 must-do's
   - Classpath isolation (HADOOP-11656)
      - HADOOP-14771 is still floating, along with adding documentation.
   - Compat guide (HADOOP-13714)
      - Synced with Daniel, he plans to wrap up the remaining stuff next
      week
   - TSv2 alpha 2
      - This was merged, no problems thus far :)

Unmerged features:

   - Resource types / profiles (YARN-3926 and YARN-7069) (Wangda Tan)
      - This has been merged for 3.1.0, YARN-7069 tracks follow-on work
      - Wangda said that he's okay waiting for 3.1.0 for this, we're
      waiting on Daniel. I synced with Daniel earlier this week, and he
      wants to try and get some of it into 3.0.0. Waiting on an update.
      - I still need a JIRA query for tracking the state of this.
   - HDFS router-based federation (HDFS-10467) (Inigo Goiri and Chris
   Douglas)
      - Merge vote should close any minute now
   - API-based scheduler configuration (Jonathan Hung)
      - Merge vote is out, will close next week
   - YARN native services (YARN-5079) (Jian He)
      - Subtasks were filed to address Allen's review comments from the
      previous merge vote, only one pending
      - We need to confirm with Allen that this is ready to go, he hasn't
      been reviewing


Re: 2017-10-06 Hadoop 3 release status update

2017-10-06 Thread Andrew Wang
Thanks for the update Allen, appreciate your continued help reviewing this
feature.

Looking at the calendar, we have three weeks from when we want to have GA
RC0 out for vote. We're already dipping into code freeze time landing HDFS
router-based federation and API-based scheduler configuration next week. If
we want to get any more features in, it means slipping the GA date.

So, my current thinking is that we should draw a line after these pending
branches merge. Like before, I'm willing to bend on this if there are
strong arguments, but the quality bar is even higher than it was for beta1,
and we've still got plenty of other blockers/criticals to work on for GA.

If you feel differently, please reach out, I can make myself very available
next week for a call.

Best,
Andrew

On Fri, Oct 6, 2017 at 3:12 PM, Allen Wittenauer 
wrote:

>
> > On Oct 6, 2017, at 1:31 PM, Andrew Wang 
> wrote:
> >
> >   - Still waiting on Allen to review YARN native services feature.
>
> Fake news.
>
> I’m still -1 on it, at least prior to a patch that posted late
> yesterday. I’ll probably have a chance to play with it early next week.
>
>
> Key problems:
>
> * still haven’t been able to bring up dns daemon due to lacking
> documentation
>
> * it really needs better naming and command structures.  When put
> into the larger YARN context, it’s very problematic:
>
> $ yarn —daemon start resourcemanager
>
> vs.
>
> $ yarn —daemon start apiserver
>
> if you awoke from a deep sleep from inside a cave, which
> one would you expect to “start YARN”? Made worse that the feature is
> called “YARN services” all over the place.
>
> $ yarn service foo
>
> … what does this even mean?
>
> It would be great if other outsiders really looked hard at this
> branch to give the team feedback.   Once it gets released, it’s gonna be
> too late to change it….
>
> As a sidenote:
>
> It’d be great if the folks working on YARN spent some time
> consolidating daemons.  With this branch, it now feels like we’re
> approaching the double digit area of daemons to turn on all the features.
> It’s well past ridiculous, especially considering we still haven’t replaced
> the MRJHS’s feature set to the point we can turn it off.
>
>


2017-10-20 Hadoop 3 release status update

2017-10-20 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-20

Apologies for skipping the update last week. Here's how we're tracking for
GA.

Highlights:

   - Merge of HDFS router-based federation and API-based scheduler
   configuration with no reported problems. Kudos to the contributors involved!

Red flags:

   - We're making a last-minute push to get resource types (but not
   resource profiles) in. Coming this late, it's a risk, but we decided
   it's worthwhile for this feature. See Daniel's yarn-dev email for the
   full rationale.
   - Still uncovering EC bugs from testing

Previously tracked GA blockers that have been resolved or dropped:

   - YARN-6623 (Add support to turn off launching privileged containers in
   the container-executor) RESOLVED: Committed and resolved
   - Change of ExecutionType
      - YARN-7275 (NM Statestore cleanup for Container updates) RESOLVED:
      Patch committed, resolved.
   - ReservationSystem
      - YARN-4859 ([Bug] Unable to submit a job to a reservation when using
      FairScheduler) RESOLVED: Yufei tested this and found things mostly
      worked, filed two non-blocker follow-ons: YARN-7347 (Fix the bug in
      Fair scheduler to handle a queue named "root.root") OPEN and
      YARN-7348 (Ignore the vcore in reservation request for fair policy
      queue) OPEN

GA blockers:

   - Change of ExecutionType
      - YARN-7178 (Add documentation for Container Update API) OPEN: Still
      no update from Arun, I pinged it.
   - ReservationSystem
      - YARN-4827 (Document configuration of ReservationSystem for
      FairScheduler) OPEN: Yufei said he'd work on it as of 2 days ago
   - Rolling upgrade
      - YARN-6142 (Support rolling upgrade between 2.x and 3.x) OPEN: I
      pinged this and asked for a status update
      - HDFS-11096 (Support rolling upgrade between 2.x and 3.x) PATCH
      AVAILABLE: I pinged this and asked for a status update
   - Erasure coding
      - HDFS-12682 (ECAdmin -listPolicies will always show policy state as
      DISABLED) OPEN: New blocker filed this week, Xiao is working on it
      - HDFS-12686 (Erasure coding system policy state is not correctly
      saved and loaded during real cluster restart) OPEN: New blocker filed
      this week, Sammi is on it
      - HDFS-12686 (Erasure coding system policy state is not correctly
      saved and loaded during real cluster restart) OPEN: Old blocker,
      Huafeng is on it, waiting on review from Wei-Chiu or Sammi

Features merged for GA:

   - Erasure coding
      - Continued bug reporting and fixing based on testing at Cloudera.
      - Two new blockers filed this week, mentioned above.
      - Huafeng completed patch to reenable disabled EC tests
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13916 (Document how downstream clients should make use of
      the new shaded client artifacts) IN PROGRESS: I pinged it
   - Compat guide (HADOOP-13714)
      - HADOOP-14876 (Create downstream developer docs from the
      compatibility guidelines) PATCH AVAILABLE: Daniel has a patch up,
      revved based on Steve's review feedback, waiting on Steve's reply
      - HADOOP-14875 (Create end user documentation from the compatibility
      guidelines) OPEN: No patch yet
   - TSv2 alpha 2
      - This was merged, no problems thus far :)
   - API-based scheduler configuration (YARN-5734, OrgQueue for easy
   CapacityScheduler queue configuration management) RESOLVED
      - Merged, no problems thus far :)
   - HDFS router-based federation (HDFS-10467)

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Andrew Wang
FWIW we've been running branch-3.0 unit tests successfully internally,
though we have separate jobs for Common, HDFS, YARN, and MR. The failures
here are probably a property of running everything in the same JVM, which
I've found problematic in the past due to OOMs.
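
For what it's worth, the standard mitigation when a whole-project run
OOMs is to give each test class its own forked JVM with a capped heap
instead of letting state accumulate in one long-lived JVM; a minimal
Surefire sketch (flag values are illustrative, and Hadoop's poms may
wire these properties differently):

    # fork a fresh JVM per test class and cap its heap
    $ mvn test -pl hadoop-hdfs-project/hadoop-hdfs \
        -DforkCount=1 -DreuseForks=false \
        -DargLine="-Xmx2g -XX:+HeapDumpOnOutOfMemoryError"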

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer 
wrote:

>
> My plan is currently to:
>
> *  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's
> > causing memory to balloon and then we can work out the appropriate
> > remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point,
> > would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> >> In general, the "solid evidence" of memory leak comes from analysis of
> heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which
> piece of code are leaking memory from the analysis.
> >>
> >> Unfortunately, I cannot find any conclusion from previous comments and
> it even cannot tell which daemons/components of HDFS consumes unexpected
> high memory. Don't sounds like a solid bug report to me.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Junping
> >>
> >>
> >> 
> >> From: Sean Busbey 
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather than
> library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du  > wrote:
> >> Allen,
> >> Do we have any solid evidence to show the HDFS unit tests going
> through the roof are due to serious memory leak by HDFS? Normally, I don't
> expect memory leak are identified in our UTs - mostly, it (test jvm gone)
> is just because of test or deployment issues.
> >> Unless there is concrete evidence, my concern on seriously memory
> leak for HDFS on 2.8 is relatively low given some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 on large production environment for
> months. Non-serious memory leak (like forgetting to close stream in
> non-critical path, etc.) and other non-critical bugs always happens here
> and there that we have to live with.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <
> a...@effectivemachines.com> wrote:
> >>>
> >>>
> >>>
> >>> With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that's what this "feels" like.
> >>>
> >>> Someone should verify if 2.8.2 has the same issues before a release
> goes out ...
> >>
> >>
> >>FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >>Also: the node didn't die!  Looking through the workspace (so
> the next run will destroy them), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-
> linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >>and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-
> linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >>It looks like my hunch is correct:  RAM in the HDFS unit tests
> are going through the roof.  It's also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
> >>
> >>Anyway, at the point, branch-2.8 and higher are probably
> fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
> >>
> >>
> >>
> >> --

2017-10-31 Hadoop 3 release status update

2017-10-31 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-10-31

Lots of progress towards GA, we look on track for cutting RC0 this week. I
ran the versions script to check the branch matches up with JIRA and fixed
things up, and also checked that the changelog and release notes look
reasonable.

Highlights:

   - Resource types vote has passed and will be merged with branch-3.0
   shortly.
   - Down to three blockers on the dashboard, all being actively revved.

Red flags:

   - Still need to validate that resource types is ready to go once it's
   merged.

Previously tracked GA blockers that have been resolved or dropped:

   - Change of ExecutionType
      - YARN-7178 (Add documentation for Container Update API) RESOLVED:
      Arun got the patch in with reviews from Wangda and Haibo.
   - ReservationSystem
      - YARN-4827 (Document configuration of ReservationSystem for
      FairScheduler) RESOLVED: Yufei and Subru got this in.
   - Rolling upgrade
      - YARN-6142 (Support rolling upgrade between 2.x and 3.x) RESOLVED:
      Ray resolved this since we think it's sufficiently complete.
   - Erasure coding
      - HDFS-12686 (Erasure coding system policy state is not correctly
      saved and loaded during real cluster restart) RESOLVED: Resolved this
      one to incorporate it in HDFS-12682

GA blockers:

   - Rolling upgrade
      - HDFS-11096 (Support rolling upgrade between 2.x and 3.x) PATCH
      AVAILABLE: I asked Sean if we can downgrade this from blocker
   - Erasure coding
      - HDFS-12682 (ECAdmin -listPolicies will always show
      SystemErasureCodingPolicies state as DISABLED) PATCH AVAILABLE:
      Actively being worked on and reviewed, should be in soon
      - HDFS-11467 (Support ErasureCoding section in OIV XML/ReverseXML)
      PATCH AVAILABLE: Waiting on HDFS-12682, I asked if we can work
      concurrently

Features merged for GA:

   - Erasure coding
      - Testing is still ongoing at Cloudera, no new bugs found recently
      - Closing on remaining blockers for GA
   - Classpath isolation (HADOOP-11656)
      - HADOOP-13916 (Document how downstream clients should make use of
      the new shaded client artifacts) OPEN: Seems unlikely to make it
   - Compat guide (HADOOP-13714)
      - HADOOP-14876 (Create downstream developer docs from the
      compatibility guidelines) PATCH AVAILABLE: Patch is being actively
      revved and reviewed, Robert +1'd, Anu posted a big review
      - HADOOP-14875 (Create end user documentation from the compatibility
      guidelines) PATCH AVAILABLE: No patch yet
   - TSv2 alpha 2
      - This was merged, no problems thus far :)
   - API-based scheduler configuration (YARN-5734, OrgQueue for easy
   CapacityScheduler queue configuration management) RESOLVED
      - Merged, no problems thus far :)
   - HDFS router-based federation (HDFS-10467, Router-based HDFS
   federation) RESOLVED
      - Merged, no problems thus far :)
   - Resource types (YARN-3926, Extend the YARN resource model for easier
   resource-type management and profiles) RESOLVED
      - Vote has passed, Daniel is currently doing the mechanics of merging
      - Need to also perform final validation post-merge

Dropping the "unmerged features" section since we're not letting in
anything else at this point.


Re: [DISCUSS] A final minor release off branch-2?

2017-11-06 Thread Andrew Wang
What are the known gaps that need bridging between 2.x and 3.x?

>From an HDFS perspective, we've tested wire compat, rolling upgrade, and
rollback.

>From a YARN perspective, we've tested wire compat and rolling upgrade. Arun
just mentioned an NM rollback issue that I'm not familiar with.

Anything else? External to this discussion, these should be documented as
known issues for 3.0.

Best.
Andrew

On Sun, Nov 5, 2017 at 1:46 PM, Arun Suresh  wrote:

> Thanks for starting this discussion VInod.
>
> I agree (C) is a bad idea.
> I would prefer (A) given that ATM, branch-2 is still very close to
> branch-2.9 - and it is a good time to make a collective decision to lock
> down commits to branch-2.
>
> I think we should also clearly define what the 'bridging' release should
> be.
> I assume it means the following:
> * Any 2.x user wanting to move to 3.x must first upgrade to the bridging
> release first and then upgrade to the 3.x release.
> * With regard to state store upgrades (at least NM state stores) the
> bridging state stores should be aware of all new 3.x keys so the implicit
> assumption would be that a user can only rollback from the 3.x release to
> the bridging release and not to the old 2.x release.
> * Use the opportunity to clean up deprecated API ?
> * Do we even want to consider a separate bridging release for 2.7, 2.8 an
> 2.9 lines ?
>
> Cheers
> -Arun
>
> On Fri, Nov 3, 2017 at 5:07 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org>
> wrote:
>
> > Hi all,
> >
> > With 3.0.0 GA around the corner (tx for the push, Andrew!), 2.9.0 RC out
> > (tx Arun / Subru!) and 2.8.2 (tx Junping!), I think it's high time we
> have
> > a discussion on how we manage our developmental bandwidth between 2.x
> line
> > and 3.x lines.
> >
> > Once 3.0 GA goes out, we will have two parallel and major release lines.
> > The last time we were in this situation was back when we did 1.x -> 2.x
> > jump.
> >
> > The parallel releases implies overhead of decisions, branch-merges and
> > back-ports. Right now we already do backports for 2.7.5, 2.8.2, 2.9.1,
> > 3.0.1 and potentially a 3.1.0 in a few months after 3.0.0 GA. And many of
> > these lines - for e.g 2.8, 2.9 - are going to be used for a while at a
> > bunch of large sites! At the same time, our users won't migrate to 3.0 GA
> > overnight - so we do have to support two parallel lines.
> >
> > I propose we start thinking of the fate of branch-2. The idea is to have
> > one final release that helps our users migrate from 2.x to 3.x. This
> > includes any changes on the older line to bridge compatibility issues,
> > upgrade issues, layout changes, tooling etc.
> >
> > We have a few options I think
> >  (A)
> > -- Make 2.9.x the last minor release off branch-2
> > -- Have a maintenance release that bridges 2.9 to 3.x
> > -- Continue to make more maintenance releases on 2.8 and 2.9 as
> > necessary
> > -- All new features obviously only go into the 3.x line as no
> features
> > can go into the maint line.
> >
> >  (B)
> > -- Create a new 2.10 release which doesn't have any new features, but
> > as a bridging release
> > -- Continue to make more maintenance releases on 2.8, 2.9 and 2.10 as
> > necessary
> > -- All new features, other than the bridging changes, go into the 3.x
> > line
> >
> >  (C)
> > -- Continue making branch-2 releases and postpone this discussion for
> > later
> >
> > I'm leaning towards (A) or to a lesser extent (B). Willing to hear
> > otherwise.
> >
> > Now, this obviously doesn't mean blocking of any more minor releases on
> > branch-2. Obviously, any interested committer / PMC can roll up his/her
> > sleeves, create a release plan and release, but we all need to
> acknowledge
> > that versions are not cheap and figure out how the community bandwidth is
> > split overall.
> >
> > Thanks
> > +Vinod
> > PS: The proposal is obviously not to force everyone to go in one
> direction
> > but more of a nudging the community to figure out if we can focus a major
> > part of of our bandwidth on one line. I had a similar concern when we
> were
> > doing 2.8 and 3.0 in parallel, but the impending possibility of spreading
> > too thin is much worse IMO.
> > PPS: (C) is a bad choice. With 2.8 and 2.9 we are already seeing user
> > adoption splintering between two lines. With 2.10, 2.11 etc coexisting
> with
> > 3.0, 3.1 etc, we will revisit the mad phase years ago when we had 0.20.x,
> > 0.20-security coexisting with 0.21, 0.22 etc.
>
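
Since much of this discussion hinges on how smooth the 2.x-to-3.x
rolling upgrade actually is, here is a minimal sketch of the HDFS side
of the flow that was tested (standard dfsadmin/namenode subcommands; see
the HDFS rolling upgrade documentation for the exact NameNode and
DataNode restart ordering):

    # on the running 2.x cluster: create a rollback fsimage first
    $ hdfs dfsadmin -rollingUpgrade prepare
    $ hdfs dfsadmin -rollingUpgrade query      # repeat until the image is ready
    # restart each NameNode on the 3.x binaries with:
    $ hdfs namenode -rollingUpgrade started
    # (DataNodes are then restarted on the 3.x binaries in batches)
    # once the upgraded cluster checks out, finalize; rollback is no
    # longer possible after this point:
    $ hdfs dfsadmin -rollingUpgrade finalize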


Heads up: branching branch-3.0.0 for GA

2017-11-14 Thread Andrew Wang
Hi folks,

We've resolved all the blockers for 3.0.0 and the release notes and
changelog look good, so I'm going to cut the branch and get started on the
RC.

* branch-3.0 will advance to 3.0.1-SNAPSHOT
* branch-3.0.0 will go to 3.0.0

Please keep this in mind when committing.

Cheers,
Andrew
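
For the curious, the mechanics of a branch cut like this boil down to a
handful of commands; a sketch using the stock versions plugin (Hadoop's
actual release tooling lives in dev-support and differs in detail):

    # cut the release branch off branch-3.0
    $ git checkout branch-3.0
    $ git checkout -b branch-3.0.0
    $ mvn versions:set -DnewVersion=3.0.0            # release branch goes to 3.0.0
    $ git checkout branch-3.0
    $ mvn versions:set -DnewVersion=3.0.1-SNAPSHOT   # branch-3.0 advances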


Re: Heads up: branching branch-3.0.0 for GA

2017-11-14 Thread Andrew Wang
Branching is complete. Please use the 3.0.1 fix version for further commits
to branch-3.0. Ping me if you want something in branch-3.0.0 since I'm
rolling RC0 now.

On Tue, Nov 14, 2017 at 11:08 AM, Andrew Wang 
wrote:

> Hi folks,
>
> We've resolved all the blockers for 3.0.0 and the release notes and
> changelog look good, so I'm going to cut the branch and get started on the
> RC.
>
> * branch-3.0 will advance to 3.0.1-SNAPSHOT
> * branch-3.0.0 will go to 3.0.0
>
> Please keep this in mind when committing.
>
> Cheers,
> Andrew
>


[VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-14 Thread Andrew Wang
Hi folks,

Thanks as always to the many, many contributors who helped with this
release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
available here:

http://people.apache.org/~wang/3.0.0-RC0/

This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.

3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
additions include the merge of YARN resource types, API-based configuration
of the CapacityScheduler, and HDFS router-based federation.

I've done my traditional testing with a pseudo cluster and a Pi job. My +1
to start.

Best,
Andrew


Re: [DISCUSS] A final minor release off branch-2?

2017-11-14 Thread Andrew Wang
To follow up on my earlier email, I don't think there's need for a bridge
release given that we've successfully tested rolling upgrade from 2.x to
3.0.0. I expect we'll keep making improvements to smooth over any
additional incompatibilities found, but there isn't a requirement that a
user upgrade to a bridge release before upgrading to 3.0.

Otherwise, I don't have a strong opinion about when to discontinue branch-2
releases. Historically, a release line is maintained until interest in it
wanes. If the maintainers are taking care of the backports, it's not much
work for the rest of us to vote on the RCs.

Best,
Andrew

On Mon, Nov 13, 2017 at 4:19 PM, Wangda Tan  wrote:

> Thanks Vinod for staring this,
>
> I'm also leaning towards the plan (A):
>
>
>
>
> * (A)-- Make 2.9.x the last minor release off branch-2-- Have a
> maintenance release that bridges 2.9 to 3.x-- Continue to make more
> maintenance releases on 2.8 and 2.9 as necessary*
>
> The only part I'm not sure is having a separate bridge release other than
> 3.x.
>
> For the bridge release, Steve's suggestion sounds more doable:
>
> ** 3.1+ for new features*
> ** fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation*
> ** whoever puts their hand up to do 2.x releases deserves support in
> testing &c*
> ** If someone makes a really strong case to backport a feature from 3.x to
> branch-2 and its backwards compatible, I'm not going to stop them. It's
> just once 3.0 is out and a 3.1 on the way, it's less compelling*
>
> This makes community can focus on 3.x releases and fill whatever gaps of
> migrating from 2.x to 3.x.
>
> Best,
> Wangda
>
>
> On Wed, Nov 8, 2017 at 3:57 AM, Steve Loughran 
> wrote:
>
>>
>> > On 7 Nov 2017, at 19:08, Vinod Kumar Vavilapalli 
>> wrote:
>> >
>> >
>> >
>> >
>> >> Frankly speaking, working on some bridging release not targeting any
>> feature isn't so attractive to me as a contributor. Overall, the final
>> minor release off branch-2 is good, we should also give 3.x more time to
>> evolve and mature, therefore it looks to me we would have to work on two
>> release lines meanwhile for some time. I'd like option C), and suggest we
>> focus on the recent releases.
>> >
>> >
>> >
>> > Answering this question is also one of the goals of my starting this
>> thread. Collectively we need to conclude if we are okay or not okay with no
>> longer putting any new feature work in general on the 2.x line after 2.9.0
>> release and move over our focus into 3.0.
>> >
>> >
>> > Thanks
>> > +Vinod
>> >
>>
>>
>> As a developer of new features (e.g the Hadoop S3A committers), I'm
>> mostly already committed to targeting 3.1; the code in there to deal with
>> failures and retries has unashamedly embraced java 8 lambda-expressions in
>> production code: backporting that is going to be traumatic in terms of
>> IDE-assisted code changes and the resultant diff in source between branch-2
>> & trunk. What's worse, its going to be traumatic to test as all my JVMs
>> start with an 8 at the moment, and I'm starting to worry about whether I
>> should bump a windows VM up to Java 9 to keep an eye on Akira's work there.
>> Currently the only testing I'm really doing on java 7 is yetus branch-2 &
>> internal test runs.
>>
>>
>> 3.0 will be out the door, and we can assume that CDH will ship with it
>> soon (*)  which will allow for a rapid round trip time on inevitable bugs:
>> 3.1 can be the release with compatibility tuned, those reported issues
>> addressed. It's certainly where I'd like to focus.
>>
>>
>> At the same time: 2.7.2-2.8.x are the broadly used versions, we can't
>> just say "move to 3.0" & expect everyone to do it, not given we have
>> explicitly got backwards-incompatible changes in. I don't see people
>> rushing to do it until the layers above are all qualified (HBase, Hive,
>> Spark, ...). Which means big users of 2.7/2.8 won't be in a rush to move
>> and we are going to have to maintain 2.x for a while, including security
>> patches for old versions. One issue there: what if a patch (such as bumping
>> up a JAR version) is incompatible?
>>
>> For me then
>>
>> * 3.1+ for new features
>> * fixes to 3.0.x &, where appropriate, 2.9, esp feature stabilisation
>> * whoever puts their hand up to do 2.x releases deserves support in
>> testing &c
>> * If someone makes a really strong case to backport a feature from 3.x to
>> branch-2 and it's backwards compatible, I'm not going to stop them. It's
>> just once 3.0 is out and a 3.1 is on the way, it's less compelling
>>
>> -Steve
>>
>> Note: I'm implicitly assuming a timely 3.1 out the door with my work
>> included, and all issues arriving from 3.0 fixed. We can worry when 3.1
>> ships whether there's any benefit in maintaining a 3.0.x, or whether it's
>> best to say "move to 3.1"
>>
>>
>>
>> (*) just a guess based on the effort & test reports of Andrew & others
>>
>>

Re: [DISCUSS] A final minor release off branch-2?

2017-11-15 Thread Andrew Wang
Hi Junping,

On Wed, Nov 15, 2017 at 1:37 AM, Junping Du  wrote:

> Thanks Vinod to bring up this discussion, which is just in time.
>
> I agree with most responses that option C is not a good choice as our
> community bandwidth is precious and we should focus on very limited
> mainstream branches to develop, test and deployment. Of course, we should
> still follow Apache way to allow any interested committer for rolling up
> his/her own release given specific requirement over the mainstream releases.
>
> I am not biased on option A or B (I will discuss this later), but I think
> a bridge release for upgrading to and back from 3.x is very necessary.
> The reasons are obvious:
> 1. Given the lessons learned from the migration from 1.x to 2.x, no matter
> how careful we tend to be, there is still a chance that some level of
> compatibility (source, binary, configuration, etc.) gets broken in the
> migration to a new major release. Some of these incompatibilities can
> only be identified at runtime, after the GA release is widely deployed in
> production clusters - we have tons of downstream projects and numerous
> configurations, and we cannot cover them all with in-house deployment and
> test.
>

Source and binary compatibility are not required for 3.0.0. It's a new
major release, and there are known, documented incompatibilities in this
regard.

That said, we've done far, far more in this regard compared to previous
major or minor releases. We've compiled all of CDH against Hadoop 3 and run
our suite of system tests for the platform. We've been testing in this way
since 3.0.0-alpha1 and found and fixed plenty of source and binary
compatibility issues during the alpha and beta process. Many of these fixes
trickled down into 2.8 and 2.9.

>
> 2. From recent classpath isolation work, I was surprised to find out that
> many of our downstream projects (HBase, Tez, etc.) are still consuming many
> non-public, server-side APIs of Hadoop, to say nothing of the
> projects/products outside the Hadoop ecosystem. Our API compatibility tests
> do not (and should not) cover these cases. We can claim that a new major
> release shouldn't be responsible for these private API changes. But given
> the possibility of breaking existing applications in some way, users could
> be very hesitant to migrate to a 3.x release if there is no safe way to
> roll back.
>

This is true for 2.x releases as well. Similar to the previous answer,
we've compiled all of CDH against Hadoop 3, providing a much higher level
of assurance even compared to 2.x releases.

>
> 3. Besides incompatibilities, it is also possible to have performance
> regressions (lower throughput, higher latency, slower jobs, a bigger
> memory footprint, or even memory leaks, etc.) in new Hadoop releases.
> While the performance impact of migration (if any) could be negligible to
> some users, other users could be very sensitive and wish to roll back if it
> happens on their production cluster.
>

Yes, bugs exist. I won't claim that 3.0.0 is bug-free. All new releases
can potentially introduce new bugs.

However, I don't think rollback is the solution. In my experience, users
rarely rollback since it's so disruptive and causes data loss. It's much
more common that they patch and upgrade. With that in mind, I'd rather we
spend our effort on making 3.0.x high-quality vs. making it easier to
rollback.

The root of my concern in announcing a "bridge release" is that it
discourages users from upgrading to 3.0.0 until a bridge release is out. I
strongly believe the level of quality provided by 3.0.0 is at least equal
to new 2.x minor releases, given our extended testing and integration
process, and we don't have bridge releases for 2.x.

This is why I asked for a list of known issues with 2.x -> 3.0 upgrades,
that would necessitate a bridge release. Arun raised a concern about NM
rollback. Are there any other *known* issues?

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-17 Thread Andrew Wang
Thanks for the spot, normally create-release spits those out. I uploaded
asc and mds for the release artifacts.
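
For voters checking them, verification is roughly the following (file names
are illustrative, and this assumes my signing key is already imported):

    gpg --verify hadoop-3.0.0.tar.gz.asc hadoop-3.0.0.tar.gz
    gpg --print-mds hadoop-3.0.0.tar.gz | diff - hadoop-3.0.0.tar.gz.mds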

Best,
Andrew

On Thu, Nov 16, 2017 at 11:33 PM, Akira Ajisaka  wrote:

> Hi Andrew,
>
> Signatures are missing. Would you upload them?
>
> Thanks,
> Akira
>
>
> On 2017/11/15 6:34, Andrew Wang wrote:
>
>> Hi folks,
>>
>> Thanks as always to the many, many contributors who helped with this
>> release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
>> available here:
>>
>> http://people.apache.org/~wang/3.0.0-RC0/
>>
>> This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>>
>> 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
>> additions include the merge of YARN resource types, API-based
>> configuration
>> of the CapacityScheduler, and HDFS router-based federation.
>>
>> I've done my traditional testing with a pseudo cluster and a Pi job. My +1
>> to start.
>>
>> Best,
>> Andrew
>>
>>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-17 Thread Andrew Wang
Hi Arpit,

I agree the timing is not great here, but extending it to meaningfully
avoid the holidays would mean extending it an extra week (e.g. to the
29th). We've been coordinating with ASF PR for that Tuesday, so I'd really,
really like to get the RC out before then.

In terms of downstream testing, we've done extensive integration testing
with downstreams via the alphas and betas, and we have continuous
integration running at Cloudera against branch-3.0. Because of this, I have
more confidence in our integration for 3.0.0 than most Hadoop releases.

Is it meaningful to extend to say, the 21st, which provides for a full week
of voting?

Best,
Andrew

On Fri, Nov 17, 2017 at 1:27 PM, Arpit Agarwal 
wrote:

> Hi Andrew,
>
> Thank you for your hard work in getting us to this step. This is our first
> major GA release in many years.
>
> I feel a 5-day vote window ending over the weekend before Thanksgiving may
> not provide sufficient time to evaluate this RC especially for downstream
> components.
>
> Would you please consider extending the voting deadline until a few days
> after the Thanksgiving holiday? It would be a courtesy to our broader
> community and I see no harm in giving everyone a few days to evaluate it
> more thoroughly.
>
> On a lighter note, your deadline is also 4 minutes short of the required 5
> days. :)
>
> Regards,
> Arpit
>
>
>
> On 11/14/17, 1:34 PM, "Andrew Wang"  wrote:
>
> Hi folks,
>
> Thanks as always to the many, many contributors who helped with this
> release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
> available here:
>
> http://people.apache.org/~wang/3.0.0-RC0/
>
> This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>
> 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
> additions include the merge of YARN resource types, API-based
> configuration
> of the CapacityScheduler, and HDFS router-based federation.
>
> I've done my traditional testing with a pseudo cluster and a Pi job.
> My +1
> to start.
>
> Best,
> Andrew
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
Thanks for the spot Sangjin. I think this bug was introduced in create-release
by HADOOP-14835. The multi-pass maven build generates these dummy client
jars during the site build since skipShade is specified.
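
A quick check that would have caught this (the grep heuristic is just a
sketch, not part of the release tooling; a real shaded jar contains
thousands of classes, a dummy one none):

    jar tf share/hadoop/client/hadoop-client-api-3.0.0.jar | grep -c '\.class$'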

This might be enough to cancel the RC. Thoughts?

Best,
Andrew

On Mon, Nov 20, 2017 at 7:51 PM, Sangjin Lee  wrote:

> I checked the client jars that are supposed to contain shaded
> dependencies, and they don't look quite right:
>
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-api-3.0.0.jar
> -rw-r--r--  0 andrew andrew 44531 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-api-3.0.0.jar
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-runtime-3.0.0.jar
> -rw-r--r--  0 andrew andrew 45533 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-runtime-3.0.0.jar
> $ tar -tzvf hadoop-3.0.0.tar.gz | grep hadoop-client-minicluster-3.0.0.jar
> -rw-r--r--  0 andrew andrew 47015 Nov 14 11:53
> hadoop-3.0.0/share/hadoop/client/hadoop-client-minicluster-3.0.0.jar
>
> When I look at what's inside those jars, they only seem to include
> pom-related files with no class files. Am I missing something?
>
> When I build from the source with -Pdist, I do get much bigger jars:
> total 113760
> -rw-r--r--  1 sangjinlee  120039211  17055399 Nov 20 17:17
> hadoop-client-api-3.0.0.jar
> -rw-r--r--  1 sangjinlee  120039211  20451447 Nov 20 17:19
> hadoop-client-minicluster-3.0.0.jar
> -rw-r--r--  1 sangjinlee  120039211  20730866 Nov 20 17:18
> hadoop-client-runtime-3.0.0.jar
>
> Sangjin
>
> On Mon, Nov 20, 2017 at 5:52 PM, Sangjin Lee  wrote:
>
>>
>>
>> On Mon, Nov 20, 2017 at 5:26 PM, Vinod Kumar Vavilapalli <
>> vino...@apache.org> wrote:
>>
>>> Thanks for all the push, Andrew!
>>>
>>> Looking at the RC. Went through my usual check-list. Here's my summary.
>>> Will cast my final vote after comparing and validating my findings with
>>> others.
>>>
>>> Verification
>>>
>>>  - [Check] Successful recompilation from source tar-ball
>>>  - [Check] Signature verification
>>>  - [Check] Generating dist tarballs from source tar-ball
>>>  - [Check] Testing
>>> -- Start NN, DN, RM, NM, JHS, Timeline Service
>>> -- Ran dist-shell example, MR sleep, wordcount, randomwriter, sort,
>>> grep, pi
>>> -- Tested CLIs to print nodes, apps etc and also navigated UIs
>>>
>>> Issues found during testing
>>>
>>> Major
>>>  - The previously supported way of being able to use different tar-balls
>>> for different sub-modules is completely broken - common and HDFS tar.gz are
>>> completely empty.
>>>  - Cannot enable new UI in YARN because it is under a non-default
>>> compilation flag. It should be on by default.
>>>  - One decommissioned node in YARN ResourceManager UI always appears to
>>> start with, even when there are no NodeManagers that are started yet:  Info
>>> :-1, DECOMMISSIONED, null rack. It shows up only in the UI though, not
>>> in the CLI node -list
>>>
>>> Minor
>>>  - resourcemanager-metrics.out is going into current directory instead
>>> of log directory
>>>  - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't
>>> even work. Not just deprecated in favor of timelineserver as was advertised.
>>>  - Spurious warnings on CLI
>>> 17/11/20 17:07:34 INFO conf.Configuration:
>>> resource-types.xml not found
>>> 17/11/20 17:07:34 INFO resource.ResourceUtils: Unable to
>>> find 'resource-types.xml'.
>>>
>>> Side notes
>>>
>>>  - When did we stop putting CHANGES files into the source artifacts?
>>>  - Even after "mvn install"ing once, shading is repeated again and again
>>> for every new 'mvn install' even though there are no source changes - we
>>> should see how this can be avoided.
>>>  - Compatibility notes
>>> -- NM's env list is curtailed unlike in 2.x (For e.g,
>>> HADOOP_MAPRED_HOME is not automatically inherited. Correct behavior)
>>> -- Sleep is moved from hadoop-mapreduce-client-jobclient-3.0.0.jar
>>> into hadoop-mapreduce-client-jobclient-3.0.0-tests.jar
>>>
>>
>> Sleep has always been in the jobclient test jar as long as I can
>> remember, so it's not new for 3.0.
>>
>>
>>>
>>> Thanks
>>> +Vinod
>>>
>>> > On Nov 14, 2017, at 1:34 PM, Andrew Wang 
>>> wrote:
>>> >
>>> > Hi folks,
>>> >
>>> > Thanks as always to the many, many contributors who helped with this
>>> > release. I've created RC0 for Apache Hadoop 3.0.0. The artifacts are
>>> > available here:
>>> >
>>> > http://people.apache.org/~wang/3.0.0-RC0/
>>> >
>>> > This vote will run 5 days, ending on Nov 19th at 1:30pm Pacific.
>>> >
>>> > 3.0.0 GA contains 291 fixed JIRA issues since 3.0.0-beta1. Notable
>>> > additions include the merge of YARN resource types, API-based
>>> configuration
>>> > of the CapacityScheduler, and HDFS router-based federation.
>>> >
>>> > I've done my traditional testing with a pseudo cluster and a Pi job.
>>> My +1
>>> > to start.
>>> >
>>> > Best,
>>> > Andrew
>>>
>>>
>>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
Thanks for the thorough review Vinod, some inline responses:

*Issues found during testing*
>
> Major
>  - The previously supported way of being able to use different tar-balls
> for different sub-modules is completely broken - common and HDFS tar.gz are
> completely empty.
>

Is this something people use? I figured that the sub-tarballs were a relic
from the project split, and nowadays Hadoop is one project with one release
tarball. I actually thought about getting rid of these extra tarballs since
they add extra overhead to a full build.


>  - Cannot enable new UI in YARN because it is under a non-default
> compilation flag. It should be on by default.
>

The yarn-ui profile has always been off by default, AFAIK. It's documented
to turn it on in BUILDING.txt for release builds, and we do it in
create-release.
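
For reference, a sketch of what enabling it at build time looks like (the
exact flag combination here is an assumption based on BUILDING.txt):

    mvn package -Pdist,yarn-ui -DskipTests -Dtar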

IMO not a blocker. I think it's also more of a dev question (do we want to
do this on every YARN build?) than a release one.


>  - One decommissioned node in YARN ResourceManager UI always appears to
> start with, even when there are no NodeManagers that are started yet:
> Info :-1, DECOMMISSIONED, null rack. It shows up only in the UI though,
> not in the CLI node -list
>

Is this a blocker? Could we get a JIRA?

Thanks,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-20 Thread Andrew Wang
On Mon, Nov 20, 2017 at 9:59 PM, Sangjin Lee  wrote:

>
> On Mon, Nov 20, 2017 at 9:46 PM, Andrew Wang 
> wrote:
>
>> Thanks for the spot Sangjin. I think this bug was introduced in
>> create-release by HADOOP-14835. The multi-pass maven build generates these
>> dummy client jars during the site build since skipShade is specified.
>>
>> This might be enough to cancel the RC. Thoughts?
>>
>
> IMO yes. This was one of the key features mentioned in the 3.0 release
> notes. I appreciate your effort for the release Andrew!
>
>
Yea, I was leaning that way too. Let's cancel this RC. I hope to have a new
RC up tomorrow. With the upcoming holidays, we'll probably have to extend
the vote until mid-next week.

I'm also worried about the "mvn deploy" step since I thought it was safe to
specify skipShade there too. I'll check that as well.

Best,
Andrew


Re: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Andrew Wang
I'm against including new features in maintenance releases, since they're
meant to be bug-fix only.

If we're struggling with being able to deliver new features in a safe and
timely fashion, let's try to address that, not overload the meaning of
"maintenance release".

Best,
Andrew

On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai  wrote:

> Hi Junping,
>
> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>
> I have an ask: is it convenient to include the backport work for the OSS
> connector module? We have some Hadoop users who wish to have it by default
> for convenience, though in the past they used it by backporting it
> themselves. I have raised this and got thoughts from Chris and Steve. Looks
> like this is more wanted for 2.9, but I wanted to ask again here for broad
> feedback and thoughts while I have the chance. The backport patch is
> available for 2.8 and the one for branch-2 is already in. IMO, 2.8.x is
> promising as we can see some shift from 2.7.x, hence it's worth more
> important features and efforts. What do you think? Thanks!
>
> https://issues.apache.org/jira/browse/HADOOP-14964
>
> Regards,
> Kai
>
> -Original Message-
> From: Junping Du [mailto:j...@hortonworks.com]
> Sent: Tuesday, November 14, 2017 9:02 AM
> To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Apache Hadoop 2.8.3 Release Plan
>
> Hi,
> We have several important fixes landed on branch-2.8, and I would
> like to cut branch-2.8.3 now to start the 2.8.3 release work.
> So far, I don't see any pending blockers on 2.8.3, so my current plan
> is to cut the first RC of 2.8.3 in the next several days:
>  -  For all coming commits to land on branch-2.8, please mark the
> fix version as 2.8.4.
>  -  If there is a really important fix for 2.8.3 that is getting
> close, please notify me ahead before landing it on branch-2.8.3.
> Please let me know if you have any thoughts or comments on the plan.
>
> Thanks,
>
> Junping
> 
> From: dujunp...@gmail.com  on behalf of 俊平堵 <
> junping...@apache.org>
> Sent: Friday, October 27, 2017 3:33 PM
> To: gene...@hadoop.apache.org
> Subject: [ANNOUNCE] Apache Hadoop 2.8.2 Release.
>
> Hi all,
>
> It gives me great pleasure to announce that the Apache Hadoop
> community has voted to release Apache Hadoop 2.8.2, which is now available
> for download from Apache mirrors[1]. For download instructions please refer
> to the Apache Hadoop Release page [2].
>
> Apache Hadoop 2.8.2 is the first GA release of the Apache Hadoop 2.8 line
> and our newest stable release of the entire Apache Hadoop project. For major
> changes included in the Hadoop 2.8 line, please refer to the Hadoop 2.8.2
> main page [3].
>
> This release has 315 resolved issues since the previous 2.8.1 release,
> with the following breakdown:
>- 91 in Hadoop Common
>- 99 in HDFS
>- 105 in YARN
>- 20 in MapReduce
> Please read the log of CHANGES[4] and RELEASENOTES[5] for more details.
>
> The release news is posted on the Hadoop website too, you can go to the
> downloads section directly [6].
>
> Thank you all for contributing to the Apache Hadoop release!
>
>
> Cheers,
>
> Junping
>
>
> [1] http://www.apache.org/dyn/closer.cgi/hadoop/common
>
> [2] http://hadoop.apache.org/releases.html
>
> [3] http://hadoop.apache.org/docs/r2.8.2/index.html
>
> [4]
> http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/
> hadoop-common/release/2.8.2/CHANGES.2.8.2.html
>
> [5]
> http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/
> hadoop-common/release/2.8.2/RELEASENOTES.2.8.2.html
>
> [6] http://hadoop.apache.org/releases.html#Download
>
>
>
>


Re: Apache Hadoop 2.8.3 Release Plan

2017-11-20 Thread Andrew Wang
>
>
> >> If we're struggling with being able to deliver new features in a safe
> and timely fashion, let's try to address that...
>
> This is interesting. Are you aware of any means to do that? Thanks!
>

I've mentioned this a few times on the lists before, but our biggest gap
in keeping branches releasable is automated integration testing.

I think we try to put too much into our minor releases, and features arrive
before they're baked. Having automated integration testing helps with this.
When we were finally able to turn on CI for the 3.0.0 release branch, we
started finding bugs much sooner after they were introduced, which made it
easier to revert before too much other code was built on top. The early
alphas felt Sisyphean at times, with bugs being introduced faster than we
could uncover and fix them.

A smaller example would be release validation. I've long wanted a nightly
Jenkins job that makes an RC and runs some basic checks on it. We end up
rolling extra RCs for small stuff that could have been caught earlier.
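
As a sketch, such a nightly job might boil down to something like this (the
create-release flags and artifact path are assumptions, and the job itself
is hypothetical):

    # hypothetical nightly Jenkins shell step
    dev-support/bin/create-release --docker
    # basic sanity: the source tarball exists, unpacks, and compiles
    tar -xzf target/artifacts/hadoop-*-src.tar.gz -C /tmp
    (cd /tmp/hadoop-*-src && mvn package -Pdist -DskipTests -Dtar)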

Best,
Andrew


Re: Apache Hadoop 2.8.3 Release Plan

2017-11-21 Thread Andrew Wang
The Aliyun OSS code isn't a small improvement. If you look at Sammi's comment
<https://issues.apache.org/jira/browse/HADOOP-14964?focusedCommentId=16247085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16247085>,
it's a 17-patch series that is being backported in one shot. What we're
talking about is equivalent to merging a feature branch in a maintenance
release. I see that Kai and Chris are having a discussion about the
dependency changes, which indicates this is not a zero-risk change either.
We really should not be changing dependency versions in a maintenance
release unless it's because of a bug.

It's unfortunate from a timing perspective that this missed 2.9.0, but I
still think it should wait for the next minor. Merging a feature into a
maintenance release sets the wrong precedent.

Best,
Andrew

On Tue, Nov 21, 2017 at 1:08 AM, Junping Du  wrote:

> Thanks Kai for calling out this feature/improvement for attention and
> Andrew for comments.
>
>
> While I agree that a maintenance release should focus on important bug
> fixes only, I doubt we have strict rules disallowing features/improvements
> from landing on a maintenance release, especially when they have a small
> footprint or low impact on existing code/features. In practice, we indeed
> had 77 new features/improvements in the latest 2.7.3 and 2.7.4 releases.
>
>
> Back to HADOOP-14964, I did a quick check and it looks like the case here
> is a self-contained improvement that has very low impact on the existing
> code base, so I am OK with the improvement landing on branch-2.8 provided
> it is well reviewed and tested.
>
>
> However, as RM of branch-2.8, I have two concerns to accept it in our
> 2.8.3 release:
>
> 1. Timing - as I mentioned in the beginning, the main purpose of 2.8.3 is
> several critical bug fixes, and we should target releasing it very soon -
> my current plan is to cut an RC within this week, in line with the
> 3.0.0 vote closing. Can this improvement be well tested against
> branch-2.8.3 within this strict timeline? It seems a bit rushed unless we
> have a strong commitment on the test plan and activities in such a tight time.
>
>
> 2. Upgrading - I haven't heard that we've settled the plan for releasing
> this feature in a 2.9.1 release - though I saw some discussions going on
> at HADOOP-14964. Assuming 2.8.3 is released ahead of 2.9.1 and includes
> this improvement, users consuming this feature/improvement would have no
> 2.9 release to upgrade to, or would be forced to upgrade with a regression.
> We may need a better upgrade story here.
>
>
> Pls let me know what you think. Thanks!
>
>
>
> Thanks,
>
>
> Junping
>
>
> --
> *From:* Andrew Wang 
> *Sent:* Monday, November 20, 2017 10:22 PM
> *To:* Zheng, Kai
> *Cc:* Junping Du; common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> *Subject:* Re: Apache Hadoop 2.8.3 Release Plan
>
> I'm against including new features in maintenance releases, since they're
> meant to be bug-fix only.
>
> If we're struggling with being able to deliver new features in a safe and
> timely fashion, let's try to address that, not overload the meaning of
> "maintenance release".
>
> Best,
> Andrew
>
> On Mon, Nov 20, 2017 at 5:20 PM, Zheng, Kai  wrote:
>
>> Hi Junping,
>>
>> Thank you for making 2.8.2 happen and now planning the 2.8.3 release.
>>
>> I have an ask: is it convenient to include the backport work for the OSS
>> connector module? We have some Hadoop users who wish to have it by default
>> for convenience, though in the past they used it by backporting it
>> themselves. I have raised this and got thoughts from Chris and Steve. Looks
>> like this is more wanted for 2.9, but I wanted to ask again here for broad
>> feedback and thoughts while I have the chance. The backport patch is
>> available for 2.8 and the one for branch-2 is already in. IMO, 2.8.x is
>> promising as we can see some shift from 2.7.x, hence it's worth more
>> important features and efforts. What do you think? Thanks!
>>
>> https://issues.apache.org/jira/browse/HADOOP-14964
>>
>> Regards,
>> Kai
>>
>> -Original Message-
>> From: Junping Du [mailto:j...@hortonworks.com]
>> Sent: Tuesday, November 14, 2017 9:02 AM
>> To: common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
>> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Apache Hadoop 2.8.3 Release Plan
>>
>> Hi,
>> We have several important fixes landed on branch-2.8, and I would
>> like to cut off branch

Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-21 Thread Andrew Wang
On Mon, Nov 20, 2017 at 11:33 PM, Allen Wittenauer  wrote:

>
> The original release script and instructions broke the build up
> into three or so steps. When I rewrote it, I kept that same model. It’s
> probably time to re-think that.  In particular, it should probably be one
> big step that even does the maven deploy.  There’s really no harm in doing
> that given that there is still a manual step to release the deployed jars
> into the production area.
>
> We just need need to:
>
> a) add an option to do deploy instead of just install.  if c-r is in asf
> mode, always activate deploy
> b) pull the maven settings.xml file (and only the maven settings file… we
> don’t want the repo!) into the docker build environment
> c) consolidate the mvn steps
>
> This has the added benefit of greatly speeding up the build by
> removing several passes.
>
> Probably not a small change, but I’d have to look at the code.
> I’m on a plane tomorrow morning though.
>
I refreshed my memory on this yesterday, and came to a similar conclusion.
+1 to this approach. It'd also solve our current issue, if we build the
site and site tarball after the deploy and building the src/bin tarballs.

So, regarding this current issue, I think our options are:

* The c-r changes to do "mvn clean deploy", create the src and bin
tarballs, then "mvn site" at the end.
* Turn off JDiff in the site build
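
To make the first option concrete, the reordered flow would look roughly
like this (profile and flag names follow BUILDING.txt; the exact sequence is
a sketch, not the actual create-release code):

    # 1. build and deploy the real (shaded) artifacts in one pass
    mvn clean deploy -Pdist,src,native -DskipTests -Dtar
    # 2. only then build the site; -DskipShade is safe here because the
    #    src/bin tarballs have already been produced
    mvn site site:stage -Pdist -DskipTests -DskipShade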

I'd like to get off of JDiff since both the project and our usage of it
isn't maintained, but that might be a more controversial action than
changing create-release.

I filed HADOOP-15058 to dig further into this issue.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-11-21 Thread Andrew Wang
Hi folks,

Thanks again for the testing help with the RC. Here's our dashboard for the
3.0.0 release:

https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329849

Right now we're tracking three blockers:

* HADOOP-15058 is the create-release fix; I just put up a patch which needs
reviews. It's the worst timing, but I'm hoping Allen could give it a quick
sanity check.
* HADOOP-15059 is the MR rolling upgrade issue that Junping found, needs
triage and an assignee. I asked Ray to look at what we've done with our
existing rolling upgrade testing, since it does run an MR job.
* HDFS-12480 is an EC issue that Eddy would like to get in if we're rolling
another RC, looks close.

Is there anything else from this thread that needs to be addressed? I rely
on the dashboard to track blockers, so please file a JIRA and prioritize if
so.

Best,
Andrew



On Tue, Nov 21, 2017 at 2:08 PM, Vrushali C  wrote:

> Hi Vinod,
>
> bq. (b) We need to figure out if this V1 TimelineService should even be
> supported given ATSv2.
>
> Yes, I am following this discussion. Let me chat with Rohith and Varun
> about this and we will respond on this thread. As such, my preliminary
> thoughts are that we should continue to support Timeline Service V1 till we
> have the detailed entity level ACLs in V2 and perhaps also a proposal
> around upgrade/migration paths from TSv1 to TSv2.
>
> But in any case, we do need to work towards phasing out Timeline Service
> V1.
>
> thanks
> Vrushali
>
>
> On Tue, Nov 21, 2017 at 1:16 PM, Vinod Kumar Vavilapalli <
> vino...@apache.org
> > wrote:
>
> > >> - $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver doesn't
> > even work. Not just deprecated in favor of timelineserver as was
> advertised.
> > >
> > >   This works for me in trunk and the bash code doesn’t appear to
> > have changed in a very long time.  Probably something local to your
> > install.  (I do notice that the deprecation message says “starting” which
> > is awkward when the stop command is given though.)  Also: is the
> > deprecation message even true at this point?
> >
> >
> > Sorry, I mischaracterized the problem.
> >
> > The real issue is that I cannot use this command line when the MapReduce
> > JobHistoryServer is already started on the same machine.
> >
> > ~/tmp/yarn$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start historyserver
> > WARNING: Use of this script to start YARN daemons is deprecated.
> > WARNING: Attempting to execute replacement "yarn --daemon start" instead.
> > DEPRECATED: Use of this command to start the timeline server is
> deprecated.
> > Instead use the timelineserver command for it.
> > Starting the History Server anyway...
> > historyserver is running as process 86156.  Stop it first.
> >
> > So, it looks like in shell-scripts, there can only ever be one daemon of
> > a given name, irrespective of which daemon scripts are invoked.
> >
> > We need to figure out two things here
> >  (a) The behavior of this command. Clearly, it will conflict with the
> > MapReduce JHS - only one of them can be started on the same node.
> >  (b) We need to figure out if this V1 TimelineService should even be
> > supported given ATSv2.
> >
> > @Vrushali / @Rohith / @Varun Saxena et al., if you are watching, please
> > comment on (b).
> >
> > Thanks
> > +Vinod
>


2017-12-01 Hadoop 3 release status update

2017-12-01 Thread Andrew Wang
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+3+release+status+updates

2017-12-01

Haven't written one of these in a month. I had high hopes for RC0, but it
failed due to HADOOP-15058 (create-release site build outputs dummy shaded
jars due to skipShade, PATCH AVAILABLE), which Sangjin found, and then a
number of other blockers were found shortly after that.

We're back to blocker burndown. My new (realistic) goal is to get 3.0.0 out
before Christmas. We could always use more help with reviews; most things
are patch available.



Highlights:

Red flags:

Previously tracked blockers that have been resolved or dropped:

GA blockers:

   - HDFS-12840 (Creating a replicated file in a EC zone does not correctly
   serialized in EditLogs, PATCH AVAILABLE): Has gone through several rounds
   of review, looks close.
   - HADOOP-15080 (Cat-X transitive dependency on org.json library via
   json-lib, OPEN): New issue, waiting on LEGAL but we might need to pull
   this entire feature.
   - HADOOP-15059 (3.0 deployment cannot work with old version MR tar ball
   which break rolling upgrade, PATCH AVAILABLE): Has gone through some
   review and has a +1 from Daryn, could use confirmation from Vinod and
   others.
   - HADOOP-15058 (create-release site build outputs dummy shaded jars due
   to skipShade, PATCH AVAILABLE): Needs review, asked Allen but might need
   someone else to help.

GA criticals:

   - HDFS-12872 (EC Checksum broken when BlockAccessToken is enabled, PATCH
   AVAILABLE): Patch needs review.
   - YARN-7381 (Enable the configuration:
   yarn.nodemanager.log-container-debug-info.enabled, PATCH AVAILABLE): Has
   gone through some review and Wangda +1'd, could use confirmation from Ray
   and others.

Features merged for GA:

   - Erasure coding
      - Testing is still ongoing at Cloudera, which resulted in HDFS-12840
      (Creating a replicated file in a EC zone does not correctly serialized
      in EditLogs, PATCH AVAILABLE) and HDFS-12872 (EC Checksum broken when
      BlockAccessToken is enabled, PATCH AVAILABLE).
   - Classpath isolation (HADOOP-11656)
      - No change.
   - Compat guide (HADOOP-13714)
      - We slid a couple more changes into 3.0.0 after RC0 was cancelled,
      making this work more complete.
   - TSv2 alpha 2
      - No change.
   - API-based scheduler configuration (YARN-5734, OrgQueue for easy
   CapacityScheduler queue configuration management, RESOLVED)
      - No change.
   - HDFS router-based federation (HDFS-10467, Router-based HDFS federation,
   RESOLVED)
      - No change.
   - Resource types (YARN-3926, Extend the YARN resource model for easier
   resource-type management and profiles, RESOLVED)
      - Had some post-merge issues that were resolved, nothing outstanding.


Re: [VOTE] Release Apache Hadoop 3.0.0 RC0

2017-12-08 Thread Andrew Wang
FYI that we got our last blocker in today, so I'm currently rolling RC1.
Stay tuned!

On Thu, Nov 30, 2017 at 8:32 AM, Allen Wittenauer 
wrote:

>
> > On Nov 30, 2017, at 1:07 AM, Rohith Sharma K S <
> rohithsharm...@apache.org> wrote:
> >
> >
> > > If ATSv1 isn’t replaced by ATSv2, then why is it marked deprecated?
> > Ideally it should not be. Can you point out where it is marked as
> deprecated? If it is in the historyserver daemon start, that change was made
> very long back when the timeline server was added.
>
>
> Ahh, I see where all the problems lie.  No one is paying attention to the
> deprecation message because it’s kind of oddly worded:
>
> * It really means “don’t use ‘yarn historyserver’ use ‘yarn
> timelineserver’ ”
> * ‘yarn historyserver’ was removed from the documentation in 2.7.0
> * ‘yarn historyserver’ doesn’t appear in the yarn usage output
> * ‘yarn timelineserver’ runs the exact same class
>
> There’s no reason for ‘yarn historyserver’ to exist in 3.x.  Just run
> ‘yarn timelineserver’ instead.
>
>


[VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-08 Thread Andrew Wang
Hi all,

Let me start, as always, by thanking the efforts of all the contributors
who contributed to this release, especially those who jumped on the issues
found in RC0.

I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
fixed JIRAs since the previous 3.0.0-beta1 release.

You can find the artifacts here:

http://home.apache.org/~wang/3.0.0-RC1/

I've done the traditional testing of building from the source tarball and
running a Pi job on a single node cluster. I also verified that the shaded
jars are not empty.

Found one issue that create-release (probably due to the mvn deploy change)
didn't sign the artifacts, but I fixed that by calling mvn one more time.
Available here:

https://repository.apache.org/content/repositories/orgapachehadoop-1075/

This release will run the standard 5 days, closing on Dec 13th at 12:31pm
Pacific. My +1 to start.

Best,
Andrew


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Andrew Wang
Sorry, forgot to push the tag. It's up there now.

On Sun, Dec 10, 2017 at 8:31 PM, Vinod Kumar Vavilapalli  wrote:

> I couldn't find the release tag for RC1 either - is it just me or has the
> release-process changed?
>
> +Vinod
>
> > On Dec 10, 2017, at 4:31 PM, Sangjin Lee  wrote:
> >
> > Hi Andrew,
> >
> > Thanks much for your effort! Just to be clear, could you please state the
> > git commit id of the RC1 we're voting for?
> >
> > Sangjin
> >
> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang 
> > wrote:
> >
> >> Hi all,
> >>
> >> Let me start, as always, by thanking the efforts of all the contributors
> >> who contributed to this release, especially those who jumped on the
> issues
> >> found in RC0.
> >>
> >> I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates 302
> >> fixed JIRAs since the previous 3.0.0-beta1 release.
> >>
> >> You can find the artifacts here:
> >>
> >> http://home.apache.org/~wang/3.0.0-RC1/
> >>
> >> I've done the traditional testing of building from the source tarball
> and
> >> running a Pi job on a single node cluster. I also verified that the
> shaded
> >> jars are not empty.
> >>
> >> Found one issue that create-release (probably due to the mvn deploy
> change)
> >> didn't sign the artifacts, but I fixed that by calling mvn one more
> time.
> >> Available here:
> >>
> >> https://repository.apache.org/content/repositories/
> orgapachehadoop-1075/
> >>
> >> This release will run the standard 5 days, closing on Dec 13th at
> 12:31pm
> >> Pacific. My +1 to start.
> >>
> >> Best,
> >> Andrew
> >>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-11 Thread Andrew Wang
Good point on the mutability. Release tags are immutable, RCs are not.
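
For anyone pinning an RC locally, a small sketch (the tag name follows the
usual release-X.Y.Z-RCn convention, which is an assumption on my part):

    git fetch origin --tags
    git rev-parse release-3.0.0-RC1^{}   # dereference the tag to its commit id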

On Mon, Dec 11, 2017 at 1:39 PM, Sangjin Lee  wrote:

> Thanks Andrew. For the record, the commit id would be
> c25427ceca461ee979d30edd7a4b0f50718e6533. I mention that for completeness
> because of the mutability of tags.
>
> On Mon, Dec 11, 2017 at 10:31 AM, Andrew Wang 
> wrote:
>
>> Sorry, forgot to push the tag. It's up there now.
>>
>> On Sun, Dec 10, 2017 at 8:31 PM, Vinod Kumar Vavilapalli <
>> vino...@apache.org> wrote:
>>
>>> I couldn't find the release tag for RC1 either - is it just me or has
>>> the release-process changed?
>>>
>>> +Vinod
>>>
>>> > On Dec 10, 2017, at 4:31 PM, Sangjin Lee  wrote:
>>> >
>>> > Hi Andrew,
>>> >
>>> > Thanks much for your effort! Just to be clear, could you please state
>>> the
>>> > git commit id of the RC1 we're voting for?
>>> >
>>> > Sangjin
>>> >
>>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang >> >
>>> > wrote:
>>> >
>>> >> Hi all,
>>> >>
>>> >> Let me start, as always, by thanking the efforts of all the
>>> contributors
>>> >> who contributed to this release, especially those who jumped on the
>>> issues
>>> >> found in RC0.
>>> >>
>>> >> I've prepared RC1 for Apache Hadoop 3.0.0. This release incorporates
>>> 302
>>> >> fixed JIRAs since the previous 3.0.0-beta1 release.
>>> >>
>>> >> You can find the artifacts here:
>>> >>
>>> >> http://home.apache.org/~wang/3.0.0-RC1/
>>> >>
>>> >> I've done the traditional testing of building from the source tarball
>>> and
>>> >> running a Pi job on a single node cluster. I also verified that the
>>> shaded
>>> >> jars are not empty.
>>> >>
>>> >> Found one issue that create-release (probably due to the mvn deploy
>>> change)
>>> >> didn't sign the artifacts, but I fixed that by calling mvn one more
>>> time.
>>> >> Available here:
>>> >>
>>> >> https://repository.apache.org/content/repositories/orgapache
>>> hadoop-1075/
>>> >>
>>> >> This release will run the standard 5 days, closing on Dec 13th at
>>> 12:31pm
>>> >> Pacific. My +1 to start.
>>> >>
>>> >> Best,
>>> >> Andrew
>>> >>
>>>
>>>
>>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-12 Thread Andrew Wang
Hi Wei-Chiu,

The patchprocess directory is left over from the create-release process,
and it looks empty to me. We should still file a create-release JIRA to fix
this, but I think this is not a blocker. Would you agree?
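
For future RCs, a one-line check along these lines would catch it (the
tarball name is illustrative):

    tar -tzf hadoop-3.0.0-src.tar.gz | grep patchprocess && echo "leftover build dir found"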

Best,
Andrew

On Tue, Dec 12, 2017 at 9:44 AM, Wei-Chiu Chuang 
wrote:

> Hi Andrew, thanks for the tremendous effort.
> I found an empty "patchprocess" directory in the source tarball that is
> not there if you clone from github. Any chance you might have some leftover
> trash from when you made the tarball?
> Not wanting to nitpick, but you might want to double check so we don't
> ship anything private to you in public :)
>
>
>
> On Tue, Dec 12, 2017 at 7:48 AM, Ajay Kumar 
> wrote:
>
>> +1 (non-binding)
>> Thanks for driving this, Andrew Wang!!
>>
>> - downloaded the src tarball and verified md5 checksum
>> - built from source with jdk 1.8.0_111-b14
>> - brought up a pseudo distributed cluster
>> - did basic file system operations (mkdir, list, put, cat) and
>> confirmed that everything was working
>> - Run word count, pi and DFSIOTest
>> - run hdfs and yarn, confirmed that the NN, RM web UI worked
>>
>> Cheers,
>> Ajay
>>
>> On 12/11/17, 9:35 PM, "Xiao Chen"  wrote:
>>
>> +1 (binding)
>>
>> - downloaded src tarball, verified md5
>> - built from source with jdk1.8.0_112
>> - started a pseudo cluster with hdfs and kms
>> - sanity checked encryption related operations working
>> - sanity checked webui and logs.
>>
>> -Xiao
>>
>> On Mon, Dec 11, 2017 at 6:10 PM, Aaron T. Myers 
>> wrote:
>>
>> > +1 (binding)
>> >
>> > - downloaded the src tarball and built the source (-Pdist -Pnative)
>> > - verified the checksum
>> > - brought up a secure pseudo distributed cluster
>> > - did some basic file system operations (mkdir, list, put, cat) and
>> > confirmed that everything was working
>> > - confirmed that the web UI worked
>> >
>> > Best,
>> > Aaron
>> >
>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <
>> andrew.w...@cloudera.com>
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > Let me start, as always, by thanking the efforts of all the
>> contributors
>> > > who contributed to this release, especially those who jumped on
>> the
>> > issues
>> > > found in RC0.
>> > >
>> > > I've prepared RC1 for Apache Hadoop 3.0.0. This release
>> incorporates 302
>> > > fixed JIRAs since the previous 3.0.0-beta1 release.
>> > >
>> > > You can find the artifacts here:
>> > >
>> > > http://home.apache.org/~wang/3.0.0-RC1/
>> > >
>> > > I've done the traditional testing of building from the source
>> tarball and
>> > > running a Pi job on a single node cluster. I also verified that
>> the
>> > shaded
>> > > jars are not empty.
>> > >
>> > > Found one issue that create-release (probably due to the mvn
>> deploy
>> > change)
>> > > didn't sign the artifacts, but I fixed that by calling mvn one
>> more time.
>> > > Available here:
>> > >
>> > > https://repository.apache.org/content/repositories/orgapache
>> hadoop-1075/
>> > >
>> > > This release will run the standard 5 days, closing on Dec 13th at
>> 12:31pm
>> > > Pacific. My +1 to start.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> >
>>
>>
>>
>
>
>
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-12 Thread Andrew Wang
Hi everyone,

As a reminder, this vote closes tomorrow at 12:31pm, so please give it a
whack if you have time. There are already enough binding +1s to pass this
vote, but it'd be great to get additional validation.

Thanks to everyone who's voted thus far!

Best,
Andrew



On Tue, Dec 12, 2017 at 11:08 AM, Lei Xu  wrote:

> +1 (binding)
>
> * Verified src tarball and bin tarball, verified md5 of each.
> * Build source with -Pdist,native
> * Started a pseudo cluster
> * Run ec -listPolicies / -getPolicy / -setPolicy on /  , and run hdfs
> dfs put/get/cat on "/" with XOR-2-1 policy.
>
> Thanks Andrew for this great effort!
>
> Best,
>
>
> On Tue, Dec 12, 2017 at 9:55 AM, Andrew Wang 
> wrote:
> > Hi Wei-Chiu,
> >
> > The patchprocess directory is left over from the create-release process,
> > and it looks empty to me. We should still file a create-release JIRA to
> fix
> > this, but I think this is not a blocker. Would you agree?
> >
> > Best,
> > Andrew
> >
> > On Tue, Dec 12, 2017 at 9:44 AM, Wei-Chiu Chuang 
> > wrote:
> >
> >> Hi Andrew, thanks for the tremendous effort.
> >> I found an empty "patchprocess" directory in the source tarball that is
> >> not there if you clone from github. Any chance you might have some
> leftover
> >> trash from when you made the tarball?
> >> Not wanting to nitpick, but you might want to double check so we
> don't
> >> ship anything private to you in public :)
> >>
> >>
> >>
> >> On Tue, Dec 12, 2017 at 7:48 AM, Ajay Kumar  >
> >> wrote:
> >>
> >>> +1 (non-binding)
> >>> Thanks for driving this, Andrew Wang!!
> >>>
> >>> - downloaded the src tarball and verified md5 checksum
> >>> - built from source with jdk 1.8.0_111-b14
> >>> - brought up a pseudo distributed cluster
> >>> - did basic file system operations (mkdir, list, put, cat) and
> >>> confirmed that everything was working
> >>> - Run word count, pi and DFSIOTest
> >>> - run hdfs and yarn, confirmed that the NN, RM web UI worked
> >>>
> >>> Cheers,
> >>> Ajay
> >>>
> >>> On 12/11/17, 9:35 PM, "Xiao Chen"  wrote:
> >>>
> >>> +1 (binding)
> >>>
> >>> - downloaded src tarball, verified md5
> >>> - built from source with jdk1.8.0_112
> >>> - started a pseudo cluster with hdfs and kms
> >>> - sanity checked encryption related operations working
> >>> - sanity checked webui and logs.
> >>>
> >>> -Xiao
> >>>
> >>> On Mon, Dec 11, 2017 at 6:10 PM, Aaron T. Myers 
> >>> wrote:
> >>>
> >>> > +1 (binding)
> >>> >
> >>> > - downloaded the src tarball and built the source (-Pdist
> -Pnative)
> >>> > - verified the checksum
> >>> > - brought up a secure pseudo distributed cluster
> >>> > - did some basic file system operations (mkdir, list, put, cat)
> and
> >>> > confirmed that everything was working
> >>> > - confirmed that the web UI worked
> >>> >
> >>> > Best,
> >>> > Aaron
> >>> >
> >>> > On Fri, Dec 8, 2017 at 12:31 PM, Andrew Wang <
> >>> andrew.w...@cloudera.com>
> >>> > wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > Let me start, as always, by thanking the efforts of all the
> >>> contributors
> >>> > > who contributed to this release, especially those who jumped on
> >>> the
> >>> > issues
> >>> > > found in RC0.
> >>> > >
> >>> > > I've prepared RC1 for Apache Hadoop 3.0.0. This release
> >>> incorporates 302
> >>> > > fixed JIRAs since the previous 3.0.0-beta1 release.
> >>> > >
> >>> > > You can find the artifacts here:
> >>> > >
> >>> > > http://home.apache.org/~wang/3.0.0-RC1/
> >>> > >
> >>> > > I've done the traditional testing of building from the source
> >>> tarball and
> >>> > > running a Pi job on a single node cluster. I also verified that
> >>> the
> >>> > shaded
> >>> > > jars are not empty.
> >>> > >
> >>> > > Found one issue that create-release (probably due to the mvn
> >>> deploy
> >>> > change)
> >>> > > didn't sign the artifacts, but I fixed that by calling mvn one
> >>> more time.
> >>> > > Available here:
> >>> > >
> >>> > > https://repository.apache.org/content/repositories/orgapache
> >>> hadoop-1075/
> >>> > >
> >>> > > This release will run the standard 5 days, closing on Dec 13th
> at
> >>> 12:31pm
> >>> > > Pacific. My +1 to start.
> >>> > >
> >>> > > Best,
> >>> > > Andrew
> >>> > >
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
>
>
>
> --
> Lei (Eddy) Xu
> Software Engineer, Cloudera
>


Re: [VOTE] Release Apache Hadoop 3.0.0 RC1

2017-12-13 Thread Andrew Wang
Hi folks,

To close this out, the vote passes successfully with 13 binding +1s, 5
non-binding +1s, and no -1s. Thanks everyone for voting! I'll work on
staging.

I'm hoping we can address YARN-7588 and any remaining rolling upgrade
issues in 3.0.x maintenance releases. Beyond a wiki page, it would be
really great to get JIRAs filed and targeted for tracking as soon as
possible.

Vinod, what do you think we need to do regarding caveating rolling upgrade
support? We haven't advertised rolling upgrade support between major
releases outside of dev lists and JIRA. As a new major release, our compat
guidelines allow us to break compatibility, so I don't think it's expected
by users.

Best,
Andrew

On Wed, Dec 13, 2017 at 12:37 PM, Vinod Kumar Vavilapalli <
vino...@apache.org> wrote:

> I was waiting for Daniel to post the minutes from the YARN meetup to talk
> about this. Anyway, in that discussion, we identified a bunch of key
> upgrade-related scenarios that no one seems to have validated - at least
> from the representation at the YARN meetup. I'm going to create a wiki page
> listing all these scenarios.
>
> But back to the bug that Junping raised. At this point, we don't have a
> clear path towards running 2.x applications on 3.0.0 clusters. So, our
> claim of rolling-upgrades already working is not accurate.
>
> One of the two options that Junping proposed should be pursued before we
> close the release. I'm in favor of calling out rolling-upgrade support as
> withdrawn or caveated, and pushing for progress instead of blocking the
> release.
>
> Thanks
> +Vinod
>
> > On Dec 12, 2017, at 5:44 PM, Junping Du  wrote:
> >
> > Thanks Andrew for pushing a new RC for 3.0.0. I was out last week, and
> > just got a chance to validate the new RC now.
> >
> > Basically, I found two critical issues in the same rolling upgrade
> > scenario where HADOOP-15059 was found previously:
> > HDFS-12920: we changed the value format for some HDFS configurations
> > that an old-version MR client doesn't understand when fetching these
> > configurations. A quick workaround is to add the old values (without time
> > units) in hdfs-site.xml to override the new default values, but this
> > generates many annoying warnings. I provided my fix suggestions on the
> > JIRA already for more discussion.
> > The other one is YARN-7646. After we work around HDFS-12920, we hit the
> > issue that an old-version MR AppMaster cannot communicate with the new
> > version of the YARN RM - this could be related to the resource profile
> > changes on the YARN side, but the root cause is still under investigation.
> >
> > The first issue may not be a blocker given that we can work around it
> > without a code change. I am not sure yet if we can work around the second
> > issue. If not, we may have to fix it, or else compromise by withdrawing
> > rolling-upgrade support or not calling this a stable release.
> >
> >
> > Thanks,
> >
> > Junping
> >
> > 
> > From: Robert Kanter 
> > Sent: Tuesday, December 12, 2017 3:10 PM
> > To: Arun Suresh
> > Cc: Andrew Wang; Lei Xu; Wei-Chiu Chuang; Ajay Kumar; Xiao Chen; Aaron
> T. Myers; common-...@hadoop.apache.org; hdfs-dev@hadoop.apache.org;
> yarn-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
> > Subject: Re: [VOTE] Release Apache Hadoop 3.0.0 RC1
> >
> > +1 (binding)
> >
> > + Downloaded the binary release
> > + Deployed on a 3 node cluster on CentOS 7.3
> > + Ran some MR jobs, clicked around the UI, etc
> > + Ran some CLI commands (yarn logs, etc)
> >
> > Good job everyone on Hadoop 3!
> >
> >
> > - Robert
> >
> > On Tue, Dec 12, 2017 at 1:56 PM, Arun Suresh  wrote:
> >
> >> +1 (binding)
> >>
> >> - Verified signatures of the source tarball.
> >> - built from source - using the docker build environment.
> >> - set up a pseudo-distributed test cluster.
> >> - ran basic HDFS commands
> >> - ran some basic MR jobs
> >>
> >> Cheers
> >> -Arun
> >>
> >> On Tue, Dec 12, 2017 at 1:52 PM, Andrew Wang 
> >> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> As a reminder, this vote closes tomorrow at 12:31pm, so please give it
> a
> >>> whack if you have time. There are already enough binding +1s to pass
> this
> >>> vote, but it'd be great to get additional validation.
> >>>
> >>> Thanks to everyone who's voted thus far!
> >>>
> >>> Best,
> >>> Andrew
> >>>
> >>>
> >>>
> >>> On Tue, Dec 12, 2017 at
