Thanks Andrew for creating the Hadoop 3.0.0-alpha2 RC0 release.

+1 (non-binding)
--Downloaded source and built from it.
--Deployed on a pseudo-distributed cluster.
--Ran sample MR jobs and tested basic HDFS operations.
--Did a sanity check for the RM and NM UI.

Best,
zhihai

On Wed, Jan 25, 2017 at 8:07 AM, Kuhu Shukla <kshu...@yahoo-inc.com.invalid> wrote:

> +1 (non-binding)
> * Built from source
> * Deployed on a pseudo-distributed cluster (Mac)
> * Ran wordcount and sleep jobs.
>
>
> On Wednesday, January 25, 2017 3:21 AM, Marton Elek <me...@hortonworks.com> wrote:
>
> Hi,
>
> I also did a quick smoketest with the provided 3.0.0-alpha2 binaries:
>
> TL;DR: it works well.
>
> Environment:
> * 5 hosts, Docker-based Hadoop cluster, every component in a separate container (5 datanodes / 5 nodemanagers / ...)
> * Components:
>   * HDFS/YARN cluster (upgraded from 2.7.3 to 3.0.0-alpha2 using the binary package under vote)
>   * Zeppelin 0.6.2 / 0.7.0-RC2
>   * Spark 2.0.2 / 2.1.0
>   * HBase 1.2.4 + ZooKeeper
>   * additional Docker containers for configuration management and monitoring
> * No HA, no Kerberos, no wire encryption
>
> * HDFS cluster upgraded successfully from 2.7.3 (with about 200G of data)
> * Imported 100G of data into HBase successfully
> * Started Spark jobs to process 1G of JSON from HDFS (using a Spark master/slave cluster). It worked even when I used Zeppelin 0.6.2 + Spark 2.0.2 (with the old Hadoop client included). Obviously the old version can't use the new YARN cluster, as the token file format has changed.
> * Upgraded my setup to Zeppelin 0.7.0-RC2 / Spark 2.1.0 (distribution without Hadoop) / Hadoop 3.0.0-alpha2. It also worked well: processed the same JSON files from HDFS with Spark jobs (from Zeppelin) on the YARN cluster (master: yarn, deploy-mode: cluster)
> * Started Spark jobs (with spark-submit, master: yarn) to count records from the HBase database: OK
> * Started example MapReduce jobs from the distribution over YARN. It was OK, but only with specific configuration (see below)
>
> So my overall impression is that it works very well (at least with my 'smalldata').
>
> Some notes (none of them are blocking):
>
> 1. To run the example MapReduce jobs I had to define HADOOP_MAPRED_HOME on the command line:
>
> ./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" -Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 10 10
>
> and in yarn-site:
>
> yarn.nodemanager.env-whitelist: JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,MAPRED_HOME_DIR
>
> I don't know the exact reason for the change, but 2.7.3 was more user-friendly, as the examples could be run without any specific configuration.
>
> For the same reason I didn't start HBase MapReduce jobs with the hbase command-line app (there may be some option for hbase to define MAPRED_HOME_DIR as well, but by default I got a ClassNotFoundException for one of the MR classes).
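> (A possible simplification, not verified on this cluster: the same two properties should also be settable once in mapred-site.xml instead of being passed as per-job -D flags,
>
> yarn.app.mapreduce.am.env: HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}
> mapreduce.admin.user.env: HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}
>
> so that the plain ./bin/yarn jar ... pi 10 10 invocation would work again without extra flags.)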
> 2. For the record: the logging and htrace classes are excluded from the shaded hadoop client jar, so I added them manually, one by one, to Spark (the Spark 2.1.0 distribution without Hadoop):
>
> RUN wget `cat url` -O spark.tar.gz && tar zxf spark.tar.gz && rm spark.tar.gz && mv spark* spark
> RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-api-3.0.0-alpha2.jar /opt/spark/jars
> RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.0.0-alpha2.jar /opt/spark/jars
> ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar /opt/spark/jars
> ADD https://repo1.maven.org/maven2/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar /opt/spark/jars
> ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar /opt/spark/jars/
> ADD https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar /opt/spark/jars
>
> With these jar files, Spark 2.1.0 works well with the alpha2 versions of HDFS and YARN.
>
> 3. The message "Upgrade in progress. Not yet finalized." didn't disappear from the NameNode web UI, but the cluster works well.
>
> Most probably I missed something, but it's a little bit confusing.
>
> (I checked the REST call; it is the JMX bean that reports the upgrade as not yet finalized. The code of the web page seems to be OK.)
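> (Presumably the missing step: the NameNode keeps reporting the upgrade as not finalized until it is explicitly finalized, along the lines of
>
> hdfs dfsadmin -finalizeUpgrade
>
> Finalizing removes the pre-upgrade rollback data on the NameNode and DataNodes, so it should only be run once the new version looks good.)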
> Regards,
> Marton
>
> On Jan 25, 2017, at 8:38 AM, Yongjun Zhang <yjzhan...@apache.org> wrote:
>
> Thanks Andrew much for the work here!
>
> +1 (binding).
>
> - Downloaded both binary and src tarballs
> - Verified md5 checksum and signature for both
> - Built from source tarball
> - Deployed 2 pseudo clusters, one with the released tarball and the other with what I built from source, and did the following on both:
>   - Ran basic HDFS operations, snapshots and distcp jobs
>   - Ran the pi job
>   - Examined the HDFS web UI and YARN web UI.
>
> Best,
>
> --Yongjun
>
> On Tue, Jan 24, 2017 at 3:56 PM, Eric Badger <ebad...@yahoo-inc.com.invalid> wrote:
>
> +1 (non-binding)
> - Verified signatures and md5
> - Built from source
> - Started a single-node cluster on my Mac
> - Ran some sleep jobs
>
> Eric
>
> On Tuesday, January 24, 2017 4:32 PM, Yufei Gu <flyrain...@gmail.com> wrote:
>
> Hi Andrew,
>
> Thanks for working on this.
>
> +1 (Non-Binding)
>
> 1. Downloaded the binary and verified the md5.
> 2. Deployed it on a 3-node cluster with 1 ResourceManager and 2 NodeManagers.
> 3. Set YARN to use the Fair Scheduler.
> 4. Ran the pi MapReduce job.
> 5. Verified that the Hadoop version command output is correct.
>
> Best,
>
> Yufei
>
> On Tue, Jan 24, 2017 at 3:02 AM, Marton Elek <me...@hortonworks.com> wrote:
>
> > minicluster is kind of weird on filesystems that don't support mixed case, like OS X's default HFS+.
>
> $ jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | grep -i license
> LICENSE.txt
> license/
> license/LICENSE
> license/LICENSE.dom-documentation.txt
> license/LICENSE.dom-software.txt
> license/LICENSE.sax.txt
> license/NOTICE
> license/README.dom.txt
> license/README.sax.txt
> LICENSE
> Grizzly_THIRDPARTYLICENSEREADME.txt
>
> I added a patch to https://issues.apache.org/jira/browse/HADOOP-14018 to add the missing META-INF/LICENSE.txt to the shaded files.
>
> Question: what should be done with the other LICENSE files in the minicluster? Can we just exclude them (from a legal point of view)?
>
> Regards,
> Marton