[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=829310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-829310 ]
ASF GitHub Bot logged work on HIVE-26400:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 28/Nov/22 13:04
Start Date: 28/Nov/22 13:04
Worklog Time Spent: 10m
Work Description: abstractdog commented on code in PR #3448:
URL: https://github.com/apache/hive/pull/3448#discussion_r1033497364


########## dev-support/docker/README.md: ##########
@@ -0,0 +1,93 @@
+### Introduction
+
+---
+Run Apache Hive inside docker container in pseudo-distributed mode, with MySQL as its back database.
+Provide the following
+- Quick-start/Debugging/Prepare a test env for Hive
+- Images can be used as the basis for the Kubernetes operator
+
+### Overview
+
+---
+#### Files
+- docker-compose.yml: Docker compose file
+- Dockerfile-*, scripts/docker-entrypoint.sh: Instructions of the image.
+- conf/hiveserver2-site.xml: Configuration for HiveServer2
+- conf/metastore-site.xml: Configuration for Hive Metastore
+- build.sh Scripts for build images
+
+### Quickstart
+
+---
+#### Build images
+Hive relies on Hadoop, Tez and MySQL to work correctly. Up to now, there are so many versions that these dependents have been released, including Hive itself,
+providing a way to build Hive against a specified version of the dependent sounds reasonable. There are some build args for this purpose, as listed below:
+```shell
+--hadoop <hadoop version>
+--tez <tez version>
+--hive <hive version>
+```
+If the version is not provided, then it will read the version from the properties in project top `pom.xml`,
+that is, `project.version` for Hive, `hadoop.version` for Hadoop and `tez.version` for Tez. For example:
+
+```shell
+./build.sh --hive 3.1.3
+```
+The command will pull the tarballs of Hive 3.1.3, Hadoop `hadoop.version` and Tez `tez.version` from apache repository
+to build the target image.
+
+```shell
+./build.sh --hadoop 3.1.0 --tez 0.10.1
+```
+The above command does not specify the Hive version, it will use the local `apache-hive-${project.version}-bin.tar.gz`,
+together with Hadoop 3.1.0 and Tez 0.10.1 to build the target image.
+
+#### Run services
+
+- Launch a single standalone Metastore
+
+If you just want to play around with Metastore, run:
+```shell
+docker run --name metastore-standalone apachehive/metastore:$HIVE_VERSION
+```
+
+- Launch a single standalone HiveServer2 for a quick start
+
+The HiveServer2 will be started with an embedded Metastore by initiating:
+```shell
+docker run --name hiveserver2-standalone apachehive/hiveserver2:$HIVE_VERSION
+```
+The data of the HiveServer2 would be lost between container restarts.
+
+- Launch a cluster with HiveServer2, Metastore and MySQL as its back database.
+
+To save data between container restarts, Volumes is used to persist data generated by and used by Hive. Just by executing:
+```shell
+export HIVESERVER2_IMAGE=apachehive/hiveserver2:$HIVE_VERSION
+export METASTORE_IMAGE=apachehive/metastore:$HIVE_VERSION
+docker network create hive && docker-compose -f docker-compose.yml up -d
+```
+
+#### Usage
+
+---
+- Show the containers
+```shell
+docker ps
+```
+- Check HiveServer2 web ui
+  - Accessed on browser at http://localhost:10002/
+- Run Beeline:
+```shell
+docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'
+```
+- Test running some queries
+```sql
+show tables;
+create table hive_example(a string, b int) partitioned by(c int);
+alter table hive_example add partition(c=1);
+insert into hive_example partition(c=1) values('a', 1), ('a', 2),('b',3);
+select count(distinct a) from hive_example;
+set hive.execution.engine=tez;

Review Comment:
please remove this clause: we're building this image to run in Tez local mode, so let's not give users the impression that it could run on MR or Spark if they configured it so.
also, the data loading commands should already run on Tez: we should not rely on MR in any way (I tried it out and it worked with Tez out of the box, so I guess you just have to add hive.execution.engine to hiveserver2-site.xml).

########## dev-support/docker/README.md: ##########
@@ -0,0 +1,93 @@
+To save data between container restarts, Volumes is used to persist data generated by and used by Hive. Just by executing:

Review Comment:
nit: "volumes are".
Also, this line only talks about volumes in general; if it refers to hive-db and warehouse, please name them explicitly.

########## dev-support/docker/README.md: ##########
@@ -0,0 +1,93 @@
+docker exec -it hiveserver2 beeline -u 'jdbc:hive2://hiveserver2:10000/'

Review Comment:
please mention that if beeline is installed on the host machine, HS2 can simply be reached with:
```
beeline -u 'jdbc:hive2://localhost:10000/'
```

########## dev-support/docker/README.md: ##########
@@ -0,0 +1,93 @@
+docker run --name metastore-standalone apachehive/metastore:$HIVE_VERSION

Review Comment:
1. What is "apachehive" here? Do we have a Docker Hub account? At the moment I guess it's just a common namespace used to tag images; please clarify this in a comment somewhere:
```
docker run --name metastore-standalone apachehive/metastore:$HIVE_VERSION
```
The above command makes it look as if the images were already pushed somewhere, but it actually relies on build.sh having built and tagged the images with this namespace, so that is a prerequisite.

2. In build.sh, we made $HIVE_VERSION resolve automagically, so the README cannot contain usage that simply refers to it; instead, let's mention a way to resolve it, e.g.:
```
#assuming that you're relying on current hive.version from pom.xml
export HIVE_VERSION=$(...example command to resolve hive.version from pom.xml, it should work from hive root folder and dev-support/docker too...)
...rest of the docker run commands...
```


Issue Time Tracking
-------------------

Worklog Id: (was: 829310)
Time Spent: 3h 50m (was: 3h 40m)

> Provide docker images for Hive
> ------------------------------
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Blocker
> Labels: hive-4.0.0-must, pull-request-available
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Make Apache Hive be able to run inside docker container in pseudo-distributed
> mode, with MySQL/Derby as its back database, provide the following:
> * Quick-start/Debugging/Prepare a test env for Hive;
> * Tools to build target image with specified version of Hive and its
> dependencies;
> * Images can be used as the basis for the Kubernetes operator.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
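Regarding the first review comment: if Tez becomes the default through `conf/hiveserver2-site.xml` instead of a `set` statement in the README, a small check like the following could confirm the property is actually present in the file. This is a minimal sketch, not part of the PR: `site_property` is a hypothetical helper name, and it assumes the usual Hadoop-style `<property>` layout where each `<name>` and `<value>` element opens and closes on a single line.

```shell
# Hypothetical helper (not part of the PR): print the <value> of property $2
# from a Hadoop-style *-site.xml file $1.
# Assumes each <name>/<value> element opens and closes on one line.
# Exits non-zero when the property is not found.
site_property() {
  awk -v want="$2" '
    /<name>/ {
      # Remember the name of the property currently being read.
      name = $0
      sub(/.*<name>[ \t]*/, "", name)
      sub(/[ \t]*<\/name>.*/, "", name)
    }
    /<value>/ && name == want {
      value = $0
      sub(/.*<value>[ \t]*/, "", value)
      sub(/[ \t]*<\/value>.*/, "", value)
      print value
      found = 1
      exit
    }
    # Forget the name once its <property> block closes.
    /<\/property>/ { name = "" }
    END { exit !found }
  ' "$1"
}
```

For example, from `dev-support/docker`: `[ "$(site_property conf/hiveserver2-site.xml hive.execution.engine)" = tez ]`. It only inspects the file; it does not tell you which engine a running HiveServer2 actually uses.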
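For the Beeline review comment, the two connection options could be combined roughly as below: prefer a `beeline` found on the host's `PATH`, otherwise fall back to the one inside the `hiveserver2` container. This is a sketch under assumptions: `hs2_beeline_cmd` is a hypothetical name, and the host branch assumes HiveServer2's port 10000 is published to `localhost`, as the review comment implies.

```shell
# Hypothetical helper: print a Beeline command for reaching HiveServer2.
# Prefers a beeline binary on the host (assumes port 10000 is published to
# localhost); otherwise uses the beeline shipped inside the hiveserver2 container.
hs2_beeline_cmd() {
  if command -v beeline >/dev/null 2>&1; then
    echo "beeline -u jdbc:hive2://localhost:10000/"
  else
    echo "docker exec -it hiveserver2 beeline -u jdbc:hive2://hiveserver2:10000/"
  fi
}
```

Running `eval "$(hs2_beeline_cmd)"` opens the session. Printing the command rather than executing it keeps the helper free of side effects, so users can see which connection will be used before opening it.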
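For point 2 of the last review comment, one possible way to set `$HIVE_VERSION` without Maven is sketched below; it is an illustration, not the PR's method. `resolve_hive_version` is a hypothetical name, and the sketch assumes the top-level `pom.xml` declares `<artifactId>hive</artifactId>` directly before the project's own `<version>`, so the `<version>` inside `<parent>` is skipped. It looks for `pom.xml` in the current directory first and then in `../../`, which covers both the Hive root and `dev-support/docker`.

```shell
# Hypothetical helper (not part of build.sh): print the Hive version declared
# in the top-level pom.xml. Tries $1 (default: ./pom.xml), then ../../pom.xml,
# so it works from the Hive root and from dev-support/docker.
# Assumption: <artifactId>hive</artifactId> is directly followed by the
# project's own <version>, so the <parent> block's <version> is ignored.
resolve_hive_version() {
  for pom in "${1:-pom.xml}" ../../pom.xml; do
    if [ -f "$pom" ]; then
      awk '
        /<artifactId>hive<\/artifactId>/ { seen = 1; next }
        seen && /<version>/ {
          sub(/.*<version>[ \t]*/, "")
          sub(/[ \t]*<\/version>.*/, "")
          print
          found = 1
          exit
        }
        END { exit !found }
      ' "$pom"
      return  # propagate awk's status: non-zero if no version was found
    fi
  done
  echo "resolve_hive_version: no pom.xml found" >&2
  return 1
}
```

Usage would be `export HIVE_VERSION="$(resolve_hive_version)"` before the `docker run` commands. Where Maven is installed, `mvn help:evaluate -Dexpression=project.version -q -DforceStdout` (with `-f ../../pom.xml` from `dev-support/docker`) reads the version through Maven itself and does not depend on element order; `forceStdout` needs maven-help-plugin 3.1.0 or later.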