FANNG1 commented on code in PR #98:
URL: https://github.com/apache/gravitino-playground/pull/98#discussion_r1830552164
##########
docker-compose.yaml:
##########
@@ -145,6 +145,12 @@ services:
volumes:
- ./init/jupyter:/tmp/gravitino
entrypoint: /bin/bash /tmp/gravitino/init.sh
+ environment:
+ - HADOOP_CLASSPATH=/tmp/gravitino/packages/hadoop-2.7.3/etc/hadoop:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/common/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/common/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/yarn/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/yarn/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/mapreduce/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/mapreduce/*:/tmp/gravitino/packages/contrib/capacity-scheduler/*.jar
Review Comment:
Is it necessary to add the YARN and MapReduce jars to the CLASSPATH? It seems
only HDFS is needed.
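If only the HDFS client is needed, a trimmed classpath could be assembled like this (a sketch, assuming the same `/tmp/gravitino/packages/hadoop-2.7.3` layout as the entry above; it keeps the conf dir plus the `common` and `hdfs` jars and drops the YARN/MapReduce entries):
```shell
# Sketch: build an HDFS-only HADOOP_CLASSPATH for the layout used in this compose file.
HADOOP_HOME="/tmp/gravitino/packages/hadoop-2.7.3"
HADOOP_CLASSPATH="${HADOOP_HOME}/etc/hadoop"
for d in common/lib common hdfs/lib hdfs; do
  HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:${HADOOP_HOME}/share/hadoop/${d}/*"
done
echo "${HADOOP_CLASSPATH}"
```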
##########
init/jupyter/jupyter-dependency.sh:
##########
@@ -33,3 +33,27 @@ fi
ls "${jupyter_dir}/packages/" | xargs -I {} rm "${jupyter_dir}/packages/"{}
find "${jupyter_dir}/../spark/packages/" | grep jar | xargs -I {} ln {} "${jupyter_dir}/packages/"
+FLINK_HIVE_CONNECTOR_JAR="https://repo1.maven.org/maven2/org/apache/flink/flink-sql-connector-hive-2.3.10_2.12/1.20.0/flink-sql-connector-hive-2.3.10_2.12-1.20.0.jar"
+FLINK_HIVE_CONNECTOR_MD5="${FLINK_HIVE_CONNECTOR_JAR}.md5"
+download_and_verify "${FLINK_HIVE_CONNECTOR_JAR}" "${FLINK_HIVE_CONNECTOR_MD5}" "${jupyter_dir}"
+
+GRAVITINO_FLINK_JAR="https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-flink-1.18_2.12/0.6.1-incubating/gravitino-flink-1.18_2.12-0.6.1-incubating.jar"
+GRAVITINO_FLINK_MD5="${GRAVITINO_FLINK_JAR}.md5"
+download_and_verify "${GRAVITINO_FLINK_JAR}" "${GRAVITINO_FLINK_MD5}" "${jupyter_dir}"
+
+GRAVITINO_FLINK_CONNECTOR_RUNTIME_JAR="https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-flink-connector-runtime-1.18_2.12/0.6.1-incubating/gravitino-flink-connector-runtime-1.18_2.12-0.6.1-incubating.jar"
+GRAVITINO_FLINK_CONNECTOR_RUNTIME_MD5="${GRAVITINO_FLINK_CONNECTOR_RUNTIME_JAR}.md5"
+download_and_verify "${GRAVITINO_FLINK_CONNECTOR_RUNTIME_JAR}" "${GRAVITINO_FLINK_CONNECTOR_RUNTIME_MD5}" "${jupyter_dir}"
+
+
+HADOOP_VERSION="2.7.3"
+HADOOP_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
Review Comment:
It may take too long to download the full Hadoop distribution in a
low-bandwidth environment; is only the HDFS client needed here?
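One way to avoid the full tarball would be to fetch only client jars from Maven Central. A sketch below: the shaded `hadoop-client-api` / `hadoop-client-runtime` artifacts exist only for Hadoop 3.x (the version here is a hypothetical substitute for the 2.7.3 tarball, and compatibility with the rest of this stack would need checking); it only echoes the intended `download_and_verify` invocations, since that helper lives in this repo's scripts:
```shell
# Sketch: fetch a self-contained HDFS client instead of the full Hadoop tarball.
# hadoop-client-api/-runtime are shaded jars published for Hadoop 3.x only.
HADOOP_CLIENT_VERSION="3.3.6"
MAVEN_BASE="https://repo1.maven.org/maven2/org/apache/hadoop"
HADOOP_CLIENT_API_JAR="${MAVEN_BASE}/hadoop-client-api/${HADOOP_CLIENT_VERSION}/hadoop-client-api-${HADOOP_CLIENT_VERSION}.jar"
HADOOP_CLIENT_RUNTIME_JAR="${MAVEN_BASE}/hadoop-client-runtime/${HADOOP_CLIENT_VERSION}/hadoop-client-runtime-${HADOOP_CLIENT_VERSION}.jar"
for jar in "${HADOOP_CLIENT_API_JAR}" "${HADOOP_CLIENT_RUNTIME_JAR}"; do
  # In jupyter-dependency.sh this would be:
  #   download_and_verify "${jar}" "${jar}.md5" "${jupyter_dir}"
  echo "would fetch: ${jar}"
done
```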
##########
.gitignore:
##########
@@ -1,2 +1,3 @@
**/.idea
**/.DS_Store
+**/packages/**
Review Comment:
It seems a little odd to ignore `packages` here.
##########
docker-compose.yaml:
##########
@@ -145,6 +145,12 @@ services:
volumes:
- ./init/jupyter:/tmp/gravitino
entrypoint: /bin/bash /tmp/gravitino/init.sh
+ environment:
+ - HADOOP_CLASSPATH=/tmp/gravitino/packages/hadoop-2.7.3/etc/hadoop:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/common/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/common/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/hdfs/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/yarn/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/yarn/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/mapreduce/lib/*:/tmp/gravitino/packages/hadoop-2.7.3/share/hadoop/mapreduce/*:/tmp/gravitino/packages/contrib/capacity-scheduler/*.jar
+ - NB_USER=my-username
+ - GRANT_SUDO=yes
Review Comment:
Why add the environment variables below and use root?
```
GRANT_SUDO=yes
CHOWN_HOME=yes
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]