jerryshao commented on code in PR #5806: URL: https://github.com/apache/gravitino/pull/5806#discussion_r1897363672
##########
docs/cloud-storage-fileset-example.md:
##########
@@ -0,0 +1,676 @@
+---
+title: "How to use cloud storage fileset"
+slug: /how-to-use-cloud-storage-fileset
+keyword: fileset S3 GCS ADLS OSS
+license: "This software is licensed under the Apache License version 2."
+---
+
+This document aims to provide a comprehensive guide on how to use cloud storage fileset created by Gravitino, it usually contains the following sections:
+
+
+## Start up Gravitino server
+
+### Start up Gravitino server
+
+Before running the Gravitino server, you need to put the following jars into the fileset class path located in `${GRAVITINO_HOME}/catalogs/hadoop/libs`. For example, if you are using S3, you need to put gravitino-aws-hadoop-bundles-{version}.jar into the fileset class path.
+
+
+| Storage type | Description | Jar file | Since Version |
+|--------------|---------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|------------------|
+| Local file | The local file system. | (none) | 0.5.0 |
+| HDFS | HDFS file system. | (none) | 0.5.0 |
+| S3 | AWS S3 storage. | [gravitino-aws-hadoop-bundle](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-hadoop-aws-bundle) | 0.8.0-incubating |
+| GCS | Google Cloud Storage. | [gravitino-gcp-hadoop-bundle](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-hadoop-gcp-bundle) | 0.8.0-incubating |
+| OSS | Aliyun OSS storage. | [gravitino-aliyun-hadoop-bundle](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-hadoop-aliyun-bundle) | 0.8.0-incubating |
+| ABS | Azure Blob Storage (aka. ABS, or Azure Data Lake Storage (v2) | [gravitino-azure-hadoop-bundle](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-hadoop-azure-bundle) | 0.8.0-incubating |
+
+After putting the jars into the fileset class path, you can start up the Gravitino server by running the following command:
+
+```shell
+cd ${GRAVITINO_HOME}
+bin/gravitino.sh start
+```
+
+### Bundle jars
+
+`gravitino-{aws,gcp,aliyun,azure}-hadoop-bundle` are the jars that contain all the necessary classes to access the corresponding cloud storages, for instance, `gravitino-aws-hadoop-bundle.jar` contains the all necessary classes including `hadoop-common`(hadoop-3.3.1) and `hadoop-aws` to access the S3 storage.
+**They are used in the scenario where there is no hadoop environment in the runtime.**
+
+**If there is already hadoop environment in the runtime, you can use the `gravitino-{aws,gcp,aliyun,azure}-bundle.jar` that does not contain the cloud storage classes (like hadoop-aws) and hadoop environment, you can manually add the necessary jars to the classpath.**
+
+The following table demonstrates what jars are necessary for different cloud storage filesets:

Review Comment:
   "which jars"
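
For reference, the install step described in the quoted doc (put the bundle jar into `${GRAVITINO_HOME}/catalogs/hadoop/libs`, then start the server) could be sketched as a small shell helper. This is only a sketch under assumptions: `install_bundle` and its two arguments are hypothetical names for illustration and are not part of Gravitino. Only the `catalogs/hadoop/libs` path and the `bin/gravitino.sh start` command come from the quoted text.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gravitino): copy a bundle jar into the
# fileset class path named in the quoted doc.
# Usage: install_bundle <path-to-bundle-jar> <gravitino-home>
install_bundle() {
  jar_path="$1"
  gravitino_home="$2"
  # Fileset class path location, as stated in the quoted doc.
  libs_dir="${gravitino_home}/catalogs/hadoop/libs"

  # Refuse to continue if the jar itself is missing.
  if [ ! -f "$jar_path" ]; then
    echo "bundle jar not found: $jar_path" >&2
    return 1
  fi
  # Refuse to continue if this does not look like a Gravitino install.
  if [ ! -d "$libs_dir" ]; then
    echo "fileset class path not found: $libs_dir" >&2
    return 1
  fi

  cp "$jar_path" "$libs_dir/"
}
```

With this in place, a call such as `install_bundle /path/to/bundle.jar "${GRAVITINO_HOME}"` would be followed by the `cd ${GRAVITINO_HOME}` and `bin/gravitino.sh start` commands from the doc. The helper takes the jar path as an argument rather than building a file name, since the quoted text is not consistent about the exact jar names.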