This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new aff2c4a897 [#7422] feat(catalog-fileset): Update the doc to change to use Fileset catalog instead of Hadoop catalog (#7455)
aff2c4a897 is described below
commit aff2c4a897fb84df55cc1386deab93c47889996e
Author: Jerry Shao <[email protected]>
AuthorDate: Tue Jun 24 14:06:34 2025 +0800
[#7422] feat(catalog-fileset): Update the doc to change to use Fileset catalog instead of Hadoop catalog (#7455)
### What changes were proposed in this pull request?
This PR updates the docs and OpenAPI spec to use the Fileset catalog instead of the Hadoop catalog.
### Why are the changes needed?
This is a follow-up work of #7421
Fix: #7422
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Local verification.
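The user-visible API change running through the diff below is that fileset catalogs no longer pass a `provider` value of `"hadoop"`. A minimal sketch of the resulting request payload (the helper function and values here are illustrative, not part of the Gravitino client):

```python
# Hypothetical helper (not the Gravitino client API) showing the payload shape:
# after this commit, fileset catalog creation omits the "provider" field that
# previously carried the value "hadoop".
def fileset_catalog_payload(name: str, comment: str, properties: dict) -> dict:
    return {
        "name": name,
        "type": "FILESET",
        "comment": comment,
        "properties": properties,
    }

payload = fileset_catalog_payload(
    "example_catalog",
    "This is an ADLS fileset catalog",
    {"location": "abfss://[email protected]/path"},
)
print("provider" in payload)  # → False
```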
---------
Co-authored-by: Copilot <[email protected]>
---
dev/docker/gravitino/gravitino-dependency.sh | 10 +--
docs/docker-image-details.md | 3 +-
docs/fileset-catalog-index.md | 26 +++++++
...g-with-adls.md => fileset-catalog-with-adls.md} | 46 ++++++-----
...log-with-gcs.md => fileset-catalog-with-gcs.md} | 47 ++++++------
...log-with-oss.md => fileset-catalog-with-oss.md} | 47 ++++++------
...talog-with-s3.md => fileset-catalog-with-s3.md} | 62 ++++++++-------
docs/{hadoop-catalog.md => fileset-catalog.md} | 88 ++++++++++++----------
docs/gravitino-server-config.md | 57 +++++++-------
docs/hadoop-catalog-index.md | 26 -------
docs/hive-catalog-with-cloud-storage.md | 4 +-
docs/how-to-install.md | 5 +-
docs/how-to-use-gvfs.md | 17 +++--
docs/how-to-use-python-client.md | 2 +-
docs/index.md | 4 +-
docs/manage-fileset-metadata-using-gravitino.md | 35 ++++-----
docs/open-api/catalogs.yaml | 10 +--
docs/webui.md | 6 +-
18 files changed, 251 insertions(+), 244 deletions(-)
diff --git a/dev/docker/gravitino/gravitino-dependency.sh b/dev/docker/gravitino/gravitino-dependency.sh
index 27b99749da..ff6fe9164a 100755
--- a/dev/docker/gravitino/gravitino-dependency.sh
+++ b/dev/docker/gravitino/gravitino-dependency.sh
@@ -73,11 +73,11 @@ mkdir -p "${gravitino_dir}/packages/gravitino/bin"
cp "${gravitino_dir}/rewrite_gravitino_server_config.py" "${gravitino_dir}/packages/gravitino/bin/"
cp "${gravitino_dir}/start-gravitino.sh" "${gravitino_dir}/packages/gravitino/bin/"
-# Copy the Aliyun, AWS, GCP and Azure bundles to the Hadoop catalog libs
-cp ${gravitino_home}/bundles/aliyun-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/hadoop/libs"
-cp ${gravitino_home}/bundles/aws-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/hadoop/libs"
-cp ${gravitino_home}/bundles/gcp-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/hadoop/libs"
-cp ${gravitino_home}/bundles/azure-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/hadoop/libs"
+# Copy the Aliyun, AWS, GCP and Azure bundles to the Fileset catalog libs
+cp ${gravitino_home}/bundles/aliyun-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/fileset/libs"
+cp ${gravitino_home}/bundles/aws-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/fileset/libs"
+cp ${gravitino_home}/bundles/gcp-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/fileset/libs"
+cp ${gravitino_home}/bundles/azure-bundle/build/libs/*.jar "${gravitino_dir}/packages/gravitino/catalogs/fileset/libs"
cp ${gravitino_home}/bundles/aws/build/libs/*.jar "${gravitino_iceberg_rest_dir}"
cp ${gravitino_home}/bundles/gcp/build/libs/*.jar "${gravitino_iceberg_rest_dir}"
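The path change in the script above can be sketched as follows; the home directory is a temporary stand-in and the jar is an empty placeholder, purely to show where cloud bundles now land (`catalogs/fileset/libs` instead of `catalogs/hadoop/libs`):

```python
# Sketch of the new bundle layout implied by the diff above (paths only;
# the jar name is a placeholder, not a real built artifact).
import pathlib
import tempfile

gravitino_home = pathlib.Path(tempfile.mkdtemp())
libs = gravitino_home / "catalogs" / "fileset" / "libs"  # was catalogs/hadoop/libs
libs.mkdir(parents=True)
(libs / "gravitino-aws-bundle.jar").touch()
print(sorted(p.name for p in libs.iterdir()))  # → ['gravitino-aws-bundle.jar']
```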
diff --git a/docs/docker-image-details.md b/docs/docker-image-details.md
index 41dbf9e8dd..64d908becf 100644
--- a/docs/docker-image-details.md
+++ b/docs/docker-image-details.md
@@ -399,6 +399,7 @@ Changelog
- datastrato/gravitino-ci-ranger:0.1.0
- Docker image `datastrato/gravitino-ci-ranger:0.1.0`
- Support Apache Ranger 2.4.0
- - Use environment variable `RANGER_PASSWORD` to set up Apache Ranger admin password, Please notice Apache Ranger Password should be minimum 8 characters with min one alphabet and one numeric.
+ - Use environment variable `RANGER_PASSWORD` to set up Apache Ranger admin password, please
+ notice Apache Ranger Password should be minimum 8 characters with min one alphabet and one numeric.
- Expose ports:
- `6080` Apache Ranger admin port
diff --git a/docs/fileset-catalog-index.md b/docs/fileset-catalog-index.md
new file mode 100644
index 0000000000..5c60797235
--- /dev/null
+++ b/docs/fileset-catalog-index.md
@@ -0,0 +1,26 @@
+---
+title: "Fileset catalog index"
+slug: /fileset-catalog-index
+date: 2025-01-13
+keyword: Fileset catalog index S3 GCS ADLS OSS
+license: "This software is licensed under the Apache License version 2."
+---
+
+### Fileset catalog overall
+
+Gravitino Fileset catalog index includes the following chapters:
+
+- [Fileset catalog overview and features](./fileset-catalog.md): This chapter provides an overview of the Fileset catalog, its features, capabilities and related configurations.
+- [Manage Fileset catalog with Gravitino API](./manage-fileset-metadata-using-gravitino.md): This chapter explains how to manage fileset metadata using Gravitino API and provides detailed examples.
+- [Using Fileset catalog with Gravitino virtual file system](how-to-use-gvfs.md): This chapter explains how to use Fileset catalog with the Gravitino virtual file system and provides detailed examples.
+
+### Fileset catalog with cloud storage
+
+Apart from the above, you can also refer to the following topics to manage and access cloud storage like S3, GCS, ADLS, and OSS:
+
+- [Using Fileset catalog to manage S3](./fileset-catalog-with-s3.md).
+- [Using Fileset catalog to manage GCS](./fileset-catalog-with-gcs.md).
+- [Using Fileset catalog to manage ADLS](./fileset-catalog-with-adls.md).
+- [Using Fileset catalog to manage OSS](./fileset-catalog-with-oss.md).
+
+More storage options will be added soon. Stay tuned!
diff --git a/docs/hadoop-catalog-with-adls.md b/docs/fileset-catalog-with-adls.md
similarity index 91%
rename from docs/hadoop-catalog-with-adls.md
rename to docs/fileset-catalog-with-adls.md
index 21d64af84d..720980ae7f 100644
--- a/docs/hadoop-catalog-with-adls.md
+++ b/docs/fileset-catalog-with-adls.md
@@ -1,37 +1,37 @@
---
-title: "Hadoop catalog with ADLS"
-slug: /hadoop-catalog-with-adls
+title: "Fileset catalog with ADLS"
+slug: /fileset-catalog-with-adls
date: 2025-01-03
-keyword: Hadoop catalog ADLS
+keyword: Fileset catalog ADLS
license: "This software is licensed under the Apache License version 2."
---
-This document describes how to configure a Hadoop catalog with ADLS (aka. Azure Blob Storage (ABS), or Azure Data Lake Storage (v2)).
+This document describes how to configure a Fileset catalog with ADLS (aka. Azure Blob Storage (ABS), or Azure Data Lake Storage (v2)).
## Prerequisites
-To set up a Hadoop catalog with ADLS, follow these steps:
+To set up a Fileset catalog with ADLS, follow these steps:
1. Download the [`gravitino-azure-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-azure-bundle) file.
-2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
+2. Place the downloaded file into the Gravitino Fileset catalog classpath at `${GRAVITINO_HOME}/catalogs/fileset/libs/`.
3. Start the Gravitino server by running the following command:
```bash
$ ${GRAVITINO_HOME}/bin/gravitino-server.sh start
```
-Once the server is up and running, you can proceed to configure the Hadoop catalog with ADLS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
+Once the server is up and running, you can proceed to configure the Fileset catalog with ADLS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
-## Configurations for creating a Hadoop catalog with ADLS
+## Configurations for creating a Fileset catalog with ADLS
-### Configuration for a ADLS Hadoop catalog
+### Configuration for a ADLS Fileset catalog
-Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with ADLS:
+Apart from configurations mentioned in [fileset-catalog-catalog-configuration](./fileset-catalog.md#catalog-properties), the following properties are required to configure a Fileset catalog with ADLS:
| Configuration item            | Description [...]
|-------------------------------|------------- [...]
| `filesystem-providers`        | The file system providers to add. Set it to `abs` if it's a Azure Blob Storage fileset, or a comma separated string that contains `abs` like `oss,abs,s3` to support multiple kinds of fileset including `abs`. [...]
-| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for Azure Blob Storage, if we set this value, we can omit the prefix 'abfss://' in the location. [...]
+| `default-filesystem-provider` | The name default filesystem providers of this Fileset catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for Azure Blob Storage, if we set this value, we can omit the prefix 'abfss://' in the location. [...]
| `azure-storage-account-name ` | The account name of Azure Blob Storage. [...]
| `azure-storage-account-key`   | The account key of Azure Blob Storage. [...]
| `credential-providers`        | The credential provider types, separated by comma, possible value can be `adls-token`, `azure-account-key`. As the default authentication type is using account name and account key as the above, this configuration can enable credential vending provided by Gravitino server and client will no longer need to provide authentication information like account_name/account_key to access ADLS by GVFS. Once it's set, more configuration items are needed to make it [...]
@@ -39,19 +39,19 @@ Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./
### Configurations for a schema
-Refer to [Schema configurations](./hadoop-catalog.md#schema-properties) for more details.
+Refer to [Schema configurations](./fileset-catalog.md#schema-properties) for more details.
### Configurations for a fileset
-Refer to [Fileset configurations](./hadoop-catalog.md#fileset-properties) for more details.
+Refer to [Fileset configurations](./fileset-catalog.md#fileset-properties) for more details.
-## Example of creating Hadoop catalog with ADLS
+## Example of creating Fileset catalog with ADLS
-This section demonstrates how to create the Hadoop catalog with ADLS in Gravitino, with a complete example.
+This section demonstrates how to create the Fileset catalog with ADLS in Gravitino, with a complete example.
-### Step1: Create a Hadoop catalog with ADLS
+### Step1: Create a Fileset catalog with ADLS
-First, you need to create a Hadoop catalog with ADLS. The following example shows how to create a Hadoop catalog with ADLS:
+First, you need to create a Fileset catalog with ADLS. The following example shows how to create a Fileset catalog with ADLS:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -62,7 +62,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "example_catalog",
"type": "FILESET",
"comment": "This is a ADLS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "abfss://[email protected]/path",
"azure-storage-account-name": "The account name of the Azure Blob Storage",
@@ -90,7 +89,6 @@ Map<String, String> adlsProperties = ImmutableMap.<String, String>builder()
Catalog adlsCatalog = gravitinoClient.createCatalog("example_catalog",
Type.FILESET,
- "hadoop", // provider, Gravitino only supports "hadoop" for now.
"This is a ADLS fileset catalog",
adlsProperties);
// ...
@@ -111,7 +109,7 @@ adls_properties = {
adls_properties = gravitino_client.create_catalog(name="example_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="This is a ADLS fileset catalog",
properties=adls_properties)
```
@@ -480,9 +478,11 @@ For other use cases, please refer to the [Gravitino Virtual File System](./how-t
Since 0.8.0-incubating, Gravitino supports credential vending for ADLS fileset. If the catalog has been [configured with credential](./security/credential-vending.md), you can access ADLS fileset without providing authentication information like `azure-storage-account-name` and `azure-storage-account-key` in the properties.
-### How to create an ADLS Hadoop catalog with credential vending
+### How to create an ADLS Fileset catalog with credential vending
-Apart from configuration method in [create-adls-hadoop-catalog](#configuration-for-a-adls-hadoop-catalog), properties needed by [adls-credential](./security/credential-vending.md#adls-credentials) should also be set to enable credential vending for ADLS fileset. Take `adls-token` credential provider for example:
+Apart from configuration method in [create-adls-fileset-catalog](#configuration-for-a-adls-fileset-catalog),
+properties needed by [adls-credential](./security/credential-vending.md#adls-credentials) should
+also be set to enable credential vending for ADLS fileset. Take `adls-token` credential provider for example:
```shell
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
@@ -490,7 +490,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "adls-catalog-with-token",
"type": "FILESET",
"comment": "This is a ADLS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "abfss://[email protected]/path",
"azure-storage-account-name": "The account name of the Azure Blob Storage",
@@ -542,4 +541,3 @@ spark = SparkSession.builder
```
Python client and Hadoop command are similar to the above examples.
-
diff --git a/docs/hadoop-catalog-with-gcs.md b/docs/fileset-catalog-with-gcs.md
similarity index 90%
rename from docs/hadoop-catalog-with-gcs.md
rename to docs/fileset-catalog-with-gcs.md
index c89c380218..f4499ee905 100644
--- a/docs/hadoop-catalog-with-gcs.md
+++ b/docs/fileset-catalog-with-gcs.md
@@ -1,55 +1,55 @@
---
-title: "Hadoop catalog with GCS"
-slug: /hadoop-catalog-with-gcs
+title: "Fileset catalog with GCS"
+slug: /fileset-catalog-with-gcs
date: 2024-01-03
-keyword: Hadoop catalog GCS
+keyword: Fileset catalog GCS
license: "This software is licensed under the Apache License version 2."
---
-This document describes how to configure a Hadoop catalog with GCS.
+This document describes how to configure a Fileset catalog with GCS.
## Prerequisites
-To set up a Hadoop catalog with OSS, follow these steps:
+To set up a Fileset catalog with GCS, follow these steps:
1. Download the [`gravitino-gcp-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-gcp-bundle) file.
-2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
+2. Place the downloaded file into the Gravitino Fileset catalog classpath at `${GRAVITINO_HOME}/catalogs/fileset/libs/`.
3. Start the Gravitino server by running the following command:
```bash
$ ${GRAVITINO_HOME}/bin/gravitino-server.sh start
```
-Once the server is up and running, you can proceed to configure the Hadoop catalog with GCS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
+Once the server is up and running, you can proceed to configure the Fileset catalog with GCS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
-## Configurations for creating a Hadoop catalog with GCS
+## Configurations for creating a Fileset catalog with GCS
-### Configurations for a GCS Hadoop catalog
+### Configurations for a GCS Fileset catalog
-Apart from configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with GCS:
+Apart from configurations mentioned in [Fileset-catalog-catalog-configuration](./fileset-catalog.md#catalog-properties), the following properties are required to configure a Fileset catalog with GCS:
| Configuration item            | Description [...]
|-------------------------------|------------- [...]
| `filesystem-providers`        | The file system providers to add. Set it to `gcs` if it's a GCS fileset, a comma separated string that contains `gcs` like `gcs,s3` to support multiple kinds of fileset including `gcs`. [...]
-| `default-filesystem-provider` | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location. [...]
+| `default-filesystem-provider` | The name default filesystem providers of this Fileset catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for GCS, if we set this value, we can omit the prefix 'gs://' in the location. [...]
| `gcs-service-account-file`    | The path of GCS service account JSON file. [...]
| `credential-providers`        | The credential provider types, separated by comma, possible value can be `gcs-token`. As the default authentication type is using service account as the above, this configuration can enable credential vending provided by Gravitino server and client will no longer need to provide authentication information like service account to access GCS by GVFS. Once it's set, more configuration items are needed to make it works, please see [gcs-credential-vending](se [...]
### Configurations for a schema
-Refer to [Schema configurations](./hadoop-catalog.md#schema-properties) for more details.
+Refer to [Schema configurations](./fileset-catalog.md#schema-properties) for more details.
### Configurations for a fileset
-Refer to [Fileset configurations](./hadoop-catalog.md#fileset-properties) for more details.
+Refer to [Fileset configurations](./fileset-catalog.md#fileset-properties) for more details.
-## Example of creating Hadoop catalog with GCS
+## Example of creating Fileset catalog with GCS
-This section will show you how to use the Hadoop catalog with GCS in Gravitino, including detailed examples.
+This section will show you how to use the Fileset catalog with GCS in Gravitino, including detailed examples.
-### Step1: Create a Hadoop catalog with GCS
+### Step1: Create a Fileset catalog with GCS
-First, you need to create a Hadoop catalog with GCS. The following example shows how to create a Hadoop catalog with GCS:
+First, you need to create a Fileset catalog with GCS. The following example shows how to create a Fileset catalog with GCS:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -60,7 +60,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "test_catalog",
"type": "FILESET",
"comment": "This is a GCS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "gs://bucket/root",
"gcs-service-account-file": "path_of_gcs_service_account_file",
@@ -86,7 +85,6 @@ Map<String, String> gcsProperties = ImmutableMap.<String, String>builder()
Catalog gcsCatalog = gravitinoClient.createCatalog("test_catalog",
Type.FILESET,
- "hadoop", // provider, Gravitino only supports "hadoop" for now.
"This is a GCS fileset catalog",
gcsProperties);
// ...
@@ -106,7 +104,7 @@ gcs_properties = {
gcs_properties = gravitino_client.create_catalog(name="test_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="This is a GCS fileset catalog",
properties=gcs_properties)
```
@@ -116,7 +114,7 @@ gcs_properties = gravitino_client.create_catalog(name="test_catalog",
### Step2: Create a schema
-Once you have created a Hadoop catalog with GCS, you can create a schema. The following example shows how to create a schema:
+Once you have created a Fileset catalog with GCS, you can create a schema. The following example shows how to create a schema:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -459,9 +457,11 @@ For other use cases, please refer to the [Gravitino Virtual File System](./how-t
Since 0.8.0-incubating, Gravitino supports credential vending for GCS fileset. If the catalog has been [configured with credential](./security/credential-vending.md), you can access GCS fileset without providing authentication information like `gcs-service-account-file` in the properties.
-### How to create a GCS Hadoop catalog with credential vending
+### How to create a GCS Fileset catalog with credential vending
-Apart from configuration method in [create-gcs-hadoop-catalog](#configurations-for-a-gcs-hadoop-catalog), properties needed by [gcs-credential](./security/credential-vending.md#gcs-credentials) should also be set to enable credential vending for GCS fileset. Take `gcs-token` credential provider for example:
+Apart from configuration method in [create-gcs-fileset-catalog](#configurations-for-a-gcs-fileset-catalog),
+properties needed by [gcs-credential](./security/credential-vending.md#gcs-credentials) should also
+be set to enable credential vending for GCS fileset. Take `gcs-token` credential provider for example:
```shell
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
@@ -469,7 +469,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "gcs-catalog-with-token",
"type": "FILESET",
"comment": "This is a GCS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "gs://bucket/root",
"gcs-service-account-file": "path_of_gcs_service_account_file",
diff --git a/docs/hadoop-catalog-with-oss.md b/docs/fileset-catalog-with-oss.md
similarity index 90%
rename from docs/hadoop-catalog-with-oss.md
rename to docs/fileset-catalog-with-oss.md
index 7d2f05caf9..081906d3b1 100644
--- a/docs/hadoop-catalog-with-oss.md
+++ b/docs/fileset-catalog-with-oss.md
@@ -1,37 +1,37 @@
---
-title: "Hadoop catalog with OSS"
-slug: /hadoop-catalog-with-oss
+title: "Fileset catalog with OSS"
+slug: /fileset-catalog-with-oss
date: 2025-01-03
-keyword: Hadoop catalog OSS
+keyword: Fileset catalog OSS
license: "This software is licensed under the Apache License version 2."
---
-This document explains how to configure a Hadoop catalog with Aliyun OSS (Object Storage Service) in Gravitino.
+This document explains how to configure a Fileset catalog with Aliyun OSS (Object Storage Service) in Gravitino.
## Prerequisites
-To set up a Hadoop catalog with OSS, follow these steps:
+To set up a Fileset catalog with OSS, follow these steps:
1. Download the [`gravitino-aliyun-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aliyun-bundle) file.
-2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
+2. Place the downloaded file into the Gravitino Fileset catalog classpath at `${GRAVITINO_HOME}/catalogs/fileset/libs/`.
3. Start the Gravitino server by running the following command:
```bash
$ ${GRAVITINO_HOME}/bin/gravitino-server.sh start
```
-Once the server is up and running, you can proceed to configure the Hadoop catalog with OSS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
+Once the server is up and running, you can proceed to configure the Fileset catalog with OSS. In the rest of this document we will use `http://localhost:8090` as the Gravitino server URL, please replace it with your actual server URL.
-## Configurations for creating a Hadoop catalog with OSS
+## Configurations for creating a Fileset catalog with OSS
-### Configuration for an OSS Hadoop catalog
+### Configuration for an OSS Fileset catalog
-In addition to the basic configurations mentioned in [Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties), the following properties are required to configure a Hadoop catalog with OSS:
+In addition to the basic configurations mentioned in [Fileset-catalog-catalog-configuration](./fileset-catalog.md#catalog-properties), the following properties are required to configure a Fileset catalog with OSS:
| Configuration item             | Description [...]
|--------------------------------|------------- [...]
| `filesystem-providers`         | The file system providers to add. Set it to `oss` if it's a OSS fileset, or a comma separated string that contains `oss` like `oss,gs,s3` to support multiple kinds of fileset including `oss`. [...]
-| `default-filesystem-provider`  | The name default filesystem providers of this Hadoop catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for OSS, if we set this value, we can omit the prefix 'oss://' in the location. [...]
+| `default-filesystem-provider`  | The name default filesystem providers of this Fileset catalog if users do not specify the scheme in the URI. Default value is `builtin-local`, for OSS, if we set this value, we can omit the prefix 'oss://' in the location. [...]
| `oss-endpoint`                 | The endpoint of the Aliyun OSS. [...]
| `oss-access-key-id`            | The access key of the Aliyun OSS. [...]
| `oss-secret-access-key`        | The secret key of the Aliyun OSS. [...]
@@ -40,19 +40,19 @@ In addition to the basic configurations mentioned in [Hadoop-catalog-catalog-con
### Configurations for a schema
-To create a schema, refer to [Schema configurations](./hadoop-catalog.md#schema-properties).
+To create a schema, refer to [Schema configurations](./fileset-catalog.md#schema-properties).
### Configurations for a fileset
-For instructions on how to create a fileset, refer to [Fileset configurations](./hadoop-catalog.md#fileset-properties) for more details.
+For instructions on how to create a fileset, refer to [Fileset configurations](./fileset-catalog.md#fileset-properties) for more details.
-## Example of creating Hadoop catalog/schema/fileset with OSS
+## Example of creating Fileset catalog/schema/fileset with OSS
-This section will show you how to use the Hadoop catalog with OSS in Gravitino, including detailed examples.
+This section will show you how to use the Fileset catalog with OSS in Gravitino, including detailed examples.
-### Step1: Create a Hadoop catalog with OSS
+### Step1: Create a Fileset catalog with OSS
-First, you need to create a Hadoop catalog for OSS. The following examples demonstrate how to create a Hadoop catalog with OSS:
+First, you need to create a Fileset catalog for OSS. The following examples demonstrate how to create a Fileset catalog with OSS:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -63,7 +63,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "test_catalog",
"type": "FILESET",
"comment": "This is a OSS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "oss://bucket/root",
"oss-access-key-id": "access_key",
@@ -93,7 +92,6 @@ Map<String, String> ossProperties = ImmutableMap.<String, String>builder()
Catalog ossCatalog = gravitinoClient.createCatalog("test_catalog",
Type.FILESET,
- "hadoop", // provider, Gravitino only supports "hadoop" for now.
"This is a OSS fileset catalog",
ossProperties);
// ...
@@ -115,7 +113,7 @@ oss_properties = {
oss_catalog = gravitino_client.create_catalog(name="test_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="This is a OSS fileset catalog",
properties=oss_properties)
```
@@ -125,7 +123,7 @@ oss_catalog = gravitino_client.create_catalog(name="test_catalog",
### Step 2: Create a Schema
-Once the Hadoop catalog with OSS is created, you can create a schema inside that catalog. Below are examples of how to do this:
+Once the Fileset catalog with OSS is created, you can create a schema inside that catalog. Below are examples of how to do this:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -494,9 +492,11 @@ For other use cases, please refer to the [Gravitino Virtual File System](./how-t
Since 0.8.0-incubating, Gravitino supports credential vending for OSS fileset. If the catalog has been [configured with credential](./security/credential-vending.md), you can access OSS fileset without providing authentication information like `oss-access-key-id` and `oss-secret-access-key` in the properties.
-### How to create an OSS Hadoop catalog with credential vending
+### How to create an OSS Fileset catalog with credential vending
-Apart from configuration method in [create-oss-hadoop-catalog](#configuration-for-an-oss-hadoop-catalog), properties needed by [oss-credential](./security/credential-vending.md#oss-credentials) should also be set to enable credential vending for OSS fileset. Take `oss-token` credential provider for example:
+Apart from configuration method in [create-oss-fileset-catalog](#configuration-for-an-oss-fileset-catalog),
+properties needed by [oss-credential](./security/credential-vending.md#oss-credentials)
+should also be set to enable credential vending for OSS fileset. Take `oss-token` credential provider for example:
```shell
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
@@ -504,7 +504,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "oss-catalog-with-token",
"type": "FILESET",
"comment": "This is a OSS fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "oss://bucket/root",
"oss-access-key-id": "access_key",
diff --git a/docs/hadoop-catalog-with-s3.md b/docs/fileset-catalog-with-s3.md
similarity index 86%
rename from docs/hadoop-catalog-with-s3.md
rename to docs/fileset-catalog-with-s3.md
index 7d5a7bcadd..0d80f03066 100644
--- a/docs/hadoop-catalog-with-s3.md
+++ b/docs/fileset-catalog-with-s3.md
@@ -1,58 +1,57 @@
---
-title: "Hadoop catalog with S3"
-slug: /hadoop-catalog-with-s3
+title: "Fileset catalog with S3"
+slug: /fileset-catalog-with-s3
date: 2025-01-03
-keyword: Hadoop catalog S3
+keyword: Fileset catalog S3
license: "This software is licensed under the Apache License version 2."
---
-This document explains how to configure a Hadoop catalog with S3 in Gravitino.
+This document explains how to configure a Fileset catalog with S3 in Gravitino.
## Prerequisites
-To create a Hadoop catalog with S3, follow these steps:
+To create a Fileset catalog with S3, follow these steps:
1. Download the
[`gravitino-aws-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-aws-bundle)
file.
-2. Place this file in the Gravitino Hadoop catalog classpath at
`${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
+2. Place this file in the Gravitino Fileset catalog classpath at
`${GRAVITINO_HOME}/catalogs/fileset/libs/`.
3. Start the Gravitino server using the following command:
```bash
$ ${GRAVITINO_HOME}/bin/gravitino-server.sh start
```
-Once the server is up and running, you can proceed to configure the Hadoop
catalog with S3. In the rest of this document we will use
`http://localhost:8090` as the Gravitino server URL, please replace it with
your actual server URL.
+Once the server is up and running, you can proceed to configure the Fileset
catalog with S3. In the rest of this document we will use
`http://localhost:8090` as the Gravitino server URL; please replace it with
your actual server URL.
-## Configurations for creating a Hadoop catalog with S3
+## Configurations for creating a Fileset catalog with S3
-### Configurations for S3 Hadoop Catalog
+### Configurations for S3 Fileset Catalog
-In addition to the basic configurations mentioned in
[Hadoop-catalog-catalog-configuration](./hadoop-catalog.md#catalog-properties),
the following properties are necessary to configure a Hadoop catalog with S3:
+In addition to the basic configurations mentioned in
[Fileset catalog configuration](./fileset-catalog.md#catalog-properties),
the following properties are necessary to configure a Fileset catalog with S3:
-| Configuration item | Description
| Default value | Required | Since version |
-|--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
-| `filesystem-providers` | The file system providers to add. Set it to
`s3` if it's a S3 fileset, or a comma separated string that contains `s3` like
`gs,s3` to support multiple kinds of fileset including `s3`.
| (none) | Yes | 0.7.0-incubating |
-| `default-filesystem-provider` | The name default filesystem providers of
this Hadoop catalog if users do not specify the scheme in the URI. Default
value is `builtin-local`, for S3, if we set this value, we can omit the prefix
's3a://' in the location.
| `builtin-local` | No | 0.7.0-incubating |
-| `s3-endpoint` | The endpoint of the AWS S3.
| (none) | Yes | 0.7.0-incubating |
-| `s3-access-key-id` | The access key of the AWS S3.
| (none) | Yes | 0.7.0-incubating |
-| `s3-secret-access-key` | The secret key of the AWS S3.
| (none) | Yes | 0.7.0-incubating |
+| Configuration item | Description
[...]
+|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
+| `filesystem-providers` | The file system providers to add. Set it to
`s3` if it's an S3 fileset, or a comma-separated string that contains `s3`, like
`gs,s3`, to support multiple kinds of fileset including `s3`.
[...]
+| `default-filesystem-provider` | The default filesystem provider of
this Fileset catalog if users do not specify the scheme in the URI. Default
value is `builtin-local`; for S3, if we set this value to `s3`, we can omit the
prefix `s3a://` in the location.
[...]
+| `s3-endpoint` | The endpoint of the AWS S3.
[...]
+| `s3-access-key-id` | The access key of the AWS S3.
[...]
+| `s3-secret-access-key` | The secret key of the AWS S3.
[...]
| `credential-providers` | The credential provider types, separated by
comma, possible value can be `s3-token`, `s3-secret-key`. As the default
authentication type is using AKSK as the above, this configuration can enable
credential vending provided by Gravitino server and client will no longer need
to provide authentication information like AKSK to access S3 by GVFS. Once it's
set, more configuration items are needed to make it work, please see
[s3-credential-vending](security/ [...]
### Configurations for a schema
-To learn how to create a schema, refer to [Schema
configurations](./hadoop-catalog.md#schema-properties).
+To learn how to create a schema, refer to [Schema
configurations](./fileset-catalog.md#schema-properties).
### Configurations for a fileset
-For more details on creating a fileset, Refer to [Fileset
configurations](./hadoop-catalog.md#fileset-properties).
+For more details on creating a fileset, refer to [Fileset
configurations](./fileset-catalog.md#fileset-properties).
+## Using the Fileset catalog with S3
-## Using the Hadoop catalog with S3
+This section demonstrates how to use the Fileset catalog with S3 in Gravitino,
with a complete example.
-This section demonstrates how to use the Hadoop catalog with S3 in Gravitino,
with a complete example.
+### Step 1: Create a Fileset Catalog with S3
-### Step1: Create a Hadoop Catalog with S3
-
-First of all, you need to create a Hadoop catalog with S3. The following
example shows how to create a Hadoop catalog with S3:
+First of all, you need to create a Fileset catalog with S3. The following
example shows how to do so:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -63,7 +62,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "test_catalog",
"type": "FILESET",
"comment": "This is a S3 fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "s3a://bucket/root",
"s3-access-key-id": "access_key",
@@ -93,7 +91,6 @@ Map<String, String> s3Properties = ImmutableMap.<String,
String>builder()
Catalog s3Catalog = gravitinoClient.createCatalog("test_catalog",
Type.FILESET,
- "hadoop", // provider, Gravitino only supports "hadoop" for now.
"This is a S3 fileset catalog",
s3Properties);
// ...
@@ -115,7 +112,7 @@ s3_properties = {
s3_catalog = gravitino_client.create_catalog(name="test_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="This is a S3 fileset
catalog",
properties=s3_properties)
```
@@ -124,12 +121,12 @@ s3_catalog =
gravitino_client.create_catalog(name="test_catalog",
</Tabs>
:::note
-When using S3 with Hadoop, ensure that the location value starts with s3a://
(not s3://) for AWS S3. For example, use s3a://bucket/root, as the s3:// format
is not supported by the hadoop-aws library.
+When using S3, ensure that the location value starts with `s3a://` (not
`s3://`) for AWS S3. For example, use `s3a://bucket/root`, as the `s3://`
format is not supported by the hadoop-aws library.
:::
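The `s3a://` requirement above can be checked up front before sending a create-catalog request. The following sketch is illustrative only; the `check_s3_location` helper is hypothetical and not part of the Gravitino client:

```python
from urllib.parse import urlparse

def check_s3_location(location: str) -> str:
    """Validate that an S3 fileset location uses the s3a:// scheme.

    The hadoop-aws library used by the Fileset catalog only supports the
    s3a:// scheme, so plain s3:// locations are rejected up front.
    """
    scheme = urlparse(location).scheme
    if scheme == "s3":
        raise ValueError(
            f"Unsupported scheme 's3://' in {location!r}; use 's3a://' instead"
        )
    if scheme != "s3a":
        raise ValueError(f"Expected an s3a:// location, got {location!r}")
    return location

# A valid location passes through unchanged.
print(check_s3_location("s3a://bucket/root"))  # s3a://bucket/root
```

Running a check like this client-side gives a clearer error than the failure the hadoop-aws library would raise later.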
### Step2: Create a schema
-Once your Hadoop catalog with S3 is created, you can create a schema under the
catalog. Here are examples of how to do that:
+Once your Fileset catalog with S3 is created, you can create a schema under
the catalog. Here are examples of how to do that:
<Tabs groupId="language" queryString>
<TabItem value="shell" label="Shell">
@@ -497,9 +494,11 @@ For more use cases, please refer to the [Gravitino Virtual
File System](./how-to
Since 0.8.0-incubating, Gravitino supports credential vending for S3 fileset.
If the catalog has been [configured with
credential](./security/credential-vending.md), you can access S3 fileset
without providing authentication information like `s3-access-key-id` and
`s3-secret-access-key` in the properties.
-### How to create a S3 Hadoop catalog with credential vending
+### How to create an S3 Fileset catalog with credential vending
-Apart from configuration method in
[create-s3-hadoop-catalog](#configurations-for-s3-hadoop-catalog), properties
needed by [s3-credential](./security/credential-vending.md#s3-credentials)
should also be set to enable credential vending for S3 fileset. Take `s3-token`
credential provider for example:
+Apart from the configuration method in
[create-s3-fileset-catalog](#configurations-for-s3-fileset-catalog),
+properties needed by
[s3-credential](./security/credential-vending.md#s3-credentials)
+should also be set to enable credential vending for S3 filesets. Take the
`s3-token` credential provider as an example:
```shell
curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
@@ -507,7 +506,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json"
\
"name": "s3-catalog-with-token",
"type": "FILESET",
"comment": "This is a S3 fileset catalog",
- "provider": "hadoop",
"properties": {
"location": "s3a://bucket/root",
"s3-access-key-id": "access_key",
diff --git a/docs/hadoop-catalog.md b/docs/fileset-catalog.md
similarity index 74%
rename from docs/hadoop-catalog.md
rename to docs/fileset-catalog.md
index cf57367950..a34b32260e 100644
--- a/docs/hadoop-catalog.md
+++ b/docs/fileset-catalog.md
@@ -1,19 +1,23 @@
---
-title: "Hadoop catalog"
-slug: /hadoop-catalog
+title: "Fileset catalog"
+slug: /fileset-catalog
date: 2024-4-2
-keyword: hadoop catalog
+keyword: fileset catalog
license: "This software is licensed under the Apache License version 2."
---
## Introduction
-Hadoop catalog is a fileset catalog that using Hadoop Compatible File System
(HCFS) to manage
-the storage location of the fileset. Currently, it supports the local
filesystem and HDFS. Since 0.7.0-incubating, Gravitino supports
[S3](hadoop-catalog-with-s3.md), [GCS](hadoop-catalog-with-gcs.md),
[OSS](hadoop-catalog-with-oss.md) and [Azure Blob
Storage](hadoop-catalog-with-adls.md) through Hadoop catalog.
+The Fileset catalog uses a Hadoop Compatible File System (HCFS) to manage
+the storage location of the fileset. Currently, it supports the local
filesystem and HDFS. Since
+0.7.0-incubating, Gravitino supports [S3](fileset-catalog-with-s3.md),
[GCS](fileset-catalog-with-gcs.md),
+[OSS](fileset-catalog-with-oss.md) and [Azure Blob
Storage](fileset-catalog-with-adls.md) through the Fileset catalog.
-The rest of this document will use HDFS or local file as an example to
illustrate how to use the Hadoop catalog. For S3, GCS, OSS and Azure Blob
Storage, the configuration is similar to HDFS, please refer to the
corresponding document for more details.
+The rest of this document will use HDFS or the local filesystem as an example
to illustrate how to use the Fileset catalog.
+For S3, GCS, OSS and Azure Blob Storage, the configuration is similar to HDFS;
+please refer to the corresponding document for more details.
-Note that Gravitino uses Hadoop 3 dependencies to build Hadoop catalog.
Theoretically, it should be
+Note that Gravitino uses Hadoop 3 dependencies to build the Fileset catalog.
Theoretically, it should be
compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't leverage any
new features in
Hadoop 3. If there's any compatibility issue, please create an
[issue](https://github.com/apache/gravitino/issues).
@@ -21,19 +25,19 @@ Hadoop 3. If there's any compatibility issue, please create
an [issue](https://g
### Catalog properties
-Besides the [common catalog
properties](./gravitino-server-config.md#apache-gravitino-catalog-properties-configuration),
the Hadoop catalog has the following properties:
+Besides the [common catalog
properties](./gravitino-server-config.md#apache-gravitino-catalog-properties-configuration),
the Fileset catalog has the following properties:
-| Property Name | Description
| Default Value | Required | Since Version |
-|--------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
-| `location` | The storage location managed by
Hadoop catalog. Its location name is `unknown`.
| (none) | No | 0.5.0
|
-| `location-` | The property prefix. User can use
`location-{name}={path}` to set multiple locations with different names for the
catalog.
| (none) | No |
0.9.0-incubating |
-| `default-filesystem-provider` | The default filesystem provider of
this Hadoop catalog if users do not specify the scheme in the URI. Candidate
values are 'builtin-local', 'builtin-hdfs', 's3', 'gcs', 'abs' and 'oss'.
Default value is `builtin-local`. For S3, if we set this value to 's3', we can
omit the prefix 's3a://' in the location. | `builtin-local` | No |
0.7.0-incubating |
-| `filesystem-providers` | The file system providers to add.
Users need to set this configuration to support cloud storage or custom HCFS.
For instance, set it to `s3` or a comma separated string that contains `s3`
like `gs,s3` to support multiple kinds of fileset including `s3`.
| (none) | Yes |
0.7.0-incubating |
-| `credential-providers` | The credential provider types,
separated by comma.
| (none) | No |
0.8.0-incubating |
-| `filesystem-conn-timeout-secs` | The timeout of getting the file
system using Hadoop FileSystem client instance. Time unit: seconds.
| 6 | No |
0.8.0-incubating |
-| `disable-filesystem-ops` | The configuration to disable file
system operations in the server side. If set to true, the Hadoop catalog in the
server side will not create, drop files or folder when the schema, fileset is
created, dropped.
| false | No |
0.9.0-incubating |
-| `fileset-cache-eviction-interval-ms` | The interval in milliseconds to evict
the fileset cache, -1 means never evict.
| 3600000 | No | 0.9.0-incubating |
-| `fileset-cache-max-size` | The maximum number of the filesets
the cache may contain, -1 means no limit.
| 200000 | No | 0.9.0-incubating
|
+| Property Name | Description
| Default Value | Required | Since Version |
+|--------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------|----------|------------------|
+| `location` | The storage location managed by
Fileset catalog. Its location name is `unknown`.
| (none) | No | 0.5.0
|
+| `location-` | The property prefix. User can use
`location-{name}={path}` to set multiple locations with different names for the
catalog.
| (none) | No |
0.9.0-incubating |
+| `default-filesystem-provider` | The default filesystem provider of
this Fileset catalog if users do not specify the scheme in the URI. Candidate
values are 'builtin-local', 'builtin-hdfs', 's3', 'gcs', 'abs' and 'oss'.
Default value is `builtin-local`. For S3, if we set this value to 's3', we can
omit the prefix 's3a://' in the location. | `builtin-local` | No |
0.7.0-incubating |
+| `filesystem-providers` | The file system providers to add.
Users need to set this configuration to support cloud storage or custom HCFS.
For instance, set it to `s3` or a comma separated string that contains `s3`
like `gs,s3` to support multiple kinds of fileset including `s3`.
| (none) | Yes |
0.7.0-incubating |
+| `credential-providers` | The credential provider types,
separated by comma.
| (none) | No |
0.8.0-incubating |
+| `filesystem-conn-timeout-secs` | The timeout of getting the file
system using Hadoop FileSystem client instance. Time unit: seconds.
| 6 | No |
0.8.0-incubating |
+| `disable-filesystem-ops` | The configuration to disable file
system operations on the server side. If set to true, the Fileset catalog on
the server side will not create or drop files or folders when a schema or
fileset is created or dropped.
| false | No |
0.9.0-incubating |
+| `fileset-cache-eviction-interval-ms` | The interval in milliseconds to evict
the fileset cache, -1 means never evict.
| 3600000 | No | 0.9.0-incubating |
+| `fileset-cache-max-size` | The maximum number of the filesets
the cache may contain, -1 means no limit.
| 200000 | No |
0.9.0-incubating |
Please refer to [Credential vending](./security/credential-vending.md) for
more details about credential vending.
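The `location-{name}={path}` convention above can be pictured with a small sketch that extracts named locations from a catalog's properties map. The `named_locations` helper is hypothetical, for illustration only:

```python
def named_locations(properties: dict) -> dict:
    """Collect locations set via the 'location-{name}={path}' convention.

    Returns a mapping from location name to path; the unprefixed 'location'
    property is reported under the name 'unknown', as described above.
    """
    result = {}
    if "location" in properties:
        result["unknown"] = properties["location"]
    for key, value in properties.items():
        if key.startswith("location-"):
            # Strip the 'location-' prefix to recover the location name.
            result[key[len("location-"):]] = value
    return result

props = {
    "location": "hdfs://namenode:9000/root",
    "location-archive": "hdfs://namenode:9000/archive",
    "filesystem-providers": "builtin-hdfs",
}
print(named_locations(props))
```

Unrelated properties such as `filesystem-providers` are ignored; only the `location` key and `location-` prefixed keys contribute to the result.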
@@ -43,22 +47,22 @@ Apart from the above properties, to access fileset like
HDFS fileset, you need t
| Property Name | Description
|
Default Value | Required |
Since Version |
|----------------------------------------------------|------------------------------------------------------------------------------------------------|---------------|-------------------------------------------------------------|---------------|
-| `authentication.impersonation-enable` | Whether to enable
impersonation for the Hadoop catalog. |
`false` | No |
0.5.1 |
-| `authentication.type` | The type of
authentication for Hadoop catalog, currently we only support `kerberos`,
`simple`. | `simple` | No
| 0.5.1 |
+| `authentication.impersonation-enable` | Whether to enable
impersonation for the Fileset catalog. |
`false` | No |
0.5.1 |
+| `authentication.type` | The type of
authentication for the Fileset catalog; currently we only support `kerberos` and
`simple`. | `simple` | No
| 0.5.1 |
| `authentication.kerberos.principal` | The principal of the
Kerberos authentication |
(none) | required if the value of `authentication.type` is Kerberos. |
0.5.1 |
| `authentication.kerberos.keytab-uri` | The URI of The keytab
for the Kerberos authentication. |
(none) | required if the value of `authentication.type` is Kerberos. |
0.5.1 |
-| `authentication.kerberos.check-interval-sec` | The check interval of
Kerberos credential for Hadoop catalog. | 60
| No | 0.5.1
|
+| `authentication.kerberos.check-interval-sec` | The check interval of
Kerberos credential for Fileset catalog. | 60
| No | 0.5.1
|
| `authentication.kerberos.keytab-fetch-timeout-sec` | The fetch timeout of
retrieving Kerberos keytab from `authentication.kerberos.keytab-uri`. | 60
| No | 0.5.1
|
-### Hadoop catalog with Cloud Storage
-- For S3, please refer to
[Hadoop-catalog-with-s3](./hadoop-catalog-with-s3.md) for more details.
-- For GCS, please refer to
[Hadoop-catalog-with-gcs](./hadoop-catalog-with-gcs.md) for more details.
-- For OSS, please refer to
[Hadoop-catalog-with-oss](./hadoop-catalog-with-oss.md) for more details.
-- For Azure Blob Storage, please refer to
[Hadoop-catalog-with-adls](./hadoop-catalog-with-adls.md) for more details.
+### Fileset catalog with Cloud Storage
+- For S3, please refer to
[Fileset-catalog-with-s3](./fileset-catalog-with-s3.md) for more details.
+- For GCS, please refer to
[Fileset-catalog-with-gcs](./fileset-catalog-with-gcs.md) for more details.
+- For OSS, please refer to
[Fileset-catalog-with-oss](./fileset-catalog-with-oss.md) for more details.
+- For Azure Blob Storage, please refer to
[Fileset-catalog-with-adls](./fileset-catalog-with-adls.md) for more details.
### How to customize your own HCFS file system fileset?
-Developers and users can custom their own HCFS file system fileset by
implementing the `FileSystemProvider` interface in the jar
[gravitino-catalog-hadoop](https://repo1.maven.org/maven2/org/apache/gravitino/catalog-hadoop/).
The `FileSystemProvider` interface is defined as follows:
+Developers and users can customize their own HCFS file system fileset by
implementing the `FileSystemProvider` interface in the jar
[gravitino-hadoop-common](https://repo1.maven.org/maven2/org/apache/gravitino/gravitino-hadoop-common/).
The `FileSystemProvider` interface is defined as follows:
```java
@@ -76,15 +80,19 @@ Developers and users can custom their own HCFS file system
fileset by implementi
String name();
```
-In the meantime, `FileSystemProvider` uses Java SPI to load the custom file
system provider. You need to create a file named
`org.apache.gravitino.catalog.fs.FileSystemProvider` in the `META-INF/services`
directory of the jar file. The content of the file is the full class name of
the custom file system provider.
-For example, the content of `S3FileSystemProvider` is as follows:
+Additionally, `FileSystemProvider` uses Java SPI to load the custom file
system provider. You
+need to create a file named
`org.apache.gravitino.catalog.hadoop.fs.FileSystemProvider` in the
+`META-INF/services` directory of the jar file. The content of the file is the
full class name of
+the custom file system provider. For example, the content of
`S3FileSystemProvider` is as follows:

-After implementing the `FileSystemProvider` interface, you need to put the jar
file into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory. Then you can
set the `filesystem-providers` property to use your custom file system provider.
+After implementing the `FileSystemProvider` interface, you need to put the jar
file into the
+`$GRAVITINO_HOME/catalogs/fileset/libs` directory. Then you can set the
`filesystem-providers`
+property to use your custom file system provider.
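Conceptually, the server ends up with a registry of providers keyed by URI scheme, filtered by the `filesystem-providers` catalog property. The following Python sketch only pictures that selection step; the real mechanism is the Java SPI described above, and these class and method names are stand-ins, not Gravitino APIs:

```python
class LocalFileSystemProvider:
    """Schematic stand-in for the builtin-local provider."""
    def name(self):
        return "builtin-local"
    def scheme(self):
        return "file"

class S3FileSystemProvider:
    """Schematic stand-in for a custom S3 provider loaded from a bundle jar."""
    def name(self):
        return "s3"
    def scheme(self):
        return "s3a"

def load_providers(filesystem_providers: str) -> dict:
    """Keep only the providers named in the comma-separated
    'filesystem-providers' catalog property, keyed by URI scheme."""
    available = [LocalFileSystemProvider(), S3FileSystemProvider()]
    wanted = {p.strip() for p in filesystem_providers.split(",")}
    return {p.scheme(): p for p in available if p.name() in wanted}

# Selecting only 's3' registers the handler for the s3a:// scheme.
print(sorted(load_providers("s3")))  # ['s3a']
```

A catalog configured with `filesystem-providers=builtin-local,s3` would, in this picture, resolve both `file://` and `s3a://` locations.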
-### Authentication for Hadoop Catalog
+### Authentication for Fileset Catalog
-The Hadoop catalog supports multi-level authentication to control access,
allowing different authentication settings for the catalog, schema, and
fileset. The priority of authentication settings is as follows: catalog <
schema < fileset. Specifically:
+The Fileset catalog supports multi-level authentication to control access,
allowing different authentication settings for the catalog, schema, and
fileset. The priority of authentication settings is as follows: catalog <
schema < fileset. Specifically:
- **Catalog**: The default authentication is `simple`.
- **Schema**: Inherits the authentication setting from the catalog if not
explicitly set. For more information about schema settings, please refer to
[Schema properties](#schema-properties).
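The catalog &lt; schema &lt; fileset precedence described above amounts to a most-specific-wins lookup: the lowest level that explicitly sets `authentication.type` takes effect, and the catalog defaults to `simple`. A minimal sketch (the `effective_auth_type` helper is hypothetical, not Gravitino code):

```python
def effective_auth_type(catalog_props, schema_props, fileset_props):
    """Resolve authentication.type with fileset > schema > catalog precedence.

    Each argument is the properties dict of one level; a level that does not
    set the key inherits from its parent. The catalog defaults to 'simple'.
    """
    key = "authentication.type"
    for props in (fileset_props, schema_props, catalog_props):
        if key in props:
            return props[key]
    return "simple"  # catalog-level default

# A fileset-level setting overrides both schema and catalog.
print(effective_auth_type({"authentication.type": "simple"},
                          {},
                          {"authentication.type": "kerberos"}))  # kerberos
```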
@@ -101,16 +109,16 @@ Refer to [Catalog
operations](./manage-fileset-metadata-using-gravitino.md#catal
### Schema capabilities
-The Hadoop catalog supports creating, updating, deleting, and listing schema.
+The Fileset catalog supports creating, updating, deleting, and listing schemas.
### Schema properties
| Property name | Description
| Default value | Required | Since Version |
|---------------------------------------|---------------------------------------------------------------------------------------------------------------------------|---------------------------|----------|------------------|
-| `location` | The storage location managed by
Hadoop schema. Its location name is `unknown`.
| (none) | No | 0.5.0 |
+| `location` | The storage location managed by
schema. Its location name is `unknown`.
| (none) | No | 0.5.0 |
| `location-` | The property prefix. User can use
`location-{name}={path}` to set multiple locations with different names for the
schema. | (none) | No | 0.9.0-incubating |
-| `authentication.impersonation-enable` | Whether to enable impersonation for
this schema of the Hadoop catalog.
| The parent(catalog) value | No | 0.6.0-incubating |
-| `authentication.type` | The type of authentication for this
schema of Hadoop catalog , currently we only support `kerberos`, `simple`.
| The parent(catalog) value | No | 0.6.0-incubating |
+| `authentication.impersonation-enable` | Whether to enable impersonation for
this schema of the Fileset catalog.
| The parent(catalog) value | No | 0.6.0-incubating |
+| `authentication.type` | The type of authentication for this
schema of the Fileset catalog; currently we only support `kerberos` and `simple`.
| The parent(catalog) value | No | 0.6.0-incubating |
| `authentication.kerberos.principal` | The principal of the Kerberos
authentication for this schema.
| The parent(catalog) value | No | 0.6.0-incubating |
| `authentication.kerberos.keytab-uri` | The URI of The keytab for the
Kerberos authentication for this schema.
| The parent(catalog) value | No | 0.6.0-incubating |
| `credential-providers` | The credential provider types,
separated by comma.
| (none) | No | 0.8.0-incubating |
@@ -130,14 +138,14 @@ This behavior is skipped in either of these cases:
### Fileset capabilities
-- The Hadoop catalog supports creating, updating, deleting, and listing
filesets.
+- The Fileset catalog supports creating, updating, deleting, and listing
filesets.
### Fileset properties
| Property name | Description
| Default value
| Required |
Immutable | Since Version |
|---------------------------------------|----------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|--------------------------------------------|-----------|------------------|
-| `authentication.impersonation-enable` | Whether to enable impersonation for
the Hadoop catalog fileset.
| The parent(schema) value
| No
| Yes | 0.6.0-incubating |
-| `authentication.type` | The type of authentication for
Hadoop catalog fileset, currently we only support `kerberos`, `simple`.
| The parent(schema) value
| No
| No | 0.6.0-incubating |
+| `authentication.impersonation-enable` | Whether to enable impersonation for
the Fileset catalog fileset.
| The parent(schema) value
| No
| Yes | 0.6.0-incubating |
+| `authentication.type` | The type of authentication for a
Fileset catalog fileset; currently we only support `kerberos` and `simple`.
| The parent(schema) value
| No
| No | 0.6.0-incubating |
| `authentication.kerberos.principal` | The principal of the Kerberos
authentication for the fileset.
| The parent(schema) value
| No
| No | 0.6.0-incubating |
| `authentication.kerberos.keytab-uri` | The URI of The keytab for the
Kerberos authentication for the fileset.
| The parent(schema) value
| No
| No | 0.6.0-incubating |
| `credential-providers` | The credential provider types,
separated by comma.
| (none)
| No
| No | 0.8.0-incubating |
diff --git a/docs/gravitino-server-config.md b/docs/gravitino-server-config.md
index 3ba749bc1e..eb25843246 100644
--- a/docs/gravitino-server-config.md
+++ b/docs/gravitino-server-config.md
@@ -247,7 +247,8 @@ The following table lists the catalog specific properties
and their default path
| `jdbc-doris` | [Doris catalog
properties](jdbc-doris-catalog.md#catalog-properties) |
`catalogs/jdbc-doris/conf/jdbc-doris.conf` |
| `jdbc-oceanbase` | [OceanBase catalog
properties](jdbc-oceanbase-catalog.md#catalog-properties) |
`catalogs/jdbc-oceanbase/conf/jdbc-oceanbase.conf` |
| `kafka` | [Kafka catalog
properties](kafka-catalog.md#catalog-properties) |
`catalogs/kafka/conf/kafka.conf` |
-| `hadoop` | [Hadoop catalog
properties](hadoop-catalog.md#catalog-properties) |
`catalogs/hadoop/conf/hadoop.conf` |
+| `fileset` | [Fileset catalog
properties](fileset-catalog.md#catalog-properties) |
`catalogs/fileset/conf/fileset.conf` |
+| `model` | [Model catalog
properties](model-catalog.md#catalog-properties) |
`catalogs/model/conf/model.conf` |
:::info
The Gravitino server automatically adds the catalog properties configuration
directory to classpath.
@@ -271,33 +272,33 @@ This is done using a startup script that parses
environment variables prefixed w
These variables override the corresponding entries in `gravitino.conf` at
startup.
-| Environment Variable |
Configuration Key | Default Value
| Since Version |
-|---------------------------------------------------------------|----------------------------------------------------------|----------------------------------------------------------|---------------|
-| `GRAVITINO_SERVER_SHUTDOWN_TIMEOUT` |
`gravitino.server.shutdown.timeout` | `3000`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_HOST` |
`gravitino.server.webserver.host` | `0.0.0.0`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_HTTP_PORT` |
`gravitino.server.webserver.httpPort` | `8090`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_MIN_THREADS` |
`gravitino.server.webserver.minThreads` | `24`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_MAX_THREADS` |
`gravitino.server.webserver.maxThreads` | `200`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_STOP_TIMEOUT` |
`gravitino.server.webserver.stopTimeout` | `30000`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_IDLE_TIMEOUT` |
`gravitino.server.webserver.idleTimeout` | `30000`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_THREAD_POOL_WORK_QUEUE_SIZE` |
`gravitino.server.webserver.threadPoolWorkQueueSize` | `100`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_REQUEST_HEADER_SIZE` |
`gravitino.server.webserver.requestHeaderSize` | `131072`
| 1.0.0 |
-| `GRAVITINO_SERVER_WEBSERVER_RESPONSE_HEADER_SIZE` |
`gravitino.server.webserver.responseHeaderSize` | `131072`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE` |
`gravitino.entity.store` | `relational`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE_RELATIONAL` |
`gravitino.entity.store.relational` | `JDBCBackend`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_URL` |
`gravitino.entity.store.relational.jdbcUrl` | `jdbc:h2`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_DRIVER` |
`gravitino.entity.store.relational.jdbcDriver` | `org.h2.Driver`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_USER` |
`gravitino.entity.store.relational.jdbcUser` | `gravitino`
| 1.0.0 |
-| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_PASSWORD` |
`gravitino.entity.store.relational.jdbcPassword` | `gravitino`
| 1.0.0 |
-| `GRAVITINO_CATALOG_CACHE_EVICTION_INTERVAL_MS` |
`gravitino.catalog.cache.evictionIntervalMs` | `3600000`
| 1.0.0 |
-| `GRAVITINO_AUTHORIZATION_ENABLE` |
`gravitino.authorization.enable` | `false`
| 1.0.0 |
-| `GRAVITINO_AUTHORIZATION_SERVICE_ADMINS` |
`gravitino.authorization.serviceAdmins` | `anonymous`
| 1.0.0 |
-| `GRAVITINO_AUX_SERVICE_NAMES` |
`gravitino.auxService.names` | `iceberg-rest`
| 1.0.0 |
-| `GRAVITINO_ICEBERG_REST_CLASSPATH` | `gravitino.iceberg-rest.classpath` | `iceberg-rest-server/libs, iceberg-rest-server/conf` | 1.0.0 |
-| `GRAVITINO_ICEBERG_REST_HOST` | `gravitino.iceberg-rest.host` | `0.0.0.0` | 1.0.0 |
-| `GRAVITINO_ICEBERG_REST_HTTP_PORT` | `gravitino.iceberg-rest.httpPort` | `9001` | 1.0.0 |
-| `GRAVITINO_ICEBERG_REST_CATALOG_BACKEND` | `gravitino.iceberg-rest.catalog-backend` | `memory` | 1.0.0 |
-| `GRAVITINO_ICEBERG_REST_WAREHOUSE` | `gravitino.iceberg-rest.warehouse` | `/tmp/` | 1.0.0 |
+| Environment Variable | Configuration Key | Default Value | Since Version |
+|----------------------------------------------------------|------------------------------------------------------|------------------------------------------------------|---------------|
+| `GRAVITINO_SERVER_SHUTDOWN_TIMEOUT` | `gravitino.server.shutdown.timeout` | `3000` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_HOST` | `gravitino.server.webserver.host` | `0.0.0.0` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_HTTP_PORT` | `gravitino.server.webserver.httpPort` | `8090` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_MIN_THREADS` | `gravitino.server.webserver.minThreads` | `24` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_MAX_THREADS` | `gravitino.server.webserver.maxThreads` | `200` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_STOP_TIMEOUT` | `gravitino.server.webserver.stopTimeout` | `30000` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_IDLE_TIMEOUT` | `gravitino.server.webserver.idleTimeout` | `30000` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_THREAD_POOL_WORK_QUEUE_SIZE` | `gravitino.server.webserver.threadPoolWorkQueueSize` | `100` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_REQUEST_HEADER_SIZE` | `gravitino.server.webserver.requestHeaderSize` | `131072` | 1.0.0 |
+| `GRAVITINO_SERVER_WEBSERVER_RESPONSE_HEADER_SIZE` | `gravitino.server.webserver.responseHeaderSize` | `131072` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE` | `gravitino.entity.store` | `relational` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE_RELATIONAL` | `gravitino.entity.store.relational` | `JDBCBackend` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_URL` | `gravitino.entity.store.relational.jdbcUrl` | `jdbc:h2` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_DRIVER` | `gravitino.entity.store.relational.jdbcDriver` | `org.h2.Driver` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_USER` | `gravitino.entity.store.relational.jdbcUser` | `gravitino` | 1.0.0 |
+| `GRAVITINO_ENTITY_STORE_RELATIONAL_JDBC_PASSWORD` | `gravitino.entity.store.relational.jdbcPassword` | `gravitino` | 1.0.0 |
+| `GRAVITINO_CATALOG_CACHE_EVICTION_INTERVAL_MS` | `gravitino.catalog.cache.evictionIntervalMs` | `3600000` | 1.0.0 |
+| `GRAVITINO_AUTHORIZATION_ENABLE` | `gravitino.authorization.enable` | `false` | 1.0.0 |
+| `GRAVITINO_AUTHORIZATION_SERVICE_ADMINS` | `gravitino.authorization.serviceAdmins` | `anonymous` | 1.0.0 |
+| `GRAVITINO_AUX_SERVICE_NAMES` | `gravitino.auxService.names` | `iceberg-rest` | 1.0.0 |
+| `GRAVITINO_ICEBERG_REST_CLASSPATH` | `gravitino.iceberg-rest.classpath` | `iceberg-rest-server/libs, iceberg-rest-server/conf` | 1.0.0 |
+| `GRAVITINO_ICEBERG_REST_HOST` | `gravitino.iceberg-rest.host` | `0.0.0.0` | 1.0.0 |
+| `GRAVITINO_ICEBERG_REST_HTTP_PORT` | `gravitino.iceberg-rest.httpPort` | `9001` | 1.0.0 |
+| `GRAVITINO_ICEBERG_REST_CATALOG_BACKEND` | `gravitino.iceberg-rest.catalog-backend` | `memory` | 1.0.0 |
+| `GRAVITINO_ICEBERG_REST_WAREHOUSE` | `gravitino.iceberg-rest.warehouse` | `/tmp/` | 1.0.0 |
:::note
This feature is supported in the Gravitino Docker image starting from version
`1.0.0`.
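The environment variables in the table above follow a mechanical naming convention relative to their configuration keys: dots and hyphens become underscores, camelCase words are split, and everything is upper-cased. As an illustrative sketch only (the helper `env_var_for` is hypothetical and not part of Gravitino), the mapping can be expressed as:

```python
import re

def env_var_for(config_key: str) -> str:
    """Derive the Docker environment variable name from a Gravitino
    configuration key, per the convention visible in the table above:
    split on dots, hyphens, and camelCase boundaries, then upper-case
    the words and join them with underscores."""
    words = []
    for segment in config_key.split("."):
        # "iceberg-rest" -> ["iceberg", "rest"]; "httpPort" -> ["http", "Port"]
        words.extend(re.findall(r"[A-Za-z][a-z0-9]*", segment))
    return "_".join(w.upper() for w in words)
```

For example, `env_var_for("gravitino.server.webserver.httpPort")` yields `GRAVITINO_SERVER_WEBSERVER_HTTP_PORT`, matching the table row.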
diff --git a/docs/hadoop-catalog-index.md b/docs/hadoop-catalog-index.md
deleted file mode 100644
index dfa7a18717..0000000000
--- a/docs/hadoop-catalog-index.md
+++ /dev/null
@@ -1,26 +0,0 @@
----
-title: "Hadoop catalog index"
-slug: /hadoop-catalog-index
-date: 2025-01-13
-keyword: Hadoop catalog index S3 GCS ADLS OSS
-license: "This software is licensed under the Apache License version 2."
----
-
-### Hadoop catalog overall
-
-Gravitino Hadoop catalog index includes the following chapters:
-
-- [Hadoop catalog overview and features](./hadoop-catalog.md): This chapter
provides an overview of the Hadoop catalog, its features, capabilities and
related configurations.
-- [Manage Hadoop catalog with Gravitino
API](./manage-fileset-metadata-using-gravitino.md): This chapter explains how
to manage fileset metadata using Gravitino API and provides detailed examples.
-- [Using Hadoop catalog with Gravitino virtual file
system](how-to-use-gvfs.md): This chapter explains how to use Hadoop catalog
with the Gravitino virtual file system and provides detailed examples.
-
-### Hadoop catalog with cloud storage
-
-Apart from the above, you can also refer to the following topics to manage and
access cloud storage like S3, GCS, ADLS, and OSS:
-
-- [Using Hadoop catalog to manage S3](./hadoop-catalog-with-s3.md).
-- [Using Hadoop catalog to manage GCS](./hadoop-catalog-with-gcs.md).
-- [Using Hadoop catalog to manage ADLS](./hadoop-catalog-with-adls.md).
-- [Using Hadoop catalog to manage OSS](./hadoop-catalog-with-oss.md).
-
-More storage options will be added soon. Stay tuned!
\ No newline at end of file
diff --git a/docs/hive-catalog-with-cloud-storage.md
b/docs/hive-catalog-with-cloud-storage.md
index f2a5fe20eb..8344bc5b4b 100644
--- a/docs/hive-catalog-with-cloud-storage.md
+++ b/docs/hive-catalog-with-cloud-storage.md
@@ -135,7 +135,7 @@ GravitinoClient gravitinoClient = GravitinoClient
.withMetalake("metalake")
.build();
-// Assuming you have just created a Hadoop catalog named `catalog`
+// Assuming you have just created a Hive catalog named `catalog`
Catalog catalog = gravitinoClient.loadCatalog("catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
@@ -276,4 +276,4 @@ Azure Blob Storage(ADLS) requires the [Hadoop Azure
jar](https://mvnrepository.c
for Google Cloud Storage(GCS), you need to download the [Hadoop GCS
jar](https://github.com/GoogleCloudDataproc/hadoop-connectors/releases) and
place it in the classpath of the Spark.
:::
-By following these instructions, you can effectively manage and access your
S3, ADLS or GCS data through both Hive CLI and Spark, leveraging the
capabilities of Gravitino for optimal data management.
\ No newline at end of file
+By following these instructions, you can effectively manage and access your
S3, ADLS or GCS data through both Hive CLI and Spark, leveraging the
capabilities of Gravitino for optimal data management.
diff --git a/docs/how-to-install.md b/docs/how-to-install.md
index 9729cbd0e7..6619e323a0 100644
--- a/docs/how-to-install.md
+++ b/docs/how-to-install.md
@@ -32,7 +32,7 @@ The Gravitino binary distribution package contains the
following files:
| ├── gravitino.sh # Gravitino server Launching
scripts.
| └── gravitino-iceberg-rest-server.sh # Gravitino Iceberg REST
server Launching scripts.
|── catalogs
- | └── hadoop/                             # Apache Hadoop catalog dependencies and configurations.
+ | └── fileset/                            # Fileset catalog dependencies and configurations.
 | └── hive/                               # Apache Hive catalog dependencies and configurations.
 | └── jdbc-doris/                         # JDBC Doris catalog dependencies and configurations.
 | └── jdbc-mysql/                         # JDBC MySQL catalog dependencies and configurations.
@@ -40,6 +40,7 @@ The Gravitino binary distribution package contains the following files:
 | └── kafka/                              # Apache Kafka catalog dependencies and configurations.
 | └── lakehouse-iceberg/                  # Apache Iceberg catalog dependencies and configurations.
 | └── lakehouse-paimon/                   # Apache Paimon catalog dependencies and configurations.
+ | └── model/                             # Model catalog dependencies and configurations.
 |── conf/                                 # All configurations for Gravitino.
 | ├── gravitino.conf                      # Gravitino server and Gravitino Iceberg REST server configuration.
 | ├── gravitino-iceberg-rest-server.conf  # Gravitino Iceberg REST server configuration.
@@ -152,4 +153,4 @@ For the details, review the
## Deploy Apache Gravitino on Kubernetes Using Helm Chart
The Apache Gravitino Helm chart provides a way to deploy Gravitino on
Kubernetes with fully customizable configurations.
-For detailed installation instructions and configuration options, refer to the
[Apache Gravitino Helm Chart](./chart.md).
\ No newline at end of file
+For detailed installation instructions and configuration options, refer to the
[Apache Gravitino Helm Chart](./chart.md).
diff --git a/docs/how-to-use-gvfs.md b/docs/how-to-use-gvfs.md
index 996041462b..c3648f35c7 100644
--- a/docs/how-to-use-gvfs.md
+++ b/docs/how-to-use-gvfs.md
@@ -73,10 +73,14 @@ the path mapping and convert automatically.
| `fs.gravitino.enableCredentialVending` | Whether to enable credential vending for the Gravitino Virtual File System. | `false` | No | 0.9.0-incubating |
Apart from the above properties, to access fileset like S3, GCS, OSS and
custom fileset, extra properties are needed, please see
-[S3 GVFS Java client
configurations](./hadoop-catalog-with-s3.md#using-the-gvfs-java-client-to-access-the-fileset),
[GCS GVFS Java client
configurations](./hadoop-catalog-with-gcs.md#using-the-gvfs-java-client-to-access-the-fileset),
[OSS GVFS Java client
configurations](./hadoop-catalog-with-oss.md#using-the-gvfs-java-client-to-access-the-fileset)
and [Azure Blob Storage GVFS Java client
configurations](./hadoop-catalog-with-adls.md#using-the-gvfs-java-client-to-access-the-fileset)
for [...]
+[S3 GVFS Java client
configurations](./fileset-catalog-with-s3.md#using-the-gvfs-java-client-to-access-the-fileset),
+[GCS GVFS Java client
configurations](./fileset-catalog-with-gcs.md#using-the-gvfs-java-client-to-access-the-fileset),
+[OSS GVFS Java client
configurations](./fileset-catalog-with-oss.md#using-the-gvfs-java-client-to-access-the-fileset)
+and [Azure Blob Storage GVFS Java client
configurations](./fileset-catalog-with-adls.md#using-the-gvfs-java-client-to-access-the-fileset)
for more details.
#### Custom fileset
-Since 0.7.0-incubating, users can define their own fileset type and configure the corresponding properties, for more, please refer to [Custom Fileset](./hadoop-catalog.md#how-to-custom-your-own-hcfs-file-system-fileset).
+Since 0.7.0-incubating, users can define their own fileset type and configure the corresponding
+properties; for more details, please refer to [Custom Fileset](./fileset-catalog.md#how-to-custom-your-own-hcfs-file-system-fileset).
So, if you want to access the custom fileset through GVFS, you need to
configure the corresponding properties.
| Configuration item | Description | Default value | Required | Since version |
@@ -376,7 +380,7 @@ to recompile the native libraries like `libhdfs` and others, and completely repl
| `oauth2_credential` | The auth credential for the Gravitino client when using `oauth2` auth type. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
| `oauth2_path` | The auth server path for the Gravitino client when using `oauth2` auth type. Please remove the first slash `/` from the path, for example `oauth/token`. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
| `oauth2_scope` | The auth scope for the Gravitino client when using `oauth2` auth type with the Gravitino Virtual File System. | (none) | Yes if you use `oauth2` auth type | 0.7.0-incubating |
-| `credential_expiration_ratio` | The ratio of expiration time for credential from Gravitino. This is used in the cases where Gravitino Hadoop catalogs have enable credential vending. if the expiration time of credential fetched from Gravitino is 1 hour, GVFS client will try to refresh the credential in 1 * 0.9 = 0.5 hour. | 0.5 | No | 0.8.0-incubating |
+| `credential_expiration_ratio` | The fraction of the credential's expiration time after which the GVFS client refreshes it. This is used when Gravitino Fileset catalogs have credential vending enabled. If the expiration time of the credential fetched from Gravitino is 1 hour, the GVFS client will try to refresh it in 1 * 0.5 = 0.5 hour. | 0.5 | No | 0.8.0-incubating |
| `current_location_name` | The configuration used to select the location of the fileset. If this configuration is not set, the value of the environment variable configured by `current_location_name_env_var` will be checked. If neither is set, the value of the fileset property `default-location-name` will be used as the location name. | the value of fileset property `default-location-name` | No | 0.9.0-incubating |
| `current_location_name_env_var` | The environment variable name to get the current location name. | `CURRENT_LOCATION_NAME` | No | 0.9.0-incubating |
| `operations_class` | The operations class to provide the FS operations for the Gravitino Virtual File System. Users can extend `BaseGVFSOperations` to implement their own operations and configure the class name in this conf to use custom FS operations. | `gravitino.filesystem.gvfs_default_operations.DefaultGVFSOperations` | No | 0.9.0-incubating |
@@ -386,10 +390,13 @@ to recompile the native libraries like `libhdfs` and others, and completely repl
#### Configurations for S3, GCS, OSS and Azure Blob storage fileset
-Please see the cloud-storage-specific configurations [GCS GVFS Java client
configurations](./hadoop-catalog-with-gcs.md#using-the-gvfs-python-client-to-access-a-fileset),
[S3 GVFS Java client
configurations](./hadoop-catalog-with-s3.md#using-the-gvfs-python-client-to-access-a-fileset),
[OSS GVFS Java client
configurations](./hadoop-catalog-with-oss.md#using-the-gvfs-python-client-to-access-a-fileset)
and [Azure Blob Storage GVFS Java client
configurations](./hadoop-catalog-with-adls.md#u [...]
+Please see the cloud-storage-specific configurations [GCS GVFS Java client
configurations](./fileset-catalog-with-gcs.md#using-the-gvfs-python-client-to-access-a-fileset),
+[S3 GVFS Java client
configurations](./fileset-catalog-with-s3.md#using-the-gvfs-python-client-to-access-a-fileset),
+[OSS GVFS Java client
configurations](./fileset-catalog-with-oss.md#using-the-gvfs-python-client-to-access-a-fileset)
+and [Azure Blob Storage GVFS Java client
configurations](./fileset-catalog-with-adls.md#using-the-gvfs-python-client-to-access-a-fileset)
for more details.
:::note
-Gravitino python client does not support [customized file systems](hadoop-catalog.md#how-to-custom-your-own-hcfs-file-system-fileset) defined by users due to the limit of `fsspec` library.
+The Gravitino Python client does not support [customized file systems](fileset-catalog.md#how-to-custom-your-own-hcfs-file-system-fileset) defined by users due to the limitations of the `fsspec` library.
:::
### Usage examples
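As a quick illustration of the `credential_expiration_ratio` setting described above, the refresh point is simply the credential's lifetime scaled by the ratio. The sketch below is hypothetical (not part of the patch or the GVFS client); it only mirrors the arithmetic in the table's description:

```python
def credential_refresh_delay(expiration_seconds: float,
                             expiration_ratio: float = 0.5) -> float:
    """Compute when a GVFS client would refresh a vended credential:
    after expiration_time * ratio has elapsed. The default ratio of
    0.5 matches the documented default of `credential_expiration_ratio`."""
    if not 0.0 < expiration_ratio <= 1.0:
        raise ValueError("expiration_ratio must be in (0, 1]")
    return expiration_seconds * expiration_ratio
```

For a credential valid for 1 hour (3600 seconds) and the default ratio of 0.5, the client would refresh after 1800 seconds, i.e. half an hour.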
diff --git a/docs/how-to-use-python-client.md b/docs/how-to-use-python-client.md
index 1e92126cc9..224679508d 100644
--- a/docs/how-to-use-python-client.md
+++ b/docs/how-to-use-python-client.md
@@ -51,7 +51,7 @@ contains the following code snippets:
4. Initialize Gravitino admin client and create a Gravitino metalake.
5. Initialize Gravitino client and list metalakes.
6. Create a Gravitino `Catalog` and special `type` is `Catalog.Type.FILESET`
and `provider` is
- [hadoop](./hadoop-catalog.md)
+ [fileset](./fileset-catalog.md)
7. Create a Gravitino `Schema` with the `location` pointed to a HDFS path, and
use `hdfs client` to
check if the schema location is successfully created in HDFS.
8. Create a `Fileset` with `type` is
[Fileset.Type.MANAGED](./manage-fileset-metadata-using-gravitino.md#fileset-operations),
diff --git a/docs/index.md b/docs/index.md
index 4078468d89..1a1bbc236b 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -88,7 +88,7 @@ Gravitino currently supports the following catalogs:
**Fileset catalogs:**
-* [**Hadoop catalog**](./hadoop-catalog.md)
+* [**Fileset catalog**](./fileset-catalog.md)
**Messaging catalogs:**
@@ -120,7 +120,7 @@ complete environment. To experience all the features, see
Gravitino supports different catalogs to manage the metadata in different
sources. Please see:
* [Doris catalog](./jdbc-doris-catalog.md): a complete guide to using
Gravitino to manage Doris data.
-* [Hadoop catalog](./hadoop-catalog.md): a complete guide to using Gravitino
to manage fileset
+* [Fileset catalog](./fileset-catalog.md): a complete guide to using Gravitino to manage filesets
using Hadoop Compatible File System (HCFS).
* [Hive catalog](./apache-hive-catalog.md): a complete guide to using
Gravitino to manage Apache Hive data.
* [Hudi catalog](./lakehouse-hudi-catalog.md): a complete guide to using
Gravitino to manage Apache Hudi data.
diff --git a/docs/manage-fileset-metadata-using-gravitino.md
b/docs/manage-fileset-metadata-using-gravitino.md
index 17291ff63b..7faa351a47 100644
--- a/docs/manage-fileset-metadata-using-gravitino.md
+++ b/docs/manage-fileset-metadata-using-gravitino.md
@@ -16,8 +16,9 @@ filesets to manage non-tabular data like training datasets
and other raw data.
Typically, a fileset is mapped to a directory on a file system like HDFS, S3,
ADLS, GCS, etc.
With the fileset managed by Gravitino, the non-tabular data can be managed as
assets together with
tabular data in Gravitino in a unified way. The following operations will use
HDFS as an example, for other
-HCFS like S3, OSS, GCS, etc., please refer to the corresponding operations
[hadoop-with-s3](./hadoop-catalog-with-s3.md),
[hadoop-with-oss](./hadoop-catalog-with-oss.md),
[hadoop-with-gcs](./hadoop-catalog-with-gcs.md) and
-[hadoop-with-adls](./hadoop-catalog-with-adls.md).
+HCFS like S3, OSS, GCS, etc., please refer to the corresponding guides: [fileset-with-s3](./fileset-catalog-with-s3.md),
+[fileset-with-oss](./fileset-catalog-with-oss.md), [fileset-with-gcs](./fileset-catalog-with-gcs.md) and
+[fileset-with-adls](./fileset-catalog-with-adls.md).
After a fileset is created, users can easily access, manage the
files/directories through
the fileset's identifier, without needing to know the physical path of the
managed dataset. Also, with
@@ -49,7 +50,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
"name": "catalog",
"type": "FILESET",
"comment": "comment",
- "provider": "hadoop",
"properties": {
"location": "file:///tmp/root"
}
@@ -74,8 +74,7 @@ Map<String, String> properties = ImmutableMap.<String,
String>builder()
Catalog catalog = gravitinoClient.createCatalog("catalog",
Type.FILESET,
- "hadoop", // provider, Gravitino only supports "hadoop" for now.
- "This is a Hadoop fileset catalog",
+ "This is a fileset catalog",
properties);
// ...
@@ -88,8 +87,8 @@ Catalog catalog = gravitinoClient.createCatalog("catalog",
gravitino_client: GravitinoClient =
GravitinoClient(uri="http://localhost:8090", metalake_name="metalake")
catalog = gravitino_client.create_catalog(name="catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
- comment="This is a Hadoop fileset
catalog",
+ provider=None,
+ comment="This is a fileset catalog",
properties={"location":
"/tmp/test1"})
```
@@ -98,9 +97,9 @@ catalog = gravitino_client.create_catalog(name="catalog",
Currently, Gravitino supports the following catalog providers:
-| Catalog provider | Catalog property |
-|---------------------|-------------------------------------------------------------------------|
-| `hadoop` | [Hadoop catalog property](./hadoop-catalog.md#catalog-properties) |
+| Catalog provider | Catalog property |
+|---------------------|---------------------------------------------------------------------|
+| `fileset` or `None` | [Fileset catalog property](./fileset-catalog.md#catalog-properties) |
### Load a catalog
@@ -169,7 +168,7 @@ GravitinoClient gravitinoClient = GravitinoClient
.withMetalake("metalake")
.build();
-// Assuming you have just created a Hadoop catalog named `catalog`
+// Assuming you have just created a Fileset catalog named `catalog`
Catalog catalog = gravitinoClient.loadCatalog("catalog");
SupportsSchemas supportsSchemas = catalog.asSchemas();
@@ -203,9 +202,9 @@ catalog.as_schemas().create_schema(name="schema",
Currently, Gravitino supports the following schema property:
-| Catalog provider | Schema property |
-|---------------------|------------------------------------------------------------------------------|
-| `hadoop` | [Hadoop schema property](./hadoop-catalog.md#schema-properties) |
+| Catalog provider | Schema property |
+|------------------|-------------------------------------------------------------------|
+| `fileset` | [Fileset schema property](./fileset-catalog.md#schema-properties) |
### Load a schema
@@ -368,7 +367,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json"
\
"name": "test_catalog",
"type": "FILESET",
"comment": "comment",
- "provider": "hadoop",
"properties": {
"location": "file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"
}
@@ -409,7 +407,6 @@ GravitinoClient gravitinoClient = GravitinoClient
Catalog catalog = gravitinoClient.createCatalog(
"test_catalog",
Type.FILESET,
- "hadoop", // provider
"comment",
ImmutableMap.of("location",
"file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"));
FilesetCatalog filesetCatalog = catalog.asFilesetCatalog();
@@ -438,7 +435,7 @@ gravitino_client: GravitinoClient =
GravitinoClient(uri="http://localhost:8090",
# create a catalog first
catalog: Catalog = gravitino_client.create_catalog(name="test_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="comment",
properties={"location":
"file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}"})
@@ -473,7 +470,6 @@ curl -X POST -H "Accept: application/vnd.gravitino.v1+json"
\
"name": "test_catalog",
"type": "FILESET",
"comment": "comment",
- "provider": "hadoop",
"properties": {
"filesystem-providers": "builtin-local,builtin-hdfs,s3,gcs",
"location-l1":
"file:///{{catalog}}/{{schema}}/workspace_{{project}}/{{user}}",
@@ -539,7 +535,6 @@ GravitinoClient gravitinoClient = GravitinoClient
Catalog catalog = gravitinoClient.createCatalog(
"test_catalog",
Type.FILESET,
- "hadoop", // provider
"comment",
ImmutableMap.of(
"filesystem-providers", "builtin-local,builtin-hdfs,s3,gcs",
@@ -595,7 +590,7 @@ gravitino_client: GravitinoClient =
GravitinoClient(uri="http://localhost:8090",
catalog: Catalog = gravitino_client.create_catalog(
name="test_catalog",
catalog_type=Catalog.Type.FILESET,
- provider="hadoop",
+ provider=None,
comment="comment",
properties={
"filesystem-providers": "builtin-local,builtin-hdfs,s3,gcs",
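The hunks above drop the `"provider": "hadoop"` field from every fileset-catalog creation example. As a minimal sketch of the resulting REST body (the helper `fileset_catalog_payload` is hypothetical, not part of the Gravitino client), a post-change request looks like:

```python
import json

def fileset_catalog_payload(name: str, comment: str, properties: dict) -> str:
    """Build the JSON body for the create-catalog REST call shown in the
    docs above. After this change, fileset catalogs omit the "provider"
    field entirely; only name, type, comment, and properties remain."""
    body = {
        "name": name,
        "type": "FILESET",
        "comment": comment,
        "properties": properties,
    }
    return json.dumps(body)
```

For instance, `fileset_catalog_payload("catalog", "comment", {"location": "file:///tmp/root"})` produces the same body as the updated curl example, with no `provider` key.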
diff --git a/docs/open-api/catalogs.yaml b/docs/open-api/catalogs.yaml
index 1a9e9f8424..54fb971bd4 100644
--- a/docs/open-api/catalogs.yaml
+++ b/docs/open-api/catalogs.yaml
@@ -284,6 +284,7 @@ components:
- relational
- fileset
- messaging
+ - model
provider:
type: string
description: The provider of the catalog
@@ -295,8 +296,9 @@ components:
- jdbc-mysql
- jdbc-postgresql
- jdbc-doris
- - hadoop
+ - fileset
- kafka
+ - model
comment:
type: string
description: A comment about the catalog
@@ -344,7 +346,6 @@ components:
required:
- name
- type
- - provider
properties:
name:
type: string
@@ -359,7 +360,7 @@ components:
- model
provider:
type: string
- description: The provider of the catalog
+ description: The provider of the catalog (provider is not required
for fileset and model catalog)
enum:
- hive
- lakehouse-iceberg
@@ -369,7 +370,6 @@ components:
- jdbc-postgresql
- jdbc-doris
- jdbc-oceanbase
- - hadoop
- kafka
comment:
type: string
@@ -555,7 +555,7 @@ components:
{
"name": "my_hadoop_catalog",
"type": "fileset",
- "provider": "hadoop",
+ "provider": "fileset",
"comment": "This is my hadoop catalog",
"properties": {
"key2": "value2"
diff --git a/docs/webui.md b/docs/webui.md
index 00f946f984..88d6a0f498 100644
--- a/docs/webui.md
+++ b/docs/webui.md
@@ -163,7 +163,7 @@ Creating a catalog requires these fields:
2. **Type**(**_required_**): `relational`/`fileset`/`messaging`/`model`, the
default value is `relational`
3. **Provider**(**_required_**):
1. Type `relational` -
`hive`/`iceberg`/`mysql`/`postgresql`/`doris`/`paimon`/`hudi`/`oceanbase`
- 2. Type `fileset` - `hadoop`
+ 2. Type `fileset` has no provider
3. Type `messaging` - `kafka`
4. Type `model` has no provider
4. **Comment**(_optional_): the comment of this catalog
@@ -354,8 +354,8 @@ Creating a catalog requires these fields:
###### 2. Type `fileset`
<Tabs>
- <TabItem value='hadoop' label='Hadoop'>
- Follow the [Hadoop catalog](./hadoop-catalog.md) document.
+ <TabItem value='fileset' label='Fileset'>
+ Follow the [Fileset catalog](./fileset-catalog.md) document.
<Image
img={require('./assets/webui/create-fileset-hadoop-catalog-dialog.png')}
style={{ width: 480 }} />