jerryshao commented on code in PR #6059:
URL: https://github.com/apache/gravitino/pull/6059#discussion_r1914286668


##########
docs/hadoop-catalog-with-gcs.md:
##########
@@ -0,0 +1,503 @@
+---
+title: "Hadoop catalog with GCS"
+slug: /hadoop-catalog-with-gcs
+date: 2024-01-03
+keyword: Hadoop catalog GCS
+license: "This software is licensed under the Apache License version 2."
+---
+
+This document describes how to configure a Hadoop catalog with GCS.
+
+## Prerequisites
+To set up a Hadoop catalog with GCS, follow these steps:
+
+1. Download the [`gravitino-gcp-bundle-${gravitino-version}.jar`](https://mvnrepository.com/artifact/org.apache.gravitino/gravitino-gcp-bundle) file.
+2. Place the downloaded file into the Gravitino Hadoop catalog classpath at `${GRAVITINO_HOME}/catalogs/hadoop/libs/`.
+3. Start the Gravitino server by running the following command:
+
+```bash
+$ ${GRAVITINO_HOME}/bin/gravitino-server.sh start
+```
+
+Once the server is up and running, you can proceed to configure the Hadoop catalog with GCS. The rest of this document uses `http://localhost:8090` as the Gravitino server URL; replace it with your actual server URL.
+
+## Configurations for creating a Hadoop catalog with GCS
+
+### Configurations for a GCS Hadoop catalog
+
+Apart from the configurations mentioned in [Hadoop catalog configuration](./hadoop-catalog.md#catalog-properties), the following properties are used to configure a Hadoop catalog with GCS:
+
+| Configuration item            | Description | Default value   | Required | Since version    |
+|-------------------------------|-------------|-----------------|----------|------------------|
+| `filesystem-providers`        | The file system providers to add. Set it to `gcs` for a GCS fileset, or to a comma-separated string that contains `gcs`, such as `gcs,s3`, to support multiple kinds of fileset including `gcs`. | (none)          | Yes      | 0.7.0-incubating |
+| `default-filesystem-provider` | The name of the default filesystem provider of this Hadoop catalog, used when users do not specify the scheme in the URI. The default value is `builtin-local`. For GCS, setting this value lets you omit the `gs://` prefix in the location. | `builtin-local` | No       | 0.7.0-incubating |
+| `gcs-service-account-file`    | The path of the GCS service account JSON file. | (none)          | Yes      | 0.7.0-incubating |
+| `credential-providers`        | The credential provider types, separated by commas. The possible value is `gcs-token`. Because the default authentication type uses a service account as described above, this configuration enables credential vending provided by the Gravitino server, so that clients no longer need to provide authentication information such as a service account to access GCS through GVFS. Once it is set, more configuration items are needed to make it work; see [gcs-credential-vending](security/credential-vending.md#gcs-credentials). | (none)          | No       | 0.8.0-incubating |
+
+
+### Configurations for a schema
+
+Refer to [Schema configurations](./hadoop-catalog.md#schema-properties) for more details.
+
+### Configurations for a fileset
+
+Refer to [Fileset configurations](./hadoop-catalog.md#fileset-properties) for more details.
+
+## Example of creating Hadoop catalog with GCS
+
+This section shows how to use the Hadoop catalog with GCS in Gravitino, with detailed examples.
+
+### Create a Hadoop catalog with GCS
+
+First, create a Hadoop catalog with GCS, as shown in the following example:
+
+<Tabs groupId="language" queryString>
+<TabItem value="shell" label="Shell">
+
+```shell
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+-H "Content-Type: application/json" -d '{
+  "name": "test_catalog",
+  "type": "FILESET",
+  "comment": "This is a GCS fileset catalog",
+  "provider": "hadoop",
+  "properties": {
+    "location": "gs://bucket/root",
+    "gcs-service-account-file": "path_of_gcs_service_account_file",
+    "filesystem-providers": "gcs"
+  }
+}' http://localhost:8090/api/metalakes/metalake/catalogs
+```
+
+</TabItem>
+<TabItem value="java" label="Java">
+
+```java
+GravitinoClient gravitinoClient = GravitinoClient
+    .builder("http://localhost:8090")
+    .withMetalake("metalake")
+    .build();
+
+Map<String, String> gcsProperties = ImmutableMap.<String, String>builder()
+    .put("location", "gs://bucket/root")
+    .put("gcs-service-account-file", "path_of_gcs_service_account_file")
+    .put("filesystem-providers", "gcs")
+    .build();
+
+Catalog gcsCatalog = gravitinoClient.createCatalog("test_catalog", 
+    Type.FILESET,
+    "hadoop", // provider, Gravitino only supports "hadoop" for now.
+    "This is a GCS fileset catalog",
+    gcsProperties);
+// ...
+
+```
+
+</TabItem>
+<TabItem value="python" label="Python">
+
+```python
+gravitino_client: GravitinoClient = GravitinoClient(uri="http://localhost:8090", metalake_name="metalake")
+gcs_properties = {
+    "location": "gs://bucket/root",
+    "gcs-service-account-file": "path_of_gcs_service_account_file",
+    "filesystem-providers": "gcs"
+}
+
+gcs_catalog = gravitino_client.create_catalog(name="test_catalog",
+                                              type=Catalog.Type.FILESET,
+                                              provider="hadoop",
+                                              comment="This is a GCS fileset catalog",
+                                              properties=gcs_properties)

Review Comment:
   Also here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@gravitino.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
