justinmclean commented on code in PR #6849:
URL: https://github.com/apache/gravitino/pull/6849#discussion_r2034314578
########## docs/admin/iceberg-server.md: ########## @@ -0,0 +1,1351 @@ +--- +title: Iceberg REST catalog service +slug: /iceberg-rest-service +keywords: + - Iceberg REST catalog +license: "This software is licensed under the Apache License version 2." +--- + +## Background + +The Apache Gravitino Iceberg REST Server follows the +[Apache Iceberg REST API specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) +and acts as an Iceberg REST catalog server, +you could access the Iceberg REST endpoint at `http://$ip:$port/iceberg/`. + +### Capabilities + +- Supports the Apache Iceberg REST API defined in Iceberg 1.5, and supports all namespace and table interfaces. + The following interfaces are not implemented yet: + - multi-table transaction + - pagination +- Works as a catalog proxy, supporting `Hive` and `JDBC` as catalog backend. +- Supports credential vending for `S3`、`GCS`、`OSS` and `ADLS`. +- Supports different storages like `S3`, `HDFS`, `OSS`, `GCS`, `ADLS`. +- Capable of supporting other storages. +- Supports event listener. +- Supports Audit log. +- Supports OAuth2 and HTTPS. +- Provides a pluggable metrics store interface to store and delete Iceberg metrics. + +## Server management + +There are three deployment scenarios for Gravitino Iceberg REST server: + +- A standalone server in a standalone Gravitino Iceberg REST server package, the CLASSPATH is `libs`. +- A standalone server in the Gravitino server package, the CLASSPATH is `iceberg-rest-server/libs`. +- An auxiliary service embedded in the Gravitino server, the CLASSPATH is `iceberg-rest-server/libs`. + +For detailed instructions on how to build and install the Gravitino server package, +please refer to [the build guide](../develop/how-to-build.md) and [the installation guide](../install/install.md). +To build the Gravitino Iceberg REST server package, use the command `./gradlew compileIcebergRESTServer -x test`. 
+Alternatively, to create the corresponding compressed package in the distribution directory, +use `./gradlew assembleIcebergRESTServer -x test`. +The Gravitino Iceberg REST server package includes the following files: + +```text +├─ ... +└─ distribution/gravitino-iceberg-rest-server + ├─ bin/ + │ └─ gravitino-iceberg-rest-server.sh # Launching scripts. + ├─ conf/ # All configurations. + │ ├─ gravitino-iceberg-rest-server.conf # Server configuration. + │ ├─ gravitino-env.sh # Environment variables, e.g. JAVA_HOME, GRAVITINO_HOME, etc. + │ ├─ log4j2.properties # log4j configurations. + │ └─ hdfs-site.xml & core-site.xml # HDFS configuration files. + ├─ libs/ # Dependencies libraries. + └─ logs/ # Logs directory. Auto-created after the server starts. +``` + +## Server configuration + +There are distinct configuration files for standalone and auxiliary server: + +- `gravitino-iceberg-rest-server.conf` is used for the standalone server; +- `gravitino.conf` is for the auxiliary server. + +Although the configuration files differ, the configuration items remain the same. + +Starting with version `0.6.0-incubating`, the prefix `gravitino.auxService.iceberg-rest.` +for auxiliary server configurations has been deprecated. +If both `gravitino.auxService.iceberg-rest.key` and `gravitino.iceberg-rest.key` are present, +the latter will take precedence. +The configurations listed below use the `gravitino.iceberg-rest.` prefix. + +### Configuration to enable Iceberg REST service in Gravitino server. + +<table> +<thead> +<tr> + <th>Configuration item</th> + <th>Description</th> + <th>Default value</th> + <th>Required</th> + <th>Since Version</th> +</tr> +</thead> +<tbody> +<tr> + <td><tt>gravitino.auxService.names</tt></td> + <td> + The auxiliary service name of the Gravitino Iceberg REST catalog service. + Use `iceberg-rest`. 
+ </td> + <td>(none)</td> + <td>Yes</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.classpath</tt></td> + <td> + The CLASSPATH of the Gravitino Iceberg REST catalog service, + including the directory containing JARs and configuration. + It supports both absolute and relative paths. + For example, `iceberg-rest-server/libs,iceberg-rest-server/conf`. + </td> + <td>(none)</td> + <td>Yes</td> + <td>`0.2.0`</td> +</tr> +</tbody> +</table> + +:::note +These configurations only are only effective in `gravitino.conf`. +You don't need to specify them if the Iceberg server is started +as a standalone server. +::: + +### HTTP server configuration + +<table> +<thead> +<tr> + <th>Configuration item</th> + <th>Description</th> + <th>Default value</th> + <th>Required</th> + <th>Since Version</th> +</tr> +</thead> +<tbody> +<tr> + <td><tt>gravitino.iceberg-rest.host</tt></td> + <td>The host of the Gravitino Iceberg REST catalog service.</td> + <td>`0.0.0.0`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.httpPort</tt></td> + <td>The port of the Gravitino Iceberg REST catalog service.</td> + <td>`9001`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.minThreads</tt></td> + <td> + The minimum number of threads in the thread pool used by the Jetty Web server. + `minThreads` is 8 if the value is less than 8. + </td> + <td>`Math.max(Math.min(Runtime.getRuntime().availableProcessors() * 2, 100), 8)`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.maxThreads</tt></td> + <td> + The maximum number of threads in the thread pool used by the Jetty Web server. + `maxThreads` is 8 if the value is less than 8, and the value must be greater than or equal to `minThreads`. 
+ </td> + <td>`Math.max(Runtime.getRuntime().availableProcessors() * 4, 400)`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.threadPoolWorkQueueSize</tt></td> + <td> + The size of the queue in the thread pool used by Gravitino Iceberg REST catalog service. + </td> + <td>`100`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.stopTimeout</tt></td> + <td> + The amount of time in ms for the Gravitino Iceberg REST catalog service to stop gracefully. + For more information, see `org.eclipse.jetty.server.Server#setStopTimeout`. + </td> + <td>`30000`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.idleTimeout</tt></td> + <td>The timeout in ms of idle connections.</td> + <td>`30000`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.requestHeaderSize</tt></td> + <td>The maximum size in bytes for a HTTP request.</td> + <td>`131072`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.responseHeaderSize</tt></td> + <td>The maximum size in bytes for a HTTP response.</td> + <td>`131072`</td> + <td>No</td> + <td>`0.2.0`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.customFilters</tt></td> + <td> + Comma-separated list of filter class names to apply to the APIs. + </td> + <td>(none)</td> + <td>No</td> + <td>`0.4.0`</td> +</tr> +</tbody> +</table> + +The filter in `customFilters` should be a standard javax servlet filter. +You can also specify filter parameters by setting configuration entries +in the format `gravitino.iceberg-rest.<filter class name>.param.<name>=<value>`. + +### Security + +Gravitino Iceberg REST server supports OAuth2 and HTTPS, +please refer to [security documentation](../security/index.md) for more details. 
+ +#### Backend authentication + +For JDBC backend, you can use the `gravitino.iceberg-rest.jdbc-user` and `gravitino.iceberg-rest.jdbc-password` +to authenticate the JDBC connection. +For Hive backend, you can use the `gravitino.iceberg-rest.authentication.type` +to specify the authentication type, and use the `gravitino.iceberg-rest.authentication.kerberos.principal` +and `gravitino.iceberg-rest.authentication.kerberos.keytab-uri` +to authenticate the Kerberos connection. +The detailed configuration items are as follows: + +<table> +<thead> +<tr> + <th>Configuration item</th> + <th>Description</th> + <th>Default value</th> + <th>Required</th> + <th>Since Version</th> +</tr> +</thead> +<tbody> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.type</tt></td> + <td> + The type of authentication for Iceberg rest catalog backend. + This configuration only applicable for for Hive backend, + and only supports `Kerberos`, `simple` currently. + As for JDBC backend, only username/password authentication is supported now. + </td> + <td>`simple`</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.impersonation-enable</tt></td> + <td>Whether impersonation is enabled for the Iceberg catalog service.</td> + <td>`false`</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.hive.metastore.sasl.enabled</tt></td> + <td> + Whether SASL authentication protocol is enabled when connecting to Kerberos Hive metastore. + + This value should be `true` in most case + when the value of `gravitino.iceberg-rest.authentication.type` is Kerberos. + In some very rare cases, the SSL protocol is used. + </td> + <td>`false`</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.kerberos.principal</tt></td> + <td> + The principal of the Kerberos authentication. 
+ + This field required if the value of `gravitino.iceberg-rest.authentication.type` is `Kerberos`. + </td> + <td>(none)</td> + <td>Yes|No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.kerberos.keytab-uri</tt></td> + <td> + The URI of the keytab for the Kerberos authentication. + This field required if the value of `gravitino.iceberg-rest.authentication.type` is `Kerberos`. + </td> + <td>(none)</td> + <td>Yes|No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.kerberos.check-interval-sec</tt></td> + <td>The check interval in seconds of Kerberos credential for Iceberg catalog.</td> + <td>60</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.authentication.kerberos.keytab-fetch-timeout-sec</tt></td> + <td> + The fetch timeout in seconds when retrieving Kerberos keytab + from `authentication.kerberos.keytab-uri`. + </td> + <td>60</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +</tbody> +</table> + +### Credential vending + +Please refer to [credential vending](../security/credential-vending.md) for more details. + +### Storage + +#### S3 configuration + +<table> +<thead> +<tr> + <th>Configuration item</th> + <th>Description</th> + <th>Default value</th> + <th>Required</th> + <th>Since Version</th> +</tr> +</thead> +<tbody> +<tr> + <td><tt>gravitino.iceberg-rest.io-impl</tt></td> + <td> + The I/O implementation for `FileIO` in Iceberg. + Use `org.apache.iceberg.aws.s3.S3FileIO` for S3. + </td> + <td>(none)</td> + <td>No</td> + <td>`0.6.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.s3-endpoint</tt></td> + <td> + An alternative endpoint of the S3 service. + This could be used for S3FileIO with any s3-compatible object storage service + that has a different endpoint, or access a private S3 endpoint + in a virtual private cloud. 
+ </td> + <td>(none)</td> + <td>No</td> + <td>`0.6.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.s3-region</tt></td> + <td>The region of the S3 service, like `us-west-2`.</td> + <td>(none)</td> + <td>No</td> + <td>`0.6.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.s3-path-style-access</tt></td> + <td>Whether to use path style access for S3.</td> + <td>`false`</td> + <td>No</td> + <td>`0.9.0-incubating`</td> +</tr> +</tbody> +</table> + +For other Iceberg s3 properties not managed by Gravitino like `s3.sse.type`, +you could config it directly by `gravitino.iceberg-rest.s3.sse.type`. + +Please refer to [S3 credentials](../security/credential-vending.md#s3-credentials) +for credential related configurations. + +:::info +To configure the JDBC catalog backend, set the `gravitino.iceberg-rest.warehouse` parameter +to `s3://{bucket_name}/${prefix_name}`. +For the Hive catalog backend, set `gravitino.iceberg-rest.warehouse` +to `s3a://{bucket_name}/${prefix_name}`. +Additionally, download the [Iceberg AWS bundle](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-aws-bundle) +and place it in the CLASSPATH of Iceberg REST server. +::: + +#### OSS configuration + +<table> +<thead> +<tr> + <th>Configuration item</th> + <th>Description</th> + <th>Default value</th> + <th>Required</th> + <th>Since Version</th> +</tr> +</thead> +<tbody> +<tr> + <td><tt>gravitino.iceberg-rest.io-impl</tt></td> + <td> + The I/O implementation for `FileIO` in Iceberg. + Use `org.apache.iceberg.aliyun.oss.OSSFileIO` for OSS. + </td> + <td>(none)</td> + <td>No</td> + <td>`0.6.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.oss-endpoint</tt></td> + <td>The endpoint of Aliyun OSS service.</td> + <td>(none)</td> + <td>No</td> + <td>`0.7.0-incubating`</td> +</tr> +<tr> + <td><tt>gravitino.iceberg-rest.oss-region</tt></td> + <td> + The region of the OSS service, like `oss-cn-hangzhou`. + Only used when `credential-providers` is `oss-token`. 
+ </td> + <td>(none)</td> + <td>No</td> + <td>`0.8.0-incubating`</td> +</tr> +</tbody> +</table> + +For other Iceberg OSS properties not managed by Gravitino like `client.security-token`, +you could config it directly by `gravitino.iceberg-rest.client.security-token`. + +Please refer to [OSS credentials](../security/credential-vending.md#oss-credentials) +for credential related configurations.

Review Comment: for configuration related to credentials.