dimas-b commented on code in PR #4106:
URL: https://github.com/apache/polaris/pull/4106#discussion_r3024290492
##########
site/content/in-dev/unreleased/configuration/configuring-polaris-for-production/configuring-gcs-cloud-storage-specific.md:
##########
@@ -23,10 +23,55 @@ type: docs
 weight: 600
 ---
 
-This page provides guidance for configuring GCS Cloud Storage provider for use with Polaris. It covers credential vending, IAM roles, ACL requirements, and best practices to ensure secure and reliable integration.
+This guide covers how to configure Google Cloud Storage (GCS) as a storage backend for Polaris catalogs, including credential vending, IAM configuration, and access control.
 
-All catalog operations in Polaris for Google Cloud Storage (GCS)—including listing, reading, and writing objects—are performed using credential vending, which issues scoped (vended) tokens for secure access.
+## Overview
 
-Polaris requires both IAM roles and [Hierarchical Namespace (HNS)](https://docs.cloud.google.com/storage/docs/hns-overview) ACLs (if HNS is enabled) to be properly configured. Even with the correct IAM role (e.g., `roles/storage.objectAdmin`), access to paths such as `gs://<bucket>/idsp_ns/sample_table4/` may fail with 403 errors if HNS ACLs are missing for scoped tokens. The original access token may work, but scoped (vended) tokens require HNS ACLs on the base path or relevant subpath.
+Polaris uses **credential vending** to securely manage access to GCS objects. When you configure a catalog with GCS storage, Polaris issues scoped (vended) tokens with limited permissions and duration for each operation, rather than using long-lived credentials.
 
-**Note:** HNS is not mandatory when using GCS for a catalog in Polaris. If HNS is not enabled on the bucket, only IAM roles are required for access. Always verify HNS ACLs in addition to IAM roles when troubleshooting GCS access issues with credential vending and HNS enabled.
+## Storage Configuration
+
+When creating a Polaris catalog with GCS storage, you need to specify:
+
+1. **Storage Type**: `GCS`
+2. **Base Location**: The default GCS path for the catalog (e.g., `gs://your-bucket/catalogs/catalog-name`)
+3. **Allowed Locations**: GCS paths where the catalog can read/write data
+
+## IAM Configuration
+
+### Service Account Permissions
+
+The service account running Polaris (e.g., on Cloud Run) needs appropriate IAM roles to access GCS:
+
+**Required IAM Roles:**
+- `roles/storage.objectAdmin` - For read/write access to objects
+- OR `roles/storage.objectViewer` + `roles/storage.objectCreator` - For more granular control
+
+Grant the role at the bucket level:
+
+```bash
+gsutil iam ch serviceAccount:[email protected]:roles/storage.objectAdmin gs://your-bucket
+```
+
+### User Access Permissions
+
+In addition to GCS IAM, users need Polaris catalog roles to access tables:
+
+1. Create a catalog role with appropriate privileges:
+   - `TABLE_READ_DATA` - Read table data
+   - `TABLE_WRITE_DATA` - Write table data
+   - `NAMESPACE_FULL_METADATA` - Access namespace/table metadata
+2. Assign the catalog role to a principal role (e.g., `service_admin`)
+
+This two-level permission model ensures both GCS access (via IAM) and Polaris access control (via catalog roles) are properly configured.
+
+## Google Cloud Storage Configuration
+The preferred GCS configuration to have Hierarchical Namespaces disabled on the bucket and Fine-grained ACLS for access control.

Review Comment:
   I personally do not have enough data to say with certainty what works and what does not :wink: I know of some cases with 403 errors in HNS GCS storage, but I cannot rule out mistakes :slightly_smiling_face: PR #3996 is still in review.

   "Verified" would imply that Polaris as a project stands behind it, but we do not have CI for GCS, so anything that works now is not guaranteed to work tomorrow :shrug:

   Proposal: `GCS storage without hierarchical namespaces has been confirmed by the user community to work fine with Polaris. However, issues have been reported for hierarchical namespaces, so they should be considered with caution in production deployments.`

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
