dimas-b commented on code in PR #3960: URL: https://github.com/apache/polaris/pull/3960#discussion_r3023786198
########## site/content/in-dev/unreleased/metastores/jdbc-multi-datasource.md: ########## @@ -0,0 +1,90 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> +# Design: Multi-DataSource support in JDBC persistence Review Comment: Should this perhaps be under `unreleased/proposals`?.. e.g. like in this proposal using GH PR and `.md` files: #3924 I personally do not mind having the proposal doc in the same PR as the implementation, but we need to be careful with reviews so as to settle on the general proposal first and proceed to code changes later (to avoid unnecessary work). ########## site/content/in-dev/unreleased/metastores/jdbc-multi-datasource.md: ########## @@ -0,0 +1,90 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> +# Design: Multi-DataSource support in JDBC persistence + +## Goal +The goal of this design is to decouple `DataSource` selection from the core JDBC persistence logic. This allows for: +1. **Workload Isolation**: Separating the Metastore, Metrics reporting, and Event logging into different physical databases or connection pools. +2. **Scalable Multi-Tenancy**: Enabling per-realm (tenant) routing to support large-scale deployments. + +## Core Interface: `DataSourceResolver` +A service interface designed to resolve the correct `DataSource` based on the workload's metadata. + +```java +public interface DataSourceResolver { + enum StoreType { + METASTORE, + METRICS, + EVENTS + } + + DataSource resolve(RealmContext realmContext, StoreType storeType); +} +``` + +### Key Components +- **`DataSourceResolver`**: SPI for resolution logic. +- **`DefaultDataSourceResolver`**: Backward-compatible implementation that returns the single primary `DataSource` for all requests. +- **`JdbcMetaStoreManagerFactory`**: Overwrites the resolution logic to use the `DataSourceResolver`. +- **`JdbcBasePersistenceImpl`**: Manages three separate `DatasourceOperations` objects. + +## Schema Management and Evolution + +### Functional SQL Script Splitting +To support hosting different workloads on different databases, the monolithic `schema-vX.sql` scripts are split into functional components: +- `schema-vX-metastore.sql`: Definitions for `polaris_entities`, `polaris_grant_records`, and `polaris_schema_version`. +- `schema-vX-metrics.sql`: Definitions for `polaris_metrics_scan` and `polaris_metrics_commit`. +- `schema-vX-events.sql`: Definitions for `polaris_events`. + +### Version Authority +The **Metastore** remains the single source of truth for the realm's logical schema version. The `polaris_schema_version` table is exclusively maintained within the `METASTORE` data source. All associated stores (Metrics, Events) are expected to be physically compatible with this detected version. + +## Operational Behaviors + +### Bootstrap +When a realm is bootstrapped via `JdbcMetaStoreManagerFactory.bootstrapRealms()`: +1. **Resolution**: The `DataSourceResolver` is called for each `StoreType` (METASTORE, METRICS, EVENTS). Review Comment: I'd think the bootstrap process should indicate which of the possible store types to initialize. Ultimately this is a the Admin user's decision, I think (defaulting to all). The rationale is to not create tables that the Admin user knows are not going to be used (reducing database maintenance burden by not creating "dead" tables). ########## site/content/in-dev/unreleased/metastores/jdbc-multi-datasource.md: ########## @@ -0,0 +1,90 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> +# Design: Multi-DataSource support in JDBC persistence + +## Goal +The goal of this design is to decouple `DataSource` selection from the core JDBC persistence logic. This allows for: +1. **Workload Isolation**: Separating the Metastore, Metrics reporting, and Event logging into different physical databases or connection pools. +2. **Scalable Multi-Tenancy**: Enabling per-realm (tenant) routing to support large-scale deployments. + +## Core Interface: `DataSourceResolver` +A service interface designed to resolve the correct `DataSource` based on the workload's metadata. + +```java +public interface DataSourceResolver { + enum StoreType { + METASTORE, + METRICS, + EVENTS + } + + DataSource resolve(RealmContext realmContext, StoreType storeType); +} +``` + +### Key Components +- **`DataSourceResolver`**: SPI for resolution logic. +- **`DefaultDataSourceResolver`**: Backward-compatible implementation that returns the single primary `DataSource` for all requests. +- **`JdbcMetaStoreManagerFactory`**: Overwrites the resolution logic to use the `DataSourceResolver`. +- **`JdbcBasePersistenceImpl`**: Manages three separate `DatasourceOperations` objects. + +## Schema Management and Evolution + +### Functional SQL Script Splitting +To support hosting different workloads on different databases, the monolithic `schema-vX.sql` scripts are split into functional components: +- `schema-vX-metastore.sql`: Definitions for `polaris_entities`, `polaris_grant_records`, and `polaris_schema_version`. +- `schema-vX-metrics.sql`: Definitions for `polaris_metrics_scan` and `polaris_metrics_commit`. +- `schema-vX-events.sql`: Definitions for `polaris_events`. + +### Version Authority +The **Metastore** remains the single source of truth for the realm's logical schema version. The `polaris_schema_version` table is exclusively maintained within the `METASTORE` data source. All associated stores (Metrics, Events) are expected to be physically compatible with this detected version. + +## Operational Behaviors + +### Bootstrap +When a realm is bootstrapped via `JdbcMetaStoreManagerFactory.bootstrapRealms()`: +1. **Resolution**: The `DataSourceResolver` is called for each `StoreType` (METASTORE, METRICS, EVENTS). +2. **Initialization**: Each resolved data source is initialized with its specific functional SQL script. +3. **Idempotency**: If all three `StoreType` mappings point to the same physical database, the scripts are applied sequentially. The DDL is designed to be idempotent to prevent errors during this "merged" initialization. + +### Purge (Cleanup) +When a realm is purged: +1. The `purge()` operation is initiated. +2. The `JdbcBasePersistenceImpl.deleteAll()` method is triggered. +3. **Multi-Store Cleanup**: `deleteAll()` executes `DELETE` commands targeting the specific `realm_id` across all three `DatasourceOperations`: + - `metastoreOps`: Clears entities, grants, secrets, and policy mappings. + - `metricsOps`: Clears `scan_metrics_report` and `commit_metrics_report`. + - `eventOps`: Clears the `events` table. +4. This ensures that all data across all three potential data sources is cleaned up for the specified `realm_id`. + +## Schema Upgrades +Upgrading a Polaris deployment from version N to N+1 involves: +1. **Detection**: The service detects that the `metastore`'s `polaris_schema_version` is below the requested level. +2. **Execution**: The migration logic resolves each `StoreType` and applies the relevant "vN-to-vN+1" upgrade script. Review Comment: I believe schema upgrades should be part of the `bootstrap` flow and require an explicit user action... WDYT? ########## site/content/in-dev/unreleased/metastores/jdbc-multi-datasource.md: ########## @@ -0,0 +1,90 @@ +<!-- + ~ Licensed to the Apache Software Foundation (ASF) under one + ~ or more contributor license agreements. See the NOTICE file + ~ distributed with this work for additional information + ~ regarding copyright ownership. The ASF licenses this file + ~ to you under the Apache License, Version 2.0 (the + ~ "License"); you may not use this file except in compliance + ~ with the License. You may obtain a copy of the License at + ~ + ~ http://www.apache.org/licenses/LICENSE-2.0 + ~ + ~ Unless required by applicable law or agreed to in writing, + ~ software distributed under the License is distributed on an + ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + ~ KIND, either express or implied. See the License for the + ~ specific language governing permissions and limitations + ~ under the License. + --> +# Design: Multi-DataSource support in JDBC persistence + +## Goal +The goal of this design is to decouple `DataSource` selection from the core JDBC persistence logic. This allows for: +1. **Workload Isolation**: Separating the Metastore, Metrics reporting, and Event logging into different physical databases or connection pools. +2. **Scalable Multi-Tenancy**: Enabling per-realm (tenant) routing to support large-scale deployments. + +## Core Interface: `DataSourceResolver` +A service interface designed to resolve the correct `DataSource` based on the workload's metadata. + +```java +public interface DataSourceResolver { + enum StoreType { + METASTORE, + METRICS, + EVENTS + } + + DataSource resolve(RealmContext realmContext, StoreType storeType); +} +``` + +### Key Components +- **`DataSourceResolver`**: SPI for resolution logic. +- **`DefaultDataSourceResolver`**: Backward-compatible implementation that returns the single primary `DataSource` for all requests. +- **`JdbcMetaStoreManagerFactory`**: Overwrites the resolution logic to use the `DataSourceResolver`. +- **`JdbcBasePersistenceImpl`**: Manages three separate `DatasourceOperations` objects. + +## Schema Management and Evolution + +### Functional SQL Script Splitting +To support hosting different workloads on different databases, the monolithic `schema-vX.sql` scripts are split into functional components: +- `schema-vX-metastore.sql`: Definitions for `polaris_entities`, `polaris_grant_records`, and `polaris_schema_version`. +- `schema-vX-metrics.sql`: Definitions for `polaris_metrics_scan` and `polaris_metrics_commit`. +- `schema-vX-events.sql`: Definitions for `polaris_events`. + +### Version Authority +The **Metastore** remains the single source of truth for the realm's logical schema version. The `polaris_schema_version` table is exclusively maintained within the `METASTORE` data source. All associated stores (Metrics, Events) are expected to be physically compatible with this detected version. + +## Operational Behaviors + +### Bootstrap +When a realm is bootstrapped via `JdbcMetaStoreManagerFactory.bootstrapRealms()`: +1. **Resolution**: The `DataSourceResolver` is called for each `StoreType` (METASTORE, METRICS, EVENTS). +2. **Initialization**: Each resolved data source is initialized with its specific functional SQL script. +3. **Idempotency**: If all three `StoreType` mappings point to the same physical database, the scripts are applied sequentially. The DDL is designed to be idempotent to prevent errors during this "merged" initialization. + +### Purge (Cleanup) +When a realm is purged: +1. The `purge()` operation is initiated. +2. The `JdbcBasePersistenceImpl.deleteAll()` method is triggered. +3. **Multi-Store Cleanup**: `deleteAll()` executes `DELETE` commands targeting the specific `realm_id` across all three `DatasourceOperations`: + - `metastoreOps`: Clears entities, grants, secrets, and policy mappings. + - `metricsOps`: Clears `scan_metrics_report` and `commit_metrics_report`. + - `eventOps`: Clears the `events` table. +4. This ensures that all data across all three potential data sources is cleaned up for the specified `realm_id`. + +## Schema Upgrades +Upgrading a Polaris deployment from version N to N+1 involves: +1. **Detection**: The service detects that the `metastore`'s `polaris_schema_version` is below the requested level. +2. **Execution**: The migration logic resolves each `StoreType` and applies the relevant "vN-to-vN+1" upgrade script. +3. **Consistency**: Because the process is unified within the `JdbcMetaStoreManagerFactory`, it provides a single point of failure and ensures that all data sources are either upgraded or rolled back (where possible). Review Comment: Different store types do not have to be on the same schema version, IMHO. Their tables are not transactionally related. Moreover, storage for `METRICS` does not have to be JDBC even if `METASTORE` is JDBC (same for events). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
