Thanks for your perspective, Pierre! You make good points and I agree with them.
>From my POV, I'd add that we probably need to take deployment concerns into account too. If the deployment uses the database per realm approach (option 1) then someone has to provide database connection parameters (including secrets). If that is the deployment administrator, then the admin necessarily has to be aware of all realms and effectively has control of the data in all realms. Isolation is achieved only for end users. That said, even with option 3 the deployment owner has control over all realms and end users are isolated as far as their access to APIs is concerned. End users cannot discover each other's data (barring coding mistakes in Polaris). The same goes for option 2 as it's the middle ground. I do not see any material difference between options 1, 2 and 3 from the end user's perspective. If, however, the database connection parameters are not controlled by the administrator, but by the end user who wants to define a realm, then Polaris needs to expose managing database connections and secrets. This may be a valuable feature, but I believe it is far beyond current Polaris backend capabilities. I do not think going this way is justified at this time. I'd like to propose a hybrid approach where Polaris provides capabilities (and config) for the administrators to choose between options 1, 2, 3 according to their specific deployment concerns. This means that the primary key has to include the realm ID, because if the Polaris code does not provide it then the admin will not be able to choose option 3 at runtime. WDYT? Thanks, Dmitri. On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr> wrote: > Hi Prashant > > I guess the answer will depend on how easy it should be for Polaris to > support multi-tenancy. > > A separate database per realm would allow administrators to limit the > amount of resources that a realm can consume (e.g. the maximum number of > database connections). Indeed, it would be one of the strongest isolation > mode. However, the code would need to support a complete database > configuration per realm (think username and password and possibly IP > address) if the goal is to match Postgres capabilities. In terms of > backup/restore, it is the most flexible option. > > A "one schema per realm" approach would be a simpler approach, regarding > datasource configuration. However, there would be less isolation between > realms, and a resource utilization spike on one realm could impact > performance of another realm. It is as flexible as option #1 regarding > backup and restore. > > A "realm as part of the primary key" approach is the most efficient way, in > that the cost of adding tenants is close to zero. Like in option #2, there > is no real resource isolation between tenants and a noisy-neighbor > situation is a possible issue. The biggest difference is regarding backup > and restore. Consider the case where data is accidentally > wiped/corrupted/modified/... in a given tenant and administrators want to > restore it to a previous state. With this approach, it is a much more > complex as Postgres does not (AFAIK) allow the possibility to restore > tables partially. > > Just my 2 cents > > -- > > Pierre > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh > <prashant.si...@snowflake.com.invalid> wrote: > > > Dear Polaris Community, > > > > This email initiates a discussion regarding the modeling of Realms within > > the Polaris project, following its recent mention in my JDBC > implementation > > pull request: > > https://github.com/apache/polaris/pull/1287/files#r2040383971. > > > > My current understanding, based on available information, is that Realms > > were primarily intended for isolation. Consequently, the EclipseLink > > implementation treats each Realm as a separate database. > > > > As we are re-implementing this functionality, it was suggested that we > > gather community feedback on the optimal approach to modeling Realms. > > > > Based on my current understanding, here are potential modeling options: > > > > *1. Separate Databases per Realm:* > > > > - Each Realm would correspond to a distinct database. > > - This could be implemented using Quarkus custom data sources, with > one > > data source per Realm. > > > > *2. Separate Schemas per Realm:* > > > > - Each Realm would correspond to a distinct database schema within a > > single database. > > - Most database systems support two-part identifiers ( > > <schema_name>.<table_name>), allowing for data isolation. > > > > *3. Realm as a Primary Key:* > > > > - A realm identifier would be added as a primary key (or part of a > > composite primary key) to each Polaris table. > > - Data isolation would be enforced through filtering based on this key > > during data access. > > > > The optimal approach will likely depend on ease of use and > maintainability > > for database administrators. > > > > Please share your thoughts and preferences regarding these options. > > > > Best regards, > > > > Prashant Singh > > >