Hi folks, The three approaches should be transparent for end users.
According to the isolation that a realm is supposed to guarantee, I think we should consider 1 and 2 (3 is maybe too light in terms of strong isolation). 1 and 2 can be achieved by configuration: it could be a dedicated database, or a dedicated schema, or a dedicated schema in a dedicated database. So, why not applying 2 (dedicated schema) on a datasource, this datasource can link to a dedicated database or the "Polaris" database. My $0.01 :) Regards JB On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org> wrote: > > Thanks for your perspective, Pierre! You make good points and I agree with > them. > > From my POV, I'd add that we probably need to take deployment concerns into > account too. > > If the deployment uses the database per realm approach (option 1) then > someone has to provide database connection parameters (including secrets). > If that is the deployment administrator, then the admin necessarily has to > be aware of all realms and effectively has control of the data in all > realms. Isolation is achieved only for end users. > > That said, even with option 3 the deployment owner has control over all > realms and end users are isolated as far as their access to APIs is > concerned. End users cannot discover each other's data (barring coding > mistakes in Polaris). The same goes for option 2 as it's the middle ground. > > I do not see any material difference between options 1, 2 and 3 from the > end user's perspective. > > If, however, the database connection parameters are not controlled by the > administrator, but by the end user who wants to define a realm, then > Polaris needs to expose managing database connections and secrets. This may > be a valuable feature, but I believe it is far beyond current Polaris > backend capabilities. I do not think going this way is justified at this > time. > > I'd like to propose a hybrid approach where Polaris provides capabilities > (and config) for the administrators to choose between options 1, 2, 3 > according to their specific deployment concerns. > > This means that the primary key has to include the realm ID, because if the > Polaris code does not provide it then the admin will not be able to choose > option 3 at runtime. > > WDYT? > > Thanks, > Dmitri. > > On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr> > wrote: > > > Hi Prashant > > > > I guess the answer will depend on how easy it should be for Polaris to > > support multi-tenancy. > > > > A separate database per realm would allow administrators to limit the > > amount of resources that a realm can consume (e.g. the maximum number of > > database connections). Indeed, it would be one of the strongest isolation > > mode. However, the code would need to support a complete database > > configuration per realm (think username and password and possibly IP > > address) if the goal is to match Postgres capabilities. In terms of > > backup/restore, it is the most flexible option. > > > > A "one schema per realm" approach would be a simpler approach, regarding > > datasource configuration. However, there would be less isolation between > > realms, and a resource utilization spike on one realm could impact > > performance of another realm. It is as flexible as option #1 regarding > > backup and restore. > > > > A "realm as part of the primary key" approach is the most efficient way, in > > that the cost of adding tenants is close to zero. Like in option #2, there > > is no real resource isolation between tenants and a noisy-neighbor > > situation is a possible issue. The biggest difference is regarding backup > > and restore. Consider the case where data is accidentally > > wiped/corrupted/modified/... in a given tenant and administrators want to > > restore it to a previous state. With this approach, it is a much more > > complex as Postgres does not (AFAIK) allow the possibility to restore > > tables partially. > > > > Just my 2 cents > > > > -- > > > > Pierre > > > > > > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh > > <prashant.si...@snowflake.com.invalid> wrote: > > > > > Dear Polaris Community, > > > > > > This email initiates a discussion regarding the modeling of Realms within > > > the Polaris project, following its recent mention in my JDBC > > implementation > > > pull request: > > > https://github.com/apache/polaris/pull/1287/files#r2040383971. > > > > > > My current understanding, based on available information, is that Realms > > > were primarily intended for isolation. Consequently, the EclipseLink > > > implementation treats each Realm as a separate database. > > > > > > As we are re-implementing this functionality, it was suggested that we > > > gather community feedback on the optimal approach to modeling Realms. > > > > > > Based on my current understanding, here are potential modeling options: > > > > > > *1. Separate Databases per Realm:* > > > > > > - Each Realm would correspond to a distinct database. > > > - This could be implemented using Quarkus custom data sources, with > > one > > > data source per Realm. > > > > > > *2. Separate Schemas per Realm:* > > > > > > - Each Realm would correspond to a distinct database schema within a > > > single database. > > > - Most database systems support two-part identifiers ( > > > <schema_name>.<table_name>), allowing for data isolation. > > > > > > *3. Realm as a Primary Key:* > > > > > > - A realm identifier would be added as a primary key (or part of a > > > composite primary key) to each Polaris table. > > > - Data isolation would be enforced through filtering based on this key > > > during data access. > > > > > > The optimal approach will likely depend on ease of use and > > maintainability > > > for database administrators. > > > > > > Please share your thoughts and preferences regarding these options. > > > > > > Best regards, > > > > > > Prashant Singh > > > > >