Hi folks,

The three approaches should be transparent for end users.

Given the isolation that a realm is supposed to guarantee, I think we
should consider options 1 and 2 (option 3 is probably too weak in terms
of isolation).
Options 1 and 2 can be achieved by configuration: a realm could use a
dedicated database, a dedicated schema, or a dedicated schema in a
dedicated database.
So why not apply option 2 (dedicated schema) on a datasource? That
datasource can point either to a dedicated database or to the shared
"Polaris" database.
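
To make the configuration idea concrete, here is a minimal sketch in
Python (not Polaris code). The realm IDs, the datasource names, the
`RealmStorage` type, the `resolve` helper and the `entities` table are
all hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RealmStorage:
    """Hypothetical per-realm storage settings (not a Polaris API)."""
    datasource: str = "polaris"    # name of a configured datasource
    schema: Optional[str] = None   # dedicated schema, if any


# Hypothetical deployment configuration, keyed by realm ID.
REALMS = {
    # Option 2: dedicated schema in the shared "polaris" database.
    "realm_a": RealmStorage(schema="realm_a"),
    # Options 1 + 2: dedicated schema in a dedicated database.
    "realm_b": RealmStorage(datasource="realm_b_db", schema="realm_b"),
    # Option 1: dedicated database, default schema.
    "realm_c": RealmStorage(datasource="realm_c_db"),
}


def resolve(realm_id: str, table: str) -> Tuple[str, str]:
    """Return (datasource name, table identifier) for the given realm."""
    storage = REALMS[realm_id]
    # Use a two-part identifier only when the realm has its own schema.
    qualified = f"{storage.schema}.{table}" if storage.schema else table
    return storage.datasource, qualified
```

With this shape, `resolve("realm_a", "entities")` yields the shared
`polaris` datasource and the identifier `realm_a.entities`, so the
choice between options 1 and 2 stays a deployment decision.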

My $0.01 :)

Regards
JB

On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
>
> Thanks for your perspective, Pierre! You make good points and I agree with
> them.
>
> From my POV, I'd add that we probably need to take deployment concerns into
> account too.
>
> If the deployment uses the database per realm approach (option 1) then
> someone has to provide database connection parameters (including secrets).
> If that is the deployment administrator, then the admin necessarily has to
> be aware of all realms and effectively has control of the data in all
> realms. Isolation is achieved only for end users.
>
> That said, even with option 3 the deployment owner has control over all
> realms and end users are isolated as far as their access to APIs is
> concerned. End users cannot discover each other's data (barring coding
> mistakes in Polaris). The same goes for option 2 as it's the middle ground.
>
> I do not see any material difference between options 1, 2 and 3 from the
> end user's perspective.
>
> If, however, the database connection parameters are not controlled by the
> administrator, but by the end user who wants to define a realm, then
> Polaris needs to expose managing database connections and secrets. This may
> be a valuable feature, but I believe it is far beyond current Polaris
> backend capabilities. I do not think going this way is justified at this
> time.
>
> I'd like to propose a hybrid approach where Polaris provides capabilities
> (and config) for the administrators to choose between options 1, 2, 3
> according to their specific deployment concerns.
>
> This means that the primary key has to include the realm ID, because if the
> Polaris code does not provide it then the admin will not be able to choose
> option 3 at runtime.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr>
> wrote:
>
> > Hi Prashant
> >
> > I guess the answer will depend on how easy it should be for Polaris to
> > support multi-tenancy.
> >
> > A separate database per realm would allow administrators to limit the
> > amount of resources that a realm can consume (e.g. the maximum number of
> > database connections).  Indeed, it would be one of the strongest isolation
> > modes.  However, the code would need to support a complete database
> > configuration per realm (think username and password and possibly IP
> > address) if the goal is to match Postgres capabilities.  In terms of
> > backup/restore, it is the most flexible option.
> >
> > A "one schema per realm" approach would be a simpler approach, regarding
> > datasource configuration.  However, there would be less isolation between
> > realms, and a resource utilization spike on one realm could impact
> > performance of another realm.  It is as flexible as option #1 regarding
> > backup and restore.
> >
> > A "realm as part of the primary key" approach is the most efficient way, in
> > that the cost of adding tenants is close to zero.  Like in option #2, there
> > is no real resource isolation between tenants and a noisy-neighbor
> > situation is a possible issue.  The biggest difference is regarding backup
> > and restore.  Consider the case where data is accidentally
> > wiped/corrupted/modified/... in a given tenant and administrators want to
> > restore it to a previous state.  With this approach, it is much more
> > complex, as Postgres does not (AFAIK) support restoring tables
> > partially.
> >
> > Just my 2 cents
> >
> > --
> >
> > Pierre
> >
> >
> > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > <prashant.si...@snowflake.com.invalid> wrote:
> >
> > > Dear Polaris Community,
> > >
> > > This email initiates a discussion regarding the modeling of Realms within
> > > the Polaris project, following its recent mention in my JDBC
> > > implementation pull request:
> > > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > >
> > > My current understanding, based on available information, is that Realms
> > > were primarily intended for isolation. Consequently, the EclipseLink
> > > implementation treats each Realm as a separate database.
> > >
> > > As we are re-implementing this functionality, it was suggested that we
> > > gather community feedback on the optimal approach to modeling Realms.
> > >
> > > Based on my current understanding, here are potential modeling options:
> > >
> > > *1. Separate Databases per Realm:*
> > >
> > >    - Each Realm would correspond to a distinct database.
> > >    - This could be implemented using Quarkus custom data sources, with
> > >    one data source per Realm.
> > >
> > > *2. Separate Schemas per Realm:*
> > >
> > >    - Each Realm would correspond to a distinct database schema within a
> > >    single database.
> > >    - Most database systems support two-part identifiers (
> > >    <schema_name>.<table_name>), allowing for data isolation.
> > >
> > > *3. Realm as a Primary Key:*
> > >
> > >    - A realm identifier would be added as a primary key (or part of a
> > >    composite primary key) to each Polaris table.
> > >    - Data isolation would be enforced through filtering based on this key
> > >    during data access.
> > >
> > > The optimal approach will likely depend on ease of use and
> > > maintainability for database administrators.
> > >
> > > Please share your thoughts and preferences regarding these options.
> > >
> > > Best regards,
> > >
> > > Prashant Singh
> > >
> >
