Hi all,

I'm in agreement with Pierre, JB and Dmitri's points. I’d like to add some
context from the Quarkus configuration angle:

Option 1, which involves distinct datasources, presents a challenge.
Quarkus requires all datasources to be present and fully configured at
build time. This requirement could be quite cumbersome for end users,
making this option less user-friendly in practice.

Regarding Option 2, while it's theoretically possible to manage multiple
schemas with a single datasource, implementing this can be complex. To
effectively work with different schemas in PostgreSQL, you would need to
either qualify all table identifiers or adjust the `search_path` URL
parameter. Additionally, other JDBC backends like MySQL don't support
multiple schemas per database, which would make Option 2 less portable
across different JDBC databases.

That's why I think Option 3 is the most portable one, and the easiest for
users or administrators to configure. As Pierre noted, it is subject to
noisy neighbor interferences – but to some extent, I think interferences
could also happen with separate schemas like in option 2.

Just my 2 cents.

Thanks,

Alex


On Tue, Apr 15, 2025 at 4:00 PM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> Thanks for your perspective, Pierre! You make good points and I agree with
> them.
>
> From my POV, I'd add that we probably need to take deployment concerns into
> account too.
>
> If the deployment uses the database per realm approach (option 1) then
> someone has to provide database connection parameters (including secrets).
> If that is the deployment administrator, then the admin necessarily has to
> be aware of all realms and effectively has control of the data in all
> realms. Isolation is achieved only for end users.
>
> That said, even with option 3 the deployment owner has control over all
> realms and end users are isolated as far as their access to APIs is
> concerned. End users cannot discover each other's data (barring coding
> mistakes in Polaris). The same goes for option 2 as it's the middle ground.
>
> I do not see any material difference between options 1, 2 and 3 from the
> end user's perspective.
>
> If, however, the database connection parameters are not controlled by the
> administrator, but by the end user who wants to define a realm, then
> Polaris needs to expose managing database connections and secrets. This may
> be a valuable feature, but I believe it is far beyond current Polaris
> backend capabilities. I do not think going this way is justified at this
> time.
>
> I'd like to propose a hybrid approach where Polaris provides capabilities
> (and config) for the administrators to choose between options 1, 2, 3
> according to their specific deployment concerns.
>
> This means that the primary key has to include the realm ID, because if the
> Polaris code does not provide it then the admin will not be able to choose
> option 3 at runtime.
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Tue, Apr 15, 2025 at 8:35 AM Pierre Laporte <pie...@pingtimeout.fr>
> wrote:
>
> > Hi Prashant
> >
> > I guess the answer will depend on how easy it should be for Polaris to
> > support multi-tenancy.
> >
> > A separate database per realm would allow administrators to limit the
> > amount of resources that a realm can consume (e.g. the maximum number of
> > database connections).  Indeed, it would be one of the strongest
> isolation
> > mode.  However, the code would need to support a complete database
> > configuration per realm (think username and password and possibly IP
> > address) if the goal is to match Postgres capabilities.  In terms of
> > backup/restore, it is the most flexible option.
> >
> > A "one schema per realm" approach would be a simpler approach, regarding
> > datasource configuration.  However, there would be less isolation between
> > realms, and a resource utilization spike on one realm could impact
> > performance of another realm.  It is as flexible as option #1 regarding
> > backup and restore.
> >
> > A "realm as part of the primary key" approach is the most efficient way,
> in
> > that the cost of adding tenants is close to zero.  Like in option #2,
> there
> > is no real resource isolation between tenants and a noisy-neighbor
> > situation is a possible issue.  The biggest difference is regarding
> backup
> > and restore.  Consider the case where data is accidentally
> > wiped/corrupted/modified/... in a given tenant and administrators want to
> > restore it to a previous state.  With this approach, it is a much more
> > complex as Postgres does not (AFAIK) allow the possibility to restore
> > tables partially.
> >
> > Just my 2 cents
> >
> > --
> >
> > Pierre
> >
> >
> > On Tue, Apr 15, 2025 at 12:42 AM Prashant Singh
> > <prashant.si...@snowflake.com.invalid> wrote:
> >
> > > Dear Polaris Community,
> > >
> > > This email initiates a discussion regarding the modeling of Realms
> within
> > > the Polaris project, following its recent mention in my JDBC
> > implementation
> > > pull request:
> > > https://github.com/apache/polaris/pull/1287/files#r2040383971.
> > >
> > > My current understanding, based on available information, is that
> Realms
> > > were primarily intended for isolation. Consequently, the EclipseLink
> > > implementation treats each Realm as a separate database.
> > >
> > > As we are re-implementing this functionality, it was suggested that we
> > > gather community feedback on the optimal approach to modeling Realms.
> > >
> > > Based on my current understanding, here are potential modeling options:
> > >
> > > *1. Separate Databases per Realm:*
> > >
> > >    - Each Realm would correspond to a distinct database.
> > >    - This could be implemented using Quarkus custom data sources, with
> > one
> > >    data source per Realm.
> > >
> > > *2. Separate Schemas per Realm:*
> > >
> > >    - Each Realm would correspond to a distinct database schema within a
> > >    single database.
> > >    - Most database systems support two-part identifiers (
> > >    <schema_name>.<table_name>), allowing for data isolation.
> > >
> > > *3. Realm as a Primary Key:*
> > >
> > >    - A realm identifier would be added as a primary key (or part of a
> > >    composite primary key) to each Polaris table.
> > >    - Data isolation would be enforced through filtering based on this
> key
> > >    during data access.
> > >
> > > The optimal approach will likely depend on ease of use and
> > maintainability
> > > for database administrators.
> > >
> > > Please share your thoughts and preferences regarding these options.
> > >
> > > Best regards,
> > >
> > > Prashant Singh
> > >
> >
>

Reply via email to