Re: Context-Aware Functions for Apache Polaris

Eric Maynard Tue, 20 May 2025 07:48:03 -0700

I wouldn’t say that Polaris is changing a view definition, but per my
understanding Polaris is actually generating a view based on a Policy.


We will need a way for Polaris to embed some information into these views.
I don’t think this is a P0 to make FGAC work, and I don’t think this
necessarily needs to take the form of a SQL function. For example, it could
be through a policy like:

{
  "allow_columns": ["a", "b"],
  "transform_columns": [
    {
      "col": "c",
      "predicate": "some_func(x)"
    },
    {
      "col": "d",
      "predicate": "${current_user} == admin"
    }
  ]
}
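To make the first-iteration idea concrete, here is a minimal sketch, assuming a policy shaped like the example above (with `transform_columns` as a list) and a hypothetical helper `build_view_sql` that is not part of any Polaris API. It honors only `allow_columns` and turns the policy into a projecting view body; column names are assumed to be plain, already-valid identifiers.

```python
import json

# Hypothetical policy document, modeled on the example policy.
POLICY_JSON = """
{
  "allow_columns": ["a", "b"],
  "transform_columns": [
    {"col": "c", "predicate": "some_func(x)"},
    {"col": "d", "predicate": "${current_user} == admin"}
  ]
}
"""


def build_view_sql(table: str, policy: dict) -> str:
    """Generate a view body that projects only the allowed columns.

    First-iteration sketch: only "allow_columns" is honored; any
    "transform_columns" entries are deliberately ignored here.
    Column names are assumed to need no quoting.
    """
    columns = policy.get("allow_columns", [])
    if not columns:
        raise ValueError("policy must allow at least one column")
    return f"SELECT {', '.join(columns)} FROM {table}"


policy = json.loads(POLICY_JSON)
print(build_view_sql("ns1.layer1_table", policy))
# SELECT a, b FROM ns1.layer1_table
```

Because the catalog generates the whole view text from structured policy fields, it never has to parse SQL written by someone else; the `transform_columns` entries would be layered on in later iterations.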

Perhaps we can ship a first iteration of FGAC (supporting only the
“allow_columns” field). Then we can implement the engine-driven predicates
(like the one on column “c”). Finally, we can examine options for
catalog-driven predicates like the one on column “d”.

—EM

On Tue, May 20, 2025 at 9:44 AM Robert Stupp <sn...@snazy.de> wrote:

> I don't think that Polaris should change any view definition in any way,
> but this is what the proposed approach does.
>
> The approach breaks the contract (the behavior defined by the specification)
> and, in turn, the observed behavior; both are absolute no-gos.
>
> Practically speaking, the approach blindly changes a string without any
> knowledge of its actual meaning. But it has to know exactly what it's
> doing, and to do that it has to know all the SQL dialects.
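To illustrate the concern about blind string replacement, here is a minimal sketch of a naive, SQL-unaware rewrite. The helper `naive_rewrite` is hypothetical (it is not code from the POC branch) and assumes that calls naming the current principal are replaced with TRUE via plain substring replacement. It shows one over-match and one under-match.

```python
def naive_rewrite(view_sql: str, principal: str) -> str:
    """Hypothetical SQL-unaware rewrite: plain substring replacement."""
    call = f"is_principal('{principal}')"
    return view_sql.replace(call, "TRUE")


# 1. Over-match: the call text inside a quoted identifier is rewritten,
#    silently renaming the view's output column.
aliased = "SELECT x AS \"is_principal('alice')\" FROM t"
print(naive_rewrite(aliased, "alice"))
# SELECT x AS "TRUE" FROM t

# 2. Under-match: function names are case-insensitive in many dialects,
#    but the substring search is not, so this call is left unresolved.
upper = "SELECT * FROM t WHERE IS_PRINCIPAL('alice')"
print(naive_rewrite(upper, "alice"))
# SELECT * FROM t WHERE IS_PRINCIPAL('alice')
```

Avoiding both failure modes requires knowing each dialect's quoting, comment, and case rules, which is the dialect-awareness the approach would need.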
>
> As a side note: every query engine already has information about the
> user. I definitely don't support exposing any authZ-related information.
>
> FGAC as a feature is a great thing to have. But the proposed approach is
> not the right way.
>
>
> On 19.05.25 20:33, Prashant Singh wrote:
> > Hey Robert,
> >
> > Thank you for your honest feedback; let me try to answer your
> > concerns:
> >
> >> There are tons of SQL dialects out there, each requires its own fully
> >> implemented lexer/parser/interpreter
> > That's true, and we are not interpreting it either. We just replace the
> > SQL text wherever `is_principal('<principal_name>')` appears with TRUE or
> > FALSE on the server side; we are not re-interpreting or parsing the tree.
> > I assume this is what the Analyzer already does in constant folding and
> > boolean simplification, albeit post-parsing, and IMHO it's not an
> > impossible thing to achieve. If it helps, I am even fine with wrapping
> > this as `{{is_principal('<principal_name>')}}` to make it very specific
> > so that only Polaris acts on it. IMHO we can work out the rough edges of
> > the replacement.
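A minimal sketch of the delimited-token variant, assuming the hypothetical `{{is_principal('<name>')}}` syntax floated above and a hypothetical `resolve_tokens` helper; this is an illustration, not the POC implementation. Only the delimited form is rewritten, so an undelimited `is_principal(...)` call is left untouched for the engine.

```python
import re

# Hypothetical token syntax: {{is_principal('<name>')}}
TOKEN = re.compile(r"\{\{\s*is_principal\('([^']*)'\)\s*\}\}")


def resolve_tokens(view_sql: str, current_principal: str) -> str:
    """Replace each delimited token with TRUE or FALSE for the caller."""

    def substitute(match):
        return "TRUE" if match.group(1) == current_principal else "FALSE"

    return TOKEN.sub(substitute, view_sql)


sql = "SELECT * FROM ns1.layer1_table WHERE {{is_principal('alice')}}"
print(resolve_tokens(sql, "alice"))  # SELECT * FROM ns1.layer1_table WHERE TRUE
print(resolve_tokens(sql, "bob"))    # SELECT * FROM ns1.layer1_table WHERE FALSE
```

The delimiters make accidental matches much less likely, but a token inside a string literal or comment would still be rewritten, so this narrows the parsing concern rather than removing it.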
> >
> >> for view containing view
> > This should not be a problem, as we only replace the text of the current
> > view definition. When the engine resolves the nested view, it issues the
> > same LOAD VIEW call with the nested view's identifier, and that is when we
> > do the replacement for the nested view. If that's the concern here: we
> > don't open the nested view during the loadView of the parent. The nested
> > view is treated like any other identifier, which is opened/interpreted at
> > a later stage of execution.
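The lazy, per-view resolution described above can be sketched as follows, assuming an in-memory map from view identifier to stored SQL text, the same hypothetical `{{is_principal('<name>')}}` token syntax, and a hypothetical `load_view` function standing in for the catalog's LOAD VIEW handler. Each call rewrites only the requested view's own text.

```python
import re

# Hypothetical token syntax: {{is_principal('<name>')}}
TOKEN = re.compile(r"\{\{\s*is_principal\('([^']*)'\)\s*\}\}")

# Hypothetical in-memory catalog: view identifier -> stored SQL text.
VIEWS = {
    "ns1.outer_vw": "SELECT * FROM ns1.inner_vw WHERE {{is_principal('alice')}}",
    "ns1.inner_vw": "SELECT * FROM ns1.layer1_table WHERE {{is_principal('bob')}}",
}


def load_view(identifier: str, current_principal: str) -> str:
    """Return one view's definition with only its own tokens resolved.

    Referenced views (ns1.inner_vw here) stay plain identifiers; the
    engine issues a separate load_view call when it resolves them.
    The stored text in VIEWS is never modified.
    """
    sql = VIEWS[identifier]
    return TOKEN.sub(
        lambda m: "TRUE" if m.group(1) == current_principal else "FALSE", sql
    )


# The engine loads the outer view first, then the nested one.
print(load_view("ns1.outer_vw", "alice"))
# SELECT * FROM ns1.inner_vw WHERE TRUE
print(load_view("ns1.inner_vw", "alice"))
# SELECT * FROM ns1.layer1_table WHERE FALSE
```

Each view's tokens are resolved against the caller at the moment that specific view is loaded, so the catalog never has to expand or parse nested definitions.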
> >
> >> Exposing authZ information via any kind of publicly accessible API to
> >> every user sounds like an interesting source of information - especially
> >> for the "not so good and nice guys".
> >
> > Yes, that's true, and that's my intention; it's just a matter of how we
> > deliver the info. I.e., I expose it through the view definition itself
> > (or through any other entity stored in Polaris). Exposing this as an API
> > would require engine-side integration too, over which we, as a catalog,
> > have very little control.
> >
> >> What's the benefit over having the ACLs on the table/view defined in the
> >> intended way?
> >
> > It's more about feature parity and giving more control over views than
> > ACLs alone allow (ACLs are conjunctions). For example, we can make the
> > view definition more expressive with additional predicates, such as a
> > disjunction:
> >
> >     select * from ns1.layer1_table
> >     where (condition1) OR (is_principal_role('ANALYST'))
> >
> > I would love to get your further feedback, considering the above.
> >
> >
> > Best,
> > Prashant Singh
> >
> >
> >
> >
> > On Mon, May 19, 2025 at 11:04 AM Robert Stupp <sn...@snazy.de> wrote:
> >
> >> I'm brutally honest here:
> >>
> >> I think we should really stay away from interpreting SQL or any other
> >> kind of (view) definition in Polaris. There are tons of SQL dialects out
> >> there, each requires its own fully implemented lexer/parser/interpreter
> >> - plus views-in-views-in-views-in-views... constructs requiring
> >> resolution of nested views. It eventually ends in implementing
> >> yet-another-query-engine. I doubt that this is doable with a
> >> "java.lang.String.replace(from, to)" approach.
> >>
> >> Exposing authZ information via any kind of publicly accessible API to
> >> every user sounds like an interesting source of information - especially
> >> for the "not so good and nice guys".
> >>
> >> Regarding the examples: what's the benefit over having the ACLs on
> >> the table/view defined in the intended way?
> >>
> >> On 19.05.25 19:26, Prashant Singh wrote:
> >>> Hi everyone,
> >>>
> >>> I’d like to propose adding *context-aware functions* to Apache Polaris so
> >>> that view definitions can resolve security context on the Polaris side
> >>> (aka the catalog end, without depending on engines).
> >>>
> >>> *Proposed functions*
> >>>
> >>>   1. *is_principal('<principal_name>')* – returns TRUE if the
> >>>      authenticated principal matches <principal_name>, otherwise FALSE.
> >>>   2. *is_principal_role('<principal_role_name>')* – returns TRUE when
> >>>      <principal_role_name> appears in the principal’s role set.
> >>>   3. *is_catalog_role('<catalog_role_name>')* – analogous check at the
> >>>      catalog-role level.
> >>>
> >>> *Why it matters*
> >>>
> >>> These predicates make views dynamic. Example:
> >>>
> >>>     CREATE VIEW dynamic_vw AS
> >>>     SELECT * FROM ns1.layer1_table
> >>>     WHERE is_principal_role('ANALYST');
> >>>
> >>> When a user whose principal roles include *ANALYST* calls LOAD VIEW,
> >>> Polaris rewrites the view to:
> >>>
> >>>     SELECT * FROM ns1.layer1_table WHERE TRUE;
> >>>
> >>> For everyone else, the view becomes:
> >>>
> >>>     SELECT * FROM ns1.layer1_table WHERE FALSE;
> >>>
> >>>
> >>> The result is better, more consistent control over identity resolution
> >>> without relying on engine-side changes, giving Polaris more authority in
> >>> enforcing things like FGAC (which I am working on). The same idea can be
> >>> extended to any entity stored in Polaris.
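The semantics of the three proposed functions and the rewrite of `dynamic_vw` can be sketched as below. This is a minimal illustration, assuming a hypothetical `SecurityContext` holding the caller's principal, principal roles, and catalog roles, and a regex-based `rewrite_view`; it is not the POC code and inherits the SQL-unawareness concerns discussed in this thread.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SecurityContext:
    """Hypothetical view of the authenticated caller as the catalog sees it."""
    principal: str
    principal_roles: frozenset = field(default_factory=frozenset)
    catalog_roles: frozenset = field(default_factory=frozenset)


# Matches is_principal_role('x'), is_principal('x'), is_catalog_role('x').
CONTEXT_CALL = re.compile(
    r"\b(is_principal_role|is_principal|is_catalog_role)\('([^']*)'\)"
)


def evaluate(func: str, arg: str, ctx: SecurityContext) -> bool:
    """Evaluate one context function against the caller's identity."""
    if func == "is_principal":
        return arg == ctx.principal
    if func == "is_principal_role":
        return arg in ctx.principal_roles
    return arg in ctx.catalog_roles  # is_catalog_role


def rewrite_view(view_sql: str, ctx: SecurityContext) -> str:
    """Fold each context-function call into a TRUE/FALSE literal."""
    return CONTEXT_CALL.sub(
        lambda m: "TRUE" if evaluate(m.group(1), m.group(2), ctx) else "FALSE",
        view_sql,
    )


view = "SELECT * FROM ns1.layer1_table WHERE is_principal_role('ANALYST')"
analyst = SecurityContext("alice", frozenset({"ANALYST"}))
other = SecurityContext("bob", frozenset({"ENGINEER"}))
print(rewrite_view(view, analyst))  # SELECT * FROM ns1.layer1_table WHERE TRUE
print(rewrite_view(view, other))    # SELECT * FROM ns1.layer1_table WHERE FALSE
```

The same folding applies to disjunctive predicates: `(condition1) OR (is_principal_role('ANALYST'))` becomes `(condition1) OR (TRUE)` for an analyst, a combination that a purely conjunctive ACL cannot express.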
> >>>
> >>> *Proof of concept*
> >>>
> >>> I’ve put together a quick POC branch:
> >>>
> >>> https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view
> >>>
> >>> *Prior art*
> >>>
> >>> Snowflake context functions:
> >>>    https://docs.snowflake.com/en/sql-reference/functions-context
> >>> Databricks Unity Catalog offers a similar mechanism called *dynamic views*:
> >>>    https://docs.databricks.com/aws/en/views/dynamic
> >>>
> >>> *Next steps*
> >>>
> >>> If the community is interested, we can discuss API surface, engine
> >>> implications, and a roadmap for merging.
> >>>
> >>> Eager to hear your feedback!
> >>>
> >>> Best,
> >>> Prashant Singh
> >>>
> >> --
> >> Robert Stupp
> >> @snazy
> >>
> >>
> --
> Robert Stupp
> @snazy
>
>
