Re: Context-Aware Functions for Apache Polaris

Prashant Singh Tue, 20 May 2025 09:37:12 -0700

Hey Robert,
I believe you are quoting the Iceberg view spec:
https://iceberg.apache.org/view-spec/#versions
 1. All representations for a version should express the same underlying
definition (this holds true).
 2. Immutable view versions: if the concern is that we are reusing the same
view version, we can always generate a Polaris-generated view version and
include these representations; this is an implementation detail.
Note: the spec doesn't say who can generate the view representation; you
can very well do it with the Java API, so IMHO we are not in violation of
the spec if we create a new view version.
> the approach is blindly changing some string without any knowledge about
> the actual meaning.
I think I clearly called out in my PR that it's a *POC*, if that's what is
being quoted as the end solution. I am happy to work on the rough edges,
though I think that if you strictly define the return type as boolean, you
can hold the view definer accountable
if this string match leads to a broken user experience. I would request that
we objectively evaluate this idea of resolving identities on the Polaris
side. That's all I really want to request, as it's a very unorganized world
when trying to unify Spark's current_user() with
is_principal() in Polaris.
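
For reference, a minimal sketch of the replacement step under discussion. It
assumes the `{{...}}` wrapper suggested further down this thread; the names
`resolve_context_functions`, `principal_roles`, and `catalog_roles` are
hypothetical and are not taken from the POC branch. It does not parse SQL; it
only substitutes wrapped tokens with TRUE or FALSE.

```python
import re

# Hypothetical wrapper syntax: {{is_principal('<name>')}} and friends.
_TOKEN = re.compile(
    r"\{\{\s*(is_principal|is_principal_role|is_catalog_role)"
    r"\(\s*'([^']*)'\s*\)\s*\}\}"
)


def resolve_context_functions(sql, principal, principal_roles, catalog_roles):
    """Replace each wrapped context function with the literal TRUE or FALSE."""

    def evaluate(match):
        func, arg = match.group(1), match.group(2)
        if func == "is_principal":
            result = arg == principal
        elif func == "is_principal_role":
            result = arg in principal_roles
        else:  # is_catalog_role
            result = arg in catalog_roles
        return "TRUE" if result else "FALSE"

    return _TOKEN.sub(evaluate, sql)


stored = "SELECT * FROM ns1.layer1_table WHERE {{is_principal_role('ANALYST')}}"
analyst_view = resolve_context_functions(stored, "alice", {"ANALYST"}, set())
other_view = resolve_context_functions(stored, "bob", {"ENGINEER"}, set())
```

Because only the wrapped token is touched, the rest of the view text is
returned unchanged regardless of SQL dialect.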


I hope this answers your concern. I am totally open to any recommendations
and happy to work on the rough edges. Let's solve this problem together, as
a community!

Best,
Prashant Singh

On Tue, May 20, 2025 at 9:06 AM Robert Stupp <sn...@snazy.de> wrote:

> This proposal _does_ change the view definition - it returns a
> _different_ representation than the one that has been stored before.
> This is a change that breaks the contract of the specification and it
> changes the observed behavior.
>
> FGAC is a very complex topic. The right way would be to have a holistic
> design and agreed-on approach. That does take time.
>
> On 20.05.25 16:47, Eric Maynard wrote:
> > I wouldn’t say that Polaris is changing a view definition, but per my
> > understanding Polaris is actually generating a view based on a Policy.
> >
> > We will need a way for Polaris to embed some information into these
> > views. I don’t think this is a P0 to make FGAC work, and I don’t think
> > this necessarily needs to take the form of a SQL function. For example,
> > it could be through a policy like:
> >
> > {
> >    "allow_columns": ["a", "b"],
> >    "transform_columns": [
> >      {
> >        "col": "c",
> >        "predicate": "some_func(x)"
> >      },
> >      {
> >        "col": "d",
> >        "predicate": "${current_user} == admin"
> >      }
> >    ]
> > }
> >
> > Perhaps we can try to get the first iteration of FGAC (with only field
> > “allow_columns”) out first. Then, we can implement the engine-driven
> > predicates (like that on column “c”). Finally, we can examine options for
> > catalog-driven predicates like the one on column “d”.
> >
> > —EM
> >
> > On Tue, May 20, 2025 at 9:44 AM Robert Stupp <sn...@snazy.de> wrote:
> >
> >> I don't think that Polaris should change any view definition in any way,
> >> but this is what the proposed approach does.
> >>
> >> The approach breaks the contract (behavior defined by the specification)
> >> and in turn the observed behavior, absolute no go's.
> >>
> >> Practically speaking, the approach is blindly changing some string
> >> without any knowledge about the actual meaning. But it has to know
> >> exactly what it's doing - and to do that it has to know all the SQL
> >> dialects.
> >>
> >> As a side note: every query engine already has information about the
> >> user. I'm definitely not supporting exposing any authZ related
> >> information.
> >>
> >> FGAC as a feature is a great thing to have. But the proposed approach is
> >> not the right way.
> >>
> >>
> >> On 19.05.25 20:33, Prashant Singh wrote:
> >>> Hey Robert,
> >>>
> >>> Thank you for your honest feedback; please let me try to answer your
> >>> concerns:
> >>>
> >>>> There are tons of SQL dialects out there, each requires its own fully
> >>>> implemented lexer/parser/interpreter
> >>> That's true, and we are not interpreting it either. We are just
> >>> replacing the SQL text wherever there is
> >>> `is_principal('<principal_name>')` with the value TRUE or FALSE from
> >>> the server end.
> >>> We are not re-interpreting or parsing the tree. I am assuming this is
> >>> what the Analyzer already does in constant folding and boolean
> >>> simplification, albeit post parsing, but IMHO I don't think it's an
> >>> impossible thing to achieve. If it helps, I am even fine with wrapping
> >>> this as `{{is_principal('<principal_name>')}}` to make this very
> >>> specific and let only Polaris work on it. IMHO we can work out the
> >>> rough edges with the replacement.
> >>>
> >>>> for view containing view
> >>> This should not be a problem, as we just replace the text of the
> >>> current view definition. When it comes to resolving the nested view,
> >>> the same LOAD VIEW call will be issued but with the nested view
> >>> identifier, and that is when we will do the replacement for it.
> >>> We don't open the nested view in the definition during the loadView of
> >>> the parent, if that's the concern here. The nested view is treated
> >>> like any other identifier, which is opened / interpreted at a later
> >>> stage of execution.
> >>>
> >>>> Exposing authZ information via any kind of publicly accessible API to
> >>>> every user sounds like an interesting source of information -
> >>>> especially for the "not so good and nice guys".
> >>>
> >>> Yes, that's true and that's my intention; it's just a matter of how we
> >>> are delivering the info, i.e. I expose it through the view definition
> >>> itself (or through any other entity stored in Polaris). Exposing this
> >>> as an API would require engine-side integration too, which we as a
> >>> catalog have very little control over.
> >>>
> >>>> What's the benefit over having the ACLs on the table/view defined in
> >>>> the intended way?
> >>>
> >>> It's more from a feature-parity perspective, and about giving more
> >>> control on the view rather than just ACLs (which are conjunctions). For
> >>> example, we could complicate the view definition with more predicates,
> >>> such as a disjunction:
> >>>
> >>> select * from ns1.layer1_table where (condition1) OR
> >>> (is_principal_role('ANALYST'))
> >>>
> >>> I would love to get your further feedback, considering the above.
> >>>
> >>>
> >>> Best,
> >>> Prashant Singh
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, May 19, 2025 at 11:04 AM Robert Stupp <sn...@snazy.de> wrote:
> >>>
> >>>> I'm brutally honest here:
> >>>>
> >>>> I think we should really stay away from interpreting SQL or any other
> >>>> kind of (view) definition in Polaris. There are tons of SQL dialects
> >>>> out there, each requires its own fully implemented
> >>>> lexer/parser/interpreter
> >>>> - plus views-in-views-in-views-in-views... constructs requiring
> >>>> resolution of nested views. It eventually ends in implementing
> >>>> yet-another-query-engine. I doubt that this is doable with a
> >>>> "java.lang.String.replace(from, to)" approach.
> >>>>
> >>>> Exposing authZ information via any kind of publicly accessible API to
> >>>> every user sounds like an interesting source of information -
> >>>> especially for the "not so good and nice guys".
> >>>>
> >>>> Regarding the examples: what's the benefit over having the ACLs on
> >>>> the table/view defined in the intended way?
> >>>>
> >>>> On 19.05.25 19:26, Prashant Singh wrote:
> >>>>> Hi everyone,
> >>>>>
> >>>>> I’d like to propose adding *context-aware functions* to Apache
> >>>>> Polaris so that view definitions can resolve security context on the
> >>>>> Polaris side (aka catalog end, without depending on engines).
> >>>>>
> >>>>> *Proposed functions*
> >>>>>
> >>>>>       1. *is_principal('<principal_name>')* – returns TRUE if the
> >>>>>       authenticated principal matches <principal_name>, otherwise
> >>>>>       FALSE.
> >>>>>       2. *is_principal_role('<principal_role_name>')* – returns TRUE
> >>>>>       when <principal_role_name> appears in the principal’s role set.
> >>>>>       3. *is_catalog_role('<catalog_role_name>')* – analogous check
> >>>>>       at the catalog-role level.
> >>>>>
> >>>>> *Why it matters*
> >>>>>
> >>>>> These predicates make views dynamic. Example:
> >>>>>
> >>>>> CREATE VIEW dynamic_vw AS
> >>>>> SELECT * FROM ns1.layer1_table
> >>>>> WHERE is_principal_role('ANALYST');
> >>>>>
> >>>>> When a user whose principal roles include *ANALYST* calls LOAD VIEW,
> >>>>> Polaris rewrites the view to
> >>>>>
> >>>>>       SELECT * FROM ns1.layer1_table WHERE TRUE;
> >>>>>
> >>>>> For everyone else the view becomes
> >>>>>
> >>>>>       SELECT * FROM ns1.layer1_table WHERE FALSE;
> >>>>>
> >>>>>
> >>>>> The result is better and more consistent control of identity
> >>>>> resolution without relying on engine-side changes, giving Polaris
> >>>>> more authority in enforcing things like FGAC (WIP by me).
> >>>>> Note that the same can be extrapolated to any Polaris stored entity.
> >>>>>
> >>>>> *Proof of concept*
> >>>>>
> >>>>> I’ve put together a quick POC branch:
> >>>>>
> >>>>> https://github.com/apache/polaris/compare/main...singhpk234:polaris:dyanmic/view
> >>>>>
> >>>>> *Prior art*
> >>>>>
> >>>>> Snowflake context functions:
> >>>>> https://docs.snowflake.com/en/sql-reference/functions-context
> >>>>> Databricks Unity Catalog offers a similar mechanism called *dynamic
> >>>>> views*:
> >>>>> https://docs.databricks.com/aws/en/views/dynamic
> >>>>>
> >>>>> *Next steps*
> >>>>>
> >>>>> If the community is interested, we can discuss API surface, engine
> >>>>> implications, and a roadmap for merging.
> >>>>>
> >>>>> Eager to hear your feedback!
> >>>>>
> >>>>> Best,
> >>>>> Prashant Singh
> >>>>>
> >>>> --
> >>>> Robert Stupp
> >>>> @snazy
> >>>>
> >>>>
> >> --
> >> Robert Stupp
> >> @snazy
> >>
> >>
> --
> Robert Stupp
> @snazy
>
>
