Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Prashant Singh
Hey Ajantha, Thank you so much for your feedback! But my intention is different: what I want to do is evaluate the identity-related context at the catalog end and not let it be evaluated at the engine end. > It seems to me that what you are trying to do is to store scalar UDF in the polaris catalog.

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Ajantha Bhat
Hi Prashant, It seems to me that what you are trying to do is to store a scalar UDF in the Polaris catalog. Iceberg is the best place to standardize this and store it as an Iceberg UDF (not Polaris). Regarding SQL syntax interoperability, we still haven't implemented this in Iceberg. Since UDF can

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread yun zou
Hi Dmitri, "I do not think those doc comments provide enough visibility to ensure that the key information is received by users, unless they are dealing directly with the API" -- Yeah, I agree that information may not be visible enough for users who don't directly work with APIs. However, I thin

Re: [DISCUSS] Remove realm_id metric tag

2025-05-19 Thread Robert Stupp
Hi, I think it's okay to remove the realm-ID from the metric tags and leave it in traces. So +1 from me on doing this. High cardinality values are not good for metrics (or metrics systems) and can easily cause a lot of "interesting situations" in production systems - things that are hard to

Re: [DISCUSS] On-going CHANGELOG with breaking changes

2025-05-19 Thread Robert Stupp
+1 on having a CHANGELOG. That's been proven to be very useful. Change-log and release-notes (as a list of all commits) are orthogonal IMO. The former, as Dmitri mentioned, accumulates important changes in categories (highlights, breaking changes, new features, fixes). The latter is a "list of

Re: [PROPOSAL] Asynchronous & Reliable Tasks

2025-05-19 Thread Robert Stupp
Yes, each "task behavior" has an ID. I've chosen the term "task behavior" over "type", because it doesn't only define "what's done" but also "when" it's done (delay) and "how it behaves" (retries on failures). On 14.05.25 04:25, Adnan Hemani wrote: Hi Robert, Firstly, thanks for this document

Re: [HEADS UP] Preparing 0.10.0-beta-incubating-rc3

2025-05-19 Thread Jean-Baptiste Onofré
Hi folks, During my check, I've seen that the Spark plugin builds a runtime jar (shading Iceberg and other jars). Even if not distributed (for now), those jar files should include DISCLAIMER, LICENSE, and NOTICE. I will work on that. It's not a blocker for the release (as, again, not distributed t

Re: Merge module polaris-quarkus-admin and polaris-quarkus-server

2025-05-19 Thread Robert Stupp
Polaris-server and the admin-tool are separate things. You deploy the /server/ and let it run in k8s (or whatever). Bootstrapping via the admin-tools happens rarely (once) and is rather performed from an administrator's machine, whereas the server(s) run elsewhere. So server and admin-tool are

Re: GitHub pull-requests

2025-05-19 Thread Dmitri Bourlatchkov
Great suggestions, Robert! Thanks for writing them down. Cheers, Dmitri. On Mon, May 19, 2025 at 8:34 AM Robert Stupp wrote: > Hi all, > > Looking a bit ahead with respect to releases and (semi) automatic releases: > > We have a script that automatically collects the code changes from the > Git

Re: GitHub pull-requests

2025-05-19 Thread Prashant Singh
+1 to the suggestion! On Mon, May 19, 2025 at 7:39 AM Dmitri Bourlatchkov wrote: > Great suggestions, Robert! Thanks for writing them down. > > Cheers, > Dmitri. > > On Mon, May 19, 2025 at 8:34 AM Robert Stupp wrote: > > > Hi all, > > > > Looking a bit ahead with respect to releases and (semi

[DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Jean-Baptiste Onofré
Hi folks, Right now, as part of the release and nightly build, we plan to push the Polaris server docker image (on https://hub.docker.com/r/apache/polaris). Concretely, it means we push Polaris server As part of RC3 release prep, I pushed apache/polaris:0.10.0-beta-incubating-rc3 image (correspon

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Dmitri Bourlatchkov
Option 2 looks very confusing to me. While it can technically work, I think most people expect the repository name to reflect the nature of the binary, so apache/polaris would mean "server" by default. I prefer option 3. I also think we should have an image for the admin tool because it is requir

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Robert Stupp
Also +1 on option 3. Would also propose to push releases and snapshots to separate image repositories. On 19.05.25 17:46, Dmitri Bourlatchkov wrote: Option 2 looks very confusing to me. While it can technically work, I think most people expect the repository name to reflect the nature of the

Re: GitHub pull-requests

2025-05-19 Thread William Hyun
+1 on this suggestion, it would be good practice for everyone. Bests, William On Mon, May 19, 2025 at 7:56 AM Prashant Singh wrote: > +1 to the suggestion ! > > On Mon, May 19, 2025 at 7:39 AM Dmitri Bourlatchkov > wrote: > > > Great suggestions, Robert! Thanks for writing them down. > > > > C

GitHub pull-requests

2025-05-19 Thread Robert Stupp
Hi all, Looking a bit ahead with respect to releases and (semi) automatic releases: We have a script that automatically collects the code changes from the Git log and adds it to the release-notes. While PRs usually have some meaningful information in the description, that information is often

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Alex Dutra
Hi all, To be honest my preference would be Option 4: a different image name for everything that is "nightly" or "unstable". For example, "apache/polaris-unstable" or "apache/polaris-admin-tool-nightly". My reasoning is simple: make it almost impossible for a user to accidentally deploy an unstab

Context-Aware Functions for Apache Polaris

2025-05-19 Thread Prashant Singh
Hi everyone, I’d like to propose adding *context-aware functions* to Apache Polaris so that view definitions can resolve security context on the Polaris side (i.e., at the catalog end, without depending on engines). *Proposed functions* 1. *is_principal('')* – returns TRUE if the authenticated p

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Laurent Goujon
Maybe I'm missing something, but aren't the engines the ones actually interpreting/executing the views, not Polaris? On Mon, May 19, 2025 at 10:27 AM Prashant Singh wrote: > Hi everyone, > > I’d like to propose adding *context-aware functions* to Apache Polaris so > that view definitions can res

Re: [DISCUSS] Prepare for 1.0 Release

2025-05-19 Thread Yufei Gu
Thanks everyone for the productive discussion! We've made great progress on cleaning up the Polaris 1.0 blockers. Here’s a quick summary: - Add CI for Python code (#1058): need a volunteer to pick up https://github.com/apache/polaris/pull/1096

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Prashant Singh
Yes, that's true: engines will be the ones executing the view. The idea is not to break that part; the intention is that we don't want engines to resolve the identity, since that is something the catalog should do. For example, an engine might be running with its own users/groups etc., which would not mean anything

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Robert Stupp
I'm brutally honest here: I think we should really stay away from interpreting SQL or any other kind of (view) definition in Polaris. There are tons of SQL dialects out there, each requires its own fully implemented lexer/parser/interpreter - plus views-in-views-in-views-in-views... constructs

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Russell Spitzer
The only multi-location table formats I'm currently aware of are Hive (partitions can live wherever) and Iceberg. I think Delta, Hudi, LanceDB, Paimon and file-based tables all have to live in the root location. I'm not sure of any other "file" based tables where this would be an iss

Re: GitHub pull-requests

2025-05-19 Thread Alex Dutra
Hi all, I'm obviously +1 on Robert's proposals. Should we modify our CONTRIBUTING.md guidelines? Also, in the spirit of standardizing how commit messages should be formatted, I suggest that we take a look at the Conventional Commits spec [1]. This small spec is becoming more and more popular (I

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Prashant Singh
Hey Robert, Thank you for your honest feedback. Let me try to answer your concerns: > There are tons of SQL dialects out there, each requires its own fully implemented lexer/parser/interpreter That's true, and we are not interpreting it either; we are just replacing the SQL text wherever
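
A minimal sketch of what "replacing the SQL text" could look like, assuming `is_principal` takes a single-quoted principal name as in the proposal. The helper `resolve_is_principal` and its regular expression are hypothetical illustrations, not Polaris code, and the rest of the proposal is not visible in this snippet.

```python
import re

# Hypothetical pattern: matches calls such as is_principal('alice').
# This is an illustration only, not part of Polaris.
_IS_PRINCIPAL = re.compile(r"is_principal\(\s*'([^']*)'\s*\)", re.IGNORECASE)


def resolve_is_principal(view_sql: str, authenticated_principal: str) -> str:
    """Replace each is_principal('<name>') call with a TRUE/FALSE literal.

    The SQL is treated as opaque text: nothing is parsed or interpreted.
    """

    def _literal(match: re.Match) -> str:
        # Compare the quoted name against the principal the catalog authenticated.
        return "TRUE" if match.group(1) == authenticated_principal else "FALSE"

    return _IS_PRINCIPAL.sub(_literal, view_sql)
```

Because this is plain text substitution, it has no notion of string literals or comments in the view definition, so a call that appears inside quoted text would be rewritten too.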

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Dmitri Bourlatchkov
Using a different repo name for nightlies / unstable sounds good to me. Cheers, Dmitri. On Mon, May 19, 2025 at 12:19 PM Alex Dutra wrote: > Hi all, > > To be honest my preference would be Option 4: a different image name > for everything that is "nightly" or "unstable". For example, > "apache/

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Jean-Baptiste Onofré
Hi Prashant, Thanks for the proposal. I understand the purpose (about FGAC, which is something we plan to work on), but I'm not sure this kind of SQL function is a good approach. Polaris, as a catalog, should: 1. not do query engine work, but more interact with any query engines (same di

Re: [DISCUSS] Remove realm_id metric tag

2025-05-19 Thread Dmitri Bourlatchkov
Removing realm_id from metrics tags makes sense to me (to avoid high cardinality). If we need to have insight into load differences from realm to realm, it might be preferable to introduce metrics dedicated to that rather than increasing the cardinality of every endpoint metric. Cheers, Dmitri.

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Jean-Baptiste Onofré
I'm not super convinced by a repository dedicated to nightly or snapshot. A nightly or a SNAPSHOT is a version/tag of an image. So, I would expect to have it in the same repository as the released images. For instance, you would have apache/polaris:x.y.z and apache/polaris:x.y.z-SNAPSHOT Some pr

Re: GitHub pull-requests

2025-05-19 Thread Dmitri Bourlatchkov
TBH, I'm not up-to-date on Conventional Commits features, but if it is possible to flag and auto-extract breaking changes / use notes / etc. from commit messages, it might be worth considering it as an alternative to my CHANGELOG [1] proposal. [1] https://lists.apache.org/thread/qznf8toht1r7ml35lt4
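
For reference, the Conventional Commits spec marks a breaking change either with a `!` before the colon in the header or with a `BREAKING CHANGE:` footer. A minimal sketch of extracting such notes from a commit message follows; the function name `breaking_change_note` is hypothetical, and this is not the project's release-notes script.

```python
import re

# Header shape from the Conventional Commits spec: type(scope)!: description
_HEADER = re.compile(
    r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<desc>.+)$"
)
# Footer token from the spec: "BREAKING CHANGE:" or "BREAKING-CHANGE:"
_FOOTER = re.compile(r"^BREAKING[ -]CHANGE: (?P<note>.+)$", re.MULTILINE)


def breaking_change_note(message: str):
    """Return the breaking-change note of a commit message, or None."""
    lines = message.splitlines()
    header = _HEADER.match(lines[0]) if lines else None
    if header is None:
        return None  # not a Conventional Commits header
    footer = _FOOTER.search(message)
    if footer:
        return footer.group("note")  # an explicit footer takes precedence
    if header.group("bang"):
        return header.group("desc")  # "!" marks the header itself as breaking
    return None
```

A release script could call this on each commit message from the Git log and collect the non-`None` results into a "breaking changes" section.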

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Dmitri Bourlatchkov
If we put nightlies in the same repo, we should be careful with the "latest" tag. I suppose users will expect "latest" to track only officially released images in that case. It might even be worth _not_ using the "latest" tag at all. Cheers, Dmitri. On Mon, May 19, 2025 at 3:24 PM Jean-Baptist

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Dmitri Bourlatchkov
For context: my location concerns are rooted in Nessie's experience, where we often get problem reports related to files being outside the declared Iceberg metadata location. Example: https://github.com/projectnessie/nessie/issues/10817#issuecomment-2887329227 I'm ok going with a single location

[VOTE] Release Apache Polaris 0.10.0-beta-incubating (rc3)

2025-05-19 Thread Jean-Baptiste Onofré
Hi everyone I propose that we release the following RC (RC3) as the official Apache Polaris 0.10.0-beta-incubating release. * This corresponds to the tag: apache-polaris-0.10.0-beta-incubating-rc3 * https://github.com/apache/polaris/commits/apache-polaris-0.10.0-beta-incubating-rc3 * https://gi

Re: [VOTE] Release Apache Polaris 0.10.0-beta-incubating (rc3)

2025-05-19 Thread Adnan Hemani
(Non-binding) +1 On Mon, May 19, 2025 at 1:26 PM Jean-Baptiste Onofré wrote: > Hi everyone > > I propose that we release the following RC (RC3) as the official > Apache Polaris 0.10.0-beta-incubating release. > > * This corresponds to the tag: apache-polaris-0.10.0-beta-incubating-rc3 > * > http

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Russell Spitzer
Yeah, I think Iceberg and Hive are the only ones trying to make life difficult. I think we should cover that as well, but through changes to the Iceberg spec. Hive can just stay how it is ... On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov wrote: > For context: my locations concerns are rooted in Ness

Re: Context-Aware Functions for Apache Polaris

2025-05-19 Thread Prashant Singh
Hey JB, Thank you so much for the feedback. I would like to convince you by explaining my thought process behind this proposal: > not do query engine work, but more interact with any query engines for ex: TMS I agree with this in principle, and we should especially not involve any compute (for

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Yufei Gu
> > Open API yaml comments are not sufficient, IMHO. I'd prefer to have a > dedicated doc page to define expectations and compliance. I'm not against a dedicated doc page for that, but I think open API spec including comments should be the source of truth, instead of anywhere else. Yufei On M

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread yun zou
Hi Dmitri, I think for Iceberg, we all agreed that there can be multiple locations, and I definitely agree with Russell that the extension should be done with the IRC endpoints. The Generic Table APIs are designed for non-Iceberg table usage today, and we still want Iceberg table usage to go throug

Re: [VOTE] Release Apache Polaris 0.10.0-beta-incubating (rc3)

2025-05-19 Thread Yufei Gu
+1 (binding) 1. Verified asc, sha512. 2. Build and test passed. 3. Verified both binary distributions, admin and server. They can run successfully. Yufei On Mon, May 19, 2025 at 1:39 PM Adnan Hemani wrote: > (Non-binding) +1 > > On Mon, May 19, 2025 at 1:26 PM Jean-Baptiste Onofré > wrote: >

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Dmitri Bourlatchkov
Open API spec defines the API for obtaining the "location" property for Generic Tables. My concern is with the meaning of that property, which is at the level of Generic Table files. It is essentially about making a table format spec for Generic Tables, even though this spec may be very simple (co

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Dmitri Bourlatchkov
As I commented in my other recent email, I think by introducing a "location" property Polaris enters the realm of table format specs. This is fine from my POV; however, since Polaris is the defining project behind that property, I believe Polaris should provide a more definitive description of th

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Yufei Gu
> > * Clients (engines) are responsible for writing files only under the > specified location. It's nice to have a doc like that. But the open API spec is *the* place to define the behavior of client and server, and how they interact with each other. Just as we said before, spec change is recommen

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread yun zou
Hi Dmitri, Thanks for the detailed explanation, I definitely agree we need to call out those restrictions and compliance in our Spec. As for the documentation, Polaris today already publishes the API spec, if you go to page https://polaris.apache.org/in-dev/unreleased/, and click on the Catalog A

Re: [Discuss] Add `location` to generic table spec

2025-05-19 Thread Dmitri Bourlatchkov
I believe the Open API spec and the definition of "location" are slightly different concerns. The former is about the API used to obtain information about Generic Tables. The latter is about the interpretation of that information. One can think of the location value being handled / transferred be

Re: [Discuss] Add Policy Privileges and PolicyGrant to Management Spec

2025-05-19 Thread yun zou
+1 (non-binding) Thanks Jonas! On Wed, May 14, 2025 at 12:13 PM Yufei Gu wrote: > +1. Thanks Jonas! > Yufei > > > On Tue, May 13, 2025 at 9:56 PM Honah J. wrote: > > > Hi folks, > > > > I would like to propose extending the management API specification to add > > policy-related privileges and

Re: [DISCUSS] Create apache/polaris-admin-tool DockerHub repository ?

2025-05-19 Thread Jean-Baptiste Onofré
Yes, agreed. Latest would be a valid tag only for release images. Snapshot images will just get a … snapshot tag ;) (not latest). Regards JB On Mon, May 19, 2025 at 21:40, Dmitri Bourlatchkov wrote: > If we put nightlies in the same repo, we should be careful with the > "latest" tag. > > I suppose

Re: GitHub pull-requests

2025-05-19 Thread Jean-Baptiste Onofré
+1 And I think Alex is right: it could be helpful to update the contributing file in the repo and on the website. Regards JB On Mon, May 19, 2025 at 20:26, Alex Dutra wrote: > Hi all, > > I'm obviously +1 on Robert's proposals. Should we modify our > CONTRIBUTING.md guidelines? > > Also, in the s