Re: funky S3 URIs: https://github.com/apache/polaris/issues/1545
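
To make the problem concrete, here is a rough sketch in Python. It is purely illustrative: `split_s3_location` and `is_under` are made-up helper names, not Polaris or Iceberg APIs. It assumes a location has the form scheme://bucket/key and that s3, s3a and s3n can be treated as equivalent for matching, which is an assumption for this example and not something decided in this thread. The point is that a generic URI parser cuts an S3 key at `#` or `?`, while splitting on the first `/` after the bucket keeps the key intact for prefix matching.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: these schemes all address the same S3 objects.
S3_SCHEMES = ("s3", "s3a", "s3n")


def split_s3_location(location: str) -> tuple[str, str]:
    """Split an S3-style location into (bucket, key).

    The key is taken verbatim, so characters such as '#' and '?' are
    not interpreted as URI fragment or query delimiters.
    """
    scheme, sep, rest = location.partition("://")
    if not sep or scheme.lower() not in S3_SCHEMES:
        raise ValueError(f"not an S3 location: {location!r}")
    bucket, _, key = rest.partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in location: {location!r}")
    return bucket, key


def is_under(location: str, base: str) -> bool:
    """Return True if `location` lies under `base`.

    Compares bucket and key prefix only, so s3:// and s3a:// match.
    The prefix must end on a '/' boundary, so 't10' is not under 't1'.
    """
    loc_bucket, loc_key = split_s3_location(location)
    base_bucket, base_key = split_s3_location(base)
    if loc_bucket != base_bucket:
        return False
    base_key = base_key.rstrip("/")
    if base_key == "":
        return True
    return loc_key == base_key or loc_key.startswith(base_key + "/")


if __name__ == "__main__":
    loc = "s3a://bucket/tables/t1/part#1/data?.parquet"
    # Generic URI parsing treats everything after '#' as a fragment.
    print(urlsplit(loc).path)
    # Splitting on the bucket boundary keeps the whole key.
    print(split_s3_location(loc))
    print(is_under(loc, "s3://bucket/tables/t1/"))
```

Whether Polaris should match on a scheme-insensitive prefix like this is exactly the open question; the sketch only shows what "dealing with S3 syntax" would involve.
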
On Wed, May 7, 2025 at 7:48 PM Dmitri Bourlatchkov <di...@apache.org> wrote:

>> What options do we have other than URI? I think it's more an engine side
>> concern.
>
> If (as mentioned in previous emails) we use this location as an input into
> generating vended credentials, then Polaris must be able to interpret it.
>
> Therefore, it is not only an engine side concern.
>
>> What's the concern here?
>
> Interpreting locations means dealing with S3 syntax peculiarities.
> Effectively, not all S3 locations comply with the URI RFC [1].
>
> Polaris may be able to avoid parsing locations for credential vending, but
> if it is to do some "prefix" matching, I suspect it will have to deal with
> S3 location syntax issues.
>
> This basically goes back to my first reply to this thread. I believe we
> need to clarify the meaning and interpretation of the location property
> before diving into more specific concerns.
>
> [1] https://github.com/projectnessie/nessie/issues/8328
>
> Cheers,
> Dmitri.
>
> On Wed, May 7, 2025 at 7:32 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> > Another point: I'm pretty sure sooner or later users will want to move
>> > their data to some other location. As an option users may want to
>> > write new files into another location but keep old files in place.
>>
>> What's the concern here? This field is pretty much like the Iceberg table
>> location, which points to all files under a generic table. It isn't
>> related to how users relocate a table.
>>
>> > Also: if the location is a URI, how do we deal with s3 vs. s3a for
>> > example?
>>
>> What options do we have other than URI? I think it's more an engine side
>> concern. I'm OK if Polaris is opinionated about a certain scheme like
>> "s3". We could even make the conversion at the Polaris client side even
>> if the engines require other schemes.
>>
>> Yufei
>>
>>
>> On Wed, May 7, 2025 at 3:54 PM Dmitri Bourlatchkov <di...@apache.org>
>> wrote:
>>
>> > Also: if the location is a URI, how do we deal with s3 vs. s3a for
>> > example?
>> >
>> > In Iceberg it is quite common for different engines to use different
>> > access tools, which often leads to different URI schemes.
>> >
>> > Cheers,
>> > Dmitri.
>> >
>> > On Wed, May 7, 2025 at 6:46 PM Eric Maynard <eric.w.mayn...@gmail.com>
>> > wrote:
>> >
>> > > All good questions, Dmitri. I'm especially interested in the first
>> > > one as, from what I understand, Iceberg tables can have metadata and
>> > > data at two different paths that we need to vend credentials for.
>> > >
>> > > For Iceberg tables, we just use special properties to track these
>> > > locations. I wonder if we couldn't do the same for generic tables.
>> > >
>> > > On Wed, May 7, 2025 at 3:42 PM Dmitri Bourlatchkov <di...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Yun,
>> > > >
>> > > > Please clarify the meaning of the value of the new location
>> > > > attribute.
>> > > >
>> > > > - Is it one value or many?
>> > > > - Is it a URI?
>> > > > - Does it point to any particular file?
>> > > > - Is it a common prefix of all files within a table?
>> > > > - What happens when a value does not match these expectations?
>> > > >
>> > > > Thanks,
>> > > > Dmitri.
>> > > >
>> > > > On 2025/05/07 21:50:19 yun zou wrote:
>> > > > > Hi folks,
>> > > > >
>> > > > > I would like to propose adding an optional `location` field to
>> > > > > the CreateGenericTable request and LoadGenericTable response.
>> > > > >
>> > > > > The `location` is the location for the table, which is common to
>> > > > > most table formats including Iceberg, Delta, Hudi, csv, parquet
>> > > > > etc. The location information is critical for loading the table
>> > > > > on the engine side. Having a dedicated keyword could help improve
>> > > > > the robustness of cross-engine sharing, instead of relying on the
>> > > > > properties passed by the client side.
>> > > > >
>> > > > > Furthermore, this information is also required to provide
>> > > > > credential vending capabilities later.
>> > > > >
>> > > > > Here is the PR for adding the spec:
>> > > > > https://github.com/apache/polaris/pull/1543
>> > > > >
>> > > > > Looking forward to your reply and feedback!
>> > > > >
>> > > > > Best Regards,
>> > > > > Yun