Re: funky S3 URIs: https://github.com/apache/polaris/issues/1545

On Wed, May 7, 2025 at 7:48 PM Dmitri Bourlatchkov <di...@apache.org> wrote:

>> What options do we have other than URI? I think it's more an engine side
>> concern.
>
>
> If (as mentioned in previous emails) we use this location as an input into
> generating vended credentials, then Polaris must be able to interpret it.
>
> Therefore, it is not only an engine side concern.
>
>> What's the concern here?
>
>
> Interpreting locations means dealing with S3 syntax peculiarities.
> Effectively, not all S3 locations comply with the URI RFC [1].
>
> Polaris may be able to avoid parsing locations for credential vending, but
> if it is to do some "prefix" matching, I suspect it will have to deal with
> S3 location syntax issues.
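A minimal sketch of the kind of mismatch described above: S3 object keys may contain characters such as `#` and `?`, which an RFC 3986 parser (here Python's `urllib.parse.urlsplit`) treats as fragment and query delimiters. The helpers `naive_parse` and `literal_parse` are hypothetical, for illustration only; this is not Polaris code.

```python
from urllib.parse import urlsplit


def naive_parse(location: str) -> tuple[str, str]:
    """Split an S3 location into (bucket, key) by treating it as an RFC 3986 URI."""
    parts = urlsplit(location)
    # Anything after '?' or '#' is silently moved into parts.query / parts.fragment.
    return parts.netloc, parts.path.lstrip("/")


def literal_parse(location: str) -> tuple[str, str]:
    """Split an S3 location into (bucket, key) with plain string handling.

    Every character after the bucket is kept as part of the object key.
    """
    _scheme, sep, rest = location.partition("://")
    if not sep:
        raise ValueError(f"not a scheme-qualified location: {location!r}")
    bucket, _, key = rest.partition("/")
    return bucket, key


for loc in ("s3://bucket/db/tbl/part#1.parquet", "s3://bucket/db/tbl/what?.parquet"):
    print(naive_parse(loc), literal_parse(loc))
```

With the URI parser the keys come back truncated (`db/tbl/part`, `db/tbl/what`), while the literal split keeps the full object keys, so any prefix matching built on a URI parser can silently compare the wrong strings.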
>
> This basically goes back to my first reply to this thread. I believe we
> need to clarify the meaning and interpretation of the location property
> before going into more specific concerns.
>
> [1] https://github.com/projectnessie/nessie/issues/8328
>
> Cheers,
> Dmitri.
>
> On Wed, May 7, 2025 at 7:32 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> >
>> > Another point: I'm pretty sure sooner or later users will want to move
>> > their data to some other location. As an option, users may want to write
>> > new files into another location but keep old files in place.
>>
>> What's the concern here? This field is pretty much like the Iceberg table
>> location, which points to all files under a generic table. It isn't
>> related to how users relocate a table.
>>
>> > Also: if the location is a URI, how do we deal with s3 vs. s3a for
>> > example?
>>
>> What options do we have other than URI? I think it's more an engine side
>> concern. I'm OK if Polaris is opinionated about a certain scheme like
>> "s3". We could even do the conversion on the Polaris client side if the
>> engines require other schemes.
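A rough sketch of the "one canonical scheme on the server, conversion on the client" idea, assuming that `s3a` and `s3n` (Hadoop-style schemes) address the same objects as `s3`. The alias set and the helpers `canonicalize` and `for_engine` are hypothetical and only illustrate the approach.

```python
# Hypothetical alias set: Hadoop-style schemes that address the same S3 objects as "s3".
S3_SCHEME_ALIASES = frozenset({"s3", "s3a", "s3n"})


def canonicalize(location: str) -> str:
    """Rewrite a known S3 scheme alias to "s3"; other schemes are returned unchanged."""
    scheme, sep, rest = location.partition("://")
    if not sep:
        raise ValueError(f"location has no scheme: {location!r}")
    if scheme.lower() in S3_SCHEME_ALIASES:
        return "s3://" + rest
    return location


def for_engine(location: str, engine_scheme: str) -> str:
    """Client-side step: re-express an S3 location in the scheme a given engine expects."""
    canonical = canonicalize(location)
    if canonical.startswith("s3://") and engine_scheme in S3_SCHEME_ALIASES:
        return engine_scheme + "://" + canonical[len("s3://"):]
    return canonical


print(canonicalize("s3a://bucket/db/tbl"))      # s3://bucket/db/tbl
print(for_engine("s3://bucket/db/tbl", "s3a"))  # s3a://bucket/db/tbl
```

The rewrite touches only the scheme and uses plain string splitting, so the rest of the location is passed through byte for byte rather than round-tripped through a URI parser.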
>>
>> Yufei
>>
>>
>> On Wed, May 7, 2025 at 3:54 PM Dmitri Bourlatchkov <di...@apache.org>
>> wrote:
>>
>> >
>> >
>> > Also: if the location is a URI, how do we deal with s3 vs. s3a, for
>> > example?
>> >
>> > In Iceberg it is quite common for different engines to use different
>> > access tools, which often leads to different URI schemes.
>> >
>> > Cheers,
>> > Dmitri.
>> >
>> > On Wed, May 7, 2025 at 6:46 PM Eric Maynard <eric.w.mayn...@gmail.com>
>> > wrote:
>> >
>> > > All good questions, Dmitri. I’m especially interested in the first one,
>> > > as from what I understand Iceberg tables can have metadata and data at
>> > > two different paths that we need to vend credentials for.
>> > >
>> > > For Iceberg tables, we just use special properties to track these
>> > > locations. I wonder if we couldn’t do the same for generic tables.
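A minimal sketch of collecting every location that vended credentials would need to cover. `write.data.path` and `write.metadata.path` are real Iceberg table property keys; reusing them (or similar keys) for generic tables is only an assumption, and `locations_to_vend` is a hypothetical helper.

```python
from collections.abc import Mapping

# Real Iceberg table property keys that can place data/metadata files outside
# the table location. Reusing them for generic tables is only an assumption.
ICEBERG_DATA_PATH = "write.data.path"
ICEBERG_METADATA_PATH = "write.metadata.path"


def locations_to_vend(base_location: str, properties: Mapping[str, str]) -> list[str]:
    """Return the distinct locations that vended credentials would need to cover.

    Trailing slashes are ignored when de-duplicating; nested prefixes are not collapsed.
    """
    candidates = (
        base_location,
        properties.get(ICEBERG_DATA_PATH),
        properties.get(ICEBERG_METADATA_PATH),
    )
    result: list[str] = []
    for loc in candidates:
        if loc:  # skip properties that are not set
            normalized = loc.rstrip("/")
            if normalized not in result:
                result.append(normalized)
    return result


print(locations_to_vend("s3://bucket/db/tbl", {"write.data.path": "s3://other-bucket/data/"}))
```

Here the data path lives in a different bucket, so the sketch returns two locations instead of one.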
>> > >
>> > > On Wed, May 7, 2025 at 3:42 PM Dmitri Bourlatchkov <di...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi Yun,
>> > > >
>> > > > Please clarify the meaning of the value of the new location attribute.
>> > > >
>> > > > - Is it one value or many?
>> > > > - Is it a URI?
>> > > > - Does it point to any particular file?
>> > > > - Is it a common prefix of all files within a table?
>> > > > - What happens when a value does not match these expectations?
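One possible reading of "a common prefix of all files within a table", sketched under the assumption that locations are compared as plain strings on whole path segments. `is_under` and `files_outside` are hypothetical helpers, not Polaris APIs.

```python
def is_under(location: str, file_path: str) -> bool:
    """True if file_path equals location or lies below it, comparing whole path segments."""
    root = location.rstrip("/")
    return file_path == root or file_path.startswith(root + "/")


def files_outside(location: str, file_paths: list[str]) -> list[str]:
    """Return the files that violate the "common prefix of all files" expectation."""
    return [p for p in file_paths if not is_under(location, p)]


table = "s3://bucket/db/tbl"
files = [
    "s3://bucket/db/tbl/data/f1.parquet",
    "s3://bucket/db/tbl2/data/f2.parquet",  # sibling table: a raw startswith() would accept this
    "s3a://bucket/db/tbl/data/f3.parquet",  # same object store, different scheme
]
print(files_outside(table, files))
```

Both the sibling-table file and the `s3a` file are reported as outside the table location. The first is the intended result; the second shows that without agreeing on a scheme, a purely textual prefix check rejects files that are physically in place.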
>> > > >
>> > > > Thanks,
>> > > > Dmitri.
>> > > >
>> > > > On 2025/05/07 21:50:19 yun zou wrote:
>> > > > > Hi folks,
>> > > > >
>> > > > > I would like to propose adding an optional `location` field to the
>> > > > > CreateGenericTable request and the LoadGenericTable response.
>> > > > >
>> > > > > The `location` is the location of the table, a concept common to most
>> > > > > table formats, including Iceberg, Delta, Hudi, CSV, Parquet, etc. The
>> > > > > location information is critical for loading the table on the engine
>> > > > > side. Having a dedicated field could help improve robustness for
>> > > > > cross-engine sharing, instead of relying on properties passed by the
>> > > > > client side.
>> > > > >
>> > > > > Furthermore, this information is also required to provide credential
>> > > > > vending capabilities later.
>> > > > >
>> > > > > Here is the PR for adding the spec:
>> > > > > https://github.com/apache/polaris/pull/1543
>> > > > >
>> > > > > Looking forward to your reply and feedback!
>> > > > >
>> > > > > Best Regards,
>> > > > > Yun
>> > > > >
>> > > >
>> > >
>> >
>>
>
