Hi All,

I would like to summarize the thread here:

For generic tables, we will add a `location` key to help cross-engine
sharing and to support credential vending in the future.

Here is a description of the `location` key and the corresponding
restrictions and responsibilities:
- `location` (OPTIONAL): the table root location in URI format, for example
  s3://<my-bucket>/path/to/table.
  - The table root location is a location that contains all files of the
    table.
  - Clients (engines) are responsible for making sure all files are written
    under the configured location.
  - A table with multiple root locations (i.e. containing files that are
    outside the configured root location) is not compliant with the current
    generic table support in Polaris.
  - No two tables can have the same or overlapping locations; otherwise, a
    ForbiddenException will be thrown on creation.
  - If no location is provided, clients or users are responsible for managing
    the location and location-related concerns such as path conflict checks.
  - The location configuration cannot be updated once the table is
    created.
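
To make the "same or overlapping" rule and the "all files under the
location" rule a bit more concrete, here is a rough sketch in Python. It is
only an illustration and not the Polaris implementation: the function names
are made up for this example, and it assumes that overlap means "equal, or
one location is a path prefix of the other" after trimming trailing slashes.
It also treats different URI schemes (e.g. s3 vs. s3a) as different
locations, which may or may not be what we end up wanting.

```python
from urllib.parse import urlparse


def normalize(location: str) -> str:
    """Return scheme://authority/path with exactly one trailing slash.

    Hypothetical helper for this sketch; raises ValueError if the value
    has no URI scheme.
    """
    parsed = urlparse(location)
    if not parsed.scheme:
        raise ValueError(f"location is not a URI: {location!r}")
    path = parsed.path.rstrip("/") + "/"
    return f"{parsed.scheme}://{parsed.netloc}{path}"


def locations_overlap(a: str, b: str) -> bool:
    """True if the two locations are equal or one is nested under the other."""
    na, nb = normalize(a), normalize(b)
    # The trailing slash keeps ".../table" from matching ".../table2".
    return na.startswith(nb) or nb.startswith(na)


def is_under_location(file_uri: str, table_location: str) -> bool:
    """True if file_uri lives under the table root location."""
    return normalize(file_uri).startswith(normalize(table_location))
```

With these assumptions, s3://bucket/path/to/table overlaps with
s3://bucket/path/to and with s3://bucket/path/to/table/, but not with
s3://bucket/path/to/table2.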

This description will be added to the spec. To help non-API users discover
the information easily, we will also add a site page that describes the
Generic Table support and its key fields.

Best Regards,
Yun

On Mon, May 19, 2025 at 11:16 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi Dmitri,
>
> " I do not think those doc comments provide enough visibility to ensure
> that the key information
> is received by users, unless they are dealing directly with the API"
> -- Yeah, I agree that information may not be visible enough for users who
> don't work directly with the APIs.
> However, I think having a page just for "location" might be a bit of
> overkill. Given that generic table API support is a new catalog
> capability that Polaris added and that is not part of IRC, I think it
> might be worth having a more general page that describes the Polaris
> Generic Table support and some of the critical fields like *location*.
> I think we should have the description in the spec as well, so that things
> are clear for API users.
>
> Please let me know what you think.
>
> Best Regards,
> Yun
>
> On Mon, May 19, 2025 at 4:22 PM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
>> I believe the Open API spec and the definition of "location" are slightly
>> different concerns.
>>
>> The former is about the API used to obtain information about Generic
>> Tables.
>>
>> The latter is about the interpretation of that information. One can think
>> of the location value being handled / transferred beyond the immediate
>> Polaris client, in which case it loses its connection to the API, but does
>> not lose its meaning as a location of a Generic Table.
>>
>> Also, I think that Open API doc comments are too low-level and too obscure
>> for people who will work with processing actual Generic Table files. I do
>> not think those doc comments provide enough visibility to ensure that the
>> key information is received by users, unless they are dealing directly
>> with the API.
>>
>> That said, if you prefer to keep the finer points about Generic Table
>> locations in the
>> Open API spec, I'd be fine with that.
>>
>> Cheers,
>> Dmitri.
>>
>> On Mon, May 19, 2025 at 6:46 PM yun zou <yunzou.colost...@gmail.com>
>> wrote:
>>
>> > Hi Dmitri,
>> >
>> > Thanks for the detailed explanation. I definitely agree we need to call
>> > out those restrictions and compliance requirements in our spec.
>> >
>> > As for the documentation, Polaris already publishes the API spec today.
>> > If you go to https://polaris.apache.org/in-dev/unreleased/
>> > and click on the Catalog API Spec, it will lead you to the published
>> > spec, which contains all the descriptions in the spec.
>> > That basically means we have both the published doc and the spec code,
>> > and the single source of truth is the description in the doc.
>> > Or do you think we should have an extra page for the Generic Table API
>> > spec?
>> >
>> > Best Regards,
>> > Yun
>> >
>> > On Mon, May 19, 2025 at 3:20 PM Yufei Gu <flyrain...@gmail.com> wrote:
>> >
>> > > >
>> > > > * Clients (engines) are responsible for writing files only under the
>> > > > specified location.
>> > >
>> > > It's nice to have a doc like that. But the open API spec is *the*
>> > > place to define the behavior of client and server, and how they
>> > > interact with each other. Just as we said before, a spec change is
>> > > recommended to have an ML discussion.
>> > >
>> > > * A table, whose files exist outside the declared location, is not
>> > > > compliant with the Polaris' definition for a Generic Table.
>> > >
>> > > I'm not sure we should go that far. "location" is an optional field.
>> > > It's just that some features, like credential vending, don't work if
>> > > "location" is missing.
>> > >
>> > > Yufei
>> > >
>> > >
>> > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <di...@apache.org
>> >
>> > > wrote:
>> > >
>> > > > As I commented in my other recent email, I think by introducing a
>> > > > "location" property Polaris enters the realm of table format specs.
>> > > >
>> > > > This is fine, from my POV. However, since Polaris is the defining
>> > > > project behind that property, I believe Polaris should provide a more
>> > > > definitive description of the meaning and intended processing of that
>> > > > property.
>> > > >
>> > > > To repeat myself, I think the Open API spec defines only the API for
>> > > > obtaining the location. We need a place to define what this location
>> > > > means. I do not insist on calling this a "spec" for Generic Tables,
>> > > > but I think it deserves a separate page in Polaris docs, where it
>> > > > would be defined with more rigor.
>> > > >
>> > > > Specifically, I think we need to call out that:
>> > > > * The location is a base URI (essentially prefix) for all files in a
>> > > > generic table.
>> > > > * Clients (engines) are responsible for writing files only under the
>> > > > specified location.
>> > > > * A table whose files exist outside the declared location is not
>> > > > compliant with Polaris' definition for a Generic Table.
>> > > >
>> > > > By extension, I think we ought to describe other existing properties
>> > > > too.
>> > > >
>> > > > WDYT?
>> > > >
>> > > > Thanks,
>> > > > Dmitri.
>> > > >
>> > > > On Mon, May 19, 2025 at 5:39 PM yun zou <yunzou.colost...@gmail.com
>> >
>> > > > wrote:
>> > > >
>> > > > > Hi Dmitri,
>> > > > >
>> > > > > I think for Iceberg, we all agreed that there can be multiple
>> > > > > locations, and I definitely agree with Russell that the extension
>> > > > > should be done with the IRC endpoints. The Generic Table APIs are
>> > > > > designed for non-Iceberg table usage today, and we still want
>> > > > > Iceberg table usage to go through the IRC endpoints to have
>> > > > > full IRC support.
>> > > > >
>> > > > > As for the following point:
>> > > > > "a more strict spec for that (define where file should and should
>> > > > > not go)"
>> > > > > Do you mean that Polaris needs to generate a location for the
>> > > > > table to use? If that is the case, I don't think engines
>> > > > > respect that today. The table locations are either generated by
>> > > > > the engine or specified by the user.
>> > > > > Or do you mean that we should have something like Iceberg, where
>> > > > > we have an allowed location and do a validation to make sure the
>> > > > > table location is under the allowed location? Would you mind
>> > > > > elaborating more on this point?
>> > > > >
>> > > > > Best Regards,
>> > > > > Yun
>> > > > >
>> > > > > On Mon, May 19, 2025 at 1:45 PM Russell Spitzer <
>> > > > russell.spit...@gmail.com
>> > > > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Yeah, I think Iceberg and Hive are the only ones trying to make
>> > > > > > life difficult. I think we should cover that too, but in changes
>> > > > > > to the Iceberg Spec. Hive can just stay how it is ...
>> > > > > >
>> > > > > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <
>> > > di...@apache.org>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > For context: my location concerns are rooted in Nessie's
>> > > > > > > experience, where we often get problem reports related to
>> > > > > > > files being outside the declared Iceberg metadata location.
>> > > > > > >
>> > > > > > > Example:
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/projectnessie/nessie/issues/10817#issuecomment-2887329227
>> > > > > > >
>> > > > > > > I'm ok going with a single location for generic tables, but I
>> > > > > > > think Polaris needs to have a more strict spec for that (define
>> > > > > > > where file should and should not go) because Polaris owns this
>> > > > > > > spec. Polaris ought to define what complies with the spec and
>> > > > > > > what does not. Having a proper spec is essential to ensure a
>> > > > > > > mutual understanding among all parties dealing with Generic
>> > > > > > > Tables.
>> > > > > > >
>> > > > > > > Open API yaml comments are not sufficient, IMHO. I'd prefer to
>> > > > > > > have a dedicated doc page to define expectations and compliance.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Dmitri.
>> > > > > > >
>> > > > > > >
>> > > > > > > On Mon, May 19, 2025 at 2:17 PM Russell Spitzer <
>> > > > > > russell.spit...@gmail.com
>> > > > > > > >
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > The only multiple-location table formats I'm currently
>> > > > > > > > aware of are Hive (partitions can live wherever) and Iceberg.
>> > > > > > > >
>> > > > > > > > I think for Delta, Hudi, LanceDB, Paimon and file-based
>> > > > > > > > tables, they all have to live in the root location. I'm not
>> > > > > > > > sure of any other "file"-based tables where this would be an
>> > > > > > > > issue, but I'd love to know if someone else has ideas. I think
>> > > > > > > > with the rise in credential vending, splitting things amongst
>> > > > > > > > multiple prefixes is becoming less common. I don't oppose
>> > > > > > > > doing an array of locations, but it may be enough to just
>> > > > > > > > leave this as an extension for later. (Support location or
>> > > > > > > > locations)
>> > > > > > > >
>> > > > > > > > On Wed, May 7, 2025 at 8:52 PM yun zou <
>> > > yunzou.colost...@gmail.com
>> > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi Dmitri,
>> > > > > > > > >
>> > > > > > > > > If it's not "all", it is not strong enough for a spec,
>> > > > > > > > > IMHO. If some tables have multiple base locations, how is
>> > > > > > > > > Polaris going to deal with them?
>> > > > > > > > >
>> > > > > > > > > Sorry, when I said "most of them", it was because I haven't
>> > > > > > > > > tested all of them (I only tested Delta and CSV before).
>> > > > > > > > > However, if Unity Catalog only takes one location, I think
>> > > > > > > > > that is strong enough proof that one location is enough
>> > > > > > > > > today.
>> > > > > > > > >
>> > > > > > > > > It is also more natural to start with one location, and if
>> > > > > > > > > there are use cases that require support for multiple
>> > > > > > > > > locations later, we can move on to a V2 spec to support
>> > > > > > > > > multiple table locations.
>> > > > > > > > >
>> > > > > > > > > We're making a specification for Polaris. I do not think
>> > > > > > > > > it is sufficient to say we'll do the same as other
>> > > > > > > > > (unspecified ATM) catalogs.
>> > > > > > > > > If we want to migrate users from other Catalog services to
>> > > > > > > > > Polaris (through federation), then Polaris will need to
>> > > > > > > > > provide corresponding capabilities. For example, the Unity
>> > > > > > > > > Catalog storage location is a URI representation; when
>> > > > > > > > > entities are federated from Unity Catalog, we will need to
>> > > > > > > > > be able to handle the URI location.
>> > > > > > > > > If URI representation is a common standard that has been
>> > > > > > > > > accepted by other Catalog services like Unity Catalog and
>> > > > > > > > > Gravitino, Polaris should be compatible with it; otherwise
>> > > > > > > > > it might cause problems for users when they are migrating
>> > > > > > > > > from one to another.
>> > > > > > > > >
>> > > > > > > > > What will Polaris Server do with this location?
>> > > > > > > > > For generic tables, Polaris will provide credential vending
>> > > > > > > > > for this location in the near future. I don't see us
>> > > > > > > > > providing anything else in the short or mid term, since we
>> > > > > > > > > still want to promote native support for Iceberg.
>> > > > > > > > > Or do you have anything special in mind that you think we
>> > > > > > > > > should support?
>> > > > > > > > >
>> > > > > > > > > If Polaris has to define it in a spec, it will be hard to
>> > > > > > > > > change in the future.
>> > > > > > > > > Regardless of whether it is explicitly in the spec
>> > > > > > > > > definition or a reserved property key, as long as it is
>> > > > > > > > > explicitly documented, it will be hard to change in the
>> > > > > > > > > future. From that perspective, those two approaches seem
>> > > > > > > > > the same to me.
>> > > > > > > > >
>> > > > > > > > > Table location is critical information that is required by
>> > > > > > > > > the engine side to read and write the tables, and it should
>> > > > > > > > > be explicitly defined to provide better sharing across
>> > > > > > > > > engines. For example, the Delta table location is passed in
>> > > > > > > > > the table properties with a property key of either
>> > > > > > > > > "location" or "path", depending on how the table is created.
>> > > > > > > > > Now, if another engine wants to read the Delta table, it
>> > > > > > > > > will need to understand those keys, which are controlled by
>> > > > > > > > > Spark today. If Spark changes them one day, all sharing
>> > > > > > > > > will stop working.
>> > > > > > > > >
>> > > > > > > > > As to whether we want to make it an explicit field or a
>> > > > > > > > > reserved key, I think for a field common among various
>> > > > > > > > > table formats, it makes more sense to have it as an
>> > > > > > > > > explicit field. For properties that are specific to a
>> > > > > > > > > particular table format, it is more appropriate to just
>> > > > > > > > > have a reserved key.
>> > > > > > > > >
>> > > > > > > > > If Polaris takes control of the location, I think we have
>> > > > > > > > > to be more careful and at least try to make it future-proof.
>> > > > > > > > >
>> > > > > > > > > I don't think Polaris is taking control of the location; the
>> > > > > > > > > location is still controlled by the engine and users today,
>> > > > > > > > > like table names.
>> > > > > > > > > Polaris is a Catalog service: it records the generic table
>> > > > > > > > > entity and returns the information back to the user on
>> > > > > > > > > query. It might be able to do some validation on the
>> > > > > > > > > location (like checking for special characters), but it
>> > > > > > > > > doesn't decide which location the table will use. I
>> > > > > > > > > personally don't think it is a bad idea to let the Catalog
>> > > > > > > > > service also take control of generating the table location,
>> > > > > > > > > but I think that will require a lot of work.
>> > > > > > > > >
>> > > > > > > > > Best Regards,
>> > > > > > > > > Yun
>> > > > > > > > >
>> > > > > > > > > On Wed, May 7, 2025 at 5:22 PM Dmitri Bourlatchkov <
>> > > > > di...@apache.org
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > No worries about the name. It is a possible alternative
>> > > > spelling
>> > > > > :)
>> > > > > > > > > >
>> > > > > > > > > > On Wed, May 7, 2025 at 8:04 PM yun zou <
>> > > > > yunzou.colost...@gmail.com
>> > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > > > Hi Dmitri,
>> > > > > > > > > > >
>> > > > > > > > > > > Sorry, I accidentally typed your name wrong in the
>> > > > > > > > > > > previous reply! Apologies for this!
>> > > > > > > > > > >
>> > > > > > > > > > > For the S3 issue, I think we will need to deal with it
>> > > > > > > > > > > regardless, especially with the federation work going
>> > > > > > > > > > > on: we will eventually need to handle all those
>> > > > > > > > > > > entities coming from different Catalogs, and the URI
>> > > > > > > > > > > format seems to be the standard format used by various
>> > > > > > > > > > > Catalog services.
>> > > > > > > > > > >
>> > > > > > > > > > > Best Regards,
>> > > > > > > > > > > Yun
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, May 7, 2025 at 4:55 PM yun zou <
>> > > > > > yunzou.colost...@gmail.com
>> > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > > Hi Dimitri and Eric,
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks a lot for the feedback!
>> > > > > > > > > > > >
>> > > > > > > > > > > > For the questions:
>> > > > > > > > > > > > - Is it one value or many?
>> > > > > > > > > > > > It will be one value, similar to the location in
>> > > > > > > > > > > > Iceberg and the storage_location in Unity Catalog.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Regarding the point about having new data in new
>> > > > > > > > > > > > locations and keeping old data in old locations, do
>> > > > > > > > > > > > we support that for Iceberg today?
>> > > > > > > > > > > > For most Spark tables, there seems to be only one
>> > > > > > > > > > > > location. Also, I think it is better to start
>> > > > > > > > > > > > restricted first, and then extend it to allow
>> > > > > > > > > > > > multiple locations when the use case arises.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Ref:
>> > > > > > > > > > > > Iceberg location:
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
>> > > > > > > > > > > > Storage location in Unity Catalog:
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
>> > > > > > > > > > > >
>> > > > > > > > > > > > - Is it a URI?
>> > > > > > > > > > > > Yes, it will be a URI, which seems to be the standard
>> > > > > > > > > > > > across catalog implementations.
>> > > > > > > > > > > > Regarding the point about s3 vs. s3a, I assume that
>> > > > > > > > > > > > is a common problem that every catalog implementation
>> > > > > > > > > > > > needs to address, and we will stay the same on this
>> > > > > > > > > > > > part. At least from the load table point of view,
>> > > > > > > > > > > > the Spark engine knows how to deal with such cases.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - Does it point to any particular file?
>> > > > > > > > > > > > No, it doesn't point to a particular file. It is the
>> > > > > > > > > > > > base table location.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - Is it a common prefix of all files within a table?
>> > > > > > > > > > > > It is supposed to be the base table location, which,
>> > > > > > > > > > > > I believe, should theoretically be the common prefix
>> > > > > > > > > > > > of all files within a table.
>> > > > > > > > > > > >
>> > > > > > > > > > > > - What happens when a value does not match these
>> > > > > > > > > > > > expectations?
>> > > > > > > > > > > > Whether it is one value or many is already restricted
>> > > > > > > > > > > > by the spec.
>> > > > > > > > > > > > For the URI format, I think we can do a format check
>> > > > > > > > > > > > and fail the request if it does not pass.
>> > > > > > > > > > > > Other than that, we will not do any other special
>> > > > > > > > > > > > checks, and we rely on the client to provide the
>> > > > > > > > > > > > correct value; otherwise, other engines will not be
>> > > > > > > > > > > > able to read the table successfully.
>> > > > > > > > > > > >
>> > > > > > > > > > > > For the location keyword, as Eric has pointed out, we
>> > > > > > > > > > > > could potentially have a reserved key in the
>> > > > > > > > > > > > properties. However, location is a common enough key
>> > > > > > > > > > > > among various table formats that it is worth a
>> > > > > > > > > > > > dedicated key to help store and load the information
>> > > > > > > > > > > > in a more straightforward way. For things that are
>> > > > > > > > > > > > specific to one or two formats, I think it makes more
>> > > > > > > > > > > > sense to use a reserved property key.
>> > > > > > > > > > > >
>> > > > > > > > > > > > As a reference, in Iceberg, the CreateTable request
>> > > > > > > > > > > > and TableMetadata do have an explicit location key in
>> > > > > > > > > > > > the spec, while write.data.path and
>> > > > > > > > > > > > write.metadata.path are passed as properties today.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Best Regards,
>> > > > > > > > > > > > Yun
>> > > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, May 7, 2025 at 3:54 PM Dmitri Bourlatchkov <
>> > > > > > > > di...@apache.org
>> > > > > > > > > >
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > >> Another point: I'm pretty sure sooner or later
>> > > > > > > > > > > > >> users will want to move their data to some other
>> > > > > > > > > > > > >> location. As an option, users may want to write
>> > > > > > > > > > > > >> new files into another location but keep old
>> > > > > > > > > > > > >> files in place.
>> > > > > > > > > > > > >>
>> > > > > > > > > > > > >> Also: if the location is a URI, how do we deal
>> > > > > > > > > > > > >> with s3 vs. s3a, for example?
>> > > > > > > > > > > > >>
>> > > > > > > > > > > > >> In Iceberg it is quite common for different
>> > > > > > > > > > > > >> engines to use different access tools, which
>> > > > > > > > > > > > >> often leads to different URI schemes.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Cheers,
>> > > > > > > > > > > >> Dmitri.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> On Wed, May 7, 2025 at 6:46 PM Eric Maynard <
>> > > > > > > > > eric.w.mayn...@gmail.com
>> > > > > > > > > > >
>> > > > > > > > > > > >> wrote:
>> > > > > > > > > > > >>
>> > > > > > > > > > > > >> > All good questions, Dmitri. I'm especially
>> > > > > > > > > > > > >> > interested in the first one, as from what I
>> > > > > > > > > > > > >> > understand Iceberg tables can have metadata and
>> > > > > > > > > > > > >> > data at two different paths that we need to
>> > > > > > > > > > > > >> > vend credentials for.
>> > > > > > > > > > > > >> >
>> > > > > > > > > > > > >> > For Iceberg tables, we just use special
>> > > > > > > > > > > > >> > properties to track these locations. I wonder
>> > > > > > > > > > > > >> > if we couldn't do the same for generic tables.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > On Wed, May 7, 2025 at 3:42 PM Dmitri
>> Bourlatchkov <
>> > > > > > > > > > di...@apache.org>
>> > > > > > > > > > > >> > wrote:
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > > Hi Yun,
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > > >> > > Please clarify the meaning of the value of the
>> > > > > > > > > > > > >> > > new location attribute.
>> > > > > > > > > > > > >> > >
>> > > > > > > > > > > > >> > > - Is it one value or many?
>> > > > > > > > > > > > >> > > - Is it a URI?
>> > > > > > > > > > > > >> > > - Does it point to any particular file?
>> > > > > > > > > > > > >> > > - Is it a common prefix of all files within a
>> > > > > > > > > > > > >> > > table?
>> > > > > > > > > > > > >> > > - What happens when a value does not match
>> > > > > > > > > > > > >> > > these expectations?
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > > Thanks,
>> > > > > > > > > > > >> > > Dmitri.
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > > On 2025/05/07 21:50:19 yun zou wrote:
>> > > > > > > > > > > >> > > > Hi folks,
>> > > > > > > > > > > >> > > >
>> > > > > > > > > > > > >> > > > I would like to propose adding an optional
>> > > > > > > > > > > > >> > > > `location` field to the CreateGenericTable
>> > > > > > > > > > > > >> > > > request and LoadGenericTable response.
>> > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > >> > > > The `location` is the location of the table,
>> > > > > > > > > > > > >> > > > which is common to most table formats
>> > > > > > > > > > > > >> > > > including Iceberg, Delta, Hudi, CSV, Parquet,
>> > > > > > > > > > > > >> > > > etc. The location information is critical for
>> > > > > > > > > > > > >> > > > loading the table on the engine side; having
>> > > > > > > > > > > > >> > > > a dedicated keyword could help improve the
>> > > > > > > > > > > > >> > > > robustness of cross-engine sharing, instead
>> > > > > > > > > > > > >> > > > of relying on the properties passed by the
>> > > > > > > > > > > > >> > > > client side.
>> > > > > > > > > > > > >> > > >
>> > > > > > > > > > > > >> > > > Furthermore, this information is also required
>> > > > > > > > > > > > >> > > > to provide credential vending capabilities
>> > > > > > > > > > > > >> > > > later.
>> > > > > > > > > > > >> > > >
>> > > > > > > > > > > >> > > > Here is the PR for adding the spec:
>> > > > > > > > > > > >> > > > https://github.com/apache/polaris/pull/1543
>> > > > > > > > > > > >> > > >
>> > > > > > > > > > > >> > > > Looking forward to your reply and feedback!
>> > > > > > > > > > > >> > > >
>> > > > > > > > > > > >> > > > Best Regards,
>> > > > > > > > > > > >> > > > Yun
>> > > > > > > > > > > >> > > >
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >>
>> > > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
