Hi Eric,

"No two tables globally can have a location overlap?"
-- Let me correct that: I meant that no two tables under the same catalog can
have the same location, which I think is the right thing to do.
Otherwise, a credential vended for one table could be used
to access multiple tables under the same catalog.
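
To make the rule concrete, here is a minimal Python sketch of a per-catalog
overlap check. It is an illustration only, not the Polaris implementation:
the function names are hypothetical, ValueError stands in for the
ForbiddenException mentioned in the summary, and it assumes that two
locations overlap when they share scheme and bucket and one path is a
segment-wise prefix of the other (so s3 and s3a are treated as different).

```python
from urllib.parse import urlparse


def _normalize(location):
    """Split a location URI into (scheme, authority, path ending in '/')."""
    parsed = urlparse(location)
    # A trailing slash makes the prefix test respect path segments, so
    # ".../table" does not count as a prefix of ".../table2".
    path = parsed.path.rstrip("/") + "/"
    return parsed.scheme, parsed.netloc, path


def locations_overlap(a, b):
    """Return True if the two locations are equal or one contains the other."""
    scheme_a, host_a, path_a = _normalize(a)
    scheme_b, host_b, path_b = _normalize(b)
    if (scheme_a, host_a) != (scheme_b, host_b):
        return False
    return path_a.startswith(path_b) or path_b.startswith(path_a)


def check_new_location(new_location, existing_locations):
    """Reject a new table location that overlaps any table in the same catalog.

    ValueError is a stand-in for whatever error the server would raise.
    """
    for existing in existing_locations:
        if locations_overlap(new_location, existing):
            raise ValueError(
                f"location {new_location} overlaps with {existing}"
            )
```

With this check, s3://my-bucket/path/to/table and s3://my-bucket/path/to
would be rejected as overlapping, while s3://my-bucket/path/to/table2 would
be accepted next to s3://my-bucket/path/to/table.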

"restriction that you can’t change location is stricter than what we have
for Iceberg".
This is a current limitation of our generic table support, since we do not
have any update support yet. The restriction can be removed once proper
update support is added.

As for multiple locations, since generic tables are designed for
non-Iceberg tables today, supporting multiple locations seems unnecessary at
this moment. As Yufei mentioned, I think we should start simple and evolve
with use cases.
Multi-location support in Polaris does not seem very mature either: the overlap
check and credential vending both appear to be done with a single location. Once
we have good support for Iceberg, I think it would be very easy for us to
generalize it to other table formats if necessary.

Best Regards,
Yun


On Thu, May 22, 2025 at 9:30 AM Yufei Gu <flyrain...@gmail.com> wrote:

> Inlined.
>
> On Thu, May 22, 2025 at 7:48 AM Dmitri Bourlatchkov <di...@apache.org>
> wrote:
>
> > > Can we keep it simple for v1 [...]
> >
> > What is v1 in this context?
> >
> I meant as the first iteration, sorry for the confusion.
>
> >
> > Thanks,
> > Dmitri.
> >
> > On Wed, May 21, 2025 at 8:42 PM Yufei Gu <flyrain...@gmail.com> wrote:
> >
> > > Can we keep it simple for v1, as one location field is enough for today’s
> > > use cases? And we can revisit multi-location support when there’s real
> > > demand.
> > >
> > > The current API spec already implies that a table’s location is immutable:
> > > there’s no “alter location” call. I’m fine leaving it implicit, but we
> > > could add an explicit note to make that clear if it helps avoid confusion.
> > >
> > > Yufei
> > >
> > >
> > > > On Wed, May 21, 2025 at 4:36 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > > > wrote:
> > > >
> > > > > No two tables globally can have a location overlap? That’s a stricter
> > > > > requirement than we have for even Iceberg tables and doesn’t sound correct.
> > > > >
> > > > > Similarly, the restriction that you can’t change location is stricter than
> > > > > what we have for Iceberg.
> > > > >
> > > > > Finally, I’m still not sure what the problem is with having multiple
> > > > > locations. Again, we already track multiple locations for Iceberg.
> > > > >
> > > > > On Thu, May 22, 2025 at 12:32 AM yun zou <yunzou.colost...@gmail.com>
> > > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Want to summarize the thread here:
> > > > >
> > > > > > For generic tables, we will add a `location` key to help cross-engine
> > > > > > sharing and future support for credential vending.
> > > > > >
> > > > > > Here is a description of this `location` key and the corresponding
> > > > > > restrictions and responsibilities:
> > > > > > - `location` (OPTIONAL): table root location in URI format. For example:
> > > > > > s3://<my-bucket>/path/to/table.
> > > > > >   - The table root location is a location that includes all files for the
> > > > > > table.
> > > > > >   - Clients (engines) are responsible for making sure all files are written
> > > > > > under the configured location.
> > > > > >   - A table with multiple root locations (i.e. containing files that are
> > > > > > outside the configured root location) is not compliant with the current
> > > > > > generic table support in Polaris.
> > > > > >   - No two tables can have the same or overlapping locations; otherwise, a
> > > > > > ForbiddenException will be thrown on creation.
> > > > > >   - If no location is provided, clients or users are responsible for managing
> > > > > > the location and location-related concerns such as path conflict checks.
> > > > > >   - The location configuration cannot be updated once the table is
> > > > > > created.
> > > > > >
> > > > > > This description will be added to the spec. To help non-API users
> > > > > > discover the information easily, we will also add a site page that
> > > > > > describes the support for Generic Tables and their key fields.
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > > On Mon, May 19, 2025 at 11:16 PM yun zou <
> yunzou.colost...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hi Dmitri,
> > > > > >
> > > > > > > "I do not think those doc comments provide enough visibility to ensure
> > > > > > > that the key information is received by users, unless they are dealing
> > > > > > > directly with the API"
> > > > > > > -- Yeah, I agree that information may not be visible enough for users who
> > > > > > > don't work directly with the APIs.
> > > > > > > However, I think having a page just for "location" might be a little bit
> > > > > > > overkill. Given that generic table API support is a new catalog capability
> > > > > > > that Polaris added and is not part of IRC, I think it might be worth having
> > > > > > > a more general page that describes the Polaris Generic Table support and
> > > > > > > some of the critical fields like *location*.
> > > > > > > I think we should have the description in the spec also, so that things
> > > > > > > are clear for API users.
> > > > > >
> > > > > > Please let me know what you think.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yun
> > > > > >
> > > > > > On Mon, May 19, 2025 at 4:22 PM Dmitri Bourlatchkov <
> > > di...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> I believe the Open API spec and the definition of "location" are
> > > > > slightly
> > > > > >> different concerns.
> > > > > >>
> > > > > >> The former is about the API used to obtain information about
> > Generic
> > > > > >> Tables.
> > > > > >>
> > > > > >> The latter is about the interpretation of that information. One
> > can
> > > > > think
> > > > > >> of the location
> > > > > >> value being handled / transferred beyond the immediate Polaris
> > > client,
> > > > > in
> > > > > >> which case
> > > > > >> is loses its connection to the API, but does not lose its
> meaning
> > > as a
> > > > > >> location of a
> > > > > >> Generic Table.
> > > > > >>
> > > > > >> Also, I think that Open API doc comments are too low-level and
> too
> > > > > obscure
> > > > > >> for
> > > > > >> people who will work with processing actual Generic Table
> files. I
> > > do
> > > > > not
> > > > > >> think
> > > > > >> those doc comment provide enough visibility to ensure that the
> key
> > > > > >> information
> > > > > >> is received by users, unless they are dealing directly with the
> > API.
> > > > > >>
> > > > > >> That said, if you prefer to keep the finer points about Generic
> > > Table
> > > > > >> locations in the
> > > > > >> Open API spec, I'd be fine with that.
> > > > > >>
> > > > > >> Cheers,
> > > > > >> Dmitri.
> > > > > >>
> > > > > >> On Mon, May 19, 2025 at 6:46 PM yun zou <
> > yunzou.colost...@gmail.com
> > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >> > Hi Dmitri,
> > > > > >> >
> > > > > >> > Thanks for the detailed explanation, I definitely agree we
> need
> > to
> > > > > call
> > > > > >> out
> > > > > >> > those restrictions and compliance in our Spec.
> > > > > >> >
> > > > > >> > As for the documentation, Polaris today already publishes the
> > API
> > > > > spec,
> > > > > >> if
> > > > > >> > you go to page https://polaris.apache.org/in-dev/unreleased/,
> > > > > >> > and click on the Catalog API Spec, it will lead you to the
> > > published
> > > > > >> Spec,
> > > > > >> > which contains all description in the Spec.
> > > > > >> > That basically means we have both published doc and spec code,
> > and
> > > > the
> > > > > >> > single source of truth is the description in the doc.
> > > > > >> > or do you think we should have an extra page for the Generic
> > Table
> > > > API
> > > > > >> > spec?
> > > > > >> >
> > > > > >> > Best Regards,
> > > > > >> > Yun
> > > > > >> >
> > > > > >> > On Mon, May 19, 2025 at 3:20 PM Yufei Gu <
> flyrain...@gmail.com>
> > > > > wrote:
> > > > > >> >
> > > > > >> > > >
> > > > > >> > > > * Clients (engines) are responsible for writing files only
> > > under
> > > > > the
> > > > > >> > > > specified location.
> > > > > >> > >
> > > > > >> > > It's nice to have a doc like that. But the open API spec is
> > > *the*
> > > > > >> place
> > > > > >> > to
> > > > > >> > > define the behavior of client and server, and how they
> > interact
> > > > with
> > > > > >> each
> > > > > >> > > other. Just as we said before, spec change is recommended to
> > > have
> > > > a
> > > > > ML
> > > > > >> > > discussion.
> > > > > >> > >
> > > > > >> > > * A table, whose files exist outside the declared location,
> is
> > > not
> > > > > >> > > > compliant with the Polaris' definition for a Generic
> Table.
> > > > > >> > >
> > > > > >> > > I'm not sure we should go that far. "location" is an
> optional
> > > > field.
> > > > > >> It's
> > > > > >> > > just some features like credential vending that don't work
> if
> > > > > >> "location"
> > > > > >> > is
> > > > > >> > > missing.
> > > > > >> > >
> > > > > >> > > Yufei
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <
> > > > > di...@apache.org
> > > > > >> >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > As I commented in my other recent email, I think by
> > > introducing
> > > > a
> > > > > >> > > > "location" property Polaris enters the realm of table
> format
> > > > > specs.
> > > > > >> > > >
> > > > > >> > > > This is fine, from my POV, however, since Polaris is the
> > > > defining
> > > > > >> > project
> > > > > >> > > > behind that property, I believe Polaris should provide a
> > more
> > > > > >> > definitive
> > > > > >> > > > description of the meaning and intended processing of that
> > > > > property.
> > > > > >> > > >
> > > > > >> > > > To repeat myself, I think the Open API spec defines only
> the
> > > API
> > > > > for
> > > > > >> > > > obtaining the location. We need a place to define what
> this
> > > > > location
> > > > > >> > > means.
> > > > > >> > > > I do not insist on calling this a "spec" for Generic
> Tables,
> > > > but I
> > > > > >> > think
> > > > > >> > > it
> > > > > >> > > > deserves a separate page in Polaris docs, where it would
> be
> > > > > defined
> > > > > >> > with
> > > > > >> > > > more rigor.
> > > > > >> > > >
> > > > > >> > > > Specifically, I think we need to call out that:
> > > > > >> > > > * The location is a base URI (essentially prefix) for all
> > > files
> > > > > in a
> > > > > >> > > > generic table.
> > > > > >> > > > * Clients (engines) are responsible for writing files only
> > > under
> > > > > the
> > > > > >> > > > specified location.
> > > > > >> > > > * A table, whose files exist outside the declared
> location,
> > is
> > > > not
> > > > > >> > > > compliant with the Polaris' definition for a Generic
> Table.
> > > > > >> > > >
> > > > > >> > > > By extension, I think we ought to describe other existing
> > > > > properties
> > > > > >> > too.
> > > > > >> > > >
> > > > > >> > > > WDYT?
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dmitri.
> > > > > >> > > >
> > > > > >> > > > On Mon, May 19, 2025 at 5:39 PM yun zou <
> > > > > yunzou.colost...@gmail.com
> > > > > >> >
> > > > > >> > > > wrote:
> > > > > >> > > >
> > > > > >> > > > > Hi Dmitri,
> > > > > >> > > > >
> > > > > >> > > > > I think for Iceberg, we all agreed that there can be
> > > multiple
> > > > > >> > > locations,
> > > > > >> > > > > and I definitely agree with Russel that the extension
> > > > > >> > > > > should be done with the IRC endpoints. The Generic Table
> > > APIs
> > > > > are
> > > > > >> > > > designed
> > > > > >> > > > > for non-Iceberg table usage today, and
> > > > > >> > > > > We still want Iceberg table usage to go through the IRC
> > > > endpoint
> > > > > >> to
> > > > > >> > > have
> > > > > >> > > > > full IRC support.
> > > > > >> > > > >
> > > > > >> > > > > As for the following point
> > > > > >> > > > > "a more strict spec for that (define where file should
> and
> > > > > should
> > > > > >> not
> > > > > >> > > > go)"
> > > > > >> > > > > Are you referring that Polaris need to generate a
> location
> > > for
> > > > > the
> > > > > >> > > table
> > > > > >> > > > to
> > > > > >> > > > > use, if that is the case, I don't think engines
> > > > > >> > > > > respects that today. The table locations are either
> > > generated
> > > > by
> > > > > >> the
> > > > > >> > > > engine
> > > > > >> > > > > or specified by the user.
> > > > > >> > > > > Or are you referring that we should have something like
> > > > Iceberg
> > > > > >> that
> > > > > >> > we
> > > > > >> > > > > should have an allowed location and do a
> > > > > >> > > > > validation to make sure the location is under the
> allowed
> > > > > >> location?
> > > > > >> > > Would
> > > > > >> > > > > you mind elaborate more on this point?
> > > > > >> > > > >
> > > > > >> > > > > Best Regards,
> > > > > >> > > > > Yun
> > > > > >> > > > >
> > > > > >> > > > > On Mon, May 19, 2025 at 1:45 PM Russell Spitzer <
> > > > > >> > > > russell.spit...@gmail.com
> > > > > >> > > > > >
> > > > > >> > > > > wrote:
> > > > > >> > > > >
> > > > > >> > > > > > Yeah I think Iceberg and Hive are the only ones trying
> > to
> > > > make
> > > > > >> life
> > > > > >> > > > > > difficult, that I think
> > > > > >> > > > > > we should also cover but in changes to the Iceberg
> Spec.
> > > > Hive
> > > > > >> can
> > > > > >> > > just
> > > > > >> > > > > stay
> > > > > >> > > > > > how it is ...
> > > > > >> > > > > >
> > > > > >> > > > > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <
> > > > > >> > > di...@apache.org>
> > > > > >> > > > > > wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > > For context: my locations concerns are rooted in
> > > Nessie's
> > > > > >> > > experience
> > > > > >> > > > > > where
> > > > > >> > > > > > > we often get problem reports related to files being
> > > > outside
> > > > > >> the
> > > > > >> > > > > declared
> > > > > >> > > > > > > Iceberg metadata location.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Example:
> > > > > >> > > > > > >
> > > > > >> > > > > > > https://github.com/projectnessie/nessie/issues/10817#issuecomment-2887329227
> > > > > >> > > > > > >
> > > > > >> > > > > > > I'm ok going with a single location for generic
> > tables,
> > > > but
> > > > > I
> > > > > >> > think
> > > > > >> > > > > > Polaris
> > > > > >> > > > > > > needs to have a more strict spec for that (define
> > where
> > > > file
> > > > > >> > should
> > > > > >> > > > and
> > > > > >> > > > > > > should not go) because polaris owns this spec.
> Polaris
> > > > ought
> > > > > >> to
> > > > > >> > > > define
> > > > > >> > > > > > what
> > > > > >> > > > > > > complies with the spec and what does not. Having a
> > > proper
> > > > > >> spec is
> > > > > >> > > > > > essential
> > > > > >> > > > > > > to ensure a mutual understanding of all parties
> > dealing
> > > > with
> > > > > >> > > Generic
> > > > > >> > > > > > > Tables.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Open API yaml comments are not sufficient, IMHO. I'd
> > > > prefer
> > > > > to
> > > > > >> > > have a
> > > > > >> > > > > > > dedicated doc page to define expectations and
> > > compliance.
> > > > > >> > > > > > >
> > > > > >> > > > > > > Thanks,
> > > > > >> > > > > > > Dmitri.
> > > > > >> > > > > > >
> > > > > >> > > > > > >
> > > > > >> > > > > > > On Mon, May 19, 2025 at 2:17 PM Russell Spitzer <
> > > > > >> > > > > > russell.spit...@gmail.com
> > > > > >> > > > > > > >
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > >
> > > > > >> > > > > > > > The only multiple locations table formats I'm
> > > currently
> > > > > >> aware
> > > > > >> > of
> > > > > >> > > > are
> > > > > >> > > > > > Hive
> > > > > >> > > > > > > > (partitions can live wherever) and Iceberg.
> > > > > >> > > > > > > >
> > > > > >> > > > > > > >  I think for Delta, Hudi, LanceDB, Paimon and File
> > > based
> > > > > >> tables
> > > > > >> > > > they
> > > > > >> > > > > > all
> > > > > >> > > > > > > > have to live in the root location. I'm not sure of
> > any
> > > > > other
> > > > > >> > > "file"
> > > > > >> > > > > > based
> > > > > >> > > > > > > > tables where this would be an issue but I'd love
> to
> > > know
> > > > > if
> > > > > >> > > someone
> > > > > >> > > > > > else
> > > > > >> > > > > > > > has ideas. I think with the rise in credential
> > > vending,
> > > > > >> > splitting
> > > > > >> > > > > > things
> > > > > >> > > > > > > > amongst multiple prefixes is becoming less
> common. I
> > > > don't
> > > > > >> > oppose
> > > > > >> > > > > doing
> > > > > >> > > > > > > an
> > > > > >> > > > > > > > array of locations but it may be enough to just
> > leave
> > > > this
> > > > > >> as
> > > > > >> > an
> > > > > >> > > > > > > extension
> > > > > >> > > > > > > > later. (Support location or locations)
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > On Wed, May 7, 2025 at 8:52 PM yun zou <
> > > > > >> > > yunzou.colost...@gmail.com
> > > > > >> > > > >
> > > > > >> > > > > > > wrote:
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > Hi Dmitri,
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If it's not "all" is it not strong enough for a
> > > spec,
> > > > > >> IMHO.
> > > > > >> > If
> > > > > >> > > > some
> > > > > >> > > > > > > > tables
> > > > > >> > > > > > > > > have multiple base locations how is Polaris
> going
> > to
> > > > > deal
> > > > > >> > with
> > > > > >> > > > > them?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Sorry, when I say most of them, it was because I
> > > > haven't
> > > > > >> > tested
> > > > > >> > > > all
> > > > > >> > > > > > of
> > > > > >> > > > > > > > them
> > > > > >> > > > > > > > > (I only tested Delta and CSV before).
> > > > > >> > > > > > > > > However, if Unity Catalog is only taking one
> > > > location, I
> > > > > >> > think
> > > > > >> > > > that
> > > > > >> > > > > > is
> > > > > >> > > > > > > a
> > > > > >> > > > > > > > > strong enough proof that
> > > > > >> > > > > > > > > one location is enough today.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > It is also more natural to start with one
> > location,
> > > > and
> > > > > if
> > > > > >> > > there
> > > > > >> > > > > are
> > > > > >> > > > > > > use
> > > > > >> > > > > > > > > cases that
> > > > > >> > > > > > > > > require support for multiple locations later, we
> > can
> > > > > move
> > > > > >> on
> > > > > >> > to
> > > > > >> > > > V2
> > > > > >> > > > > > spec
> > > > > >> > > > > > > > to
> > > > > >> > > > > > > > > support multiple
> > > > > >> > > > > > > > > tables locations.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > We're making a specification for Polaris. I do
> not
> > > > think
> > > > > >> it
> > > > > >> > is
> > > > > >> > > > > > > sufficient
> > > > > >> > > > > > > > > to say we'll do the same as other (unspecified
> > ATM)
> > > > > >> catalogs.
> > > > > >> > > > > > > > > If we want to migrate users from other Catalog
> > > > services
> > > > > to
> > > > > >> > > > Polaris
> > > > > >> > > > > > > > (through
> > > > > >> > > > > > > > > federation), then Polaris will need to
> > > > > >> > > > > > > > > provide corresponding capabilities.  For
> example,
> > > > Unity
> > > > > >> > Catalog
> > > > > >> > > > > > storage
> > > > > >> > > > > > > > > location is a URI representation, when entity
> > > > > >> > > > > > > > > are federated from Unity Catalog, we will need
> to
> > be
> > > > > able
> > > > > >> to
> > > > > >> > > > handle
> > > > > >> > > > > > the
> > > > > >> > > > > > > > URI
> > > > > >> > > > > > > > > location.
> > > > > >> > > > > > > > > If URI representation is a common standard that
> > has
> > > > been
> > > > > >> > > accepted
> > > > > >> > > > > by
> > > > > >> > > > > > > > other
> > > > > >> > > > > > > > > Catalog services like Unity Catalog, Gravitino,
> > > > > >> > > > > > > > > Polaris should be compatible with that,
> otherwise
> > it
> > > > > might
> > > > > >> > > cause
> > > > > >> > > > > > > problem
> > > > > >> > > > > > > > > for users when they are migrating from one to
> > > > > >> > > > > > > > > another.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > What will Polaris Server do with this location?
> > > > > >> > > > > > > > > For generic tables, Polaris will provide
> > credential
> > > > > >> vending
> > > > > >> > for
> > > > > >> > > > > this
> > > > > >> > > > > > > > > location in near future, I don't see we will
> > provide
> > > > > >> > > > > > > > > anything else in short or mid term, since we
> still
> > > > want
> > > > > to
> > > > > >> > > > promote
> > > > > >> > > > > > > > > native support for Iceberg.
> > > > > >> > > > > > > > > Or if you have anything special in your mind
> that
> > > you
> > > > > >> think
> > > > > >> > we
> > > > > >> > > > > should
> > > > > >> > > > > > > > > support?
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If Polaris has to define it in a spec, it will
> be
> > > hard
> > > > > to
> > > > > >> > > change
> > > > > >> > > > in
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > future.
> > > > > >> > > > > > > > > Regardless of whether it is explicitly in the
> spec
> > > > > >> definition
> > > > > >> > > or
> > > > > >> > > > > as a
> > > > > >> > > > > > > > > reserved property key, as long as they are
> > > explicitly
> > > > > >> > > > > > > > > documented, they will be hard to change in the
> > > future.
> > > > > >> From
> > > > > >> > > that
> > > > > >> > > > > > > > > perspective, those two approaches seem the same
> to
> > > me.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Table location is critical information that is
> > > > required
> > > > > by
> > > > > >> > the
> > > > > >> > > > > engine
> > > > > >> > > > > > > > side
> > > > > >> > > > > > > > > to read and write the tables, which should
> > > > > >> > > > > > > > > be explicitly defined to provide better sharing
> > > across
> > > > > >> > engines.
> > > > > >> > > > For
> > > > > >> > > > > > > > > example, the delta table location is passed in
> the
> > > > > >> > > > > > > > > table properties with a property key either
> > > "location"
> > > > > or
> > > > > >> > > "path"
> > > > > >> > > > > > > depends
> > > > > >> > > > > > > > on
> > > > > >> > > > > > > > > how the table is created. Now, if another
> > > > > >> > > > > > > > > engine wants to read the delta table, it will
> need
> > > to
> > > > > >> > > understand
> > > > > >> > > > > > those
> > > > > >> > > > > > > > > keys, which are controlled by Spark today. If
> > Spark
> > > > > >> > > > > > > > > changes them one day, all sharing will stop
> > working.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > As to whether we want to put it as an explicit
> > field
> > > > or
> > > > > a
> > > > > >> > > > reserved
> > > > > >> > > > > > > key, I
> > > > > >> > > > > > > > > think for a common field among various
> > > > > >> > > > > > > > > table formats, it makes more sense to have it as
> > an
> > > > > >> explicit
> > > > > >> > > > field.
> > > > > >> > > > > > For
> > > > > >> > > > > > > > > properties that are specific to a particular
> table
> > > > > format,
> > > > > >> > > > > > > > > it is more proper to just have a reserved key.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > If Polaris takes control of the location, I
> think
> > we
> > > > > have
> > > > > >> to
> > > > > >> > be
> > > > > >> > > > > more
> > > > > >> > > > > > > > > careful
> > > > > >> > > > > > > > > and at least try to make it future-proof.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > I don't think Polaris is taking control of the
> > > > location,
> > > > > >> the
> > > > > >> > > > > location
> > > > > >> > > > > > > is
> > > > > >> > > > > > > > > still controlled by the engine and users today
> > like
> > > > > table
> > > > > >> > > names.
> > > > > >> > > > > > > > > Polaris is a Catalog service, it records the
> > generic
> > > > > table
> > > > > >> > > > entity,
> > > > > >> > > > > > and
> > > > > >> > > > > > > > > returns the information back to the user on
> query.
> > > > > >> > > > > > > > > It might be able to do some validation on the
> > > location
> > > > > >> (like
> > > > > >> > > > check
> > > > > >> > > > > > > > special
> > > > > >> > > > > > > > > character), but it doesn't decide which location
> > > > > >> > > > > > > > > the table will be used. I personally don't think
> > it
> > > > is a
> > > > > >> bad
> > > > > >> > > idea
> > > > > >> > > > > to
> > > > > >> > > > > > > let
> > > > > >> > > > > > > > > the Catalog service also take control of
> > generating
> > > > > >> > > > > > > > > the table location, but I think that will
> require
> > a
> > > > lot
> > > > > of
> > > > > >> > > work.
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > Best Regards,
> > > > > >> > > > > > > > > Yun
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > On Wed, May 7, 2025 at 5:22 PM Dmitri
> > Bourlatchkov <
> > > > > >> > > > > di...@apache.org
> > > > > >> > > > > > >
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > >
> > > > > >> > > > > > > > > > No worries about the name. It is a possible
> > > > > alternative
> > > > > >> > > > spelling
> > > > > >> > > > > :)
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > On Wed, May 7, 2025 at 8:04 PM yun zou <
> > > > > >> > > > > yunzou.colost...@gmail.com
> > > > > >> > > > > > >
> > > > > >> > > > > > > > > wrote:
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > Hi Dmitri,
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Sorry, I accidentally typed your name wrong
> in
> > > the
> > > > > >> > previous
> > > > > >> > > > > > reply!
> > > > > >> > > > > > > > > > > Apologize for this!
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > For the S3 issue, I think we will need to
> deal
> > > > with
> > > > > >> those
> > > > > >> > > > > > > regardless,
> > > > > >> > > > > > > > > > > especially with the federation work going
> on,
> > we
> > > > > will
> > > > > >> > need
> > > > > >> > > to
> > > > > >> > > > > > > handle
> > > > > >> > > > > > > > > all
> > > > > >> > > > > > > > > > > those entities eventually coming from
> > different
> > > > > >> Catalogs,
> > > > > >> > > and
> > > > > >> > > > > the
> > > > > >> > > > > > > URI
> > > > > >> > > > > > > > > > > format seems the standard format used by
> > various
> > > > > >> Catalog
> > > > > >> > > > > > services.
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > Best Regards,
> > > > > >> > > > > > > > > > > Yun
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > On Wed, May 7, 2025 at 4:55 PM yun zou <
> > > > > >> > > > > > yunzou.colost...@gmail.com
> > > > > >> > > > > > > >
> > > > > >> > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Hi Dimitri and Eric,
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Thanks a lot for the feedback!
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > For the questions:
> > > > > >> > > > > > > > > > > > - Is one value or many?
> > > > > >> > > > > > > > > > > > It will be one value, similar to the
> > location
> > > in
> > > > > >> > Iceberg
> > > > > >> > > > and
> > > > > >> > > > > > the
> > > > > >> > > > > > > > > > > > storage_location in unity catalog.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Regarding to the point about having new
> data
> > > in
> > > > > new
> > > > > >> > > > locations
> > > > > >> > > > > > and
> > > > > >> > > > > > > > > > keeping
> > > > > >> > > > > > > > > > > > old data in old locations, do we support
> > that
> > > > for
> > > > > >> > Iceberg
> > > > > >> > > > > > > > > > > > today?
> > > > > >> > > > > > > > > > > > For most of the Spark tables, it seems to
> > only
> > > > > have
> > > > > >> one
> > > > > >> > > > > > location.
> > > > > >> > > > > > > > > > Also, I
> > > > > >> > > > > > > > > > > > think it is better to start restricted
> > first,
> > > > and
> > > > > >> then
> > > > > >> > > > extend
> > > > > >> > > > > > it
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > > > > allow multiple locations when the use case
> > > > raises.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Ref:
> > > > > >> > > > > > > > > > > > Iceberg location:
> > > > > >> > > > > > > > > > > > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > > > > >> > > > > > > > > > > > Storage location in Unity Catalog:
> > > > > >> > > > > > > > > > > > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - Is it a URI?
> > > > > >> > > > > > > > > > > > Yes, it will be a URI, which seems the
> > > standard
> > > > > >> catalog
> > > > > >> > > > > > > > > implementation.
> > > > > >> > > > > > > > > > > > Regarding to the point about s3 v2 s3a, i
> > > assume
> > > > > >> that
> > > > > >> > is
> > > > > >> > > a
> > > > > >> > > > > > common
> > > > > >> > > > > > > > > > > > problem that every catalog implementation
> > > needs
> > > > to
> > > > > >> > > address,
> > > > > >> > > > > and
> > > > > >> > > > > > > we
> > > > > >> > > > > > > > > will
> > > > > >> > > > > > > > > > > > stay the same on this part. At least from
> > the
> > > > load
> > > > > >> > table
> > > > > >> > > > > point
> > > > > >> > > > > > of
> > > > > >> > > > > > > > > view,
> > > > > >> > > > > > > > > > > > Spark engine knows how to deal with such
> > > cases.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - Does it point to any particular file?
> > > > > >> > > > > > > > > > > > No, it doesn't point to a particular file.
> > It
> > > is
> > > > > the
> > > > > >> > base
> > > > > >> > > > > table
> > > > > >> > > > > > > > > > location.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - Is it a common prefix of all files
> within
> > a
> > > > > table?
> > > > > >> > > > > > > > > > > > It is supposed to be the base table
> > location,
> > > > > which
> > > > > >> > > > > > theoretically
> > > > > >> > > > > > > > > > should
> > > > > >> > > > > > > > > > > > be the common prefix of all files within a
> > > > table I
> > > > > >> > > believe.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > - What happens when a value does not match
> > > these
> > > > > >> > > > > expectations?
> > > > > >> > > > > > > > > > > > Whether it is one value or many is
> > restricted
> > > by
> > > > > the
> > > > > >> > spec
> > > > > >> > > > > > > already.
> > > > > >> > > > > > > > > > > > For URI format, I think we can do a format
> > > > check,
> > > > > >> and
> > > > > >> > > fail
> > > > > >> > > > > it.
> > > > > >> > > > > > > > > > > > Other than that, we will not do any other
> > > > special
> > > > > >> > check,
> > > > > >> > > > and
> > > > > >> > > > > we
> > > > > >> > > > > > > > rely
> > > > > >> > > > > > > > > on
> > > > > >> > > > > > > > > > > > the client to put the correct value,
> > > otherwise,
> > > > > the
> > > > > >> > other
> > > > > >> > > > > > engine
> > > > > >> > > > > > > > will
> > > > > >> > > > > > > > > > > > not be able to successfully read the
> table.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > For the location keyword, as Eric has
> > pointed
> > > > out,
> > > > > >> we
> > > > > >> > can
> > > > > >> > > > > > > > potentially
> > > > > >> > > > > > > > > > > have
> > > > > >> > > > > > > > > > > > a reserved key for the properties.
> However,
> > > > > location
> > > > > >> > is a
> > > > > >> > > > > > common
> > > > > >> > > > > > > > > > > > enough key among various table formats,
> > which
> > > > > >> is worth a
> > > > > >> > > > > > dedicated
> > > > > >> > > > > > > > key
> > > > > >> > > > > > > > > to
> > > > > >> > > > > > > > > > > > help store and load the information in a
> > more
> > > > > >> > > > straightforward
> > > > > >> > > > > > > > > > > > way.  For things that are specific to one
> or
> > > two
> > > > > >> > > formats, I
> > > > > >> > > > > > think
> > > > > >> > > > > > > > it
> > > > > >> > > > > > > > > > > makes
> > > > > >> > > > > > > > > > > > more sense to use a reserved property key.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > As a reference, in Iceberg, the
> CreateTable
> > > > > request
> > > > > >> and
> > > > > >> > > > > > > > TableMetadata
> > > > > >> > > > > > > > > > > does
> > > > > >> > > > > > > > > > > > have an explicit location key in the spec.
> > For
> > > > > >> > > > > write.data.path
> > > > > >> > > > > > > > > > > > and write.metadata.path, they are passed
> as
> > > > > >> properties
> > > > > >> > > > today.
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > Best Regards,
> > > > > >> > > > > > > > > > > > Yun
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > > On Wed, May 7, 2025 at 3:54 PM Dmitri
> > > > > Bourlatchkov <
> > > > > >> > > > > > > > di...@apache.org
> > > > > >> > > > > > > > > >
> > > > > >> > > > > > > > > > > > wrote:
> > > > > >> > > > > > > > > > > >
> > > > > >> > > > > > > > > > > >> Another point: I'm pretty sure sooner or
> > > later
> > > > > >> users
> > > > > >> > > will
> > > > > >> > > > > want
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > move
> > > > > >> > > > > > > > > > > >> their data to some other location. As an
> > > option
> > > > > >> users
> > > > > >> > > may
> > > > > >> > > > > want
> > > > > >> > > > > > > to
> > > > > >> > > > > > > > > > write
> > > > > >> > > > > > > > > > > >> new
> > > > > >> > > > > > > > > > > >> files into another location but keep old
> > > files
> > > > in
> > > > > >> > place.
> > > > > >> > > > > > > > > > > >>
> > > > > >> > > > > > > > > > > >> Also: if the location is a URI, how do we
> > > deal
> > > > > >> with s3
> > > > > >> > > vs.
> > > > > >> > > > > s3a
> > > > > >> > > > > > > for
> > > > > >> > > > > > > > > > > >> example?
> > > > > >> > > > > > > > > > > >>
> > > > > >> > > > > > > > > > > >> In Iceberg it is quite common for
> different
> > > > > >> engines to
> > > > > >> > > use
> > > > > >> > > > > > > > different
> > > > > >> > > > > > > > > > > >> access
> > > > > >> > > > > > > > > > > >> tools, which often leads to different URI
> > > > > schemes.
> > > > > >> > > > > > > > > > > >>
> > > > > >> > > > > > > > > > > >> Cheers,
> > > > > >> > > > > > > > > > > >> Dmitri.
> > > > > >> > > > > > > > > > > >>
> > > > > >> > > > > > > > > > > >> On Wed, May 7, 2025 at 6:46 PM Eric
> > Maynard <
> > > > > >> > > > > > > > > eric.w.mayn...@gmail.com
> > > > > >> > > > > > > > > > >
> > > > > >> > > > > > > > > > > >> wrote:
> > > > > >> > > > > > > > > > > >>
> > > > > >> > > > > > > > > > > >> > All good questions Dmitri — I’m
> > especially
> > > > > >> > interested
> > > > > >> > > in
> > > > > >> > > > > the
> > > > > >> > > > > > > > first
> > > > > >> > > > > > > > > > one
> > > > > >> > > > > > > > > > > >> as
> > > > > >> > > > > > > > > > > >> > from what I understand Iceberg tables
> can
> > > > have
> > > > > >> > > metadata
> > > > > >> > > > > and
> > > > > >> > > > > > > data
> > > > > >> > > > > > > > > at
> > > > > >> > > > > > > > > > > two
> > > > > >> > > > > > > > > > > >> > different paths that we need to vend
> > > > > credentials
> > > > > >> > for.
> > > > > >> > > > > > > > > > > >> >
> > > > > >> > > > > > > > > > > >> > For iceberg tables, we just use special
> > > > > >> properties
> > > > > >> > to
> > > > > >> > > > > track
> > > > > >> > > > > > > > these
> > > > > >> > > > > > > > > > > >> > locations. I wonder if we couldn’t do
> the
> > > > same
> > > > > >> for
> > > > > >> > > > generic
> > > > > >> > > > > > > > tables.
> > > > > >> > > > > > > > > > > >> >
> > > > > >> > > > > > > > > > > >> > On Wed, May 7, 2025 at 3:42 PM Dmitri
> > > > > >> Bourlatchkov <
> > > > > >> > > > > > > > > > di...@apache.org>
> > > > > >> > > > > > > > > > > >> > wrote:
> > > > > >> > > > > > > > > > > >> >
> > > > > >> > > > > > > > > > > >> > > Hi Yun,
> > > > > >> > > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > > >> > > Please clarify the meaning of the
> value
> > > of
> > > > > the
> > > > > >> new
> > > > > >> > > > > > location
> > > > > >> > > > > > > > > > > attribute.
> > > > > >> > > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > > >> > > - Is it one value or many?
> > > > > >> > > > > > > > > > > >> > > - Is it a URI?
> > > > > >> > > > > > > > > > > >> > > - Does it point to any particular
> file?
> > > > > >> > > > > > > > > > > >> > > - Is it a common prefix of all files
> > > > within a
> > > > > >> > table?
> > > > > >> > > > > > > > > > > >> > > - What happens when a value does not
> > > match
> > > > > >> these
> > > > > >> > > > > > > expectations?
> > > > > >> > > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > > >> > > Thanks,
> > > > > >> > > > > > > > > > > >> > > Dmitri.
> > > > > >> > > > > > > > > > > >> > >
> > > > > >> > > > > > > > > > > >> > > On 2025/05/07 21:50:19 yun zou wrote:
> > > > > >> > > > > > > > > > > >> > > > Hi folks,
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > I would like to propose to add an
> > > > optional
> > > > > >> > > > `location`
> > > > > >> > > > > > > field
> > > > > >> > > > > > > > to
> > > > > >> > > > > > > > > > > >> > > > CreateGenericTable Request and
> > > > > >> LoadGenericTable
> > > > > >> > > > > response.
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > The `location` is the location for
> > the
> > > > > table,
> > > > > >> > > which
> > > > > >> > > > is
> > > > > >> > > > > > > > common
> > > > > >> > > > > > > > > to
> > > > > >> > > > > > > > > > > >> most
> > > > > >> > > > > > > > > > > >> > > table
> > > > > >> > > > > > > > > > > >> > > > formats including Iceberg, Delta,
> > Hudi,
> > > > > csv,
> > > > > >> > > parquet
> > > > > >> > > > > > etc.
> > > > > >> > > > > > > > The
> > > > > >> > > > > > > > > > > >> location
> > > > > >> > > > > > > > > > > >> > > > information is critical for loading
> > the
> > > > > >> table at
> > > > > >> > > > > engine
> > > > > >> > > > > > > > side,
> > > > > >> > > > > > > > > > > >> having a
> > > > > >> > > > > > > > > > > >> > > > dedicated keyword could help
> improve
> > > the
> > > > > >> > > robustness
> > > > > >> > > > > for
> > > > > >> > > > > > > > cross
> > > > > >> > > > > > > > > > > engine
> > > > > >> > > > > > > > > > > >> > > > sharing, instead of relying on the
> > > > > properties
> > > > > >> > > passed
> > > > > >> > > > > by
> > > > > >> > > > > > > the
> > > > > >> > > > > > > > > > client
> > > > > >> > > > > > > > > > > >> > side.
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > Furthermore, this information is
> also
> > > > > >> required
> > > > > >> > to
> > > > > >> > > > > > provide
> > > > > >> > > > > > > > > > > credential
> > > > > >> > > > > > > > > > > >> > > > vending capabilities later.
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > Here is the PR for adding the spec:
> > > > > >> > > > > > > > > > > >> > > >
> > > > > https://github.com/apache/polaris/pull/1543
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > Looking forward to your reply and
> > > > feedback!
> > > > > >> > > > > > > > > > > >> > > >
> > > > > >> > > > > > > > > > > >> > > > Best Regards,
> > > > > >> > > > > > > > > > > >> > > > Yun
> > > > > >> > > > > > > > > > > >> > > >
