Inlined.

On Thu, May 22, 2025 at 7:48 AM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> > Can we keep it simple for v1 [...]
>
> What is v1 in this context?
>
I meant as the first iteration, sorry for the confusion.

>
> Thanks,
> Dmitri.
>
> On Wed, May 21, 2025 at 8:42 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
> > Can we keep it simple for v1, as one location field is enough for today’s
> > use cases? And we can revisit multi-location support when there’s real
> > demand.
> >
> > The current API spec already implies that a table’s location is
> immutable,
> > there’s no “alter location” call. I’m fine leaving it implicit, but we
> > could add an explicit note to make that clear if it helps avoid
> confusion.
> >
> > Yufei
> >
> >
> > On Wed, May 21, 2025 at 4:36 PM Eric Maynard <eric.w.mayn...@gmail.com>
> > wrote:
> >
> > > No two tables globally can have a location overlap? That’s a stricter
> > > requirement than we have for even Iceberg tables and doesn’t sound
> > correct.
> > >
> > > Similarly, the restriction that you can’t change location is stricter
> > than
> > > what we have for Iceberg.
> > >
> > > Finally, I’m still not sure what the problem is with having multiple
> > > locations. Again, we already track multiple locations for Iceberg.
> > >
> > > On Thu, May 22, 2025 at 12:32 AM yun zou <yunzou.colost...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Want to summarize the thread here:
> > > >
> > > > For generic tables, we will add a `location` key to help cross engine
> > > > sharing and future support for credential vending.
> > > >
> > > > Here is a description about this `location` key and corresponding
> > > > restrictions and responsibilities:
> > > > - `location`(OPTIONAL): table root location in URI format. For
> example:
> > > > s3://<my-bucket>/path/to/table.
> > > >   - The table root location is a location that includes all files for
> > the
> > > > table.
> > > >   - Clients (engines) are responsible to make sure all files are
> > written
> > > > under the configured location.
> > > >   - A table with multiple root locations (i.e. containing files that
> > are
> > > > outside the configured root location) is not compliant with the
> current
> > > > generic table support in Polaris.
> > > >   - No two tables can have the same or overlapped location,
> otherwise,
> > a
> > > > ForbiddenException will be thrown on creation.
> > > >   - If no location is provided, clients or users are responsible to
> > > manage
> > > > the location and location related concerns such as path conflict
> check
> > > etc.
> > > >   - The location configuration can not be updated once the table is
> > > > created.
> > > >
> > > > This description will be added into the spec. In order to help
> non-API
> > > > users to discover the information easily, we will also get a site
> page
> > to
> > > > describe the support
> > > > for Generic Table and key fields.
> > > >
> > > > Best Regards,
> > > > Yun
> > > >
> > > > On Mon, May 19, 2025 at 11:16 PM yun zou <yunzou.colost...@gmail.com
> >
> > > > wrote:
> > > >
> > > > > Hi Dmitri,
> > > > >
> > > > > " I do not think those doc comments provide enough visibility to
> > ensure
> > > > > that the key information
> > > > > is received by users, unless they are dealing directly with the
> API"
> > > > > -- Yeah, I agree those information may not be visible enough for
> > users
> > > > who
> > > > > don't directly work with APIs.
> > > > > However, I think just having one page for "location" might be a
> > little
> > > > bit
> > > > > overkill. Given that generic table API support is
> > > > > > a new catalog capability that Polaris added which is not IRC, I
> > think
> > > > it
> > > > > > might be worth having a more general page to
> > > > > describe the Polaris Generic Table support and describe some of the
> > > > > critical fields like *location*.
> > > > > I think we should have the description in the spec also, so that
> > things
> > > > > could be clear for API users.
> > > > >
> > > > > Please let me know what you think.
> > > > >
> > > > > Best Regards,
> > > > > Yun
> > > > >
> > > > > On Mon, May 19, 2025 at 4:22 PM Dmitri Bourlatchkov <
> > di...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> I believe the Open API spec and the definition of "location" are
> > > > slightly
> > > > >> different concerns.
> > > > >>
> > > > >> The former is about the API used to obtain information about
> Generic
> > > > >> Tables.
> > > > >>
> > > > >> The latter is about the interpretation of that information. One
> can
> > > > think
> > > > >> of the location
> > > > >> value being handled / transferred beyond the immediate Polaris
> > client,
> > > > in
> > > > >> which case
> > > > > >> it loses its connection to the API, but does not lose its meaning
> > as a
> > > > >> location of a
> > > > >> Generic Table.
> > > > >>
> > > > >> Also, I think that Open API doc comments are too low-level and too
> > > > obscure
> > > > >> for
> > > > >> people who will work with processing actual Generic Table files. I
> > do
> > > > not
> > > > >> think
> > > > > >> those doc comments provide enough visibility to ensure that the key
> > > > >> information
> > > > >> is received by users, unless they are dealing directly with the
> API.
> > > > >>
> > > > >> That said, if you prefer to keep the finer points about Generic
> > Table
> > > > >> locations in the
> > > > >> Open API spec, I'd be fine with that.
> > > > >>
> > > > >> Cheers,
> > > > >> Dmitri.
> > > > >>
> > > > >> On Mon, May 19, 2025 at 6:46 PM yun zou <
> yunzou.colost...@gmail.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >> > Hi Dmitri,
> > > > >> >
> > > > >> > Thanks for the detailed explanation, I definitely agree we need
> to
> > > > call
> > > > >> out
> > > > >> > those restrictions and compliance in our Spec.
> > > > >> >
> > > > >> > As for the documentation, Polaris today already publishes the
> API
> > > > spec,
> > > > >> if
> > > > >> > you go to page https://polaris.apache.org/in-dev/unreleased/,
> > > > >> > and click on the Catalog API Spec, it will lead you to the
> > published
> > > > >> Spec,
> > > > >> > which contains all description in the Spec.
> > > > >> > That basically means we have both published doc and spec code,
> and
> > > the
> > > > >> > single source of truth is the description in the doc.
> > > > >> > or do you think we should have an extra page for the Generic
> Table
> > > API
> > > > >> > spec?
> > > > >> >
> > > > >> > Best Regards,
> > > > >> > Yun
> > > > >> >
> > > > >> > On Mon, May 19, 2025 at 3:20 PM Yufei Gu <flyrain...@gmail.com>
> > > > wrote:
> > > > >> >
> > > > >> > > >
> > > > >> > > > * Clients (engines) are responsible for writing files only
> > under
> > > > the
> > > > >> > > > specified location.
> > > > >> > >
> > > > >> > > It's nice to have a doc like that. But the open API spec is
> > *the*
> > > > >> place
> > > > >> > to
> > > > >> > > define the behavior of client and server, and how they
> interact
> > > with
> > > > >> each
> > > > >> > > other. Just as we said before, spec change is recommended to
> > have
> > > a
> > > > ML
> > > > >> > > discussion.
> > > > >> > >
> > > > >> > > * A table, whose files exist outside the declared location, is
> > not
> > > > >> > > > compliant with the Polaris' definition for a Generic Table.
> > > > >> > >
> > > > >> > > I'm not sure we should go that far. "location" is an optional
> > > field.
> > > > >> It's
> > > > >> > > just some features like credential vending that don't work if
> > > > >> "location"
> > > > >> > is
> > > > >> > > missing.
> > > > >> > >
> > > > >> > > Yufei
> > > > >> > >
> > > > >> > >
> > > > >> > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <
> > > > di...@apache.org
> > > > >> >
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > As I commented in my other recent email, I think by
> > introducing
> > > a
> > > > >> > > > "location" property Polaris enters the realm of table format
> > > > specs.
> > > > >> > > >
> > > > >> > > > This is fine, from my POV, however, since Polaris is the
> > > defining
> > > > >> > project
> > > > >> > > > behind that property, I believe Polaris should provide a
> more
> > > > >> > definitive
> > > > >> > > > description of the meaning and intended processing of that
> > > > property.
> > > > >> > > >
> > > > >> > > > To repeat myself, I think the Open API spec defines only the
> > API
> > > > for
> > > > >> > > > obtaining the location. We need a place to define what this
> > > > location
> > > > >> > > means.
> > > > >> > > > I do not insist on calling this a "spec" for Generic Tables,
> > > but I
> > > > >> > think
> > > > >> > > it
> > > > >> > > > deserves a separate page in Polaris docs, where it would be
> > > > defined
> > > > >> > with
> > > > >> > > > more rigor.
> > > > >> > > >
> > > > >> > > > Specifically, I think we need to call out that:
> > > > >> > > > * The location is a base URI (essentially prefix) for all
> > files
> > > > in a
> > > > >> > > > generic table.
> > > > >> > > > * Clients (engines) are responsible for writing files only
> > under
> > > > the
> > > > >> > > > specified location.
> > > > >> > > > * A table, whose files exist outside the declared location,
> is
> > > not
> > > > >> > > > compliant with the Polaris' definition for a Generic Table.
> > > > >> > > >
> > > > >> > > > By extension, I think we ought to describe other existing
> > > > properties
> > > > >> > too.
> > > > >> > > >
> > > > >> > > > WDYT?
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dmitri.
> > > > >> > > >
> > > > >> > > > On Mon, May 19, 2025 at 5:39 PM yun zou <
> > > > yunzou.colost...@gmail.com
> > > > >> >
> > > > >> > > > wrote:
> > > > >> > > >
> > > > >> > > > > Hi Dmitri,
> > > > >> > > > >
> > > > >> > > > > I think for Iceberg, we all agreed that there can be
> > multiple
> > > > >> > > locations,
> > > > >> > > > > and I definitely agree with Russel that the extension
> > > > >> > > > > should be done with the IRC endpoints. The Generic Table
> > APIs
> > > > are
> > > > >> > > > designed
> > > > >> > > > > for non-Iceberg table usage today, and
> > > > >> > > > > We still want Iceberg table usage to go through the IRC
> > > endpoint
> > > > >> to
> > > > >> > > have
> > > > >> > > > > full IRC support.
> > > > >> > > > >
> > > > >> > > > > As for the following point
> > > > >> > > > > "a more strict spec for that (define where file should and
> > > > should
> > > > >> not
> > > > >> > > > go)"
> > > > >> > > > > Are you referring that Polaris need to generate a location
> > for
> > > > the
> > > > >> > > table
> > > > >> > > > to
> > > > >> > > > > use, if that is the case, I don't think engines
> > > > >> > > > > respects that today. The table locations are either
> > generated
> > > by
> > > > >> the
> > > > >> > > > engine
> > > > >> > > > > or specified by the user.
> > > > >> > > > > Or are you referring that we should have something like
> > > Iceberg
> > > > >> that
> > > > >> > we
> > > > >> > > > > should have an allowed location and do a
> > > > >> > > > > validation to make sure the location is under the allowed
> > > > >> location?
> > > > >> > > Would
> > > > >> > > > > you mind elaborate more on this point?
> > > > >> > > > >
> > > > >> > > > > Best Regards,
> > > > >> > > > > Yun
> > > > >> > > > >
> > > > >> > > > >
> > > > >> > > > > On Mon, May 19, 2025 at 1:45 PM Russell Spitzer <
> > > > >> > > > russell.spit...@gmail.com
> > > > >> > > > > >
> > > > >> > > > > wrote:
> > > > >> > > > >
> > > > >> > > > > > Yeah I think Iceberg and Hive are the only ones trying
> to
> > > make
> > > > >> life
> > > > >> > > > > > difficult, that I think
> > > > >> > > > > > we should also cover but in changes to the Iceberg Spec.
> > > Hive
> > > > >> can
> > > > >> > > just
> > > > >> > > > > stay
> > > > >> > > > > > how it is ...
> > > > >> > > > > >
> > > > >> > > > > > On Mon, May 19, 2025 at 2:59 PM Dmitri Bourlatchkov <
> > > > >> > > di...@apache.org>
> > > > >> > > > > > wrote:
> > > > >> > > > > >
> > > > >> > > > > > > For context: my locations concerns are rooted in
> > Nessie's
> > > > >> > > experience
> > > > >> > > > > > where
> > > > >> > > > > > > we often get problem reports related to files being
> > > outside
> > > > >> the
> > > > >> > > > > declared
> > > > >> > > > > > > Iceberg metadata location.
> > > > >> > > > > > >
> > > > >> > > > > > > Example:
> > > > >> > > > > > >
> https://github.com/projectnessie/nessie/issues/10817#issuecomment-2887329227
> > > > >> > > > > > >
> > > > >> > > > > > > I'm ok going with a single location for generic
> tables,
> > > but
> > > > I
> > > > >> > think
> > > > >> > > > > > Polaris
> > > > >> > > > > > > needs to have a more strict spec for that (define
> where
> > > file
> > > > >> > should
> > > > >> > > > and
> > > > >> > > > > > > should not go) because polaris owns this spec. Polaris
> > > ought
> > > > >> to
> > > > >> > > > define
> > > > >> > > > > > what
> > > > >> > > > > > > complies with the spec and what does not. Having a
> > proper
> > > > >> spec is
> > > > >> > > > > > essential
> > > > >> > > > > > > to ensure a mutual understanding of all parties
> dealing
> > > with
> > > > >> > > Generic
> > > > >> > > > > > > Tables.
> > > > >> > > > > > >
> > > > >> > > > > > > Open API yaml comments are not sufficient, IMHO. I'd
> > > prefer
> > > > to
> > > > >> > > have a
> > > > >> > > > > > > dedicated doc page to define expectations and
> > compliance.
> > > > >> > > > > > >
> > > > >> > > > > > > Thanks,
> > > > >> > > > > > > Dmitri.
> > > > >> > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > > > On Mon, May 19, 2025 at 2:17 PM Russell Spitzer <
> > > > >> > > > > > russell.spit...@gmail.com
> > > > >> > > > > > > >
> > > > >> > > > > > > wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > The only multiple locations table formats I'm
> > currently
> > > > >> aware
> > > > >> > of
> > > > >> > > > are
> > > > >> > > > > > Hive
> > > > >> > > > > > > > (partitions can live wherever) and Iceberg.
> > > > >> > > > > > > >
> > > > >> > > > > > > >  I think for Delta, Hudi, LanceDB, Paimon and File
> > based
> > > > >> tables
> > > > >> > > > they
> > > > >> > > > > > all
> > > > >> > > > > > > > have to live in the root location. I'm not sure of
> any
> > > > other
> > > > >> > > "file"
> > > > >> > > > > > based
> > > > >> > > > > > > > tables where this would be an issue but I'd love to
> > know
> > > > if
> > > > >> > > someone
> > > > >> > > > > > else
> > > > >> > > > > > > > has ideas. I think with the rise in credential
> > vending,
> > > > >> > splitting
> > > > >> > > > > > things
> > > > >> > > > > > > > amongst multiple prefixes is becoming less common. I
> > > don't
> > > > >> > oppose
> > > > >> > > > > doing
> > > > >> > > > > > > an
> > > > >> > > > > > > > array of locations but it may be enough to just
> leave
> > > this
> > > > >> as
> > > > >> > an
> > > > >> > > > > > > extension
> > > > >> > > > > > > > later. (Support location or locations)
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Wed, May 7, 2025 at 8:52 PM yun zou <
> > > > >> > > yunzou.colost...@gmail.com
> > > > >> > > > >
> > > > >> > > > > > > wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > Hi Dmitri,
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > If it's not "all" is it not strong enough for a
> > spec,
> > > > >> IMHO.
> > > > >> > If
> > > > >> > > > some
> > > > >> > > > > > > > tables
> > > > >> > > > > > > > > have multiple base locations how is Polaris going
> to
> > > > deal
> > > > >> > with
> > > > >> > > > > them?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Sorry, when I say most of them, it was because I
> > > haven't
> > > > >> > tested
> > > > >> > > > all
> > > > >> > > > > > of
> > > > >> > > > > > > > them
> > > > >> > > > > > > > > (I only tested Delta and CSV before).
> > > > >> > > > > > > > > However, if Unity Catalog is only taking one
> > > location, I
> > > > >> > think
> > > > >> > > > that
> > > > >> > > > > > is
> > > > >> > > > > > > a
> > > > >> > > > > > > > > strong enough proof that
> > > > >> > > > > > > > > one location is enough today.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > It is also more natural to start with one
> location,
> > > and
> > > > if
> > > > >> > > there
> > > > >> > > > > are
> > > > >> > > > > > > use
> > > > >> > > > > > > > > cases that
> > > > >> > > > > > > > > require support for multiple locations later, we
> can
> > > > move
> > > > >> on
> > > > >> > to
> > > > >> > > > V2
> > > > >> > > > > > spec
> > > > >> > > > > > > > to
> > > > >> > > > > > > > > support multiple
> > > > >> > > > > > > > > tables locations.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > We're making a specification for Polaris. I do not
> > > think
> > > > >> it
> > > > >> > is
> > > > >> > > > > > > sufficient
> > > > >> > > > > > > > > to say we'll do the same as other (unspecified
> ATM)
> > > > >> catalogs.
> > > > >> > > > > > > > > If we want to migrate users from other Catalog
> > > services
> > > > to
> > > > >> > > > Polaris
> > > > >> > > > > > > > (through
> > > > >> > > > > > > > > federation), then Polaris will need to
> > > > >> > > > > > > > > provide corresponding capabilities.  For example,
> > > Unity
> > > > >> > Catalog
> > > > >> > > > > > storage
> > > > >> > > > > > > > > location is a URI representation, when entity
> > > > >> > > > > > > > > are federated from Unity Catalog, we will need to
> be
> > > > able
> > > > >> to
> > > > >> > > > handle
> > > > >> > > > > > the
> > > > >> > > > > > > > URI
> > > > >> > > > > > > > > location.
> > > > >> > > > > > > > > If URI representation is a common standard that
> has
> > > been
> > > > >> > > accepted
> > > > >> > > > > by
> > > > >> > > > > > > > other
> > > > >> > > > > > > > > Catalog services like Unity Catalog, Gravitino,
> > > > >> > > > > > > > > Polaris should be compatible with that, otherwise
> it
> > > > might
> > > > >> > > cause
> > > > >> > > > > > > problem
> > > > >> > > > > > > > > for users when they are migrating from one to
> > > > >> > > > > > > > > another.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > What will Polaris Server do with this location?
> > > > >> > > > > > > > > For generic tables, Polaris will provide
> credential
> > > > >> vending
> > > > >> > for
> > > > >> > > > > this
> > > > >> > > > > > > > > location in near future, I don't see we will
> provide
> > > > >> > > > > > > > > anything else in short or mid term, since we still
> > > want
> > > > to
> > > > >> > > > promote
> > > > >> > > > > > > > > native support for Iceberg.
> > > > >> > > > > > > > > Or if you have anything special in your mind that
> > you
> > > > >> think
> > > > >> > we
> > > > >> > > > > should
> > > > >> > > > > > > > > support?
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > If Polaris has to define it in a spec, it will be
> > hard
> > > > to
> > > > >> > > change
> > > > >> > > > in
> > > > >> > > > > > the
> > > > >> > > > > > > > > future.
> > > > >> > > > > > > > > Regardless of whether it is explicitly in the spec
> > > > >> definition
> > > > >> > > or
> > > > >> > > > > as a
> > > > >> > > > > > > > > reserved property key, as long as they are
> > explicitly
> > > > >> > > > > > > > > documented, they will be hard to change in the
> > future.
> > > > >> From
> > > > >> > > that
> > > > >> > > > > > > > > perspective, those two approaches seem the same to
> > me.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Table location is critical information that is
> > > required
> > > > by
> > > > >> > the
> > > > >> > > > > engine
> > > > >> > > > > > > > side
> > > > >> > > > > > > > > to read and write the tables, which should
> > > > >> > > > > > > > > be explicitly defined to provide better sharing
> > across
> > > > >> > engines.
> > > > >> > > > For
> > > > >> > > > > > > > > example, the delta table location is passed in the
> > > > >> > > > > > > > > table properties with a property key either
> > "location"
> > > > or
> > > > >> > > "path"
> > > > >> > > > > > > depends
> > > > >> > > > > > > > on
> > > > >> > > > > > > > > how the table is created. Now, if another
> > > > >> > > > > > > > > engine wants to read the delta table, it will need
> > to
> > > > >> > > understand
> > > > >> > > > > > those
> > > > >> > > > > > > > > keys, which are controlled by Spark today. If
> Spark
> > > > >> > > > > > > > > changes them one day, all sharing will stop
> working.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > As to whether we want to put it as an explicit
> field
> > > or
> > > > a
> > > > >> > > > reserved
> > > > >> > > > > > > key, I
> > > > >> > > > > > > > > think for a common field among various
> > > > >> > > > > > > > > table formats, it makes more sense to have it as
> an
> > > > >> explicit
> > > > >> > > > field.
> > > > >> > > > > > For
> > > > >> > > > > > > > > properties that are specific to a particular table
> > > > format,
> > > > >> > > > > > > > > it is more proper to just have a reserved key.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > If Polaris takes control of the location, I think
> we
> > > > have
> > > > >> to
> > > > >> > be
> > > > >> > > > > more
> > > > >> > > > > > > > > careful
> > > > >> > > > > > > > > and at least try to make it future-proof.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > I don't think Polaris is taking control of the
> > > location,
> > > > >> the
> > > > >> > > > > location
> > > > >> > > > > > > is
> > > > >> > > > > > > > > still controlled by the engine and users today
> like
> > > > table
> > > > >> > > names.
> > > > >> > > > > > > > > Polaris is a Catalog service, it records the
> generic
> > > > table
> > > > >> > > > entity,
> > > > >> > > > > > and
> > > > >> > > > > > > > > returns the information back to the user on query.
> > > > >> > > > > > > > > It might be able to do some validation on the
> > location
> > > > >> (like
> > > > >> > > > check
> > > > >> > > > > > > > special
> > > > >> > > > > > > > > character), but it doesn't decide which location
> > > > >> > > > > > > > > the table will be used. I personally don't think
> it
> > > is a
> > > > >> bad
> > > > >> > > idea
> > > > >> > > > > to
> > > > >> > > > > > > let
> > > > >> > > > > > > > > the Catalog service also take control of
> generating
> > > > >> > > > > > > > > the table location, but I think that will require
> a
> > > lot
> > > > of
> > > > >> > > work.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Best Regards,
> > > > >> > > > > > > > > Yun
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Wed, May 7, 2025 at 5:22 PM Dmitri
> Bourlatchkov <
> > > > >> > > > > di...@apache.org
> > > > >> > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > > No worries about the name. It is a possible
> > > > alternative
> > > > >> > > > spelling
> > > > >> > > > > :)
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > On Wed, May 7, 2025 at 8:04 PM yun zou <
> > > > >> > > > > yunzou.colost...@gmail.com
> > > > >> > > > > > >
> > > > >> > > > > > > > > wrote:
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > Hi Dmitri,
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Sorry, I accidentally typed your name wrong in
> > the
> > > > >> > previous
> > > > >> > > > > > reply!
> > > > >> > > > > > > > > > > Apologize for this!
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > For the S3 issue, I think we will need to deal
> > > with
> > > > >> those
> > > > >> > > > > > > regardless,
> > > > >> > > > > > > > > > > especially with the federation work going on,
> we
> > > > will
> > > > >> > need
> > > > >> > > to
> > > > >> > > > > > > handle
> > > > >> > > > > > > > > all
> > > > >> > > > > > > > > > > those entities eventually coming from
> different
> > > > >> Catalogs,
> > > > >> > > and
> > > > >> > > > > the
> > > > >> > > > > > > URI
> > > > >> > > > > > > > > > > format seems the standard format used by
> various
> > > > >> Catalog
> > > > >> > > > > > services.
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > Best Regards,
> > > > >> > > > > > > > > > > Yun
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > On Wed, May 7, 2025 at 4:55 PM yun zou <
> > > > >> > > > > > yunzou.colost...@gmail.com
> > > > >> > > > > > > >
> > > > >> > > > > > > > > > wrote:
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > > Hi Dimitri and Eric,
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Thanks a lot for the feedback!
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > For the questions:
> > > > >> > > > > > > > > > > > - Is one value or many?
> > > > >> > > > > > > > > > > > It will be one value, similar to the
> location
> > in
> > > > >> > Iceberg
> > > > >> > > > and
> > > > >> > > > > > the
> > > > >> > > > > > > > > > > > storage_location in unity catalog.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Regarding to the point about having new data
> > in
> > > > new
> > > > >> > > > locations
> > > > >> > > > > > and
> > > > >> > > > > > > > > > keeping
> > > > >> > > > > > > > > > > > old data in old locations, do we support
> that
> > > for
> > > > >> > Iceberg
> > > > >> > > > > > > > > > > > today?
> > > > >> > > > > > > > > > > > For most of the Spark tables, it seems to
> only
> > > > have
> > > > >> one
> > > > >> > > > > > location.
> > > > >> > > > > > > > > > Also, I
> > > > >> > > > > > > > > > > > think it is better to start restricted
> first,
> > > and
> > > > >> then
> > > > >> > > > extend
> > > > >> > > > > > it
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > > > allow multiple locations when the use case
> > > raises.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Ref:
> > > > >> > > > > > > > > > > > Iceberg location:
> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > > > >> > > > > > > > > > > > Storage location in Unity Catalog:
> https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > - Is it a URI?
> > > > >> > > > > > > > > > > > Yes, it will be a URI, which seems the
> > standard
> > > > >> catalog
> > > > >> > > > > > > > > implementation.
> > > > >> > > > > > > > > > > > Regarding the point about s3 vs. s3a, I
> > assume
> > > > >> that
> > > > >> > is
> > > > >> > > a
> > > > >> > > > > > common
> > > > >> > > > > > > > > > > > problem that every catalog implementation
> > needs
> > > to
> > > > >> > > address,
> > > > >> > > > > and
> > > > >> > > > > > > we
> > > > >> > > > > > > > > will
> > > > >> > > > > > > > > > > > stay the same on this part. At least from
> the
> > > load
> > > > >> > table
> > > > >> > > > > point
> > > > >> > > > > > of
> > > > >> > > > > > > > > view,
> > > > >> > > > > > > > > > > > Spark engine knows how to deal with such
> > cases.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > - Does it point to any particular file?
> > > > >> > > > > > > > > > > > No, it doesn't point to a particular file.
> It
> > is
> > > > the
> > > > >> > base
> > > > >> > > > > table
> > > > >> > > > > > > > > > location.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > - Is it a common prefix of all files within
> a
> > > > table?
> > > > >> > > > > > > > > > > > It is supposed to be the base table
> location,
> > > > which
> > > > >> > > > > > theoretically
> > > > >> > > > > > > > > > should
> > > > >> > > > > > > > > > > > be the common prefix of all files within a
> > > table I
> > > > >> > > believe.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > - What happens when a value does not match
> > these
> > > > >> > > > > expectations?
> > > > >> > > > > > > > > > > > Whether it is one value or many is
> restricted
> > by
> > > > the
> > > > >> > spec
> > > > >> > > > > > > already.
> > > > >> > > > > > > > > > > > For URI format, I think we can do a format
> > > check,
> > > > >> and
> > > > >> > > fail
> > > > >> > > > > it.
> > > > >> > > > > > > > > > > > Other than that, we will not do any other
> > > special
> > > > >> > check,
> > > > >> > > > and
> > > > >> > > > > we
> > > > >> > > > > > > > rely
> > > > >> > > > > > > > > on
> > > > >> > > > > > > > > > > > the client to put the correct value,
> > otherwise,
> > > > the
> > > > >> > other
> > > > >> > > > > > engine
> > > > >> > > > > > > > will
> > > > >> > > > > > > > > > > > not be able to successfully read the table.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > For the location keyword, as Eric has
> pointed
> > > out,
> > > > >> we
> > > > >> > can
> > > > >> > > > > > > > potentially
> > > > >> > > > > > > > > > > have
> > > > >> > > > > > > > > > > > a reserved key for the properties. However,
> > > > location
> > > > >> > is a
> > > > >> > > > > > common
> > > > >> > > > > > > > > > > > enough key among various table formats,
> which
> > > > >> worths a
> > > > >> > > > > > dedicated
> > > > >> > > > > > > > key
> > > > >> > > > > > > > > to
> > > > >> > > > > > > > > > > > help store and load the information in a
> more
> > > > >> > > > straightforward
> > > > >> > > > > > > > > > > > way.  For things that are specific to one or
> > two
> > > > >> > > formats, I
> > > > >> > > > > > think
> > > > >> > > > > > > > it
> > > > >> > > > > > > > > > > makes
> > > > >> > > > > > > > > > > > more sense to use a reserved property key.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > As a reference, in Iceberg, the CreateTable
> > > > request
> > > > >> and
> > > > >> > > > > > > > TableMetadata
> > > > >> > > > > > > > > > > does
> > > > >> > > > > > > > > > > > have an explicit location key in the spec.
> For
> > > > >> > > > > write.data.path
> > > > >> > > > > > > > > > > > and write.metadata.path, they are passed as
> > > > >> properties
> > > > >> > > > today.
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > Best Regards,
> > > > >> > > > > > > > > > > > Yun
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > > On Wed, May 7, 2025 at 3:54 PM Dmitri
> > > > Bourlatchkov <
> > > > >> > > > > > > > di...@apache.org
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > > > > > wrote:
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > > >> Another point: I'm pretty sure sooner or
> > later
> > > > >> users
> > > > >> > > will
> > > > >> > > > > want
> > > > >> > > > > > > to
> > > > >> > > > > > > > > move
> > > > >> > > > > > > > > > > >> their data to some other location. As an
> > option
> > > > >> users
> > > > >> > > may
> > > > >> > > > > want
> > > > >> > > > > > > to
> > > > >> > > > > > > > > > write
> > > > >> > > > > > > > > > > >> new
> > > > >> > > > > > > > > > > >> files into another location but keep old
> > files
> > > in
> > > > >> > place.
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >> Also: if the location is a URI, how do we
> > deal
> > > > >> with s3
> > > > >> > > vs.
> > > > >> > > > > s3a
> > > > >> > > > > > > for
> > > > >> > > > > > > > > > > >> example?
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >> In Iceberg it is quite common for different
> > > > >> engines to
> > > > >> > > use
> > > > >> > > > > > > > different
> > > > >> > > > > > > > > > > >> access
> > > > >> > > > > > > > > > > >> tools, which often leads to different URI
> > > > schemes.
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >> Cheers,
> > > > >> > > > > > > > > > > >> Dmitri.
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >> On Wed, May 7, 2025 at 6:46 PM Eric
> Maynard <
> > > > >> > > > > > > > > eric.w.mayn...@gmail.com
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > > > >> wrote:
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >> > All good questions Dmitri — I’m
> especially
> > > > >> > interested
> > > > >> > > in
> > > > >> > > > > the
> > > > >> > > > > > > > first
> > > > >> > > > > > > > > > one
> > > > >> > > > > > > > > > > >> as
> > > > >> > > > > > > > > > > >> > from what I understand Iceberg tables can
> > > have
> > > > >> > > metadata
> > > > >> > > > > and
> > > > >> > > > > > > data
> > > > >> > > > > > > > > at
> > > > >> > > > > > > > > > > two
> > > > >> > > > > > > > > > > >> > different paths that we need to vend
> > > > credentials
> > > > >> > for.
> > > > >> > > > > > > > > > > >> >
> > > > >> > > > > > > > > > > >> > For Iceberg tables, we just use special
> > > > >> properties
> > > > >> > to
> > > > >> > > > > track
> > > > >> > > > > > > > these
> > > > >> > > > > > > > > > > >> > locations. I wonder if we couldn’t do the
> > > same
> > > > >> for
> > > > >> > > > generic
> > > > >> > > > > > > > tables.
> > > > >> > > > > > > > > > > >> >
> > > > >> > > > > > > > > > > >> > On Wed, May 7, 2025 at 3:42 PM Dmitri
> > > > >> Bourlatchkov <
> > > > >> > > > > > > > > > di...@apache.org>
> > > > >> > > > > > > > > > > >> > wrote:
> > > > >> > > > > > > > > > > >> >
> > > > >> > > > > > > > > > > >> > > Hi Yun,
> > > > >> > > > > > > > > > > >> > >
> > > > >> > > > > > > > > > > >> > > Please clarify the meaning of the value
> > of
> > > > the
> > > > >> new
> > > > >> > > > > > location
> > > > >> > > > > > > > > > > attribute.
> > > > >> > > > > > > > > > > >> > >
> > > > >> > > > > > > > > > > >> > > - Is it one value or many?
> > > > >> > > > > > > > > > > >> > > - Is it a URI?
> > > > >> > > > > > > > > > > >> > > - Does it point to any particular file?
> > > > >> > > > > > > > > > > >> > > - Is it a common prefix of all files
> > > within a
> > > > >> > table?
> > > > >> > > > > > > > > > > >> > > - What happens when a value does not
> > match
> > > > >> these
> > > > >> > > > > > > expectations?
> > > > >> > > > > > > > > > > >> > >
> > > > >> > > > > > > > > > > >> > > Thanks,
> > > > >> > > > > > > > > > > >> > > Dmitri.
> > > > >> > > > > > > > > > > >> > >
> > > > >> > > > > > > > > > > >> > > On 2025/05/07 21:50:19 yun zou wrote:
> > > > >> > > > > > > > > > > >> > > > Hi folks,
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > I would like to propose adding an
> > > optional
> > > > >> > > > `location`
> > > > >> > > > > > > field
> > > > >> > > > > > > > to
> > > > >> > > > > > > > > > > >> > > > CreateGenericTable Request and
> > > > >> LoadGenericTable
> > > > >> > > > > response.
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > The `location` is the location for
> the
> > > > table,
> > > > >> > > which
> > > > >> > > > is
> > > > >> > > > > > > > common
> > > > >> > > > > > > > > to
> > > > >> > > > > > > > > > > >> most
> > > > >> > > > > > > > > > > >> > > table
> > > > >> > > > > > > > > > > >> > > > formats including Iceberg, Delta,
> Hudi,
> > > > CSV,
> > > > >> > > Parquet
> > > > >> > > > > > etc.
> > > > >> > > > > > > > The
> > > > >> > > > > > > > > > > >> location
> > > > >> > > > > > > > > > > >> > > > information is critical for loading
> the
> > > > >> table at
> > > > >> > > > > engine
> > > > >> > > > > > > > side,
> > > > >> > > > > > > > > > > >> having a
> > > > >> > > > > > > > > > > >> > > > dedicated keyword could help improve
> > the
> > > > >> > > robustness
> > > > >> > > > > for
> > > > >> > > > > > > > cross
> > > > >> > > > > > > > > > > engine
> > > > >> > > > > > > > > > > >> > > > sharing, instead of relying on the
> > > > properties
> > > > >> > > passed
> > > > >> > > > > by
> > > > >> > > > > > > the
> > > > >> > > > > > > > > > client
> > > > >> > > > > > > > > > > >> > side.
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > Furthermore, this information is also
> > > > >> required
> > > > >> > to
> > > > >> > > > > > provide
> > > > >> > > > > > > > > > > credential
> > > > >> > > > > > > > > > > >> > > > vending capabilities later.
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > Here is the PR for adding the spec:
> > > > >> > > > > > > > > > > >> > > >
> > > > https://github.com/apache/polaris/pull/1543
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > Looking forward to your reply and
> > > feedback!
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > > > Best Regards,
> > > > >> > > > > > > > > > > >> > > > Yun
> > > > >> > > > > > > > > > > >> > > >
> > > > >> > > > > > > > > > > >> > >
> > > > >> > > > > > > > > > > >> >
> > > > >> > > > > > > > > > > >>
> > > > >> > > > > > > > > > > >
> > > > >> > > > > > > > > > >
> > > > >> > > > > > > > > >
> > > > >> > > > > > > > >
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
