Even for Iceberg, as I noted, we track multiple locations and vend
credentials scoped to those multiple locations.

On Wed, May 7, 2025 at 5:22 PM Dmitri Bourlatchkov <di...@apache.org> wrote:

> No worries about the name. It is a possible alternative spelling :)
>
> On Wed, May 7, 2025 at 8:04 PM yun zou <yunzou.colost...@gmail.com> wrote:
>
> > Hi Dmitri,
> >
> > Sorry, I accidentally typed your name wrong in the previous reply!
> > My apologies for this!
> >
> > For the S3 issue, I think we will need to deal with it regardless.
> > With the federation work going on, we will eventually need to handle
> > entities coming from different Catalogs, and the URI format seems to be
> > the standard format used by various Catalog services.
> >
> > Best Regards,
> > Yun
> >
> > On Wed, May 7, 2025 at 4:55 PM yun zou <yunzou.colost...@gmail.com> wrote:
> >
> > > Hi Dimitri and Eric,
> > >
> > > Thanks a lot for the feedback!
> > >
> > > For the questions:
> > > - Is it one value or many?
> > > It will be one value, similar to the location in Iceberg and the
> > > storage_location in unity catalog.
> > >
> > > Regarding the point about having new data in new locations and keeping
> > > old data in old locations, do we support that for Iceberg today?
> > > Most Spark tables seem to have only one location. Also, I think it is
> > > better to start restricted first, and then extend it to allow multiple
> > > locations when the use case arises.
> > >
> > > Ref:
> > > Iceberg location:
> > > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > > Storage location in Unity Catalog:
> > > https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3451
> > >
> > > - Is it a URI?
> > > Yes, it will be a URI, which seems to be the standard across catalog
> > > implementations.
> > > Regarding the point about s3 vs. s3a, I assume that is a common
> > > problem that every catalog implementation needs to address, and we will
> > > stay the same on this part. At least from the load table point of view,
> > > Spark engine knows how to deal with such cases.
> > >
> > > - Does it point to any particular file?
> > > No, it doesn't point to a particular file. It is the base table
> > > location.
> > >
> > > - Is it a common prefix of all files within a table?
> > > It is supposed to be the base table location, which I believe should
> > > theoretically be the common prefix of all files within a table.
> > >
> > > - What happens when a value does not match these expectations?
> > > Whether it is one value or many is restricted by the spec already.
> > > For URI format, I think we can do a format check, and fail it.
> > > Other than that, we will not do any other special check; we rely on
> > > the client to provide the correct value. Otherwise, other engines will
> > > not be able to successfully read the table.
> > >
> > > For the location keyword, as Eric has pointed out, we can potentially
> > > have a reserved key for the properties. However, location is a common
> > > enough key among various table formats that it merits a dedicated key to
> > > help store and load the information in a more straightforward
> > > way. For things that are specific to one or two formats, I think it
> > > makes more sense to use a reserved property key.
> > >
> > > As a reference, in Iceberg, the CreateTable request and TableMetadata
> > > do have an explicit location key in the spec. write.data.path
> > > and write.metadata.path are passed as properties today.
> > >
> > > Best Regards,
> > > Yun
> > >
> > >
> > > On Wed, May 7, 2025 at 3:54 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > >
> > >> Another point: I'm pretty sure sooner or later users will want to move
> > >> their data to some other location. As an option, users may want to
> > >> write new files into another location but keep old files in place.
> > >>
> > >> Also: if the location is a URI, how do we deal with s3 vs. s3a for
> > >> example?
> > >>
> > >> In Iceberg it is quite common for different engines to use different
> > >> access tools, which often leads to different URI schemes.
> > >>
> > >> Cheers,
> > >> Dmitri.
> > >>
> > >> On Wed, May 7, 2025 at 6:46 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
> > >>
> > >> > All good questions, Dmitri. I’m especially interested in the first
> > >> > one, as from what I understand Iceberg tables can have metadata and
> > >> > data at two different paths that we need to vend credentials for.
> > >> >
> > >> > For Iceberg tables, we just use special properties to track these
> > >> > locations. I wonder if we couldn’t do the same for generic tables.
> > >> >
> > >> > On Wed, May 7, 2025 at 3:42 PM Dmitri Bourlatchkov <di...@apache.org> wrote:
> > >> >
> > >> > > Hi Yun,
> > >> > >
> > >> > > Please clarify the meaning of the value of the new location
> > >> > > attribute.
> > >> > >
> > >> > > - Is it one value or many?
> > >> > > - Is it a URI?
> > >> > > - Does it point to any particular file?
> > >> > > - Is it a common prefix of all files within a table?
> > >> > > - What happens when a value does not match these expectations?
> > >> > >
> > >> > > Thanks,
> > >> > > Dmitri.
> > >> > >
> > >> > > On 2025/05/07 21:50:19 yun zou wrote:
> > >> > > > Hi folks,
> > >> > > >
> > >> > > > I would like to propose adding an optional `location` field to
> > >> > > > the CreateGenericTable request and LoadGenericTable response.
> > >> > > >
> > >> > > > The `location` is the location for the table, which is common to
> > >> > > > most table formats including Iceberg, Delta, Hudi, csv, parquet etc.
> > >> > > > The location information is critical for loading the table on the
> > >> > > > engine side; having a dedicated keyword could help improve the
> > >> > > > robustness of cross-engine sharing, instead of relying on the
> > >> > > > properties passed by the client side.
> > >> > > >
> > >> > > > Furthermore, this information is also required to provide
> > >> > > > credential vending capabilities later.
> > >> > > >
> > >> > > > Here is the PR for adding the spec:
> > >> > > > https://github.com/apache/polaris/pull/1543
> > >> > > >
> > >> > > > Looking forward to your reply and feedback!
> > >> > > >
> > >> > > > Best Regards,
> > >> > > > Yun
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
