Indeed, Manu, you're right. However, integrating support for v2 format based on this should be quite simple.

Yufei

On Wed, Feb 28, 2024 at 1:18 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Yufei,
>
> If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't
> support delete files or format v2, does it?
>
> Manu
>
> On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> We took a different approach by modifying the metadata. It is a bit heavy
>> compared to the relative path and s3 access point, but it can be used for
>> any types of storage and any locations. I shared it here,
>> https://github.com/apache/iceberg/pull/4705.
>>
>> Yufei
>>
>>
>> On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Jack,
>>>
>>> Thanks for sharing this idea.
>>>
>>> Our typical usage of "relative path" is distcp between two HDFS clusters
>>> for disaster recovery. It looks to me that by extending this feature, we
>>> should always take the authority and scheme from HDFS configurations in
>>> that cluster for any path.
>>> The downside is there could be confusion when we read files directly.
>>> I'm not sure about other side effects and how much effort it will take to
>>> implement. It would be best verified with a PoC.
>>>
>>> Regards,
>>> Manu
>>>
>>> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Just to put another alternative solution on the table. In S3FileIO, we
>>>> implemented the support for S3 access point and bucket alias, which
>>>> actually accidentally enabled "relative path" if you are just switching
>>>> bucket name.
>>>>
>>>> At read time, you can supply a catalog property
>>>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
>>>> <bucket-name> should be read using <bucket-alias-name> which comes from an
>>>> access point. However, bucket alias name is basically the same as bucket
>>>> name, so there is nothing preventing me to say something like
>>>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>>>
>>>> If I configure that, then any file path like
>>>> "s3://my-bucket-us-east-1/some/path" will be converted to
>>>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>>>> same effect without the need to change the Iceberg spec.
>>>>
>>>> Is it possible to extend this feature, so instead of supporting
>>>> relative path, we can support some form of replacing absolute path, so the
>>>> Iceberg metadata tree is still self-complete without the need to reference
>>>> external information like a prefix in a catalog?
>>>>
>>>> For example, user can provide a map saying that any path with prefix
>>>> "my-bucket-us-east-1/table1" should now be read through
>>>> "my-bucket-us-west-2/table1-backup". And we already have built-in
>>>> integration for catalog to set customized catalog properties per table. For
>>>> example, this is achieved in REST through the config field in
>>>> LoadTableResponse, which is used to vend S3 access credentials today. There
>>>> were also thoughts about allowing similar features in Glue to provide these
>>>> configs through Glue table parameters, as an implementation for non-REST
>>>> catalogs. We just did not add that feature because Glue already supports S3
>>>> access credentials vending through LakeFormation.
>>>>
>>>> Has this option been considered? I quickly scanned through the linked
>>>> doc, it seems to be not discussed, but I might have missed it.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>>
>>>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Ryan
>>>>>
>>>>> Ah ok, I thought that an Iceberg release is "based"/implement a spec
>>>>> (I assumed the opposite is wrong).
>>>>>
>>>>> Thanks for the explanation!
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>>>>> >
>>>>> > JB,
>>>>> >
>>>>> > The spec and the reference implementation are released separately so
>>>>> > v3 and 2.0 are independent. There's no requirement that v3 is completed
>>>>> > for Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
>>>>> > deprecate and remove things so that we don't continue to carry forward and
>>>>> > maintain older interfaces.
>>>>> >
>>>>> > Ryan
>>>>> >
>>>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>> >>
>>>>> >> Hi Manu
>>>>> >>
>>>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>>>> >> discussing it :).
>>>>> >>
>>>>> >> It was my intention to define what we plan to include (or not) in Spec
>>>>> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>>>> >>
>>>>> >> Regards
>>>>> >> JB
>>>>> >>
>>>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>> >> >
>>>>> >> > Do we still want to move forward with this feature? It's on the
>>>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>>>>> >> >
>>>>> >> > Manu
>>>>> >> >
>>>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> hi
>>>>> >> >>
>>>>> >> >> Please review the approach captured here Iceberg Table
>>>>> >> >> Portability This is a continuation from the previous effort here - Support
>>>>> >> >> relative paths and multiple root locations.
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >>
>>>>> >> >> kind regards
>>>>> >> >> Mohit
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Ryan Blue
>>>>> > Tabular
>>>>>
>>>>
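
For reference, the prefix-replacement idea from Jack's message quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: `rewrite_location` and `prefix_map` are hypothetical names, not part of Iceberg or S3FileIO, and the "longest prefix wins" rule is an assumption that the thread does not specify.

```python
from urllib.parse import urlsplit


def rewrite_location(path: str, prefix_map: dict[str, str]) -> str:
    """Hypothetical helper: swap a "<bucket>/<key-prefix>" for its mapped value.

    prefix_map keys and values are written without a scheme, as in the
    thread's example ("my-bucket-us-east-1/table1").
    """
    parts = urlsplit(path)
    # Bucket plus key, e.g. "my-bucket-us-east-1/table1/data/f.parquet".
    location = parts.netloc + parts.path
    # Assumption: try longer prefixes first so the most specific mapping wins.
    for key in sorted(prefix_map, key=len, reverse=True):
        old = key.rstrip("/")
        # Match whole path segments only, so "table1" does not match "table10".
        if location == old or location.startswith(old + "/"):
            new = prefix_map[key].rstrip("/")
            return f"{parts.scheme}://{new}{location[len(old):]}"
    # No mapping applies: return the path stored in metadata unchanged.
    return path


mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(rewrite_location("s3://my-bucket-us-east-1/table1/data/f.parquet", mapping))
```

A map with bucket-only entries, such as `{"my-bucket-us-east-1": "my-bucket-us-west-2"}`, reproduces the bucket-switching effect described for the `s3.access-points.*` property, while the metadata files themselves keep their original absolute paths.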