Hi Yufei,

If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't support delete files or format v2, does it?

Manu

On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu <flyrain...@gmail.com> wrote:

> We took a different approach by modifying the metadata. It is a bit heavy
> compared to the relative path and s3 access point, but it can be used for
> any types of storage and any locations. I shared it here,
> https://github.com/apache/iceberg/pull/4705.
>
> Yufei
>
>
> On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Hi Jack,
>>
>> Thanks for sharing this idea.
>>
>> Our typical usage of "relative path" is distcp between two HDFS clusters
>> for disaster recovery. It looks to me that by extending this feature, we
>> should always take the authority and scheme from HDFS configurations in
>> that cluster for any path.
>> The downside is there could be confusion when we read files directly. I'm
>> not sure about other side effects and how much effort it will take to
>> implement. It would be best verified with a PoC.
>>
>> Regards,
>> Manu
>>
>> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Just to put another alternative solution on the table. In S3FileIO, we
>>> implemented the support for S3 access point and bucket alias, which
>>> actually accidentally enabled "relative path" if you are just switching
>>> bucket name.
>>>
>>> At read time, you can supply a catalog property
>>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
>>> <bucket-name> should be read using <bucket-alias-name> which comes from an
>>> access point. However, bucket alias name is basically the same as bucket
>>> name, so there is nothing preventing me to say something like
>>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>>
>>> If I configure that, then any file path like
>>> "s3://my-bucket-us-east-1/some/path" will be converted to
>>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>>> same effect without the need to change the Iceberg spec.
>>>
>>> Is it possible to extend this feature, so instead of supporting relative
>>> path, we can support some form of replacing absolute path, so the Iceberg
>>> metadata tree is still self-complete without the need to reference external
>>> information like a prefix in a catalog?
>>>
>>> For example, user can provide a map saying that any path with prefix
>>> "my-bucket-us-east-1/table1" should now be read through
>>> "my-bucket-us-west-2/table1-backup". And we already have built-in
>>> integration for catalog to set customized catalog properties per table. For
>>> example, this is achieved in REST through the config field in
>>> LoadTableResponse, which is used to vend S3 access credentials today. There
>>> were also thoughts about allowing similar features in Glue to provide these
>>> configs through Glue table parameters, as an implementation for non-REST
>>> catalogs. We just did not add that feature because Glue already supports S3
>>> access credentials vending through LakeFormation.
>>>
>>> Has this option been considered? I quickly scanned through the linked
>>> doc, it seems to be not discussed, but I might have missed it.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Ryan
>>>>
>>>> Ah ok, I thought that an Iceberg release is "based"/implement a spec
>>>> (I assumed the opposite is wrong).
>>>>
>>>> Thanks for the explanation!
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>>>> >
>>>> > JB,
>>>> >
>>>> > The spec and the reference implementation are released separately so
>>>> > v3 and 2.0 are independent. There's no requirement that v3 is completed for
>>>> > Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
>>>> > deprecate and remove things so that we don't continue to carry forward and
>>>> > maintain older interfaces.
>>>> >
>>>> > Ryan
>>>> >
>>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>> >>
>>>> >> Hi Manu
>>>> >>
>>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>>> >> discussing :).
>>>> >>
>>>> >> It was my intention to define what we plan to include (or not) in Spec
>>>> >> v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>>> >>
>>>> >> Regards
>>>> >> JB
>>>> >>
>>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>> >> >
>>>> >> > Do we still want to move forward with this feature? It's on the
>>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>>>> >> >
>>>> >> > Manu
>>>> >> >
>>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com> wrote:
>>>> >> >>
>>>> >> >> hi
>>>> >> >>
>>>> >> >> Please review the approach captured here: "Iceberg Table
>>>> >> >> Portability". This is a continuation of the previous effort here:
>>>> >> >> "Support relative paths and multiple root locations".
>>>> >> >>
>>>> >> >> --
>>>> >> >>
>>>> >> >> kind regards
>>>> >> >> Mohit
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Ryan Blue
>>>> > Tabular
>>>>
>>>
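
For readers following the thread, the two read-time rewrites Jack describes (the bucket swap driven by an "s3.access-points.<bucket-name>" catalog property, and the proposed map from one absolute path prefix to another) can be sketched roughly as below. This is an illustrative Python sketch only, not the S3FileIO implementation. The function names apply_bucket_alias and apply_prefix_map, the longest-prefix-wins rule, and the requirement that a prefix match end on a "/" boundary are all assumptions made for the example. Only the property key and the bucket and table names come from the messages above.

```python
from typing import Mapping

# Property key prefix quoted in the thread; everything else here is hypothetical.
ACCESS_POINT_PREFIX = "s3.access-points."


def apply_bucket_alias(path: str, catalog_properties: Mapping[str, str]) -> str:
    """Swap the bucket of an s3:// path when a matching property is set."""
    scheme, sep, rest = path.partition("://")
    if scheme != "s3" or not sep:
        return path  # not an S3 path: leave untouched
    bucket, slash, key = rest.partition("/")
    alias = catalog_properties.get(ACCESS_POINT_PREFIX + bucket)
    if alias is None:
        return path  # no alias configured for this bucket
    return f"s3://{alias}{slash}{key}"


def apply_prefix_map(path: str, prefix_map: Mapping[str, str]) -> str:
    """Replace the longest matching 'bucket/key-prefix' with its mapped value.

    Assumption: a prefix only matches on a '/' boundary, so 'b/table1'
    does not capture 'b/table10'.
    """
    scheme, sep, rest = path.partition("://")
    if not sep:
        return path
    for old in sorted(prefix_map, key=len, reverse=True):
        old_n = old.rstrip("/")
        if rest == old_n or rest.startswith(old_n + "/"):
            new_n = prefix_map[old].rstrip("/")
            return f"{scheme}{sep}{new_n}{rest[len(old_n):]}"
    return path


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
print(apply_bucket_alias("s3://my-bucket-us-east-1/some/path", props))

prefixes = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(apply_prefix_map("s3://my-bucket-us-east-1/table1/data/f.parquet", prefixes))
```

In both cases the paths stored in table metadata stay absolute and unchanged. Only the reader-side configuration differs between the primary and the backup location, which is the property Jack highlights: no spec change is needed.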