Indeed, Manu, you're right. However, integrating support for v2 format
based on this should be quite simple.

Yufei


On Wed, Feb 28, 2024 at 1:18 AM Manu Zhang <owenzhang1...@gmail.com> wrote:

> Hi Yufei,
>
> If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't
> support delete files or format v2, does it?
>
> Manu
>
> On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> We took a different approach by modifying the metadata. It is a bit heavy
>> compared to the relative path and S3 access point approaches, but it can be
>> used for any type of storage and any location. I shared it here:
>> https://github.com/apache/iceberg/pull/4705.
>>
>> Yufei
>>
>>
>> On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <owenzhang1...@gmail.com>
>> wrote:
>>
>>> Hi Jack,
>>>
>>> Thanks for sharing this idea.
>>>
>>> Our typical usage of "relative path" is distcp between two HDFS clusters
>>> for disaster recovery. It looks to me that by extending this feature, we
>>> should always take the authority and scheme from HDFS configurations in
>>> that cluster for any path.
>>> The downside is there could be confusion when we read files directly.
>>> I'm not sure about other side effects and how much effort it will take to
>>> implement. It would be best verified with a PoC.
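
[Illustration] A minimal sketch of the idea above, assuming each cluster exposes its own default filesystem URI (for example through Hadoop's fs.defaultFS setting). The function name and the namenode hosts are hypothetical, and this is not Iceberg or Hadoop code; it only shows the scheme and authority of a stored path being replaced by those of the local cluster.

```python
from urllib.parse import urlsplit, urlunsplit


def localize_path(stored_path: str, default_fs: str) -> str:
    """Swap the scheme and authority of stored_path for those of default_fs.

    default_fs stands in for the local cluster's filesystem URI
    (an assumption for this sketch), e.g. "hdfs://nn-b:8020".
    """
    target = urlsplit(default_fs)
    parts = urlsplit(stored_path)
    # Keep the path component, replace only scheme and authority.
    return urlunsplit(parts._replace(scheme=target.scheme, netloc=target.netloc))


# A path written on cluster A, resolved on cluster B after distcp.
print(localize_path("hdfs://nn-a:8020/warehouse/db/t/data/f.parquet",
                    "hdfs://nn-b:8020"))
```

The confusion mentioned above shows up here as well: the path recorded in metadata still names cluster A, so anyone reading the file directly, without this resolution step, would look in the wrong place.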
>>>
>>> Regards,
>>> Manu
>>>
>>> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>>> Just to put another alternative solution on the table. In S3FileIO, we
>>>> implemented support for S3 access points and bucket aliases, which
>>>> accidentally enabled "relative path" if you are just switching the
>>>> bucket name.
>>>>
>>>> At read time, you can supply a catalog property
>>>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating data in
>>>> <bucket-name> should be read using <bucket-alias-name> which comes from an
>>>> access point. However, a bucket alias name is basically the same as a
>>>> bucket name, so there is nothing preventing me from saying something like
>>>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>>>
>>>> If I configure that, then any file path like
>>>> "s3://my-bucket-us-east-1/some/path" will be converted to
>>>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>>>> same effect without the need to change the Iceberg spec.
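
[Illustration] The bucket substitution described above can be sketched as follows. This is not the S3FileIO implementation; the helper name is hypothetical, and it only handles the simple case where the configured value is a plain bucket or alias name. The property key format "s3.access-points.<bucket-name>" is the one quoted in the message.

```python
from urllib.parse import urlsplit, urlunsplit

# Property key prefix as quoted in the message above.
ACCESS_POINT_PREFIX = "s3.access-points."


def resolve_bucket(path: str, catalog_properties: dict[str, str]) -> str:
    """Replace the bucket in an s3:// path if a mapping is configured.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(path)
    replacement = catalog_properties.get(ACCESS_POINT_PREFIX + parts.netloc)
    if replacement is None:
        # No mapping for this bucket: read the path as written.
        return path
    return urlunsplit(parts._replace(netloc=replacement))


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
print(resolve_bucket("s3://my-bucket-us-east-1/some/path", props))
```

Only the bucket changes; the key under the bucket stays the same, which is why this behaves like a relative path only when the layout inside both buckets is identical.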
>>>>
>>>> Is it possible to extend this feature, so instead of supporting
>>>> relative paths, we can support some form of absolute path replacement, so
>>>> the Iceberg metadata tree is still self-contained without the need to
>>>> reference external information like a prefix in a catalog?
>>>>
>>>> For example, a user can provide a map saying that any path with prefix
>>>> "my-bucket-us-east-1/table1" should now be read through
>>>> "my-bucket-us-west-2/table1-backup". And we already have built-in
>>>> integration for catalogs to set custom catalog properties per table. For
>>>> example, this is achieved in REST through the config field in
>>>> LoadTableResponse, which is used to vend S3 access credentials today. There
>>>> were also thoughts about allowing similar features in Glue to provide these
>>>> configs through Glue table parameters, as an implementation for non-REST
>>>> catalogs. We just did not add that feature because Glue already supports S3
>>>> access credentials vending through LakeFormation.
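
[Illustration] A minimal sketch of the prefix map proposed above, assuming the map is supplied as a plain dictionary (for example from per-table catalog properties). The function name and the longest-prefix-wins rule are assumptions made for this example, not an existing Iceberg feature.

```python
def rewrite_prefix(path: str, prefix_map: dict[str, str]) -> str:
    """Rewrite the location part of a path using the longest matching prefix.

    Hypothetical helper: prefix_map maps "bucket/prefix" to "bucket/prefix".
    """
    scheme, sep, rest = path.partition("://")
    if not sep:
        # Not a URI with a scheme; leave it untouched.
        return path
    # Try longer prefixes first so the most specific mapping wins.
    for old in sorted(prefix_map, key=len, reverse=True):
        base = old.rstrip("/")
        # Match whole path segments only, so "table1" does not match "table10".
        if rest == base or rest.startswith(base + "/"):
            return scheme + sep + prefix_map[old].rstrip("/") + rest[len(base):]
    return path


mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(rewrite_prefix("s3://my-bucket-us-east-1/table1/data/f.parquet", mapping))
```

Matching on whole path segments is a deliberate choice here: a naive string prefix check would also rewrite paths under a sibling such as "table10".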
>>>>
>>>> Has this option been considered? I quickly scanned through the linked
>>>> doc and it does not seem to be discussed, but I might have missed it.
>>>>
>>>> Best,
>>>> Jack Ye
>>>>
>>>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>>
>>>>> Hi Ryan
>>>>>
>>>>> Ah ok, I thought that an Iceberg release is "based on" / implements a
>>>>> spec (and I assumed the opposite is not true).
>>>>>
>>>>> Thanks for the explanation!
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>>>>> >
>>>>> > JB,
>>>>> >
>>>>> > The spec and the reference implementation are released separately so
>>>>> > v3 and 2.0 are independent. There's no requirement that v3 is completed
>>>>> > for Iceberg Java 2.0 and the goal of a 2.0 is to have an opportunity to
>>>>> > deprecate and remove things so that we don't continue to carry forward
>>>>> > and maintain older interfaces.
>>>>> >
>>>>> > Ryan
>>>>> >
>>>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>> >>
>>>>> >> Hi Manu
>>>>> >>
>>>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>>>> >> discussing :).
>>>>> >>
>>>>> >> It was my intention to define what we plan to include (or not) in
>>>>> >> Spec v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>>>> >>
>>>>> >> Regards
>>>>> >> JB
>>>>> >>
>>>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com> wrote:
>>>>> >> >
>>>>> >> > Do we still want to move forward with this feature? It's on the
>>>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>>>>> >> >
>>>>> >> > Manu
>>>>> >> >
>>>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com> wrote:
>>>>> >> >>
>>>>> >> >> hi
>>>>> >> >>
>>>>> >> >> Please review the approach captured here: Iceberg Table
>>>>> >> >> Portability. This is a continuation of the previous effort here:
>>>>> >> >> Support relative paths and multiple root locations.
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >>
>>>>> >> >> kind regards
>>>>> >> >> Mohit
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Ryan Blue
>>>>> > Tabular
>>>>>
>>>>
