Hi Yufei,

If I'm not mistaken, https://github.com/apache/iceberg/pull/4705 doesn't
support delete files or format v2, does it?

Manu

On Fri, Feb 23, 2024 at 12:41 AM Yufei Gu <flyrain...@gmail.com> wrote:

> We took a different approach by modifying the metadata. It is a bit heavy
> compared to relative paths and S3 access points, but it can be used for
> any type of storage and any location. I shared it here:
> https://github.com/apache/iceberg/pull/4705.
>
> Yufei
>
>
> On Tue, Feb 20, 2024 at 6:25 PM Manu Zhang <owenzhang1...@gmail.com>
> wrote:
>
>> Hi Jack,
>>
>> Thanks for sharing this idea.
>>
>> Our typical usage of "relative path" is distcp between two HDFS clusters
>> for disaster recovery. It looks to me that by extending this feature, we
>> should always take the authority and scheme from HDFS configurations in
>> that cluster for any path.
>> The downside is there could be confusion when we read files directly. I'm
>> not sure about other side effects and how much effort it will take to
>> implement. It would be best verified with a PoC.
>>
>> Regards,
>> Manu
>>
>> On Wed, Feb 21, 2024 at 4:26 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Just to put another alternative solution on the table: in S3FileIO, we
>>> implemented support for S3 access points and bucket aliases, which
>>> accidentally enabled "relative paths" for the case where you are only
>>> switching the bucket name.
>>>
>>> At read time, you can supply a catalog property
>>> "s3.access-points.<bucket-name>=<bucket-alias-name>" indicating that data in
>>> <bucket-name> should be read using <bucket-alias-name>, which comes from an
>>> access point. However, a bucket alias name is basically the same as a bucket
>>> name, so there is nothing preventing me from saying something like
>>> "s3.access-points.my-bucket-us-east-1=my-bucket-us-west-2".
>>>
>>> If I configure that, then any file path like
>>> "s3://my-bucket-us-east-1/some/path" will be converted to
>>> "s3://my-bucket-us-west-2/some/path" during read, achieving technically the
>>> same effect without the need to change the Iceberg spec.
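
A minimal sketch of the bucket substitution described above, written in Python for brevity. The function name `resolve_bucket` and the plain-dictionary property lookup are illustrative assumptions and not S3FileIO's actual code; only the property key format "s3.access-points.<bucket-name>" comes from the message.

```python
from urllib.parse import urlsplit, urlunsplit

# Property key prefix as quoted in the message above.
ACCESS_POINT_PREFIX = "s3.access-points."


def resolve_bucket(path, properties):
    """Return `path` with its bucket swapped if a mapping is configured.

    Hypothetical helper: `properties` stands in for catalog properties,
    keyed as "s3.access-points.<bucket-name>".
    """
    parts = urlsplit(path)
    # For "s3://bucket/key", urlsplit puts the bucket name in `netloc`.
    alias = properties.get(ACCESS_POINT_PREFIX + parts.netloc)
    if alias is None:
        return path  # no mapping for this bucket: leave the path untouched
    return urlunsplit(parts._replace(netloc=alias))


props = {"s3.access-points.my-bucket-us-east-1": "my-bucket-us-west-2"}
print(resolve_bucket("s3://my-bucket-us-east-1/some/path", props))
```

Only the bucket component changes; the object key is carried over unchanged, which is why this behaves like a relative path only when the layout under both buckets is identical.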
>>>
>>> Is it possible to extend this feature so that, instead of supporting relative
>>> paths, we support some form of absolute path replacement? That way the Iceberg
>>> metadata tree is still self-contained, without the need to reference external
>>> information like a prefix in a catalog.
>>>
>>> For example, a user can provide a map saying that any path with prefix
>>> "my-bucket-us-east-1/table1" should now be read through
>>> "my-bucket-us-west-2/table1-backup". We already have built-in
>>> integration for catalogs to set custom catalog properties per table. For
>>> example, this is achieved in REST through the config field in
>>> LoadTableResponse, which is used to vend S3 access credentials today. There
>>> were also thoughts about allowing similar features in Glue to provide these
>>> configs through Glue table parameters, as an implementation for non-REST
>>> catalogs. We just did not add that feature because Glue already supports S3
>>> access credentials vending through LakeFormation.
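
The prefix map proposed above could be sketched roughly as follows. This is a hypothetical illustration in Python, not an existing Iceberg feature: the function name `replace_prefix`, the shape of the map, and the longest-prefix-wins rule are all assumptions made for the example.

```python
def replace_prefix(path, prefix_map):
    """Rewrite an absolute path using a map of old prefix -> new prefix.

    Hypothetical helper: prefixes are written without a scheme, e.g.
    "my-bucket-us-east-1/table1", matching the example in the message.
    """
    scheme, sep, rest = path.partition("://")
    if not sep:
        return path  # not a scheme-qualified path: leave it untouched
    # Assumption: when several prefixes match, the longest one wins.
    for old in sorted(prefix_map, key=len, reverse=True):
        old_norm = old.rstrip("/")
        # Match whole path segments only, so "table1" does not match "table10".
        if rest == old_norm or rest.startswith(old_norm + "/"):
            new_norm = prefix_map[old].rstrip("/")
            return scheme + sep + new_norm + rest[len(old_norm):]
    return path


mapping = {"my-bucket-us-east-1/table1": "my-bucket-us-west-2/table1-backup"}
print(replace_prefix("s3://my-bucket-us-east-1/table1/data/f.parquet", mapping))
```

Because the stored metadata keeps its absolute paths and the rewrite happens only at read time, the metadata tree stays self-contained; the map itself would have to be supplied per table, for instance through catalog properties as suggested above.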
>>>
>>> Has this option been considered? I quickly scanned through the linked
>>> doc and it does not seem to be discussed there, but I might have missed it.
>>>
>>> Best,
>>> Jack Ye
>>>
>>> On Tue, Feb 20, 2024 at 9:21 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi Ryan
>>>>
>>>> Ah OK, I thought that an Iceberg release is based on (implements) a spec
>>>> (I assumed the opposite is wrong).
>>>>
>>>> Thanks for the explanation!
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Tue, Feb 20, 2024 at 6:04 PM Ryan Blue <b...@tabular.io> wrote:
>>>> >
>>>> > JB,
>>>> >
>>>> > The spec and the reference implementation are released separately, so
>>>> > v3 and 2.0 are independent. There's no requirement that v3 be completed for
>>>> > Iceberg Java 2.0. The goal of 2.0 is to have an opportunity to
>>>> > deprecate and remove things so that we don't continue to carry forward and
>>>> > maintain older interfaces.
>>>> >
>>>> > Ryan
>>>> >
>>>> > On Tue, Feb 20, 2024 at 1:58 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> > wrote:
>>>> >>
>>>> >> Hi Manu
>>>> >>
>>>> >> Thanks for the reminder. It sounds like a good feature and worth
>>>> >> discussing :).
>>>> >>
>>>> >> It was my intention to define what we plan to include (or not) in
>>>> >> Spec v3 / Iceberg 2.0.0 (I sent a message about that last week).
>>>> >>
>>>> >> Regards
>>>> >> JB
>>>> >>
>>>> >> On Tue, Feb 20, 2024 at 10:36 AM Manu Zhang <owenzhang1...@gmail.com>
>>>> >> wrote:
>>>> >> >
>>>> >> > Do we still want to move forward with this feature? It's on the
>>>> >> > roadmap for Spec V3 but it hasn't appeared in our discussion for a while.
>>>> >> >
>>>> >> > Manu
>>>> >> >
>>>> >> > On Sat, Aug 26, 2023 at 2:43 AM Mohit Garg <mohitga...@gmail.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> Please review the approach captured here: Iceberg Table
>>>> >> >> Portability. This is a continuation of the previous effort here: Support
>>>> >> >> relative paths and multiple root locations.
>>>> >> >>
>>>> >> >> --
>>>> >> >>
>>>> >> >> kind regards
>>>> >> >> Mohit
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Ryan Blue
>>>> > Tabular
>>>>
>>>
