Hi Renjie,

Based on your feedback, I have created a PR which separates the
different logical parts into separate commits:
https://github.com/apache/iceberg/pull/12298
The commits are:

   - The API Interface classes:
   https://github.com/apache/iceberg/pull/12298/commits/1ad230f67df014b424c3547603831f5e637b96d0
   - Moving the Parquet/Avro/ORC readers and writers to implement these
   interfaces:
   https://github.com/apache/iceberg/pull/12298/commits/6fa135927676fd080d8322d7d09cf2b86f54de36
   - Moving the implementation of the generic readers/writers with the new
   interfaces:
   https://github.com/apache/iceberg/pull/12298/commits/b6ab3d059732b7c898dd2a385f0cfa8a7956e999
   - Arrow reader implementation with the new interfaces:
   https://github.com/apache/iceberg/pull/12298/commits/aba830a86f535b2d1363b350d5f8b8622b608f1a
   - Spark reader/writer implementation with the new interfaces:
   https://github.com/apache/iceberg/pull/12298/commits/21179b8d0f7d1f8db3d9ea532d8cc776533b3fdf
   - Flink reader/writer implementation with the new interfaces:
   https://github.com/apache/iceberg/pull/12298/commits/907089c15fb497879ac879ff1d9227fc684d356d

Thanks,
Peter



Péter Váry <peter.vary.apa...@gmail.com> wrote (on Fri, Feb 14, 2025,
11:30):

> Hi Renjie,
> Here is the WIP PR for the readers:
> https://github.com/apache/iceberg/pull/12069
> Here is the WIP PR for the writers:
> https://github.com/apache/iceberg/pull/12164
>
> If you want to concentrate on the proposed new API, maybe this is the best
> place to start:
> https://github.com/apache/iceberg/compare/main...pvary:iceberg:file_format_api_minimal_few_class
> Thanks,
> Peter
>
> Renjie Liu <liurenjie2...@gmail.com> wrote (on Fri, Feb 14, 2025,
> 11:15):
>
>> Hi, Peter:
>>
>> Thanks for raising this, and this proposal sounds quite interesting to me.
>>
>> I've reviewed the doc, but it still seems too abstract to understand.
>> Would you mind submitting a PR so that it is clearer what has changed?
>>
>> On Wed, Feb 12, 2025 at 12:46 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Hi Team,
>>>
>>> As mentioned earlier in our Community Sync, I am exploring the
>>> possibility of defining a FileFormat API for accessing different file
>>> formats. I have put together a proposal based on my findings.
>>>
>>> -------------------
>>> Iceberg currently supports 3 different file formats: Avro, Parquet, ORC.
>>> With the introduction of the Iceberg V3 specification, many new features
>>> are being added to Iceberg. Some of these features, like new column types
>>> and default values, require changes at the file format level. The changes
>>> are added by individual developers, each focusing on different file
>>> formats. As a result, not all of the features are available for every
>>> supported file format.
>>> Also, there are emerging file formats like Vortex [1] or Lance [2] which,
>>> either through specialization or by applying newer research results, could
>>> provide better alternatives for certain use cases like random access to
>>> data or storing ML models.
>>> -------------------
>>>
>>> Please check the detailed proposal [3] and the Google document [4], and
>>> comment there or reply on the dev list if you have any suggestions.
>>>
>>> Thanks,
>>> Peter
>>>
>>> [1] - https://github.com/spiraldb/vortex
>>> [2] - https://lancedb.github.io/lance/
>>> [3] - https://github.com/apache/iceberg/issues/12225
>>> [4] -
>>> https://docs.google.com/document/d/1sF_d4tFxJsZWsZFCyCL9ZE7YuI7-P3VrzMLIrrTIxds
>>>
>>>
