Re: [DISCUSS] PyIceberg 0.6.0 release

Ryan Blue Fri, 26 Jan 2024 10:45:39 -0800

Like the Java implementation, we've been building toward a library that can
be used in distributed applications as well as directly on a single node.
For example, job planning can produce a set of file scan tasks or a scan
can be pushed to duckdb (to_duckdb) or pandas (to_pandas). The write side
is similar: we have methods that accept Arrow tables and write files, and
an API for committing those files to a table. The write side
isn't as well developed yet (no support for partitions, for example), but
the basics are there and we would love to work with Ray and other
communities to add native Iceberg support!
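
As a rough illustration of the two read modes and the write path described
above, here is a minimal sketch. It assumes PyIceberg 0.6.0 or later. The
function names are hypothetical helpers, not part of PyIceberg; the calls
they wrap (scan, plan_files, to_pandas, append) are the library's table API.
The catalog name "default" and the table "examples.events" are placeholders
for whatever is configured locally.

```python
# Sketch only: helper names are hypothetical; `table` is expected to be a
# pyiceberg.table.Table loaded from a catalog (see load_example_table).


def load_example_table():
    """Load a table from a configured catalog.

    Assumption: a catalog named "default" exists in ~/.pyiceberg.yaml and
    contains a table "examples.events". Both names are placeholders.
    """
    from pyiceberg.catalog import load_catalog  # imported lazily on purpose

    return load_catalog("default").load_table("examples.events")


def planned_file_paths(table, row_filter):
    """Distributed-style use: plan the scan and return the data file paths.

    Each FileScanTask from plan_files() could be handed to a separate worker.
    """
    scan = table.scan(row_filter=row_filter)
    return [task.file.file_path for task in scan.plan_files()]


def read_on_one_node(table, row_filter):
    """Single-node use: read the matching rows straight into pandas."""
    return table.scan(row_filter=row_filter).to_pandas()


def append_rows(table, arrow_table):
    """Write side: write a pyarrow.Table as data files and commit them.

    In 0.6.0 this works for unpartitioned tables only.
    """
    table.append(arrow_table)
```

With a real table, planned_file_paths(load_example_table(), "id > 100")
would list the files a scheduler needs to distribute, while
read_on_one_node does the whole read locally. The filter column "id" is
likewise only an example.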


On Fri, Jan 26, 2024 at 10:40 AM Pucheng Yang <[email protected]>
wrote:

> My questions are similar to Yufei's. My organization is interested in Ray
> Iceberg integration, and from our conversations with the Ray team, we know
> they would like to have Iceberg integration as well. I think this is
> a good opportunity for both projects to collaborate.
>
> On Fri, Jan 26, 2024 at 10:32 AM Sung Yun <[email protected]> wrote:
>
>> It’s so exciting to see the project take another step forward, Fokko!
>>
>> Really great job to everyone involved.
>>
>> Best,
>> Sung
>>
>> On Jan 26, 2024, at 11:48 AM, Ryan Blue <[email protected]> wrote:
>>
>>
>> It's great to see all the progress in PyIceberg. Thanks to everyone
>> that's been contributing!
>>
>> I'm all for getting a release out as soon as possible and following up
>> with more features in the write path in 0.7.0.
>>
>> On Fri, Jan 26, 2024 at 5:22 AM Fokko Driesprong <[email protected]>
>> wrote:
>>
>>> Hey everyone,
>>>
>>> I want to discuss the 0.6.0 release that will bring a lot of
>>> functionality to the public:
>>>
>>>    - Write support for writing to unpartitioned tables
>>>       - Includes snapshot generation
>>>       - Constructing Avro writer trees
>>>    - Support for writing metadata, which enables commit support for the
>>>    Hive, SQL, and Glue catalogs.
>>>    - Support for name-mapping
>>>    - Easy evolution of schema using the union_by_name method
>>>    - And a lot of bug fixes and improvements
>>>
>>> The write support is still limited; for example, partitioned writes and
>>> tables with sort orders are not supported. Also, as Ryan mentioned during
>>> the last community sync, we're doing fast appends by default, and we're
>>> unable to compact yet. I've created issues on GitHub
>>> <https://github.com/apache/iceberg-python/issues> to track all these
>>> limitations. However, I think it is good to get the current work out to the
>>> public so they can try it and we can uncover any impediments as soon as
>>> possible. And we can follow up with 0.7.0.
>>>
>>> Kind regards,
>>> Fokko Driesprong
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>>

-- 
Ryan Blue
Tabular
