Like the Java implementation, we've been building toward a library that can
be used in distributed applications as well as directly on a single node.
For example, job planning can produce a set of file scan tasks, or a scan
can be pushed to DuckDB (to_duckdb) or pandas (to_pandas). The write side
is similar: there are methods that accept Arrow tables and write data
files, and an API for committing those files to a table. The write side
isn't as well developed yet (there is no support for partitioned writes,
for example), but the basics are there, and we would love to work with Ray
and other communities to add native Iceberg support!
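To illustrate the distributed use case, here is a minimal sketch of how an
engine such as Ray might spread planned file scan tasks across workers. It
does not use PyIceberg: `ScanTask` and `assign_to_workers` are hypothetical
names, each task is assumed to be one data file with a known size, and the
balancing rule (largest task first, to the least-loaded worker) is a generic
greedy heuristic, not something PyIceberg or Ray prescribes. In PyIceberg
itself, the planned tasks come from `table.scan().plan_files()`.

```python
import heapq
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScanTask:
    """Hypothetical stand-in for a planned file scan task: one data file and its size."""
    path: str
    size_bytes: int


def assign_to_workers(tasks: List[ScanTask], num_workers: int) -> List[List[ScanTask]]:
    """Greedily assign tasks, largest first, to whichever worker has the fewest bytes so far."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    groups: List[List[ScanTask]] = [[] for _ in range(num_workers)]
    # Min-heap of (bytes assigned so far, worker index); ties go to the lower index.
    heap = [(0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    for task in sorted(tasks, key=lambda t: t.size_bytes, reverse=True):
        load, idx = heapq.heappop(heap)
        groups[idx].append(task)
        heapq.heappush(heap, (load + task.size_bytes, idx))
    return groups


if __name__ == "__main__":
    tasks = [
        ScanTask("a.parquet", 300),
        ScanTask("b.parquet", 200),
        ScanTask("c.parquet", 100),
        ScanTask("d.parquet", 100),
    ]
    # Each group would be handed to one worker, which reads only its own files.
    for i, group in enumerate(assign_to_workers(tasks, num_workers=2)):
        print(i, [t.path for t in group])
```

With these four files and two workers, the first worker gets a.parquet and
d.parquet (400 bytes) and the second gets b.parquet and c.parquet (300 bytes).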

On Fri, Jan 26, 2024 at 10:40 AM Pucheng Yang <py...@pinterest.com.invalid>
wrote:

> I have similar questions to Yufei's. My organization is interested in a Ray
> Iceberg integration, and during our conversations with the Ray team, we
> learned that they would also like to have an Iceberg integration. I think
> this is a good opportunity for both projects to collaborate.
>
> On Fri, Jan 26, 2024 at 10:32 AM Sung Yun <sy...@cornell.edu> wrote:
>
>> It’s so exciting to see the project take another step forward, Fokko!
>>
>> Really great job to everyone involved.
>>
>> Best,
>> Sung
>>
>> On Jan 26, 2024, at 11:48 AM, Ryan Blue <b...@tabular.io> wrote:
>>
>>
>> It's great to see all the progress in PyIceberg. Thanks to everyone
>> that's been contributing!
>>
>> I'm all for getting a release out as soon as possible and following up
>> with more features in the write path in 0.7.0.
>>
>> On Fri, Jan 26, 2024 at 5:22 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>>> Hey everyone,
>>>
>>> I want to discuss the 0.6.0 release, which will bring a lot of new
>>> functionality to the public:
>>>
>>>    - Write support for unpartitioned tables
>>>       - Includes snapshot generation
>>>       - Constructing Avro writer trees
>>>    - Support for writing metadata, which enables commits through the
>>>    Hive, SQL, and Glue catalogs
>>>    - Support for name-mapping
>>>    - Easy schema evolution using the union_by_name method
>>>    - And a lot of bug fixes and improvements
>>>
>>> The write support is still limited; for example, partitioned writes and
>>> tables with sort orders are not supported. Also, as Ryan mentioned during
>>> the last community sync, we're doing fast appends by default, and we're
>>> unable to compact yet. I've created issues on GitHub
>>> <https://github.com/apache/iceberg-python/issues> to track all these
>>> limitations. However, I think it is good to get the current work out to the
>>> public so they can try it and we can uncover any impediments as soon as
>>> possible. And we can follow up with 0.7.0.
>>>
>>> Kind regards,
>>> Fokko Driesprong
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>>

-- 
Ryan Blue
Tabular
