Re: [DISCUSS] PyIceberg 0.6.0 release

Fokko Driesprong Mon, 29 Jan 2024 23:45:22 -0800

Hey everyone,

Since #305 <https://github.com/apache/iceberg-python/pull/305> has been
merged, I think we're good for the release. Thank you Sung for the PR and
Honah for the great review! I think it would be nice to also get #311
<https://github.com/apache/iceberg-python/pull/311> in, to help people get
started with the write API. Let me know if anything is missing.


I'm happy to run the release, but I'm always open to someone else running
the release <https://py.iceberg.apache.org/how-to-release/>.

Today at 1700 UTC we have the monthly PyIceberg sync. Feel free to join if
you're interested in contributing or if you have any questions. You can
attend by joining the Google group
<https://groups.google.com/search?q=iceberg-python-sync>, or by following
the link to the Google Calendar directly
<https://calendar.google.com/calendar/event?action=TEMPLATE&tmeid=MG5oZnYxa2NhZjdvaHE5a2ZlMHJ0aG91OTZfMjAyNDAxMzBUMTcwMDAwWiBmb2trb0Bkcmllc3Byb25nZW4ubmw&tmsrc=fokko%40driesprongen.nl&scp=ALL>
.

Kind regards,
Fokko

On Sun, 28 Jan 2024 at 23:23, Honah J. <hon...@apache.org> wrote:

> Really excited for the upcoming 0.6.0 release and its new features! Big
> thanks to everyone for their hard work.
>
> I'm looking forward to the community feedback and future enhancements.
>
> Best regards,
> Honah
>
> On Fri, Jan 26, 2024 at 1:56 PM Daniel Weeks <dwe...@apache.org> wrote:
>
>> I'm also strongly in favor of getting this release out even with the
>> limitations as it's still a huge step forward and we can build
>> incrementally on the write support.
>>
>> Incredible work everyone, I'm really excited about the progress here.
>>
>> -Dan
>>
>> On Fri, Jan 26, 2024 at 11:16 AM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>>> Thanks everyone for the responses and great to see everyone is as
>>> excited as I am :D
>>>
>>> I have some good news. The guys from Eventual have been working on
>>> integrating PyIceberg into their Daft dataframe
>>> <https://www.getdaft.io/projects/docs/en/latest/user_guide/integrations/data_catalogs.html#apache-iceberg>.
>>> They are integrating on the scan-tasks level where they leverage their own
>>> Parquet reader to read in a distributed fashion. Feel free to join the
>>> #daft channel on the Iceberg Slack
>>> <https://iceberg.apache.org/community/#slack> if you're interested in
>>> this. We're in the process of making sure that all the Iceberg features
>>> work well (schema and partition evolution, projection, etc.). The query
>>> planning is done in PyIceberg in a single process (we do use
>>> multi-threading), and we're profiling the PyIceberg code to
>>> identify bottlenecks so it can scale to 1M+ partitions.
>>>
>>> Similar to the read-path, for writing, we're designing the API in such a
>>> way that this also can be distributed.
>>>
>>> As I mentioned, I created issues
>>> <https://github.com/apache/iceberg-python/issues> around the gaps.
>>> There is a good discussion going on around the partitioned writes
>>> <https://github.com/apache/iceberg-python/issues/208>, and writing
>>> using a sort order <https://github.com/apache/iceberg-python/issues/271>
>>> is still up for grabs.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Fri, 26 Jan 2024 at 19:45, Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> Like the Java implementation, we've been building toward a library that
>>>> can be used in distributed applications as well as directly on a single
>>>> node. For example, job planning can produce a set of file scan tasks or a
>>>> scan can be pushed to duckdb (to_duckdb) or pandas (to_pandas). The write
>>>> side is similar where we have methods that accept Arrow dataframes and
>>>> write files and an API for committing those files to a table. The write
>>>> side isn't as well developed yet (no support for partitions, for example),
>>>> but the basics are there and we would love to work with Ray and other
>>>> communities to add native Iceberg support!
>>>>
>>>> On Fri, Jan 26, 2024 at 10:40 AM Pucheng Yang
>>>> <py...@pinterest.com.invalid> wrote:
>>>>
>>>>> I have similar questions to Yufei's. My organization is interested in
>>>>> Ray Iceberg integration, and from our conversations with the Ray team, we
>>>>> know they would also like to have Iceberg integration. I think
>>>>> this is a good opportunity for both projects to collaborate.
>>>>>
>>>>> On Fri, Jan 26, 2024 at 10:32 AM Sung Yun <sy...@cornell.edu> wrote:
>>>>>
>>>>>> It’s so exciting to see the project take another step forward, Fokko!
>>>>>>
>>>>>> Really great job to everyone involved.
>>>>>>
>>>>>> Best,
>>>>>> Sung
>>>>>>
>>>>>> On Jan 26, 2024, at 11:48 AM, Ryan Blue <b...@tabular.io> wrote:
>>>>>>
>>>>>>
>>>>>> It's great to see all the progress in PyIceberg. Thanks to everyone
>>>>>> that's been contributing!
>>>>>>
>>>>>> I'm all for getting a release out as soon as possible and following
>>>>>> up with more features in the write path in 0.7.0.
>>>>>>
>>>>>> On Fri, Jan 26, 2024 at 5:22 AM Fokko Driesprong <fo...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey everyone,
>>>>>>>
>>>>>>> I want to discuss the 0.6.0 release that will bring a lot of
>>>>>>> functionality to the public:
>>>>>>>
>>>>>>>    - Write support for writing to unpartitioned tables
>>>>>>>       - Includes snapshot generation
>>>>>>>       - Constructing Avro writer trees
>>>>>>>    - Support for writing metadata, which enables commit support for
>>>>>>>    the Hive, SQL, and Glue catalogs.
>>>>>>>    - Support for name-mapping
>>>>>>>    - Easy evolution of schema using the union_by_name method
>>>>>>>    - And a lot of bug fixes and improvements
>>>>>>>
>>>>>>> The write support is still limited, for example, partitioned writes
>>>>>>> or tables with sort-orders are not supported. Also, as Ryan mentioned
>>>>>>> during the last community sync, we're doing fast appends by default, and
>>>>>>> we're unable to compact yet. I've created issues on GitHub
>>>>>>> <https://github.com/apache/iceberg-python/issues> to track all
>>>>>>> these limitations. However, I think it is good to get the current work
>>>>>>> out to the public so they can try it and we can uncover any impediments
>>>>>>> as soon as possible. And we can follow up with 0.7.0.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Fokko Driesprong
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Tabular
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
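
To make the fast-append limitation from the original announcement concrete,
here is a toy model in plain Python. It is only a sketch: `ToyTable`,
`Manifest`, `fast_append` and `compact` are hypothetical names invented for
illustration and are not PyIceberg classes or methods, and treating "compact"
as merging manifests is an assumption about what the thread means. The idea
it shows is that each fast append adds one new manifest and leaves the
existing ones alone, so manifests pile up until something rewrites them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Manifest:
    """Toy manifest: only the data files it tracks (hypothetical, not PyIceberg)."""

    data_files: tuple[str, ...]


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table's current snapshot."""

    manifests: list[Manifest] = field(default_factory=list)

    def fast_append(self, new_files: list[str]) -> None:
        # A fast append writes one new manifest for the added files and
        # leaves every existing manifest untouched.
        self.manifests.append(Manifest(tuple(new_files)))

    def compact(self) -> None:
        # Assumption: compaction is modelled as rewriting all accumulated
        # manifests into a single one. The thread says this is not possible yet.
        all_files = tuple(f for m in self.manifests for f in m.data_files)
        self.manifests = [Manifest(all_files)]

    def data_files(self) -> list[str]:
        # Flatten the manifests back into the list of tracked data files.
        return [f for m in self.manifests for f in m.data_files]


table = ToyTable()
for i in range(5):
    table.fast_append([f"data-{i}.parquet"])

manifests_before = len(table.manifests)  # one manifest per commit
table.compact()
manifests_after = len(table.manifests)
```

After five appends the toy table holds five manifests; the hypothetical
compaction step folds them into one without changing which data files the
table tracks.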
