Thanks for bringing this up, Weston.

Joris has already created a 12.0.1 milestone that contains several fixes
that are candidates for backport [1], including this one. I think this is
the most severe issue though.

As a maintainer of the Python deltalake package, which uses the PyArrow
Parquet writer and is often passed pandas data, I would appreciate a patch
release.

Best,

Will Jones

[1]
https://github.com/apache/arrow/issues?q=is%3Aopen+is%3Aissue+milestone%3A12.0.1

On Thu, May 18, 2023 at 10:18 AM Ian Cook <i...@ursacomputing.com> wrote:

> There is also a major issue with the 12.0.0 R package that has now
> been fixed in the repo [2] and needs to be resubmitted to CRAN soon.
> The R package developers are supportive of a 12.0.1 patch release
> happening soon so that the resubmission of the R package to CRAN can
> also include the fix for the performance regression you mention.
>
> Ian
>
> [2] https://github.com/apache/arrow/pull/35612
>
> On Thu, May 18, 2023 at 1:04 PM Weston Pace <weston.p...@gmail.com> wrote:
> >
> > Regrettably, 12.0.0 had a significant performance regression (I'll take the
> > blame for not thinking through all the use cases), most easily exposed when
> > writing datasets from pandas / numpy data, which is being addressed in
> > [1]. I believe this to be a fairly common use case and it may warrant a
> > 12.0.1 patch. Are there other issues that would need a patch? Do we feel
> > this issue is significant enough to justify the work?
> >
> > [1] https://github.com/apache/arrow/pull/35565
>