Thanks for bringing this up, Weston. Joris has already created a 12.0.1 milestone that contains several fixes that are candidates for backport [1], including this one. I think this is the most severe issue, though.

As a maintainer of the Python deltalake package, which uses the PyArrow Parquet writer and is often passed pandas data, I would appreciate a patch release.

Best,
Will Jones

[1] https://github.com/apache/arrow/issues?q=is%3Aopen+is%3Aissue+milestone%3A12.0.1

On Thu, May 18, 2023 at 10:18 AM Ian Cook <i...@ursacomputing.com> wrote:
> There is also a major issue with the 12.0.0 R package that has now
> been fixed in the repo [2] and needs to be resubmitted to CRAN soon.
> The R package developers are supportive of a 12.0.1 patch release
> happening soon so that the resubmission of the R package to CRAN can
> also include the fix for the performance regression you mention.
>
> Ian
>
> [2] https://github.com/apache/arrow/pull/35612
>
> On Thu, May 18, 2023 at 1:04 PM Weston Pace <weston.p...@gmail.com> wrote:
> >
> > Regrettably, 12.0.0 had a significant performance regression (I'll take
> > the blame for not thinking through all the use cases), most easily exposed
> > when writing datasets from pandas / numpy data, which is being addressed in
> > [1]. I believe this to be a fairly common use case and it may warrant a
> > 12.0.1 patch. Are there other issues that would need a patch? Do we feel
> > this issue is significant enough to justify the work?
> >
> > [1] https://github.com/apache/arrow/pull/35565
>