To get the ball rolling, here is a quick and dirty PR adding a test that
writes an Arrow batch to a Parquet file.

https://github.com/apache/arrow/pull/6785

I'll keep iterating on this but will gladly accept help or hand this off to
someone better qualified.

On Tue, Mar 31, 2020 at 8:15 AM Wes McKinney <wesmck...@gmail.com> wrote:

> Here was the last discussion about this, from 6 months ago:
>
> https://github.com/apache/parquet-testing/pull/9
>
> I saw another PR come through like this so that's why I'm bringing it up
> again
>
> https://github.com/apache/parquet-testing/pull/11
>
> On Tue, Mar 31, 2020 at 9:08 AM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > Hi Wes,
> >
> > I agree that this is important. I have been looking at the Parquet
> > implementation this morning and I do see code for writing files, along
> > with roundtrip tests. As you said, it isn't writing from Arrow types yet,
> > but I would hope that this would be relatively simple to add. I don't know
> > how complete the Parquet writer code is. It would be useful to get some
> > guidance from the main authors of this crate.
> >
> > I'd be happy to create some JIRAs and try to help organize an effort here
> > for the next release.
> >
> > Andy.
> >
> > On Mon, Mar 30, 2020 at 6:07 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > hi folks,
> > >
> > > More than a year has passed since the Parquet Rust project joined
> > > forces with Apache Arrow.
> > >
> > > I raised this issue in the past, but the project still cannot write
> > > files originating from Arrow records. In my opinion, this creates
> > > sustainability / development scalability problems for the ongoing
> > > development of the project. In particular, testing has to rely on
> > > binary files either pre-generated or generated by another library.
> > > This makes everything harder (testing, feature development,
> > > benchmarking, and so forth) and increases the chance of failing to
> > > cover edge cases.
> > >
> > > Looking back on over 4 years of C++ Parquet development, I doubt we
> > > could have gotten the project to where it is now without a writer
> > > implementation moving together with the reader. For example, we've had
> > > to deal with issues arising in very large files (e.g. BinaryArray
> > > overflows), and in many cases it would not be practical to store a
> > > pre-generated file exhibiting some of these problems.
> > >
> > > Of course, as this is a volunteer-driven effort, no one can be forced to
> > > implement a writer, but since a good amount of time has passed I feel
> > > I need to raise awareness of the issue again to see if an effort might
> > > be mobilized, since this also impacts people who might come to rely on
> > > this code in production. Given the importance of Parquet in current
> > > times, having a rock solid Parquet library will likely become
> > > essential to sustained adoption of the Arrow Rust project (it has
> > > certainly been very important for C++/Python/R adoption).
> > >
> > > best,
> > > Wes
> > >
>
