Here was the last discussion about this, 6 months ago:

https://github.com/apache/parquet-testing/pull/9

I saw another PR like this come through, so that's why I'm bringing it up again:

https://github.com/apache/parquet-testing/pull/11

On Tue, Mar 31, 2020 at 9:08 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> Hi Wes,
>
> I agree that this is important. I have been looking at the Parquet
> implementation this morning, and I do see code for writing files, along
> with roundtrip tests. As you said, it isn't writing from Arrow types yet, but
> I would hope that this would be relatively simple to add. I don't know how
> complete the Parquet writer code is. It would be useful to get some
> guidance from the main authors of this crate.
>
> I'd be happy to create some JIRAs and try to help organize an effort here
> for the next release.
>
> Andy.
>
> On Mon, Mar 30, 2020 at 6:07 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > More than a year has passed since the Parquet Rust project joined
> > forces with Apache Arrow.
> >
> > I raised this issue in the past, but the project still cannot write
> > files originating from Arrow records. In my opinion, this creates
> > sustainability and development-scalability problems for the
> > project. In particular, testing has to rely on
> > binary files either pre-generated or generated by another library.
> > This makes everything harder (testing, feature development,
> > benchmarking, and so forth) and increases the chance of failing to
> > cover edge cases.
> >
> > Looking back on over 4 years of C++ Parquet development, I doubt we
> > could have gotten the project to where it is now without a writer
> > implementation moving together with the reader. For example, we've had
> > to deal with issues arising in very large files (e.g. BinaryArray
> > overflows), and in many cases it would not be practical to store a
> > pre-generated file exhibiting some of these problems.
> >
> > Of course, as a volunteer-driven effort, no one can be forced to
> > implement a writer, but since a good amount of time has passed, I feel
> > I need to raise awareness of the issue again to see if an effort might
> > be mobilized. This also impacts people who might come to rely on
> > this code in production. Given the importance of Parquet in current
> > times, having a rock solid Parquet library will likely become
> > essential to sustained adoption of the Arrow Rust project (it has
> > certainly been very important for C++/Python/R adoption).
> >
> > best,
> > Wes
> >
