Hi Wes,

I agree that this is important. I have been looking at the Parquet
implementation this morning and I do see code for writing files, along
with roundtrip tests. As you said, it isn't writing from Arrow types yet, but
I would hope that this would be relatively simple to add. I don't know how
complete the Parquet writer code is. It would be useful to get some
guidance from the main authors of this crate.

I'd be happy to create some JIRAs and try to help organize an effort here
for the next release.

Andy.

On Mon, Mar 30, 2020 at 6:07 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi folks,
>
> More than a year has passed since the Parquet Rust project joined
> forces with Apache Arrow.
>
> I raised this issue in the past, but the project still cannot write
> files originating from Arrow records. In my opinion, this creates
> sustainability / development scalability problems for the ongoing
> development of the project. In particular, testing has to rely on
> binary files either pre-generated or generated by another library.
> This makes everything harder (testing, feature development,
> benchmarking, and so forth) and increases the chance of failing to
> cover edge cases.
>
> Looking back on over 4 years of C++ Parquet development, I doubt we
> could have gotten the project to where it is now without a writer
> implementation moving together with the reader. For example, we've had
> to deal with issues arising in very large files (e.g. BinaryArray
> overflows), and in many cases it would not be practical to store a
> pre-generated file exhibiting some of these problems.
>
> Of course, as a volunteer driven effort no one can be forced to
> implement a writer, but since a good amount of time has passed I feel
> I need to raise awareness of the issue again to see if an effort might
> be mobilized, since this also impacts people who might come to rely on
> this code in production. Given the importance of Parquet in current
> times, having a rock solid Parquet library will likely become
> essential to sustained adoption of the Arrow Rust project (it has
> certainly been very important for C++/Python/R adoption).
>
> best,
> Wes
>
