Here was the last discussion about this, 6 months ago: https://github.com/apache/parquet-testing/pull/9
I saw another PR come through like this, so that's why I'm bringing it up again: https://github.com/apache/parquet-testing/pull/11

On Tue, Mar 31, 2020 at 9:08 AM Andy Grove <andygrov...@gmail.com> wrote:
>
> Hi Wes,
>
> I agree that this is important. I have been looking at the Parquet
> implementation this morning and I do see code for writing files, along
> with roundtrip tests. As you said, it isn't writing from Arrow types yet but
> I would hope that this would be relatively simple to add. I don't know how
> complete the Parquet writer code is. It would be useful to get some
> guidance from the main authors of this crate.
>
> I'd be happy to create some JIRAs and try and help organize an effort here
> for the next release.
>
> Andy.
>
> On Mon, Mar 30, 2020 at 6:07 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi folks,
> >
> > More than a year has passed since the Parquet Rust project joined
> > forces with Apache Arrow.
> >
> > I raised this issue in the past, but the project still cannot write
> > files originating from Arrow records. In my opinion, this creates
> > sustainability / development scalability problems for the ongoing
> > development of the project. In particular, testing has to rely on
> > binary files either pre-generated or generated by another library.
> > This makes everything harder (testing, feature development,
> > benchmarking, and so forth) and increases the chance of failing to
> > cover edge cases.
> >
> > Looking back on over 4 years of C++ Parquet development, I doubt we
> > could have gotten the project to where it is now without a writer
> > implementation moving together with the reader. For example, we've had
> > to deal with issues arising in very large files (e.g. BinaryArray
> > overflows), and in many cases it would not be practical to store a
> > pre-generated file exhibiting some of these problems.
> >
> > Of course, as a volunteer-driven effort no one can be forced to
> > implement a writer, but since a good amount of time has passed I feel
> > I need to raise awareness of the issue again to see if an effort might
> > be mobilized, since this also impacts people who might come to rely on
> > this code in production. Given the importance of Parquet in current
> > times, having a rock solid Parquet library will likely become
> > essential to sustained adoption of the Arrow Rust project (it has
> > certainly been very important for C++/Python/R adoption).
> >
> > best,
> > Wes