Hi folks,

As some of you may have noticed, we are accumulating a mountain of Parquet-related JIRA issues, many of them resulting from people using Apache Arrow to do data engineering in Python and running into problems.

To give us better visibility into all the relevant Parquet issues, and with the monorepo merge behind us, I created a couple of wiki pages, linked from the main https://cwiki.apache.org/confluence/display/ARROW page:

* C++ issue dashboard: https://cwiki.apache.org/confluence/x/fpWzBQ
* Python issue dashboard: https://cwiki.apache.org/confluence/display/ARROW/Python+Parquet+Development

Many Parquet issues in the ARROW project do not appear in these dashboards because they lack the "parquet" label. Please help with project organization by remembering to apply the "parquet" label to any Parquet-related issue.

Since Ruby also supports Parquet now via GLib, and R support for Parquet is coming soon, we need to do what we can to grow the community of people working on the core Parquet libraries and the things they depend on, like the IO and memory management subsystems of the Arrow C++ libraries.

In general, I think it is very important for us to have fast and reliable C++ support (and language bindings) for the 5 major file formats in use in data warehousing:

* CSV
* JSON
* Parquet
* Avro
* ORC

Antoine has been leading efforts on reading CSV files, and we will need to make a push into JSON and Avro at some point.

Thanks
Wes