Cc'ing the parquet-dev list (it would be nice to always do so for general
questions like these).
Cheng
On 12/6/15 3:10 PM, Shushant Arora wrote:
Hi,
I have a few doubts about the Parquet file format.
1. Does Parquet keep min/max statistics like ORC does? How can I see the
Parquet version (whether it's 1.1, 1.2, or 1.3) of a Parquet file generated
using Hive, custom MR, or AvroParquetOutputFormat?
Yes, Parquet also keeps min/max statistics, stored per column chunk in
each row group. To find the version, inspect the Parquet file with the
parquet-meta CLI tool in parquet-tools (see
https://github.com/Parquet/parquet-mr/issues/321 for details), then look
for the "creator" field of the file. For programmatic access, check
o.a.p.hadoop.metadata.FileMetaData.getCreatedBy().
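
Once you have the creator string, pulling the version out of it can be
sketched roughly as below. This assumes the string looks like
"<application> version <version> (build <hash>)", which is what parquet-mr
usually writes; other writers may use a different layout. The function
name parse_created_by and the sample strings are made up for illustration
and are not part of any Parquet API.

```python
import re

# Assumed layout of the creator string (an assumption, not a spec):
#   "<application> version <version> (build <hash>)"
# Both the version part and the build part are treated as optional.
CREATED_BY = re.compile(
    r"^\s*(?P<app>.+?)"
    r"(?:\s+version\s+(?P<version>[^\s(]+))?"
    r"(?:\s*\(build\s*(?P<build>[^)]*)\))?\s*$"
)


def parse_created_by(created_by):
    """Split a creator string into (application, version, build).

    Returns None when the string is empty or does not match the assumed
    layout; version and build are None when they are absent.
    """
    match = CREATED_BY.match(created_by)
    if match is None:
        return None
    return match.group("app"), match.group("version"), match.group("build")


if __name__ == "__main__":
    # Hypothetical sample value; "abc123" is not a real build hash.
    print(parse_created_by("parquet-mr version 1.8.1 (build abc123)"))
```

Note that the version recovered this way is the version of the writer
library, so files written by Hive, custom MR jobs, or
AvroParquetOutputFormat report whichever parquet-mr they were built with.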
2. How do I sort Parquet records while generating a Parquet file using
AvroParquetOutputFormat?
AvroParquetOutputFormat is not a file format in itself. It is just
responsible for converting Avro records to Parquet records. How are you
using AvroParquetOutputFormat? Do you have any example snippets?
Thanks