I don't think the exact behavior you want is supported, see [1], so I think your assessment of the code is correct.
To store field-level metadata, I think you'd have to approach this with the higher-level Arrow APIs and ship the field-level metadata inside the stored Arrow Schema (i.e. by enabling store_schema), see [2].

[1] https://github.com/apache/arrow/issues/31018
[2] https://github.com/apache/arrow/blob/4ede48c89b8ec80bbd1895357f272c5fb61bc9b6/cpp/examples/arrow/parquet_read_write.cc#L115-L116

On Wed, Jan 8, 2025 at 8:46 AM Andrew Bell <andrew.bell...@gmail.com> wrote:
>
> Thanks for your response
>
> On Mon, Jan 6, 2025 at 4:39 PM Bryce Mecum <bryceme...@gmail.com> wrote:
> >
> > Are you able to share your code, particularly how you build your
> > ArrowWriterProperties?
> >
> > The Arrow Schema and therefore the field-level metadata is actually
> > stored in the Parquet file as an opaque blob. Opaque in the sense that
> > it's opaque to the standard Parquet tools. You'll have to read it in
> > with a tool that's Arrow-aware such as Arrow C++ or PyArrow.
>
> Sorry. I guess I wasn't clear. I'm doing something like this:
>
>     using SchemaPtr = std::shared_ptr<arrow::Schema>;
>     using ParquetSchemaPtr = std::shared_ptr<parquet::SchemaDescriptor>;
>     using FieldPtr = std::shared_ptr<arrow::Field>;
>     std::vector<FieldPtr> fields;
>
>     // Note metadata on field.
>     fields.push_back(arrow::field("field1", some_type, kvMetadata));
>     fields.push_back(arrow::field("field2", some_other_type, kvMetadata2));
>     ...
>     SchemaPtr schema(new arrow::Schema(fields));
>
>     ParquetSchemaPtr parquetSchema;
>     parquet::arrow::ToParquetSchema(schema.get(),
>         *propertiesBuilder.build(), *writerProperties, &parquetSchema);
>     ...
>     // Open file and write data.
>
> What I was wanting was that the metadata information that I placed in
> each of the fields that were part of the arrow schema to be written to
> the parquet file. I don't see this happening. When I look at
> FieldToNode() in parquet/arrow/schema.cc, it doesn't seem like the
> metadata is dealt with -- I don't see anyplace on the parquet Node to
> contain the metadata (I could be missing something).
>
> > However, I believe the default behavior of the Arrow C++ Parquet
> > implementation is to not store the Arrow Schema so you'll have to opt
> > into that behavior to get what you want by enabling store_schema [1]
> >
> > [1] https://arrow.apache.org/docs/cpp/parquet.html#writetable
> >
> > On Mon, Jan 6, 2025 at 12:31 PM Andrew Bell <andrew.bell...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm creating a Parquet file with a writer (a FileWriter based on a
> > > ParquetFileWriter). The writer is created using a Schema and the
> > > Schema itself was created from a list of Fields. Each of the fields
> > > contains metadata and the schema itself also contains metadata. When I
> > > examine the output of the file with `parquet-tools inspect --detail`
> > > it shows the Schema metadata, but no field metadata.
> > >
> > > I'm trying to figure out if the field metadata is being written or if
> > > this is just an issue with seeing the data using the `parquet-tools`
> > > program. Do I have to do something special to get metadata associated
> > > with schema fields written to a parquet file? Or do I need to use some
> > > other command to see field-level metadata?
> > >
> > > Thanks,
> > >
> > > --
> > > Andrew Bell
> > > andrew.bell...@gmail.com
>
> --
> Andrew Bell
> andrew.bell...@gmail.com