I don't think the exact behavior you want is supported (see [1]), so I
think your assessment of the code is correct.

To store field-level metadata, I think you'd have to approach this
with the higher-level Arrow APIs and ship the field-level metadata
inside the Arrow Schema, see [2].

[1] https://github.com/apache/arrow/issues/31018
[2] https://github.com/apache/arrow/blob/4ede48c89b8ec80bbd1895357f272c5fb61bc9b6/cpp/examples/arrow/parquet_read_write.cc#L115-L116

On Wed, Jan 8, 2025 at 8:46 AM Andrew Bell <andrew.bell...@gmail.com> wrote:
>
> Thanks for your response
>
> On Mon, Jan 6, 2025 at 4:39 PM Bryce Mecum <bryceme...@gmail.com> wrote:
> >
> > Are you able to share your code, particularly how you build your
> > ArrowWriterProperties?
> >
> > The Arrow Schema and therefore the field-level metadata is actually
> > stored in the Parquet file as an opaque blob. Opaque in the sense that
> > it's opaque to the standard Parquet tools. You'll have to read it in
> > with a tool that's Arrow-aware such as Arrow C++ or PyArrow.
>
> Sorry. I guess I wasn't clear. I'm doing something like this:
>
> using SchemaPtr = std::shared_ptr<arrow::Schema>;
> using ParquetSchemaPtr = std::shared_ptr<parquet::SchemaDescriptor>;
> using FieldPtr = std::shared_ptr<arrow::Field>;
> std::vector<FieldPtr> fields;
>
> // Note metadata on field.
> fields.push_back(arrow::field("field1", some_type, kvMetadata));
> fields.push_back(arrow::field("field2", some_other_type, kvMetadata2));
> ...
> SchemaPtr schema(new arrow::Schema(fields));
>
> ParquetSchemaPtr parquetSchema;
> parquet::arrow::ToParquetSchema(schema.get(),
> *propertiesBuilder.build(), *writerProperties, &parquetSchema);
> ...
> // Open file and write data.
>
> What I wanted was for the metadata that I placed in each of the
> fields of the Arrow schema to be written to the Parquet file. I
> don't see this happening. When I look at FieldToNode() in
> parquet/arrow/schema.cc, the metadata doesn't seem to be handled, and
> I don't see any place on the Parquet Node to hold it (I could be
> missing something).
>
> > However, I believe the default behavior of the Arrow C++ Parquet
> > implementation is to not store the Arrow Schema so you'll have to opt
> > into that behavior to get what you want by enabling store_schema [1]
> >
> > [1] https://arrow.apache.org/docs/cpp/parquet.html#writetable
> >
> > On Mon, Jan 6, 2025 at 12:31 PM Andrew Bell <andrew.bell...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I'm creating a Parquet file with a writer (a FileWriter based on a
> > > ParquetFileWriter). The writer is created using a Schema and the
> > > Schema itself was created from a list of Fields. Each of the fields
> > > contains metadata and the schema itself also contains metadata. When I
> > > examine the output of the file with `parquet-tools inspect --detail`
> > > it shows the Schema metadata, but no field metadata.
> > >
> > > I'm trying to figure out if the field metadata is being written or if
> > > > this is just an issue with seeing the data using the `parquet-tools`
> > > program. Do I have to do something special to get metadata associated
> > > with schema fields written to a parquet file? Or do I need to use some
> > > other command to see field-level metadata?
> > >
> > > Thanks,
> > >
> > > --
> > > Andrew Bell
> > > andrew.bell...@gmail.com
>
>
>
> --
> Andrew Bell
> andrew.bell...@gmail.com
