ravs11 commented on issue #4609: URL: https://github.com/apache/hudi/issues/4609#issuecomment-1013806955
@xiarixiaoyao Thanks for your reply.

1. There is only one directory:
   `Found 1 items drwxrwxr-x - ravs11 ravs11 0 2022-01-16 00:16 hdfs://R2/project_path/hudi_z_order/.hoodie/.zindex/20220115235509743`
2. That directory contains only one Parquet file. `spark.read.load(xxx.parquet).schema` results in
   `org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.`
   However, I am able to read the schema with parquet-tools-1.11.1.jar:
   `hadoop jar parquet-tools-1.11.1.jar schema hdfs://R2/project_path/hudi_z_order/.hoodie/.zindex/20220115235509743/part-00000-afa7376c-b7b5-481b-8912-2129634a38d0-c000.snappy.parquet`
   `message spark_schema { optional binary file (STRING); optional binary page_type_minValue (STRING); optional binary page_type_maxValue (STRING); optional int64 page_type_num_nulls; optional binary page_section_0_minValue (STRING); optional binary page_section_0_maxValue (STRING); optional int64 page_section_0_num_nulls; optional binary target_type_minValue (STRING); optional binary target_type_maxValue (STRING); optional int64 target_type_num_nulls; }`
3. I'm dealing with sensitive data, so let me see how I can prepare some dummy data for you.
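
Since the exception says the schema "must be specified manually", one way to investigate further might be to build an explicit schema from the parquet-tools output and pass it to the reader. The following is only a rough sketch: the helper name `parquet_schema_to_ddl` and the type mapping are assumptions made for illustration, they are not part of Hudi or Spark, and the mapping covers only a few primitive types.

```python
import re

# Assumed mapping from (Parquet physical type, logical annotation) to a
# Spark SQL DDL type. Only the handful of types needed here are covered.
PARQUET_TO_SPARK = {
    ("binary", "STRING"): "STRING",
    ("binary", None): "BINARY",
    ("int64", None): "BIGINT",
    ("int32", None): "INT",
    ("double", None): "DOUBLE",
    ("float", None): "FLOAT",
    ("boolean", None): "BOOLEAN",
}

# Matches flat field declarations such as
#   optional binary file (STRING);
#   optional int64 page_type_num_nulls;
FIELD_RE = re.compile(
    r"(optional|required)\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?\s*;"
)


def parquet_schema_to_ddl(schema_text):
    """Convert flat parquet-tools `schema` output to a DDL column list."""
    columns = []
    for _repetition, physical, name, logical in FIELD_RE.findall(schema_text):
        # An absent annotation comes back as '', which is normalised to None.
        spark_type = PARQUET_TO_SPARK[(physical, logical or None)]
        columns.append(f"{name} {spark_type}")
    return ", ".join(columns)


# Shortened sample taken from the schema printed above.
SAMPLE = (
    "message spark_schema { optional binary file (STRING); "
    "optional int64 page_type_num_nulls; }"
)

if __name__ == "__main__":
    print(parquet_schema_to_ddl(SAMPLE))
```

The resulting string could then be tried with something like `spark.read.schema(ddl).parquet(path)`, since `DataFrameReader.schema` accepts a DDL-formatted string. Whether this gets past the underlying problem is not established here; it would only show whether Spark can read the file once schema inference is bypassed.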