[GitHub] [hudi] ravs11 commented on issue #4609: [SUPPORT] Got exception while using clustering with z-order

GitBox Sat, 15 Jan 2022 20:07:50 -0800


ravs11 commented on issue #4609:
URL: https://github.com/apache/hudi/issues/4609#issuecomment-1013806955



   @xiarixiaoyao Thanks for your reply.
   
   1. There is only one directory.
   `Found 1 items
   drwxrwxr-x   - ravs11 ravs11          0 2022-01-16 00:16 hdfs://R2/project_path/hudi_z_order/.hoodie/.zindex/20220115235509743`
   2. Under that directory there is only one Parquet file. `spark.read.load(xxx.parquet).schema` fails with `org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.` However, I am able to check the schema with parquet-tools-1.11.1.jar:
   `hadoop jar parquet-tools-1.11.1.jar schema hdfs://R2/project_path/hudi_z_order/.hoodie/.zindex/20220115235509743/part-00000-afa7376c-b7b5-481b-8912-2129634a38d0-c000.snappy.parquet`
   `message spark_schema {
     optional binary file (STRING);
     optional binary page_type_minValue (STRING);
     optional binary page_type_maxValue (STRING);
     optional int64 page_type_num_nulls;
     optional binary page_section_0_minValue (STRING);
     optional binary page_section_0_maxValue (STRING);
     optional int64 page_section_0_num_nulls;
     optional binary target_type_minValue (STRING);
     optional binary target_type_maxValue (STRING);
     optional int64 target_type_num_nulls;
   }`
   3. Actually, I'm dealing with sensitive data. Let me see how I can prepare some dummy data for you.
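
For reference, the schema printed in point 2 appears to hold three statistics columns (`_minValue`, `_maxValue`, `_num_nulls`) per indexed column, plus a `file` column. Below is a minimal Python sketch that groups the parquet-tools output by base column name. The helper name `group_stats_columns` and the suffix list are my own assumptions taken from the output above, not a Hudi API.

```python
import re
from collections import defaultdict

# Schema text copied from the parquet-tools output in point 2.
SCHEMA_TEXT = """
message spark_schema {
  optional binary file (STRING);
  optional binary page_type_minValue (STRING);
  optional binary page_type_maxValue (STRING);
  optional int64 page_type_num_nulls;
  optional binary page_section_0_minValue (STRING);
  optional binary page_section_0_maxValue (STRING);
  optional int64 page_section_0_num_nulls;
  optional binary target_type_minValue (STRING);
  optional binary target_type_maxValue (STRING);
  optional int64 target_type_num_nulls;
}
"""

# Assumption: these suffixes are inferred from the printed schema only.
STAT_SUFFIXES = ("_minValue", "_maxValue", "_num_nulls")

# Matches lines such as "optional binary page_type_minValue (STRING);".
FIELD_RE = re.compile(r"^\s*(optional|required|repeated)\s+(\w+)\s+(\w+)")


def group_stats_columns(schema_text):
    """Map each base column to its statistic names and physical types."""
    grouped = defaultdict(dict)
    for line in schema_text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # skip "message ... {" and "}" lines
        _, physical_type, name = match.groups()
        for suffix in STAT_SUFFIXES:
            if name.endswith(suffix):
                base = name[: -len(suffix)]
                grouped[base][suffix.lstrip("_")] = physical_type
                break  # fields without a known suffix (e.g. "file") are skipped
    return dict(grouped)


if __name__ == "__main__":
    for column, stats in group_stats_columns(SCHEMA_TEXT).items():
        print(column, stats)
```

This only parses the text of the schema; it does not read the Parquet file, so it says nothing about why Spark could not infer the schema.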
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
