[ https://issues.apache.org/jira/browse/HIVE-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578619#comment-17578619 ]
Ruyi Zheng commented on HIVE-26435:
-----------------------------------

h1. *What worked?*

Summarized HMS metadata. The summary includes all the indicators the PM team required: catalog name, database name, table name, column count, partition count, partition column count, size in bytes, number of rows, number of files, table type, file format, and compression type.

Details on some of the indicators:
* catalog name: can be null; a null value is converted to the string "null".
* compression type: when HMS reports the compression type as "0" or "f", it is converted to "None".
* size in bytes, number of rows, number of files: for partitioned tables, these indicators are stored in Impala. That data is now retrieved and added to the HMS Summary.
* file format: currently recognizes "iceberg", "parquet", "orc", "avro", "json", "base", "jdbc", "kudu", "text", "sequence", and "passthrough".

Tested successfully with a PostgreSQL database. Output was generated in both JSON and CONSOLE formats.

h1. *What still needs to be done?*

* Not yet tested with a large-scale MySQL database.
* To get the partitioned-table data from Impala, I created 3 additional tables and combined them with a LEFT JOIN in SQL queries. This part should be improved.
* I didn't have time to test the Iceberg-related method, so I removed that part of the code. The Iceberg work done so far is documented here for anyone who wants to refer to it: [https://docs.google.com/document/d/1zWyif0exSOqiukREkkIy4CiqnpvChVYqay_7C1fgeRo/edit?usp=sharing]

> HMS Summary
> -----------
>
> Key: HIVE-26435
> URL: https://issues.apache.org/jira/browse/HIVE-26435
> Project: Hive
> Issue Type: New Feature
> Reporter: Ruyi Zheng
> Assignee: Naveen Gangam
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hive Metastore currently lacks visibility into its metadata.
> This work includes enhancing the Hive Metatool to include an option (JSON, CONSOLE) to
> print a summary (catalog name, database name, table name, partition column
> count, number of rows, table type, file type, compression type, total data
> size, etc.).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
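The normalization rules listed under *What worked?* can be sketched as below. This is a minimal Python illustration, not the metatool's code. `normalize_catalog`, `normalize_compression`, `detect_file_format` and the `"unknown"` fallback are hypothetical names. The comment also does not say how file formats are recognized, so the sketch assumes a case-insensitive substring match against the table's input-format or storage-handler class name, checked in the order the formats are listed.

```python
from typing import Optional

# File formats the comment says are recognized. The check order simply
# follows that list (an assumption); the first match wins.
KNOWN_FORMATS = ("iceberg", "parquet", "orc", "avro", "json", "base",
                 "jdbc", "kudu", "text", "sequence", "passthrough")


def normalize_catalog(catalog: Optional[str]) -> str:
    """A null catalog name is reported as the string "null"."""
    return "null" if catalog is None else catalog


def normalize_compression(compression: Optional[str]) -> Optional[str]:
    """HMS compression values "0" and "f" are reported as "None"."""
    return "None" if compression in ("0", "f") else compression


def detect_file_format(class_name: Optional[str]) -> str:
    """Hypothetical rule: return the first known format name that appears
    (case-insensitively) in the input-format or storage-handler class
    name, or "unknown" if none does."""
    if not class_name:
        return "unknown"
    lowered = class_name.lower()
    for fmt in KNOWN_FORMATS:
        if fmt in lowered:
            return fmt
    return "unknown"
```

Under this rule, `org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat` maps to `parquet`. A short token such as `base` can also occur inside unrelated class names, so an exact class-name lookup may be more robust than substring matching.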
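The LEFT JOIN of partitioned-table stats and the JSON/CONSOLE output described in this issue can be illustrated in memory. The comment does not show the SQL or the three helper tables. This sketch therefore assumes per-table stats (size in bytes, number of rows, number of files) have already been aggregated from Impala into a dict keyed by (catalog, database, table). The `TableSummary` record, its field names, `attach_stats` and `render` are all hypothetical, not the metatool's API.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Dict, List, Optional, Tuple

# Hypothetical key for a table: (catalog, database, table).
TableKey = Tuple[str, str, str]
# Hypothetical stats row: (size in bytes, number of rows, number of files).
StatsRow = Tuple[int, int, int]


@dataclass(frozen=True)
class TableSummary:
    # Indicators named in the comment; field names are illustrative.
    catalog: str
    database: str
    table: str
    table_type: str
    file_format: str
    compression: str
    column_count: int
    partition_count: int
    partition_column_count: int
    size_bytes: Optional[int] = None
    num_rows: Optional[int] = None
    num_files: Optional[int] = None


def attach_stats(tables: List[TableSummary],
                 stats: Dict[TableKey, StatsRow]) -> List[TableSummary]:
    """LEFT JOIN semantics: keep every table, fill stats only where a row exists."""
    merged = []
    for t in tables:
        row = stats.get((t.catalog, t.database, t.table))
        if row is None:
            merged.append(t)  # no matching stats row: counts stay None
        else:
            size, nrows, nfiles = row
            merged.append(replace(t, size_bytes=size, num_rows=nrows, num_files=nfiles))
    return merged


def render(tables: List[TableSummary], fmt: str) -> str:
    """Render the summary as JSON or as a plain aligned text table (CONSOLE)."""
    records = [asdict(t) for t in tables]
    if fmt == "JSON":
        return json.dumps(records, indent=2)
    if fmt == "CONSOLE":
        if not records:
            return ""
        headers = list(records[0])
        cells = [[str(r[h]) for h in headers] for r in records]
        # Each column is as wide as its longest header or cell value.
        widths = [max(len(h), *(len(row[i]) for row in cells))
                  for i, h in enumerate(headers)]
        lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
        lines += ["  ".join(c.ljust(w) for c, w in zip(row, widths)) for row in cells]
        return "\n".join(lines)
    raise ValueError(f"unsupported output format: {fmt}")
```

Tables with no matching stats row still appear in the summary, with `None` (JSON `null`) for size, rows and files. The join is meant to provide exactly this, instead of silently dropping those tables.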