[ https://issues.apache.org/jira/browse/HIVE-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fang-Yu Rao updated HIVE-22947:
-------------------------------
    Description: 
The {{getTableObjectsByName()}} RPC in {{HiveMetaStoreClient.java}} ([https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L2111-L2114]) is very slow. Specifically, according to an empirical evaluation, loading the complete metadata of all the tables in a database consisting of 40,000 tables takes at least 170 seconds with {{getTableObjectsByName()}}, whereas {{getAllTables()}} ([https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L2281-L2288]) takes less than 0.5 seconds on the same machine.

In some use cases, not all the fields of the class {{org.apache.hadoop.hive.metastore.api.Table}} are required. For instance, if a client only needs to determine the type of a table, e.g., an HDFS table or a Kudu table, then it should suffice to load only the field {{sd}}, which is of class {{org.apache.hadoop.hive.metastore.api.StorageDescriptor}}. It would be great if {{getTableObjectsByName()}} could be made more fine-grained so that only the fields specified by the client are retrieved, which could also possibly reduce the time spent on this RPC.

A spreadsheet is also attached ([^Benchmark_related_to_IMPALA-9363.pdf]), which provides the detailed experimental results. In the experiment, as a client of Hive metastore, the {{catalogd}} of Impala calls {{getTableObjectsByName()}} to retrieve the complete metadata of the tables in a database having 40,000 tables.
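To make the requested behaviour concrete, here is a minimal, self-contained sketch of what a field-projected fetch could look like. It is not Hive's API: {{ProjectionSketch}}, {{Field}}, {{TableMeta}}, {{FakeStore}} and the extra {{wanted}} parameter are all hypothetical names invented for illustration, and the "one load per field" cost model is an assumption, not a measurement of the metastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of field projection; none of these types are Hive classes. */
public class ProjectionSketch {

    /** Hypothetical names for the parts of a table's metadata a client may request. */
    enum Field { SD, PARAMETERS, PARTITION_KEYS }

    /** Stand-in for org.apache.hadoop.hive.metastore.api.Table, reduced to three fields. */
    static final class TableMeta {
        final String name;
        String sd;             // stand-in for the StorageDescriptor
        String parameters;     // stand-in for the table parameters
        String partitionKeys;  // stand-in for the partition keys
        TableMeta(String name) { this.name = name; }
    }

    /** Stand-in for the backing store; counts how many field loads were performed. */
    static final class FakeStore {
        int fieldsLoaded = 0;
        String load(String table, Field field) {
            fieldsLoaded++;
            return table + ":" + field;
        }
    }

    /** Loads only the requested fields; fields that were not requested stay null. */
    static List<TableMeta> getTableObjectsByName(
            FakeStore store, List<String> tableNames, Set<Field> wanted) {
        List<TableMeta> result = new ArrayList<>();
        for (String name : tableNames) {
            TableMeta t = new TableMeta(name);
            if (wanted.contains(Field.SD)) {
                t.sd = store.load(name, Field.SD);
            }
            if (wanted.contains(Field.PARAMETERS)) {
                t.parameters = store.load(name, Field.PARAMETERS);
            }
            if (wanted.contains(Field.PARTITION_KEYS)) {
                t.partitionKeys = store.load(name, Field.PARTITION_KEYS);
            }
            result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("t1", "t2", "t3");

        FakeStore full = new FakeStore();
        getTableObjectsByName(full, names, EnumSet.allOf(Field.class));

        FakeStore sdOnly = new FakeStore();
        getTableObjectsByName(sdOnly, names, EnumSet.of(Field.SD));

        // prints "full=9 sdOnly=3"
        System.out.println("full=" + full.fieldsLoaded + " sdOnly=" + sdOnly.fieldsLoaded);
    }
}
```

In this toy model a client that only needs {{sd}} triggers one load per table instead of three. Whether a similar saving would appear in the real RPC depends on where the 170 seconds are actually spent, which the sketch does not attempt to model.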
> The method getTableObjectsByName() in HiveMetaStoreClient.java is slow
> ----------------------------------------------------------------------
>
>                 Key: HIVE-22947
>                 URL: https://issues.apache.org/jira/browse/HIVE-22947
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Fang-Yu Rao
>            Priority: Major
>         Attachments: Benchmark_related_to_IMPALA-9363.pdf