[ 
https://issues.apache.org/jira/browse/HIVE-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fang-Yu Rao updated HIVE-22947:
-------------------------------
    Description: 
The {{getTableObjectsByName()}} RPC in {{HiveMetaStoreClient.java}} 
([https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L2111-L2114])
 is very slow. In an empirical evaluation, loading the complete metadata of all 
the tables in a database of 40,000 tables took at least 170 seconds with 
{{getTableObjectsByName()}}, whereas {{getAllTables()}} 
([https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L2281-L2288]),
 which returns only the table names, took less than 0.5 seconds on the same 
machine.
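
To put these totals on a per-table footing, the short sketch below divides each measured time by the table count. The class and method names are illustrative only, and the inputs are the bounds quoted above ("at least 170 seconds", "less than 0.5 seconds"), so the results are bounds as well rather than exact measurements.

```java
class PerTableCost {
    /** Average cost per table in milliseconds for a call that took totalSeconds overall. */
    static double perTableMillis(double totalSeconds, int tableCount) {
        return totalSeconds * 1000.0 / tableCount;
    }

    /** How many times slower the first call is than the second. */
    static double slowdown(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    public static void main(String[] args) {
        int tables = 40_000;            // database size from the report
        double byNameSeconds = 170.0;   // lower bound for getTableObjectsByName()
        double allTablesSeconds = 0.5;  // upper bound for getAllTables()

        System.out.println("getTableObjectsByName(): >= "
                + perTableMillis(byNameSeconds, tables) + " ms per table");
        System.out.println("getAllTables(): < "
                + perTableMillis(allTablesSeconds, tables) + " ms per table");
        System.out.println("slowdown: >= "
                + slowdown(byNameSeconds, allTablesSeconds) + "x");
    }
}
```

With these inputs the per-table cost of {{getTableObjectsByName()}} is at least 4.25 ms, and the gap between the two calls is at least a factor of 340.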

In some use cases, not all fields of 
{{org.apache.hadoop.hive.metastore.api.Table}} are required. For instance, if a 
client only needs to determine the type of a table, e.g., an HDFS table or 
a Kudu table, it should suffice to load only the {{sd}} field, which is 
of class {{org.apache.hadoop.hive.metastore.api.StorageDescriptor}}. It would 
help if {{getTableObjectsByName()}} were more fine-grained, so that only the 
fields requested by the client are retrieved, which could also reduce the time 
spent on this RPC.
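
One possible shape for such a fine-grained call is sketched below against a toy in-memory catalog. Everything in it is an assumption made for illustration: {{TableField}}, {{TableStub}} and the two-argument {{getTableObjectsByName()}} overload are hypothetical and are not part of the Hive metastore API, and a single input-format string stands in for the whole {{StorageDescriptor}}. The sketch only shows the contract (requested fields are filled in, all others stay null); it says nothing about how much time a real implementation would save.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ProjectionSketch {
    /** Hypothetical field selector; not part of the Hive metastore API. */
    enum TableField { OWNER, SD, PARAMETERS }

    /** Simplified stand-in for org.apache.hadoop.hive.metastore.api.Table. */
    static final class TableStub {
        String tableName;
        String owner;
        String sdInputFormat;            // stands in for the whole StorageDescriptor
        Map<String, String> parameters;
    }

    /** Toy in-memory catalog for a single database, keyed by table name. */
    private final Map<String, TableStub> tables = new LinkedHashMap<>();

    void addTable(String name, String owner, String inputFormat) {
        TableStub t = new TableStub();
        t.tableName = name;
        t.owner = owner;
        t.sdInputFormat = inputFormat;
        t.parameters = new LinkedHashMap<>();
        tables.put(name, t);
    }

    /** Hypothetical projected variant: copies only the requested fields, leaves the rest null. */
    List<TableStub> getTableObjectsByName(List<String> names, Set<TableField> fields) {
        List<TableStub> result = new ArrayList<>();
        for (String name : names) {
            TableStub full = tables.get(name);
            if (full == null) {
                continue; // this sketch silently skips unknown table names
            }
            TableStub projected = new TableStub();
            projected.tableName = full.tableName; // always kept so callers can match results
            if (fields.contains(TableField.OWNER)) {
                projected.owner = full.owner;
            }
            if (fields.contains(TableField.SD)) {
                projected.sdInputFormat = full.sdInputFormat;
            }
            if (fields.contains(TableField.PARAMETERS)) {
                projected.parameters = new LinkedHashMap<>(full.parameters);
            }
            result.add(projected);
        }
        return result;
    }

    public static void main(String[] args) {
        ProjectionSketch store = new ProjectionSketch();
        store.addTable("t_text", "alice", "org.apache.hadoop.mapred.TextInputFormat");
        store.addTable("t_kudu", "bob", "example.KuduInputFormat"); // placeholder class name

        // Request only the storage descriptor, as in the table-type use case above.
        List<TableStub> out = store.getTableObjectsByName(
                Arrays.asList("t_text", "t_kudu"), EnumSet.of(TableField.SD));
        for (TableStub t : out) {
            System.out.println(t.tableName + " -> " + t.sdInputFormat
                    + " (owner loaded: " + (t.owner != null) + ")");
        }
    }
}
```

A set of requested fields keeps the call backward compatible in spirit: a client that asks for every field gets the same information as today, while a client that only needs {{sd}} lets the server skip the rest.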

The detailed experimental results are attached 
([^Benchmark_related_to_IMPALA-9363.pdf]). In the experiment, Impala's 
{{catalogd}}, acting as a client of the Hive metastore, calls 
{{getTableObjectsByName()}} to retrieve the complete metadata of the tables in 
a database containing 40,000 tables.

 


> The method getTableObjectsByName() in HiveMetaStoreClient.java is slow
> ----------------------------------------------------------------------
>
>                 Key: HIVE-22947
>                 URL: https://issues.apache.org/jira/browse/HIVE-22947
>             Project: Hive
>          Issue Type: Improvement
>          Components: Standalone Metastore
>            Reporter: Fang-Yu Rao
>            Priority: Major
>         Attachments: Benchmark_related_to_IMPALA-9363.pdf



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
