Kevin Minder created HIVE-19996:
-----------------------------------

             Summary: Beeline performance poor with drivers having slow DatabaseMetaData.getPrimaryKeys impl
                 Key: HIVE-19996
                 URL: https://issues.apache.org/jira/browse/HIVE-19996
             Project: Hive
          Issue Type: Improvement
          Components: Beeline
    Affects Versions: 1.2.1
         Environment: Issue detected using Beeline with HBase Phoenix thin driver and a result set with many columns.
            Reporter: Kevin Minder


Beeline performance is poor with the table output format when both of the following conditions hold for the same result set:
 # The result set has a large number of columns.
 # The driver being used has a slow implementation of DatabaseMetaData.getPrimaryKeys.

For example, testing has shown that for a query with ~100 columns using the HBase Phoenix thin driver, execution time drops from ~30 seconds with the table output format to ~2 seconds with the CSV output format. Example query: {{select * from system.catalog;}}

This is due to how primary keys are detected. The current Rows implementation makes a metadata call for every column to determine whether that column is a primary key, for display purposes. I propose optimizing this so that a metadata call is made only once for each unique table referenced by the result set's columns.
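A minimal sketch of the proposed per-table lookup follows. It is not Beeline's actual Rows code: the class PrimaryKeyCache, the interface PrimaryKeyLookup and the methods fromMetaData, isPrimaryKey and primaryKeyFlags are hypothetical names. It assumes a column's owning table can be identified from ResultSetMetaData.getCatalogName/getSchemaName/getTableName, and that primary-key columns are matched to result set columns by exact name. The getPrimaryKeys call is hidden behind an interface so the caching can be exercised without a live driver.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not Beeline's Rows class): caches primary-key columns per table
// so getPrimaryKeys runs once per distinct table instead of once per column.
class PrimaryKeyCache {

    // Hypothetical abstraction over DatabaseMetaData.getPrimaryKeys.
    @FunctionalInterface
    interface PrimaryKeyLookup {
        Set<String> primaryKeyColumns(String catalog, String schema, String table)
                throws SQLException;
    }

    // Adapts real JDBC metadata: collects COLUMN_NAME from each getPrimaryKeys row.
    static PrimaryKeyLookup fromMetaData(DatabaseMetaData md) {
        return (catalog, schema, table) -> {
            Set<String> columns = new HashSet<>();
            try (ResultSet rs = md.getPrimaryKeys(catalog, schema, table)) {
                while (rs.next()) {
                    columns.add(rs.getString("COLUMN_NAME"));
                }
            }
            return columns;
        };
    }

    private final PrimaryKeyLookup lookup;
    // Keyed by (catalog, schema, table); Arrays.asList tolerates null parts.
    private final Map<List<String>, Set<String>> cache = new HashMap<>();

    PrimaryKeyCache(PrimaryKeyLookup lookup) {
        this.lookup = lookup;
    }

    // True if the column is in its table's primary key; the lookup runs at most once per table.
    boolean isPrimaryKey(String catalog, String schema, String table, String column)
            throws SQLException {
        List<String> key = Arrays.asList(catalog, schema, table);
        Set<String> pkColumns = cache.get(key);
        if (pkColumns == null) {
            pkColumns = lookup.primaryKeyColumns(catalog, schema, table);
            cache.put(key, pkColumns);
        }
        return pkColumns.contains(column);
    }

    // Computes a primary-key flag for every result set column (JDBC columns are 1-based).
    boolean[] primaryKeyFlags(ResultSetMetaData rsmd) throws SQLException {
        int n = rsmd.getColumnCount();
        boolean[] flags = new boolean[n];
        for (int i = 1; i <= n; i++) {
            flags[i - 1] = isPrimaryKey(rsmd.getCatalogName(i), rsmd.getSchemaName(i),
                    rsmd.getTableName(i), rsmd.getColumnName(i));
        }
        return flags;
    }
}
```

In this sketch, a caller would build one PrimaryKeyCache per result set from `PrimaryKeyCache.fromMetaData(connection.getMetaData())` and call primaryKeyFlags once before rendering the table header, so a ~100-column select over a single table issues one getPrimaryKeys call instead of ~100. Scoping the cache to a single result set avoids serving stale key information if the schema changes between queries.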



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
