Rajesh Balamohan created HIVE-15339:
---------------------------------------
Summary: Prefetch column stats for fields needed in FilterSelectivityEstimator
Key: HIVE-15339
URL: https://issues.apache.org/jira/browse/HIVE-15339
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan
Priority: Minor
Depending on the query pattern, {{FilterSelectivityEstimator}} fetches column
statistics from the metastore in multiple calls. For instance, for the query
shown below, it ends up fetching individual column statistics for the flights
table multiple times.
When the table has a large number of partitions, fetching column statistics
via multiple calls can be very expensive and adversely impacts overall
compilation time. The following query took 14 seconds to compile.
{noformat}
SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
FROM `flights` as `flights`
JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
JOIN `airports` as `source_airport` ON (`flights`.`origin` =
`source_airport`.`iata`)
JOIN `airports` as `dest_airport` ON (`flights`.`dest` = `dest_airport`.`iata`)
GROUP BY YEAR(`flights`.`dateofflight`);
{noformat}
It may be helpful to collect all the columns that need statistics up front and
fetch their statistics in a single remote call.
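
A minimal sketch of the batching idea, written in Python for brevity rather than as Hive code. Every name here ({{FakeMetastore}}, {{PrefetchingStatsCache}}, {{get_column_stats}}) is hypothetical and does not correspond to an actual Hive or metastore API, and the statistics values are placeholders. It only illustrates grouping the needed columns per table, fetching them in one call, and serving later lookups from a cache.

```python
from collections import defaultdict

# Columns of `flights` referenced by the query above.
FLIGHTS_COLUMNS = ["flightnum", "dateofflight", "uniquecarrier", "origin", "dest"]


class FakeMetastore:
    """Hypothetical stand-in for a metastore client; counts remote calls."""

    def __init__(self, stats):
        self._stats = stats  # {(table, column): stats dict}
        self.calls = 0

    def get_column_stats(self, table, columns):
        # One invocation models one remote round trip, however many columns.
        self.calls += 1
        return {c: self._stats[(table, c)] for c in columns if (table, c) in self._stats}


class PrefetchingStatsCache:
    """Fetches stats for all needed columns of a table in a single call."""

    def __init__(self, client):
        self._client = client
        self._cache = {}

    def prefetch(self, needed):
        # Group the not-yet-cached (table, column) pairs by table.
        by_table = defaultdict(set)
        for table, column in needed:
            if (table, column) not in self._cache:
                by_table[table].add(column)
        # Issue one call per table instead of one call per column.
        for table, columns in by_table.items():
            fetched = self._client.get_column_stats(table, sorted(columns))
            for column in columns:
                # Cache misses as None so they are not re-fetched.
                self._cache[(table, column)] = fetched.get(column)

    def get(self, table, column):
        # Fall back to a single-column fetch if the column was not prefetched.
        if (table, column) not in self._cache:
            self.prefetch([(table, column)])
        return self._cache[(table, column)]


# Placeholder statistics, not real data.
store = FakeMetastore({("flights", c): {"ndv": 1} for c in FLIGHTS_COLUMNS})
cache = PrefetchingStatsCache(store)
cache.prefetch([("flights", c) for c in FLIGHTS_COLUMNS])
lookups = [cache.get("flights", c) for c in FLIGHTS_COLUMNS]
print(store.calls)  # 1
```

With this shape, the estimator would first walk the filter and join expressions to collect the columns it needs, call {{prefetch}} once, and then read from the cache during selectivity estimation.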