[ https://issues.apache.org/jira/browse/HIVE-15422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749825#comment-15749825 ]
Sergey Shelukhin commented on HIVE-15422:
-----------------------------------------

+1 on updated patch

> HiveInputFormat::pushProjectionsAndFilters paths comparison generates huge number of objects for partitioned dataset
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15422
>                 URL: https://issues.apache.org/jira/browse/HIVE-15422
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15422.1.patch, HIVE-15422.2.patch, HIVE-15422.3.patch, Profiler_Snapshot_HIVE-15422.png
>
>
> When executing the following query in LLAP (single instance) in a 5 node cluster, lots of GC pressure was observed.
> {noformat}
> select a.type, a.city , a.frequency, b.city, b.country, b.lat, b.lon
> from (select 'depart' as type, origin as city, count(origin) as frequency
> from flights
> group by origin
> order by frequency desc, type) as a
> left join airports as b on a.city = b.iata
> order by frequency desc;
> {noformat}
> The flights table has 7000+ partitions in S3. Profiling revealed a large number of objects created just in path comparisons in HiveInputFormat.
> HIVE-15405 reduces the number of path comparisons in FileUtils, but the query still ends up doing lots of comparisons in HiveInputFormat::pushProjectionsAndFilters.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)