[ https://issues.apache.org/jira/browse/HIVE-23218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramesh Kumar Thangarajan reassigned HIVE-23218:
-----------------------------------------------

    Assignee: Ramesh Kumar Thangarajan

> LlapRecordReader queue limit computation is not optimal
> -------------------------------------------------------
>
>                 Key: HIVE-23218
>                 URL: https://issues.apache.org/jira/browse/HIVE-23218
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>            Reporter: Rajesh Balamohan
>            Assignee: Ramesh Kumar Thangarajan
>            Priority: Major
>
> After decoding in {{OrcEncodedDataConsumer::decodeBatch}}, data is enqueued
> into a queue in LlapRecordReader. The limit for this queue is determined in
> LlapRecordReader. If the limit is too small, the enqueue ends up waiting for
> 100 ms until the queue has capacity.
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L168
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L590
> https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapRecordReader.java#L260
> {{determineQueueLimit}} takes all columns into consideration even though only
> a few columns are needed for projection. Here is an example.
> {noformat}
> create table test_acid(a1 string, a2 string, a3 string, a4 string, a5 string,
>   a6 string, a7 string, a8 string, a9 string, a10 string,
>   a11 string, a22 string, a33 string, a44 string, a55 string,
>   a66 string, a77 string, a88 string, a99 string, a100 string,
>   a111 decimal(25,2), a222 decimal(25,2), a333 decimal(25,2),
>   a444 decimal(25,2), a555 decimal(25,2), a666 decimal(25,2),
>   a777 decimal(25,2), a888 decimal(25,2), a999 decimal(25,2),
>   a1000 decimal(25,2)) stored as orc;
> insert into table test_acid values
>   ("a1","a2","a3","a4","a5","a6","a7","a8","a9","a10",
>    "a11","a22","a33","a44","a55","a66","a77","a88","a99","a100",
>    10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23,10.23
>   );
> select a44, count(*) from test_acid where a44 like "a4%" group by a44
>   order by a44;
> {noformat}
> For this query, the predicted queue size is 138, because the computation
> takes all 30 fields into account instead of just 2. This causes unwanted
> delays when adding data to the queue.
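
To make the effect concrete, here is a minimal Java sketch of a weight-based queue limit computed over all columns versus only a projected column. It is not the code from LlapRecordReader: the class QueueLimitSketch, the ColType enum, the per-type weights (16 for string, 4 for decimal) and the base/min values (50000 and 10) are all assumptions made for illustration. With these assumed values the 30-column table above happens to give 138, the figure quoted in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; names, weights and limits are assumptions,
// not values taken from the Hive source.
final class QueueLimitSketch {
    enum ColType { STRING, DECIMAL, OTHER }

    static final int ASSUMED_BASE = 50_000; // assumed upper bound for the limit
    static final int ASSUMED_MIN = 10;      // assumed lower bound for the limit

    // Assumed relative cost of one column of the given type.
    static int weightOf(ColType type) {
        switch (type) {
            case STRING:  return 16;
            case DECIMAL: return 4;
            default:      return 1;
        }
    }

    // Divide the base limit by the summed column weights, clamped to [min, base].
    static int queueLimit(int base, int min, List<ColType> columns) {
        if (columns.isEmpty()) {
            return base;
        }
        long totalWeight = 0;
        for (ColType c : columns) {
            totalWeight += weightOf(c);
        }
        return (int) Math.max(min, Math.min(base, base / totalWeight));
    }

    // Column types of test_acid from the example: 20 strings, 10 decimals.
    static List<ColType> testAcidColumns() {
        List<ColType> cols = new ArrayList<>(Collections.nCopies(20, ColType.STRING));
        cols.addAll(Collections.nCopies(10, ColType.DECIMAL));
        return cols;
    }

    public static void main(String[] args) {
        // All 30 columns: total weight 20*16 + 10*4 = 360, so 50000 / 360 = 138.
        int allColumns = queueLimit(ASSUMED_BASE, ASSUMED_MIN, testAcidColumns());
        // Only the projected string column a44: weight 16, so 50000 / 16 = 3125.
        int projectedOnly = queueLimit(ASSUMED_BASE, ASSUMED_MIN, Arrays.asList(ColType.STRING));
        System.out.println("all columns:    " + allColumns);
        System.out.println("projected only: " + projectedOnly);
    }
}
```

Under these assumptions, restricting the computation to the projected column raises the limit from 138 to 3125, so the producer would block on a full queue far less often.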