Gopal V created HIVE-5144:
-----------------------------

    Summary:     HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
    Key:         HIVE-5144
    URL:         https://issues.apache.org/jira/browse/HIVE-5144
    Project:     Hive
    Issue Type:  Bug
    Components:  Query Processor
    Environment: Ubuntu LXC + -Xmx4096m client opts
    Reporter:    Gopal V
    Assignee:    Gopal V
    Priority:    Minor

The map-join hashtable sink in the local task builds an in-memory hashtable with the following code:

{code}
Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias], ...
MapJoinRowContainer rowContainer = tableContainer.get(key);
if (rowContainer == null) {
  rowContainer = new MapJoinRowContainer();
  rowContainer.add(value);
{code}

For a query where joinValues[alias].size() == 0, this performs a large number of unnecessary allocations: each row gets a new empty Object[], and each new key gets a new MapJoinRowContainer. These would be better served by a copy-on-write default value container and a pre-allocated zero-length Object[] array, which is safe to share because it is immutable (a zero-length array is the only immutable array there is in Java).

The query tested is roughly the following, which scans all of customer_demographics in the hash sink:

{code}
select c_salutation, count(1)
from customer
  JOIN customer_demographics
    ON customer.c_current_cdemo_sk = customer_demographics.cd_demo_sk
group by c_salutation
limit 10;
{code}

When run against current trunk, the query fails with an OOM with 512 MB of RAM:

{code}
2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 292418944 percentage: 0.579
Execution failed with exit status: 3
Obtaining error information
{code}
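
The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the Hive patch: `EmptyRowSketch`, `RowContainer`, `EMPTY_ROW`, `DEFAULT_CONTAINER`, `computeValues` and `put` are hypothetical names standing in for the sink, MapJoinRowContainer and JoinUtil.computeMapJoinValues, and a plain HashMap stands in for the table container. It only shows the two ideas from the description: one shared zero-length array, and a shared default container that is copied only when a key sees a second row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for the hash sink; none of these are Hive's classes. */
final class EmptyRowSketch {
    /** A zero-length array has no elements to mutate, so one instance can be shared. */
    static final Object[] EMPTY_ROW = new Object[0];

    /** Simplified row container (assumption: not the real MapJoinRowContainer). */
    static final class RowContainer {
        private final List<Object[]> rows = new ArrayList<>();
        void add(Object[] row) { rows.add(row); }
        int size() { return rows.size(); }
    }

    /** Shared container holding exactly one EMPTY_ROW; put() never mutates it. */
    static final RowContainer DEFAULT_CONTAINER = new RowContainer();
    static { DEFAULT_CONTAINER.add(EMPTY_ROW); }

    private final Map<Integer, RowContainer> table = new HashMap<>();

    /** Projects the value columns; returns the shared EMPTY_ROW when there are none. */
    static Object[] computeValues(Object[] row, int[] valueColumns) {
        if (valueColumns.length == 0) {
            return EMPTY_ROW;
        }
        Object[] out = new Object[valueColumns.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = row[valueColumns[i]];
        }
        return out;
    }

    void put(int key, Object[] value) {
        RowContainer existing = table.get(key);
        if (existing == null) {
            if (value == EMPTY_ROW) {
                // First empty row for this key: share the default container.
                table.put(key, DEFAULT_CONTAINER);
            } else {
                RowContainer fresh = new RowContainer();
                fresh.add(value);
                table.put(key, fresh);
            }
        } else if (existing == DEFAULT_CONTAINER) {
            // Copy-on-write: a second row arrives, so give the key its own container.
            RowContainer copy = new RowContainer();
            copy.add(EMPTY_ROW);
            copy.add(value);
            table.put(key, copy);
        } else {
            existing.add(value);
        }
    }

    RowContainer get(int key) { return table.get(key); }

    public static void main(String[] args) {
        EmptyRowSketch sink = new EmptyRowSketch();
        int[] noValueColumns = new int[0];
        for (int key = 0; key < 1000; key++) {
            sink.put(key, computeValues(new Object[] {key}, noValueColumns));
        }
        System.out.println("containers shared: " + (sink.get(0) == sink.get(999)));
    }
}
```

In this sketch, unique join keys with no projected value columns cost only the map entry itself; the per-key container and per-row array allocations disappear, which is the case the query above exercises.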