> When is OOM error actually thrown? With hive.mapjoin.hybridgrace.hashtable
> set to true, spilling should be possible, so OOM error should not come. ...
> Is it the case when the hash table of not even one of the 16 partitions
> fits in memory?
It will OOM if any one of them overflows. The grace hash-join works really
well when the memory side is a primary key and doesn't have these sorts of
skews. The real problem with the shuffle hash-join is that it amplifies the
skews, since it distributes rows on the hashcode of the join keys.

> But increasing the partitions to 100 also did not solve the problem.
> (This is in the case of 3G container size and 5G small table size.
> I have given a high value for hive.auto.convert.join.noconditionaltask.size
> so that the broadcast hash join path is picked.)

Unfortunately, it will only start spilling after it overflows
"hive.auto.convert.join.noconditionaltask.size" or 55% of the heap.

set hive.mapjoin.followby.gby.localtask.max.memory.usage=0.1;

    // Get the total available memory from memory manager
    long totalMapJoinMemory = desc.getMemoryNeeded();
    LOG.info("Memory manager allocates " + totalMapJoinMemory
        + " bytes for the loading hashtable.");
    if (totalMapJoinMemory <= 0) {
      totalMapJoinMemory = HiveConf.getLongVar(
          hconf, HiveConf.ConfVars.HIVECONVERTJOINNOCONDITIONALTASKTHRESHOLD);
    }

    long processMaxMemory =
        ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    if (totalMapJoinMemory > processMaxMemory) {
      float hashtableMemoryUsage = HiveConf.getFloatVar(
          hconf, HiveConf.ConfVars.HIVEHASHTABLEFOLLOWBYGBYMAXMEMORYUSAGE);
      LOG.warn("totalMapJoinMemory value of " + totalMapJoinMemory
          + " is greater than the max memory size of " + processMaxMemory);
      // Don't want to attempt to grab more memory than we have
      // available .. percentage is a bit arbitrary
      totalMapJoinMemory = (long) (processMaxMemory * hashtableMemoryUsage);
    }

If you can devise an example using the TPC-DS schema on hive-testbench, I can
run it locally and tell you what's happening.

Cheers,
Gopal
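P.S. The skew amplification is easy to see in isolation. The toy Java sketch
below is not Hive code (SkewSketch and partitionFor are made-up names, and the
key distribution is just an assumption for illustration); it only buckets
build-side join keys by their hashcode, the way a partitioned hash-join
distributes rows. All rows for one hot key land in a single partition, so
raising the partition count from 16 to 100 never shrinks that partition's
hash table.

    import java.util.Arrays;

    // Toy sketch only -- not Hive's implementation; names are hypothetical.
    public class SkewSketch {

      // Distribute on the hashcode of the join key.
      static int partitionFor(Object joinKey, int numPartitions) {
        return (joinKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }

      public static void main(String[] args) {
        int numPartitions = 16;          // try 100: the hot partition stays just as big
        long[] rowsPerPartition = new long[numPartitions];

        String hotKey = "popular_customer_id";   // assumed skewed key
        for (int i = 0; i < 1_000_000; i++) {
          rowsPerPartition[partitionFor(hotKey, numPartitions)]++;       // 1M rows, one key
          rowsPerPartition[partitionFor("key-" + i, numPartitions)]++;   // 1M rows, distinct keys
        }

        System.out.println(Arrays.toString(rowsPerPartition));
        // One partition ends up with ~1M more rows than the rest; if that
        // partition's hash table cannot fit in memory, the join OOMs no
        // matter how many other partitions spill cleanly.
      }
    }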