Zheng Shao created HIVE-11414:
---------------------------------

             Summary: Fix OOM in MapTask with many input partitions by making 
ColumnarSerDeBase's cachedLazyStruct weakly referenced
                 Key: HIVE-11414
                 URL: https://issues.apache.org/jira/browse/HIVE-11414
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
    Affects Versions: 1.2.0, 0.13.1, 0.14.0, 0.12.0, 0.11.0
            Reporter: Zheng Shao
            Priority: Minor


MapTask hit OOM in the following situation in our production environment:
* src: 2048 partitions, each with 1 file of about 2MB using RCFile format
* query: INSERT OVERWRITE TABLE tgt SELECT * FROM src
* Hadoop version: occurs on both CDH 4.7 using MR1 and CDH 5.4.1 using YARN.
* MapTask memory Xmx: 1.5GB

By analyzing the heap dump with jhat, we found the cause:
* A single mapper processes many partitions (because of 
CombineHiveInputFormat).
* Each input path (equivalent to a partition here) constructs its own SerDe.
* Each SerDe caches its deserialized object and tries to reuse it, but never 
releases it. In this case, serde2.columnar.ColumnarSerDeBase has a field 
cachedLazyStruct that can take a lot of space: roughly the last N rows of a 
file, where N is the number of rows in a columnar block.
* This problem may exist in other SerDes as well, but columnar file formats are 
affected the most because they need a bigger cache for the last N rows instead 
of 1 row.
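
To put the numbers above in perspective, here is a rough back-of-the-envelope 
sketch (the class and method names are hypothetical, not Hive code). It 
assumes, for simplicity, that the whole heap is available to the per-SerDe 
caches, which overstates the real budget:

```java
/** Hypothetical helper for illustration only; not part of Hive. */
final class CacheBudget {
    /**
     * Average number of bytes each SerDe cache may retain before the
     * caches alone fill the heap (simplifying assumption: nothing else
     * lives on the heap).
     */
    static long perSerDeBudgetBytes(long heapBytes, int serDeCount) {
        return heapBytes / serDeCount;
    }
}
```

With Xmx of 1.5GB and 2048 SerDes (one per partition), 
CacheBudget.perSerDeBudgetBytes(1536L * 1024 * 1024, 2048) gives 786432 bytes, 
i.e. 0.75MB per SerDe on average.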

Proposed solution:
* Make cachedLazyStruct a weakly referenced object.  Make similar changes to 
other columnar SerDes, if any (e.g. possibly ORCFile's SerDe as well).
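
A minimal sketch of what a weakly referenced cache could look like, using 
java.lang.ref.WeakReference.  This is not the actual patch: WeakCache and its 
factory parameter are hypothetical names, and the real ColumnarSerDeBase field 
would need its own construction logic in place of the factory.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

/** Hypothetical holder for illustration only; not part of Hive. */
final class WeakCache<T> {
    private final Supplier<T> factory;
    // Starts out empty: a WeakReference to null always returns null from get().
    private WeakReference<T> ref = new WeakReference<>(null);

    WeakCache(Supplier<T> factory) {
        this.factory = factory;
    }

    /**
     * Returns the cached object, rebuilding it if it was never created
     * or if the garbage collector has already cleared the reference.
     */
    T get() {
        T value = ref.get();
        if (value == null) {
            value = factory.get();
            ref = new WeakReference<>(value);
        }
        // The caller's local variable keeps the object strongly reachable
        // for as long as the caller is using it.
        return value;
    }
}
```

The trade-off is that the garbage collector may clear the reference whenever 
no strong reference to the cached object remains, so the object may be rebuilt 
more often than with a plain field.  In exchange, a SerDe that is no longer 
actively deserializing does not pin its cache in memory.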

Alternative solutions:
* We can also free the whole SerDe after processing a block/file.  The 
problem is that an input split may contain multiple blocks/files 
that map to the same SerDe, and recreating a SerDe is just more work.
* We can also move SerDe creation/release to the point where the input file 
changes.  But that requires a much bigger change to the code.
* We can also add a "cleanup()" method to the SerDe interface that releases the 
cached object, but that change is not backward compatible with the many SerDes 
that people have written.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
