[ https://issues.apache.org/jira/browse/HIVE-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shubham Chaurasia updated HIVE-22221:
-------------------------------------
    Attachment: HIVE-22221.2.patch

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-22221
>                 URL: https://issues.apache.org/jira/browse/HIVE-22221
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, UDF
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22221.1.patch, HIVE-22221.2.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While querying through the LLAP external client,
> LlapBaseInputFormat#getSplits() invokes the get_splits()
> (GenericUDTFGetSplits) UDTF under the hood.
> GenericUDTFGetSplits returns LlapInputSplit objects in which planBytes[]
> occupies around 90% of the split size.
> Depending on data size/partitions and plan, an LlapInputSplit can grow up to
> 1 MB, with planBytes[] being common to all the splits and occupying more than
> 850 KB. This sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating out the common parts from the actual
> splits and reassembling them on the client side.
> We can also provide an option where the client says it does not want the
> splits reassembled and takes control of reassembling into its own hands.
> Splits can be broken up like:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2 ... and so on.
> This greatly reduces the memory used on the server side (in my case from
> 5 GB for ~5000 splits to around 15 MB) and hence the data transfer. It also
> eliminates the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
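
As a rough illustration of the split layout proposed in the description (one schema split, one plan split, then the actual splits), here is a minimal Python sketch. It does not use Hive's classes or the attached patch: `FullSplit`, `separate`, and `reassemble` are hypothetical names, and the byte sizes are toy values chosen only to show why sending the shared plan once shrinks the payload.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FullSplit:
    """Hypothetical stand-in for a self-contained split (not Hive's LlapInputSplit)."""
    plan_bytes: bytes      # identical across all splits of one query
    schema_bytes: bytes    # identical across all splits of one query
    fragment_bytes: bytes  # the part that is unique to this split


def separate(splits: List[FullSplit]) -> List[bytes]:
    """Server side: emit the schema once, the plan once, then one entry per split."""
    if not splits:
        return []
    first = splits[0]
    return [first.schema_bytes, first.plan_bytes] + [s.fragment_bytes for s in splits]


def reassemble(parts: List[bytes]) -> List[FullSplit]:
    """Client side: re-attach the shared schema and plan to every actual split."""
    if not parts:
        return []
    schema, plan, fragments = parts[0], parts[1], parts[2:]
    return [FullSplit(plan, schema, f) for f in fragments]


# Toy numbers only: an 850-byte "plan" shared by 5 splits of 100 bytes each.
plan = b"p" * 850
schema = b"s" * 50
originals = [FullSplit(plan, schema, bytes([i]) * 100) for i in range(5)]

wire_parts = separate(originals)
size_before = sum(
    len(s.plan_bytes) + len(s.schema_bytes) + len(s.fragment_bytes) for s in originals
)
size_after = sum(len(p) for p in wire_parts)
restored = reassemble(wire_parts)
```

A client that opts out of reassembly would simply consume `wire_parts` directly and attach the plan and schema itself when it needs them.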