[jira] [Work logged] (HIVE-22221) Llap external client - Need to reduce LlapBaseInputFormat#getSplits() footprint

ASF GitHub Bot (Jira) Tue, 24 Sep 2019 16:15:05 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-22221?focusedWorklogId=317897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-317897
 ]


ASF GitHub Bot logged work on HIVE-22221:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Sep/19 23:14
            Start Date: 24/Sep/19 23:14
    Worklog Time Spent: 10m 
      Work Description: jdere commented on pull request #778: HIVE-22221: Llap 
external client - Need to reduce LlapBaseInputFormat#getSplits() footprint
URL: https://github.com/apache/hive/pull/778#discussion_r327840568
 
 

 ##########
 File path: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java
 ##########
 @@ -192,9 +199,31 @@ public PlanFragment(TezWork work, Schema schema, JobConf jc) {
 
   @Override
   public void process(Object[] arguments) throws HiveException {
+    initArgs(arguments);
+    try {
+      SplitResult splitResult = getSplitResult(false);
+      InputSplit[] splits = schemaSplitOnly ? new InputSplit[]{splitResult.schemaSplit} : splitResult.actualSplits;
+      for (InputSplit s : splits) {
+        Object[] os = new Object[1];
+        bos.reset();
+        s.write(dos);
+        byte[] frozen = bos.toByteArray();
+        os[0] = frozen;
+        forward(os);
+      }
+    } catch (Exception e) {
+      throw new HiveException(e);
+    }
+  }
 
-    String query = stringOI.getPrimitiveJavaObject(arguments[0]);
-    int num = intOI.get(arguments[1]);
+  protected void initArgs(Object[] arguments) {
+    inputArgQuery = stringOI.getPrimitiveJavaObject(arguments[0]);
+    inputArgNumSplits = intOI.get(arguments[1]);
+    schemaSplitOnly = inputArgNumSplits == 0;
 
 Review comment:
   Under what circumstances does numSplits actually get set to 0 in real usage?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 317897)
    Time Spent: 20m  (was: 10m)

> Llap external client - Need to reduce LlapBaseInputFormat#getSplits() 
> footprint  
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-22221
>                 URL: https://issues.apache.org/jira/browse/HIVE-22221
>             Project: Hive
>          Issue Type: Bug
>          Components: llap, UDF
>            Reporter: Shubham Chaurasia
>            Assignee: Shubham Chaurasia
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22221.1.patch, HIVE-22221.2.patch, 
> HIVE-22221.3.patch, HIVE-22221.4.patch, HIVE-22221.5.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While querying through the llap external client, LlapBaseInputFormat#getSplits() 
> invokes the get_splits() UDTF (GenericUDTFGetSplits) under the hood.
> GenericUDTFGetSplits returns LlapInputSplit objects in which planBytes[] 
> occupies around 90% of the split size.
> Depending on data size, partitions, and plan, an LlapInputSplit can grow up to 
> 1 MB, with planBytes[], which is common to all the splits, occupying more than 
> 850 KB. This sometimes causes OOM on HS2, depending on the HS2 heap size.
> This can be resolved by separating the common parts from the actual splits and 
> reassembling them on the client side. 
> We can also provide an option that lets the client opt out of automatic 
> reassembly and take control of reassembling into its own hands.
> Splits can be broken up like this:
> 1) schema split
> 2) plan split
> 3) actual split 1
> 4) actual split 2, and so on.
> This greatly reduces memory on the server side (in my case from 5 GB for 
> ~5000 splits to around 15 MB), and hence the data transfer. It also 
> eliminates the OOM on the HS2 side.
> cc [~jdere] [~sankarh] [~thejas]
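
The split breakdown described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Hive implementation: the names SplitReassemblySketch, CompactResult, FullSplit, reassemble, naiveBytes and compactBytes are all hypothetical, and plain byte arrays stand in for the serialized schema, the plan, and the per-split payloads. The sizes used in main are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types exist in Hive under these names.
final class SplitReassemblySketch {

    /** Stand-in for a fully assembled split: shared plan plus per-split payload. */
    static final class FullSplit {
        final byte[] planBytes;
        final byte[] fragmentBytes;

        FullSplit(byte[] planBytes, byte[] fragmentBytes) {
            this.planBytes = planBytes;
            this.fragmentBytes = fragmentBytes;
        }
    }

    /** What the server would send: schema once, plan once, then light per-split payloads. */
    static final class CompactResult {
        final byte[] schemaBytes;
        final byte[] planBytes;
        final List<byte[]> fragments;

        CompactResult(byte[] schemaBytes, byte[] planBytes, List<byte[]> fragments) {
            this.schemaBytes = schemaBytes;
            this.planBytes = planBytes;
            this.fragments = fragments;
        }
    }

    /** Client-side reassembly: every split references the single shared plan array. */
    static List<FullSplit> reassemble(CompactResult result) {
        List<FullSplit> splits = new ArrayList<>(result.fragments.size());
        for (byte[] fragment : result.fragments) {
            splits.add(new FullSplit(result.planBytes, fragment));
        }
        return splits;
    }

    /** Bytes transferred if the plan were serialized into every split. */
    static long naiveBytes(CompactResult result) {
        long total = 0;
        for (byte[] fragment : result.fragments) {
            total += result.planBytes.length + fragment.length;
        }
        return total;
    }

    /** Bytes transferred if schema and plan are each sent only once. */
    static long compactBytes(CompactResult result) {
        long total = result.schemaBytes.length + result.planBytes.length;
        for (byte[] fragment : result.fragments) {
            total += fragment.length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 5000 small fragments and one large shared plan.
        List<byte[]> fragments = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            fragments.add(new byte[100]);
        }
        CompactResult result =
            new CompactResult(new byte[1024], new byte[850 * 1024], fragments);

        System.out.println("plan copied into every split: " + naiveBytes(result) + " bytes");
        System.out.println("plan sent once:               " + compactBytes(result) + " bytes");
        System.out.println("reassembled splits:           " + reassemble(result).size());
    }
}
```

In this sketch the reassembled splits all point at the same plan array, so the client holds one copy of the plan rather than one per split. A client that opts out of reassembly would simply consume the CompactResult parts directly.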



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
