[jira] [Work logged] (HIVE-24207) LimitOperator can leverage ObjectCache to bail out quickly

ASF GitHub Bot (Jira) Thu, 10 Dec 2020 01:24:34 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-24207?focusedWorklogId=522631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-522631
 ]


ASF GitHub Bot logged work on HIVE-24207:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 10/Dec/20 09:23
            Start Date: 10/Dec/20 09:23
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on a change in pull request #1556:
URL: https://github.com/apache/hive/pull/1556#discussion_r540004959



##########
File path: ql/src/test/queries/clientpositive/authorization_view_1.q
##########
@@ -1,5 +1,6 @@
 --! qt:dataset:src
 set hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider;
+set hive.exec.reducers.max=1;

Review comment:
       Yes. This patch changes the behavior of every query that has a LIMIT: 
any such query that runs with more than one reducer is subject to intermittent 
result changes, because reducers now exit early.
   That is what I saw in early precommit runs, and I didn't want to introduce 
ORDER BY clauses, hence hive.exec.reducers.max=1 in this test.
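
To illustrate the point above, here is a minimal sketch of why a LIMIT without
ORDER BY can return different rows once more than one reducer contributes. It
is not Hive code: the class LimitOrderSketch, its methods, and the row values
are hypothetical, and the two "runs" only model two equally valid arrival
orders of reducer output.

```java
import java.util.ArrayList;
import java.util.List;

public class LimitOrderSketch {
    /** Concatenates two reducer outputs in the order they happened to arrive. */
    static List<String> concat(List<String> first, List<String> second) {
        List<String> out = new ArrayList<>(first);
        out.addAll(second);
        return out;
    }

    /** Keeps the first `limit` rows in arrival order (no ORDER BY involved). */
    static List<String> limit(List<String> arrivals, int limit) {
        return new ArrayList<>(arrivals.subList(0, Math.min(limit, arrivals.size())));
    }

    public static void main(String[] args) {
        // Hypothetical outputs of two reducers.
        List<String> reducer1 = List.of("a1", "a2");
        List<String> reducer2 = List.of("b1", "b2");

        // Either reducer may deliver its rows first; both orders are legal.
        System.out.println(limit(concat(reducer1, reducer2), 2)); // [a1, a2]
        System.out.println(limit(concat(reducer2, reducer1), 2)); // [b1, b2]

        // With a single reducer there is only one arrival order.
        System.out.println(limit(reducer1, 2)); // [a1, a2]
    }
}
```

Both two-reducer results satisfy LIMIT 2, so a golden-file test can only be
stable if the order is fixed (ORDER BY) or only one reducer runs.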

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java
##########
@@ -19,11 +19,18 @@
 package org.apache.hadoop.hive.ql.exec;
 
 import java.io.Serializable;
+import java.util.concurrent.Callable;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.ql.CompilationOpContext;
+import org.apache.hadoop.hive.ql.exec.tez.DagUtils;
+import org.apache.hadoop.hive.ql.exec.tez.LlapObjectCache;
 import org.apache.hadoop.hive.ql.metadata.HiveException;
 import org.apache.hadoop.hive.ql.plan.LimitDesc;
+import org.apache.hadoop.hive.ql.plan.OperatorDesc;

Review comment:
       I'll do




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 522631)
    Time Spent: 1h  (was: 50m)

> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
>                 Key: HIVE-24207
>                 URL: https://issues.apache.org/jira/browse/HIVE-24207
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> {noformat}
> select  ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in 
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk 
> limit 100;
>  select distinct ss_sold_date_sk from store_sales, date_dim where 
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = 
> date_dim.d_date_sk limit 100;
>  {noformat}
> Queries like the above generate a large number of map tasks. Currently they 
> don't bail out after generating enough data. 
> It would be good to use ObjectCache to retain the number of records 
> generated, so that LimitOperator/VectorLimitOperator can bail out for the later 
> tasks in the operator's init phase itself. 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
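
The idea in the description can be sketched roughly as follows. This is an
illustration under stated assumptions, not the patch: SharedLimitSketch,
LimitTask, CACHE, and the cache key are hypothetical names, and a plain
ConcurrentHashMap stands in for Hive's ObjectCache. Tasks that share a key
share one counter, so a task that starts after the limit has been reached can
detect that during initialization.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedLimitSketch {
    // Stand-in for a per-JVM object cache. This is NOT Hive's ObjectCache API.
    static final ConcurrentMap<String, AtomicInteger> CACHE = new ConcurrentHashMap<>();

    /** Hypothetical limit operator instance belonging to one task. */
    static final class LimitTask {
        private final int limit;
        private final AtomicInteger emitted; // shared by all tasks using the same key
        private final boolean doneAtInit;

        LimitTask(String cacheKey, int limit) {
            this.limit = limit;
            // Fetch the shared counter, creating it on first use.
            this.emitted = CACHE.computeIfAbsent(cacheKey, k -> new AtomicInteger());
            // Init-phase check: earlier tasks may already have produced enough rows.
            this.doneAtInit = emitted.get() >= limit;
        }

        boolean isDoneAtInit() {
            return doneAtInit;
        }

        /** Returns true if the row is forwarded, false once the limit is reached. */
        boolean process(Object row) {
            if (doneAtInit) {
                return false;
            }
            return emitted.incrementAndGet() <= limit;
        }
    }

    public static void main(String[] args) {
        LimitTask first = new LimitTask("query-1/limit", 100);
        int forwarded = 0;
        for (int i = 0; i < 1000; i++) {
            if (first.process(i)) {
                forwarded++;
            }
        }
        // A task started later sees the shared count and can bail out at once.
        LimitTask later = new LimitTask("query-1/limit", 100);
        System.out.println(forwarded + " " + later.isDoneAtInit()); // 100 true
    }
}
```

An atomic counter keeps the count correct when several tasks in the same JVM
forward rows concurrently. Which tasks contribute rows still depends on
timing.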



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
