[ 
https://issues.apache.org/jira/browse/HIVE-4318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629957#comment-13629957
 ] 

Gunther Hagleitner commented on HIVE-4318:
------------------------------------------

[~kevinwilfong]: Here are the additional numbers. 

Summary: You were right about counters having a significant effect despite the 
flag, but OperatorHooks are definitely expensive too.

All tests were run on EC2 in a single-node setup. I used ~3m rows in a single 
table, stored as an RCFile. The query was count(*) with a simple, not very 
selective where clause. I ran each build 5 times and averaged the last 3 runs. 
There was little difference between the runs. hive.task.progress was off in all 
runs, and no actual operator hooks were installed.

I've also tested both removing counters and a fixed version of counters. The 
fixed version places the check for the flag at the right place to avoid 
unnecessary calls to System.currentTimeMillis(), as well as unnecessary 
counting of the rows, etc.
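
To illustrate what "placing the check at the right place" means, here is a minimal sketch. It is not the code from the patch: the class, field, and method names are hypothetical, and `countersEnabled` merely stands in for the progress flag. The point is only that the flag is tested before any per-row work is done.

```java
// Hypothetical sketch, not Hive's actual Operator code.
class CounterSketch {
    // Stands in for the progress/counter flag (assumed name).
    private final boolean countersEnabled;
    private long rows;
    private long elapsedMillis;
    private long start;

    CounterSketch(boolean countersEnabled) {
        this.countersEnabled = countersEnabled;
    }

    // Called before each row is processed.
    void preProcess() {
        if (!countersEnabled) {
            return; // flag checked first: no row counting, no clock call
        }
        rows++;
        start = System.currentTimeMillis();
    }

    // Called after each row is processed.
    void postProcess() {
        if (!countersEnabled) {
            return; // same early exit on the way out
        }
        elapsedMillis += System.currentTimeMillis() - start;
    }

    long rows() {
        return rows;
    }

    long elapsedMillis() {
        return elapsedMillis;
    }
}
```

With the flag off, each call costs one boolean test; the counting and the two System.currentTimeMillis() calls per row are skipped entirely.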

Numbers:

{noformat}
Current trunk: 44.5 seconds
Fix for counters, unchanged operator hooks: 33.5 seconds (Kevin, that's the run 
you asked for)
Fix for counters, removal of operator hooks: 29.3 seconds
Removal of both operator hooks and counters completely: 27.9 seconds
{noformat}
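
Read as deltas, the counter fix saves 11.0 seconds over trunk, removing operator hooks saves another 4.2 seconds, and removing counters completely saves a further 1.4 seconds. The small sketch below just redoes that arithmetic from the four timings above; the class and method names are made up for illustration.

```java
// Hypothetical helper that recomputes the deltas from the reported timings.
class SpeedupSketch {
    // Seconds saved going from one build to the next.
    static double saved(double before, double after) {
        return before - after;
    }

    // Saved time expressed as a percentage of a baseline run.
    static double percentOf(double baseline, double before, double after) {
        return 100.0 * (before - after) / baseline;
    }

    public static void main(String[] args) {
        double trunk = 44.5;          // current trunk
        double fixedCounters = 33.5;  // counter fix, hooks unchanged
        double noHooks = 29.3;        // counter fix, hooks removed
        double nothing = 27.9;        // hooks and counters removed

        System.out.printf("counter fix: %.1f s (%.1f%% of trunk)%n",
                saved(trunk, fixedCounters), percentOf(trunk, trunk, fixedCounters));
        System.out.printf("hook removal: %.1f s (%.1f%% of trunk)%n",
                saved(fixedCounters, noHooks), percentOf(trunk, fixedCounters, noHooks));
        System.out.printf("counter removal: %.1f s (%.1f%% of trunk)%n",
                saved(noHooks, nothing), percentOf(trunk, noHooks, nothing));
    }
}
```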

Proposal:

- Remove operator hooks and backport to 0.11 branch. That's a regression that 
was introduced between 0.10 and 0.11, I believe.
- Remove profiler for now and backport to 0.11 branch. Profiler doesn't work 
without operator hooks right now. I'll open a jira to re-introduce profiler in 
a way that doesn't add any code to the inner loop (maybe hidden behind a static 
final variable that is false, so the compiler removes it).
- Counters: Change this patch to include my fix for counters and backport to 
0.11. This gives us a significant boost, but isn't a regression from the last 
version. I'll open a jira to dig deeper and see if we can get even closer to 
the result with the counters completely removed.
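
For the profiler idea, this is roughly the pattern I have in mind, as a sketch only. All names here are hypothetical and nothing below is existing Hive code. Because `PROFILING` is a compile-time constant, javac typically leaves the guarded statements out of the bytecode, so the inner loop carries no hook calls when the flag is false.

```java
// Hypothetical sketch of hooks hidden behind a compile-time constant.
class ProfiledOperatorSketch {
    // Compile-time constant; flipping it requires a rebuild (assumed design).
    static final boolean PROFILING = false;

    // Counts hook invocations, only to make the effect observable.
    static long hookCalls = 0;

    long processed = 0;

    void process(Object row) {
        if (PROFILING) {
            hookCalls++; // stands in for an "enter" hook
        }
        processed++;     // stands in for the real per-row work
        if (PROFILING) {
            hookCalls++; // stands in for an "exit" hook
        }
    }
}
```

The trade-off is that profiling can then only be switched on by rebuilding, not by a runtime setting.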

How does that sound?
                
> OperatorHooks hit performance even when not used
> ------------------------------------------------
>
>                 Key: HIVE-4318
>                 URL: https://issues.apache.org/jira/browse/HIVE-4318
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Ubuntu LXC (64 bit)
>            Reporter: Gopal V
>            Assignee: Gunther Hagleitner
>         Attachments: HIVE-4318.1.patch
>
>
> Operator Hooks inserted into Operator.java cause a performance hit even when 
> they are not being used.
> Measured with a count(1) query, with and without the operator hook calls:
> {code:title=with}
> 2013-04-09 07:33:58,920 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 84.07 sec
> Total MapReduce CPU Time Spent: 1 minutes 24 seconds 70 msec
> OK
> 28800991
> Time taken: 40.407 seconds, Fetched: 1 row(s)
> {code}
> {code:title=without}
> 2013-04-09 07:33:02,355 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 68.48 sec
> ...
> Total MapReduce CPU Time Spent: 1 minutes 8 seconds 480 msec
> OK
> 28800991
> Time taken: 35.907 seconds, Fetched: 1 row(s)
> {code}
> The effect is multiplied by the number of operators in the pipeline that have 
> to forward the row: the more operators there are, the slower the query.
> The modification made to test this was:
> {code:title=Operator.java}
> --- ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> +++ ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java
> @@ -526,16 +526,16 @@ public void process(Object row, int tag) throws 
> HiveException {
>        return;
>      }
>      OperatorHookContext opHookContext = new OperatorHookContext(this, row, 
> tag);
> -    preProcessCounter();
> -    enterOperatorHooks(opHookContext);
> +    //preProcessCounter();
> +    //enterOperatorHooks(opHookContext);
>      processOp(row, tag);
> -    exitOperatorHooks(opHookContext);
> -    postProcessCounter();
> +    //exitOperatorHooks(opHookContext);
> +    //postProcessCounter();
>    }
> {code}
