[ 
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715528#comment-14715528
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-11652:
----------------------------------------------------------

[~jcamachorodriguez] You may want to look at the latest patch of HIVE-11341 
where a similar issue is being looked into by me.

{code}
// While all its descendants have not been dispatched,
// we do not move forward
while(!childrenDispatched) {
  for (Node childNode : nd.getChildren()) {
     walk(childNode);
   }
   childrenDispatched = getDispatchedList().containsAll(nd.getChildren());
 }
{code}

In the above change, I see that you are using the system stack here instead of 
toWalk list to traverse the children nodes. 
However a couple of points :
1. Why do you need the 'while(!childrenDispatched)' condition at all. Wont the 
below code work? 
{code}
 if (!getDispatchedList().contains(childNode)) {
  walk(childNode). 
 }
{code}

2. we should make sure that walk() is not called again for a node already 
dispatched, we should make sure that this logic is present in startWalking as 
well
{code}
  public void startWalking(Collection<Node> startNodes,
      HashMap<Node, Object> nodeOutput) throws SemanticException {
    for Node (nd : startNodes) {
      // If already dispatched list, continue.
      if (getDispatchedList().contains(nd)) {
        continue;
       }
      walk(nd);
      if (nodeOutput != null && getDispatchedList().contains(nd)) {
        nodeOutput.put(nd, retMap.get(nd));
      }
    }
  }
{code}

In short, if you are using the system stack via recursion you can eliminate the 
use of toWalk data structure all together for DefaultGraphWalker.
Otherwise, you can replace toWalk to be a stack instead of an arraylist.

Thanks
Hari

> Avoid expensive call to removeAll in DefaultGraphWalker
> -------------------------------------------------------
>
>                 Key: HIVE-11652
>                 URL: https://issues.apache.org/jira/browse/HIVE-11652
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Physical Optimizer
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line 
> 140) will take very long as it will have to go through the list looking for 
> each of the nodes. We try to get rid of this call by rewriting the logic in 
> the walker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to