[ https://issues.apache.org/jira/browse/HIVE-25061?focusedWorklogId=632973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632973 ]
ASF GitHub Bot logged work on HIVE-25061: ----------------------------------------- Author: ASF GitHub Bot Created on: 03/Aug/21 14:44 Start Date: 03/Aug/21 14:44 Worklog Time Spent: 10m Work Description: pgaref commented on a change in pull request #2225: URL: https://github.com/apache/hive/pull/2225#discussion_r681829001 ########## File path: ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java ########## @@ -664,6 +629,84 @@ public Object computeValue(Object row) throws HiveException { */ public abstract boolean isEqual(Object v1, Object v2); + protected int binarySearchBack(int rowId, PTFPartition p, Object sortKey, int amt, + Order order) throws HiveException { + boolean isOrderDesc = order.equals(Order.DESC); + Object rowVal = sortKey; + + int rMin = 0; // tracks lowest possible number fulfilling the range requirement + int rMax = rowId; // tracks highest possible number fulfilling the range requirement + + boolean isDistanceGreater = true; + while (rMin < rMax) { + isDistanceGreater = isDistanceGreater(isOrderDesc ? rowVal : sortKey, isOrderDesc ? sortKey : rowVal, amt); + if (isDistanceGreater) { + rMin = rowId + 1; + } else { + rMax = rowId; + } + rowId = rMin + (rMax - rMin) / 2; + rowVal = computeValueUseCache(rowId, p); + } + return rMin; + } + + private int linearSearchBack(int r, PTFPartition p, Object sortKey, int amt, + Order order) throws HiveException { + boolean isOrderDesc = order.equals(Order.DESC); + Object rowVal = sortKey; + while (r >= 0 && !isDistanceGreater(isOrderDesc ? rowVal : sortKey, + isOrderDesc ? sortKey : rowVal, amt)) { + Pair<Integer, Object> stepResult = skipOrStepBack(r, p); + r = stepResult.getLeft(); + rowVal = stepResult.getRight(); + } + return r + 1; + } + + protected int binarySearchForward(int rowId, PTFPartition p, Object sortKey, int amt, + Order order) throws HiveException { + boolean isOrderDesc = order.equals(Order.DESC); + Object rowVal = sortKey; + int rMin = rowId; // tracks lowest possible number fulfilling the range requirement + int rMax = p.size(); // tracks highest possible number fulfilling the range requirement + + boolean isDistanceGreater = true; + while (rMin < rMax) { + isDistanceGreater = + isDistanceGreater(isOrderDesc ? sortKey : rowVal, isOrderDesc ? rowVal : sortKey, amt); + /* + * if the currently inspected row is the last (p.size() - 1) but we're not far enough to fulfill + * the range requirement (!isDistanceGreater), we cannot move forward, let's simply return p.size(), + * range end is exclusive + */ + if (!isDistanceGreater && rowId == p.size() - 1) { Review comment: The second approach sounds reasonable to me -- just to clarify: is IndexOutOfBoundsException thrown from computeValueUseCache call ? rowVal should be calculated at the beginning of each loop -- for example before ```if (isDistanceGreater)``` -- this could avoid the exception -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ------------------- Worklog Id: (was: 632973) Time Spent: 4h 20m (was: 4h 10m) > PTF: Improve ValueBoundaryScanner > --------------------------------- > > Key: HIVE-25061 > URL: https://issues.apache.org/jira/browse/HIVE-25061 > Project: Hive > Issue Type: Improvement > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Screen Shot 2021-04-27 at 1.02.37 PM.png > > Time Spent: 4h 20m > Remaining Estimate: 0h > > -First, I need to check whether TreeMap is really needed for our case.- > It turned out a binary-ish search approach could help in range calculation, > as we're searching in an ordered set of values. -- This message was sent by Atlassian Jira (v8.3.4#803005)