[ https://issues.apache.org/jira/browse/HIVE-24962?focusedWorklogId=580903&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580903 ]
ASF GitHub Bot logged work on HIVE-24962:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Apr/21 11:34
            Start Date: 12/Apr/21 11:34
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2137:
URL: https://github.com/apache/hive/pull/2137#discussion_r611550383


##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DynamicPartitionPruner.java
##########
@@ -514,4 +569,29 @@ private boolean checkForSourceCompletion(String name) {
     }
     return false;
   }
+
+  /**
+   * Recursively replaces the ExprNodeDynamicListDesc to the list of the actual values. As a result of this call the
+   * original expression is modified so it can be used for pushing down to the TableScan for filtering the data at the
+   * source.
+   * <p>
+   * Please make sure to clone the predicate if needed since the original node will be modified.
+   * @param node The node we are traversing
+   * @param dynArgs The constant values we are substituting
+   */
+  private void replaceDynamicLists(ExprNodeDesc node, Collection<ExprNodeConstantDesc> dynArgs) {
+    List<ExprNodeDesc> children = node.getChildren();
+    if (children != null && !children.isEmpty()) {
+      ListIterator<ExprNodeDesc> iterator = node.getChildren().listIterator();
+      while (iterator.hasNext()) {
+        ExprNodeDesc child = iterator.next();
+        if (child instanceof ExprNodeDynamicListDesc) {
+          iterator.remove();
+          dynArgs.forEach(iterator::add);

Review comment:
   This is where my knowledge might be insufficient, but we have these `SourceInfo` objects we try to evaluate. The `SourceInfo` has the following attributes:
   ```
   public final ExprNodeDesc partKey;
   public final ExprNodeDesc predicate; // <- don't mind this as this is created by me
   public final Deserializer deserializer;
   public final StructObjectInspector soi;
   public final StructField field;
   public final ObjectInspector fieldInspector;
   /* List of partitions that are required - populated from processing each event */
   public Set<Object> values = new HashSet<Object>();
   /* Whether to skipPruning - depends on the payload from an event which may signal skip - if the event payload is too large */
   public AtomicBoolean skipPruning = new AtomicBoolean();
   public final String columnName;
   public final String columnType;
   private boolean mustKeepOnePartition;
   ```
   This suggests to me that every `SourceInfo` is only related to a single column. I added the `predicate` at the same place where the other fields are added, so I think the `predicate` should only contain a single column too. Also, I have run tests with `a=L1 and b=L2` conditions and found that the condition is split into multiple expressions and `SourceInfo` objects - but that was only a single test, so...

   Maybe we should return false in `addDynamicSplitPruningEdge` if we have multiple columns so we can bail out early?
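For illustration, a minimal, hypothetical sketch of the "bail out early" check suggested above: it assumes the predicate is a plain `ExprNodeDesc` tree and simply counts the distinct columns it references. The class and method names here are made up for the example and are not part of the actual patch; only `ExprNodeDesc.getChildren()` and `ExprNodeColumnDesc.getColumn()` are existing Hive APIs.

```java
// Hypothetical sketch only - not part of the PR. Counts the distinct columns referenced by a
// predicate so that addDynamicSplitPruningEdge (or similar) could return false when more than
// one column is involved, before any ExprNodeDynamicListDesc substitution is attempted.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;

public class SingleColumnPredicateCheck {

  /** Returns true when the expression tree references at most one distinct column. */
  static boolean referencesAtMostOneColumn(ExprNodeDesc predicate) {
    Set<String> columns = new HashSet<>();
    collectColumns(predicate, columns);
    return columns.size() <= 1;
  }

  /** Walks the expression tree and records every column name it finds. */
  private static void collectColumns(ExprNodeDesc node, Set<String> columns) {
    if (node instanceof ExprNodeColumnDesc) {
      columns.add(((ExprNodeColumnDesc) node).getColumn());
    }
    List<ExprNodeDesc> children = node.getChildren();
    if (children != null) {
      for (ExprNodeDesc child : children) {
        collectColumns(child, columns);
      }
    }
  }
}
```

With a helper along these lines, a multi-column predicate could be rejected up front, which matches the "bail out early" idea; whether that restriction is actually needed depends on whether a `SourceInfo` can ever carry a multi-column predicate, which is exactly the open question in the comment.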
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 580903)
    Time Spent: 3h 20m  (was: 3h 10m)

> Enable partition pruning for Iceberg tables
> -------------------------------------------
>
>                 Key: HIVE-24962
>                 URL: https://issues.apache.org/jira/browse/HIVE-24962
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> We should enable partition pruning above Iceberg tables.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)