[ https://issues.apache.org/jira/browse/HIVE-25501?focusedWorklogId=647261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-647261 ]

ASF GitHub Bot logged work on HIVE-25501:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Sep/21 09:19
            Start Date: 07/Sep/21 09:19
    Worklog Time Spent: 10m 
      Work Description: abstractdog commented on a change in pull request #2620:
URL: https://github.com/apache/hive/pull/2620#discussion_r703336272



##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##########
@@ -541,6 +570,69 @@ public Map read(Kryo kryo, Input input, Class<? extends Map> type) {
     }
   }
 
+  /**
+   * We use a custom {@link com.esotericsoftware.kryo.Serializer} for {@link MapWork} objects e.g. in
+   * order to remove useless properties in execution time.
+   */
+  private static class MapWorkSerializer extends FieldSerializer<MapWork> {
+
+    public MapWorkSerializer(Kryo kryo, Class type) {
+      super(kryo, type);
+    }
+
+    @Override
+    public void write(Kryo kryo, Output output, MapWork mapWork) {
+      filterMapworkProperties(kryo, mapWork);
+      super.write(kryo, output, mapWork);
+    }
+
+    private void filterMapworkProperties(Kryo kryo, MapWork mapWork) {
+      Configuration configuration = ((KryoWithHooks) kryo).getConf();
+      if (configuration == null || HiveConf
+          .getVar(configuration, HiveConf.ConfVars.HIVE_PLAN_MAPWORK_SERIALIZATION_SKIP_PROPERTIES).isEmpty()) {
+        return;
+      }
+      String[] filterProps =
+          HiveConf.getVar(configuration, HiveConf.ConfVars.HIVE_PLAN_MAPWORK_SERIALIZATION_SKIP_PROPERTIES).split(",");
+      for (String prop : filterProps) {
+        boolean isRegex = isRegex(prop);
+        LOG.debug("Trying to filter mapwork properties (regex: " + isRegex + "): " + prop);
+
+        for (Entry<Path, PartitionDesc> partDescEntry : mapWork.getPathToPartitionInfo().entrySet()) {
+          /*
+           * remove by regex, could be a bit more expensive because of iterating and matching regexes
+           * e.g.: in case of impala_intermediate_stats_chunk1, impala_intermediate_stats_chunk2, user only needs to
+           * configure impala_intermediate_stats_chunk.*

Review comment:
       sure
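The code comment in the diff describes removing partition properties either by exact name or by regex (e.g. configuring `impala_intermediate_stats_chunk.*` instead of listing every chunk). As a rough illustration only, here is a minimal, self-contained Java sketch of that filtering idea applied to a plain `java.util.Properties` object. It is not the patch's code: the class name `PartitionPropertyFilterSketch`, the `isRegex` heuristic (the patch's own `isRegex` helper is not shown in the diff), and the trimming of comma-separated entries are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch; not the HIVE-25501 implementation.
final class PartitionPropertyFilterSketch {

  // Assumed heuristic: an entry containing any of these characters is treated as a regex.
  // Note that a literal name with a '.' (e.g. "serialization.format") also takes the regex
  // path; that is harmless here because '.' still matches itself.
  private static final String REGEX_META = ".*+?[](){}|^$\\";

  static boolean isRegex(String entry) {
    return entry.chars().anyMatch(c -> REGEX_META.indexOf(c) >= 0);
  }

  // Removes every key named by the comma-separated skipList from props:
  // literal entries by a single lookup, regex entries by scanning all keys.
  static void filterProperties(Properties props, String skipList) {
    if (skipList == null || skipList.isEmpty()) {
      return; // nothing configured, keep all properties
    }
    for (String rawEntry : skipList.split(",")) {
      String entry = rawEntry.trim(); // trimming whitespace is an assumption
      if (entry.isEmpty()) {
        continue;
      }
      if (isRegex(entry)) {
        Pattern pattern = Pattern.compile(entry);
        // Collect matching keys first, then remove them.
        List<String> toRemove = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
          if (pattern.matcher(key).matches()) {
            toRemove.add(key);
          }
        }
        for (String key : toRemove) {
          props.remove(key);
        }
      } else {
        props.remove(entry);
      }
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("impala_intermediate_stats_chunk1", "...");
    props.setProperty("impala_intermediate_stats_chunk2", "...");
    props.setProperty("numRows", "42");
    props.setProperty("columns", "id,name");
    filterProperties(props, "impala_intermediate_stats_chunk.*,numRows");
    System.out.println(props.stringPropertyNames()); // prints [columns]
  }
}
```

The sketch shows why the code comment calls regex removal "a bit more expensive": a literal entry costs one hash lookup per partition, while a regex entry has to be matched against every property key of every partition.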




-- 
This is an automated message from the Apache Git Service.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 647261)
    Time Spent: 50m  (was: 40m)

> Provide a filter for removing useless properties from PartitionDesc objects before MapWork serialization
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25501
>                 URL: https://issues.apache.org/jira/browse/HIVE-25501
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is due to performance considerations. When a large number of partitions
> is present in MapWork, serializing useless properties (coming from the metastore
> as partition metadata) can become a bottleneck, which can even lead to an
> OOM in the Tez AM if the DAG plan becomes large.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
