[ https://issues.apache.org/jira/browse/HIVE-25501?focusedWorklogId=647261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-647261 ]
ASF GitHub Bot logged work on HIVE-25501:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 07/Sep/21 09:19
            Start Date: 07/Sep/21 09:19
    Worklog Time Spent: 10m
      Work Description:
abstractdog commented on a change in pull request #2620:
URL: https://github.com/apache/hive/pull/2620#discussion_r703336272

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java
##########
@@ -541,6 +570,69 @@ public Map read(Kryo kryo, Input input, Class<? extends Map> type) {
     }
   }
 
+  /**
+   * We use a custom {@link com.esotericsoftware.kryo.Serializer} for {@link MapWork} objects e.g. in
+   * order to remove useless properties in execution time.
+   */
+  private static class MapWorkSerializer extends FieldSerializer<MapWork> {
+
+    public MapWorkSerializer(Kryo kryo, Class type) {
+      super(kryo, type);
+    }
+
+    @Override
+    public void write(Kryo kryo, Output output, MapWork mapWork) {
+      filterMapworkProperties(kryo, mapWork);
+      super.write(kryo, output, mapWork);
+    }
+
+    private void filterMapworkProperties(Kryo kryo, MapWork mapWork) {
+      Configuration configuration = ((KryoWithHooks) kryo).getConf();
+      if (configuration == null || HiveConf
+          .getVar(configuration, HiveConf.ConfVars.HIVE_PLAN_MAPWORK_SERIALIZATION_SKIP_PROPERTIES).isEmpty()) {
+        return;
+      }
+      String[] filterProps =
+          HiveConf.getVar(configuration, HiveConf.ConfVars.HIVE_PLAN_MAPWORK_SERIALIZATION_SKIP_PROPERTIES).split(",");
+      for (String prop : filterProps) {
+        boolean isRegex = isRegex(prop);
+        LOG.debug("Trying to filter mapwork properties (regex: " + isRegex + "): " + prop);
+
+        for (Entry<Path, PartitionDesc> partDescEntry : mapWork.getPathToPartitionInfo().entrySet()) {
+          /*
+           * remove by regex, could be a bit more expensive because of iterating and matching regexes
+           * e.g.: in case of impala_intermediate_stats_chunk1, impala_intermediate_stats_chunk2, user only needs to
+           * configure impala_intermediate_stats_chunk.*

Review comment:
sure

--
This is an
automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 647261)
    Time Spent: 50m  (was: 40m)

> Provide a filter for removing useless properties from PartitionDesc objects before MapWork serialization
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25501
>                 URL: https://issues.apache.org/jira/browse/HIVE-25501
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> This is due to performance considerations. When a large number of partitions
> is present in MapWork, serializing useless properties (coming from the metastore
> as partition metadata) can become a bottleneck, and can even lead to
> an OOM in the Tez AM if the DAG plan becomes large.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
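
For readers skimming the quoted diff, the filtering idea it describes (a
comma-separated skip list whose entries are either exact property names or
regexes) can be sketched in isolation roughly as follows. This is not the
code from the pull request: the class name PropertyFilterSketch, the methods
removeMatching and looksLikeRegex, and the rule "an entry is a regex if it
contains '*'" are all assumptions made for illustration, and a plain
Map<String, String> stands in for the PartitionDesc properties.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;

public class PropertyFilterSketch {

  // Assumption: an entry is treated as a regex if it contains '*'.
  // The isRegex helper in the actual patch may use a different rule.
  public static boolean looksLikeRegex(String entry) {
    return entry.contains("*");
  }

  // Removes every key of props that matches an entry of the comma-separated
  // skipList and returns the number of keys removed.
  public static int removeMatching(Map<String, String> props, String skipList) {
    if (skipList == null || skipList.trim().isEmpty()) {
      return 0; // nothing configured, so nothing is filtered
    }
    int removed = 0;
    for (String raw : skipList.split(",")) {
      String entry = raw.trim();
      if (entry.isEmpty()) {
        continue;
      }
      if (looksLikeRegex(entry)) {
        // Regex entries require scanning all keys, which costs more
        // than a direct lookup.
        Pattern pattern = Pattern.compile(entry);
        Iterator<String> keys = props.keySet().iterator();
        while (keys.hasNext()) {
          if (pattern.matcher(keys.next()).matches()) {
            keys.remove();
            removed++;
          }
        }
      } else if (props.containsKey(entry)) {
        // Exact names are removed with a single lookup.
        props.remove(entry);
        removed++;
      }
    }
    return removed;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("impala_intermediate_stats_chunk1", "...");
    props.put("impala_intermediate_stats_chunk2", "...");
    props.put("numRows", "42");
    props.put("location", "/warehouse/t/p=1");

    int removed = removeMatching(props, "impala_intermediate_stats_chunk.*,numRows");
    System.out.println(removed + " removed, remaining keys: " + props.keySet());
  }
}
```

With the sample map above, the single regex entry covers both
impala_intermediate_stats_chunk keys, so only "location" is left. Splitting
exact names from regexes keeps the common case cheap, which matters when the
same filter runs once per partition.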