[ https://issues.apache.org/jira/browse/HIVE-25794?focusedWorklogId=694811&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-694811 ]
ASF GitHub Bot logged work on HIVE-25794:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Dec/21 01:49
            Start Date: 13/Dec/21 01:49
    Worklog Time Spent: 10m
      Work Description: belugabehr commented on a change in pull request #2861:
URL: https://github.com/apache/hive/pull/2861#discussion_r767367650

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java
##########
@@ -113,7 +113,9 @@ private PartitionDesc extractSinglePartSpec(CombineHiveInputSplit hsplit) throws
     for (Path path : hsplit.getPaths()) {
       PartitionDesc otherPart = HiveFileFormatUtils.getFromPathRecursively(
           pathToPartInfo, path, cache);
-      LOG.debug("Found spec for " + path + " " + otherPart + " from " + pathToPartInfo);
+      if (LOG.isDebugEnabled()) {

Review comment:
       I spent a lot of time trying to scrub this behavior from the code. Please do not include the logging guards and just use the anchors `{}`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
            Worklog Id: (was: 694811)
            Time Spent: 1h (was: 50m)

> CombineHiveRecordReader: log statements in a loop leads to memory pressure
> --------------------------------------------------------------------------
>
>                  Key: HIVE-25794
>                  URL: https://issues.apache.org/jira/browse/HIVE-25794
>              Project: Hive
>           Issue Type: Bug
>           Components: Logging
>             Reporter: iBenny
>             Assignee: László Bodor
>             Priority: Major
>               Labels: pull-request-available
>           Time Spent: 1h
>   Remaining Estimate: 0h
>
> Similar to HIVE-16150, a huge string is built in a loop even when the log
> level is INFO. This leads to memory pressure when processing a large number
> of split files.
> From
> [CombineHiveRecordReader.java|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveRecordReader.java#L116],
> the following needs to be fixed:
> LOG.debug("Found spec for " + path + " " + otherPart + " from " + pathToPartInfo);
> {code}
> "TezChild" #26 daemon prio=5 os_prio=0 tid=0x00007f5fd1716000 nid=0x2118a runnable [0x00007f5f8c411000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.lang.String.valueOf(String.java:2994)
> 	at java.lang.StringBuilder.append(StringBuilder.java:131)
> 	at java.util.AbstractMap.toString(AbstractMap.java:557)
> 	at java.lang.String.valueOf(String.java:2994)
> 	at java.lang.StringBuilder.append(StringBuilder.java:131)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.extractSinglePartSpec(CombineHiveRecordReader.java:119)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:88)
> 	at sun.reflect.GeneratedConstructorAccessor22.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> 	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:257)
> 	at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:144)
> 	at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> 	at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:153)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> 	at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
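The reviewer's suggestion (drop the `isDebugEnabled()` guard and use the `{}` anchors) works because SLF4J-style parameterized logging defers all argument stringification until after the level check, so the expensive `pathToPartInfo.toString()` seen in the stack trace above never runs when debug logging is off. The toy logger below is a hypothetical stand-in, not Hive's or SLF4J's actual code, that demonstrates the mechanism:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyLogDemo {
    // Counts how often the expensive toString() actually runs.
    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();

    // Stand-in for the large pathToPartInfo map whose toString() is costly.
    static class BigMap {
        @Override
        public String toString() {
            TO_STRING_CALLS.incrementAndGet();
            return "huge-map-dump";
        }
    }

    // Toy logger mimicking SLF4J's parameterized debug(): the level is
    // checked BEFORE any argument is turned into a String.
    static class ToyLogger {
        final boolean debugEnabled;

        ToyLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        void debug(String msgWithAnchors, Object... args) {
            if (!debugEnabled) {
                return;                       // cheap: no toString() calls yet
            }
            String out = msgWithAnchors;
            for (Object a : args) {
                // Substitute each {} anchor only when the message is emitted.
                out = out.replaceFirst("\\{\\}", String.valueOf(a));
            }
            System.out.println(out);
        }
    }

    public static void main(String[] args) {
        BigMap pathToPartInfo = new BigMap();
        ToyLogger log = new ToyLogger(false); // debug disabled, as in production

        // Concatenation would build the huge string regardless of the level:
        //   log.debug("Found spec for " + path + " " + otherPart + " from " + pathToPartInfo);

        // Anchors never touch the arguments when debug is off:
        log.debug("Found spec for {} {} from {}", "somePath", "somePartDesc", pathToPartInfo);
        System.out.println("toString() calls with anchors: " + TO_STRING_CALLS.get());
    }
}
```

With real SLF4J the fix is simply `LOG.debug("Found spec for {} {} from {}", path, otherPart, pathToPartInfo);` with no surrounding guard, which is exactly what the review comment asks for.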