[
https://issues.apache.org/jira/browse/KAFKA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801034#comment-17801034
]
Divij Vaidya commented on KAFKA-16052:
--------------------------------------
The fix in the PR improved things but did not completely fix the OOM.
Here's the status now. The heap dump shows Mockito invocations of different
types, such as a place where we are mocking FileRecords, with each invocation
consuming 5MB of heap. We will end up fixing many tests to fix this. But I am
curious as to why Mockito is not cleaning up its invocations. Why is it a
"leak" after the test has finished executing? Should we try upgrading the
Mockito version to see if that fixes things?
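As a rough illustration of how recorded invocations can pin heap, here is a stdlib-only sketch. It is not Mockito's implementation, and `Sink`, `RecordingHandler` and `newSink` are hypothetical names made up for this example. The assumption it demonstrates: a test double that records every call for later verification keeps a strong reference to each argument until its invocation list is cleared or the double itself becomes unreachable.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class RecordingProxyDemo {
    /** Hypothetical stand-in for a mocked type; not a Kafka interface. */
    public interface Sink {
        void write(byte[] payload);
    }

    /** Records each call with its arguments, as a verifying test double must. */
    public static final class RecordingHandler implements InvocationHandler {
        private final List<Object[]> invocations = new ArrayList<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // The strong reference stored here keeps every argument reachable.
            invocations.add(args == null ? new Object[0] : args);
            return null; // Sink.write is void, so null is a valid result
        }

        public int recorded() {
            return invocations.size();
        }

        /** Sums the sizes of all byte[] arguments still held by the recorder. */
        public long retainedBytes() {
            long total = 0;
            for (Object[] args : invocations) {
                for (Object arg : args) {
                    if (arg instanceof byte[]) {
                        total += ((byte[]) arg).length;
                    }
                }
            }
            return total;
        }

        /** Dropping the recorded calls makes the arguments collectable again. */
        public void clear() {
            invocations.clear();
        }
    }

    public static Sink newSink(RecordingHandler handler) {
        return (Sink) Proxy.newProxyInstance(
                Sink.class.getClassLoader(), new Class<?>[] {Sink.class}, handler);
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler();
        Sink sink = newSink(handler);
        for (int i = 0; i < 10; i++) {
            sink.write(new byte[1024 * 1024]);
        }
        System.out.println(handler.recorded() + " calls retain "
                + handler.retainedBytes() + " bytes");
        handler.clear();
        System.out.println("after clear: " + handler.retainedBytes() + " bytes");
    }
}
```

If the double stays reachable after the test (for example from a static field or a long-lived registry), everything it recorded stays on the heap too. In Mockito, `Mockito.clearInvocations(...)` and `withSettings().stubOnly()` exist to drop or avoid recorded invocations; whether either addresses the retention seen in this heap dump is untested.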
A second source of leaks is ApplicationShutdownHooks, which starts when
running the EndToEndAuthorization tests. It has something to do with the KDC
server, since we also have DefaultDirectoryService objects retained on the
heap. I will start a child Jira to look into this.
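For context on why shutdown hooks show up as retained memory: a hook registered through `Runtime.addShutdownHook` stays strongly referenced by the JVM until it is removed or the JVM exits, and so does everything the hook thread can reach. The sketch below shows that pattern and the matching cleanup. `EmbeddedServer` is a hypothetical stand-in, not the actual KDC or DefaultDirectoryService code, and it is an assumption that the leak in these tests follows this shape.

```java
public class ShutdownHookDemo {
    /** Hypothetical heavyweight resource standing in for an embedded test server. */
    public static final class EmbeddedServer {
        private final byte[] state = new byte[1024 * 1024];
        // The hook captures `this`, so a registered hook keeps `state` reachable.
        private final Thread hook = new Thread(this::stop, "embedded-server-shutdown");
        private volatile boolean running;

        public void start() {
            running = true;
            Runtime.getRuntime().addShutdownHook(hook);
        }

        public void stop() {
            running = false;
        }

        public boolean isRunning() {
            return running;
        }

        public int stateSize() {
            return state.length;
        }

        /**
         * Stops the server and deregisters the hook.
         * Returns true if the hook was registered and has now been removed.
         */
        public boolean close() {
            stop();
            return Runtime.getRuntime().removeShutdownHook(hook);
        }
    }

    public static void main(String[] args) {
        EmbeddedServer server = new EmbeddedServer();
        server.start();
        // Without this call the server stays reachable until the JVM exits.
        boolean removed = server.close();
        System.out.println("hook removed: " + removed);
    }
}
```

In a long-running test worker JVM, a test that starts such a server and never deregisters the hook adds one more retained instance per run.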
The other part is leaked threads. You will notice in the picture below that
leaked threads suddenly spike by hundreds (not correlated with the heap memory
increase). A thread dump suggests a large number of ExpirationReaper-AlterACL
threads. I am tracking that here:
https://issues.apache.org/jira/browse/KAFKA-16059
!Screenshot 2023-12-28 at 18.44.19.png!
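A leaked-thread spike like this can also be checked from inside the test JVM without a profiler. The following is a minimal sketch using only the JDK; `LeakedThreadCheck` and `countLiveThreads` are hypothetical names, and the prefix passed in is whatever thread name the thread dump points at.

```java
import java.util.concurrent.CountDownLatch;

public class LeakedThreadCheck {
    /** Counts live threads in this JVM whose name starts with the given prefix. */
    public static long countLiveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) throws InterruptedException {
        String prefix = "ExpirationReaper";
        long before = countLiveThreads(prefix);

        // Simulate a thread that a test started and never shut down.
        CountDownLatch release = new CountDownLatch(1);
        Thread leaked = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, prefix + "-demo");
        leaked.setDaemon(true);
        leaked.start();

        System.out.println("leaked since start: " + (countLiveThreads(prefix) - before));

        release.countDown();
        leaked.join();
        System.out.println("leaked after cleanup: " + (countLiveThreads(prefix) - before));
    }
}
```

Comparing the count before and after each test class is one way to narrow down which tests leave such threads behind.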
> OOM in Kafka test suite
> -----------------------
>
> Key: KAFKA-16052
> URL: https://issues.apache.org/jira/browse/KAFKA-16052
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.7.0
> Reporter: Divij Vaidya
> Priority: Major
> Attachments: Screenshot 2023-12-27 at 14.04.52.png, Screenshot
> 2023-12-27 at 14.22.21.png, Screenshot 2023-12-27 at 14.45.20.png, Screenshot
> 2023-12-27 at 15.31.09.png, Screenshot 2023-12-27 at 17.44.09.png, Screenshot
> 2023-12-28 at 00.13.06.png, Screenshot 2023-12-28 at 00.18.56.png, Screenshot
> 2023-12-28 at 11.26.03.png, Screenshot 2023-12-28 at 11.26.09.png, Screenshot
> 2023-12-28 at 18.44.19.png, newRM.patch
>
>
> *Problem*
> Our test suite is failing with frequent OOM. Discussion in the mailing list
> is here: [https://lists.apache.org/thread/d5js0xpsrsvhgjb10mbzo9cwsy8087x4]
> *Setup*
> To find the source of leaks, I ran the :core:test build target with a single
> thread (see below on how to do it) and attached a profiler to it. This Jira
> tracks the list of action items identified from the analysis.
> How to run tests using a single thread:
> {code:java}
> diff --git a/build.gradle b/build.gradle
> index f7abbf4f0b..81df03f1ee 100644
> --- a/build.gradle
> +++ b/build.gradle
> @@ -74,9 +74,8 @@ ext {
>      "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED"
>    )
>
> -  maxTestForks = project.hasProperty('maxParallelForks') ? maxParallelForks.toInteger() : Runtime.runtime.availableProcessors()
> -  maxScalacThreads = project.hasProperty('maxScalacThreads') ? maxScalacThreads.toInteger() :
> -      Math.min(Runtime.runtime.availableProcessors(), 8)
> +  maxTestForks = 1
> +  maxScalacThreads = 1
>    userIgnoreFailures = project.hasProperty('ignoreFailures') ? ignoreFailures : false
>
>    userMaxTestRetries = project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0
> diff --git a/gradle.properties b/gradle.properties
> index 4880248cac..ee4b6e3bc1 100644
> --- a/gradle.properties
> +++ b/gradle.properties
> @@ -30,4 +30,4 @@ scalaVersion=2.13.12
> swaggerVersion=2.2.8
> task=build
> org.gradle.jvmargs=-Xmx2g -Xss4m -XX:+UseParallelGC
> -org.gradle.parallel=true
> +org.gradle.parallel=false
> {code}
> *Result of experiment*
> This is what heap memory utilization looks like: it starts at tens of MB and
> ends at 1.5GB (with spikes of 2GB) of heap in use as the tests execute. Note
> that the total number of threads also increases, but it does not correlate
> with the sharp increase in heap memory usage. The heap dump is available at
> [https://www.dropbox.com/scl/fi/nwtgc6ir6830xlfy9z9cu/GradleWorkerMain_10311_27_12_2023_13_37_08.hprof.zip?rlkey=ozbdgh5vih4rcynnxbatzk7ln&dl=0]
>
> !Screenshot 2023-12-27 at 14.22.21.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)