Thanks for sharing this. I think the OOM killer's activity means high memory pressure (it simply kills the process with the highest memory-consumption score). High CPU usage can only be a consequence of that, caused by constant GC.
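For what it's worth, the "Process Exit Code: 137" in the quoted Surefire output is consistent with this. By shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 137 - 128 = 9 is SIGKILL on Linux, which is the signal the OOM killer sends. A minimal Python sketch of that arithmetic (the helper name `describe_exit_code` and the small signal table are only illustrative, not part of Maven, Surefire, or Flink):

```python
# Illustrative helper, not part of Maven, Surefire, or Flink.
# Signal numbers below are the standard Linux values.
SIGNAL_NAMES = {6: "SIGABRT", 9: "SIGKILL", 11: "SIGSEGV", 15: "SIGTERM"}


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (128 + N means 'killed by signal N')."""
    if code > 128:
        signum = code - 128
        name = SIGNAL_NAMES.get(signum, "unknown signal")
        return f"terminated by {name} (signal {signum})"
    return f"exited with status {code}"


# Exit code reported by Surefire in this thread.
print(describe_exit_code(137))  # prints "terminated by SIGKILL (signal 9)"
```

A SIGKILL alone does not prove the OOM killer was involved, since anything can send it, so the kernel log is still the place to confirm.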
Currently, tests do not run in parallel, but high memory usage can be caused by the nature of the test (e.g. running Flink with high parallelism). So I think the best way to deal with this is to use a VM with more memory.

Regards,
Roman

On Tue, Oct 20, 2020 at 8:56 AM Juha Mynttinen <juha.myntti...@gmail.com> wrote:

> Hey,
>
> Good hint about /var/log/kern.log. This time I can see this:
>
> Oct 20 09:44:48 ubuntu kernel: [ 1925.651551] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service,task=java,pid=270024,uid=1000
> Oct 20 09:44:48 ubuntu kernel: [ 1925.651632] Out of memory: Killed process 270024 (java) total-vm:9841596kB, anon-rss:4820380kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:11780kB oom_score_adj:0
> Oct 20 09:44:48 ubuntu kernel: [ 1925.844155] oom_reaper: reaped process 270024 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> The next question is why this happens... I'll try to dig deeper.
>
> About the CPU load: I have five CPUs. Theoretically it makes sense to run five tests at a time to max out the CPUs. However, when I look at what the five Java processes (that Maven forks) are doing, I can see that each of them has a large number of threads wanting to use CPU. Here's an example from 'top -H':
>
> top - 09:42:03 up 29 min, 1 user, load average: 17,00, 12,86, 8,81
> Threads: 1099 total, 21 running, 1078 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 90,5 us, 9,4 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,1 si, 0,0 st
> MiB Mem : 7961,6 total, 1614,3 free, 4023,8 used, 2323,5 buff/cache
> MiB Swap: 2048,0 total, 2047,0 free, 1,0 used. 3638,9 avail Mem
>
>    PID USER PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
> 254825 juha 20  0 4250424 195768 27596 R 20,9  2,4 0:01.41 C2 CompilerThre
> 255116 juha 20  0 2820448  99240 27488 R 20,3  1,2 0:00.78 java
> 254968 juha 20  0 5312696 125212 27716 R 19,9  1,5 0:01.16 java
> 255027 juha 20  0 5310648 108716 27496 R 19,9  1,3 0:00.90 java
> 255123 juha 20  0 2820448  99120 27420 R 19,3  1,2 0:00.78 java
> 254829 juha 20  0 4240356 184376 27792 R 17,9  2,3 0:01.26 C2 CompilerThre
> 253993 juha 20  0 6436132 276808 28000 R 17,6  3,4 0:02.47 C2 CompilerThre
> 254793 juha 20  0 4250424 195768 27596 R 17,3  2,4 0:01.76 java
> 254801 juha 20  0 4240356 184376 27792 R 16,3  2,3 0:01.67 java
> 254298 juha 20  0 6510340 435360 28212 R 15,6  5,3 0:02.82 C2 CompilerThre
> 255145 juha 20  0 2820448  99240 27488 S 15,6  1,2 0:00.51 C2 CompilerThre
> 255045 juha 20  0 5310648 108716 27496 R 15,3  1,3 0:00.62 C2 CompilerThre
> 255151 juha 20  0 2820448  99120 27420 S 14,0  1,2 0:00.47 C2 CompilerThre
> 254986 juha 20  0 5312696 125212 27716 R 12,6  1,5 0:00.76 C2 CompilerThre
> 253980 juha 20  0 6436132 276808 28000 S 11,6  3,4 0:02.63 java
> 255148 juha 20  0 2820448  99240 27488 S 10,6  1,2 0:00.39 C1 CompilerThre
> 255154 juha 20  0 2820448  99120 27420 S  9,6  1,2 0:00.37 C1 CompilerThre
> 254457 juha 20  0 4269900 218036 28236 R  9,3  2,7 0:02.22 C2 CompilerThre
> 254299 juha 20  0 6510340 435360 28212 S  8,6  5,3 0:01.30 C1 CompilerThre
> 255047 juha 20  0 5310648 108716 27496 S  8,6  1,3 0:00.42 C1 CompilerThre
> 253994 juha 20  0 6436132 276808 28000 R  7,3  3,4 0:01.10 C1 CompilerThre
> 255312 juha 20  0 4250424 195768 27596 R  7,0  2,4 0:00.21 C2 CompilerThre
> 254831 juha 20  0 4240356 184376 27792 S  6,3  2,3 0:00.62 C1 CompilerThre
> 254988 juha 20  0 5312696 125212 27716 S  6,3  1,5 0:00.45 C1 CompilerThre
> 254828 juha 20  0 4250424 195768 27596 S  6,0  2,4 0:00.64 C1 CompilerThre
> 254720 juha 20  0 6510340 435360 28212 S  5,0  5,3 0:00.15 flink-akka.acto
>
> It can be seen that the JIT-related threads consume quite a lot of CPU, leaving less CPU available to the actual test code. Using htop I can also see the garbage-collection-related threads eating CPU. This doesn't seem right. I think it'd make sense to run the tests with less parallelism to better utilize the CPUs. Having far more threads wanting CPU than there are CPUs slows things down rather than speeding them up.
>
> However, AFAIK high CPU load shouldn't trigger the OOM killer?
>
> Regards,
> Juha
>
> On Mon, Oct 19, 2020 at 20:48, Khachatryan Roman (<khachatryan.ro...@gmail.com>) wrote:
>
>> Hey,
>>
>> One reason could be that a resource-intensive test was killed by the OOM killer. You can inspect /var/log/kern.log for the related messages in your VM.
>>
>> Regards,
>> Roman
>>
>> On Mon, Oct 19, 2020 at 5:57 PM Juha Mynttinen <juha.myntti...@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> I'm trying to build Flink and failing. I'm running Ubuntu 20.04.1 in a virtual machine on Windows 10. I'm using OpenJDK 11.0.8. I'm on the master branch, commit 9eae578ae592254d54bc51c679644e8e84c65152.
>>>
>>> The command I'm using:
>>>
>>> apache-maven-3.2.5/bin/mvn clean verify
>>>
>>> The output:
>>>
>>> [INFO] Flink : Tests ...................................... FAILURE [14:38 min]
>>> [INFO] Flink : Streaming Scala ............................ SKIPPED
>>> [INFO] Flink : Connectors : HCatalog ...................... SKIPPED
>>> [INFO] Flink : Connectors : Base .......................... SKIPPED
>>> [INFO] Flink : Connectors : Files ......................... SKIPPED
>>> [INFO] Flink : Table : .................................... SKIPPED
>>> [INFO] Flink : Table : Common ............................. SKIPPED
>>> [INFO] Flink : Table : API Java ........................... SKIPPED
>>> [INFO] Flink : Table : API Java bridge ....................
SKIPPED >>> [INFO] Flink : Table : API Scala .......................... SKIPPED >>> [INFO] Flink : Table : API Scala bridge ................... SKIPPED >>> [INFO] Flink : Table : SQL Parser ......................... SKIPPED >>> [INFO] Flink : Libraries : ................................ SKIPPED >>> [INFO] Flink : Libraries : CEP ............................ SKIPPED >>> [INFO] Flink : Table : Planner ............................ SKIPPED >>> [INFO] Flink : Table : SQL Parser Hive .................... SKIPPED >>> [INFO] Flink : Table : Runtime Blink ...................... SKIPPED >>> [INFO] Flink : Table : Planner Blink ...................... SKIPPED >>> [INFO] Flink : Metrics : JMX .............................. SKIPPED >>> [INFO] Flink : Formats : .................................. SKIPPED >>> [INFO] Flink : Formats : Json ............................. SKIPPED >>> [INFO] Flink : Connectors : Kafka base .................... SKIPPED >>> [INFO] Flink : Connectors : Elasticsearch base ............ SKIPPED >>> [INFO] Flink : Connectors : Elasticsearch 5 ............... SKIPPED >>> [INFO] Flink : Connectors : Elasticsearch 6 ............... SKIPPED >>> [INFO] Flink : Connectors : Elasticsearch 7 ............... SKIPPED >>> [INFO] Flink : Connectors : HBase base .................... SKIPPED >>> [INFO] Flink : Connectors : HBase 1.4 ..................... SKIPPED >>> [INFO] Flink : Connectors : HBase 2.2 ..................... SKIPPED >>> [INFO] Flink : Formats : Hadoop bulk ...................... SKIPPED >>> [INFO] Flink : Formats : Orc .............................. SKIPPED >>> [INFO] Flink : Formats : Orc nohive ....................... SKIPPED >>> [INFO] Flink : Formats : Avro ............................. SKIPPED >>> [INFO] Flink : Formats : Parquet .......................... SKIPPED >>> [INFO] Flink : Formats : Csv .............................. SKIPPED >>> [INFO] Flink : Connectors : Hive .......................... 
SKIPPED >>> [INFO] Flink : Connectors : JDBC .......................... SKIPPED >>> [INFO] Flink : Connectors : RabbitMQ ...................... SKIPPED >>> [INFO] Flink : Connectors : Twitter ....................... SKIPPED >>> [INFO] Flink : Connectors : Nifi .......................... SKIPPED >>> [INFO] Flink : Connectors : Cassandra ..................... SKIPPED >>> [INFO] Flink : Connectors : Filesystem .................... SKIPPED >>> [INFO] Flink : Connectors : Kafka ......................... SKIPPED >>> [INFO] Flink : Connectors : Google PubSub ................. SKIPPED >>> [INFO] Flink : Connectors : Kinesis ....................... SKIPPED >>> [INFO] Flink : Connectors : SQL : Elasticsearch 6 ......... SKIPPED >>> [INFO] Flink : Connectors : SQL : Elasticsearch 7 ......... SKIPPED >>> [INFO] Flink : Connectors : SQL : HBase 1.4 ............... SKIPPED >>> [INFO] Flink : Connectors : SQL : HBase 2.2 ............... SKIPPED >>> [INFO] Flink : Connectors : SQL : Hive 1.2.2 .............. SKIPPED >>> [INFO] Flink : Connectors : SQL : Hive 2.2.0 .............. SKIPPED >>> [INFO] Flink : Connectors : SQL : Hive 2.3.6 .............. SKIPPED >>> [INFO] Flink : Connectors : SQL : Hive 3.1.2 .............. SKIPPED >>> [INFO] Flink : Connectors : SQL : Kafka ................... SKIPPED >>> [INFO] Flink : Formats : Avro confluent registry .......... SKIPPED >>> [INFO] Flink : Formats : Sequence file .................... SKIPPED >>> [INFO] Flink : Formats : Compress ......................... SKIPPED >>> [INFO] Flink : Formats : SQL Orc .......................... SKIPPED >>> [INFO] Flink : Formats : SQL Parquet ...................... SKIPPED >>> [INFO] Flink : Formats : SQL Avro ......................... SKIPPED >>> [INFO] Flink : Examples : Streaming ....................... SKIPPED >>> [INFO] Flink : Examples : Table ........................... SKIPPED >>> [INFO] Flink : Examples : Build Helper : .................. 
SKIPPED >>> [INFO] Flink : Examples : Build Helper : Streaming Twitter SKIPPED >>> [INFO] Flink : Examples : Build Helper : Streaming State machine SKIPPED >>> [INFO] Flink : Examples : Build Helper : Streaming Google PubSub SKIPPED >>> [INFO] Flink : Container .................................. SKIPPED >>> [INFO] Flink : Queryable state : Runtime .................. SKIPPED >>> [INFO] Flink : Mesos ...................................... SKIPPED >>> [INFO] Flink : Kubernetes ................................. SKIPPED >>> [INFO] Flink : Yarn ....................................... SKIPPED >>> [INFO] Flink : Libraries : Gelly .......................... SKIPPED >>> [INFO] Flink : Libraries : Gelly scala .................... SKIPPED >>> [INFO] Flink : Libraries : Gelly Examples ................. SKIPPED >>> [INFO] Flink : External resources : ....................... SKIPPED >>> [INFO] Flink : External resources : GPU ................... SKIPPED >>> [INFO] Flink : Metrics : Dropwizard ....................... SKIPPED >>> [INFO] Flink : Metrics : Graphite ......................... SKIPPED >>> [INFO] Flink : Metrics : InfluxDB ......................... SKIPPED >>> [INFO] Flink : Metrics : Prometheus ....................... SKIPPED >>> [INFO] Flink : Metrics : StatsD ........................... SKIPPED >>> [INFO] Flink : Metrics : Datadog .......................... SKIPPED >>> [INFO] Flink : Metrics : Slf4j ............................ SKIPPED >>> [INFO] Flink : Libraries : CEP Scala ...................... SKIPPED >>> [INFO] Flink : Table : Uber ............................... SKIPPED >>> [INFO] Flink : Table : Uber Blink ......................... SKIPPED >>> [INFO] Flink : Python ..................................... SKIPPED >>> [INFO] Flink : Table : SQL Client ......................... SKIPPED >>> [INFO] Flink : Libraries : State processor API ............ SKIPPED >>> [INFO] Flink : ML : ....................................... 
SKIPPED >>> [INFO] Flink : ML : API ................................... SKIPPED >>> [INFO] Flink : ML : Lib ................................... SKIPPED >>> [INFO] Flink : ML : Uber .................................. SKIPPED >>> [INFO] Flink : Scala shell ................................ SKIPPED >>> [INFO] Flink : Dist ....................................... SKIPPED >>> [INFO] Flink : Yarn Tests ................................. SKIPPED >>> [INFO] Flink : E2E Tests : ................................ SKIPPED >>> [INFO] Flink : E2E Tests : CLI ............................ SKIPPED >>> [INFO] Flink : E2E Tests : Parent Child classloading program SKIPPED >>> [INFO] Flink : E2E Tests : Parent Child classloading lib-package SKIPPED >>> [INFO] Flink : E2E Tests : Dataset allround ............... SKIPPED >>> [INFO] Flink : E2E Tests : Dataset Fine-grained recovery .. SKIPPED >>> [INFO] Flink : E2E Tests : Datastream allround ............ SKIPPED >>> [INFO] Flink : E2E Tests : Batch SQL ...................... SKIPPED >>> [INFO] Flink : E2E Tests : Stream SQL ..................... SKIPPED >>> [INFO] Flink : E2E Tests : Bucketing sink ................. SKIPPED >>> [INFO] Flink : E2E Tests : Distributed cache via blob ..... SKIPPED >>> [INFO] Flink : E2E Tests : High parallelism iterations .... SKIPPED >>> [INFO] Flink : E2E Tests : Stream stateful job upgrade .... SKIPPED >>> [INFO] Flink : E2E Tests : Queryable state ................ SKIPPED >>> [INFO] Flink : E2E Tests : Local recovery and allocation .. SKIPPED >>> [INFO] Flink : E2E Tests : Elasticsearch 5 ................ SKIPPED >>> [INFO] Flink : E2E Tests : Elasticsearch 6 ................ SKIPPED >>> [INFO] Flink : Quickstart : ............................... SKIPPED >>> [INFO] Flink : Quickstart : Java .......................... SKIPPED >>> [INFO] Flink : Quickstart : Scala ......................... SKIPPED >>> [INFO] Flink : E2E Tests : Quickstart ..................... 
SKIPPED >>> [INFO] Flink : E2E Tests : Confluent schema registry ...... SKIPPED >>> [INFO] Flink : E2E Tests : Stream state TTL ............... SKIPPED >>> [INFO] Flink : E2E Tests : SQL client ..................... SKIPPED >>> [INFO] Flink : E2E Tests : Streaming file sink ............ SKIPPED >>> [INFO] Flink : E2E Tests : State evolution ................ SKIPPED >>> [INFO] Flink : E2E Tests : RocksDB state memory control ... SKIPPED >>> [INFO] Flink : E2E Tests : Common ......................... SKIPPED >>> [INFO] Flink : E2E Tests : Metrics availability ........... SKIPPED >>> [INFO] Flink : E2E Tests : Metrics reporter prometheus .... SKIPPED >>> [INFO] Flink : E2E Tests : Heavy deployment ............... SKIPPED >>> [INFO] Flink : E2E Tests : Connectors : Google PubSub ..... SKIPPED >>> [INFO] Flink : E2E Tests : Streaming Kafka base ........... SKIPPED >>> [INFO] Flink : E2E Tests : Streaming Kafka ................ SKIPPED >>> [INFO] Flink : E2E Tests : Plugins : ...................... SKIPPED >>> [INFO] Flink : E2E Tests : Plugins : Dummy fs ............. SKIPPED >>> [INFO] Flink : E2E Tests : Plugins : Another dummy fs ..... SKIPPED >>> [INFO] Flink : E2E Tests : TPCH ........................... SKIPPED >>> [INFO] Flink : E2E Tests : Streaming Kinesis .............. SKIPPED >>> [INFO] Flink : E2E Tests : Elasticsearch 7 ................ SKIPPED >>> [INFO] Flink : E2E Tests : Common Kafka ................... SKIPPED >>> [INFO] Flink : E2E Tests : TPCDS .......................... SKIPPED >>> [INFO] Flink : E2E Tests : Netty shuffle memory control ... SKIPPED >>> [INFO] Flink : E2E Tests : Python ......................... SKIPPED >>> [INFO] Flink : E2E Tests : HBase .......................... SKIPPED >>> [INFO] Flink : State backends : Heap spillable ............ SKIPPED >>> [INFO] Flink : Contrib : .................................. SKIPPED >>> [INFO] Flink : Contrib : Connectors : Wikiedits ........... 
SKIPPED >>> [INFO] Flink : FileSystems : Tests ........................ SKIPPED >>> [INFO] Flink : Docs ....................................... SKIPPED >>> [INFO] Flink : Walkthrough : .............................. SKIPPED >>> [INFO] Flink : Walkthrough : Common ....................... SKIPPED >>> [INFO] Flink : Walkthrough : Datastream Java .............. SKIPPED >>> [INFO] Flink : Walkthrough : Datastream Scala ............. SKIPPED >>> [INFO] >>> ------------------------------------------------------------------------ >>> [INFO] BUILD FAILURE >>> [INFO] >>> ------------------------------------------------------------------------ >>> [INFO] Total time: 36:49 min >>> [INFO] Finished at: 2020-10-19T18:24:46+03:00 >>> [INFO] Final Memory: 179M/614M >>> [INFO] >>> ------------------------------------------------------------------------ >>> [ERROR] Failed to execute goal >>> org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test >>> (integration-tests) on project flink-tests: There are test failures. >>> [ERROR] >>> [ERROR] Please refer to >>> /home/juha/git/apache-flink/flink-tests/target/surefire-reports for the >>> individual test results. >>> [ERROR] Please refer to dump files (if any exist) [date].dump, >>> [date]-jvmRun[N].dump and [date].dumpstream. >>> [ERROR] ExecutionException The forked VM terminated without properly >>> saying goodbye. VM crash or System.exit called? 
>>> [ERROR] Command was /bin/sh -c cd >>> /home/juha/git/apache-flink/flink-tests/target && >>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire >>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>> surefire_122313349068739873924160tmp >>> [ERROR] Error occurred in starting fork, check output in log >>> [ERROR] Process Exit Code: 137 >>> [ERROR] Crashed tests: >>> [ERROR] >>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>> [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: >>> ExecutionException The forked VM terminated without properly saying >>> goodbye. VM crash or System.exit called? >>> [ERROR] Command was /bin/sh -c cd >>> /home/juha/git/apache-flink/flink-tests/target && >>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire >>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>> surefire_122313349068739873924160tmp >>> [ERROR] Error occurred in starting fork, check output in log >>> [ERROR] Process Exit Code: 137 >>> [ERROR] Crashed tests: >>> [ERROR] >>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:510) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:457) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:298) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246) 
>>> [ERROR] at >>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857) >>> [ERROR] at >>> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) >>> [ERROR] at >>> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) >>> [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) >>> [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) >>> [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) >>> [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) >>> [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) >>> [ERROR] at >>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native >>> Method) >>> [ERROR] at >>> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>> [ERROR] at >>> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> [ERROR] at 
java.base/java.lang.reflect.Method.invoke(Method.java:566) >>> [ERROR] at >>> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) >>> [ERROR] at >>> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) >>> [ERROR] at >>> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) >>> [ERROR] at >>> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) >>> [ERROR] Caused by: >>> org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM >>> terminated without properly saying goodbye. VM crash or System.exit called? >>> [ERROR] Command was /bin/sh -c cd >>> /home/juha/git/apache-flink/flink-tests/target && >>> /usr/lib/jvm/java-11-openjdk-amd64/bin/java -Xms256m -Xmx2048m >>> -Dmvn.forkNumber=3 -XX:+UseG1GC -jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire/surefirebooter11703198505285401478.jar >>> /home/juha/git/apache-flink/flink-tests/target/surefire >>> 2020-10-19T17-48-02_394-jvmRun3 surefire14859194279791928992tmp >>> surefire_122313349068739873924160tmp >>> [ERROR] Error occurred in starting fork, check output in log >>> [ERROR] Process Exit Code: 137 >>> [ERROR] Crashed tests: >>> [ERROR] >>> org.apache.flink.test.checkpointing.EventTimeWindowCheckpointingITCase >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$600(ForkStarter.java:115) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:444) >>> [ERROR] at >>> org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:420) >>> [ERROR] at >>> java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) >>> [ERROR] at >>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>> [ERROR] at >>> 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>> [ERROR] at java.base/java.lang.Thread.run(Thread.java:834) >>> [ERROR] -> [Help 1] >>> [ERROR] >>> [ERROR] To see the full stack trace of the errors, re-run Maven with the >>> -e switch. >>> [ERROR] Re-run Maven using the -X switch to enable full debug logging. >>> [ERROR] >>> [ERROR] For more information about the errors and possible solutions, >>> please read the following articles: >>> [ERROR] [Help 1] >>> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException >>> [ERROR] >>> [ERROR] After correcting the problems, you can resume the build with the >>> command >>> [ERROR] mvn <goals> -rf :flink-tests >>> >>> The jvmdump-files look like this: >>> >>> # Created at 2020-10-19T18:14:22.869 >>> java.io.IOException: Stream closed >>> at >>> java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176) >>> at >>> java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289) >>> at >>> java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351) >>> at >>> java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) >>> at >>> java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) >>> at >>> java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) >>> at >>> java.base/java.io.InputStreamReader.read(InputStreamReader.java:185) >>> at java.base/java.io.Reader.read(Reader.java:189) >>> at java.base/java.util.Scanner.readInput(Scanner.java:882) >>> at >>> java.base/java.util.Scanner.findWithinHorizon(Scanner.java:1796) >>> at java.base/java.util.Scanner.hasNextLine(Scanner.java:1610) >>> at >>> org.apache.maven.surefire.booter.PpidChecker$ProcessInfoConsumer.execute(PpidChecker.java:354) >>> at >>> org.apache.maven.surefire.booter.PpidChecker.unix(PpidChecker.java:190) >>> at >>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:123) >>> at >>> 
org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214) >>> at >>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) >>> at >>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) >>> at >>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) >>> at >>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>> at >>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>> at java.base/java.lang.Thread.run(Thread.java:834) >>> >>> >>> # Created at 2020-10-19T18:14:22.870 >>> System.exit() or native command error interrupted process checker. >>> java.lang.IllegalStateException: error [STOPPED] to read process 898133 >>> at >>> org.apache.maven.surefire.booter.PpidChecker.checkProcessInfo(PpidChecker.java:145) >>> at >>> org.apache.maven.surefire.booter.PpidChecker.isProcessAlive(PpidChecker.java:124) >>> at >>> org.apache.maven.surefire.booter.ForkedBooter$2.run(ForkedBooter.java:214) >>> at >>> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) >>> at >>> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) >>> at >>> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) >>> at >>> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) >>> at >>> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) >>> at java.base/java.lang.Thread.run(Thread.java:834) >>> >>> >>> I found some JIRA tickets with " The forked VM terminated without >>> properly saying goodbye": >>> >>> https://issues.apache.org/jira/browse/FLINK-18375 >>> https://issues.apache.org/jira/browse/FLINK-2466 >>> >>> I don't see how these could explain the issue I'm witnessing.... 
>>>
>>> I wonder if the issue is related to the VM running "too hot". 'top' shows very high load averages.
>>>
>>> The crash can be reproduced.
>>>
>>> Regards,
>>> Juha