The heap dump did not show anything too suspicious. The only thing I noticed is that there are 13 ChildFirstClassLoaders whereas there are only 6 Task instances in the heap dump. Are you running all 13 tasks on the same TaskExecutor?

Cheers,
Till

On Mon, Aug 24, 2020 at 2:01 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Another possible cause is that the metaspace memory budget is
> configured too tightly. Here is a pointer on increasing the metaspace
> size [1].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_trouble.html#outofmemoryerror-metaspace
>
> Cheers,
> Till
>
> On Mon, Aug 24, 2020 at 1:49 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
>> Hi,
>>
>> could you share the Flink cluster logs with us? This would help answer
>> a lot of questions about your setup and the Flink version you are using.
>> Thanks a lot!
>>
>> Cheers,
>> Till
>>
>> On Mon, Aug 24, 2020 at 10:48 AM 耿延杰 <gyj199...@qq.com> wrote:
>>
>>> It still fails after every 12 tasks, and the exception stack traces
>>> of the failed tasks differ.
>>>
>>> For example, the exception info of the most recently failed task:
>>>
>>> Caused by: java.lang.OutOfMemoryError: Metaspace
>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>>>     at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>>>     at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:66)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>>>     at org.apache.http.impl.client.CloseableHttpClient.determineTarget(CloseableHttpClient.java:93)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.getInputStream(ClickHouseStatementImpl.java:614)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:117)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:100)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:95)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:90)
>>>     at ru.yandex.clickhouse.ClickHouseConnectionImpl.initTimeZone(ClickHouseConnectionImpl.java:94)
>>>     at ru.yandex.clickhouse.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:80)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:55)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:47)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:29)
>>>     at java.sql.DriverManager.getConnection(DriverManager.java:664)
>>>     at java.sql.DriverManager.getConnection(DriverManager.java:270)
>>>     at org.apache.flink.api.java.io.jdbc.AbstractJDBCOutputFormat.establishConnection(AbstractJDBCOutputFormat.java:68)
>>>     at com.xxx.clickhouse.ClickHouseJDBCOutputFormat.open(ClickHouseJDBCOutputFormat.java:53)
>>>     at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:205)
>>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>     at java.lang.Thread.run(Thread.java:748)
>>>
>>> This is different from the exception info in my last email.
>>>
>>> So analysing the dump file is the key.
>>>
>>> ------------------ Original Message ------------------
>>> From: "耿延杰" <gyj199...@qq.com>;
>>> Sent: Monday, August 24, 2020, 4:33 PM
>>> To: "dev" <dev@flink.apache.org>;
>>> Subject: Re: OutOfMemoryError: Metaspace on Batch Task When Write into Clickhouse
>>>
>>> Additional info:
>>>
>>> The exception info shown on the Flink Manager page:
>>>
>>> Caused by: java.lang.OutOfMemoryError: Metaspace
>>>     at java.lang.ClassLoader.defineClass1(Native Method)
>>>     at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
>>>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>>>     at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
>>>     at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:66)
>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
>>>     at org.apache.http.impl.client.CloseableHttpClient.determineTarget(CloseableHttpClient.java:93)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.getInputStream(ClickHouseStatementImpl.java:614)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:117)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:100)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:95)
>>>     at ru.yandex.clickhouse.ClickHouseStatementImpl.executeQuery(ClickHouseStatementImpl.java:90)
>>>     at ru.yandex.clickhouse.ClickHouseConnectionImpl.initTimeZone(ClickHouseConnectionImpl.java:94)
>>>     at ru.yandex.clickhouse.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:80)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:55)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:47)
>>>     at ru.yandex.clickhouse.ClickHouseDriver.connect(ClickHouseDriver.java:29)
>>>     at java.sql.DriverManager.getConnection(DriverManager.java:664)
>>>     at java.sql.DriverManager.getConnection(DriverManager.java:270)
>>>     at org.apache.flink.api.java.io.jdbc.AbstractJDBCOutputFormat.establishConnection(AbstractJDBCOutputFormat.java:68)
>>>     at com.xx.xx.xx.ClickHouseJDBCOutputFormat.open(ClickHouseJDBCOutputFormat.java:53)
>>>     at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:205)
>>>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
>>>     at java.lang.Thread.run(Thread.java:748)
>>>
>>> ------------------ Original Message ------------------
>>> From: "耿延杰" <gyj199...@qq.com>;
>>> Sent: Monday, August 24, 2020, 4:20 PM
>>> To: "dev" <dev@flink.apache.org>;
>>> Subject: OutOfMemoryError: Metaspace on Batch Task When Write into Clickhouse
>>>
>>> Hi,
>>>
>>> I get "OutOfMemoryError: Metaspace" in a batch task when writing
>>> into ClickHouse.
>>> The attached *.java file is my task code.
>>>
>>> I found that after running 12 tasks, the 13th task fails, and the
>>> exception is always "OutOfMemoryError: Metaspace" (see
>>> "task-failed.png").
>>>
>>> I configured
>>>     -XX:+HeapDumpOnOutOfMemoryError
>>>     -XX:HeapDumpPath=/path/to/hprofFile
>>> and dumped the hprof file.
>>> I analysed this hprof file and found that the error may not be caused
>>> by my user code.
>>> So I came here to ask for your help in confirming whether the memory
>>> leak is caused by Flink.
>>>
>>> The attached file "java_pid29294.hprof" is the dump file.
>>>
>>> Thanks.
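
The pattern reported in the thread (12 tasks succeed, the 13th fails with "OutOfMemoryError: Metaspace") could be checked from inside the JVM by logging metaspace usage and class-loading counters once per task. The following is only a minimal sketch using the standard java.lang.management API. The class name MetaspaceProbe is hypothetical and not part of the original task code, and the pool name "Metaspace" is an assumption that holds on HotSpot JVMs; other JVMs may not expose a pool with that name.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not from the original thread): reports how much
// metaspace the current JVM uses and how many classes are loaded.
public class MetaspaceProbe {

    /**
     * Returns the bytes used by the memory pool named "Metaspace",
     * or -1 if this JVM exposes no such pool (assumption: HotSpot naming).
     */
    public static long metaspaceUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                // getUsage() may return null if the pool is no longer valid.
                return usage == null ? -1L : usage.getUsed();
            }
        }
        return -1L;
    }

    /** Builds a one-line summary suitable for a log statement. */
    public static String snapshot() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return String.format(
                "metaspaceUsed=%d loaded=%d unloaded=%d",
                metaspaceUsedBytes(),
                classes.getLoadedClassCount(),
                classes.getUnloadedClassCount());
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

If snapshot() were logged at the start of each task, a used-metaspace value that rises from task to task while the unloaded count stays flat would suggest that classes from earlier tasks are not being unloaded. A roughly constant value would point more towards a metaspace budget that is simply too small.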