Hi Tom,

Flink creates a separate classloader for each job in the TaskManager. When
jobs are started and stopped frequently, Metaspace usage grows, because the
classes loaded for each job stay in Metaspace until that job's classloader
is garbage-collected. The error in your log shows that the OOM occurred in
Metaspace. I suggest checking the Metaspace usage and trying to increase
its size with the option `taskmanager.memory.jvm-metaspace.size`.
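
For example, a minimal sketch for `conf/flink-conf.yaml`. The 512m value is
only an illustrative assumption (the default is 256m), so choose a size
based on the Metaspace usage you actually observe:

```yaml
# Assumed example value, not a recommendation: raises the TaskManager
# JVM Metaspace limit above the 256m default.
taskmanager.memory.jvm-metaspace.size: 512m
```

The TaskManager has to be restarted for the new value to take effect. If
usage keeps growing after every resubmission even with a larger limit,
that points to a class loading leak rather than an undersized Metaspace.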

Best,
Shammon FY

On Wed, Mar 29, 2023 at 3:04 AM tom yang <ensc...@gmail.com> wrote:

> Hi,
>
> I am running a standalone cluster and submitting Flink SQL jobs with
> Python UDFs, following the examples here:
>
>
> <https://github.com/ververica/flink-sql-cookbook/blob/main/udfs/01_python_udfs/01_python_udfs.md>
>
> I notice that after repeatedly submitting, cancelling, and resubmitting
> the job, my TaskManager eventually throws an out-of-memory exception. I am
> sure it is due to a leaky class loader somewhere, but I am not sure how to
> track it down. Has anyone experienced this issue before?
>
>
> 2023-03-24 04:55:46,380 ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner [] - Fatal error occurred while executing the TaskManager. Shutting it down...
> java.lang.OutOfMemoryError: Metaspace. The metaspace out-of-memory error has occurred. This can mean two things: either the job requires a larger size of JVM metaspace to load classes or there is a class loading leak. In the first case 'taskmanager.memory.jvm-metaspace.size' configuration option should be increased. If the error persists (usually in cluster after several job (re-)submissions) then there is probably a class loading leak in user code or some of its dependencies which has to be investigated and fixed. The task executor has to be shutdown...
>     at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:1017) ~[?:?]
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:555) ~[?:?]
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:458) ~[?:?]
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:452) ~[?:?]
>     at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:451) ~[?:?]
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:589) ~[?:?]
>     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClassWithoutExceptionHandling(FlinkUserCodeClassLoader.java:67) ~[dpi-flink-sql-base-app-0.9.35.jar:?]
>     at org.apache.flink.util.FlinkUserCodeClassLoader.loadClass(FlinkUserCodeClassLoader.java:51) [dpi-flink-sql-base-app-0.9.35.jar:?]
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:522) [?:?]
>     at org.apache.beam.sdk.options.PipelineOptionsFactory.<clinit>(PipelineOptionsFactory.java:500) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
>     at org.apache.flink.streaming.api.runners.python.beam.BeamPythonFunctionRunner.open(BeamPythonFunctionRunner.java:238) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
>     at org.apache.flink.streaming.api.operators.python.process.AbstractExternalPythonFunctionOperator.open(AbstractExternalPythonFunctionOperator.java:57) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
>     at org.apache.flink.table.runtime.operators.python.AbstractStatelessFunctionOperator.open(AbstractStatelessFunctionOperator.java:92) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
>     at org.apache.flink.table.runtime.operators.python.table.PythonTableFunctionOperator.open(PythonTableFunctionOperator.java:114) [blob_p-bbc3c49fcdd79f0b3f7f6c99a18bd72516414de1-4563cd43f6f153fe0ec32993bf935209:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:726) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$927/0x0000000800a4ac40.call(Unknown Source) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.call(StreamTaskActionExecutor.java:100) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:702) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:669) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.runtime.taskmanager.Task$$Lambda$815/0x0000000800904840.run(Unknown Source) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:935) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:904) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:728) [flink-dist-1.16.1.jar:1.16.1]
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550) [flink-dist-1.16.1.jar:1.16.1]
>     at java.lang.Thread.run(Thread.java:829) [?:?]
>
