Yu Chen created FLINK-33613:
-------------------------------

             Summary: Python UDF Runner process leak in Process Mode
                 Key: FLINK-33613
                 URL: https://issues.apache.org/jira/browse/FLINK-33613
             Project: Flink
          Issue Type: Bug
          Components: API / Python
    Affects Versions: 1.17.0
            Reporter: Yu Chen
         Attachments: ps-ef.txt, streaming_word_count-1.py

While working with PyFlink, we found that in Process Mode, the Python UDF 
process may leak after a job failover. This leads to a growing number of 
processes, along with their threads, on the host machine, which eventually 
results in failure to create new threads.

 

You can try to reproduce it with the attached test job 
`streaming_word_count-1.py`.

(Note that the job will keep failing over, so you can watch the processes 
leak by running `ps -ef` on a TaskManager.)

 

Our test environment:
 * K8s Application Mode
 * 4 TaskManagers with 12 slots per TaskManager
 * Job parallelism set to 48

The number of UDF processes (`pyflink.fn_execution.beam.beam_boot`) should be 
consistent with the parallelism (48), but we found 180 such processes after 
several failovers.
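
To make the comparison above repeatable, here is a minimal sketch of a helper that counts UDF worker processes in `ps -ef` output. The function names (`count_udf_processes`, `leaked_processes`, `snapshot_ps`) are hypothetical and not part of PyFlink; the only thing taken from this report is the module name used as the match string. The expected count is left as a parameter, because the report does not say whether 180 was observed on one host or across all TaskManagers.

```python
import subprocess

# Marker taken from the report: the module name of the Python UDF worker.
UDF_MARKER = "pyflink.fn_execution.beam.beam_boot"


def count_udf_processes(ps_output: str, marker: str = UDF_MARKER) -> int:
    """Count lines of `ps -ef` output whose command line contains the marker."""
    return sum(1 for line in ps_output.splitlines() if marker in line)


def leaked_processes(observed: int, expected: int) -> int:
    """Number of processes beyond the expected count (never negative)."""
    return max(observed - expected, 0)


def snapshot_ps() -> str:
    """Run `ps -ef` on the current host and return its output as text."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running `count_udf_processes(snapshot_ps())` on a TaskManager before and after a few failovers should show whether the count keeps growing. With the numbers from this report, `leaked_processes(180, 48)` gives 132 extra processes.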



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
