LI Mingkun created FLINK-29617:
----------------------------------

             Summary: Cost too much time to start SourceCoordinator of hdfsFileSource when start JobMaster
                 Key: FLINK-29617
                 URL: https://issues.apache.org/jira/browse/FLINK-29617
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / FileSystem, Runtime / Coordination
    Affects Versions: 1.15.2
            Reporter: LI Mingkun

h1. Scenario:

Our user runs a Flink batch job to compact one day's worth of small files.

Flink version: 1.15

He split the pipeline into 24 parts, one per hour, so the job has 24 sources.

I found that starting the SourceCoordinator of each hdfsFileSource takes too much time during JobMaster startup, as shown here:

!https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.1&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9SVAoAslMUGQdVQJ_ccmEf4LxhaONYKJvS_V8nvijvT3JXw_VlyRBAEE9EQhTtWdYPa4TLCO5rxjXGrTDK2_PGHX4RZDPTQTJ0LwKXAUr4BYlMhYZsjcrY9eo&disp=emb&realattid=ii_l95bh7qy0|width=542,height=260!

h1. Root Cause:

After checking, I found the root cause:

# AbstractFileSource calls enumerateSplits inside createEnumerator, so all splits are enumerated while the SourceCoordinator is starting.
# NonSplittingRecursiveEnumerator has to fetch the block locations of every file block, which is a heavy IO operation.

!https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.3&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ8AoT071eCNMb_q3uJtcbrUmZnYbg3ucnDelMlRRPn7WLlXOBGj650srQk9vhqKyJEANvpOWoxHuH6jNHt7g6go8JkeRUZKc81yqT0yzzz7tbBciTe-YnRVQ7w&disp=emb&realattid=ii_l95bp1832|width=542,height=456!

!https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.2&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9phsX1nauTsx3xWje_YJM4uUaOLXKHcXKsm7WJquPQQGC7bQTni3OhQB5HtGYVOvrD-3Kbp9LURfUj6OiIUgsZU1AImSL0vj27cnDcf7HpVpLpaqdADtpoABU&disp=emb&realattid=ii_l95bjh1g1|width=526,height=542!

h1. Suggestion

# Add an option to FileSource to disable the location fetcher.
# Move the location fetcher into the IOExecutor.
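
To make the two suggestions concrete, here is a minimal plain-Java sketch. It does not use Flink's API: {{LocationFetchSketch}}, {{Split}}, {{enumerate}}, {{enumerateAsync}}, and the {{fetchLocations}} flag are hypothetical names chosen only for illustration, and the {{locationFetcher}} function stands in for the per-file block location lookup. How this would map onto FileSource and the real IOExecutor is still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class LocationFetchSketch {

    /** Hypothetical stand-in for a file split: a path plus its preferred hosts. */
    public static final class Split {
        public final String path;
        public final List<String> hosts;

        Split(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }
    }

    /** Suggestion 1: when fetchLocations is false, the expensive lookup is never called. */
    public static List<Split> enumerate(
            List<String> paths,
            boolean fetchLocations,
            Function<String, List<String>> locationFetcher) {
        List<Split> splits = new ArrayList<>();
        for (String path : paths) {
            List<String> hosts;
            if (fetchLocations) {
                hosts = locationFetcher.apply(path); // heavy IO in the real case
            } else {
                hosts = List.of(); // no locality hints, no IO
            }
            splits.add(new Split(path, hosts));
        }
        return splits;
    }

    /** Suggestion 2: run the enumeration on an IO executor so the caller is not blocked. */
    public static CompletableFuture<List<Split>> enumerateAsync(
            List<String> paths,
            Function<String, List<String>> locationFetcher,
            ExecutorService ioExecutor) {
        return CompletableFuture.supplyAsync(
                () -> enumerate(paths, true, locationFetcher), ioExecutor);
    }

    public static void main(String[] args) {
        // Example paths and host name are made up for the demo.
        List<String> paths = List.of("hdfs:///data/hour=00/part-0", "hdfs:///data/hour=01/part-0");
        Function<String, List<String>> fetcher = path -> List.of("datanode-1");

        // Option disabled: splits are created without any location lookup.
        List<Split> withoutHosts = enumerate(paths, false, fetcher);
        System.out.println("hosts skipped: " + withoutHosts.get(0).hosts.isEmpty());

        // Lookup moved to a separate pool; the caller only waits when it needs the result.
        ExecutorService ioExecutor = Executors.newFixedThreadPool(2);
        try {
            List<Split> withHosts = enumerateAsync(paths, fetcher, ioExecutor).join();
            System.out.println("splits enumerated: " + withHosts.size());
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

The first variant gives up locality hints in exchange for a faster start, which seems acceptable when the files are small. The second keeps the hints but moves the wait off the thread that starts the SourceCoordinator.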