I am using Flink on AWS EMR to download and process a large file from a remote FTP server. Since Flink supports the FTP protocol through Hadoop's FTP file system, I use CsvInputFormat with an FTP address (ftp://user:pass@server/path/file).
It works correctly on my local machine, but when I run the job on EMR it fails to establish a connection between the EMR node and the FTP server. After some investigation I found that we have to use FTP passive mode to access the FTP server from AWS. However, judging from the source code of FTPFileSystem, it looks like it always uses active mode, and I can't find a way to inject a setting or otherwise change the behavior to use passive mode, even though FTPClient has a method for that. Do you have any suggestions? Thank you.
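For context on why passive mode matters here: in active mode the client sends a PORT command and the FTP server opens the data connection back to the client, which typically fails when the client (here, presumably the EMR node) sits behind NAT or security-group rules that block inbound connections. In passive mode the client sends PASV, the server answers with a "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply, and the client opens the data connection outbound to that address, with the port encoded as p1 * 256 + p2 (RFC 959). In Apache Commons Net, `FTPClient.enterLocalPassiveMode()` switches the client to this behavior. The sketch below only illustrates how that PASV reply is decoded; `PasvReply` is a hypothetical helper, not part of Flink, Hadoop, or Commons Net.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (illustration only): decodes an FTP
 * "227 Entering Passive Mode" reply into the host and port
 * the client must connect to for the data channel.
 */
public final class PasvReply {
    // Matches the six comma-separated numbers h1,h2,h3,h4,p1,p2.
    private static final Pattern NUMBERS = Pattern.compile(
        "(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3}),(\\d{1,3})");

    public final String host;
    public final int port;

    private PasvReply(String host, int port) {
        this.host = host;
        this.port = port;
    }

    public static PasvReply parse(String reply) {
        if (!reply.startsWith("227")) {
            throw new IllegalArgumentException("Not a PASV reply: " + reply);
        }
        Matcher m = NUMBERS.matcher(reply);
        if (!m.find()) {
            throw new IllegalArgumentException("No address in reply: " + reply);
        }
        // The first four numbers form the IPv4 address of the data endpoint.
        String host = m.group(1) + "." + m.group(2) + "."
            + m.group(3) + "." + m.group(4);
        // The data port is sent as two bytes: p1 * 256 + p2.
        int port = Integer.parseInt(m.group(5)) * 256
            + Integer.parseInt(m.group(6));
        return new PasvReply(host, port);
    }

    public static void main(String[] args) {
        PasvReply r = parse("227 Entering Passive Mode (192,168,1,10,195,80).");
        // The client would now connect outbound to this address.
        System.out.println(r.host + ":" + r.port);
    }
}
```

Running `main` prints `192.168.1.10:50000`. Because the client initiates both the control and data connections, passive mode usually needs only outbound access from the EMR node.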