Hi,

readFile() expects a FileInputFormat, i.e., your custom InputFormat would need to extend FileInputFormat. In general, an InputFormat decides what to read when it generates its InputSplits. In your case, the createInputSplits() method should return one InputSplit for each file it wants to read. By default, FileInputFormat creates one or more input splits for each file in a directory. If you only want to read a subset of the files (or have a list of files to read), you should override that method and return exactly one input split per file (your files cannot be read in parallel, so each file must stay in a single split).
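
To make the "one split per listed file" idea concrete, here is a minimal sketch of the planning logic in plain Java, without any Flink dependency. The Split class is a hypothetical stand-in for Flink's FileInputSplit, the class and method names are made up for illustration, and the file names and lengths are assumed example values. In a real job, this logic would sit in the overridden createInputSplits() of your FileInputFormat subclass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OneSplitPerFile {

    /** Hypothetical stand-in for an input split: one byte range of one file. */
    static final class Split {
        final int number;   // running split index
        final String path;  // file this split belongs to
        final long start;   // first byte covered by the split
        final long length;  // number of bytes covered by the split

        Split(int number, String path, long start, long length) {
            this.number = number;
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /**
     * Plans exactly one split per listed file. Each split starts at byte 0 and
     * spans the whole file, so no file is ever divided among several readers.
     */
    static List<Split> planSplits(Map<String, Long> fileLengths) {
        List<Split> splits = new ArrayList<>();
        int number = 0;
        for (Map.Entry<String, Long> entry : fileLengths.entrySet()) {
            splits.add(new Split(number++, entry.getKey(), 0L, entry.getValue()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example: only two files of the bucket are listed for reading.
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("s3://my-bucket/a.dat", 1024L);
        files.put("s3://my-bucket/c.dat", 2048L);

        for (Split s : planSplits(files)) {
            System.out.println(s.number + " " + s.path + " start=" + s.start + " length=" + s.length);
        }
    }
}
```

Files that are not in the list never get a split and are therefore never read. Different files can still be processed by different parallel readers, because the splits are handed out independently.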
If your InputFormat does not extend FileInputFormat, you can use createInput() instead of readFile().

Best, Fabian

2017-08-31 21:24 GMT+02:00 ShB <shon.balakris...@gmail.com>:

> Hi Fabian,
>
> Thanks for your response.
>
> If I implemented my own InputFormat, how would I read a specific list of
> files from S3?
>
> Assuming I need to use readFile(), below would read all of the files from
> the specified S3 bucket or path:
> env.readFile(MyInputFormat, "s3://my-bucket/")
>
> Is there a way for me to read only a specific list/subset of files (say
> fileList) from a S3 bucket, in parallel using readFile?
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/