Hi,

readFile() requires a FileInputFormat, i.e., your custom InputFormat would
need to extend FileInputFormat.
In general, any InputFormat decides what to read when it generates its
InputSplits. In your case, the createInputSplits() method should return one
InputSplit for each file it wants to read.
By default, FileInputFormat creates one or more input splits for each file
in a directory. If you only want to read a subset of files (or have a list
of files to read), you should override createInputSplits() and return
exactly one input split for each file to read (because each of your files
cannot be split and read in parallel).
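
As a rough sketch of the "one split per file" idea, here is the planning
logic in plain Java with no Flink dependency. FileSplitPlanner, Split,
planSplits() and READ_WHOLE_FILE are hypothetical names used only for
illustration; in a real job this logic would live in your overridden
createInputSplits() and would build Flink's own input split objects instead
of the stand-in class below.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch: Split is a hypothetical stand-in for the input
// split type a FileInputFormat subclass would return.
final class FileSplitPlanner {

    // Assumed marker meaning "read the file from start to end".
    static final long READ_WHOLE_FILE = -1L;

    // Describes one unit of work: a whole file assigned to a single reader.
    static final class Split {
        final int number;
        final String path;
        final long start;
        final long length;

        Split(int number, String path, long start, long length) {
            this.number = number;
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Returns exactly one split per listed file, numbered in list order.
    // Files are never cut into several splits, so each file is read by one
    // task, while different files can still be processed in parallel.
    static List<Split> planSplits(List<String> filesToRead) {
        List<Split> splits = new ArrayList<>(filesToRead.size());
        for (String path : filesToRead) {
            splits.add(new Split(splits.size(), path, 0L, READ_WHOLE_FILE));
        }
        return splits;
    }
}
```

Calling planSplits() with a list of three paths yields three splits numbered
0 to 2, each starting at offset 0 and covering the whole file. Files that are
not in the list simply never get a split, which is how the subset selection
happens.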

If your InputFormat does not extend FileInputFormat, you can use
createInput() instead of readFile().

Best, Fabian

2017-08-31 21:24 GMT+02:00 ShB <shon.balakris...@gmail.com>:

> Hi Fabian,
>
> Thanks for your response.
>
> If I implemented my own InputFormat, how would I read a specific list of
> files from S3?
>
> Assuming I need to use readFile(), below would read all of the files from
> the specified S3 bucket or path:
> env.readFile(MyInputFormat, "s3://my-bucket/")
>
> Is there a way for me to read only a specific list/subset of files (say
> fileList) from an S3 bucket, in parallel, using readFile()?
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>
