Still, this workaround only works when you have access to the underlying FileInputFormat. With the SQL and Table APIs you don't, so you can't apply it there. What we could do is open a PR to support globs at the FileInputFormat level, so that all APIs benefit.

I'm gonna do it if everyone agrees.

Best

Etienne Chauchot

On 25/03/2021 13:12, Etienne Chauchot wrote:

Hi all,

In case it is useful to some of you:

I have a big batch that needs to use globs (*.parquet, for example) to read input files. It seems that globs do not work out of the box (see https://issues.apache.org/jira/browse/FLINK-6417).

But there is a workaround:


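import java.util.Collections;
import org.apache.flink.api.common.io.FileInputFormat;
import org.apache.flink.api.common.io.GlobFilePathFilter;
import org.apache.flink.core.fs.Path;

// extractDir extracts the parent dir of the glob path.
// FileInputFormat is abstract, so in practice instantiate any subclass of FileInputFormat here.
final FileInputFormat inputFormat = new FileInputFormat(new Path(extractDir(filePath)));
// filePath contains the glob; the whole path needs to be provided to GlobFilePathFilter.
inputFormat.setFilesFilter(new GlobFilePathFilter(Collections.singletonList(filePath), Collections.emptyList()));
inputFormat.setNestedFileEnumeration(true);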
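
Since extractDir is not shown above, here is a minimal sketch of what such a helper could look like, plus a quick check of why the pattern has to be the whole path. This is only an illustration under assumptions: the class name GlobSketch, the matches method and the /data/in path are made up, the body of extractDir is a guess at its intent (everything before the first path segment that contains a glob character), and the matching uses the JDK's own PathMatcher on a Unix-like default filesystem rather than any Flink class.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

final class GlobSketch {

    // Hypothetical stand-in for the extractDir helper used in the workaround:
    // returns everything before the path segment that contains the first glob character.
    static String extractDir(String globPath) {
        int firstGlob = -1;
        for (int i = 0; i < globPath.length(); i++) {
            if ("*?[{".indexOf(globPath.charAt(i)) >= 0) {
                firstGlob = i;
                break;
            }
        }
        if (firstGlob < 0) {
            return globPath; // no glob character: the path is returned as-is
        }
        int lastSep = globPath.lastIndexOf('/', firstGlob);
        if (lastSep < 0) {
            return "."; // relative pattern such as "*.parquet"
        }
        return lastSep == 0 ? "/" : globPath.substring(0, lastSep);
    }

    // Matches a candidate path against a glob using the JDK's PathMatcher.
    static boolean matches(String glob, String candidate) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(candidate));
    }

    public static void main(String[] args) {
        String filePath = "/data/in/*.parquet"; // made-up example path
        System.out.println(extractDir(filePath));
        System.out.println(matches(filePath, "/data/in/a.parquet"));
        System.out.println(matches(filePath, "a.parquet"));
    }
}
```

The point of the last two calls is that a pattern written as a full path only matches candidates given as full paths, and that a single * does not cross directory boundaries, so files in subdirectories need a pattern that covers them explicitly.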

Hope it helps some people.

Etienne Chauchot

