[ https://issues.apache.org/jira/browse/FLINK-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17334109#comment-17334109 ]
Flink Jira Bot commented on FLINK-10168:
----------------------------------------

This issue was marked "stale-assigned" and has not received an update in 7 days. It is now automatically unassigned. If you are still working on it, you can assign it to yourself again. Please also give an update about the status of the work.

> Add FileFilter interface and FileModTimeFilter which sets a read start position for files by modification time
> --------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-10168
> URL: https://issues.apache.org/jira/browse/FLINK-10168
> Project: Flink
> Issue Type: Improvement
> Components: API / DataStream
> Affects Versions: 1.6.0
> Reporter: Bowen Li
> Assignee: Bowen Li
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Update: The motivation is 1) to let users set a read start position for files, so they can process only files modified after a given timestamp, and 2) to expose more file information to users and give them a more flexible file filter interface for defining their own filtering rules.
> ---------------
> Support filtering files by modification/creation time in {{StreamExecutionEnvironment.readFile()}}.
> For example, in a source directory with many files, we may only want to read files that were created or modified after a specific time.
> This API could expose a generic file filter function and let users define their own filtering rules. Currently Flink only supports filtering files by path: the existing API is {{FileInputFormat.setFilesFilter(FilePathFilter)}}, which takes a single path filter. A more generic API that accepts more filters could look like this:
> 1) {{FileInputFormat.setFilesFilters(List(FilePathFilter, ModifiedTimeFilter, ...))}}
> 2) or {{FileInputFormat.setFilesFilters(FileFilter)}}, where {{FileFilter}} exposes all the file attributes Flink's file system can provide, such as path and modification time
> I lean towards the 2nd option, because it gives users more flexibility to define complex filtering rules based on combinations of file attributes.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
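The preferred option 2 from the issue can be illustrated with a minimal, self-contained Java sketch. Everything in it is hypothetical and is not Flink API: `FileAttributes`, `accept`, `and`, `allOf` and `FileFilters` are illustrative names, and `FileModTimeFilter` only borrows its name from the issue title. `FileAttributes` stands in for the path and modification time a file system reports (the timestamp unit is assumed to be epoch milliseconds). The sketch shows a filter that sets a read start position by modification time, and how filters compose. Composition also covers option 1, because a list of filters reduces to one combined `FileFilter`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the attributes a file system reports about a file.
// It only mimics the path and modification time (assumed: epoch milliseconds).
record FileAttributes(String path, long modificationTime) {}

// Hypothetical generic filter (option 2): it sees every attribute, not just the path.
@FunctionalInterface
interface FileFilter {
    /** Returns true if the file should be read. */
    boolean accept(FileAttributes file);

    /** Combines two filters: a file is read only if both accept it. */
    default FileFilter and(FileFilter other) {
        return file -> accept(file) && other.accept(file);
    }
}

// Hypothetical FileModTimeFilter: sets a read start position by accepting only
// files modified strictly after the given timestamp.
final class FileModTimeFilter implements FileFilter {
    private final long startMillis;

    FileModTimeFilter(long startMillis) {
        this.startMillis = startMillis;
    }

    @Override
    public boolean accept(FileAttributes file) {
        return file.modificationTime() > startMillis;
    }
}

// Hypothetical helpers for the sketch.
final class FileFilters {
    private FileFilters() {}

    /** Option 1 expressed through option 2: every filter in the list must accept. */
    static FileFilter allOf(List<FileFilter> filters) {
        return file -> filters.stream().allMatch(f -> f.accept(file));
    }

    /** Returns the paths of the files accepted by the filter, in input order. */
    static List<String> select(List<FileAttributes> files, FileFilter filter) {
        List<String> selected = new ArrayList<>();
        for (FileAttributes file : files) {
            if (filter.accept(file)) {
                selected.add(file.path());
            }
        }
        return selected;
    }
}
```

For example, given files modified at 1000, 5000 and 6000 ms, `new FileModTimeFilter(2_000L)` keeps the last two. Chaining `.and(f -> !f.path().contains("/_"))` additionally drops paths containing `/_`, a rule that mixes time and path attributes and that a path-only filter cannot express.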