[ https://issues.apache.org/jira/browse/FLINK-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16635923#comment-16635923 ]
Bowen Li edited comment on FLINK-10168 at 10/12/18 6:05 AM:
------------------------------------------------------------

[~kkl0u] what do you think about this task? I believe it is very important: it gives users the same capability they already have with streaming sources (Kinesis, Kafka), where they specify a start position in the stream to read data from a certain point in time.


was (Author: phoenixjiangnan):
[~kkl0u] what do you think about this task? I believe this is very important, this provides users with similar functions in streaming (kinesis, kafka) where users can read data from a certain point of time.

> support filtering files by modified/created time in StreamExecutionEnvironment.readFile()
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-10168
>                 URL: https://issues.apache.org/jira/browse/FLINK-10168
>             Project: Flink
>          Issue Type: Improvement
>          Components: DataStream API
>    Affects Versions: 1.6.0
>            Reporter: Bowen Li
>            Assignee: Bowen Li
>            Priority: Major
>             Fix For: 1.7.0
>
>
> Support filtering files by modified/created time in {{StreamExecutionEnvironment.readFile()}}.
> For example, in a source directory with lots of files, we may only want to read the files that were created or modified after a specific time.
> This API can expose a generic file filter function and let users define the filtering rules. Currently Flink only supports filtering files by path: the API is {{FileInputFormat.setFilesFilters(PathFilter)}}, which takes a single file path filter. A more generic API that accepts more filters could look like this:
> 1) {{FileInputFormat.setFilesFilters(List(PathFilter, ModifiedTimeFilter, ...))}}
> 2) or {{FileInputFormat.setFilesFilters(FileFilter)}}, where {{FileFilter}} exposes all file attributes that Flink's file system can provide, such as path and modified time.
> I lean towards the 2nd option, because it gives users more flexibility to define complex filtering rules based on combinations of file attributes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
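
For illustration only, option 2 from the issue description could be sketched in plain Java roughly as follows. Nothing here is an existing Flink API: {{FileFilterSketch}}, {{FileAttributes}}, {{FileFilter}}, {{accept}}, {{and}}, {{modifiedAfter}}, {{pathEndsWith}} and {{select}} are all hypothetical names chosen for this sketch, and the "accept returns true for files to read" convention is an assumption. The point is only to show how a single filter that sees every file attribute allows path and modification-time rules to be combined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileFilterSketch {

    /** Hypothetical view of the attributes a file system reports for one file. */
    public static final class FileAttributes {
        public final String path;
        public final long modificationTime; // epoch milliseconds

        public FileAttributes(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /** Hypothetical generic filter: returns true if the file should be read. */
    public interface FileFilter {
        boolean accept(FileAttributes file);

        /** Combines two filters; a file is read only if both accept it. */
        default FileFilter and(FileFilter other) {
            return file -> accept(file) && other.accept(file);
        }
    }

    /** Accepts files modified strictly after the given timestamp. */
    public static FileFilter modifiedAfter(long epochMillis) {
        return file -> file.modificationTime > epochMillis;
    }

    /** Accepts files whose path ends with the given suffix. */
    public static FileFilter pathEndsWith(String suffix) {
        return file -> file.path.endsWith(suffix);
    }

    /** Returns the paths of all files accepted by the filter, in input order. */
    public static List<String> select(List<FileAttributes> files, FileFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (FileAttributes f : files) {
            if (filter.accept(f)) {
                accepted.add(f.path);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<FileAttributes> files = Arrays.asList(
                new FileAttributes("/data/a.csv", 500L),
                new FileAttributes("/data/b.csv", 1500L),
                new FileAttributes("/data/c.log", 2000L));

        // Only CSV files modified after t = 1000 ms are selected.
        FileFilter filter = modifiedAfter(1000L).and(pathEndsWith(".csv"));
        System.out.println(select(files, filter)); // prints [/data/b.csv]
    }
}
```

With option 1, the same rule would need one filter object per attribute plus a convention for how the list is combined; with a single attribute-aware filter, the combination logic stays in user code.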