Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread Fabian Hueske
I see, that's unfortunate. Both classes are also tagged with @Public, making them unchangeable until Flink 2.0. Nonetheless, feel free to open a Jira issue to improve the situation for a future release.

Best, Fabian

On Mon, 27 May 2019 at 16:55, spoganshev wrote:
> I've tried that, but t

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread spoganshev
I've tried that, but the problem is:
- The return type of FileInputFormat#getInputSplitAssigner is LocatableInputSplitAssigner
- LocatableInputSplitAssigner is final

Unfortunately, this makes it impossible to override the split assigner.

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread Fabian Hueske
Configuring the split assigner wasn't a common requirement so far. You can just implement your own format extending from FileInputFormat (or any of its subclasses) and override the getInputSplitAssigner() method.

Best, Fabian

On Mon, 27 May 2019 at 15:30, spoganshev wrote:
> Why is FileIn

Re: FileInputFormat that processes files in chronological order

2019-05-27 Thread spoganshev
Why is FileInputFormat#getInputSplitAssigner not configurable, though? It makes sense to let those who use FileInputFormat set the desired split assigner (and make LocatableInputSplitAssigner just a default one).

-- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
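To make the proposal above concrete, here is a minimal sketch of what a pluggable split assigner could look like. It uses plain-Java stand-ins only: FileSplitStub, SplitAssignerStub, ChronologicalAssignerStub and ConfigurableFormatStub are hypothetical names invented for illustration and are not Flink classes, and the setter shown is not an existing Flink API. The default assigner keeps the given order; the chronological one hands out the oldest split first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

/** Stand-in for a file split (hypothetical, NOT Flink's FileInputSplit). */
final class FileSplitStub {
    final String path;
    final long modificationTime;

    FileSplitStub(String path, long modificationTime) {
        this.path = path;
        this.modificationTime = modificationTime;
    }
}

/** Stand-in for a split assigner (hypothetical, NOT Flink's InputSplitAssigner). */
interface SplitAssignerStub {
    /** Returns the next split, or null when no splits are left. */
    FileSplitStub getNextSplit();
}

/** Hands out splits ordered by modification time, oldest first. */
final class ChronologicalAssignerStub implements SplitAssignerStub {
    private final Queue<FileSplitStub> remaining;

    ChronologicalAssignerStub(List<FileSplitStub> splits) {
        // Copy before sorting so the caller's list is left untouched.
        List<FileSplitStub> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator.comparingLong((FileSplitStub s) -> s.modificationTime));
        this.remaining = new ArrayDeque<>(sorted);
    }

    @Override
    public synchronized FileSplitStub getNextSplit() {
        return remaining.poll();
    }
}

/** Hypothetical format whose assigner can be swapped, as the message suggests. */
final class ConfigurableFormatStub {
    // Default assigner: hands out splits in the order they were given.
    private Function<List<FileSplitStub>, SplitAssignerStub> assignerFactory =
            splits -> {
                Queue<FileSplitStub> queue = new ArrayDeque<>(splits);
                return queue::poll;
            };

    void setAssignerFactory(Function<List<FileSplitStub>, SplitAssignerStub> factory) {
        this.assignerFactory = factory;
    }

    SplitAssignerStub createAssigner(List<FileSplitStub> splits) {
        return assignerFactory.apply(splits);
    }
}
```

With these stand-ins, calling setAssignerFactory(ChronologicalAssignerStub::new) switches the format from the default order to oldest-first without subclassing anything, which is the kind of configurability the message asks for.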

Re: FileInputFormat that processes files in chronological order

2019-04-29 Thread Fabian Hueske
Hi Sergei,

It depends on whether you want to process the files with the DataSet (batch) or DataStream (stream) API. Averell's answer addressed the DataStream API part. The DataSet API does not have any built-in support to distinguish files (or file splits) by folders and process them in order. F

Re: FileInputFormat that processes files in chronological order

2019-04-27 Thread Averell
Hi,

Regarding splitting by shards, I believe that you can simply create two sources, one for each shard, and then union them together.

Regarding processing files in chronological order, Flink currently reads files in order of their last-modified time (i.e. oldest files will be processed f

FileInputFormat that processes files in chronological order

2019-04-26 Thread Sergei Poganshev
Given a directory with input files of the following format:

/data/shard1/file1.json
/data/shard1/file2.json
/data/shard1/file3.json
/data/shard2/file1.json
/data/shard2/file2.json
/data/shard2/file3.json

Is there a way to make FileInputFormat with parallelism 2 split processing by "shard" (folder
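As a rough illustration of the grouping described in the question, the sketch below partitions a list of file paths by their parent ("shard") folder and orders the files inside each shard. It is plain Java and does not use Flink. ShardGrouping is a hypothetical helper name, and sorting by file name is only an assumption that names reflect chronological order; a real job might sort by modification time instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Flink): groups input files by shard folder. */
final class ShardGrouping {

    private ShardGrouping() {}

    /**
     * Returns a map from shard folder to the files in that folder.
     * Files are sorted lexicographically by path, which is an assumption:
     * it only matches chronological order if the names encode it
     * (note that "file10" sorts before "file2").
     */
    static Map<String, List<String>> groupByShard(List<String> files) {
        Map<String, List<String>> byShard = new TreeMap<>();
        for (String file : files) {
            // Everything before the last '/' is treated as the shard folder.
            int idx = file.lastIndexOf('/');
            String shard = idx < 0 ? "" : file.substring(0, idx);
            byShard.computeIfAbsent(shard, k -> new ArrayList<>()).add(file);
        }
        for (List<String> shardFiles : byShard.values()) {
            shardFiles.sort(Comparator.naturalOrder());
        }
        return byShard;
    }
}
```

Each resulting group could then be handed to one parallel reader, so that the two shards are processed side by side while the files within a shard stay in order.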