Hi John,
Out of the box, Flink does not provide this functionality. However, you
might be able to write your own CsvInputFormat which overrides fillRecord
so that it generates a CSV record where the first field contains the
filename. You can obtain the filename from the field currentSplit. I
haven
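For illustration, here is a rough, untested sketch of what such a subclass could look like. The class name FileNameRowCsvInputFormat is made up, and I am assuming a RowCsvInputFormat constructor that takes the path and the field types; constructor overloads, package names and the exact fillRecord()/currentSplit members may differ in your Flink version. The TypeInformation you pass to readFile() would also need the extra String field.

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.io.RowCsvInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.types.Row;

// Hypothetical subclass (not part of Flink): prepends the file name of the
// current split as the first field of every emitted Row.
public class FileNameRowCsvInputFormat extends RowCsvInputFormat {

    public FileNameRowCsvInputFormat(Path filePath, TypeInformation<?>[] fieldTypes) {
        super(filePath, fieldTypes);
    }

    @Override
    protected Row fillRecord(Row reuse, Object[] parsedValues) {
        // currentSplit is the protected field inherited from FileInputFormat.
        Row row = new Row(parsedValues.length + 1);
        row.setField(0, currentSplit.getPath().toString());
        for (int i = 0; i < parsedValues.length; i++) {
            row.setField(i + 1, parsedValues[i]);
        }
        return row;
    }
}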
Hi, I am reading a CSV file using env.readFile() with RowCsvInputFormat.
Is there a way to get the filename as part of the row stream?
The file contains a unique identifier to tag the rows with.
Okay. We filter files starting with underscores because that is the same
behavior as Hadoop.
Hadoop always creates some underscore files, so when reading the results of
a MapReduce job, Flink would otherwise read these files as well.
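If you actually need to read such files, one possible workaround is to subclass the input format and relax that filter. This is an untested sketch with a made-up class name, and the exact visibility and signature of acceptFile() may differ between Flink versions:

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.FileStatus;
import org.apache.flink.core.fs.Path;

// Hypothetical subclass (not part of Flink): accepts "_"-prefixed files that
// the default FileInputFormat.acceptFile() implementation skips.
public class AcceptUnderscoreTextInputFormat extends TextInputFormat {

    public AcceptUnderscoreTextInputFormat(Path filePath) {
        super(filePath);
    }

    @Override
    public boolean acceptFile(FileStatus fileStatus) {
        // Keep hidden "." files filtered out, but let underscore files through.
        String name = fileStatus.getPath().getName();
        return !name.startsWith(".");
    }
}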
On Wed, Jul 1, 2015 at 12:15 PM, Ronny Bräunlich wrote:
Hi Robert,
just ignore my previous question.
My files started with an underscore and I just found out that FileInputFormat does
filter for underscores in acceptFile().
Cheers,
Ronny
On 01.07.2015, at 11:35, Robert Metzger wrote:
> Hi Ronny,
>
> check out this answer on SO:
> http://stackoverflow.com/questions/30599616/create-objects-from-input-files-in-apache-flink
Hi Robert,
thank you for your quick answer.
Just one additional question:
When I use the ExecutionEnvironment like this: DataSource files =
env.readTextFile("file:///Users/me/path/to/file/dir");
Shouldn’t it read all the files in dir? I have three .json files there, but when
I print the result, n
Hi Ronny,
Check out this answer on SO:
http://stackoverflow.com/questions/30599616/create-objects-from-input-files-in-apache-flink
It is a similar use case ... I guess you can get the metadata from the
input split as well.
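For example, you could capture the split's path and modification time in open() and prefix them to every line. This is a rough, untested sketch with a made-up class name; I am assuming the open()/nextRecord() hooks and the FileStatus lookup work as shown, so please double-check against your Flink version:

import java.io.IOException;

import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.FileInputSplit;
import org.apache.flink.core.fs.FileStatus;
import org.apache.flink.core.fs.Path;

// Hypothetical subclass (not part of Flink): remembers the current split's
// file name and modification time and prefixes both to every record.
public class FileMetadataTextInputFormat extends TextInputFormat {

    private String fileName;
    private long modificationTime;

    public FileMetadataTextInputFormat(Path filePath) {
        super(filePath);
    }

    @Override
    public void open(FileInputSplit split) throws IOException {
        super.open(split);
        Path path = split.getPath();
        fileName = path.getName();
        // The modification time comes from the file system, not from the split.
        FileStatus status = path.getFileSystem().getFileStatus(path);
        modificationTime = status.getModificationTime();
    }

    @Override
    public String nextRecord(String record) throws IOException {
        String line = super.nextRecord(record);
        return line == null ? null : fileName + "," + modificationTime + "," + line;
    }
}

You would then pass an instance of it to env.readFile(...) instead of calling readTextFile().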
On Wed, Jul 1, 2015 at 11:30 AM, Ronny Bräunlich wrote:
Hello,
I want to read a directory containing text files with Flink.
As I already found out, I can simply point the environment to the directory and
it will read all the files.
What I couldn’t find out is whether it’s possible to keep the file metadata somehow.
Concretely, I need the timestamp, the filename and