Hello,

I have multiple files (file1, file2, file3), each a CSV with different columns and data. The column headers are finite and we know their format. I would like to parse each file according to its column structure; I already have the parsers.
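For reference, one of those parsers is roughly the following sketch (very simplified; the class names and the Person POJO here are only illustrative, assuming the (id, firstname, lastname) layout of file1 shown below, and the two classes live in separate files in my project):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

// Minimal POJO for the output objects: Person { id, fullName }
public class Person {
    public String id;
    public String fullName;

    public Person() {}                       // empty constructor so Flink can treat it as a POJO

    public Person(String id, String fullName) {
        this.id = id;
        this.fullName = fullName;
    }
}

// Simplified parser for the (id, firstname, lastname) layout of file1
public class File1Parser implements FlatMapFunction<String, Person> {
    @Override
    public void flatMap(String line, Collector<Person> out) {
        String[] cols = line.split(",");
        if (cols[0].trim().equals("id")) {
            return;                          // skip the header row
        }
        // fullName is firstname + " " + lastname
        out.collect(new Person(cols[0].trim(), cols[1].trim() + " " + cols[2].trim()));
    }
}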
e.g.:
file1 has columns (id, firstname, lastname)
file2 has columns (id, name)
file3 has columns (id, name_1, name_2, name_3, name_4)

I would like to read and parse all those files and output objects to a sink as Person { id, fullName }.

Example files would be:

file1:
------
id, firstname, lastname
33, John, Smith
55, Labe, Soni

file2:
------
id, name
5, Mitr Kompi
99, Squi Masw

file3:
------
id, name_1, name_2, name_3, name_4
1, Peter, Hov, Risti, Pena
2, Rii, Koni, Ques,,

The expected output of my program would be:

Person { 33, John Smith }
Person { 55, Labe Soni }
Person { 5, Mitr Kompi }
Person { 99, Squi Masw }
Person { 1, Peter Hov Risti Pena }
Person { 2, Rii Koni Ques }

What I do now is:

My code (very simplified) is:

env.readFile().flatMap(new MyParser()).addSink(new MySink())

MyParser receives the rows one by one as strings, which means that when I run with parallelism > 1 I receive lines from any of the files and cannot tell which file a given line comes from.

What I would like to do is:

Be able to figure out which file I am reading from. Since I only know the file type from the first row (the column headers), I need to either send the first row to MyParser(), or send a tuple of <first row of the file being read, current row of the file being read> (a rough sketch of what I have in mind is in the P.S. below). Another option I can think of is some keyed function based on the first row, but I am not sure how to achieve that with readFile.

Is there a way I can achieve this?

Regards,
Nikola
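P.S. To make the "tuple" idea a bit more concrete, here is a rough sketch of what I imagine: a custom input format that pairs every line with the path of the file it came from, so that downstream I can at least tell which file a row belongs to (mapping path -> header/parser would then be MyParser's job). I am not sure this is the right way to use the API; the class name FileAwareLineFormat is made up, and I am assuming the split currently being read is reachable through the currentSplit field inherited from FileInputFormat:

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.io.DelimitedInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;

// Emits Tuple2<file path, line> instead of plain String lines
public class FileAwareLineFormat extends DelimitedInputFormat<Tuple2<String, String>> {

    @Override
    public Tuple2<String, String> readRecord(Tuple2<String, String> reuse,
                                             byte[] bytes, int offset, int numBytes) throws IOException {
        // currentSplit is the FileInputSplit currently being read,
        // so its path tells me which file this line belongs to
        String filePath = currentSplit.getPath().toString();
        String line = new String(bytes, offset, numBytes, StandardCharsets.UTF_8);
        return Tuple2.of(filePath, line);
    }
}

// and then something like (again, only a sketch; the path is a placeholder):
env.readFile(new FileAwareLineFormat(), "/path/to/input")
   .flatMap(new MyParser())      // MyParser would need to accept Tuple2<filePath, line> now
   .addSink(new MySink());

Would something along these lines work, or is there a more standard way to do this?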