Thanks Till,
I understand making my FileInputFormat "unsplittable" guarantees a file is
always read by a single task. But how can I produce a single record for the
entire file?
As my file is a CSV with some idiosyncrasies, I am extending CsvInputFormat
so that I do not have to reinvent the CSV parsing.
Hi Lorenzo,
what you could try to do is to derive your own InputFormat (extending
FileInputFormat) where you set the field `unsplittable` to true. That way,
an InputSplit is the whole file and you can handle the set of new rules as
a single record.
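
To illustrate the idea, here is a rough standalone sketch in plain Java. It
does not use the Flink API; it only mimics the reachedEnd()/nextRecord()
contract of an InputFormat. The names WholeFileRecordSketch and
WholeFileReader are hypothetical, and the naive comma split is an assumption
standing in for your real CSV parsing. The point is that the first call to
nextRecord() consumes the whole input and returns all rows as one record,
after which reachedEnd() reports true.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Standalone sketch, no Flink dependency. All names here are hypothetical. */
public class WholeFileRecordSketch {

    /** Mimics the reachedEnd()/nextRecord() pattern of an input format. */
    public static class WholeFileReader {
        private final BufferedReader reader;
        private boolean emitted = false;

        public WholeFileReader(BufferedReader reader) {
            this.reader = reader;
        }

        /** True once the single whole-file record has been produced. */
        public boolean reachedEnd() {
            return emitted;
        }

        /** Reads every remaining line and returns all rows as one record. */
        public List<String[]> nextRecord() {
            List<String[]> rows = new ArrayList<>();
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isEmpty()) {
                        continue; // skip blank lines
                    }
                    // Assumption: naive split, no quoting or escaping support.
                    rows.add(line.split(",", -1));
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            emitted = true;
            return rows;
        }
    }

    public static void main(String[] args) {
        String content = "rule1,allow\nrule2,deny\n";
        WholeFileReader in =
                new WholeFileReader(new BufferedReader(new StringReader(content)));
        // Same driving loop a runtime would use: exactly one iteration here.
        while (!in.reachedEnd()) {
            List<String[]> record = in.nextRecord();
            System.out.println("rows in the single record: " + record.size());
        }
    }
}
```

In a real format you would keep the same flag-based logic but read from the
stream that FileInputFormat opens for the split, and reuse whatever parsing
you already have.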
Cheers,
Till
On Mon, Jun 29, 2020 at 3:52 PM Lo