Hi all, I have a large CSV file (larger than 10 GB), and I'd like to use an InputFormat that splits it into smaller parts so that each Mapper can process one piece of the file. However, as far as I know, FileInputFormat only considers the byte size of the file. That is, it can divide the CSV file into many parts, but some of those parts may not be well-formed CSV. For example, a part may begin or end in the middle of a line, so that a line is not terminated with CRLF or some of its text is cut off.
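To make the behaviour I am after concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The names `SplitLineReader`, `readSplit` and `nextLineStart` are made up for illustration and are not Hadoop API. The idea, which as far as I understand is roughly what line-oriented record readers do, is that a split not starting at byte 0 skips its first (possibly partial) line, and every split reads past its end far enough to finish the last line it started.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
final class SplitLineReader {
    private SplitLineReader() {}

    /**
     * Returns the complete lines owned by the byte range [start, start + length).
     * A line belongs to the split in which it starts, except that a line
     * starting exactly at the split end is read by the earlier split.
     */
    static List<String> readSplit(byte[] data, long start, long length) {
        int end = (int) Math.min((long) data.length, start + length);
        int pos = (int) start;

        // Any split other than the first may begin mid-line. Discard everything
        // up to and including the next '\n'; the previous split reads that line.
        if (pos != 0) {
            pos = nextLineStart(data, pos);
        }

        List<String> lines = new ArrayList<>();
        // Keep reading while the line STARTS at or before the split end, even
        // if it finishes beyond the end.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            lines.add(toLine(data, pos, next));
            pos = next;
        }
        return lines;
    }

    /** Index just past the next '\n' at or after 'from', or data.length. */
    static int nextLineStart(byte[] data, int from) {
        int i = from;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        return Math.min(i + 1, data.length);
    }

    /** Decodes bytes [from, to) and drops a trailing LF or CRLF. */
    private static String toLine(byte[] data, int from, int to) {
        int stop = to;
        if (stop > from && data[stop - 1] == '\n') {
            stop--;
        }
        if (stop > from && data[stop - 1] == '\r') {
            stop--;
        }
        return new String(data, from, stop - from, StandardCharsets.UTF_8);
    }
}
```

With this rule, cutting a file into fixed-size byte ranges and calling `readSplit` on each range should return every line exactly once. It only assumes that each record is a single line, so a quoted CSV field containing an embedded line break would still be cut in two.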
How can I ensure that each FileSplit is a smaller, valid CSV file by using a proper InputFormat? BR/anderson
