Allow setting of end-of-record delimiter for TextInputFormat
------------------------------------------------------------
Key: HADOOP-7096
URL: https://issues.apache.org/jira/browse/HADOOP-7096
Project: Hadoop Common
Issue Type: Improvement
Reporter: Ahmed Radwan
Attachments: 2.patch

The patch for https://issues.apache.org/jira/browse/MAPREDUCE-2254 required minor changes to the LineReader class to allow extensions (see attached 2.patch). Description copied below:

It would be useful to allow setting the end-of-record delimiter for TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as the only possible record delimiters. This is a problem if users have embedded newlines in their data fields (which is pretty common). It is also a problem for other tools that use TextInputFormat (see, for example, https://issues.apache.org/jira/browse/PIG-836 and https://issues.cloudera.org/browse/SQOOP-136).

I have written a patch to address this issue. The patch allows users to specify any custom end-of-record delimiter through a newly added configuration property. For backward compatibility, if the new configuration property is absent, the same delimiters as before are used (i.e., '\n', '\r' or '\r\n').
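To illustrate the behavior described above, here is a minimal, self-contained Java sketch. It is not the code from 2.patch and does not touch LineReader. The class name RecordSplitter and its split method are hypothetical, and passing null for the delimiter stands in for "the configuration property is absent" (the property name is not given in this message, so none is assumed here).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; a hypothetical helper, not the patched LineReader. */
final class RecordSplitter {

    private RecordSplitter() {}

    /**
     * Splits data into records.
     * A null delimiter models the "property absent" case and falls back to
     * the default delimiters '\n', '\r' or '\r\n'.
     */
    static List<String> split(String data, String delimiter) {
        if (delimiter != null && delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0; // index where the current record begins
        int i = 0;     // scan position
        while (i < data.length()) {
            int len = delimiterLengthAt(data, i, delimiter);
            if (len > 0) {
                records.add(data.substring(start, i));
                i += len;
                start = i;
            } else {
                i++;
            }
        }
        // Keep a trailing record that is not followed by a delimiter.
        if (start < data.length()) {
            records.add(data.substring(start));
        }
        return records;
    }

    /** Returns the length of the delimiter starting at pos, or 0 if none. */
    private static int delimiterLengthAt(String data, int pos, String delimiter) {
        if (delimiter != null) {
            // Custom delimiter: newlines inside a record are left untouched.
            return data.startsWith(delimiter, pos) ? delimiter.length() : 0;
        }
        char c = data.charAt(pos);
        if (c == '\n') {
            return 1;
        }
        if (c == '\r') {
            boolean crlf = pos + 1 < data.length() && data.charAt(pos + 1) == '\n';
            return crlf ? 2 : 1;
        }
        return 0;
    }
}
```

With a custom delimiter such as "|||", a field containing an embedded newline stays inside one record, whereas the null (default) case splits on every '\n', '\r' or '\r\n' as before.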