From that wiki: "This SerDe works for most CSV data, but does not handle embedded newlines."
The Hive SerDe interface sits entirely downstream of TextInputFormat, which has already split records on newlines. In theory you can specify a different line delimiter, but Hive 1.2.1 does not support it:

"FAILED: SemanticException 3:20 LINES TERMINATED BY only supports newline '\n' right now."

From: Alexander Pivovarov [mailto:apivova...@gmail.com]
Sent: Tuesday, January 12, 2016 9:52 AM
To: user@hive.apache.org
Subject: Re: Loading data containing newlines

Try the CSV SerDe. It should correctly parse quoted field values that contain newlines:
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde

Hadoop should automatically read bz2 files.

On Tue, Jan 12, 2016 at 9:40 AM, Gerber, Bryan W <bryan.ger...@pnnl.gov> wrote:

We are attempting to load CSV text files (compressed to bz2) containing newlines in fields, using EXTERNAL tables and INSERT/SELECT into ORC format tables. Data volume is ~1TB/day, and we are really trying to avoid unpacking the files to condition the data. A few days of research has us ready to implement custom input/output formats to handle the ingest. Are there other suggestions that would take less effort and have low impact on load times?

Thanks,
Bryan G.
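To illustrate why a quote-aware SerDe cannot help once records have already been split on newlines, here is a small sketch. It uses Python's standard csv module as a stand-in; it is not Hive code, and the sample data and function names are made up for illustration. One function splits the text on line breaks first and then parses each line on its own (roughly the situation a SerDe behind TextInputFormat is in). The other parses the whole stream with a quote-aware reader.

```python
import csv
import io

# Hypothetical sample: a header row and two records; the second record's
# "comment" field is quoted and contains an embedded newline.
SAMPLE = 'id,comment\n1,"plain"\n2,"first line\nsecond line"\n'


def split_then_parse(text):
    """Split on line breaks first, then parse each physical line on its own.
    This loosely mimics a parser that only ever sees one line at a time."""
    rows = []
    for line in text.splitlines():
        # Each line is handed to the parser separately, so a quote opened on
        # one line can never be closed by a later line.
        rows.extend(csv.reader([line]))
    return rows


def parse_stream(text):
    """Parse the whole stream with a quote-aware reader, so a newline inside
    a quoted field stays part of that field."""
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(split_then_parse(SAMPLE))  # the quoted record is broken in two
    print(parse_stream(SAMPLE))      # three rows, each with two fields
```

In the line-first version the second record comes out as two separate rows with the wrong number of fields, while the stream parser keeps "first line\nsecond line" as a single field. The same split happens in Hive before the SerDe ever sees the data.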