Try the CSV SerDe. It should correctly parse quoted field values that contain newlines: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
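Whether a quoted newline survives depends on where record boundaries are decided. If a quote-aware CSV parser sees the raw text, it can rejoin the field. If the text is split on every newline first, the record breaks before any parser sees it. The sketch below contrasts the two approaches locally using Python's `csv` and `bz2` modules. It is only an illustration and not Hive's SerDe. The sample data and function names are hypothetical.

```python
import bz2
import csv
import io

# Hypothetical sample data: the second record has a quoted field
# that contains an embedded newline.
SAMPLE = 'id,comment\n1,"plain"\n2,"first line\nsecond line"\n'
BLOB = bz2.compress(SAMPLE.encode("utf-8"))  # stands in for a .bz2 input file


def parse_quote_aware(blob: bytes) -> list[list[str]]:
    """Decompress, then let a quote-aware CSV parser decide where records end."""
    text = bz2.decompress(blob).decode("utf-8")
    # newline="" keeps the raw line endings so csv can rejoin quoted newlines.
    return list(csv.reader(io.StringIO(text, newline="")))


def parse_line_based(blob: bytes) -> list[list[str]]:
    """Split on every newline first (like a line-oriented record reader), then on commas."""
    text = bz2.decompress(blob).decode("utf-8")
    return [line.split(",") for line in text.splitlines()]


if __name__ == "__main__":
    print(parse_quote_aware(BLOB))  # 3 records; the embedded newline is kept
    print(parse_line_based(BLOB))   # 4 "records"; record 2 is split in two
```

If the quote-aware result is what you need, check that the reading path in your pipeline does not split on raw newlines before the CSV parsing step.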
Hadoop should automatically read bz2 files.

On Tue, Jan 12, 2016 at 9:40 AM, Gerber, Bryan W <bryan.ger...@pnnl.gov> wrote:

> We are attempting to load CSV text files (compressed to bz2) containing
> newlines in fields using EXTERNAL tables and INSERT/SELECT into ORC format
> tables. Data volume is ~1TB/day, we are really trying to avoid unpacking
> them to condition the data.
>
> A few days of research has us ready to implement custom input/output
> formats to handle the ingest. Any other suggestions that may be less
> effort with low impact to load times?
>
> Thanks,
>
> Bryan G.