We are trying to load bzip2-compressed CSV text files whose fields contain embedded newlines. Our approach is to define EXTERNAL tables over the raw files and then INSERT ... SELECT into ORC-format tables. The data volume is about 1 TB/day, so we really want to avoid decompressing the files just to condition the data.
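To make the failure mode concrete: a line-oriented text reader treats every newline as the end of a record, so a quoted field that contains a newline is cut into two rows. The Java program below contrasts naive newline splitting with quote-aware splitting, which is roughly the record-boundary logic a custom input format would have to implement. It is a standalone sketch, not a Hadoop RecordReader. It assumes RFC 4180-style quoting (fields with embedded newlines are wrapped in double quotes, literal quotes are doubled) and `\n` line endings. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareRecords {

    // Naive splitting: every '\n' ends a record, similar to a line-oriented
    // reader, so a newline inside a quoted field breaks the record apart.
    static List<String> splitOnNewlines(String text) {
        List<String> records = new ArrayList<>();
        for (String line : text.split("\n", -1)) {
            if (!line.isEmpty()) {
                records.add(line);
            }
        }
        return records;
    }

    // Quote-aware splitting (assumes RFC 4180-style quoting): '\n' ends a
    // record only outside a double-quoted field. An escaped quote ("") toggles
    // the state twice, so it leaves inQuotes unchanged.
    static List<String> splitRecords(String text) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (c == '\n' && !inQuotes) {
                if (current.length() > 0) {
                    records.add(current.toString());
                }
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "1,\"first line\nsecond line\",a\n2,\"plain\",b\n";
        System.out.println(splitOnNewlines(csv).size()); // 3: record 1 was split
        System.out.println(splitRecords(csv).size());    // 2: records kept whole
    }
}
```

A real RecordReader would also need to resynchronize when an input split begins in the middle of a quoted field, because Hadoop can split bzip2 input; this sketch reads one whole string and ignores that problem.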
After a few days of research, we are ready to implement custom input/output formats to handle the ingest. Are there other approaches that would take less effort and have little impact on load times?

Thanks,
Bryan G.