BELUGA BEHR created HIVE-21193:
----------------------------------

             Summary: Support LZO Compression with CombineHiveInputFormat
                 Key: HIVE-21193
                 URL: https://issues.apache.org/jira/browse/HIVE-21193
             Project: Hive
          Issue Type: Improvement
          Components: Compression
    Affects Versions: 4.0.0, 3.2.0
            Reporter: BELUGA BEHR

Regarding LZO compression with Hive:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

It does not work out of the box if {{.lzo.index}} files are present. As I understand it, this is because the default Hive input format, {{CombineHiveInputFormat}}, does not handle them correctly. It does not account for a directory holding a mix of data files and index files: it lumps them all together when building the combined splits, and the Mappers fail when they try to process the {{.lzo.index}} files as data.

The original {{HiveInputFormat}} correctly identifies the {{.lzo.index}} files because it considers each file individually.

Allow {{CombineHiveInputFormat}} to short-circuit LZO files and not combine them.
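
To make the requested short-circuit concrete, here is a minimal sketch of the file classification it implies. This is not Hive code: {{LzoSplitPlanner}}, {{Plan}}, and the suffix checks are hypothetical names invented for illustration, and a real change would live inside the split computation of {{CombineHiveInputFormat}} and work on Hadoop path objects, not plain strings. The sketch assumes LZO data files end in {{.lzo}} and their indexes in {{.lzo.index}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of Hive): sorts input paths into those that
 * may be combined into shared splits and those that must be kept apart.
 */
final class LzoSplitPlanner {

    /** Outcome of planning for one set of input paths. */
    static final class Plan {
        /** Ordinary files that can safely be lumped into combined splits. */
        final List<String> combinable = new ArrayList<>();
        /** LZO data files, each to be handled individually (short-circuited). */
        final List<String> uncombined = new ArrayList<>();
        /** Index files, which must never be read as data. */
        final List<String> indexFiles = new ArrayList<>();
    }

    // Assumption: index files are recognized purely by their suffix.
    static boolean isLzoIndex(String path) {
        return path.endsWith(".lzo.index");
    }

    // Assumption: LZO data files are recognized purely by their suffix.
    static boolean isLzoData(String path) {
        return path.endsWith(".lzo");
    }

    /** Classifies each path exactly once, preserving input order. */
    static Plan plan(List<String> paths) {
        Plan plan = new Plan();
        for (String path : paths) {
            if (isLzoIndex(path)) {
                plan.indexFiles.add(path);
            } else if (isLzoData(path)) {
                plan.uncombined.add(path);
            } else {
                plan.combinable.add(path);
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // Example paths are made up for the demonstration.
        Plan plan = plan(Arrays.asList(
                "/warehouse/t/part-0.lzo",
                "/warehouse/t/part-0.lzo.index",
                "/warehouse/t/part-1.txt"));
        System.out.println("combinable: " + plan.combinable);
        System.out.println("uncombined: " + plan.uncombined);
        System.out.println("indexFiles: " + plan.indexFiles);
    }
}
```

Only the {{combinable}} list would be handed to the combining logic. The {{uncombined}} files would fall back to per-file handling, as {{HiveInputFormat}} does today, and the index files would be kept out of the data path entirely.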