[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6670:
------------------------------

    Attachment: HIVE-6670.patch
                HIVE-6670-branch-0.12.patch

> ClassNotFound with Serde
> ------------------------
>
>                 Key: HIVE-6670
>                 URL: https://issues.apache.org/jira/browse/HIVE-6670
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Abin Shahab
>         Attachments: HIVE-6670-branch-0.12.patch, HIVE-6670.patch
>
>
> We are seeing a ClassNotFoundException when we use
> CSVSerde (https://github.com/ogrodnek/csv-serde) as the SerDe of a table.
> This happens because MapredLocalTask does not pass the locally added jars
> (ADD JAR) to ExecDriver when it launches it, so ExecDriver's classpath does
> not include them. When the plan is deserialized, the deserialization code
> therefore throws a ClassNotFoundException, which leaves a TableDesc object
> with a null deserializer class. That null then causes an NPE during Fetch.
> Steps to reproduce:
> Download the SerDe jar with wget
> (https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar)
> to a local path, e.g.
> /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.
> Place some sample CSV files in HDFS as follows:
> hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
> hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
> hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
> hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/
> ====
> Create the tables in Hive:
> ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
> create external table sampleCSV (md5hash string, filepath string)
> row format serde 'com.bizo.hive.serde.csv.CSVSerde'
> stored as textfile
> location '/user/soam/HiveSerdeIssue/sampleCSV/'
> ;
> create external table sampleJoinTarget (md5hash string, filepath string,
> datestamp string, nblines string, nberrors string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ','
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE
> LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
> ;
> ===============
> Now, try the following JOIN:
> ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
> SELECT
> sampleCSV.md5hash,
> sampleCSV.filepath
> FROM sampleCSV
> JOIN sampleJoinTarget
> ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
> ;
> ----
> This fails with the following error:
> Execution log at: /tmp/soam/.log
> java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
> Continuing ...
> 2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
> Execution failed with exit status: 2
> Obtaining error information
> Task failed!
> Task ID:
>   Stage-4
> Logs:
> /var/log/hive/soam/hive.log
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
>
> The following LEFT JOIN, by contrast, works:
> SELECT
> sampleCSV.md5hash,
> sampleCSV.filepath
> FROM sampleCSV
> LEFT JOIN sampleJoinTarget
> ON (sampleCSV.md5hash = sampleJoinTarget.md5hash)
> ;
> ==

--
This message was sent by Atlassian JIRA
(v6.2#6252)
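The failure chain described in this issue (MapredLocalTask launching ExecDriver without the session's added jars, the SerDe class lookup failing during plan deserialization, and the resulting null deserializer class surfacing later as an NPE) can be illustrated with the minimal Java sketch below. It is a simplified model under stated assumptions, not Hive code: `TableDescSketch`, `deserialize`, `fetch` and `childClasspath` are hypothetical names, and the class lookup uses plain `Class.forName` rather than Hive's actual plan deserializer. `childClasspath` only shows one plausible shape of a fix (appending the ADD JAR paths to the child JVM's classpath, assuming the child is started through the `hadoop` launcher script, which honours `HADOOP_CLASSPATH`). It is not the content of the attached patches.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SerdeClasspathSketch {

    // Hypothetical stand-in for a table descriptor: it only records the
    // deserializer class resolved when the plan was read back.
    static final class TableDescSketch {
        final String serdeClassName;
        final Class<?> deserializerClass; // null if the class could not be loaded

        TableDescSketch(String serdeClassName, Class<?> deserializerClass) {
            this.serdeClassName = serdeClassName;
            this.deserializerClass = deserializerClass;
        }
    }

    // Simplified model of plan deserialization in the child JVM: the SerDe is
    // looked up by name, and a ClassNotFoundException is reported and swallowed
    // (loosely imitating the "Continuing ..." line in the log), leaving null.
    static TableDescSketch deserialize(String serdeClassName, ClassLoader loader) {
        Class<?> cls = null;
        try {
            cls = Class.forName(serdeClassName, false, loader);
        } catch (ClassNotFoundException e) {
            System.err.println(e + "\nContinuing ...");
        }
        return new TableDescSketch(serdeClassName, cls);
    }

    // Simplified model of the later fetch step, which assumes the class is set.
    static String fetch(TableDescSketch desc) {
        return desc.deserializerClass.getName(); // throws NPE if the lookup failed
    }

    // Hypothetical fix-side helper: join an existing classpath with the jars
    // added in the session, using the platform path separator.
    static String childClasspath(String existing, List<String> addedJars) {
        List<String> parts = new ArrayList<>();
        if (existing != null && !existing.isEmpty()) {
            parts.add(existing);
        }
        parts.addAll(addedJars);
        return String.join(File.pathSeparator, parts);
    }

    public static void main(String[] args) {
        ClassLoader loader = SerdeClasspathSketch.class.getClassLoader();

        // The CSVSerde jar is not on this JVM's classpath, so the lookup fails quietly.
        TableDescSketch broken = deserialize("com.bizo.hive.serde.csv.CSVSerde", loader);
        System.out.println("deserializerClass = " + broken.deserializerClass);
        try {
            fetch(broken);
        } catch (NullPointerException e) {
            System.out.println("fetch failed with NPE");
        }

        // Value that a launcher could place in the child's HADOOP_CLASSPATH (printed only).
        String cp = childClasspath("/etc/hadoop/conf",
                List.of("/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar"));
        System.out.println("HADOOP_CLASSPATH=" + cp);
    }
}
```

Running `main` without the SerDe jar on the classpath prints `deserializerClass = null` and then `fetch failed with NPE` on standard output, which mirrors how the error in this report is deferred from deserialization to Fetch instead of failing at the point where the class is missing.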