[ https://issues.apache.org/jira/browse/HIVE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abin Shahab updated HIVE-6670:
------------------------------

    Description: 
We are hitting a ClassNotFoundException with a table created using
CSVSerde (https://github.com/ogrodnek/csv-serde).
This happens because MapredLocalTask does not pass the locally added jars to
ExecDriver when it launches it, so ExecDriver's classpath does not include
them. As a result, when the plan is deserialized, the deserialization code
throws a ClassNotFoundException and produces a TableDesc object with a null
DeserializerClass.
This results in an NPE during Fetch.
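The failure chain described above can be sketched in isolation. This is only
an illustration, not Hive's code: TableDescSketch, deserializePlan and fetch
are hypothetical names, and an empty URLClassLoader stands in for a child JVM
that was launched without the added jars.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class MissingSerdeDemo {

    /** Hypothetical stand-in for a table descriptor; only the relevant field. */
    public static final class TableDescSketch {
        public Class<?> deserializerClass; // stays null if the class cannot be loaded
    }

    /**
     * Resolves the SerDe class by name. On failure it logs and continues,
     * leaving the descriptor with a null deserializer class.
     */
    public static TableDescSketch deserializePlan(String serdeClassName, ClassLoader loader) {
        TableDescSketch desc = new TableDescSketch();
        try {
            desc.deserializerClass = Class.forName(serdeClassName, false, loader);
        } catch (ClassNotFoundException e) {
            System.err.println("java.lang.ClassNotFoundException: " + serdeClassName);
            System.err.println("Continuing ...");
        }
        return desc;
    }

    /** Fetch-like step that assumes the deserializer class was resolved. */
    public static String fetch(TableDescSketch desc) {
        // Throws NullPointerException when deserializerClass is null.
        return desc.deserializerClass.getName();
    }

    public static void main(String[] args) {
        // A loader with no extra jars, standing in for the child JVM's classpath.
        ClassLoader childLoader =
                new URLClassLoader(new URL[0], MissingSerdeDemo.class.getClassLoader());

        TableDescSketch desc =
                deserializePlan("com.bizo.hive.serde.csv.CSVSerde", childLoader);
        try {
            fetch(desc);
        } catch (NullPointerException e) {
            System.err.println("NPE during fetch: deserializer class was never resolved");
        }
    }
}
```

The point of the sketch is that the ClassNotFoundException is not fatal by
itself; the later NPE comes from using the descriptor whose class lookup was
skipped. It assumes the csv-serde jar is absent from the classpath it runs on.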
Steps to reproduce:
wget https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar
into a local directory, e.g.
/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.
Place some sample CSV files in HDFS as follows:
hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
hdfs dfs -put /home/soam/sampleJoinTarget.csv /user/soam/HiveSerdeIssue/sampleJoinTarget/
====
Create the tables in Hive:
ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
create external table sampleCSV (md5hash string, filepath string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
stored as textfile
location '/user/soam/HiveSerdeIssue/sampleCSV/'
;
create external table sampleJoinTarget (md5hash string, filepath string, 
datestamp string, nblines string, nberrors string)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
;
===============
Now, try the following JOIN:
ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
SELECT 
sampleCSV.md5hash, 
sampleCSV.filepath 
FROM sampleCSV
JOIN sampleJoinTarget
ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
;
—
This will fail with the error:
Execution log at: /tmp/soam/.log
java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
Continuing ...
2014-03-11 10:35:03 Starting to launch local task to process map join; maximum memory = 238551040
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-4
Logs:
/var/log/hive/soam/hive.log
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
In contrast, the following LEFT JOIN works:
SELECT 
sampleCSV.md5hash, 
sampleCSV.filepath 
FROM sampleCSV
LEFT JOIN sampleJoinTarget
ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
;
==

  was:
We are finding a ClassNotFound exception when we use 
CSVSerde(https://github.com/ogrodnek/csv-serde) to create a table.
This is happening because MapredLocalTask does not pass the local added jars to 
ExecDriver when that is launched.
ExecDriver's classpath does not include the added jars. Therefore, when the 
plan is deserialized, it throws a ClassNotFoundException in the deserialization 
code, and results in a TableDesc object with a Null DeserializerClass.
This results in an NPE during Fetch.
Steps to reproduce:
wget 
https://drone.io/github.com/ogrodnek/csv-serde/files/target/csv-serde-1.1.2-0.11.0-all.jar
 into somewhere local eg. 
/home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar.
Place the sample files attached to this ticket in HDFS as follows:
hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleCSV/
hdfs dfs -put /home/soam/sampleCSV.csv /user/soam/HiveSerdeIssue/sampleCSV/
hdfs dfs -mkdir /user/soam/HiveSerdeIssue/sampleJoinTarget/
hdfs dfs -put /home/soam/sampleJoinTarget.csv 
/user/soam/HiveSerdeIssue/sampleJoinTarget/
====
create the tables in hive (this might cause a problem in dogfood since i've 
already created tables in those names, so you'll have to change the table names 
or delete mine):
ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
create external table sampleCSV (md5hash string, filepath string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
stored as textfile
location '/user/soam/HiveSerdeIssue/sampleCSV/'
;
create external table sampleJoinTarget (md5hash string, filepath string, 
datestamp string, nblines string, nberrors string)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/soam/HiveSerdeIssue/sampleJoinTarget/'
;
===============
Now, try the following JOIN:
ADD JAR /home/soam/HiveSerdeIssue/csv-serde-1.1.2-0.11.0-all.jar;
SELECT 
sampleCSV.md5hash, 
sampleCSV.filepath 
FROM sampleCSV
JOIN sampleJoinTarget
ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
;
—
This will fail with the error:
Execution log at: /tmp/soam/.log
java.lang.ClassNotFoundException: com/bizo/hive/serde/csv/CSVSerde
Continuing ...
2014-03-11 10:35:03 Starting to launch local task to process map join; maximum 
memory = 238551040
Execution failed with exit status: 2
Obtaining error information
Task failed!
Task ID:
Stage-4
Logs:
/var/log/hive/soam/hive.log
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Try the following LEFT JOIN. This will work:
SELECT 
sampleCSV.md5hash, 
sampleCSV.filepath 
FROM sampleCSV
LEFT JOIN sampleJoinTarget
ON (sampleCSV.md5hash = sampleJoinTarget.md5hash) 
;
==


> ClassNotFound with Serde
> ------------------------
>
>                 Key: HIVE-6670
>                 URL: https://issues.apache.org/jira/browse/HIVE-6670
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Abin Shahab



--
This message was sent by Atlassian JIRA
(v6.2#6252)
