Hi,
I have created a Hive external table (documents) that points to an S3 location 
containing the file to be read, and I have a Python UDF that does URL 
processing. When I run the SQL below, it fails with the error shown in the log 
below. At first I thought it was failing on specific records, but when I delete 
those records, the SQL fails on some other, seemingly random record. When I put 
only those bad records in the file and run the SQL, it succeeds. I therefore 
conclude that this is not a data-specific issue.

I have tried switching the execution engine from Tez to MR, but the problem 
persists. I have also tried disabling hive.vectorized.execution.enabled and 
hive.vectorized.execution.reduce.enabled, but the SQL still fails. I am running 
Hive on an AWS EMR cluster. The Hadoop and Hive details are "Core Hadoop: 
Hadoop 2.7.3 with Ganglia 3.7.2, Hive 2.1.1". Any help would be appreciated.

SQL:
insert overwrite table DOCUMENT_DETAILS
select TRANSFORM(trim(object_link),alt_text)
USING '/usr/bin/python add.py'
AS one,two
FROM documents
;
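
For context, a script called through TRANSFORM generally follows the pattern 
sketched below. This is an illustrative skeleton only, not the actual add.py: 
process_row and handle_line are hypothetical names, and the strip() calls are a 
stand-in for the real URL processing. It assumes Hive's default TRANSFORM row 
format (one row per line, tab-separated columns, \N for NULL). Writing the 
offending input line to stderr is one way to see in the task logs what the 
script actually received when a row fails.

```python
#!/usr/bin/env python
"""Illustrative skeleton of a Hive TRANSFORM script (not the real add.py)."""
import sys
import traceback


def process_row(object_link, alt_text):
    # Hypothetical placeholder for the real URL processing.
    return object_link.strip(), alt_text.strip()


def handle_line(line):
    # Assumes the default TRANSFORM format: tab-separated columns, one row per line.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 2:
        raise ValueError("expected 2 columns, got %d" % len(fields))
    one, two = process_row(fields[0], fields[1])
    return "%s\t%s" % (one, two)


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    for line in stdin:
        try:
            stdout.write(handle_line(line) + "\n")
        except Exception:
            # stderr is captured in the task attempt logs.
            stderr.write("failed on input line %r\n" % line)
            traceback.print_exc(file=stderr)
            # Emit NULLs for both output columns so the row is not silently dropped.
            stdout.write("\\N\t\\N\n")


if __name__ == "__main__":
    main()
```

Such a script can be checked outside Hive by piping a few tab-separated lines 
into it on the command line and comparing the output with what the query is 
expected to insert.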

ERROR Log:

Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row {"alt_text":"             PDF 
","attribution_start_csa":"57311362699630881778873386187","clicked_timestamp":1481152260268,"csa_number":"67608802335473353968528973230","object_id":"","object_link":"/documentation/cdl/en/procstat/63104/PDF/default/procstat.pdf","object_name":"","session_number":"37240710392470218904507475705","view_sequence_no":0,"object_selector_path":"html
 > body > div#container > div#docleftcolumn > div.tocLinks > a"}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row {"alt_text":"             PDF 
","attribution_start_csa":"57311362699630881778873386187","clicked_timestamp":1481152260268,"csa_number":"67608802335473353968528973230","object_id":"","object_link":"/documentation/cdl/en/procstat/63104/PDF/default/procstat.pdf","object_name":"","session_number":"37240710392470218904507475705","view_sequence_no":0,"object_selector_path":"html
 > body > div#container > div#docleftcolumn > div.tocLinks > a"}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
        ... 8 more


Regards,

Sujeet Singh Pardeshi

Software Specialist

SAS Research and Development (India) Pvt. Ltd.
Level 2A and Level 3, Cybercity, Magarpatta, Hadapsar, Pune, Maharashtra, 411 013
off: +91-20-49118448
 "When the solution is simple, God is answering..."