Hi,

I have created a Hive external table (documents) pointing to an S3 location that contains the file to be read. I have a Python UDF that does URL processing. When I run the SQL below, it fails as shown in the log below.

At first I thought it was failing on specific records, but when I delete those records, the SQL fails on some other, seemingly random record. When I put all of those "bad" records into a single file and run the SQL against it, it succeeds. I therefore conclude that this is not a data-specific issue.

I tried changing the execution engine from Tez to MR, but the problem persists. I also tried disabling hive.vectorized.execution.enabled and hive.vectorized.execution.reduce.enabled, but the SQL still fails.

I am using Hive on an AWS EMR cluster. The Hadoop and Hive details are "Core Hadoop: Hadoop 2.7.3 with Ganglia 3.7.2, Hive 2.1.1".

Can someone help me out? Any help would be appreciated.
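For context, add.py itself is not shown in this post. A script invoked through TRANSFORM receives each input row on stdin as tab-separated fields and is expected to write tab-separated output rows to stdout. The following is only a minimal sketch of such a script, not the actual add.py: process_url is a hypothetical stand-in for the real URL processing, and the two input columns are assumed to be object_link and alt_text as in the SQL. It logs any row it cannot handle to stderr instead of raising, which may help rule out the script exiting partway through its input.

```python
import sys
import traceback


def process_url(object_link):
    """Hypothetical URL processing: return the last path segment."""
    return object_link.rstrip("/").rsplit("/", 1)[-1]


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row."""
    fields = line.rstrip("\r\n").split("\t")
    # Pad short rows so the indexing below never raises.
    fields += [""] * (2 - len(fields))
    object_link, alt_text = fields[0], fields[1]
    # Hive passes NULL values to TRANSFORM scripts as the literal string \N.
    if object_link == "\\N":
        object_link = ""
    if alt_text == "\\N":
        alt_text = ""
    return process_url(object_link) + "\t" + alt_text.strip()


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Read every input row, writing exactly one output row per input row."""
    for line in stdin:
        try:
            out = transform_line(line)
        except Exception:
            # Record the offending row on stderr and emit an empty row,
            # so one bad record does not terminate the whole script.
            stderr.write("bad row: %r\n" % line)
            traceback.print_exc(file=stderr)
            out = "\t"
        stdout.write(out + "\n")
    stdout.flush()
```

To use a script like this with TRANSFORM, the file would end with a call to main(). Anything it writes to stderr should then be visible in the logs of the failing task attempt.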
SQL:

insert overwrite table DOCUMENT_DETAILS
select TRANSFORM(trim(object_link), alt_text)
USING '/usr/bin/python add.py' AS one, two
FROM documents;

ERROR Log:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"alt_text":" PDF ","attribution_start_csa":"57311362699630881778873386187","clicked_timestamp":1481152260268,"csa_number":"67608802335473353968528973230","object_id":"","object_link":"/documentation/cdl/en/procstat/63104/PDF/default/procstat.pdf","object_name":"","session_number":"37240710392470218904507475705","view_sequence_no":0,"object_selector_path":"html > body > div#container > div#docleftcolumn > div.tocLinks > a"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:455)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"alt_text":" PDF ","attribution_start_csa":"57311362699630881778873386187","clicked_timestamp":1481152260268,"csa_number":"67608802335473353968528973230","object_id":"","object_link":"/documentation/cdl/en/procstat/63104/PDF/default/procstat.pdf","object_name":"","session_number":"37240710392470218904507475705","view_sequence_no":0,"object_selector_path":"html > body > div#container > div#docleftcolumn > div.tocLinks > a"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more

Regards,
Sujeet Singh Pardeshi
Software Specialist
SAS Research and Development (India) Pvt. Ltd.