[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

Rui Li (JIRA) Thu, 29 Oct 2015 00:02:21 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-12229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979944#comment-14979944
 ]


Rui Li commented on HIVE-12229:
-------------------------------

Thanks for your inputs, Xuefu.

1. Good catch. I'll make it work with local[*]. But we don't have to worry 
about local-cluster mode, because executors run in separate JVMs in 
local-cluster mode so the child process will inherit the proper working 
directory.
2. OK. Whether the new file can be added successfully on executor is up to 
spark, i.e. configured by {{spark.files.overwrite}}. Let's just overwrite on 
our side.

> Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12229
>                 URL: https://issues.apache.org/jira/browse/HIVE-12229
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.1.0
>            Reporter: Lifeng Wang
>            Assignee: Rui Li
>         Attachments: HIVE-12229.1-spark.patch, HIVE-12229.2-spark.patch, 
> HIVE-12229.2-spark.patch
>
>
> Added one python script in the query and the python script cannot be found 
> during execution in yarn-cluster mode.
> {noformat}
> 15/10/21 21:10:55 INFO exec.ScriptOperator: Executing [/usr/bin/python, 
> q2-sessionize.py, 3600]
> 15/10/21 21:10:55 INFO exec.ScriptOperator: tablename=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: partname=null
> 15/10/21 21:10:55 INFO exec.ScriptOperator: alias=null
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 10 rows: used 
> memory = 324896224
> 15/10/21 21:10:55 INFO exec.ScriptOperator: ErrorStreamProcessor calling 
> reporter.progress()
> /usr/bin/python: can't open file 'q2-sessionize.py': [Errno 2] No such file 
> or directory
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread OutputProcessor done
> 15/10/21 21:10:55 INFO exec.ScriptOperator: StreamThread ErrorProcessor done
> 15/10/21 21:10:55 INFO spark.SparkRecordHandler: processing 100 rows: used 
> memory = 325619920
> 15/10/21 21:10:55 ERROR exec.ScriptOperator: Error in writing to script: 
> Stream closed
> 15/10/21 21:10:55 INFO exec.ScriptOperator: The script did not consume all 
> input data. This is considered as an error.
> 15/10/21 21:10:55 INFO exec.ScriptOperator: set 
> hive.exec.script.allow.partial.consumption=true; to ignore it.
> 15/10/21 21:10:55 ERROR spark.SparkReduceRecordHandler: Fatal error: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row 
> (tag=0) 
> {"key":{"reducesinkkey0":2,"reducesinkkey1":3316240655},"value":{"_col0":5529}}
>         at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:340)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
>         at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
>         at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: 
> An error occurred while reading or writing to your custom script. It may have 
> crashed with an error.
>         at 
> org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:453)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
>         at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
>         at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:331)
>         ... 14 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-12229) Custom script in query cannot be executed in yarn-cluster mode [Spark Branch].

Reply via email to