[ https://issues.apache.org/jira/browse/HIVE-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Premal Shah updated HIVE-15105:
-------------------------------
    Description: 
Hive 2.0.1
Hadoop 2.7.2
Tez 0.8.4

We have a UDF in Hive that takes in some values and outputs a score. When we
run a query that calls the score function on every row of a table, it looks
like Tez is not running the query on YARN but is trying to run it in local
mode. It then runs out of memory trying to insert that data into a table.

Here's the query

ADD JAR score.jar;
CREATE TEMPORARY FUNCTION score AS 'hive.udf.ScoreUDF';

CREATE TABLE abc AS
SELECT
    id,
    score(col1, col2) AS score,
    '2016-10-11' AS dt
FROM input_table;

Here's the output of the shell

Query ID = hadoop_20161028232841_5a06db96-ffaa-4e75-a657-c7cb46ccb3f5
Total jobs = 1
Launching Job 1 out of 1
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:3332)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
        at java.lang.StringBuilder.append(StringBuilder.java:202)
        at com.google.protobuf.TextFormat.escapeBytes(TextFormat.java:1283)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:394)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:283)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.printFieldValue(TextFormat.java:404)
        at com.google.protobuf.TextFormat$Printer.printSingleField(TextFormat.java:327)
        at com.google.protobuf.TextFormat$Printer.printField(TextFormat.java:286)
        at com.google.protobuf.TextFormat$Printer.print(TextFormat.java:273)
        at com.google.protobuf.TextFormat$Printer.access$400(TextFormat.java:248)
        at com.google.protobuf.TextFormat.shortDebugString(TextFormat.java:88)
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Java heap space


It looks like the job is not being submitted to the cluster but is running
locally instead. We can't get Tez to run the query on the cluster.
The Hive shell starts with an Xmx of 4G.

If I set hive.execution.engine = mr, the query works, because it then runs on
the Hadoop cluster.
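
For reference, the workaround can be applied per session from the Hive shell.
This is only a sketch that reuses the query from this report; it assumes
score.jar and the score function are already registered in the session, and
it sidesteps the failure rather than explaining it.

```sql
-- Fall back to MapReduce for the current session only.
SET hive.execution.engine=mr;

-- Same CTAS as in the report; with the mr engine it runs on the cluster.
CREATE TABLE abc AS
SELECT
    id,
    score(col1, col2) AS score,
    '2016-10-11' AS dt
FROM input_table;

-- Switch back to Tez for subsequent queries.
SET hive.execution.engine=tez;
```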



> Hive shell runs out of memory on Tez
> ------------------------------------
>
>                 Key: HIVE-15105
>                 URL: https://issues.apache.org/jira/browse/HIVE-15105
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>    Affects Versions: 2.0.1
>            Reporter: Premal Shah



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
