[jira] [Commented] (HIVE-7292) Hive on Spark

Martin Wang (JIRA) Wed, 01 Jul 2015 01:26:21 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609729#comment-14609729
 ]


Martin Wang commented on HIVE-7292:
-----------------------------------

Hi Chinna Rao Lalam,
    I'm using CDH-5.3.0-1.cdh5.3.0.p0.280, a Cloudera CDH release that
includes Hive on Spark.
    There are 61 tables in total. The test succeeded with 35 tables, and that
run used 126 map tasks to process the data.
    When there is no error, I can see the job in the Spark Web UI (reached from
the YARN web UI -> Application Master web UI). When there is an error, the job
never even appears in the Spark Web UI.
    The 61 tables hold about 80,000,000 rows in total, which is not very big.
The data in the 61 tables totals 4GB.

I ran the command in the Hive CLI. When the error occurs, I see the message below:

Query ID = root_20150701155757_61cd36ee-3c38-49a8-9c13-1029acffa0d3
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Status: Failed
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
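
For anyone triaging console output like the above, here is a minimal sketch of pulling out the query ID, the return code, and the failing task class. It assumes the output has the same shape as the lines shown here; the helper name summarize_hive_output is hypothetical and not part of Hive. The return code alone does not identify the root cause, so this only helps locate the matching entries in the Hive and YARN logs.

```python
import re


# Hypothetical helper, not part of Hive: summarize a Hive CLI console log.
def summarize_hive_output(text):
    # Mail clients and terminals wrap long lines, so collapse whitespace first.
    flat = " ".join(text.split())
    query = re.search(r"Query ID = (\S+)", flat)
    failure = re.search(
        r"FAILED: Execution Error, return code (\d+) from (\S+)", flat
    )
    return {
        "query_id": query.group(1) if query else None,
        "return_code": int(failure.group(1)) if failure else None,
        "failed_task": failure.group(2) if failure else None,
    }


# Sample copied from the console output in this comment (wrapped as received).
sample = """Query ID = root_20150701155757_61cd36ee-3c38-49a8-9c13-1029acffa0d3
Status: Failed
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.spark.SparkTask
"""

print(summarize_hive_output(sample))
```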


> Hive on Spark
> -------------
>
>                 Key: HIVE-7292
>                 URL: https://issues.apache.org/jira/browse/HIVE-7292
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>              Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
>         Attachments: Hive-on-Spark.pdf
>
>
> Spark, as an open-source data analytics cluster computing framework, has gained 
> significant momentum recently. Many Hive users already have Spark installed 
> as their computing backbone. To take advantage of Hive, they still need to 
> have either MapReduce or Tez on their cluster. This initiative will provide 
> users a new alternative so that those users can consolidate their backend. 
> Secondly, providing such an alternative further increases Hive's adoption as 
> it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
> on Hadoop.
> Finally, allowing Hive to run on Spark also has performance benefits. Hive 
> queries, especially those involving multiple reducer stages, will run faster, 
> thus improving user experience as Tez does.
> This is an umbrella JIRA which will cover many upcoming subtasks. Design doc 
> will be attached here shortly, and will be on the wiki as well. Feedback from 
> the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
