[ https://issues.apache.org/jira/browse/HIVE-15302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710922#comment-15710922 ]
Xuefu Zhang commented on HIVE-15302:
------------------------------------

I think there are two dependencies on Spark from Hive:

1. Spark runtime classes, which used to be in spark-assembly.jar
2. The spark-submit.sh script, which is used to submit a Spark application for a Hive session.

For #1, I think spark.yarn.jars or spark.yarn.archive will do. For #2, I think we still need SPARK_HOME, unless we clone a simplified Spark installation into Hive's directory structure, which is not ideal. Thus, SPARK_HOME still seems to be required. If so, Hive can automatically figure out spark.yarn.jars or spark.yarn.archive from SPARK_HOME if it's not already set. To speed up file distribution, an admin can point either of these properties to an HDFS location, which requires the admin to manually upload the files to HDFS beforehand. As for spark.yarn.archive, I think one needs to zip all the jars themselves, not the folder that contains the jars. However, I didn't try and verify this.

> Relax the requirement that HoS needs Spark built w/o Hive
> ---------------------------------------------------------
>
> Key: HIVE-15302
> URL: https://issues.apache.org/jira/browse/HIVE-15302
> Project: Hive
> Issue Type: Improvement
> Reporter: Rui Li
> Assignee: Rui Li
>
> This requirement becomes more and more unacceptable as SparkSQL becomes
> widely adopted. Let's use this JIRA to find out how we can relax the
> limitation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
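The spark.yarn.archive workflow the comment describes could be sketched roughly as below. This is a hypothetical illustration, not something from the issue itself: the HDFS path /user/spark/spark-archive.zip and the /tmp staging file are made-up examples, SPARK_HOME is assumed to point at a local Spark installation whose jars live under $SPARK_HOME/jars, and the "jars at the zip root, not inside a folder" layout is the comment author's own unverified guess.

```shell
# Sketch: build an archive of the Spark runtime jars and publish it to HDFS
# so that spark.yarn.archive can point at it. Paths here are examples only.

# Zip from inside the jars directory so the jars sit at the root of the
# archive (not nested under a jars/ folder) -- unverified, per the comment.
cd "$SPARK_HOME/jars"
zip -q -r /tmp/spark-archive.zip .

# One-time upload by the admin; the destination path is an assumption.
hdfs dfs -mkdir -p /user/spark
hdfs dfs -put -f /tmp/spark-archive.zip /user/spark/spark-archive.zip
```

A Hive session could then be pointed at the uploaded archive, e.g. via `set spark.yarn.archive=hdfs:///user/spark/spark-archive.zip;`, so executors fetch the runtime classes from HDFS instead of re-shipping them per application.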