How to build Spark with Hive 1.x?

2015-06-10 Thread Neal Yin
I am trying to build the Spark 1.3 branch with Hive 1.1.0:

mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -Phive-0.13.1 -Dhive.version=1.1.0 -Dhive.version.short=1.1.0 -DskipTests clean package

I got the following error: Failed to execute goal on project spark-hive_2.10: Coul

HiveContext creation failed with Kerberos

2015-12-07 Thread Neal Yin
Hi, I am using Spark 1.5.1 with CDH 5.4.2. My cluster is Kerberos-protected. Here is pseudocode for what I am trying to do:

ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("foo", "…")
ugi.doAs( new PrivilegedExceptionAction() {
  val sparkConf: SparkConf = createSparkConf(…)
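A compilable sketch of the pattern the pseudocode describes (the principal and keytab path are hypothetical placeholders, and actually running it requires a reachable KDC; the key point is that the contexts are built inside run() so their Hadoop RPCs carry the logged-in identity):

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object KerberosHiveSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical principal and keytab path -- substitute your own.
    val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "foo@EXAMPLE.COM", "/etc/security/keytabs/foo.keytab")

    // In Scala the action needs an explicit type parameter.
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // Create SparkConf, SparkContext, and HiveContext here (elided in
        // the original message). Everything that touches HDFS or the Hive
        // metastore must happen inside run() to use the logged-in UGI.
      }
    })
  }
}
```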

Re: HiveContext creation failed with Kerberos

2015-12-09 Thread Neal Yin
:09 AM To: user@spark.apache.org Subject: Re: HiveContext creation failed with Kerberos On 8 Dec 2015, at 06:52, Neal Yin <neal@workday.com> wrote: 15/12/08 04:12:28 ERROR transport.TSaslT

Where is yarn-shuffle.jar in maven?

2016-12-12 Thread Neal Yin
Hi, for the dynamic allocation feature I need spark-xxx-yarn-shuffle.jar. In my local Spark build I can see it, but I can't find it in Maven Central. My build script pulls all jars from Maven Central. Is the only option to check this jar into git? Thanks, -Neal
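For reference, the shuffle-service classes live in Spark's spark-network-yarn module. A hypothetical dependency sketch (verify that the artifact exists for your Spark/Scala version, and note that the spark-<version>-yarn-shuffle.jar shipped in the distribution is a shaded assembly, so the plain module artifact may not be a drop-in replacement on a NodeManager):

```xml
<!-- Sketch only: check the coordinates and version against Maven Central
     before relying on this instead of the distribution's shaded jar. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-network-yarn_2.10</artifactId>
  <version>1.6.3</version>
</dependency>
```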

DataFrame API, change groupBy result column name

2015-03-30 Thread Neal Yin
I ran a line like the following:

tb2.groupBy("city", "state").avg("price").show

I got this result:

city             state            AVG(price)
Charlestown      New South Wales  1200.0
Newton ...       MA               1200.0
Coral Gables ... FL               1200.0
Castricum        Noord-H
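The rename the poster is after can be done with agg plus an explicit alias instead of the avg("price") shorthand. A minimal sketch (the sample rows and the modern SparkSession entry point are illustrative; on Spark 1.3 the same agg/alias call works on any DataFrame built from a SQLContext):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object RenameAggColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rename-agg-column")
      .getOrCreate()
    import spark.implicits._

    // Illustrative sample data standing in for tb2.
    val tb2 = Seq(
      ("Charlestown", "New South Wales", 1200.0),
      ("Newton", "MA", 1200.0)
    ).toDF("city", "state", "price")

    // agg + as names the result column avg_price instead of AVG(price).
    tb2.groupBy("city", "state")
      .agg(avg("price").as("avg_price"))
      .show()

    spark.stop()
  }
}
```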

Re: Spark Yarn-client Kerberos on remote cluster

2015-04-14 Thread Neal Yin
If your localhost can't talk to a KDC, you can't access a kerberized cluster. A keytab file alone is not enough. -Neal On 4/14/15, 3:54 AM, "philippe L" wrote: >Dear All, > >I would like to know if it's possible to configure the SparkConf() in order >to interact with a remote kerberized cluster i
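Besides the keytab, the client host needs Kerberos client configuration pointing at a reachable KDC. A minimal /etc/krb5.conf sketch, with a hypothetical realm and KDC hostname:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
```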

Re: Running Spark on Gateway - Connecting to Resource Manager Retries

2015-04-14 Thread Neal Yin
Your YARN access is not configured. 0.0.0.0:8032 is the default YARN ResourceManager address; I guess you don't have yarn-site.xml in your classpath. -Neal From: Vineet Mishra <clearmido...@gmail.com> Date: Tuesday, April 14, 2015 at 12:05 AM To: "user@spark.apache.org
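A common fix is to put the directory holding yarn-site.xml on the client's Hadoop config path before launching Spark; /etc/hadoop/conf is a typical location on CDH but is an assumption here:

```shell
# Make yarn-site.xml (with the real ResourceManager address) visible to
# Spark's YARN client; adjust the path to your installation.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "$YARN_CONF_DIR"
```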

DataFrame call, how to control number of tasks for a stage

2015-04-16 Thread Neal Yin
I am having trouble controlling the number of Spark tasks for a stage. This is on a latest Spark 1.3.x source build.

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
sc.getConf.get("spark.default.parallelism")  // set to 10
val t1 = hiveContext.sql("FROM SalesJan2009 select * ")
val
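For DataFrame/SQL shuffle stages the task count comes from spark.sql.shuffle.partitions (default 200), not spark.default.parallelism. A sketch using the modern SparkSession entry point (on 1.3.x the equivalent call is hiveContext.setConf("spark.sql.shuffle.partitions", "10")):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ShuffleTaskCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-task-count")
      // Disable adaptive execution so the setting is honored exactly.
      .config("spark.sql.adaptive.enabled", "false")
      .config("spark.sql.shuffle.partitions", "10")
      .getOrCreate()

    // Any shuffle stage (groupBy, join, distinct) now runs with 10 tasks.
    val counts = spark.range(1000)
      .withColumn("bucket", col("id") % 7)
      .groupBy("bucket")
      .count()
    println(counts.rdd.getNumPartitions)

    spark.stop()
  }
}
```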