RE: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Andrew Lee
Hi Roberto, I'm not an EMR person, but it looks like option -h is deploying the necessary datanucleus JARs for you. The requirements for HiveContext are the hive-site.xml and the datanucleus JARs. As long as these two are there, and Spark is compiled with -Phive, it should work. spark-shell runs in yarn-client
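For reference, a minimal sketch of what that setup looks like when building and configuring Spark yourself (the profile and version flags below assume a Hadoop 2.x / Spark 1.3.x build and are only illustrative):

  # build Spark with Hive support so HiveContext and the datanucleus JARs are produced
  mvn -Pyarn -Phive -Phive-thriftserver -Dhadoop.version=2.4.0 -DskipTests clean package
  # make the Hive metastore configuration visible to Spark
  cp /etc/hive/conf/hive-site.xml $SPARK_HOME/conf/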

RE: The auxService:spark_shuffle does not exist

2015-07-17 Thread Andrew Lee
I have encountered the same problem after following the document. Here's my spark-defaults.conf: spark.shuffle.service.enabled true spark.dynamicAllocation.enabled true spark.dynamicAllocation.executorIdleTimeout 60 spark.dynamicAllocation.cachedExecutorIdleTimeout 120 spark.dynamicAllocation.in
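For context, a complete spark-defaults.conf fragment for dynamic allocation with the external shuffle service typically looks like the sketch below (the values and the min/max executor settings are illustrative, not the poster's exact file):

  spark.shuffle.service.enabled                     true
  spark.dynamicAllocation.enabled                   true
  spark.dynamicAllocation.executorIdleTimeout       60
  spark.dynamicAllocation.cachedExecutorIdleTimeout 120
  spark.dynamicAllocation.minExecutors              1
  spark.dynamicAllocation.maxExecutors              20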

Re: Spark build failure with com.oracle:ojdbc6:jar:11.2.0.1.0

2016-05-09 Thread Andrew Lee
In fact, it does require ojdbc from Oracle, which also requires a username and password. This was added as part of the testing scope for Oracle's docker. I noticed this PR and commit in branch-2.0 according to https://issues.apache.org/jira/browse/SPARK-12941. In the comment, I'm not sure what d

Spark 2.0 preview - How to configure warehouse for Catalyst? always pointing to /user/hive/warehouse

2016-06-17 Thread Andrew Lee
From branch-2.0 (Spark 2.0.0 preview), I found it interesting that no matter what you do by configuring spark.sql.warehouse.dir, it will always pull up the default path, which is /user/hive/warehouse. In the code, I notice that at LOC45 ./sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/a
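For reference, this is the kind of configuration that appeared to be ignored — a sketch of how the warehouse location is normally set on the 2.0 APIs (the HDFS path is a placeholder):

  // assumes Spark 2.0.0-preview; spark.sql.warehouse.dir replaces hive.metastore.warehouse.dir
  val spark = org.apache.spark.sql.SparkSession.builder()
    .appName("warehouse-dir-test")
    .config("spark.sql.warehouse.dir", "hdfs:///apps/spark/warehouse")
    .enableHiveSupport()
    .getOrCreate()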

RE: The auxService:spark_shuffle does not exist

2015-07-21 Thread Andrew Lee
Hi all, Did you forget to restart the node managers after editing yarn-site.xml by any chance? -Andrew 2015-07-17 8:32 GMT-07:00 Andrew Lee : I have encountered the same problem after followi

RE: The auxService:spark_shuffle does not exist

2015-07-21 Thread Andrew Lee
Hi Andrew, Thanks for the advice. I didn't see the log in the NodeManager, so apparently something was wrong with the yarn-site.xml configuration. After digging in more, I realized it was a user error. I'm sharing this so other people may know what mistake I made. When I review
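For anyone hitting the same auxService error, the yarn-site.xml entries for the external shuffle service generally look like the sketch below (the Spark YARN shuffle JAR must also be on the NodeManager classpath, and the NodeManagers restarted afterwards):

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>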

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-03 Thread Andrew Lee
Hi All, In Spark 1.2.0-rc1, I have tried to set hive.metastore.warehouse.dir to share the Hive warehouse location on HDFS; however, it does NOT work in yarn-cluster mode. In the Namenode audit log, I see that Spark is trying to access the default Hive warehouse location, which is /user/

RE: hadoopConfiguration for StreamingContext

2015-02-10 Thread Andrew Lee
It looks like this is related to the underlying Hadoop configuration. Try to deploy the Hadoop configuration with your job with --files and --driver-class-path, or rely on the default /etc/hadoop/conf core-site.xml. If that is not an option (depending on how your Hadoop cluster is set up), then hard co
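If hard-coding turns out to be the only option, a minimal sketch of setting a Hadoop property programmatically before creating the StreamingContext (the property and value are placeholders for whatever your cluster needs):

  // assumes an existing SparkContext `sc`; the change applies to its underlying Hadoop Configuration
  sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://namenode.example.com:8020")
  val ssc = new org.apache.spark.streaming.StreamingContext(sc, org.apache.spark.streaming.Seconds(10))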

RE: SparkSQL + Tableau Connector

2015-02-11 Thread Andrew Lee
I'm using mysql as the metastore DB with Spark 1.2. I simply copied the hive-site.xml to /etc/spark/ and added the mysql JDBC JAR to spark-env.sh in /etc/spark/, and everything works fine now. My setup looks like this: Tableau => Spark ThriftServer2 => HiveServer2. It's talking to Tableau Desktop 8.3. In
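For reference, that setup amounts to roughly the following sketch (the driver JAR path is a placeholder, and SPARK_CLASSPATH was still honored in Spark 1.2 even though it is deprecated):

  # in /etc/spark/spark-env.sh: put the MySQL JDBC driver on the classpath for the metastore
  export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/share/java/mysql-connector-java.jar
  # start the Spark Thrift Server, which Tableau connects to like a HiveServer2 endpoint
  $SPARK_HOME/sbin/start-thriftserver.sh --master yarn-client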

RE: Is the Thrift server right for me?

2015-02-11 Thread Andrew Lee
I have ThriftServer2 up and running; however, I notice that it relays the query to HiveServer2 when I pass the hive-site.xml to it. I'm not sure if this is the expected behavior, but based on what I have up and running, the ThriftServer2 invokes HiveServer2, which results in a MapReduce or Tez query

RE: Is the Thrift server right for me?

2015-02-11 Thread Andrew Lee
Check your hive-site.xml. Are you directing to the hive server 2 port instead of spark thrift port? Their default ports are both 1. From: Andrew Lee [mailto:alee...@hotmail.com] Sent: Wednesday, February 11, 2015 12:00 PM To: sjbrunst; user@spark.apache.org Subject: RE: Is the Th
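The property in question is the Thrift port; a sketch of pinning the Spark Thrift Server to a port that does not collide with an existing HiveServer2 (10001 is just an example value):

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>

The same setting can also be passed on the command line via start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001.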

RE: SparkSQL + Tableau Connector

2015-02-11 Thread Andrew Lee
Sorry folks, it is executing Spark jobs instead of Hive jobs. I misread the logs since there were other activities going on in the cluster. From: alee...@hotmail.com To: ar...@sigmoidanalytics.com; tsind...@gmail.com CC: user@spark.apache.org Subject: RE: SparkSQL + Tableau Connector Date: Wed,

RE: SparkSQL + Tableau Connector

2015-02-17 Thread Andrew Lee
: Running query ' cache table test ' 15/02/11 19:25:38 INFO MemoryStore: ensureFreeSpace(211383) called with curMem=101514, maxMem=278019440 15/02/11 19:25:38 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 206.4 KB, free 264.8 MB) I see no way in

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2015-02-17 Thread Andrew Lee
Hi All, Just want to give everyone an update of what worked for me. Thanks for Cheng's comment and other people's help. What I had misunderstood was --driver-class-path and how it relates to --files. I put /etc/hive/hive-site.xml in both --files and --driver-class-path when I started
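For readers hitting the same problem, the submit command ends up having roughly this shape — in yarn-cluster mode --files ships hive-site.xml into each container's working directory (the class and JAR names below are placeholders):

  spark-submit --master yarn-cluster \
    --files /etc/hive/hive-site.xml \
    --class com.example.MyHiveJob myjob.jar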

GSSException when submitting Spark job in yarn-cluster mode with HiveContext APIs on Kerberos cluster

2015-04-20 Thread Andrew Lee
Hi All, Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1 Posting this problem to the user group first to see if someone is encountering the same problem. When submitting spark jobs that invoke HiveContext APIs on a Kerberos Hadoop + YARN (2.4.1) cluster, I'm getting this error. javax.security.sasl.

RE: GSSException when submitting Spark job in yarn-cluster mode with HiveContext APIs on Kerberos cluster

2015-04-20 Thread Andrew Lee
.com > CC: user@spark.apache.org > > I think you want to take a look at: > https://issues.apache.org/jira/browse/SPARK-6207 > > On Mon, Apr 20, 2015 at 1:58 PM, Andrew Lee wrote: > > Hi All, > > > > Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1 > > >

HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Hi All, Has anyone run into the same problem? Looking at the source code in the official release (rc11), this property setting is false by default; however, I'm seeing the .sparkStaging folder remain on HDFS and fill up the disk pretty fast, since SparkContext deploys th
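The setting being referred to is, presumably, spark.yarn.preserve.staging.files (which defaults to false, meaning the staging directory should be cleaned up). A sketch of forcing it explicitly in spark-defaults.conf:

  # when true, .sparkStaging is kept after the application finishes; the default is false
  spark.yarn.preserve.staging.files   false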

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-18 Thread Andrew Lee
Forgot to mention that I am using spark-submit to submit jobs, and a verbose-mode printout looks like this with the SparkPi example. The .sparkStaging won't be deleted. My thought is that this should be part of the staging and should be cleaned up as well when sc gets terminated. [tes

RE: HDFS folder .sparkStaging not deleted and filled up HDFS in yarn mode

2014-06-23 Thread Andrew Lee
I checked the source code; it looks like it was re-added based on JIRA SPARK-1588, but I don't know if there's any test case associated with it. SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN. Sandy Ryza 2014-04-29 12:54:02 -0700 Commit: 5f48721, git

RE: write event logs with YARN

2014-07-02 Thread Andrew Lee
Hi Christophe, Make sure you have 3 slashes in the hdfs scheme. e.g. hdfs:///:9000/user//spark-events and in the spark-defaults.conf as well: spark.eventLog.dir=hdfs:///:9000/user//spark-events > Date: Thu, 19 Jun 2014 11:18:51 +0200 > From: christophe.pre...@kelkoo.com > To: user@spark.apache.org
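For reference, an illustrative spark-defaults.conf fragment (the directory, and whether you include a namenode host and port, depend on your own cluster; the username in the path is a placeholder):

  spark.eventLog.enabled   true
  spark.eventLog.dir       hdfs:///user/alee/spark-events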

Enable Parsing Failed or Incompleted jobs on HistoryServer (YARN mode)

2014-07-02 Thread Andrew Lee
Hi All, I have HistoryServer up and running, and it is great. Is it possible to also enable HistoryServer to parse failed job events by default as well? I get "No Completed Applications Found" if a job fails. = Event Log Location: hdfs:///user/test01/spark/logs/ No Completed Applications Foun

RE: Enable Parsing Failed or Incompleted jobs on HistoryServer (YARN mode)

2014-07-07 Thread Andrew Lee
in the history server faster. Haven't reliably tested this though. May just be a coincidence of timing. -Suren On Wed, Jul 2, 2014 at 8:01 PM, Andrew Lee wrote: Hi All, I have HistoryServer up and running, and it is great. Is it possible to also enable HsitoryServer to parse failed jo

RE: Spark logging strategy on YARN

2014-07-07 Thread Andrew Lee
Hi Kudryavtsev, Here's what I am doing as a common practice and reference. I don't want to say it is best practice, since it requires a lot of customer experience and feedback, but from a development and operating standpoint, it will be great to separate the YARN container logs from the Spark lo
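One common way to do this separation (a sketch; the appender name and file name are arbitrary) is to ship a custom log4j.properties with the job so that Spark's own logging goes to its own rolling file instead of being interleaved with the container stdout/stderr:

  # shipped with: spark-submit --files log4j.properties ...
  log4j.rootCategory=INFO, rolling
  log4j.appender.rolling=org.apache.log4j.RollingFileAppender
  log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark-app.log
  log4j.appender.rolling.MaxFileSize=50MB
  log4j.appender.rolling.MaxBackupIndex=5
  log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
  log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n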

spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-08 Thread Andrew Lee
Build: Spark 1.0.0 rc11 (git commit tag: 2f1dc868e5714882cf40d2633fb66772baf34789) Hi All, When I enabled the spark-defaults.conf for the eventLog, spark-shell broke while spark-submit worked. I'm trying to create a separate directory per user to keep track of their own Spark job event

RE: SPARK_CLASSPATH Warning

2014-07-11 Thread Andrew Lee
As mentioned, deprecated in Spark 1.0+. Try to use --driver-class-path: ./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar Don't use a glob *; specify the JARs one by one, separated by colons. Date: Wed, 9 Jul 2014 13:45:07 -0700 From: kat...@cs.pitt.edu Subject: SPARK_CLASSPATH Warning To

RE: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option?

2014-07-11 Thread Andrew Lee
Ok, I found it on JIRA SPARK-2390: https://issues.apache.org/jira/browse/SPARK-2390 So it looks like this is a known issue. From: alee...@hotmail.com To: user@spark.apache.org Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file option? Date: Tue, 8 Jul 2014 15:17:00 -070

RE: Hive From Spark

2014-07-21 Thread Andrew Lee
Hi All, Currently, if you are running Spark HiveContext API with Hive 0.12, it won't work due to the following 2 libraries, which are not consistent with Hive 0.12 and Hadoop as well. (Hive libs align with Hadoop libs, and as a common practice they should be consistent to interoperate.)

RE: Hive From Spark

2014-07-22 Thread Andrew Lee
> problems in theory, and you show it causes a problem in practice. Not > to mention it causes issues for Hive-on-Spark now. > > On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee wrote: > > Hive and Hadoop are using an older version of guava libraries (11.0.1) where > >

RE: Spark SQL and Hive tables

2014-07-25 Thread Andrew Lee
Hi Michael, If I understand correctly, the assembly JAR file is deployed onto HDFS into the /user/$USER/.sparkStaging folder that will be used by all computing (worker) nodes when people run in yarn-cluster mode. Could you elaborate more on what the document means by this? It is a bit misleading and I

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-25 Thread Andrew Lee
Hi Jianshi, Could you provide which HBase version you're using? By the way, a quick sanity check: can the Workers access HBase? Were you able to manually write one record to HBase with the serialize function? Hardcode and test it? From: jianshi.hu...@gmail.com Date: Fri, 25 Jul 2014 15

Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
Hi All, Not sure if anyone has run into this problem, but it exists in Spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to use the $USER env variable. For example, I'm running the command with user 'test'. In spark-submit,

RE: Issues on spark-shell and spark-submit behave differently on spark-defaults.conf parameter spark.eventLog.dir

2014-07-28 Thread Andrew Lee
n the path you provide to spark.eventLog.dir. -Andrew 2014-07-28 12:40 GMT-07:00 Andrew Lee : Hi All, Not sure if anyone has ran into this problem, but this exist in spark 1.0.0 when you specify the location in conf/spark-defaults.conf for spark.eventLog.dir hdfs:///user/$USER/spark/logs to u

RE: Need help, got java.lang.ExceptionInInitializerError in Yarn-Client/Cluster mode

2014-07-28 Thread Andrew Lee
e files so it got that exception. I appended the resource files explicitly to the --jars option and it worked fine. The "Caused by..." messages were found in the yarn logs, actually; I think it might be useful if I can see them from the console which runs spark-submit. Would that be po

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
Hi All, It has been a while, but what I did to make it work is to make sure of the following: 1. Hive is working when you run Hive CLI and JDBC via Hiveserver2. 2. Make sure you have the hive-site.xml from the above Hive configuration. The problem here is that you want the hive-site.xml from the Hive

Re: Spark Deployment Patterns - Automated Deployment & Performance Testing

2014-07-31 Thread Andrew Lee
You should be able to use either SBT or Maven to create your JAR files (not a fat jar), and only deploy the JAR for spark-submit. 1. Sync Spark libs and versions with your development env and CLASSPATH in your IDE (unfortunately this needs to be hard copied, and may result in split-brain syn
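A minimal build.sbt sketch of that approach — mark the Spark dependency as "provided" so it is not bundled into the application JAR and is instead supplied by the cluster at spark-submit time (the versions shown are illustrative):

  name := "my-spark-app"
  scalaVersion := "2.10.4"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2" % "provided"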

Re: HiveContext is creating metastore warehouse locally instead of in hdfs

2014-07-31 Thread Andrew Lee
ring Hive tables by using SET command. For example: >> >> hiveContext.hql("SET >> hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse") >> >> >> >> >> On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee < > >> alee526@

RE: Spark SQL, Parquet and Impala

2014-08-02 Thread Andrew Lee
Hi Patrick, In Impala 1.3.1, when you update tables and metadata, do you still need to run 'invalidate metadata' in impala-shell? My understanding is that it is a pull architecture to refresh the metastore on the catalogd in Impala; not sure if this still applies to this case since you are updatin

RE: Hive From Spark

2014-08-22 Thread Andrew Lee
(false).setMaster("local").setAppName("test data > > exchange with Hive") > > conf.set("spark.driver.host", "localhost") > > val sc = new SparkContext(conf) > > val rdd = sc.makeRDD(Seq(rec)) > > rdd.map((x: MyRe

RE: Hive From Spark

2014-08-25 Thread Andrew Lee
though - might > >be too risky at this point. > > > >I'm not familiar with spark-sql. > > > >On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee wrote: > >> Hopefully there could be some progress on SPARK-2420. It looks like > >>shading > >> ma

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
Hi All, I have tried to pass the properties via SparkContext.setLocalProperty and HiveContext.setConf; both failed. Based on the results (haven't had a chance to look into the code yet), HiveContext will try to initiate the JDBC connection right away; I couldn't set other properties dynamica

RE: Spark sql failed in yarn-cluster mode when connecting to non-default hive database

2014-12-29 Thread Andrew Lee
A follow-up on the hive-site.xml: if you (1) specify it in spark/conf, then you can NOT also apply it via the --driver-class-path option; otherwise you will get the following exceptions when initializing SparkContext. org.apache.spark.SparkException: Found both spark.driver.extraClassPath and
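For context, the --driver-class-path flag is equivalent to setting the property below in spark-defaults.conf (the path is a placeholder), which is why the classpath entry should be supplied through only one mechanism:

  spark.driver.extraClassPath   /etc/hive/conf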

Spark 0.9.0-incubation + Apache Hadoop 2.2.0 + YARN encounter Compression codec com.hadoop.compression.lzo.LzoCodec not found

2014-03-17 Thread Andrew Lee
Hi All, I have been contemplating this problem and couldn't figure out what is missing in the configuration. I traced the script and tried to look for CLASSPATH to see what is included; however, I couldn't find any place that is honoring/inheriting HADOOP_CLASSPATH (or pulling in any map-reduce

Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/ You need to build Spark with 'sbt/sbt assembly' before running this program. After digging into the cod
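For Spark 0.9.x, the assembly that script looks for is built against a specific Hadoop version via environment variables; a sketch of the build step (the version numbers are whatever your cluster actually runs):

  SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly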

RE: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?

2014-03-25 Thread Andrew Lee
builtin to the jar itself so no need for random class paths. On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee wrote: Hi All, I'm getting the following error when I execute start-master.sh which also invokes spark-class at the end. Failed to find Spark assembly in /root/spark/assemb

RE: Using an external jar in the driver, in yarn-standalone mode.

2014-03-25 Thread Andrew Lee
Hi Julien, The ADD_JAR doesn't work on the command line. I checked spark-class, and I couldn't find any Bash shell bringing the variable ADD_JAR into the CLASSPATH. Were you able to print out the properties and environment variables from the Web GUI (localhost:4040)? This should give you an idea w

spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
Hi All, I encountered this problem when the firewall is enabled between the spark-shell and the Workers. When I launch spark-shell in yarn-client mode, I notice that Workers on the YARN containers are trying to talk to the driver (spark-shell); however, the firewall is not open, and this caused time

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
y 2014 14:49:23 -0400 Subject: Re: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication From: yana.kadiy...@gmail.com To: user@spark.apache.org I think what you want to do is set spark.driver.port to a fixed port. On Fri, May 2, 2014 at 1:52 PM, Andrew Lee wr
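A sketch of what pinning the driver port looks like in spark-defaults.conf (the port number is arbitrary; it just needs to be one the firewall allows through):

  spark.driver.port   51000

Later Spark releases added similar settings for the other ephemeral ports, but spark.driver.port is the one suggested here.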

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
tp://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-tp4832p4984.html [2] http://en.wikipedia.org/wiki/Ephemeral_port [3] http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html Jacob D. Eisinger IBM Emerging Technologies jeis...@us.ibm.com - (512

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Andrew Lee
ng Technologies jeis...@us.ibm.com - (512) 286-6075 Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into account, I'm actually thinking about using a separate subnet to From: Andrew Lee To: "user@spark.apache.org" Date: 05/04/2014 09:57 PM Subject:

RE: run spark0.9.1 on yarn with hadoop CDH4

2014-05-06 Thread Andrew Lee
Please check JAVA_HOME. Usually it should point to /usr/java/default on CentOS/Linux. or FYI: http://stackoverflow.com/questions/1117398/java-home-directory > Date: Tue, 6 May 2014 00:23:02 -0700 > From: sln-1...@163.com > To: u...@spark.incubator.apache.org > Subject: run spark0.9.1 on yarn wit

Is spark 1.0.0 "spark-shell --master=yarn" running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
Does anyone know if: ./bin/spark-shell --master yarn is running yarn-cluster or yarn-client by default? Based on the source code: ./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala if (args.deployMode == "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-cl

RE: Is spark 1.0.0 "spark-shell --master=yarn" running in yarn-cluster mode or yarn-client mode?

2014-05-21 Thread Andrew Lee
nd so it falls into the second "if" case you mentioned: if (args.deployMode != "cluster" && args.master.startsWith("yarn")) { args.master = "yarn-client"} 2014-05-21 10:57 GMT-07:00 Andrew Lee : Does anyone know if: ./bin/spark-shell --master yarn