Hi Roberto,
I'm not an EMR person, but it looks like option -h is deploying the necessary
datanucleus JARs for you. The requirements for HiveContext are the hive-site.xml and
the datanucleus JARs. As long as those 2 are there, and Spark is compiled with
-Phive, it should work.
spark-shell runs in yarn-client mode.
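As a quick sanity check once those pieces are in place, something along these lines in spark-shell should work (the SHOW TABLES query is just an illustration):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
hiveContext.sql("SHOW TABLES").collect().foreach(println)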
I have encountered the same problem after following the document.
Here's my spark-defaults.conf:
spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.executorIdleTimeout 60
spark.dynamicAllocation.cachedExecutorIdleTimeout 120
spark.dynamicAllocation.in
In fact, it does require the ojdbc driver from Oracle, which also requires a username and
password. This was added as part of the testing scope for Oracle's Docker.
I notice this PR and commit in branch-2.0 according to
https://issues.apache.org/jira/browse/SPARK-12941.
In the comment, I'm not sure what d
From branch-2.0 (the Spark 2.0.0 preview),
I found it interesting that no matter how you configure
spark.sql.warehouse.dir,
it always pulls up the default path, which is /user/hive/warehouse.
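For example, a setting along these lines (the path here is purely illustrative) still ended up resolving to the default:
./bin/spark-shell --conf spark.sql.warehouse.dir=hdfs:///tmp/my-warehouse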
In the code, I notice that at LOC45
./sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/a
@databricks.com
To: alee...@hotmail.com
CC: zjf...@gmail.com; rp...@njit.edu; user@spark.apache.org
Hi all,
Did you forget to restart the node managers after editing yarn-site.xml by any
chance?
-Andrew
2015-07-17 8:32 GMT-07:00 Andrew Lee :
I have encountered the same problem after following the document.
Hi Andrew,
Thanks for the advice. I didn't see the log in the NodeManager, so apparently
something was wrong with the yarn-site.xml configuration.
After digging in more, I realized it was a user error. I'm sharing this so
others may know what mistake I made.
When I review
Hi All,
In Spark 1.2.0-rc1, I have tried to set hive.metastore.warehouse.dir to
share the Hive warehouse location on HDFS; however, it does NOT work in
yarn-cluster mode. In the Namenode audit log, I see that Spark is trying to
access the default Hive warehouse location, which is
/user/
It looks like this is related to the underlying Hadoop configuration.
Try to deploy the Hadoop configuration with your job via --files and
--driver-class-path, or put it in the default /etc/hadoop/conf core-site.xml.
If that is not an option (depending on how your Hadoop cluster is set up), then
hard code
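For instance, a sketch of that kind of submit (class name and paths are illustrative):
./bin/spark-submit --master yarn-cluster \
  --files /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml \
  --driver-class-path /etc/hadoop/conf \
  --class com.example.MyApp myapp.jar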
I'm using MySQL as the metastore DB with Spark 1.2. I simply copied the
hive-site.xml to /etc/spark/ and added the MySQL JDBC JAR to spark-env.sh in
/etc/spark/, and everything works fine now.
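(To be concrete, the spark-env.sh change is a classpath line roughly like this; the JAR path is illustrative and may differ on your box.)
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/usr/share/java/mysql-connector-java.jar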
My setup looks like this.
Tableau => Spark ThriftServer2 => HiveServer2
It's talking to Tableau Desktop 8.3. In
I have ThriftServer2 up and running; however, I notice that it relays the query
to HiveServer2 when I pass the hive-site.xml to it.
I'm not sure if this is the expected behavior, but based on what I have up and
running, the ThriftServer2 invokes HiveServer2, which results in a MapReduce or Tez
query
Check your hive-site.xml. Are you pointing to the HiveServer2 port instead
of the Spark thrift port?
Their default ports are both 10000.
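As a quick sanity check, beeline can confirm which server a given port is actually serving (host and port here are illustrative):
./bin/beeline -u jdbc:hive2://localhost:10000 -n myuser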
From: Andrew Lee [mailto:alee...@hotmail.com]
Sent: Wednesday, February 11, 2015 12:00 PM
To: sjbrunst; user@spark.apache.org
Subject: RE: Is the Th
Sorry folks, it is executing Spark jobs instead of Hive jobs. I misread the
logs since there were other activities going on in the cluster.
From: alee...@hotmail.com
To: ar...@sigmoidanalytics.com; tsind...@gmail.com
CC: user@spark.apache.org
Subject: RE: SparkSQL + Tableau Connector
Date: Wed,
: Running query 'cache table test'
15/02/11 19:25:38 INFO MemoryStore: ensureFreeSpace(211383) called with
curMem=101514, maxMem=278019440
15/02/11 19:25:38 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 206.4 KB, free 264.8 MB)
I see no way in
Hi All,
Just want to give everyone an update on what worked for me. Thanks to Cheng's
comment and other people's help.
What I had misunderstood was --driver-class-path and how it relates to
--files. I put /etc/hive/hive-site.xml in both --files and
--driver-class-path when I started
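To make it concrete, the shape of the invocation is roughly this (a sketch; the master and exact paths may differ):
./bin/spark-shell --master yarn-client \
  --files /etc/hive/hive-site.xml \
  --driver-class-path /etc/hive/hive-site.xml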
Hi All,
Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
Posting this problem to user group first to see if someone is encountering the
same problem.
When submitting Spark jobs that invoke HiveContext APIs on a Kerberos Hadoop +
YARN (2.4.1) cluster, I'm getting this error:
javax.security.sasl.
.com
> CC: user@spark.apache.org
>
> I think you want to take a look at:
> https://issues.apache.org/jira/browse/SPARK-6207
>
> On Mon, Apr 20, 2015 at 1:58 PM, Andrew Lee wrote:
> > Hi All,
> >
> > Affected version: spark 1.2.1 / 1.2.2 / 1.3-rc1
> >
>
Hi All,
Has anyone run into the same problem? Looking at the source code in the
official release (rc11), this property setting is false by default;
however, I'm seeing that the .sparkStaging folder remains on HDFS and fills up
the disk pretty fast, since SparkContext deploys th
Forgot to mention that I am using spark-submit to submit jobs; a verbose-mode
printout with the SparkPi example looks like this. The .sparkStaging directory
won't be deleted. My thought is that this should be part of the staging and
should be cleaned up as well when sc gets terminated.
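(For reference, the submit command itself was roughly of this shape; the examples JAR path is illustrative.)
./bin/spark-submit --verbose --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  lib/spark-examples.jar 10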
[tes
I checked the source code; it looks like it was re-added based on JIRA
SPARK-1588, but I don't know if there's any test case associated with it.
SPARK-1588. Restore SPARK_YARN_USER_ENV and SPARK_JAVA_OPTS for YARN.
Sandy Ryza
2014-04-29 12:54:02 -0700
Commit: 5f48721, git
Hi Christophe,
Make sure you have 3 slashes in the hdfs scheme,
e.g.
hdfs:///<namenode>:9000/user/<username>/spark-events
and in spark-defaults.conf as well:
spark.eventLog.dir=hdfs:///<namenode>:9000/user/<username>/spark-events
> Date: Thu, 19 Jun 2014 11:18:51 +0200
> From: christophe.pre...@kelkoo.com
> To: user@spark.apache.org
Hi All,
I have HistoryServer up and running, and it is great.
Is it possible to also have the HistoryServer parse failed jobs' events by
default as well?
I get "No Completed Applications Found" if a job fails.
Event Log Location: hdfs:///user/test01/spark/logs/
No Completed Applications Foun
in the history server faster. Haven't reliably tested
this though. May just be a coincidence of timing.
-Suren
On Wed, Jul 2, 2014 at 8:01 PM, Andrew Lee wrote:
Hi All,
I have HistoryServer up and running, and it is great.
Is it possible to also enable HistoryServer to parse failed jo
Hi Kudryavtsev,
Here's what I am doing as a common practice and reference. I don't want to say
it is best practice, since it requires a lot of customer experience and
feedback, but from a development and operating standpoint, it is great to
separate the YARN container logs from the Spark lo
Build: Spark 1.0.0 rc11 (git commit tag:
2f1dc868e5714882cf40d2633fb66772baf34789)
Hi All,
When I enabled the spark-defaults.conf entries for the event log, spark-shell broke
while spark-submit kept working.
I'm trying to create a separate directory per user to keep track of their own
Spark job event
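(The relevant spark-defaults.conf entries are along these lines.)
spark.eventLog.enabled true
spark.eventLog.dir hdfs:///user/$USER/spark/logs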
As mentioned, deprecated in Spark 1.0+.
Try to use the --driver-class-path:
./bin/spark-shell --driver-class-path yourlib.jar:abc.jar:xyz.jar
Don't use a glob *; specify the JARs one by one, separated by colons.
Date: Wed, 9 Jul 2014 13:45:07 -0700
From: kat...@cs.pitt.edu
Subject: SPARK_CLASSPATH Warning
To
Ok, I found it on JIRA SPARK-2390:
https://issues.apache.org/jira/browse/SPARK-2390
So it looks like this is a known issue.
From: alee...@hotmail.com
To: user@spark.apache.org
Subject: spark-1.0.0-rc11 2f1dc868 spark-shell not honoring --properties-file
option?
Date: Tue, 8 Jul 2014 15:17:00 -070
Hi All,
Currently, if you are running the Spark HiveContext API with Hive 0.12, it won't
work due to the following 2 libraries, which are not consistent between Hive 0.12
and Hadoop. (Hive libs align with Hadoop libs, and as a common
practice they should be kept consistent to interoperate.)
> problems in theory, and you show it causes a problem in practice. Not
> to mention it causes issues for Hive-on-Spark now.
>
> On Mon, Jul 21, 2014 at 6:27 PM, Andrew Lee wrote:
> > Hive and Hadoop are using an older version of guava libraries (11.0.1) where
> >
Hi Michael,
If I understand correctly, the assembly JAR file is deployed onto HDFS under the
/user/$USER/.sparkStaging folder, which is used by all computing (worker)
nodes when people run in yarn-cluster mode.
Could you elaborate on what the document means by this? It is a bit
misleading and I
Hi Jianshi,
Could you tell us which HBase version you're using?
By the way, a quick sanity check: can the Workers access HBase?
Were you able to manually write one record to HBase with the serialize
function? Hard-code it and test it?
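For instance, a minimal hard-coded write sketch (using the classic HBase client API; table and column names are illustrative):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

val conf = HBaseConfiguration.create()
val table = new HTable(conf, "test_table")
val put = new Put(Bytes.toBytes("row1"))
put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"))
table.put(put)
table.close()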
From: jianshi.hu...@gmail.com
Date: Fri, 25 Jul 2014 15
Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf as
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to use the $USER env variable.
For example, I'm running the command with user 'test'.
In spark-submit,
n the path you
provide to spark.eventLog.dir.
-Andrew
2014-07-28 12:40 GMT-07:00 Andrew Lee :
Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0
when you specify the location in conf/spark-defaults.conf as
spark.eventLog.dir hdfs:///user/$USER/spark/logs
to u
e files so
it got that exception.
I appended the resource files explicitly to the --jars option and it worked fine.
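(A sketch of what I mean; the JAR names and main class are illustrative.)
./bin/spark-submit --master yarn-cluster \
  --jars /path/to/resource-a.jar,/path/to/resource-b.jar \
  --class com.example.MyApp myapp.jar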
The "Caused by..." messages were found in yarn logs actually, I think it might
be useful if I can seem them from the console which runs spark-submit. Would
that be po
Hi All,
It has been a while, but what I did to make it work is to make sure of the
following:
1. Hive is working when you run the Hive CLI and JDBC via HiveServer2.
2. Make sure you have the hive-site.xml from the above Hive configuration. The
problem here is that you want the hive-site.xml from the Hive
You should be able to use either SBT or Maven to create your JAR files (not a
fat jar), and only deploy that JAR for spark-submit (a small build sketch follows after the list).
1. Sync spark libs and versions with your development env and CLASSPATH in your
IDE (unfortunately this needs to be hard copied, and may result in split-brain
syn
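On the build side, a minimal sketch of keeping Spark out of the application JAR (sbt; the version is illustrative):
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0" % "provided"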
ring Hive tables by using SET command. For example:
>>
>> hiveContext.hql("SET
>> hive.metastore.warehouse.dir=hdfs://localhost:54310/user/hive/warehouse")
>>
>>
>>
>>
>> On Thu, Jul 31, 2014 at 8:05 AM, Andrew Lee <
>
>> alee526@
Hi Patrick,
In Impala 1.3.1, when you update tables and metadata, do you still need to run
'invalidate metadata' in impala-shell? My understanding is that it is a pull
architecture to refresh the metastore on the catalogd in Impala; not sure if
this still applies to this case since you are updatin
(false).setMaster("local").setAppName("test data exchange with Hive")
> > conf.set("spark.driver.host", "localhost")
> > val sc = new SparkContext(conf)
> > val rdd = sc.makeRDD(Seq(rec))
> > rdd.map((x: MyRe
though - might
> >be too risky at this point.
> >
> >I'm not familiar with spark-sql.
> >
> >On Fri, Aug 22, 2014 at 11:25 AM, Andrew Lee wrote:
> >> Hopefully there could be some progress on SPARK-2420. It looks like
> >>shading
> >> ma
Hi All,
I have tried to pass the properties via SparkContext.setLocalProperty and
HiveContext.setConf; both failed. Based on the results (I haven't had a chance to
look into the code yet), HiveContext tries to initiate the JDBC connection
right away, and I couldn't set other properties dynamica
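(Roughly the kind of calls I tried, sketched; the property name and path here are illustrative.)
sc.setLocalProperty("hive.metastore.warehouse.dir", "/apps/hive/warehouse")
hiveContext.setConf("hive.metastore.warehouse.dir", "/apps/hive/warehouse")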
A follow-up on the hive-site.xml: if you
1. Specify it in spark/conf, then you can NOT also apply it via the
--driver-class-path option; otherwise, you will get the following exception
when initializing SparkContext:
org.apache.spark.SparkException: Found both spark.driver.extraClassPath and
Hi All,
I have been contemplating this problem and couldn't figure out what is
missing in the configuration. I traced the script and tried to look for
CLASSPATH to see what is included; however, I couldn't find any place that
is honoring/inheriting HADOOP_CLASSPATH (or pulling in any map-reduce
Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
You need to build Spark with 'sbt/sbt assembly' before running this program.
After digging into the cod
built in to the jar itself, so no need for random class paths.
On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee wrote:
Hi All,
I'm getting the following error when I execute start-master.sh which also
invokes spark-class at the end.
Failed to find Spark assembly in /root/spark/assemb
Hi Julien,
The ADD_JAR variable doesn't work on the command line. I checked spark-class, and I
couldn't find any Bash code bringing the ADD_JAR variable into the CLASSPATH.
Were you able to print out the properties and environment variables from the
Web GUI?
localhost:4040
This should give you an idea w
Hi All,
I encountered this problem when a firewall is enabled between the spark-shell
and the Workers.
When I launch spark-shell in yarn-client mode, I notice that Workers in the
YARN containers are trying to talk to the driver (spark-shell); however, the
firewall is not open, which caused time
y 2014 14:49:23 -0400
Subject: Re: spark-shell driver interacting with Workers in YARN mode -
firewall blocking communication
From: yana.kadiy...@gmail.com
To: user@spark.apache.org
I think what you want to do is set spark.driver.port to a fixed port.
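(For example, in spark-defaults.conf; the port value is illustrative.)
spark.driver.port 51000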
On Fri, May 2, 2014 at 1:52 PM, Andrew Lee wr
tp://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-tp4832p4984.html
[2] http://en.wikipedia.org/wiki/Ephemeral_port
[3]
http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html
Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512
ng Technologies
jeis...@us.ibm.com - (512) 286-6075
Andrew Lee ---05/04/2014 09:57:08 PM---Hi Jacob, Taking both concerns into
account, I'm actually thinking about using a separate subnet to
From: Andrew Lee
To: "user@spark.apache.org"
Date: 05/04/2014 09:57 PM
Subject:
Please check JAVA_HOME. Usually it should point to /usr/java/default on
CentOS/Linux.
or FYI: http://stackoverflow.com/questions/1117398/java-home-directory
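e.g. in your shell profile or spark-env.sh:
export JAVA_HOME=/usr/java/default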
> Date: Tue, 6 May 2014 00:23:02 -0700
> From: sln-1...@163.com
> To: u...@spark.incubator.apache.org
> Subject: run spark0.9.1 on yarn wit
Does anyone know if:
./bin/spark-shell --master yarn
is running yarn-cluster or yarn-client by default?
Based on the source code:
./core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
if (args.deployMode == "cluster" && args.master.startsWith("yarn")) {
args.master = "yarn-cl
and so it falls into the second "if" case you mentioned:
if (args.deployMode != "cluster" && args.master.startsWith("yarn")) {
  args.master = "yarn-client"
}
2014-05-21 10:57 GMT-07:00 Andrew Lee :
Does anyone know if:
./bin/spark-shell --master yarn