Hi,
This is about Spark 0.9.
I have a 3-node Spark cluster. I want to add a locally available jar file
(present on all nodes) to the SPARK_CLASSPATH variable in
/etc/spark/conf/spark-env.sh so that all nodes can access it.
My question is: do I have to edit 'spark-env.sh' on every node to add the jar?
Thanks. I hope this problem will go away once I upgrade to Spark 1.0, where we
can ship cluster-wide classpaths using the spark-submit command.
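For reference, here is a minimal sketch of the Spark 1.0-style approach
mentioned above, assuming the jar really is present at the same local path on
every node; the path and app name below are placeholders, not anything from
this thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // /opt/libs/mylib.jar is a placeholder for a jar already present on all nodes.
    val conf = new SparkConf()
      .setAppName("classpath-example")
      // Spark 1.0+ replacements for the SPARK_CLASSPATH env var; a similar
      // effect is available via spark-submit's --driver-class-path / --jars.
      .set("spark.driver.extraClassPath", "/opt/libs/mylib.jar")
      .set("spark.executor.extraClassPath", "/opt/libs/mylib.jar")
    val sc = new SparkContext(conf)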
By the way, any idea how to sync the Spark config dir with the other nodes in
the cluster?
~santhosh
I am also facing the same problem. I have implemented Serializable for my own
code, but the exception is thrown from third-party libraries over which I have
no control.
Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task not serializable: java.io.NotSerializableException: (li...
Can someone please answer this question?
Specifically, what can be done when classes in dependent jars do not implement
Serializable?
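A common workaround (not an answer given in this thread) is to construct the
non-serializable object on the executors instead of capturing it in the task
closure. The sketch below uses a made-up ThirdPartyClient class as a stand-in
for the real library.

    import org.apache.spark.{SparkConf, SparkContext}

    // Stand-in for a real, non-serializable third-party class.
    class ThirdPartyClient {
      def process(x: Int): String = s"processed-$x"
    }

    object NotSerializableWorkaround {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("workaround"))
        val results = sc.parallelize(1 to 100)
          // Build one client per partition on the executor, so the object is
          // never shipped from the driver and never needs to be serializable.
          .mapPartitions { iter =>
            val client = new ThirdPartyClient()
            iter.map(client.process)
          }
        results.take(5).foreach(println)
        sc.stop()
      }
    }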
This worked great. Thanks a lot.
Hello
I have a requirement to set some environment variables for my Spark jobs.
Does anyone know how to set them? Specifically, the following variables:
1) ORACLE_HOME
2) LD_LIBRARY_PATH
Thanks
I have been writing map-reduce jobs on Hadoop using Pig, and am now trying to
migrate to Spark.
My cluster consists of multiple nodes, and the jobs depend on a native
library (.so files).
In Hadoop and Pig I could distribute the files across nodes using the
"-files" or "-archive" option, but I could not find an equivalent in Spark.
I tried it, but it did not work:
conf.setExecutorEnv("ORACLE_HOME", orahome)
conf.setExecutorEnv("LD_LIBRARY_PATH", ldpath)
Any idea how to set it via java.library.path?
OK, it was actually working: I printed System.getenv(..) for both env variables
and they returned the correct values.
However, it did not give me the intended result. My intention was to load a
native library via LD_LIBRARY_PATH, but it looks like the library is loaded
from the value of -Djava.library.path instead.
Got it finally; pasting it here so that it will be useful for others:
val conf = new SparkConf()
  .setJars(jarList)
conf.setExecutorEnv("ORACLE_HOME", myOraHome)
// Pass java.library.path to the executor JVMs so the native library is found.
conf.setExecutorEnv("SPARK_JAVA_OPTS", "-Djava.library.path=/my/custom/path")
Curious to know: were you able to do distributed caching with Spark?
I have done that for Hadoop and Pig, but could not find a way to do it in
Spark.
I think with addJar() there is no 'caching', in the sense that the files will
be copied every time, once per job.
In the Hadoop distributed cache, by contrast, files are copied only once, and a
symlink to the cached file is created for subsequent runs:
https://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fi
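For illustration, a minimal sketch of the addJar() call being discussed; the
jar path is a placeholder, and (as noted above) the jar is re-shipped for each
job rather than cached across runs.

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("addjar-example"))
    // Makes the jar available to executors for this application; unlike the
    // Hadoop distributed cache, it is copied again for every job.
    sc.addJar("/opt/libs/mydependency.jar")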