Hi,
I'd like to know where I can find more information about the deprecation of
the actor system in Spark (from 1.4.x).
I'm interested in the reasons behind this decision.
Cheers
--
You'd need to provide more information, such as your executor configuration
(number of cores, memory size). You might see less scheduler delay with
smaller but more numerous executors than with fewer, larger ones.
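To illustrate the trade-off (a hedged sketch for standalone mode; the flag
values are purely illustrative), the same total core budget can be spread
over many small executors or a few large ones:
spark-submit --total-executor-cores 16 --executor-cores 2 --executor-memory 4g ...   (8 small executors)
spark-submit --total-executor-cores 16 --executor-cores 8 --executor-memory 16g ...  (2 large executors)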
--
If you want to properly exploit the 8 nodes of your cluster, you should use
roughly 2 times that number of partitions.
You can specify the number of partitions when calling parallelize, as
follows:
JavaRDD pnts = sc.parallelize(points, 16);
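As a quick sanity check (assuming the Spark 1.x Java API), you can then
verify how many partitions the RDD actually got:
// pnts was created above with an explicit partition count of 16
System.out.println(pnts.partitions().size());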
--
Can you please share your application code?
I suspect that you're not making good use of the cluster because of a wrong
number of partitions in your RDDs.
--
Hi,
I have several issues related to HDFS that may have different root causes. I'm
posting as much information as I can, in the hope of getting your opinion on
at least some of them. Basically the cases are:
- HDFS classes not found
- Connections with some datanodes seem to be slow/ unexpectedly
Q1: You can change the port number on the master in the file
conf/spark-defaults.conf. I don't know what the impact will be on a Cloudera
distro, though.
Q2: Yes, a Spark worker needs to be present on each node that you want to
make available to the driver.
Q3: You can submit an application from
Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
and higher from the official website.
--
I'm using Hadoop 2.5.2 with Spark 1.4.0, and I can also see the following in my logs:
15/07/09 06:39:02 DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop
classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException:
org.apache.hadoop.mapred.InputSplitWithLocationInfo
Same feedback with Spark 1.4.0 and Hadoop 2.5.2.
The workload is completing, though.
--
I think the properties that you have in your hdfs-site.xml should go in
core-site.xml (at least the namenode.name and datanode.data ones). I might be
wrong here, but that's what I have in my setup.
You should also add hadoop.tmp.dir to your core-site.xml. That might be the
source of your issue.
Can you share your Hadoop configuration files, please?
- etc/hadoop/core-site.xml
- etc/hadoop/hdfs-site.xml
- etc/hadoop/hadoop-env.sh
AFAIK, the following properties should be configured:
hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir and
dfs.namenode.checkpoint.dir
Otherwise, an H
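If it helps with debugging, here is a small sketch (Java; it assumes
hadoop-hdfs and your HADOOP_CONF_DIR are on the classpath) that prints which
values are actually picked up for those properties:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class PrintHdfsConf {
    public static void main(String[] args) {
        // HdfsConfiguration also pulls in hdfs-default.xml / hdfs-site.xml,
        // in addition to core-default.xml / core-site.xml
        Configuration conf = new HdfsConfiguration();
        for (String key : new String[] {"hadoop.tmp.dir",
                                        "dfs.namenode.name.dir",
                                        "dfs.datanode.data.dir",
                                        "dfs.namenode.checkpoint.dir"}) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}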
Hi,
I've been compiling Spark 1.4.0 with SBT, from the source tarball available
on the official website. I cannot run Spark's master, even though I have built
and run several other instances of Spark on the same machine (Spark 1.3,
master branch, prebuilt 1.4, ...)
starting org.apache.spark.deploy.
Hi there,
I have some traces from my master and some workers where, for some reason,
the ./work directory of an application cannot be created on the workers.
There is also an issue with the master's temp directory creation.
master logs: http://pastebin.com/v3NCzm0u
worker's logs: http://pastebin.
Is it possible to recreate, after rebooting the master, the same views the
web UI gives for completed applications, using the log files? I just tried
changing the URL of the form http://w.x.y.z:8080/history/app-2-0036 by giving
the appID, but it redirected me to the master's homepage.
You can see the amount of memory consumed by each executor in the web UI (go
to the application page, and click on the Executors tab).
Otherwise, for finer-grained monitoring, I can only think of correlating a
system monitoring tool like Ganglia with the event timeline of your job.
--
Basically, here's a dump of the SO question I opened
(http://stackoverflow.com/questions/31033724/spark-1-4-0-java-lang-nosuchmethoderror-com-google-common-base-stopwatch-elapse)
I'm using Spark 1.4.0, and when running the Scala SparkPageRank example
(examples/src/main/scala/org/apache/spark/examp
I'm wondering if there is a real benefit to splitting my memory in two
between the datanode and the workers.
Datanodes and the OS need memory to do their business. I suppose there could
be a loss of performance if they came to compete for memory with the
worker(s).
Any opinion? :-)
--
You can specify the jars of your application to be included with spark-submit
using the --jars switch.
Otherwise, are you sure that your newly compiled Spark assembly jar is in
assembly/target/scala-2.10/?
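For example (the class name, paths and master URL below are purely illustrative):
spark-submit --class com.example.MyApp --master spark://master:7077 --jars /path/to/dep1.jar,/path/to/dep2.jar myapp.jar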
--
Also, still for 1), in conf/spark-defaults.conf you can set the following
properties to tune the driver's resources:
spark.driver.cores
spark.driver.memory
Not sure if you can pass them at submit time, but it should be possible.
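For example, in conf/spark-defaults.conf (the values are just illustrative):
spark.driver.cores    2
spark.driver.memory   4g
They can also be given at submit time, e.g. spark-submit --driver-memory 4g
(and, in cluster deploy mode, --driver-cores 2).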
--
For 1)
In standalone mode, you can increase the workers' resource allocation in
their local conf/spark-env.sh with the following variables:
SPARK_WORKER_CORES,
SPARK_WORKER_MEMORY
At application submit time, you can tune the amount of resources allocated to
executors with --executor-cores and
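A quick sketch (values are purely illustrative; I assume the switch cut off
above is --executor-memory):
In conf/spark-env.sh on each worker:
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=16g
At submit time:
spark-submit --executor-cores 2 --executor-memory 4g ...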
Actually, this is somewhat confusing, for two reasons:
- First, the option 'spark.executor.instances', which seems to be handled
only in the YARN case in the source code of SparkSubmit.scala, is also
present in the conf/spark-env.sh file under the standalone section, which
would indicate that i
Note that this property is only available for YARN.
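On YARN it can also be set through the --num-executors switch of spark-submit,
which maps to spark.executor.instances; an illustrative example:
spark-submit --master yarn --num-executors 4 --executor-cores 2 --executor-memory 4g ...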
--
You could try issuing a get on the SparkConf object.
I don't have the exact name of the matching key, but from reading the code
in SparkSubmit.scala, it should be something like:
conf.get("spark.executor.instances")
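A minimal sketch in Java (it assumes the property was actually set, e.g. on
YARN; the fallback value here is just illustrative):
SparkConf conf = new SparkConf();
// or sc.getConf() if you already have a JavaSparkContext
// returns the configured value, or "2" if the property was never set
String numExecutors = conf.get("spark.executor.instances", "2");
System.out.println(numExecutors);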
--
If I read the code correctly, in RDD.scala each RDD keeps track of its own
dependencies (from Dependency.scala), and has methods to access its
ancestors' dependencies, and is thus able to recompute the lineage (see
getNarrowAncestors() or getDependencies() in some RDDs like UnionRDD).
So it
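To see this from user code, here is a rough sketch (Java; getNarrowAncestors()
is, I believe, internal to Spark, but the public toDebugString() and
dependencies() expose the same lineage information):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineageDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LineageDemo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Integer> a = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
        JavaRDD<Integer> b = a.map(x -> x * 2);     // narrow dependency on a
        JavaRDD<Integer> c = b.union(a);            // a UnionRDD with two parents
        System.out.println(c.toDebugString());      // prints the whole lineage
        System.out.println(c.rdd().dependencies()); // the Dependency objects themselves
        sc.stop();
    }
}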