Hi,
I'd like to know where I can find more information about the deprecation of
the actor system in Spark (from 1.4.x).
I'm interested in the reasons behind this decision.
Cheers
--
You'd need to provide more information, such as your executor configuration
(number of cores, memory size). You might see less scheduler delay with
smaller but more numerous executors than with fewer, larger ones.
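To illustrate the trade-off (a hedged sketch for standalone mode; the flag
values are purely illustrative), the same total core budget can be spread
over many small executors or a few large ones:
spark-submit --total-executor-cores 16 --executor-cores 2 --executor-memory 4g ...   (8 small executors)
spark-submit --total-executor-cores 16 --executor-cores 8 --executor-memory 16g ...  (2 large executors)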
--
If you want to properly exploit the 8 nodes of your cluster, you should use
roughly 2 times that number of partitions.
You can specify the number of partitions when calling parallelize, as
follows:
JavaRDD pnts = sc.parallelize(points, 16);
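As a quick sanity check (assuming the Spark 1.x Java API), you can then
verify how many partitions the RDD actually got:
// pnts was created above with an explicit partition count of 16
System.out.println(pnts.partitions().size());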
--
Can you please share your application code?
I suspect that you're not making good use of the cluster because of a wrong
number of partitions in your RDDs.
--
Hi,
I have several issues related to HDFS that may have different root causes. I'm
posting as much information as I can, in the hope of getting your opinion on
at least some of them. Basically the cases are:
- HDFS classes not found
- Connections with some datanodes seem to be slow/ unexpectedly
Q1: You can change the port number on the master in the file
conf/spark-defaults.conf. I don't know what the impact will be on a Cloudera
distro, though.
Q2: Yes, a Spark worker needs to be present on each node that you want to
make available to the driver.
Q3: You can submit an application from
Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
and higher from the official website.
--
I'm using Hadoop 2.5.2 with Spark 1.4.0, and I can also see the following in my logs:
15/07/09 06:39:02 DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop
classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException:
org.apache.hadoop.mapred.InputSplitWithLocationInfo
Same feedback with Spark 1.4.0 and Hadoop 2.5.2.
The workload is completing, though.
--
I think the properties that you have in your hdfs-site.xml should go in
core-site.xml (at least the namenode.name and datanode.data ones). I might be
wrong here, but that's what I have in my setup.
You should also add hadoop.tmp.dir to your core-site.xml. That might be the
source of your issue.
Can you share your Hadoop configuration files, please?
- etc/hadoop/core-site.xml
- etc/hadoop/hdfs-site.xml
- etc/hadoop/hadoop-env.sh
AFAIK, the following properties should be configured:
hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir and
dfs.namenode.checkpoint.dir
Otherwise, an H
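If it helps with debugging, here is a small sketch (Java; it assumes
hadoop-hdfs and your HADOOP_CONF_DIR are on the classpath) that prints which
values are actually picked up for those properties:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class PrintHdfsConf {
    public static void main(String[] args) {
        // HdfsConfiguration also pulls in hdfs-default.xml / hdfs-site.xml,
        // in addition to core-default.xml / core-site.xml
        Configuration conf = new HdfsConfiguration();
        for (String key : new String[] {"hadoop.tmp.dir",
                                        "dfs.namenode.name.dir",
                                        "dfs.datanode.data.dir",
                                        "dfs.namenode.checkpoint.dir"}) {
            System.out.println(key + " = " + conf.get(key));
        }
    }
}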
Hi,
I've been compiling Spark 1.4.0 with SBT, from the source tarball available
on the official website. I cannot run Spark's master, even though I have built
and run several other instances of Spark on the same machine (Spark 1.3,
master branch, prebuilt 1.4, ...)
starting org.apache.spark.deploy.
Hi there,
I have some traces from my master and some workers where, for some reason,
the ./work directory of an application cannot be created on the workers.
There is also an issue with the master's temp directory creation.
master logs: http://pastebin.com/v3NCzm0u
worker's logs: http://pastebin.
Is it possible to recreate, after rebooting the master, the same views the
web UI gives for completed applications, using the log files? I just tried
changing the URL of the form http://w.x.y.z:8080/history/app-2-0036 by giving
the appID, but it redirected me to the master's homepage.
You can see the amount of memory consumed by each executor in the web UI (go
to the application page, and click on the Executors tab).
Otherwise, for finer-grained monitoring, I can only think of correlating a
system monitoring tool like Ganglia with the event timeline of your job.
--
Basically, here's a dump of the SO question I opened
(http://stackoverflow.com/questions/31033724/spark-1-4-0-java-lang-nosuchmethoderror-com-google-common-base-stopwatch-elapse)
I'm using Spark 1.4.0, and when running the Scala SparkPageRank example
(examples/src/main/scala/org/apache/spark/examp
I'm wondering if there is a real benefit to splitting my memory in two
between the datanode and the workers.
Datanodes and the OS need memory to do their business. I suppose there could
be a loss of performance if they came to compete for memory with the
worker(s).
Any opinion? :-)
--
You can specify the jars of your application to be included with spark-submit
using the --jars switch.
Otherwise, are you sure that your newly compiled Spark assembly jar is in
assembly/target/scala-2.10/?
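For example (the class name, paths and master URL below are purely illustrative):
spark-submit --class com.example.MyApp --master spark://master:7077 --jars /path/to/dep1.jar,/path/to/dep2.jar myapp.jar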
--
Also, still for 1), in conf/spark-defaults.conf you can set the following
properties to tune the driver's resources:
spark.driver.cores
spark.driver.memory
Not sure if you can pass them at submit time, but it should be possible.
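For example, in conf/spark-defaults.conf (the values are just illustrative):
spark.driver.cores    2
spark.driver.memory   4g
They can also be given at submit time, e.g. spark-submit --driver-memory 4g
(and, in cluster deploy mode, --driver-cores 2).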
--
For 1)
In standalone mode, you can increase the workers' resource allocation in
their local conf/spark-env.sh with the following variables:
SPARK_WORKER_CORES,
SPARK_WORKER_MEMORY
At application submit time, you can tune the amount of resources allocated to
executors with --executor-cores and
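A quick sketch (values are purely illustrative; I assume the switch cut off
above is --executor-memory):
In conf/spark-env.sh on each worker:
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=16g
At submit time:
spark-submit --executor-cores 2 --executor-memory 4g ...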
Actually, this is somewhat confusing, for two reasons:
- First, the option 'spark.executor.instances', which seems to be handled
only in the YARN case in the source code of SparkSubmit.scala, is also
present in the conf/spark-env.sh file under the standalone section, which
would indicate that i
Note that this property is only available for YARN.
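On YARN it can also be set through the --num-executors switch of spark-submit,
which maps to spark.executor.instances; an illustrative example:
spark-submit --master yarn --num-executors 4 --executor-cores 2 --executor-memory 4g ...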
--
You could try issuing a get on the SparkConf object.
I don't have the exact name of the matching key, but from reading the code
in SparkSubmit.scala, it should be something like:
conf.get("spark.executor.instances")
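A minimal sketch in Java (it assumes the property was actually set, e.g. on
YARN; the fallback value here is just illustrative):
SparkConf conf = new SparkConf();
// or sc.getConf() if you already have a JavaSparkContext
// returns the configured value, or "2" if the property was never set
String numExecutors = conf.get("spark.executor.instances", "2");
System.out.println(numExecutors);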
--
If I read the code correctly, in RDD.scala each RDD keeps track of its own
dependencies (from Dependency.scala), and has methods to access its
ancestors' dependencies, and is thus able to recompute the lineage (see
getNarrowAncestors() or getDependencies() in some RDDs like UnionRDD).
So it
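To see this from user code, here is a rough sketch (Java; getNarrowAncestors()
is, I believe, internal to Spark, but the public toDebugString() and
dependencies() expose the same lineage information):

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LineageDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("LineageDemo").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<Integer> a = sc.parallelize(Arrays.asList(1, 2, 3, 4), 2);
        JavaRDD<Integer> b = a.map(x -> x * 2);     // narrow dependency on a
        JavaRDD<Integer> c = b.union(a);            // a UnionRDD with two parents
        System.out.println(c.toDebugString());      // prints the whole lineage
        System.out.println(c.rdd().dependencies()); // the Dependency objects themselves
        sc.stop();
    }
}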