Re: Understanding Spark Memory distribution
Hi Ankur

If you are using standalone mode, your config is wrong. You should set "export SPARK_DAEMON_MEMORY=xxx" in conf/spark-env.sh. At least that is what works on my Spark 1.3.0 standalone-mode machine. BTW, SPARK_DRIVER_MEMORY is used in YARN mode, and it looks like standalone mode does not use this config. To debug this, run "ps auxw | grep org.apache.spark.deploy.master.[M]aster" on the master machine and check the -Xmx and -Xms options. (A minimal spark-env.sh sketch follows at the end of this thread.)

Wisely Chen

On Mon, Mar 30, 2015 at 3:55 AM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:

> Hi Wisely,
>
> I am running on Amazon EC2 instances, so I cannot doubt the hardware.
> Moreover, my other pipelines run successfully except for this one, which
> involves broadcasting a large object.
>
> My spark-env.sh settings are:
>
> SPARK_MASTER_IP=
> SPARK_LOCAL_IP=
> SPARK_DRIVER_MEMORY=24g
> SPARK_WORKER_MEMORY=28g
> SPARK_EXECUTOR_MEMORY=26g
> SPARK_WORKER_CORES=8
>
> My spark-defaults.conf settings are:
>
> spark.eventLog.enabled true
> spark.eventLog.dir /srv/logs/
> spark.serializer org.apache.spark.serializer.KryoSerializer
> spark.kryo.registrator com.test.utils.KryoSerializationRegistrator
> spark.executor.extraJavaOptions "-verbose:gc -XX:+PrintGCDetails
>   -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError
>   -XX:HeapDumpPath=/srv/logs/ -XX:+UseG1GC"
> spark.shuffle.consolidateFiles true
> spark.shuffle.manager sort
> spark.shuffle.compress true
> spark.rdd.compress true
>
> Thanks
> Ankur
>
> On Sat, Mar 28, 2015 at 7:57 AM, Wisely Chen wrote:
>
>> Hi Ankur
>>
>> If your hardware is ok, it looks like a config problem. Can you show me
>> the config of spark-env.sh or the JVM config?
>>
>> Thanks
>>
>> Wisely Chen
>>
>> 2015-03-28 15:39 GMT+08:00 Ankur Srivastava:
>>
>>> Hi Wisely,
>>> I have 26gb for the driver and the master is running on m3.2xlarge
>>> machines.
>>>
>>> I see OOM errors on workers even though they are running with 26gb of
>>> memory.
>>>
>>> Thanks
>>>
>>> On Fri, Mar 27, 2015, 11:43 PM Wisely Chen wrote:
>>>
>>> Hi
>>>
>>> In broadcast, Spark collects the whole 3gb object onto the master node
>>> and broadcasts it to each slave. It is a very common situation that the
>>> master node does not have enough memory. What are your master node
>>> settings?
>>> Wisely Chen
>>>
>>> Ankur Srivastava wrote on Saturday, March 28, 2015:
>>>
>>> I have increased "spark.storage.memoryFraction" to 0.4 but I still get
>>> OOM errors on the Spark executor nodes:
>>>
>>> 15/03/27 23:19:51 INFO BlockManagerMaster: Updated info of block broadcast_5_piece10
>>> 15/03/27 23:19:51 INFO TorrentBroadcast: Reading broadcast variable 5 took 2704 ms
>>> 15/03/27 23:19:52 INFO MemoryStore: ensureFreeSpace(672530208) called with curMem=2484698683, maxMem=9631778734
>>> 15/03/27 23:19:52 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 641.4 MB, free 6.0 GB)
>>> 15/03/27 23:34:02 WARN AkkaUtils: Error sending message in 1 attempts
>>> java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
>>>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>         at scala.concurrent.Await$.result(package.scala:107)
>>>         at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
>>>         at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
>>> 15/03/27 23:34:02 ERROR Executor: Exception in task 7.0 in stage 2.0 (TID 4007)
>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>         at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1986)
>>>         at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>         at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>
>>> Thanks
>>>
>>> Ankur
>>>
>>> On Fri, Mar 27, 2015 at 2:52 PM, Ankur Srivastava
>>> <ankur.srivast...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am running a Spark cluster on EC2 instances of type m3.2xlarge. I
>>>> have given 26gb of memory with all 8 cores to my executors. I can see
>>>> that in the logs too:
>>>>
>>>> 15/03/27 21:31:06 INFO AppClient$ClientActor: Executor added:
>>>> app-20150327213106-/0 on worker-20150327212934-10.x.y.z-40128
>>>> (10.x.y.z:40128) with 8 cores
>>>>
>>>> I am not caching any RDD, so I have set "spark.storage.memoryFraction"
>>>> to 0.2.
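A minimal conf/spark-env.sh sketch of the standalone-mode memory settings discussed in this thread (the sizes are placeholders, not recommendations):

    # conf/spark-env.sh on the master and worker machines

    # Heap for the standalone Master/Worker daemons themselves, i.e. the JVMs
    # that show up under org.apache.spark.deploy.master.Master and
    # org.apache.spark.deploy.worker.Worker.
    export SPARK_DAEMON_MEMORY=2g

    # Total memory a worker is allowed to hand out to executors on that machine.
    export SPARK_WORKER_MEMORY=28g

    # Verify what the master daemon actually received:
    #   ps auxw | grep org.apache.spark.deploy.master.[M]aster
    # and check the -Xms/-Xmx values in the output.

The driver heap can also be set per application, for example with "--driver-memory 24g" on spark-submit or the spark.driver.memory property.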
Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1
Hi

I am quite a beginner in Spark and I had a similar issue last week. I don't know if my issue is the same as yours. I found that my program's jar contained protobuf, and when I removed this dependency from my program's pom.xml and rebuilt the program, it worked. Here is how I solved my own issue.

Environment: Spark 0.9, HDFS (Hadoop 2.3), Scala 2.10. My Spark is the Hadoop 2 / HDP2 prebuilt version from http://spark.apache.org/downloads.html. I did not build Spark myself.

Problem: I used the word count program from Spark 0.9's examples folder to read an HDFS file on Hadoop 2.3. The command was "./bin/run-example org.apache.spark.examples.WordCount". It failed with "Caused by: java.lang.VerifyError". I searched a lot on the web but could not find any workable solution.

How I solved my issue: I found that Spark 0.9's spark-shell can read the HDFS file without this problem, but the run-example command throws java.lang.VerifyError. I think the main reason is that the two commands (spark-shell and run-example) have different classpaths.

run-example's classpath is:
$SPARK_HOME/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar:$SPARK_HOME/conf:$SPARK_HOME/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar

spark-shell's classpath is:
$SPARK_HOME/conf:$SPARK_HOME/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar

The difference is $SPARK_HOME/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar, which is built by the example project. When I looked into this jar file, I found that it contained two copies of protobuf, and I do not know where they came from. I removed all dependencies from my example pom.xml, left only the "spark-core" dependency, rebuilt it, and it worked. (A sketch of how to check a jar for bundled protobuf classes follows at the end of this thread.)

I don't know if my issue is the same as yours. I hope it can help.

Wisely Chen

On Wed, Mar 26, 2014 at 7:10 AM, Patrick Wendell wrote:

> Starting with Spark 0.9 the protobuf dependency we use is shaded and
> cannot interfere with other protobuf libraries, including those in
> Hadoop. Not sure what's going on in this case. Would someone who is
> having this problem post exactly how they are building Spark?
>
> - Patrick
>
> On Fri, Mar 21, 2014 at 3:49 PM, Aureliano Buendia wrote:
>
> > On Tue, Mar 18, 2014 at 12:56 PM, Ognen Duzlevski wrote:
> >
> >> On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
> >>
> >>> On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote:
> >>>
> >>>> Is there a reason for Spark using the older Akka?
> >>>>
> >>>> On Sun, Mar 2, 2014 at 1:53 PM, 1esha wrote:
> >>>>
> >>>>> The problem is in akka-remote. It contains files compiled with
> >>>>> protobuf 2.4.*. When you run it with 2.5.* in the classpath it fails
> >>>>> like above.
> >>>>>
> >>>>> Looks like moving to Akka 2.3 will solve this issue. Check this issue:
> >>>>> https://www.assembla.com/spaces/akka/tickets/3154-use-protobuf-version-2-5-0#/activity/ticket
> >>>
> >>> Is the solution to exclude the 2.4.* dependency on protobuf, or will
> >>> this produce more complications?
> >>
> >> I am not sure I remember what the context was around this, but I run
> >> 0.9.0 with Hadoop 2.2.0 just fine.
> >>
> >> Ognen
> >
> > The problem is that Spark depends on an older version of Akka, which
> > depends on an older version of protobuf (2.4).
> >
> > This means people cannot use protobuf 2.5 with Spark.
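A quick way to check whether an application or example jar is dragging in its own copy of protobuf (a sketch; point it at whichever jar your command puts on the classpath):

    # List any protobuf classes bundled inside the fat/assembly jar
    jar tf $SPARK_HOME/examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.0-incubating.jar | grep com/google/protobuf

    # Or ask Maven where a protobuf dependency comes from in your own project
    mvn dependency:tree -Dincludes=com.google.protobuf:protobuf-java

If protobuf classes show up there, trimming the pom down (as described above) or excluding the protobuf dependency keeps the extra copy from conflicting with the shaded copy inside the Spark assembly.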
Re: Distributed running in Spark Interactive shell
This response is for Sai.

The easiest way to verify your current spark-shell setting is to type "sc.master". If your setting is correct, it should return:

scala> sc.master
res0: String = spark://master.ip.url.com:5050

If your SPARK_MASTER_IP is not set correctly, it will respond:

scala> sc.master
res0: String = local

That means your spark-shell is running in local mode. You can also check the Spark master's web UI: you should see a spark-shell application in the master's application list. (A sketch of the relevant config files follows at the end of this thread.)

Wisely Chen

On Wed, Mar 26, 2014 at 10:12 PM, Nan Zhu wrote:

> and, yes, I think that picture is a bit misleading, though in the
> following paragraph it has mentioned that
>
> "Because the driver *schedules* tasks on the cluster, it should be run
> close to the worker nodes, preferably on the same local area network. If
> you'd like to send requests to the cluster remotely, it's better to open
> an RPC to the driver and have it submit operations from nearby than to
> run a driver far away from the worker nodes."
>
> --
> Nan Zhu
>
> On Wednesday, March 26, 2014 at 9:59 AM, Nan Zhu wrote:
>
> master does more work than that actually, I just explained why he should
> set MASTER_IP correctly
>
> a simplified list:
>
> 1. maintain the worker status
> 2. maintain in-cluster driver status
> 3. maintain executor status (the worker tells the master what happened on
> the executor)
>
> --
> Nan Zhu
>
> On Wednesday, March 26, 2014 at 9:46 AM, Yana Kadiyska wrote:
>
> Nan (or anyone who feels they understand the cluster architecture well),
> can you clarify something for me.
>
> From reading this user group and your explanation above, it appears that
> the cluster master is only involved during application startup -- to
> allocate executors (from what you wrote it sounds like the driver itself
> passes the job/tasks to the executors). From there onwards all computation
> is done on the executors, who communicate results directly to the driver
> if certain actions (say collect) are performed. Is that right? The only
> description of the cluster I've seen came from here:
> https://spark.apache.org/docs/0.9.0/cluster-overview.html but that
> picture suggests there is no direct communication between driver and
> executors, which I believe is wrong (unless I am misreading the picture --
> I believe Master and "Cluster Manager" refer to the same thing?).
>
> The very short form of my question is, does the master do anything other
> than executor allocation?
>
> On Wed, Mar 26, 2014 at 9:23 AM, Nan Zhu wrote:
>
> all you need to do is ensure your Spark cluster is running well (you can
> check by accessing the Spark UI to see if all workers are displayed),
> then set the correct SPARK_MASTER_IP on the machine where you run
> spark-shell
>
> In more detail:
>
> when you run bin/spark-shell, it starts the driver program on that
> machine, interacting with the Master to start the application (in this
> case, it is spark-shell)
>
> the Master tells Workers to start executors for your application, and the
> executors will try to register with your driver,
>
> then your driver can distribute tasks to the executors, i.e. run in a
> distributed fashion
>
> Best,
>
> --
> Nan Zhu
>
> On Wednesday, March 26, 2014 at 9:01 AM, Sai Prasanna wrote:
>
> Nan Zhu, it's the latter, I want to distribute the tasks to the cluster
> [machines available.]
>
> If I set the SPARK_MASTER_IP on the other machines and list the worker
> IPs in conf/slaves at the master node, will the interactive shell code
> run at the master get distributed across multiple machines?
>
> On Wed, Mar 26, 2014 at 6:32 PM, Nan Zhu wrote:
>
> what do you mean by run across the cluster?
>
> you want to start the spark-shell across the cluster, or you want to
> distribute tasks to multiple machines?
>
> if the former case, yes, as long as you indicate the right master URL
>
> if the latter case, also yes, you can observe the distributed tasks in
> the Spark UI
>
> --
> Nan Zhu
>
> On Wednesday, March 26, 2014 at 8:54 AM, Sai Prasanna wrote:
>
> Is it possible to run across a cluster using the Spark interactive shell?
>
> To be more explicit, is the procedure similar to running standalone
> master-slave Spark?
>
> I want to execute my code in the interactive shell on the master node,
> and it should run across the cluster [say 5 nodes]. Is the procedure
> similar?
>
> --
> Sai Prasanna. AN
> II M.Tech (CS), SSSIHL
>
> Entire water in the ocean can never sink a ship, unless it gets inside.
> All the pressures of life can never hurt you, unless you let them in.
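A rough sketch of the pieces discussed in this thread, using the Spark 0.9 standalone scripts and placeholder hostnames:

    # conf/spark-env.sh on the master
    export SPARK_MASTER_IP=master.ip.url.com

    # conf/slaves on the master: one worker hostname per line
    worker1.ip.url.com
    worker2.ip.url.com

    # Start the standalone cluster, then launch the shell against it
    ./sbin/start-all.sh
    MASTER=spark://master.ip.url.com:7077 ./bin/spark-shell

Inside the shell, sc.master should then report the spark:// URL instead of "local", and the shell should appear in the application list of the master web UI (port 8080 by default).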
Re: Calling Spark enthusiasts in NYC
Hi Andy

We are from Taiwan and are already planning to have a Spark meetup. We already have some resources, like a venue and a food budget, but we do need some other resources. Please contact me offline.

Thanks

Wisely Chen

On Tue, Apr 1, 2014 at 1:28 AM, Andy Konwinski wrote:

> Hi folks,
>
> We have seen a lot of community growth outside of the Bay Area and we are
> looking to help spur even more!
>
> For starters, the organizers of the Spark meetups here in the Bay Area
> want to help anybody that is interested in setting up a meetup in a new
> city.
>
> Some amazing Spark champions have stepped forward in Seattle, Vancouver,
> Boulder/Denver, and a few other areas already.
>
> Right now, we are looking to connect with you Spark enthusiasts in NYC
> about helping to run an inaugural Spark Meetup in your area.
>
> You can reply to me directly if you are interested and I can tell you
> about all of the resources we have to offer (speakers from the core
> community, a budget for food, help scheduling, etc.), and let's make this
> happen!
>
> Andy
Re: Lost an executor error - Jobs fail
Hi Praveen

What is your config for "spark.local.dir"? Do all of your workers have this directory, and do they have the right permissions on it? I think this is the reason for your error. (See the sketch after this thread for what to check.)

Wisely Chen

On Mon, Apr 14, 2014 at 9:29 PM, Praveen R wrote:

> Had the below error while running shark queries on a 30-node cluster and
> was not able to start the shark server or run any jobs.
>
> 14/04/11 19:06:52 ERROR scheduler.TaskSchedulerImpl: Lost an executor 4
> (already removed): Failed to create local directory (bad spark.local.dir?)
>
> Full log: https://gist.github.com/praveenr019/10647049
>
> After spending quite some time, found it was due to disk read errors on
> one node and had the cluster working after removing the node.
>
> Wanted to know if there is any configuration (like akkaTimeout) which can
> handle this, or does mesos help?
>
> Shouldn't the worker be marked dead in such a scenario, instead of making
> the cluster non-usable, so the debugging can be done at leisure?
>
> Thanks,
> Praveen R
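A sketch of what to check on each worker, assuming spark.local.dir points at /mnt/spark (substitute your actual scratch directory):

    # spark.local.dir is a comma-separated list of scratch directories used
    # for shuffle and spill files; set it via SparkConf, SPARK_JAVA_OPTS
    # (-Dspark.local.dir=/mnt/spark), or spark-defaults.conf on newer releases.

    # On every worker, the directory must exist, be writable by the user
    # running the worker, and sit on a healthy disk with free space:
    ls -ld /mnt/spark
    df -h /mnt/spark
    touch /mnt/spark/.spark_write_test && rm /mnt/spark/.spark_write_test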
Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1
Hi Prasad

Sorry for missing your reply. Here it is: https://gist.github.com/thegiive/10791823 (a rough sketch of the key dependency section also follows at the end of this thread).

Wisely Chen

On Fri, Apr 4, 2014 at 11:57 PM, Prasad wrote:

> Hi Wisely,
> Could you please post your pom.xml here.
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Error-reading-HDFS-file-using-spark-0-9-0-hadoop-2-2-0-incompatible-protobuf-2-5-and-2-4-1-tp2158p3770.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
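For reference, a minimal sketch of the dependency section described earlier in this thread (the gist above has the real pom; the coordinates below assume Spark 0.9.0-incubating on Scala 2.10):

    <dependencies>
      <!-- The only Spark dependency kept. Everything else, including any
           direct protobuf dependency, was removed so the example jar does
           not bundle its own protobuf classes. -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>0.9.0-incubating</version>
      </dependency>
    </dependencies>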
Re: Java RDD structure for Matrix predict?
Hi Sandeep

I think you should use testRatings.mapToPair instead of testRatings.map. So the code should be:

JavaPairRDD<Integer, Integer> usersProducts = testRatings.mapToPair(
    new PairFunction<Rating, Integer, Integer>() {
        public Tuple2<Integer, Integer> call(Rating r) throws Exception {
            return new Tuple2<Integer, Integer>(r.user(), r.product());
        }
    }
);

It works on my side.

Wisely Chen

On Wed, May 28, 2014 at 6:27 AM, Sandeep Parikh wrote:

> I've got a trained MatrixFactorizationModel via ALS.train(...) and now I'm
> trying to use it to predict some ratings like so:
>
> JavaRDD<Rating> predictions = model.predict(usersProducts.rdd())
>
> Where usersProducts is built from an existing Ratings dataset like so:
>
> JavaPairRDD<Integer, Integer> usersProducts = testRatings.map(
>     new PairFunction<Rating, Integer, Integer>() {
>         public Tuple2<Integer, Integer> call(Rating r) throws Exception {
>             return new Tuple2<Integer, Integer>(r.user(), r.product());
>         }
>     }
> );
>
> The problem is that model.predict(...) doesn't like usersProducts, claiming
> that the method doesn't accept an RDD of type Tuple2, however the docs show
> the method signature as follows:
>
> def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating]
>
> Am I missing something? The JavaRDD is just a list of Tuple2 elements,
> which would match the method signature, but the compiler is complaining.
>
> Thanks!
RE: Announcing Spark 1.0.0
Great work!

On May 30, 2014 10:15 PM, "Ian Ferreira" wrote:

> Congrats
>
> Sent from my Windows Phone
> --
> From: Dean Wampler
> Sent: 5/30/2014 6:53 AM
> To: user@spark.apache.org
> Subject: Re: Announcing Spark 1.0.0
>
> Congratulations!!
>
> On Fri, May 30, 2014 at 5:12 AM, Patrick Wendell wrote:
>
> I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
> is a milestone release as the first in the 1.0 line of releases,
> providing API stability for Spark's core interfaces.
>
> Spark 1.0.0 is Spark's largest release ever, with contributions from
> 117 developers. I'd like to thank everyone involved in this release -
> it was truly a community effort with fixes, features, and
> optimizations contributed from dozens of organizations.
>
> This release expands Spark's standard libraries, introducing a new SQL
> package (SparkSQL) which lets users integrate SQL queries into
> existing Spark workflows. MLlib, Spark's machine learning library, is
> expanded with sparse vector support and several new algorithms. The
> GraphX and Streaming libraries also introduce new features and
> optimizations. Spark's core engine adds support for secured YARN
> clusters, a unified tool for submitting Spark applications, and
> several performance and stability improvements. Finally, Spark adds
> support for Java 8 lambda syntax and improves coverage of the Java and
> Python API's.
>
> Those features only scratch the surface - check out the release notes here:
> http://spark.apache.org/releases/spark-release-1-0-0.html
>
> Note that since release artifacts were posted recently, certain
> mirrors may not have working downloads for a few hours.
>
> - Patrick
>
> --
> Dean Wampler, Ph.D.
> Typesafe
> @deanwampler
> http://typesafe.com
> http://polyglotprogramming.com