Re: the spark configuage

2014-05-04 Thread Sophia
It may be caused by this. I use the CDH4 version and I will try to configure HADOOP_HOME. Thank you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/the-spark-configuage-tp5098p5299.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

different in spark on yarn mode and standalone mode

2014-05-04 Thread Sophia
Hey guys, what is the difference between Spark on YARN mode and standalone mode in terms of resource scheduling? Wishing you a happy day. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/different-in-spark-on-yarn-mode-and-standalone-mode-tp5300.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Crazy Kryo Exception

2014-05-04 Thread Soren Macbeth
Does this perhaps have to do with spark.closure.serializer? On Sat, May 3, 2014 at 7:50 AM, Soren Macbeth wrote: > Poking around in the bowels of Scala, it seems like this has something to do with implicit Scala -> Java collection munging. Why would it be doing this, and where? The stack

using kryo for spark.closure.serializer with a registrator doesn't work

2014-05-04 Thread Soren Macbeth
Is this supposed to be supported? It doesn't work, at least in Mesos fine-grained mode. First it fails a bunch of times because it can't find my registrator class, since my assembly jar hasn't been fetched, like so: java.lang.ClassNotFoundException: pickles.kryo.PicklesRegistrator at java.
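
For reference, a minimal sketch of the setup being discussed, assuming Spark 0.9-era system properties and reusing the registrator name from the stack trace above:

    import org.apache.spark.SparkContext

    object KryoClosureTest {
      def main(args: Array[String]): Unit = {
        // Both properties must be set before the SparkContext is created.
        System.setProperty("spark.closure.serializer",
          "org.apache.spark.serializer.KryoSerializer")
        System.setProperty("spark.kryo.registrator",
          "pickles.kryo.PicklesRegistrator")

        val sc = new SparkContext("mesos://master:5050", "kryo-closure-test")
        // ... jobs whose closures are now serialized with Kryo ...
        sc.stop()
      }
    }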

Re: cache not working as expected for iteration?

2014-05-04 Thread Andrea Esposito
Maybe your memory isn't enough to hold the current RDD as well as all the past ones? RDDs that are cached or persisted have to be unpersisted explicitly; there is no auto-unpersist (maybe that will change in the 1.0 version?). Be careful: calling cache() or persist() doesn't imply the RDD will be materialized.
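
To illustrate, a minimal sketch of explicit unpersisting in an iterative job (the names and the transformation are placeholders):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local[2]", "cache-demo")
    var current = sc.parallelize(1 to 1000000)
    current.cache() // only marks the RDD for caching

    for (_ <- 1 to 10) {
      val next = current.map(_ + 1)
      next.cache()
      next.count()        // an action actually materializes `next`
      current.unpersist() // the previous iteration must be released by hand
      current = next
    }
    sc.stop()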

sbt run with spark.ContextCleaner ERROR

2014-05-04 Thread wxhsdp
Hi all, I use sbt to run my Spark application; after the app completes, this error occurs: 14/05/04 17:32:28 INFO network.ConnectionManager: Selector thread was interrupted! 14/05/04 17:32:28 ERROR spark.ContextCleaner: Error in cleaning thread java.lang.InterruptedException at java.lang.Objec

Re: sbt run with spark.ContextCleaner ERROR

2014-05-04 Thread Tathagata Das
Can you tell us which version of Spark you are using? Spark 1.0 RC3, or something intermediate? And do you call sparkContext.stop() at the end of your application? If so, does this error occur before or after the stop()? TD On Sun, May 4, 2014 at 2:40 AM, wxhsdp wrote: > Hi all, > > I use sbt to ru

Re: "sbt/sbt run" command returns a JVM problem

2014-05-04 Thread Carter
Hi Michael, The log after I typed "last" is as below: > last scala.tools.nsc.MissingRequirementError: object scala not found. at scala.tools.nsc.symtab.Definitions$definitions$.getModuleOrClass(Definitions.scala:655) at scala.tools.nsc.symtab.Definitions$definitions$.getModule(Defi

Re: SparkException: env SPARK_YARN_APP_JAR is not set

2014-05-04 Thread phoenix bai
According to the code, SPARK_YARN_APP_JAR is retrieved from system variables, and the key-value pairs you pass to JavaSparkContext are isolated from system variables. So you should maybe try setting it through System.setProperty(). Thanks On Wed, Apr 23, 2014 at 6:05 PM, 肥肥 <19934...@qq.
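
A minimal sketch of that suggestion (the jar path is a placeholder, and whether a system property is honored in addition to the environment variable depends on the Spark version):

    import org.apache.spark.SparkContext

    // Set the property before the context is created; key-value pairs
    // passed into the context itself are isolated from system variables.
    System.setProperty("SPARK_YARN_APP_JAR", "/path/to/my-app-assembly.jar")

    val sc = new SparkContext("yarn-client", "yarn-app-jar-demo")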

Re: cache not working as expected for iteration?

2014-05-04 Thread Earthson
Thanks for the help; unpersist is exactly what I want :) I see that Spark will remove some cached data automatically when memory is full; it would be much more helpful if the eviction rule were something like LRU. It also seems that persist and cache are somewhat lazy? -- View this message in context: http://

Re: sbt run with spark.ContextCleaner ERROR

2014-05-04 Thread wxhsdp
Hi TD, actually I'm not very clear about my Spark version; I checked out from https://github.com/apache/spark/trunk on Apr 30. Please tell me where to get the Spark 1.0 RC3 version. I did not call sparkContext.stop; now I have added it to the end of my code. Here's the log: 14/05/04 18:48:21 INFO

NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread wxhsdp
Hi, I'm trying to use the breeze linalg library for matrix operations in my Spark code. I already added a dependency on breeze in my build.sbt and packaged my code successfully. When I run in local mode, sbt "run local...", everything is OK, but when I turn to standalone mode, sbt "run spark://127.0.
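
For context, a minimal sketch of the kind of breeze usage involved (assumed, since the original code is truncated):

    import breeze.linalg._

    val a = DenseMatrix((1.0, 2.0), (3.0, 4.0))
    val b = DenseMatrix.eye[Double](2)
    // Matrix multiply; this is the sort of call that throws
    // NoSuchMethodError when the workers see a different breeze jar.
    val c = a * b
    println(c)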

unsubscribe

2014-05-04 Thread Nabeel Memon
unsubscribe

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
Please, can anyone give feedback? Thanks. Hello, I am getting this warning after upgrading to Hadoop 2.4, when I try to write something to HDFS. The content is written correctly, but I do not like this warning. Do I have to compile Spark with Hadoop 2.4? WARN TaskSetManager: Loss was due to

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Mayur Rustagi
You should compile Spark against whichever Hadoop version you use. I am surprised it's working otherwise, as HDFS breaks compatibility quite often. As for this error, it comes up when your code writes or reads a file that has already been deleted. Are you trying to update a single file in multiple mappers/reduce par

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
Thanks Mayur. The only thing my code is doing is: read from S3, and saveAsTextFile on HDFS. Like I said, everything is written correctly, but at the end of the job there is this warning. I will try to compile with Hadoop 2.4. Thanks 2014-05-04 11:17 GMT-03:00 Mayur Rustagi : > You

workers disconnected from master but still kept alive

2014-05-04 Thread Cheney Sun
Hi experts, I set up a Spark cluster in standalone mode with 10 workers; the version is 0.9.1. I chose that version on the assumption that the latest version is always the most stable one. However, when I unintentionally ran a problematic job (such as configuring SPARK_HOME with a wrong p

Re: cache not working as expected for iteration?

2014-05-04 Thread Nicholas Chammas
Yes, persist/cache will cache an RDD only when an action is applied to it. On Sun, May 4, 2014 at 6:32 AM, Earthson wrote: > Thanks for the help; unpersist is exactly what I want :) > > I see that Spark will remove some cached data automatically when memory is full; > it would be much more helpful if the eviction rule

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Nicholas Chammas
Chris, To use s3distcp in this case, are you suggesting saving the RDD to local/ephemeral HDFS and then copying it up to S3 using this tool? On Sat, May 3, 2014 at 7:14 PM, Chris Fregly wrote: > not sure if this directly addresses your issue, Peter, but it's worth > mentioning a handy AWS EMR u

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Peter
Thank you Chris. I am familiar with s3distcp; I'm trying to replicate some of that functionality and combine it with my log post-processing in one step instead of yet another step. On Saturday, May 3, 2014 4:15 PM, Chris Fregly wrote: not sure if this directly addresses your issue, Peter, but

Re: Reading multiple S3 objects, transforming, writing back one

2014-05-04 Thread Peter
Hi Patrick, I should probably explain my use case in a bit more detail. I have hundreds of thousands to millions of clients uploading events to my pipeline; these are batched periodically (every 60 seconds atm) into logs which are dumped into S3 (and uploaded into a data warehouse). I need to po
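
A minimal sketch of that kind of consolidation step, under assumptions (s3n:// URIs; the bucket names and the transform are placeholders):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local[4]", "s3-consolidate")

    // Read many small batched log objects, post-process them, and write
    // the result back as a single object by collapsing to one partition.
    val logs = sc.textFile("s3n://my-input-bucket/logs/2014-05-04/*")
    val processed = logs.filter(_.nonEmpty).map(_.toLowerCase)
    processed.coalesce(1)
      .saveAsTextFile("s3n://my-output-bucket/consolidated/2014-05-04")

    sc.stop()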

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread DB Tsai
If you only add the breeze dependency in your build.sbt project, it will not be available to all the workers. There are a couple of options: 1) use sbt assembly to package breeze into your application jar; 2) manually copy the breeze jar onto all the nodes and put it on the classpath; 3) Spark 1.0 has breez
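
A sketch of option 1, assuming the sbt-assembly plugin (the artifact and version numbers are illustrative):

    // build.sbt -- package breeze into a flat application jar
    // (requires the sbt-assembly plugin in project/plugins.sbt)
    import AssemblyKeys._

    assemblySettings

    name := "myApp"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "0.9.1" % "provided",
      "org.scalanlp" %% "breeze" % "0.7"
    )

Running sbt assembly then produces myApp-assembly.jar with breeze inside it.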

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread Yadid Ayzenberg
An additional option: 4) use SparkContext.addJar() and have the application ship your jar to all the nodes. Yadid On 5/4/14, 4:07 PM, DB Tsai wrote: If you only add the breeze dependency in your build.sbt project, it will not be available to all the workers. There are a couple of options: 1) use sbt a
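
Option 4 as a minimal sketch (the master URL and jar path are placeholders):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("spark://master:7077", "addjar-demo")
    // Ship the assembly jar (with breeze inside) to every executor.
    sc.addJar("/path/to/myApp-assembly.jar")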

spark ec2 error

2014-05-04 Thread Jeremy Freeman
Hi all, A heads up in case others hit this and are confused… This nice addition causes an error when running the spark-ec2.py deploy script from a version other than master (e.g. 0.8.0). The error occurs during launch, here: ... Creating local config

Initial job has not accepted any resources

2014-05-04 Thread pedro
I have been working on a Spark program, completed it, but have spent the past few hours trying to run it on EC2 without any luck. I am hoping I can comprehensively describe my problem and what I have done, but I am pretty stuck. My code uses the following lines to configure the SparkContext, which ar
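
For reference, a typical configuration of the kind being described, sketched under assumptions (the master URL, jar path, and settings are placeholders; the original lines are truncated above):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077")
      .setAppName("my-job")
      .setJars(Seq("/path/to/my-job-assembly.jar"))
      .set("spark.executor.memory", "4g")

    val sc = new SparkContext(conf)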

spark streaming question

2014-05-04 Thread Weide Zhang
Hi, It might be a very general question to ask here, but I'm curious to know why Spark Streaming can achieve better throughput than Storm, as claimed in the Spark Streaming paper. Does it depend on certain use cases and/or data sources? What drives the better performance in the Spark Streaming case, or in o

Re: compile spark 0.9.1 in hadoop 2.2 above exception

2014-05-04 Thread arsingh
Life-saver tip. Worked like a charm (I was getting frustrated). sbt/sbt clean did the trick. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/compile-spark-0-9-1-in-hadoop-2-2-above-exception-tp4795p5325.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: spark ec2 error

2014-05-04 Thread Patrick Wendell
Hey Jeremy, This is actually a big problem - thanks for reporting it. I'm going to revert this change until we can make sure it is backwards-compatible. - Patrick On Sun, May 4, 2014 at 2:00 PM, Jeremy Freeman wrote: > Hi all, > > A heads up in case others hit this and are confused... This nic

Re: spark ec2 error

2014-05-04 Thread Patrick Wendell
Okay, I just went ahead and fixed this to make it backwards-compatible (it was a simple fix). I launched a cluster successfully with Spark 0.8.1. Jeremy - if you could try again and let me know if there are any issues, that would be great. Thanks again for reporting this. On Sun, May 4, 2014 at 3:41

Re: spark streaming question

2014-05-04 Thread Chris Fregly
Great questions, Weide. In addition, I'd also like to hear more about how to horizontally scale a Spark Streaming cluster. I've gone through the samples (standalone mode) and read the documentation, but it's still not clear to me how to scale this puppy out under high load. I assume I add more r

Re: spark ec2 error

2014-05-04 Thread Jeremy Freeman
Cool, glad to help! I just tested with 0.8.1 and 0.9.0 and both worked perfectly, so all seems to be good. -- Jeremy -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-error-tp5323p5329.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
I compiled Spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly and fixed the S3 dependencies, but I am still getting the same error... 14/05/05 00:32:33 WARN TaskSetManager: Loss was due to org.apache.hadoop.ipc.RemoteException org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.na

unsubscribe

2014-05-04 Thread ZHANG Jun
Original message Subject: unsubscribe From: Nabeel Memon To: user@spark.apache.org Cc: unsubscribe

Error starting EC2 cluster

2014-05-04 Thread Aliaksei Litouka
I am using Spark 0.9.1. When I try to start an EC2 cluster with the spark-ec2 script, an error occurs and the following message is issued: AttributeError: 'module' object has no attribute 'check_output'. By this time, the EC2 instances are up and running, but Spark doesn't seem to be installed on th

RE: different in spark on yarn mode and standalone mode

2014-05-04 Thread Liu, Raymond
At the core, they are not that different. In standalone mode, you have the Spark master and Spark workers, which allocate the driver and executors for your Spark app. In YARN mode, the YARN resource manager and node managers do this work. Once the driver and executors have been launched, the rest of the res

Re: Initial job has not accepted any resources

2014-05-04 Thread Jeremy Freeman
Hey Pedro, From which version of Spark were you running the spark-ec2.py script? You might have run into the problem described here (http://apache-spark-user-list.1001560.n3.nabble.com/spark-ec2-error-td5323.html), which Patrick just fixed up to ensure backwards compatibility. With the bug, it w

Re: Initial job has not accepted any resources

2014-05-04 Thread pedro
Hi Jeremy, I am running from the most recent release, 0.9. I just fixed the problem, and it was indeed a matter of setting the variables correctly during deployment. Once I had the cluster I wanted running, I began to suspect that the master was not responding. So I killed a worker, then recreated it, and found it cou

Re: KafkaInputDStream mapping of partitions to tasks

2014-05-04 Thread Aries
Hi, Has anyone worked on this? Best, Aries On Mar 30, 2014, at 3:22, Nicolas Bär wrote: > Hi > > Is there any workaround to this problem? > > I'm trying to implement a KafkaReceiver using the SimpleConsumer API [1] of > Kafka and handle the partition assignment manually. The easiest setup in this

Re: workers disconnected from master but still kept alive

2014-05-04 Thread Cheney Sun
No reply; maybe I didn't make it clear, so let me add more information. When the worker node attempts to launch a problematic executor, not only does the executor fail to launch, but the worker is also removed by the master. The worker will try to re-register with the master but is rejected. In the master log, the

Re: Lease Exception hadoop 2.4

2014-05-04 Thread Andre Kuhnen
I think I forgot to rsync the slaves with the newly compiled jar; I will give it a try as soon as possible. On 04/05/2014 21:35, "Andre Kuhnen" wrote: > I compiled Spark with SPARK_HADOOP_VERSION=2.4.0 sbt/sbt assembly and fixed > the S3 dependencies, but I am still getting the same error... > 14

Re: "sbt/sbt run" command returns a JVM problem

2014-05-04 Thread phoenix bai
The total memory of your machine is 2GB, right? Then how much memory is left free? Wouldn't Ubuntu take up quite a big portion of the 2GB? Just a guess! On Sat, May 3, 2014 at 8:15 PM, Carter wrote: > Hi, thanks for all your help. > I tried your setting in the sbt file, but the problem is still there

Re: master attempted to re-register the worker and then took all workers as unregistered

2014-05-04 Thread Cheney Sun
Hi Nan, Have you found a way to fix this issue? I now run into the same problem with version 0.9.1. Thanks, Cheney -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/master-attempted-to-re-register-the-worker-and-then-took-all-workers-as-unregistered-tp553p53

Re: ClassNotFoundException

2014-05-04 Thread pedro
I just ran into the same problem. I will respond if I find out how to fix it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-tp5182p5342.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Initial job has not accepted any resources

2014-05-04 Thread pedro
Since it appears breeze is going to be included by default in Spark 1.0, and I ran into the issue here: http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-td5182.html and it seems the issues I had were recently introduced, I am cloning Spark and checking out the 1.0

what's the meaning of primitive in "gradient descent primitive"?

2014-05-04 Thread phoenix bai
Hi all, I am reading the Spark docs (http://spark.apache.org/docs/0.9.0/mllib-guide.html#gradient-descent-primitive). I am trying to translate the doc into Chinese, and it talks about the gradient descent primitive, but I am not quite sure what it means by "primitive". I know gradient desce
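
For what it's worth, "primitive" there just means the low-level reusable building block itself: the batch gradient descent update rule, which in standard notation (not a quotation from the doc) is

    w_{t+1} = w_t - \alpha_t \sum_{i=1}^{n} \nabla_w \, \ell(w_t; x_i, y_i)

where \alpha_t is the step size and \ell the per-example loss; MLlib exposes this routine directly so that other algorithms can be built on top of it.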

Re: spark 0.9.1: ClassNotFoundException

2014-05-04 Thread phoenix bai
Check whether the jar file that includes your example code is under examples/target/scala-2.10/. On Sat, May 3, 2014 at 5:58 AM, SK wrote: > I am using Spark 0.9.1 in standalone mode. In the > SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created my > directory called "mycode", in w

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
Hi Jacob, Taking both concerns into account, I'm actually thinking about using a separate subnet to isolate the Spark workers, but I need to look into how to bind the process to the correct interface first. This may require some code change. A separate subnet doesn't limit itself to a port range, so

Re: pySpark memory usage

2014-05-04 Thread Aaron Davidson
I'd just like to update this thread by pointing to the PR based on our initial design: https://github.com/apache/spark/pull/640 This solution is a little more general and avoids catching IOException altogether. Long live exception propagation! On Mon, Apr 28, 2014 at 1:28 PM, Patrick Wendell wr

Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
A new broadcast object is generated for every iteration step; it may eat up the memory and make persist fail. The broadcast object should not be removed, because the RDD may be recomputed. And I am trying to prevent recomputing the RDD, which requires the old broadcasts to release some memory. I've tried to set "spar
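
A sketch of the pattern in question, assuming Spark 1.0's Broadcast.unpersist (the names and the update step are placeholders):

    import org.apache.spark.SparkContext

    val sc = new SparkContext("local[2]", "broadcast-iter")
    val data = sc.parallelize(1 to 100000).cache()
    data.count() // materialize once so later steps need not recompute

    var model = Array.fill(10)(0.0)
    for (_ <- 1 to 10) {
      val bc = sc.broadcast(model)           // a new broadcast every step
      model = data.map(x => x * bc.value.length.toDouble)
                  .take(10)
      bc.unpersist() // Spark 1.0: drop the stale broadcast's blocks
    }
    sc.stop()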

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
Code here. Finally, iteration still runs into recomputing... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cache-issue-for-iteration-with-broadcast-tp5350

Any ideas on an architecture based on Spark + Spray + Akka

2014-05-04 Thread ZhangYi
Hi all, Currently our project is planning to adopt Spark as its big data platform. For the client side, we have decided to expose a REST API based on Spray. Our domain is the communications field, doing data analysis and statistics for 3G and 4G users. Now, Spark + Spray is brand new for us

Re: Any ideas on an architecture based on Spark + Spray + Akka

2014-05-04 Thread 诺铁
Hello ZhangYi, I found Ooyala's open-sourced spark-jobserver: https://github.com/ooyala/spark-jobserver It seems they are also using Akka, Spray, and Spark, so it may be helpful for you. On Mon, May 5, 2014 at 11:37 AM, ZhangYi wrote: > Hi all, > > Currently our project is planning to adopt Spar

Re: Cache issue for iteration with broadcast

2014-05-04 Thread Earthson
I tried using serialization instead of broadcast, and my program exited with an error (beyond physical memory limits). Can the large object not be released by GC because it is needed for recomputing? So what is the recommended way to solve this problem? -- View this message in context: http://apache

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread wxhsdp
Hi DB, I think it's something related to "sbt publishLocal". If I remove the breeze dependency from my sbt file, breeze cannot be found: [error] /home/wxhsdp/spark/example/test/src/main/scala/test.scala:5: not found: object breeze [error] import breeze.linalg._ [error]        ^ Here's my sbt file:

spark streaming kafka output

2014-05-04 Thread Weide Zhang
Hi, Is there any code to implement a Kafka output for Spark Streaming? My use case is that all the output needs to be dumped back to the Kafka cluster again after the data is processed. What would be the guideline for implementing such a function? I heard foreachRDD will create one instance of the producer per batch? If
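
One common shape for this, sketched under assumptions (Kafka 0.8 Scala producer API; the broker and topic names are placeholders). The producer is created inside foreachPartition, so you get one producer per partition per batch on the executors, rather than trying to ship one from the driver:

    import java.util.Properties
    import kafka.producer.{KeyedMessage, Producer, ProducerConfig}
    import org.apache.spark.streaming.dstream.DStream

    def writeToKafka(out: DStream[String]): Unit = {
      out.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          val props = new Properties()
          props.put("metadata.broker.list", "broker1:9092,broker2:9092")
          props.put("serializer.class", "kafka.serializer.StringEncoder")
          val producer = new Producer[String, String](new ProducerConfig(props))
          records.foreach { r =>
            producer.send(new KeyedMessage[String, String]("output-topic", r))
          }
          producer.close()
        }
      }
    }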

Re: NoSuchMethodError: breeze.linalg.DenseMatrix

2014-05-04 Thread DB Tsai
Since the breeze jar is brought into Spark by the mllib package, you may want to add mllib as your dependency in Spark 1.0. To bring it in from your application yourself, you can either use sbt assembly in your build project to generate a flat myApp-assembly.jar which contains the breeze jar, or use spark add
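
The first suggestion as a one-line sketch (the version number is illustrative):

    // build.sbt -- pull breeze in transitively via spark-mllib (Spark 1.0+)
    libraryDependencies += "org.apache.spark" %% "spark-mllib" % "1.0.0"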