Task recalculation or total failure due to fetch error

2014-03-16 Thread guojc
Hi there, in our experiments with Spark, we found that the same Spark application has large variance in execution time and sometimes even fails totally. In the logs, we find this is usually due to task resubmission after a fetch failure, with log output like the following: 14/03/16 16:40:38 WARN TaskSetManager: Lost TID

Contributing pyspark ports

2014-03-16 Thread Krakna H
Is there any documentation on contributing PySpark ports of additions to Spark? I only see guidelines on Scala contributions (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark). Specifically, I'm interested in porting mllib and graphx contributions.

[Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Egor Pahomov
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says I need to write here if I want my project to be added there. At Yandex (www.yandex.com) we are now using Spark for the Yandex Islands project ( http://www.searchenginejournal.com/yandex-islands-markup-issues-implementation/71891/)

Separating classloader management from SparkContexts

2014-03-16 Thread Punya Biswal
Hi all, I'm trying to use Spark to support users who are interactively refining the code that processes their data. As a concrete example, I might create an RDD[String] and then write several versions of a function to map over the RDD until I'm satisfied with the transformation. Right now, once I

Maximum memory limits

2014-03-16 Thread Debasish Das
Hi, I gave my Spark job 16 GB of memory and it is running on 8 executors. The job needs more memory due to ALS requirements (a 20M x 1M matrix). Each node has 96 GB of memory and I am using 16 GB of it. I want to increase the memory but I am not sure what the right way to do that is...
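A minimal sketch of one way to raise per-executor memory in 0.9-era Spark, assuming a standalone cluster; the master URL, app name, and the 32g figure below are all illustrative, and the value must fit within each worker's available memory:

    import org.apache.spark.{SparkConf, SparkContext}

    // Request 32 GB per executor via SparkConf (0.9-era API).
    val conf = new SparkConf()
      .setMaster("spark://master:7077")     // illustrative master URL
      .setAppName("ALSJob")                 // illustrative app name
      .set("spark.executor.memory", "32g")
    val sc = new SparkContext(conf)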

Re: Maximum memory limits

2014-03-16 Thread Sean Owen
Are you using HEAD or 0.9.0? I know there was a memory issue fixed a few weeks ago that made ALS need a lot more memory than necessary. https://github.com/apache/incubator-spark/pull/629 Try the latest code. -- Sean Owen | Director, Data Science | London On Sun, Mar 16, 2014 at 11:40 AM, Debas

Re: Maximum memory limits

2014-03-16 Thread Debasish Das
Thanks Sean... let me get the latest code... do you know which PR it was? But will the executors run fine with, say, 32 GB or 64 GB of memory? Doesn't the JVM show issues when the max memory goes beyond a certain limit? Also, the failure is due to GC limits from jblas... and I was thinking that jblas

How to kill a spark app ?

2014-03-16 Thread Debasish Das
Are these the right options: 1. If there is a Spark script, just do a Ctrl-C from spark-shell and the job will be killed properly. 2. For a Spark application, Ctrl-C will also kill the job properly on the cluster. Somehow the Ctrl-C option did not work for us... A similar option works fine for scald

Re: Maximum memory limits

2014-03-16 Thread Sean Owen
You should simply use a snapshot built from HEAD of github.com/apache/spark if you can. The key change is in MLlib, and with any luck you can just replace that bit. See the PR I referenced. Sure, with enough memory you can get it to run even with the memory issue, but it could be hundreds of GB at yo

Re: How to kill a spark app ?

2014-03-16 Thread Mayur Rustagi
There is no good way to kill jobs in Spark yet. The closest is cancelAllJobs & cancelJobGroup on the SparkContext. I have had bugs using both. I am trying to test them out; typically you would start a different thread and call these functions from it when you wish to cancel a job. Regards Mayur Mayur Rus
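A minimal sketch of the pattern described above, assuming an existing SparkContext sc and the setJobGroup/cancelJobGroup calls as present in 0.9; the group id and the work inside the future are illustrative:

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    // Run the job under a named group on a separate thread...
    Future {
      sc.setJobGroup("my-group", "cancellable job")   // illustrative id and description
      sc.parallelize(1 to 1000000).map(_ * 2).count()
    }

    // ...then cancel that group (or everything) from another thread.
    sc.cancelJobGroup("my-group")
    // or: sc.cancelAllJobs()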

Re: possible bug in Spark's ALS implementation...

2014-03-16 Thread Matei Zaharia
On Mar 14, 2014, at 5:52 PM, Michael Allman wrote: > I also found that the product and user RDDs were being rebuilt many times > over in my tests, even for tiny data sets. By persisting the RDD returned > from updateFeatures() I was able to avoid a raft of duplicate computations. > Is there a rea
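A generic sketch of the caching pattern Michael describes: persist each iteration's result so later iterations do not recompute the full lineage. The updateFeatures parameter here stands in for any iterative update step, not ALS's actual internals:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    def iterate(initial: RDD[(Int, Array[Double])],
                updateFeatures: RDD[(Int, Array[Double])] => RDD[(Int, Array[Double])],
                iterations: Int): RDD[(Int, Array[Double])] = {
      var current = initial.persist(StorageLevel.MEMORY_AND_DISK)
      for (_ <- 1 to iterations) {
        val next = updateFeatures(current).persist(StorageLevel.MEMORY_AND_DISK)
        next.count()          // materialize before dropping the parent
        current.unpersist()
        current = next
      }
      current
    }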

Re: Contributing pyspark ports

2014-03-16 Thread Matei Zaharia
Unfortunately there isn’t a guide, but you can read a PySpark internals overview at https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals. This would be the thing to follow. In terms of MLlib and GraphX, I think MLlib will be easier to expose at first — it’s designed to be easy t

Re: How to kill a spark app ?

2014-03-16 Thread Debasish Das
From http://spark.incubator.apache.org/docs/latest/spark-standalone.html#launching-applications-inside-the-cluster: does ./bin/spark-class org.apache.spark.deploy.Client kill not work / have bugs? On Sun, Mar 16, 2014 at 1:17 PM, Mayur Rustagi wrote: > There is no good way to kill jobs in Spar
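For reference, the standalone-mode docs of the time give that kill command as taking a master URL and a driver id; the placeholders below are illustrative:

    ./bin/spark-class org.apache.spark.deploy.Client kill spark://master:7077 <driverId>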

Re: How to kill a spark app ?

2014-03-16 Thread Mayur Rustagi
This is meant to kill the whole driver hosted inside the Master (a new feature as of 0.9.0). I assume you are trying to kill a job/task/stage inside Spark rather than the whole application. Regards Mayur Mayur Rustagi Ph: +1 (760) 203 3257 http://www.sigmoidanalytics.com @mayur_rustagi

Re: How to kill a spark app ?

2014-03-16 Thread Debasish Das
Thanks Mayur... I need both... but to start with, even an application killer will help a lot... Somehow that command did not work for me... I will try it again from the Spark main folder.. On Sun, Mar 16, 2014 at 1:43 PM, Mayur Rustagi wrote: > This is meant to kill the whole driver hosted inside t

Re: Maximum memory limits

2014-03-16 Thread Patrick Wendell
Sean - was this merged into the 0.9 branch as well? (It seems so based on the message from rxin.) If so, it might make sense to try the head of branch-0.9 as well, unless there are *also* other changes relevant to this in master. - Patrick On Sun, Mar 16, 2014 at 12:24 PM, Sean Owen wrote: > Y

Re: Maximum memory limits

2014-03-16 Thread Sean Owen
Good point -- there's been another optimization for ALS in HEAD (https://github.com/apache/spark/pull/131), but yes, the better place to pick up just the essential changes since 0.9.0, including the previous one, is the 0.9 branch. -- Sean Owen | Director, Data Science | London On Sun, Mar 16, 2014 at

Re: How to kill a spark app ?

2014-03-16 Thread Mayur Rustagi
Are you embedding your driver inside the cluster? If not, then that command will not kill the driver. You can simply kill the application by killing the Scala application. So if it's the Spark shell, simply killing the shell will disconnect the application from the cluster. If the driver is embedded

Running Spark on a single machine

2014-03-16 Thread goi cto
Hi, I know it is probably not the purpose of Spark, but the syntax is easy and cool... I need to run some Spark-like code in memory on a single machine. Any pointers on how to optimize it to run on only one machine? -- Eran | CTO

Machine Learning on streaming data

2014-03-16 Thread Nasir Khan
Hi, I'm working on a project in which I have to take streaming URLs, filter them, and classify them as benign or suspicious. Now, machine learning and streaming are two separate things in Apache Spark (AFAIK). My question is: can we apply online machine learning algorithms to streams? I am at a beginner leve
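One common workaround, sketched below under the assumption of a model trained offline: apply the fixed model to each batch of a DStream. The score function, host, port, and 0.5 threshold are all hypothetical stand-ins; 0.9-era MLlib has no built-in online learners for streams.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical stand-in for a model trained offline: score a URL.
    def score(url: String): Double = if (url.contains("suspect")) 1.0 else 0.0

    val conf = new SparkConf().setAppName("UrlClassifier").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // URLs arrive one per line on a socket; host and port are illustrative.
    val urls = ssc.socketTextStream("localhost", 9999)

    // Flag URLs the model scores above the assumed 0.5 threshold, each batch.
    urls.filter(url => score(url) > 0.5).print()

    ssc.start()
    ssc.awaitTermination()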

Re: slf4j and log4j loop

2014-03-16 Thread Patrick Wendell
This is not released yet, but we're planning to cut a 0.9.1 release very soon (most likely this week). In the meantime you'll have to check out branch-0.9 of Spark and publish it locally, then depend on the snapshot version. Or just wait it out... On Fri, Mar 14, 2014 at 2:01 PM, Adrian Mocanu wr
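A rough sequence for that, assuming the sbt build that branch-0.9 ships with (the exact snapshot version comes from the branch's build files):

    git clone https://github.com/apache/spark.git
    cd spark
    git checkout branch-0.9
    sbt/sbt publish-local    # publishes snapshot artifacts to the local Ivy repo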

Re: Running Spark on a single machine

2014-03-16 Thread Nick Pentreath
Please follow the instructions at http://spark.apache.org/docs/latest/index.html and http://spark.apache.org/docs/latest/quick-start.html to get started on a local machine. — Sent from Mailbox for iPhone On Sun, Mar 16, 2014 at 11:39 PM, goi cto wrote: > Hi, > I know it is probably not th

Re: How to kill a spark app ?

2014-03-16 Thread Matei Zaharia
If it’s a driver on the cluster, please open a JIRA issue about this — this kill command is indeed intended to work. Matei On Mar 16, 2014, at 2:35 PM, Mayur Rustagi wrote: > Are you embedding your driver inside the cluster? > If not then that command will not kill the driver. You can simply k

Re: [Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Matei Zaharia
Thanks, I’ve added you: https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark. Let me know if you want to change any wording. Matei On Mar 16, 2014, at 6:48 AM, Egor Pahomov wrote: > Hi, page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark > says I need write

Re: Running Spark on a single machine

2014-03-16 Thread goi cto
Sorry, I did not explain myself correctly. I know how to run Spark; the question is how to instruct Spark to do all of the computation on a single machine. I was trying to convert the code to plain Scala, but I miss some of Spark's methods, like reduceByKey. Eran On Mon, Mar 17, 2014 at 7:25 AM, Nic

Re: Running Spark on a single machine

2014-03-16 Thread Ewen Cheslack-Postava
Those pages include instructions for running locally: "Note that all of the sample programs take a parameter specifying the cluster URL to connect to. This can be a URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should st
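To make that concrete: a minimal sketch of running entirely in-process with the local[N] master while keeping pair-RDD methods such as reduceByKey, which in pre-1.0 Spark come in via the SparkContext._ implicits; the thread count and data are illustrative:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._   // pair-RDD implicits (reduceByKey etc., pre-1.0)

    val sc = new SparkContext("local[4]", "SingleMachineApp")   // 4 local threads
    val counts = sc.parallelize(Seq("a", "b", "a"))
                   .map(word => (word, 1))
                   .reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()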