Maximum memory limits

2014-03-16 Thread Debasish Das
Hi, I gave my Spark job 16 GB of memory and it is running on 8 executors. The job needs more memory due to ALS requirements (20M x 1M matrix). Each node has 96 GB of memory and I am using only 16 GB of it. I want to increase the memory but I am not sure what the right way to do that is...
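
A rough way to reason about the numbers in this question is sketched below. It is only back-of-the-envelope arithmetic, not a statement of how Spark's ALS actually allocates memory. The rank of 50, the dense 8-byte doubles, and the one-executor-per-node layout are all assumptions that do not appear in the original message. Executor heap size itself is controlled by Spark's `spark.executor.memory` property.

```python
# Back-of-the-envelope sizing sketch. The rank, the dense-double storage,
# and the one-executor-per-node layout are assumptions for illustration.
GIB = 1024 ** 3


def factor_bytes(rows, cols, rank, bytes_per_value=8):
    """Bytes needed to hold both ALS factor matrices as dense values."""
    return (rows + cols) * rank * bytes_per_value


def headroom_gb(node_gb, executor_gb):
    """Memory left on a node after one executor's allocation."""
    return node_gb - executor_gb


if __name__ == "__main__":
    rows, cols = 20_000_000, 1_000_000  # matrix shape from the question
    rank = 50                           # hypothetical; not given in the thread
    total = factor_bytes(rows, cols, rank)
    print(f"factor matrices: {total / GIB:.2f} GiB in total")
    print(f"unused per node: {headroom_gb(96, 16)} GB")
```

The factor matrices are only part of the picture: the ratings data, shuffle buffers, and JVM overhead also compete for the same heap, so the estimate is a lower bound at best.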

Github reviews now going to separate reviews@ mailing list

2014-03-16 Thread Patrick Wendell
Hey All, We've created a new list called revi...@spark.apache.org which will contain the contents from the github pull requests and comments. Note that these e-mails will no longer appear on the dev list. Thanks to Apache Infra for helping us set this up. To subscribe to this e-mail: reviews-sub

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37764448 Unless you are a spark developer, including at Yahoo, the person building the assembly jar is not the same as the person using spark : so depending on assembled jar contai

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37764131 Build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feat

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37764130 Build triggered.

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37764078 Merged build finished.

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37764079 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13202/

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-16 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/149

[GitHub] spark pull request: [SPARK-1132] Persisting Web UI through refacto...

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/42#issuecomment-37763540 Jenkins, test this please.

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/159#discussion_r10640471 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -533,8 +575,11 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: Spark 615 map partitions with index callable f...

2014-03-16 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/16#discussion_r10640426 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala --- @@ -72,11 +72,12 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends S

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/159#discussion_r10640410 --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala --- @@ -298,6 +298,94 @@ class TaskSetManagerSuite extends FunSuite wi

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/159#discussion_r10640396 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -533,8 +575,11 @@ private[spark] class TaskSetManager(

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/159#discussion_r10640377 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -228,12 +239,18 @@ private[spark] class TaskSetManager( * Th

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37762626 Thanks I've merged this.

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37762544 Hey @baishuo I'd separately try to debug why SPARK_JAVA_OPTS isn't working. In general we probably don't want to hard code debugging options like this in the launcher.

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37762134 Merged build started.

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37762133 Merged build triggered.

[GitHub] spark pull request: SPARK-1255: Allow user to pass Serializer obje...

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/149#issuecomment-37762064 LGTM

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37761904 Jenkins, test this please (?)

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-16 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37761875 @sryza when a user builds an application assembly jar, they are allowed to bundle their own log4j.properties file in the jar. Is this not working for you on YARN? Spark's

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/159#issuecomment-37761392 jenkins test this please

[GitHub] spark pull request: Bugfixes/improvements to scheduler

2014-03-16 Thread mridulm
GitHub user mridulm opened a pull request: https://github.com/apache/spark/pull/159 Bugfixes/improvements to scheduler. Moves PR #517 from apache-incubator-spark to apache-spark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37761089 That is weird - you can see the use of SPARK_JAVA_OPTS just a few lines above in the patch you submitted.

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/158#issuecomment-37760378 Merged build finished.

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/158#issuecomment-37760380 One or more automated tests failed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13201/

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/158#issuecomment-37760327 Merged build triggered.

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/158#issuecomment-37760328 Merged build started.

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread baishuo
Github user baishuo commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37760047 Thank you for mridulm's update. I had set SPARK_JAVA_OPTS on one worker to "-Xdebug -Xrunjdwp:transport=dt_socket,address=18000,server=y,suspend=y" and tried to do a

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread RongGu
Github user RongGu commented on the pull request: https://github.com/apache/spark/pull/158#issuecomment-37759232 The PR has been moved here, and the remaining commits will be pushed here as well.

[GitHub] spark pull request: "Adding an option to persist Spark RDD blocks ...

2014-03-16 Thread RongGu
GitHub user RongGu opened a pull request: https://github.com/apache/spark/pull/158 "Adding an option to persist Spark RDD blocks into Tachyon." Moves PR #468 from apache-incubator-spark to apache-spark. You can merg

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37758977 This can be done with SPARK_JAVA_OPTS set to Java debug options. That goes to the master and executors. Practically, particularly in multi-tenant deployments this n

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/157#issuecomment-37758897 Can one of the admins verify this patch?

[GitHub] spark pull request: Update CommandUtils.scala

2014-03-16 Thread baishuo
GitHub user baishuo opened a pull request: https://github.com/apache/spark/pull/157 Update CommandUtils.scala. Enable the user to do remote debugging on the ExecutorRunner process. We need one flag to enable this function, spark.excutor.debug, and another flag, spark.excutor.debug.port

[GitHub] spark pull request: [SPARK-1259] Make RDD locally iterable

2014-03-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/156#issuecomment-37756097 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-1259] Make RDD locally iterable

2014-03-16 Thread epahomov
GitHub user epahomov opened a pull request: https://github.com/apache/spark/pull/156 [SPARK-1259] Make RDD locally iterable You can merge this pull request into a Git repository by running: $ git pull https://github.com/epahomov/spark SPARK-1259 Alternatively you can review a

[GitHub] spark pull request: [SPARK-1259] Make RDD locally iterable

2014-03-16 Thread epahomov
Github user epahomov closed the pull request at: https://github.com/apache/spark/pull/155

[GitHub] spark pull request: [SPARK-1259] Make RDD locally iterable

2014-03-16 Thread epahomov
GitHub user epahomov opened a pull request: https://github.com/apache/spark/pull/155 [SPARK-1259] Make RDD locally iterable You can merge this pull request into a Git repository by running: $ git pull https://github.com/epahomov/spark SPARK-914 Alternatively you can review an

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37755416 There is a user exposed option to configure log4j when run in yarn - which is shipped as part of the job if specified. On Sun, Mar 16, 2014 at 2:25 AM, San

[GitHub] spark pull request: SPARK-1252. On YARN, use container-log4j.prope...

2014-03-16 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/148#issuecomment-37752740 Currently, Spark doesn't ship a log4j.properties. It uses the log4j.properties that comes from Hadoop. This log4j.properties is meant for Hadoop services, not YARN contain

Ping on SPARK-1177

2014-03-16 Thread Egor Pahomov
The Spark documentation and code help you run your application from the shell. At my company that is not convenient: we launch cluster tasks from code in our web service. It took me a lot of time to move as much configuration into code as I could, because configuration at process start is quite hard in our re

[GitHub] spark pull request: SPARK-1251 Support for optimizing and executin...

2014-03-16 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/146#issuecomment-37751624 Hey Michael, I really like the docs and API for this! I tried this out in spark-shell though and saw a few errors: * The built-in SQL seems to be case-sensitive

[GitHub] spark pull request: remove staging dir when app quiting for yarn-c...

2014-03-16 Thread mridulm
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/154#issuecomment-37751154 jenkins test this please

[GitHub] spark pull request: fix compile error for hadoop CDH 4.4+

2014-03-16 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/151#discussion_r10638435 --- Diff: core/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandlerMacro.scala --- @@ -0,0 +1,46 @@ +/* + * Licensed to the Apache Softwa