Getting new metrics into /api/v1

2015-11-03 Thread Charles Yeh
Hello, I'm trying to get maxCores and memoryPerExecutorMB into /api/v1 for this ticket: https://issues.apache.org/jira/browse/SPARK-10565 I can't figure out which *getApplicationInfoList* is used by *ApiRootResource.scala*. It's attached in SparkUI, but SparkUI's doesn't have start / end times and

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread Timothy Chen
Fine-grained mode does reuse the same JVM, but perhaps with different placement or different cores allocated against the same total memory allocation. Tim Sent from my iPhone > On Nov 3, 2015, at 6:00 PM, Reynold Xin wrote: > > Soren, > > If I understand how Mesos works correctly, even the fine

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread MEETHU MATHEW
Hi, We are using Mesos fine grained mode because we can have multiple instances of Spark sharing machines, and each application gets resources dynamically allocated. Thanks & Regards, Meethu M On Wednesday, 4 November 2015 5:24 AM, Reynold Xin wrote: If you are using Spark with M

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread Reynold Xin
Soren, If I understand how Mesos works correctly, even the fine grained mode keeps the JVMs around? On Tue, Nov 3, 2015 at 4:22 PM, Soren Macbeth wrote: > we use fine-grained mode. coarse-grained mode keeps JVMs around which > often leads to OOMs, which in turn kill the entire executor, causin

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread Jerry Lam
We "used" Spark on Mesos to build an interactive data analysis platform because the interactive session could be long and might not use Spark for the entire session. It is very wasteful of resources to use the coarse-grained mode because it keeps the resources for the entire session. Therefore, fine-gr

Re: Implementation of RNN/LSTM in Spark

2015-11-03 Thread Sasaki Kai
Hi, Disha deeplearning4j seems to implement distributed RNN in its core and scaleout packages. http://deeplearning4j.org/recurrentnetwork.html https://github.com/deeplearning4j/deeplearning4j It migh

Re: Please reply if you use Mesos fine grained mode

2015-11-03 Thread Soren Macbeth
we use fine-grained mode. coarse-grained mode keeps JVMs around which often leads to OOMs, which in turn kill the entire executor, causing entire stages to be retried. In fine-grained mode, only the task fails and subsequently gets retried without taking out an entire stage or worse. On Tue, Nov 3

Please reply if you use Mesos fine grained mode

2015-11-03 Thread Reynold Xin
If you are using Spark with Mesos fine grained mode, can you please respond to this email explaining why you use it over the coarse grained mode? Thanks.

[VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-03 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 1.5.2. The vote is open until Sat Nov 7, 2015 at 00:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.5.2 [ ] -1 Do not release this package because ... The r

Re: Info about Dataset

2015-11-03 Thread Sandy Ryza
Hi Justin, The Dataset API proposal is available here: https://issues.apache.org/jira/browse/SPARK-. -Sandy On Tue, Nov 3, 2015 at 1:41 PM, Justin Uang wrote: > Hi, > > I was looking through some of the PRs slated for 1.6.0 and I noted > something called a Dataset, which looks like a new c

Info about Dataset

2015-11-03 Thread Justin Uang
Hi, I was looking through some of the PRs slated for 1.6.0 and I noted something called a Dataset, which looks like a new concept based off of the scaladoc for the class. Can anyone point me to some references/design_docs regarding the choice to introduce the new concept? I presume it is probably

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Cool, thanks for the dev insight into what parts of the codebase are worthwhile, and which are not =) On Tue, Nov 3, 2015 at 10:25 PM Reynold Xin wrote: > It is quite a bit of work. Again, I think going through the file system > API is more ideal in the long run. In the long run, I don't even th

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
It is quite a bit of work. Again, I think going through the file system API is more ideal in the long run. In the long run, I don't even think the current offheap API makes much sense, and we should consider just removing it to simplify things. On Tue, Nov 3, 2015 at 1:20 PM, Justin Uang wrote:

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Alright, we'll just stick with normal caching then. Just for future reference, how much work would it be to get it to retain the partitions in Tachyon? This is especially helpful in a multitenant situation, where many users each have their own persistent Spark contexts, but where the notebooks can

Re: Pickle Spark DataFrame

2015-11-03 Thread Justin Uang
Is the Manager a python multiprocessing manager? Why are you using parallelism on python when theoretically most of the heavy lifting is done via spark? On Wed, Oct 28, 2015 at 4:27 PM agg212 wrote: > I would just like to be able to put a Spark DataFrame in a manager.dict() > and > be able to ge

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
It is lost unfortunately (although can be recomputed automatically). On Tue, Nov 3, 2015 at 1:13 PM, Justin Uang wrote: > Thanks for your response. I was worried about #3, vs being able to use the > objects directly. #2 seems to be the dealbreaker for my use case right? > Even if it I am using

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Thanks for your response. I was worried about #3, vs being able to use the objects directly. #2 seems to be the dealbreaker for my use case, right? Even if I am using Tachyon for caching, if an executor is lost, then that partition is lost for the purposes of Spark? On Tue, Nov 3, 2015 at 5:53 P

Frozen exception while dynamically creating classes inside Spark using Javassist API

2015-11-03 Thread Rachana Srivastava
I am trying to dynamically create a new class in Spark using the Javassist API. The code seems very simple, just invoking the makeClass API on a hardcoded class name. The code works fine outside the Spark environment, but I am getting this checkNotFrozen exception when I am running the code inside Spark. Code Exce

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
I don't think there is any special handling w.r.t. Tachyon vs in-heap caching. As a matter of fact, I think the current offheap caching implementation is pretty bad, because: 1. There is no namespace sharing in offheap mode 2. Similar to 1, you cannot recover the offheap memory once Spark driver o

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Yup, but I'm wondering what happens when an executor does get removed, but when we're using tachyon. Will the cached data still be available, since we're using off-heap storage, so the data isn't stored in the executor? On Tue, Nov 3, 2015 at 4:57 PM Ryan Williams wrote: > fwiw, I think that hav

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Ryan Williams
fwiw, I think that having cached RDD partitions prevents executors from being removed under dynamic allocation by default; see SPARK-8958 . The "spark.dynamicAllocation.cachedExecutorIdleTimeout" config
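The config mentioned above controls how long an executor holding cached RDD partitions may sit idle before dynamic allocation reclaims it. A minimal, illustrative spark-defaults.conf fragment (the timeout value here is arbitrary, not a recommendation; the shuffle service is required for dynamic allocation):

```
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
spark.dynamicAllocation.cachedExecutorIdleTimeout  30min
```

By default the cached-executor timeout is effectively infinite, which is why executors with cached partitions are never removed unless it is set.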

Re: Unchecked contribution (JIRA and PR)

2015-11-03 Thread Jerry Lam
Sergio, you are not alone for sure. Check the RowSimilarity implementation [SPARK-4823]. It has been there for 6 months. It is very likely that contributions which aren't merged into the version of Spark they were developed against will never be merged, because Spark changes quite significantly from version to version if the

Re: Implementation of RNN/LSTM in Spark

2015-11-03 Thread Disha Shrivastava
Hi Julio, Can you please cite references based on the distributed implementation? On Tue, Nov 3, 2015 at 8:52 PM, Julio Antonio Soto de Vicente < ju...@esbet.es> wrote: > Hi, > Is my understanding that little research has been done yet on distributed > computation (without access to shared memor

Re: Unchecked contribution (JIRA and PR)

2015-11-03 Thread Reynold Xin
Sergio, Usually it takes a lot of effort to get something merged into Spark itself, especially for relatively new algorithms that might not have established themselves yet. I will leave it to the mllib maintainers to comment on the specifics of the individual algorithms proposed here. Just another genera

Re: Implementation of RNN/LSTM in Spark

2015-11-03 Thread Julio Antonio Soto de Vicente
Hi, It is my understanding that little research has been done yet on distributed computation (without access to shared memory) in RNNs. I also look forward to contributing in this respect. > On 03/11/2015, at 16:00, Disha Shrivastava wrote: > > I would love to work on this and ask for ideas

Re: Master build fails ?

2015-11-03 Thread Jean-Baptiste Onofré
Hi Ted, thanks for the update. The build with sbt is in progress on my box. Regards JB On 11/03/2015 03:31 PM, Ted Yu wrote: Interesting, Sbt builds were not all failing: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/ FYI On Tue, Nov 3, 2015 at 5:58 AM, Jean-Baptiste Onofré ma

Re: Implementation of RNN/LSTM in Spark

2015-11-03 Thread Disha Shrivastava
I would love to work on this and would ask for ideas on how it can be done, or suggestions of papers as a starting point. Also, I wanted to know if Spark would be an ideal platform to have a distributed implementation of RNN/LSTM. On Mon, Nov 2, 2015 at 10:52 AM, Sasaki Kai wrote: > Hi, Disha > > Ther

Re: Extracting RDD of values per key from PairRDD

2015-11-03 Thread Vivekananda Venkateswaran
Hi Deepak, AFAIK, such nested RDDs are not allowed. Thanks, -Venkat. On Tue, Nov 3, 2015 at 4:20 PM, Deepak Gopalakrishnan wrote: > Hello, > > I have a use case where I need to get *an RDD of values per key *from a > PairRDD. Below is my PairRDD. > > JavaPairRDD> classifiedSampleRdd = > samp

Re: Master build fails ?

2015-11-03 Thread Ted Yu
Interesting, Sbt builds were not all failing: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/ FYI On Tue, Nov 3, 2015 at 5:58 AM, Jean-Baptiste Onofré wrote: > Hi Jacek, > > it works fine with mvn: the problem is with sbt. > > I suspect a different reactor order in sbt compare to

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-11-03 Thread Ted Yu
Opening JIRA is fine. Thanks On Tue, Nov 3, 2015 at 4:25 AM, gus wrote: > Thanks, Ted. > The SparkLauncher test suite runs fine for me, with or without the change. > Do you agree this is a bug? If so, should I open a JIRA? > > > > -- > View this message in context: > http://apache-spark-develop

Re: Master build fails ?

2015-11-03 Thread Jean-Baptiste Onofré
Hi Jacek, it works fine with mvn: the problem is with sbt. I suspect a different reactor order in sbt compared to mvn. Regards JB On 11/03/2015 02:44 PM, Jacek Laskowski wrote: Hi, Just built the sources using the following command and it worked fine. ➜ spark git:(master) ✗ ./build/mvn -Pya

Re: Master build fails ?

2015-11-03 Thread Jacek Laskowski
Hi, Just built the sources using the following command and it worked fine. ➜ spark git:(master) ✗ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -Phive -Phive-thriftserver -DskipTests clean install ... [INFO] --

Re: Running individual test classes

2015-11-03 Thread Stefano Baghino
Thank you for the tip, I'll keep that in mind. On Tue, Nov 3, 2015 at 1:57 PM, Ted Yu wrote: > My experience is that going through tests in each module takes some time > before reaching the test specified by the wildcard. > > Some test, such as SparkLauncherSuite, would run even if not in wildca

Re: Running individual test classes

2015-11-03 Thread Ted Yu
My experience is that going through tests in each module takes some time before reaching the test specified by the wildcard. Some test, such as SparkLauncherSuite, would run even if not in wildcard. FYI > On Nov 3, 2015, at 1:24 AM, Nitin Goyal wrote: > > In maven, you might want to try fo

Re: Unchecked contribution (JIRA and PR)

2015-11-03 Thread Sean Owen
Generally speaking, the default disposition of any PR or JIRA is "won't merge" until proven otherwise. This is especially true of large, stand-alone features like a new ML algorithm. I believe the lack of traction means there is not interest in adding this to Spark and so these issues should be clo

Re: Master build fails ?

2015-11-03 Thread Jean-Baptiste Onofré
Thanks for the update, I used mvn to build but without the hive profile. Let me try mvn with the same options as you, and sbt also. I'll keep you posted. Regards JB On 11/03/2015 12:55 PM, Jeff Zhang wrote: I found it is due to SPARK-11073. Here's the command I used to build build/sbt clean co

Re: SparkLauncher#setJavaHome does not set JAVA_HOME in child process

2015-11-03 Thread gus
Thanks, Ted. The SparkLauncher test suite runs fine for me, with or without the change. Do you agree this is a bug? If so, should I open a JIRA? -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/SparkLauncher-setJavaHome-does-not-set-JAVA-HOME-in-child-p

Re: Master build fails ?

2015-11-03 Thread Saisai Shao
Yeah, I also met this problem; just curious why the Jenkins tests are OK. On Tue, Nov 3, 2015 at 7:55 PM, Jeff Zhang wrote: > I found it is due to SPARK-11073. > > Here's the command I used to build > > build/sbt clean compile -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver > -Psparkr > > On Tue, Nov 3

Re: Running individual test classes

2015-11-03 Thread Stefano Baghino
Good to know, thank you very much. :) On Tue, Nov 3, 2015 at 12:02 PM, Michael Armbrust wrote: > We support both build systems. We use maven to publish the canonical > distributions as it interoperates better with downstream consumers. Most > of the developers that I know, however, use SBT for

Re: Master build fails ?

2015-11-03 Thread Jeff Zhang
I found it is due to SPARK-11073. Here's the command I used to build build/sbt clean compile -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -Psparkr On Tue, Nov 3, 2015 at 7:52 PM, Jean-Baptiste Onofré wrote: > Hi Jeff, > > it works for me (with skipping the tests). > > Let me try again, just

Re: Master build fails ?

2015-11-03 Thread Jean-Baptiste Onofré
Hi Jeff, it works for me (with skipping the tests). Let me try again, just to be sure. Regards JB On 11/03/2015 11:50 AM, Jeff Zhang wrote: Looks like it's due to guava version conflicts, I see both guava 14.0.1 and 16.0.1 under lib_managed/bundles. Anyone meet this issue too ? [error] /User

Re: Running individual test classes

2015-11-03 Thread Michael Armbrust
We support both build systems. We use maven to publish the canonical distributions as it interoperates better with downstream consumers. Most of the developers that I know, however, use SBT for day to day development. On Tue, Nov 3, 2015 at 11:36 AM, Stefano Baghino < stefano.bagh...@radicalbit.

Master build fails ?

2015-11-03 Thread Jeff Zhang
Looks like it's due to guava version conflicts; I see both guava 14.0.1 and 16.0.1 under lib_managed/bundles. Has anyone met this issue too? [error] /Users/jzhang/github/spark_apache/core/src/main/scala/org/apache/spark/SecurityManager.scala:26: object HashCodes is not a member of package com.google
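When two guava versions show up under lib_managed like this, one generic sbt-side way to pin a single version is a dependency override in the build definition. This is a sketch of the general technique only, not the actual fix for the regression discussed in this thread (tracked as SPARK-11073), and the version number is illustrative:

```scala
// build.sbt (sketch): force a single guava version across all modules,
// overriding whatever transitive versions the dependency graph pulls in.
dependencyOverrides += "com.google.guava" % "guava" % "16.0.1"
```

Inspecting which dependency drags in each version (e.g. with an sbt dependency-graph plugin) is usually worth doing before overriding, since an override silently wins over every transitive constraint.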

Extracting RDD of values per key from PairRDD

2015-11-03 Thread Deepak Gopalakrishnan
Hello, I have a use case where I need to get *an RDD of values per key* from a PairRDD. Below is my PairRDD. JavaPairRDD> classifiedSampleRdd = sampleRDD.groupByKey(); I want a separate RDD of the vectors per double entry in the key. *I would now want an RDD of values for each key.* Which will b
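Since nested RDDs are not supported, the usual workaround for this use case is to collect the distinct keys to the driver and derive one RDD per key by filtering the pair RDD. A minimal sketch of that shape, using plain Scala collections in place of the RDD API (all names and data here are hypothetical):

```scala
// Stand-in for a pair RDD of (label, featureVector) pairs.
val samples: Seq[(Double, Seq[Double])] = Seq(
  (1.0, Seq(0.1, 0.2)),
  (2.0, Seq(0.3)),
  (1.0, Seq(0.4))
)

// Step 1: find the distinct keys
// (in Spark: sampleRDD.keys.distinct().collect() on the driver).
val keys: Seq[Double] = samples.map(_._1).distinct

// Step 2: one collection of values per key
// (in Spark: sampleRDD.filter(_._1 == k).values, one RDD per key k).
val perKey: Map[Double, Seq[Seq[Double]]] =
  keys.map(k => k -> samples.filter(_._1 == k).map(_._2)).toMap

println(perKey(1.0)) // the vectors grouped under key 1.0
```

Note that the Spark version of this launches one filter job per key, so it only scales to a modest number of distinct keys; with many keys, keeping the data as a single PairRDD and using per-key operations is usually the better design.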

Unchecked contribution (JIRA and PR)

2015-11-03 Thread Sergio Ramírez
Hello all: I developed two packages for MLlib in March. These have also been uploaded to the spark-packages repository. Associated with these packages, I created two JIRA issues and the corresponding pull requests, which are listed below: https://github.com/apache/spark/pull/5184 https://gith

Re: Running individual test classes

2015-11-03 Thread Stefano Baghino
Oh, I saw POMs and thought I was supposed to use Maven. Thank you so much for the help, I'll try it as soon as possible. On Tue, Nov 3, 2015 at 10:24 AM, Nitin Goyal wrote: > In maven, you might want to try following :- > > -DwildcardSuites=org.apache.spark.ml.ProbabilisticClassifierSuite > > On

Re: Running individual test classes

2015-11-03 Thread Nitin Goyal
In maven, you might want to try following :- -DwildcardSuites=org.apache.spark.ml.ProbabilisticClassifierSuite On Tue, Nov 3, 2015 at 2:42 PM, Michael Armbrust wrote: > In SBT: > > build/sbt "mllib/test-only *ProbabilisticClassifierSuite" > > On Tue, Nov 3, 2015 at 9:27 AM, Stefano Baghino < >

Re: Running individual test classes

2015-11-03 Thread Michael Armbrust
In SBT: build/sbt "mllib/test-only *ProbabilisticClassifierSuite" On Tue, Nov 3, 2015 at 9:27 AM, Stefano Baghino < stefano.bagh...@radicalbit.io> wrote: > Hi all, > > I'm new to contributing to Spark (and Apache projects in general); I've > started working on SPARK-7425 >

Running individual test classes

2015-11-03 Thread Stefano Baghino
Hi all, I'm new to contributing to Spark (and Apache projects in general); I've started working on SPARK-7425 and have implemented what looks like a viable solution. Now I'd like to test it, however I'm having some trouble running an individual te

Re: Ability to offer initial coefficients in ml.LogisticRegression

2015-11-03 Thread Holden Karau
That's correct :) On Mon, Nov 2, 2015 at 8:04 PM, YiZhi Liu wrote: > Hi Holden, > > Yep the issue id is correct. It seems that you're waiting for > SPARK-11136 which Jayant is working on? > > Best, > Yizhi > > 2015-11-03 11:14 GMT+08:00 Holden Karau : > > Hi YiZhi, > > > > I've been waiting on th

Anyone have a perfect solution for Spark source code compilation issues in IntelliJ?

2015-11-03 Thread canan chen
Hi folks, I often meet the spark compilation issue on intellij. It wastes me lots of time. I googled it and found someone else also meet similar issue, but seems no perfect solution for now. but still wondering anyone here has perfect solution for that. The issue happens sometimes, I don't know wh