Re: Spark Mesos Dispatcher

2015-07-19 Thread Jerry Lam
I have only used client mode with both the 1.3 and 1.4 versions on Mesos. I skimmed through https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala. I would actually backport the Cluster Mode feature. Sorry, I don't have an answer for this. On

Re: Spark Mesos Dispatcher

2015-07-19 Thread Jerry Lam
Yes. Sent from my iPhone > On 19 Jul, 2015, at 10:52 pm, "Jahagirdar, Madhu" > wrote: > > All, > > Can we run different versions of Spark using the same Mesos Dispatcher? For > example, can we run drivers with Spark 1.3 and Spark 1.4 at the same time? > > Regards, > Madhu Jahagirdar > > Th

Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Tathagata Das
The PR to fix this is out. https://github.com/apache/spark/pull/7519 On Sun, Jul 19, 2015 at 6:41 PM, Tathagata Das wrote: > I am taking care of this right now. > > On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell > wrote: > >> I think we should just revert this patch on all affected branches.

Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Tathagata Das
I am taking care of this right now. On Sun, Jul 19, 2015 at 6:08 PM, Patrick Wendell wrote: > I think we should just revert this patch on all affected branches. No > reason to leave the builds broken until a fix is in place. > > - Patrick > > On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen wrote: >

What is the reason there is no out of the box sortByValue API?

2015-07-19 Thread suyog choudhari

Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Patrick Wendell
I think we should just revert this patch on all affected branches. No reason to leave the builds broken until a fix is in place. - Patrick On Sun, Jul 19, 2015 at 6:03 PM, Josh Rosen wrote: > Yep, I emailed TD about it; I think that we may need to make a change to the > pull request builder to f

Re: KinesisStreamSuite failing in master branch

2015-07-19 Thread Josh Rosen
Yep, I emailed TD about it; I think that we may need to make a change to the pull request builder to fix this. Pending that, we could just revert the commit that added this. On Sun, Jul 19, 2015 at 5:32 PM, Ted Yu wrote: > Hi, > I noticed that KinesisStreamSuite fails for both hadoop profiles i

KinesisStreamSuite failing in master branch

2015-07-19 Thread Ted Yu
Hi, I noticed that KinesisStreamSuite fails for both hadoop profiles in master Jenkins builds. From https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-with-YARN/HADOOP_PROFILE=hadoop-2.4,label=centos/3011/console : KinesisStreamSuite: *** RUN ABORTED *** java.lang.AssertionError: asser

Re: Compact RDD representation

2015-07-19 Thread Сергей Лихоман
Hi Juan, That's exactly what I meant. If we have a high load with many repetitions, it can significantly reduce RDD size and improve performance. In real use cases, applications frequently need to enrich data from a cache or an external system, so we would save time on each repetition. I will also do some

Re: Compact RDD representation

2015-07-19 Thread Juan Rodríguez Hortalá
Hi, My two cents is that this could be interesting if all RDD and pair RDD operations were lifted to work on grouped RDDs. For example, as suggested, a map on grouped RDDs would be more efficient if the original RDD had lots of duplicate entries, but for RDDs with few repetitions I guess you in

Re: Compact RDD representation

2015-07-19 Thread Sandy Ryza
In the Spark model, constructing an RDD does not mean storing all its contents in memory. Rather, an RDD is a description of a dataset that enables iterating over its contents, record by record (in parallel). The only time the full contents of an RDD are stored in memory is when a user explicitly
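Sandy's point that an RDD is a description of a dataset rather than materialized data can be illustrated with a plain-Scala analogy (a sketch using lazy iterators, not Spark itself):

```scala
// A lazy "description" of a computation: nothing runs until it is consumed,
// which mirrors how an RDD describes a dataset without holding it in memory.
var evaluations = 0
val description = (1 to 5).iterator.map { x =>
  evaluations += 1 // side effect lets us observe when work actually happens
  x * 2
}
println(evaluations) // still 0: the map has only been described, not executed

val materialized = description.toList // analogous to an action like collect()
println(evaluations) // now 5: consuming the iterator forced the computation
```

Only `materialized` holds the full contents in memory, just as an RDD's contents are only stored when a user explicitly caches or collects them.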

Re: Compact RDD representation

2015-07-19 Thread Сергей Лихоман
Sorry, maybe I am saying something completely wrong... We have a stream, and we discretize it to create an RDD. The RDD in this case will be just an array of Any. Then we apply a transformation to create a new grouped RDD, and GC should remove the original RDD from memory (if we don't persist it). Will we have a GC step in

Re: Compact RDD representation

2015-07-19 Thread Sandy Ryza
The user gets to choose what they want to reside in memory. If they call rdd.cache() on the original RDD, it will be in memory. If they call rdd.cache() on the compact RDD, it will be in memory. If cache() is called on both, they'll both be in memory. -Sandy On Sun, Jul 19, 2015 at 11:09 AM, С

Re: Compact RDD representation

2015-07-19 Thread Сергей Лихоман
Thanks for the answer! Could you please answer one more question? Will we have both the original RDD and the grouped RDD in memory at the same time? 2015-07-19 21:04 GMT+03:00 Sandy Ryza : > Edit: the first line should read: > > val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _) > > On Sun, Jul 19, 2015 at

Re: Compact RDD representation

2015-07-19 Thread Sandy Ryza
This functionality already basically exists in Spark. To create the "grouped RDD", one can run: val groupedRdd = rdd.reduceByKey(_ + _) To get it back into the original form: groupedRdd.flatMap(x => List.fill(x._1)(x._2)) -Sandy On Sun, Jul 19, 2015 at 10:40 AM, Сергей Лихоман wr

Re: Compact RDD representation

2015-07-19 Thread Sandy Ryza
Edit: the first line should read: val groupedRdd = rdd.map((_, 1)).reduceByKey(_ + _) On Sun, Jul 19, 2015 at 11:02 AM, Sandy Ryza wrote: > This functionality already basically exists in Spark. To create the > "grouped RDD", one can run: > > val groupedRdd = rdd.reduceByKey(_ + _) > > To g
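Sandy's two snippets can be combined into a runnable round trip. The sketch below uses plain Scala collections as a stand-in for an RDD (so it runs without a Spark cluster); note that restoring the original entries means emitting `count` copies of the value, where the count is the second element of each pair:

```scala
// Stand-in for an RDD of repeated values.
val rdd = Seq("Spark", "Spark", "Spark", "Hadoop")

// Collections equivalent of rdd.map((_, 1)).reduceByKey(_ + _):
// pair each value with 1, then sum the counts per key.
val groupedRdd: Map[String, Int] =
  rdd.map((_, 1)).groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2).sum) }
// groupedRdd == Map("Spark" -> 3, "Hadoop" -> 1)

// Back to the original form: emit `count` copies of each value.
val restored: Seq[String] =
  groupedRdd.toSeq.flatMap { case (value, count) => Seq.fill(count)(value) }
```

The restored sequence contains the same entries as the original (order aside), which is all the flatMap in the thread is meant to guarantee.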

Compact RDD representation

2015-07-19 Thread Сергей Лихоман
Hi, I am looking for a suitable issue for a Master's degree project (it sounds like scalability problems and improvements for Spark Streaming), and it seems like the introduction of a grouped RDD (for example: don't store "Spark", "Spark", "Spark"; instead store ("Spark", 3)) can: 1. Reduce memory needed for RDD (
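The proposed compact encoding can be sketched with plain Scala collections (a hypothetical illustration, not Spark code): instead of storing n copies of a value, store one (value, count) pair per distinct value.

```scala
// Hypothetical compact encoding: one (value, count) pair per distinct value.
val raw = Seq("Spark", "Spark", "Spark", "Storm", "Storm")
val compact: Map[String, Int] =
  raw.groupBy(identity).map { case (value, copies) => (value, copies.size) }
// Five stored entries collapse to two pairs; the saving grows with the
// repetition rate, which is the memory-reduction claim of the proposal.
```

The counts still sum to the original size, so no information about multiplicities is lost.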

Re: [discuss] Removing individual commit messages from the squash commit message

2015-07-19 Thread Sandy Ryza
+1 On Sat, Jul 18, 2015 at 4:00 PM, Mridul Muralidharan wrote: > Thanks for detailing, definitely sounds better. > +1 > > Regards > Mridul > > On Saturday, July 18, 2015, Reynold Xin wrote: > >> A single commit message consisting of: >> >> 1. Pull request title (which includes JIRA number and c

Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Sean B., Thank you for giving a thorough reply. I will work with Sean O. and see what we can change to make us more in line with the stated policy. I did some research and it appears that some time between October [1] and December [2] 2006, this page was modified to include stricter policy surrou

Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Patrick Wendell
Hey Sean, One other thing I'd be okay doing is moving the main text about nightly builds to the wiki and just having a header called "Nightly builds" at the end of the downloads page that says "For developers, Spark maintains nightly builds. More information is available on the [Spark developer Wiki](

Re: Foundation policy on releases and Spark nightly builds

2015-07-19 Thread Sean Owen
I am going to make an edit to the download page on the web site to start, as that much seems uncontroversial. Proposed change: reorder sections to put the developer-oriented sections at the bottom, including the info on nightly builds: Download Spark, Link with Spark, All Releases, Spark Source C