Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
The use case for Docker images in general is that you can deploy and develop with exactly the same binary environment: same Java 8, same Scala, same Spark. This makes things repeatable. On Wed, May 25, 2016 at 8:38 PM, Matei Zaharia wrote: > Just wondering, what is the main use case for the Dock
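A minimal sketch of such an image (unofficial and purely illustrative; the base image tag, mirror URL, and Spark version are assumptions, not project artifacts):

```dockerfile
# Hypothetical Dockerfile: pin exact Java/Scala/Spark versions for repeatability.
FROM openjdk:8-jdk
# Fetch one specific Spark release so every build gets identical binaries.
RUN curl -fsSL https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz \
    | tar -xz -C /opt \
 && ln -s /opt/spark-1.6.1-bin-hadoop2.6 /opt/spark
ENV SPARK_HOME=/opt/spark PATH=$PATH:/opt/spark/bin
CMD ["spark-shell"]
```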

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Matei Zaharia
Just wondering, what is the main use case for the Docker images -- to develop apps locally or to deploy a cluster? If the image is really just a script to download a certain package name from a mirror, it may be okay to create an official one, though it does seem tricky to make it properly use t

Re: Labeling Jiras

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 3:45 PM, Reynold Xin wrote: > I think the risk is everybody starts following this, then this will be > unmanageable, given the size of the number of organizations involved. > > The two main labels that we actually use are starter + releasenotes. > > Well, if we consider th

Re: Labeling Jiras

2016-05-25 Thread Reynold Xin
I think the risk is everybody starts following this, then this will be unmanageable, given the number of organizations involved. The two main labels that we actually use are starter + releasenotes. On Wed, May 25, 2016 at 2:58 PM, Luciano Resende wrote: > > > On Wed, May 25, 2016 at

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 2:34 PM, Sean Owen wrote: > I don't think the project would bless anything but the standard > release artifacts since only those are voted on. People are free to > maintain whatever they like and even share it, as long as it's clear > it's not from the Apache project. > >

Re: Labeling Jiras

2016-05-25 Thread Sean Owen
Yeah I think using labels is fine -- just not if they're for someone's internal purpose. I don't have a problem with using meaningful labels if they're meaningful to everyone. In fact, I'd rather use labels than "umbrella" JIRAs. Labels I have removed as not useful are ones like "patch"

Re: Labeling Jiras

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 2:33 PM, Sean Owen wrote: > I don't think we generally use labels at all except "starter". I > sometimes remove labels when I'm editing a JIRA otherwise, perhaps to > make that point. I don't recall doing this recently. > We have used them for other things in the past, like to

Spark docker image - does that sound useful?

2016-05-25 Thread Marcin Tustin
Makes sense, but then let me ask a different question: if there's demand, should the project brew up its own release version in docker format? I've copied this to the user list to see if there's any demand. On Wed, May 25, 2016 at 5:34 PM, Sean Owen wrote: > I don't think the project would bles

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Sean Owen
I don't think the project would bless anything but the standard release artifacts since only those are voted on. People are free to maintain whatever they like and even share it, as long as it's clear it's not from the Apache project. On Wed, May 25, 2016 at 3:41 PM, Marcin Tustin wrote: > Ah ver

Re: Labeling Jiras

2016-05-25 Thread Sean Owen
I don't think we generally use labels at all except "starter". I sometimes remove labels when I'm editing a JIRA otherwise, perhaps to make that point. I don't recall doing this recently. However I'd say they should not be used to tag JIRAs for your internal purposes. Have you looked at things lik

Labeling Jiras

2016-05-25 Thread Luciano Resende
I recently used labels to mark a couple of JIRAs that my team and I have some interest in, so it's easier to share a query and check their status. But I noticed that these labels were removed. Are there any issues with labeling JIRAs? Any other suggestions? -- Luciano Resende http://t

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
Ah very nice. Would it be possible to have this blessed into an official image? On Wed, May 25, 2016 at 4:12 PM, Luciano Resende wrote: > > > On Wed, May 25, 2016 at 6:53 AM, Marcin Tustin > wrote: > >> Would it be useful to start baking docker images? Would anyone find that >> a boon to their

Re: feedback on dataset api explode

2016-05-25 Thread Koert Kuipers
oh yes, this was by accident, it should have gone to dev On Wed, May 25, 2016 at 4:20 PM, Reynold Xin wrote: > Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533 > > @Koert - Please keep API feedback coming. One thing - in the future, can > you send api feedbacks to the dev@

Re: feedback on dataset api explode

2016-05-25 Thread Reynold Xin
Created JIRA ticket: https://issues.apache.org/jira/browse/SPARK-15533 @Koert - Please keep API feedback coming. One thing - in the future, can you send api feedbacks to the dev@ list instead of user@? On Wed, May 25, 2016 at 1:05 PM, Cheng Lian wrote: > Agree, since they can be easily replac
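For context on the API being discussed: conceptually, `Dataset.explode` is a per-row flatMap that turns one row holding a sequence column into one output row per element, replicating the other columns. A self-contained sketch of that semantics on plain Scala collections (no Spark required; the data and names are illustrative, not from the thread):

```scala
// Each "row" pairs an id with a sequence column.
val rows = Seq(("a", Seq(1, 2)), ("b", Seq(3)))

// "Exploding" the sequence column yields one output row per element,
// carrying the other column along -- the same shape explode produces.
val exploded = rows.flatMap { case (id, xs) => xs.map(x => (id, x)) }

println(exploded)  // List((a,1), (a,2), (b,3))
```

This is also why the explode-style methods are easy to replace with `flatMap` or the `functions.explode` column function, which was the direction of the feedback.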

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Luciano Resende
On Wed, May 25, 2016 at 6:53 AM, Marcin Tustin wrote: > Would it be useful to start baking docker images? Would anyone find that a > boon to their testing? > > +1, I had done one (still based on 1.6) for some SystemML experiments, I could easily get it based on a nightly build. https://github.co

LiveListenerBus with started and stopped flags? Why both?

2016-05-25 Thread Jacek Laskowski
Hi, I'm wondering why LiveListenerBus has two AtomicBoolean flags [1]? Could it not have just one, say started? Why does Spark have to check the stopped state? [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala#L49-L51 Regards
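One plausible answer to the question above: the bus has three lifecycle states, not two, so a single `started` flag cannot distinguish "not yet started" from "stopped after running", and calls like `stop()` before `start()` should fail fast. A minimal standalone sketch of that pattern (hypothetical names; this is not the actual Spark code):

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Three states need two flags:
// (false, false) = new, (true, false) = running, (true, true) = stopped.
class Lifecycle {
  private val started = new AtomicBoolean(false)
  private val stopped = new AtomicBoolean(false)

  def start(): Unit =
    require(started.compareAndSet(false, true), "already started")

  def stop(): Unit = {
    require(started.get, "cannot stop something that was never started")
    stopped.compareAndSet(false, true)
  }

  // Work should only be accepted while running, i.e. started && !stopped.
  def isRunning: Boolean = started.get && !stopped.get
}
```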

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Reynold Xin
The maven artifacts can be found at https://repository.apache.org/content/repositories/orgapachespark-1182/ But really for people on this list, it might be better to go straight to the nightly snapshots. https://repository.apache.org/content/groups/snapshots/org/apache/spark/ On Wed, May 25, 201
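For anyone wanting to try the staged artifacts from an sbt build, the staging repository can be added as a resolver. A sketch of the build.sbt fragment (the version string is assumed to be `2.0.0-preview`):

```scala
// build.sbt fragment -- point sbt at the ASF staging repo for the preview artifacts.
resolvers += "Apache Staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1182/"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0-preview"
```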

The 7th and Largest Spark Summit is less than 2 weeks away!

2016-05-25 Thread Scott walent
With every Spark Summit, an Apache Spark Community event, increasing numbers of users and developers attend. This is the seventh Summit, and whether or not you believe that “Seven” is the world’s most popular number, we are offering a special promo code for all Apache Spark users and developers on this

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Reynold Xin
Yup I have published it to maven. Will post the link in a bit. One thing is that for developers, it might be better to use the nightly snapshot because that one probably has fewer bugs than the preview one. On Wednesday, May 25, 2016, Daniel Darabos wrote: > Awesome, thanks! It's very helpful f

Re: Cannot build master with sbt

2016-05-25 Thread Yiannis Gkoufas
Thanks so much for the workaround! On 25 May 2016 at 14:17, Nick Pentreath wrote: > I've filed https://issues.apache.org/jira/browse/SPARK-15525 > > For now, you would have to check out sbt-antlr4 at > https://github.com/ihji/sbt-antlr4/commit/23eab68b392681a7a09f6766850785afe8dfa53d > (since >

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Marcin Tustin
Would it be useful to start baking docker images? Would anyone find that a boon to their testing? On Wed, May 25, 2016 at 2:44 AM, Reynold Xin wrote: > In the past the Spark community have created preview packages (not > official releases) and used those as opportunities to ask community members

Re: Cartesian join on RDDs taking too much time

2016-05-25 Thread Max Sperlich
Cartesian joins tend to give a huge result size, and are inherently slow. If RDD B has N records then your result size will be at least N * 30 MB, since you have to replicate all the rows of A for each record in B. Assuming RDD B has 10,000 records, you can see that your cartesian join wil
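The arithmetic above can be checked directly (plain Scala; 30 MB comes from the thread, and the 10,000-record count for B is the assumption Max makes):

```scala
// Sizes quoted in the thread: A is ~30 MB; B is assumed to have 10,000 records.
val sizeOfA_MB = 30L
val recordsInB = 10000L

// Every record of B is paired with all of A, so A's bytes are
// replicated once per record of B.
val resultSizeMB = sizeOfA_MB * recordsInB
println(s"cartesian result is at least $resultSizeMB MB (~${resultSizeMB / 1024} GB)")
```

So even two small inputs produce roughly 300 GB of output, which explains the slowness without any cluster misconfiguration.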

Re: Cannot build master with sbt

2016-05-25 Thread Nick Pentreath
I've filed https://issues.apache.org/jira/browse/SPARK-15525 For now, you would have to check out sbt-antlr4 at https://github.com/ihji/sbt-antlr4/commit/23eab68b392681a7a09f6766850785afe8dfa53d (since I don't see any branches or tags in the github repo for different versions), and sbt publishLoca
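The workaround described above amounts to building the missing plugin locally. A command sketch (the commit hash is the one quoted in the thread; the repository layout is an assumption):

```shell
# Build the missing sbt-antlr4 plugin from source and publish it locally.
git clone https://github.com/ihji/sbt-antlr4.git
cd sbt-antlr4
git checkout 23eab68b392681a7a09f6766850785afe8dfa53d
sbt publishLocal   # installs into ~/.ivy2/local so Spark's build can resolve it
```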

Cannot build master with sbt

2016-05-25 Thread Yiannis Gkoufas
Hi there, I have cloned the latest version from GitHub. I am using Scala 2.10.x. When I invoke build/sbt clean package I get exceptions for the sbt-antlr library: [warn] module not found: com.simplytyped#sbt-antlr4;0.7.10 [warn] typesafe-ivy-releases: tried [warn] https://re

Re: [ANNOUNCE] Apache Spark 2.0.0-preview release

2016-05-25 Thread Daniel Darabos
Awesome, thanks! It's very helpful for preparing for the migration. Do you plan to push 2.0.0-preview to Maven too? (I for one would appreciate the convenience.) On Wed, May 25, 2016 at 8:44 AM, Reynold Xin wrote: > In the past the Spark community have created preview packages (not > official re

Cartesian join on RDDs taking too much time

2016-05-25 Thread Priya Ch
Hi All, I have two RDDs A and B, where A is of size 30 MB and B is of size 7 MB. A.cartesian(B) is taking too much time. Is there any bottleneck in the cartesian operation? I am using Spark version 1.6.0. Regards, Padma Ch