Re: Spark has a compile dependency on scalatest

2016-10-31 Thread Sean Owen
SBT and Maven resolution rules do differ. I thought SBT was generally latest-first though, which should make 3.0 take priority. Maven is more like closest-first, which means you can pretty much always override things in your own build. An exclusion is the right way to go in this case because the de
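
The two resolution strategies described above can be sketched as a toy model. This is only an illustration, not how either build tool is implemented: the `Candidate` type, both function names, and the version numbers are hypothetical. It assumes Maven's "nearest definition wins" rule (first declaration wins on a depth tie) and the Ivy "latest-revision" conflict manager that sbt uses by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One declaration of the same artifact somewhere in the dependency tree."""
    version: tuple  # e.g. (3, 0, 0); tuples compare element by element
    depth: int      # 1 = declared directly in your build, 2+ = transitive


def maven_nearest_wins(candidates):
    # Closest-first: the declaration nearest the root of the tree wins.
    # min() returns the first minimal element, so on a depth tie the
    # earliest declaration in the list is kept.
    return min(candidates, key=lambda c: c.depth)


def ivy_latest_revision(candidates):
    # Latest-first: the highest version wins, regardless of depth.
    return max(candidates, key=lambda c: c.version)


# Hypothetical scenario 1: your build asks for a newer version than the
# one pulled in transitively.
upgrade = [
    Candidate(version=(2, 2, 6), depth=2),  # transitive
    Candidate(version=(3, 0, 0), depth=1),  # declared in your build
]

# Hypothetical scenario 2: your build asks for an older version than the
# transitive one.
downgrade = [
    Candidate(version=(3, 0, 0), depth=2),  # transitive
    Candidate(version=(2, 2, 6), depth=1),  # declared in your build
]

for name, scenario in (("upgrade", upgrade), ("downgrade", downgrade)):
    print(name,
          "maven:", maven_nearest_wins(scenario).version,
          "sbt:", ivy_latest_revision(scenario).version)
```

In this model both strategies agree when the build's own declaration is also the newest one. They differ only when the direct declaration is older than a transitive one, which is the case where an explicit exclusion removes the ambiguity.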

Interesting in contributing to spark

2016-10-31 Thread Zak H
Hi, I'd like to introduce myself. My name is Zak and I'm a software engineer. I'm interested in contributing to spark as a way to learn more. I've signed up to the mailing list and hope to learn more about spark. What do you recommend I start on as my first bug? I have a working knowledge of scal

Re: Interesting in contributing to spark

2016-10-31 Thread Reynold Xin
Welcome! This is the best guide to get started: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark On Mon, Oct 31, 2016 at 5:09 AM, Zak H wrote: > Hi, > > I'd like to introduce myself. My name is Zak and I'm a software engineer. > I'm interested in contributing to spark as

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Cody Koeninger
Now that spark summit europe is over, are any committers interested in moving forward with this? https://github.com/koeninger/spark-1/blob/SIP-0/docs/spark-improvement-proposals.md Or are we going to let this discussion die on the vine? On Mon, Oct 17, 2016 at 10:05 AM, Tomasz Gawęda wrote: > M

JIRA Components for Streaming

2016-10-31 Thread Michael Armbrust
I'm planning to do a little maintenance on JIRA to hopefully improve the visibility into the progress / gaps in Structured Streaming. In particular, while we share a lot of optimization / execution logic with SQL, the set of desired features and bugs is fairly different. Proposal: - Structured

Re: JIRA Components for Streaming

2016-10-31 Thread Cody Koeninger
Makes sense to me. I do wonder if e.g. [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] is going to leave any room in the Github PR form for actual title content? On Mon, Oct 31, 2016 at 1:37 PM, Michael Armbrust wrote: > I'm planning to do a little maintenance on JIRA to hopefully improve the > visi

Re: JIRA Components for Streaming

2016-10-31 Thread Reynold Xin
Maybe just streaming or SS in GitHub? On Monday, October 31, 2016, Cody Koeninger wrote: > Makes sense to me. > > I do wonder if e.g. > > [SPARK-12345][STRUCTUREDSTREAMING][KAFKA] > > is going to leave any room in the Github PR form for actual title content? > > On Mon, Oct 31, 2016 at 1:37 PM,

Re: Issue with repartition and cache

2016-10-31 Thread ankits
Hi, Did you ever figure this one out? I'm seeing the same behavior: Calling cache() after a repartition() makes Spark cache the version of the RDD BEFORE the repartition, which means a shuffle every time it is accessed. However, calling cache before the repartition() seems to work fine, the cach

Updating Parquet dep to 1.9

2016-10-31 Thread Michael Allman
Hi All, Is anyone working on updating Spark's Parquet library dep to 1.9? If not, I can at least get started on it and publish a PR. Cheers, Michael

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Ryan Blue
I agree, we should push forward on this. I think there is enough consensus to call a vote, unless someone else thinks that there is more to discuss? rb On Mon, Oct 31, 2016 at 10:34 AM, Cody Koeninger wrote: > Now that spark summit europe is over, are any committers interested in > moving forwa

Re: Odp.: Spark Improvement Proposals

2016-10-31 Thread Marcelo Vanzin
The proposal looks OK to me. I assume, even though it's not explicitly called, that voting would happen by e-mail? A template for the proposal document (instead of just a bullet list) would also be nice, but that can be done at any time. BTW, shameless plug: I filed SPARK-18085 which I consider a

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Reynold Xin
OK I will cut a new RC tomorrow. Any other issues people have seen? On Fri, Oct 28, 2016 at 2:58 PM, Shixiong(Ryan) Zhu wrote: > -1. > > The history server is broken because of some refactoring work in > Structured Streaming: https://issues.apache.org/jira/browse/SPARK-18143 > > On Fri, Oct 28,

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-10-31 Thread Denny Lee
Oh, I forgot to note that when downloading and running against the Spark 2.0.2 without Hadoop binaries, I got a JNI error due to an exception with org / slf4j / logger (i.e. org.slf4j.Logger class is not found). On Mon, Oct 31, 2016 at 4:35 PM Reynold Xin wrote: > OK I will cut a new RC tomorr

Re: Python Spark Improvements (forked from Spark Improvement Proposals)

2016-10-31 Thread mariusvniekerk
So I've been working on some very, very early-stage Apache Arrow integration. My current plan is to emulate some of how the R function execution works. If there are any other people working on similar efforts it would be a good idea to combine efforts. I can see how much effort is involved in conv

Re: Python Spark Improvements (forked from Spark Improvement Proposals)

2016-10-31 Thread Holden Karau
I believe Bryan is also working on this a little - and I'm a little busy with the other stuff but would love to stay in the loop on Arrow progress :) On Monday, October 31, 2016, mariusvniekerk wrote: > So i've been working on some very very early stage apache arrow > integration. > My current p