Re: Odp.: Spark Improvement Proposals

2016-11-01 Thread Reynold Xin
Most things looked OK to me too, although I do plan to take a closer look after Nov 1st when we cut the release branch for 2.1. On Mon, Oct 31, 2016 at 3:12 PM, Marcelo Vanzin wrote: > The proposal looks OK to me. I assume, even though it's not explicitly > called, that voting would happen by e

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Sean Owen
Yes, this came up from a different direction: https://issues.apache.org/jira/browse/SPARK-18140 I think it's fine to pursue an upgrade to fix these several issues. The question is just how well it will play with other components, so it bears some testing and evaluation of the changes from 1.8, but yes

Re: Python Spark Improvements (forked from Spark Improvement Proposals)

2016-11-01 Thread Holden Karau
On that note there is some discussion on the Jira - https://issues.apache.org/jira/browse/SPARK-13534 :) On Mon, Oct 31, 2016 at 8:32 PM, Holden Karau wrote: > I believe Bryan is also working on this a little - and I'm a little busy > with the other stuff but would love to stay in the loop on Ar

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Ryan Blue
1.9.0 includes some fixes intended specifically for Spark: * PARQUET-389: Evaluates push-down predicates for missing columns as though they are null. This is to address Spark's work-around that requires reading and merging file schemas, even for metastore tables. * PARQUET-654: Adds an option to d
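
A toy illustration of the PARQUET-389 behavior described above, in which a pushed-down predicate on a column that is missing from a file is evaluated as though the column were null. This is not Parquet's or Spark's API; the helpers `eq`, `is_null`, and `scan`, and the sample rows, are hypothetical names chosen only to show the semantics.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def eq(column: str, literal: Any) -> Predicate:
    """SQL-style equality: a null (or missing) value never equals a literal."""
    def predicate(row: Row) -> bool:
        value = row.get(column)  # a missing column is read as None (null)
        return value is not None and value == literal
    return predicate


def is_null(column: str) -> Predicate:
    """True when the column is null, including when the file lacks the column."""
    return lambda row: row.get(column) is None


def scan(rows: List[Row], predicate: Predicate) -> List[Row]:
    """Keep only the rows that satisfy the pushed-down predicate."""
    return [row for row in rows if predicate(row)]


# Hypothetical data: old_file was written before the "country" column existed.
old_file = [{"id": 1}, {"id": 2}]
new_file = [{"id": 3, "country": "PL"}, {"id": 4, "country": None}]
```

Under this model, `scan(old_file, eq("country", "PL"))` keeps no rows and `scan(old_file, is_null("country"))` keeps every row, without consulting any other file's schema.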

Question about using collaborative filtering in MLlib

2016-11-01 Thread Zak H
Hi, I'm using the Java API of the DataFrame API for Spark MLlib. Should I be using the RDD API instead? I'm not sure whether this functionality has been ported over to DataFrames; correct me if I'm wrong. My goal is to evaluate Spark's recommendation capabilities. I'm looking at this example: http://s

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Reynold Xin
Ryan, want to submit a pull request? On Tue, Nov 1, 2016 at 9:05 AM, Ryan Blue wrote: > 1.9.0 includes some fixes intended specifically for Spark: > > * PARQUET-389: Evaluates push-down predicates for missing columns as > though they are null. This is to address Spark's work-around that requires

Re: Updating Parquet dep to 1.9

2016-11-01 Thread Ryan Blue
I can when I'm finished with a couple other issues if no one gets to it first. Michael, if you're interested in updating to 1.9.0 I'm happy to help review that PR. On Tue, Nov 1, 2016 at 1:03 PM, Reynold Xin wrote: > Ryan want to submit a pull request? > > > On Tue, Nov 1, 2016 at 9:05 AM, Ryan

Re: JIRA Components for Streaming

2016-11-01 Thread Michael Armbrust
I did this. Please help me correct any issues I may have missed. On Mon, Oct 31, 2016 at 11:37 AM, Michael Armbrust wrote: > I'm planning to do a little ma

view canonicalization - looking for database gurus to chime in

2016-11-01 Thread Reynold Xin
I know there are a lot of people with experience on developing database internals on this list. Please take a look at this proposal for a new, simpler way to handle view canonicalization in Spark SQL: https://issues.apache.org/jira/browse/SPARK-18209 It sounds much simpler than what we currently d

Re: getting encoder implicits to be more accurate

2016-11-01 Thread Sam Goodwin
You don't need compile-time macros for this; you can do it quite easily using shapeless. I've been playing with a project which borrows ideas from spray-json and spray-json-shapeless to implement Row marshalling for arbitrary case classes. It's checked and generated at compile time, supports arbit

[VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-01 Thread Reynold Xin
Please vote on releasing the following candidate as Apache Spark version 2.0.2. The vote is open until Fri, Nov 4, 2016 at 22:00 PDT and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.0.2 [ ] -1 Do not release this package because ... The t
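
The passing condition quoted above can be sketched as a small check. This is one reading of the rule, not an official tool: it assumes only PMC (binding) votes count, that at least three of them must be +1, and that +1 votes must outnumber -1 votes. The function name and parameters are hypothetical.

```python
def release_vote_passes(pmc_plus_ones: int, pmc_minus_ones: int) -> bool:
    """Return True when the vote passes under the assumed reading of the rule.

    Assumption: "a majority of at least 3 +1 PMC votes" means at least three
    +1 votes from PMC members and more +1 than -1 votes among PMC members.
    """
    if pmc_plus_ones < 0 or pmc_minus_ones < 0:
        raise ValueError("vote counts cannot be negative")
    return pmc_plus_ones >= 3 and pmc_plus_ones > pmc_minus_ones
```

For example, three +1 votes and one -1 vote pass under this reading, while two +1 votes and no -1 votes do not.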

Re: [VOTE] Release Apache Spark 2.0.2 (RC1)

2016-11-01 Thread vijoshi
Hi, Have encountered an issue with History Server in 2.0 - and updated https://issues.apache.org/jira/browse/SPARK-16808 with a comment detailing the problem. This is a regression in 2.0 from 1.6, so this issue exists since 2.0.1. Encountered this very recently when we evaluated moving to 2.0 fro

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-01 Thread vijoshi
Hi, Have encountered an issue with History Server in 2.0 - and updated https://issues.apache.org/jira/browse/SPARK-16808 with a comment detailing the problem. This is a regression in 2.0 from 1.6, so this issue exists since 2.0.1. Encountered this very recently when we evaluated moving to 2.0 fr

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-01 Thread Reynold Xin
Vinayak, Thanks for the email. This is really not the thread meant for reporting existing regressions. It's best to just comment on the JIRA ticket, and even better to submit a fix for it. On Tuesday, November 1, 2016, vijoshi wrote: > > Hi, > > Have encountered an issue with History Server in 2.0

Re: [VOTE] Release Apache Spark 2.0.2 (RC2)

2016-11-01 Thread vijoshi
Sure - given the nature of the bug, it looked like it may have gone under the radar in prior 2.0 releases (test cases pass), so I thought to bring attention to this for some evaluation of the criticality of this issue. Will take further discussion to the ticket.

[ANNOUNCE] Apache Spark branch-2.1

2016-11-01 Thread Reynold Xin
Hi all, Following the release schedule as outlined in the wiki, I just created branch-2.1 to form the basis of the 2.1 release. As of today we have fewer than 50 open issues for 2.1.0. Over the next couple of weeks, we as a community should focus on testing and bug fixes and burn down the number of outst