Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Bummer. People should still feel welcome to test the existing RC so we can rule out other issues. On Tue, May 15, 2018 at 2:04 PM, Xiao Li wrote: > -1 > > We have a correctness bug fix that was merged after 2.3 RC1. It would be > nice to have that in Spark 2.3.1 release. > > https://issues.apache

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Xiao Li
-1 We have a correctness bug fix that was merged after 2.3 RC1. It would be nice to have that in Spark 2.3.1 release. https://issues.apache.org/jira/browse/SPARK-24259 Xiao 2018-05-15 14:00 GMT-07:00 Marcelo Vanzin : > Please vote on releasing the following candidate as Apache Spark version >

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
It's in. That link is only a list of the currently open bugs. On Tue, May 15, 2018 at 2:02 PM, Justin Miller wrote: > Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. > > Thanks, > Justin > > On May 15, 2018, at 3:00 PM, Marcelo Vanzin wrote: > > Please vote on releas

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Justin Miller
Did SPARK-24067 not make it in? I don’t see it in https://s.apache.org/Q3Uo. Thanks, Justin > On May 15, 2018, at 3:00 PM, Marcelo Vanzin wrote: > > Please vote on releasing the following candidate as Apache Spark version > 2.3.1. > > The vote is open until Friday

Re: [VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
I'll start with my +1 (binding). I've run unit tests and a bunch of integration tests on the hadoop-2.7 package. Please note that there are still a few flaky tests. Please check JIRA before you decide to send a -1 because of a flaky test. Also, apologies for the delay in getting the RC ready. Sti

[VOTE] Spark 2.3.1 (RC1)

2018-05-15 Thread Marcelo Vanzin
Please vote on releasing the following candidate as Apache Spark version 2.3.1. The vote is open until Friday, May 18, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 2.3.1 [ ] -1 Do not release this package because ... To le

Re: Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
Thanks, filed https://issues.apache.org/jira/browse/SPARK-24288 Pozdrawiam / Best regards, Tomek On 2018-05-15 18:29, Wenchen Fan wrote: applying predicate pushdown is an optimization, and it makes sense to provide configs to turn off certain optimizations. Feel free to create a JIRA. Thanks, W

Re: Preventing predicate pushdown

2018-05-15 Thread Wenchen Fan
Applying predicate pushdown is an optimization, and it makes sense to provide configs to turn off certain optimizations. Feel free to create a JIRA. Thanks, Wenchen On Tue, May 15, 2018 at 8:33 PM, Tomasz Gawęda wrote: > Hi, > > while working with JDBC datasource I saw that many "or" clauses with

Preventing predicate pushdown

2018-05-15 Thread Tomasz Gawęda
Hi, while working with the JDBC datasource I saw that many "or" clauses with non-equality operators cause a huge performance degradation of the SQL query sent to the database (DB2). For example: val df = spark.read.format("jdbc").(other options to parallelize load).load() df.where(s"(date1 > $param1 and (date1
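
A minimal sketch of what turning off pushdown for a JDBC read could look like. It assumes a Spark release whose JDBC data source documents the `pushDownPredicate` option (later releases list it, with a default of true; it is not part of the 2.3.x line discussed here). The `readOptions` helper, the URL and the table name are hypothetical and only serve the illustration.

```scala
object JdbcPushdownSketch {
  // Hypothetical helper (not a Spark API): builds the option map for a JDBC
  // read with predicate pushdown disabled. The url and table are placeholders
  // supplied by the caller.
  def readOptions(url: String, table: String): Map[String, String] =
    Map(
      "url" -> url,
      "dbtable" -> table,
      // Assumption: the running Spark version supports this JDBC option.
      // With "false", filters are evaluated by Spark instead of the database.
      "pushDownPredicate" -> "false"
    )

  // Usage inside a Spark application (requires a SparkSession named `spark`):
  //   val df = spark.read.format("jdbc").options(readOptions(url, table)).load()
  //   df.where("date1 > '2018-01-01'") // applied by Spark after the rows are loaded
}
```

The trade-off is that every row of the table (or of each partition query) crosses the network before Spark filters it, so this only helps when the pushed-down predicate is what makes the database query slow.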

Re: Sort-merge join improvement

2018-05-15 Thread Petar Zecevic
Based on some reviews I put additional effort into fixing the case when wholestage codegen is turned off. Sort-merge join with additional range conditions is now 10x faster (can be more or less, depending on exact use-case) in both cases - with wholestage turned off or on - compared to non-opt
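
To illustrate why a range condition can be handled cheaply during a sort-merge join, here is a small plain-Scala sketch. It is not the implementation discussed in this thread: it assumes both sides are already sorted by the range column and belong to one equal-key group, and the names `rangeJoin` and `d` are hypothetical. The condition used is `l - d <= r <= l + d`.

```scala
object RangeJoinSketch {
  // Joins two sorted vectors, emitting every pair (l, r) with l - d <= r <= l + d.
  // Instead of rescanning `right` from the beginning for each left value, the
  // lower bound of the matching window only ever moves forward.
  def rangeJoin(left: Vector[Int], right: Vector[Int], d: Int): Vector[(Int, Int)] = {
    val out = Vector.newBuilder[(Int, Int)]
    var start = 0 // first index in `right` that can still match this or a later left value
    for (l <- left) {
      // Values below l - d are too small for l and, because `left` is sorted,
      // for every later left value as well, so they are skipped for good.
      while (start < right.length && right(start) < l - d) start += 1
      // Emit matches until the upper bound of the window is exceeded.
      var j = start
      while (j < right.length && right(j) <= l + d) {
        out += ((l, right(j)))
        j += 1
      }
    }
    out.result()
  }
}
```

For example, `RangeJoinSketch.rangeJoin(Vector(1, 5, 10), Vector(0, 4, 6, 20), 1)` yields `Vector((1,0), (5,4), (5,6))`. The work is proportional to the input sizes plus the number of emitted pairs, rather than to the full cross product within the key group.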