ClassNotFoundException while running unit test with local cluster mode in IntelliJ IDEA

2018-01-30 Thread wuyi
Dear devs, I've been stuck on this issue for several days, and I need help now. At first, I ran into an old issue, the same one described at http://apache-spark-developers-list.1001551.n3.nabble.com/test-cases-stuck-on-quot-local-cluster-mode-quot-of-ReplSuite-td3086.html

[SQL] Tests for ExtractFiltersAndInnerJoins.flattenJoin

2018-01-30 Thread Jacek Laskowski
Hi, While exploring the ReorderJoin optimization I wrote a few unit-test-like examples that demo how ExtractFiltersAndInnerJoins.flattenJoin [1] works. I've been wondering whether the examples could become unit tests instead. There are 6 different join-filter plan combinations using Catalyst DSL to create t

Re: ClassNotFoundException while running unit test with local cluster mode in IntelliJ IDEA

2018-01-30 Thread Wenchen Fan
You can run the test in SBT and attach your IDEA debugger to it, which works for me. On Tue, Jan 30, 2018 at 7:44 PM, wuyi wrote: > Dear devs, > I've been stuck on this issue for several days, and I need help now. > At first, I ran into an old issue, the same as > http://apac

Re: ClassNotFoundException while running unit test with local cluster mode in IntelliJ IDEA

2018-01-30 Thread wuyi
Hi, cloud0fan. Yeah, tests run well in SBT. Maybe I should try your way. Thanks! -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: ClassNotFoundException while running unit test with local cluster mode in IntelliJ IDEA

2018-01-30 Thread wuyi
Hi, cloud0fan, I tried it and that's really good and cool! Thanks again!

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-30 Thread Andrew Ash
I'd like to nominate SPARK-23274 as a potential blocker for the 2.3.0 release as well, since it is a regression from 2.2.0. The ticket includes a simple repro, showing a query that works in prior releases but now fails with an exception in t

PSA: Release and commit quality

2018-01-30 Thread Ryan Blue
Hi everyone, I’ve noticed some questionable practices around commits going into master lately (and historically, to be honest) and I want to remind everyone about some best practices for commit and release quality. - *Please don’t mix partial, unrelated changes into a commit.* This makes

Re: PSA: Release and commit quality

2018-01-30 Thread Xiao Li
Hi, Ryan, Thanks for your input. These comments are pretty helpful! Please continue to help us improve Spark and the Spark community. Thanks again, Xiao 2018-01-30 12:58 GMT-08:00 Ryan Blue : > Hi everyone, > > I’ve noticed some questionable practices around commits going into master > lately (

[SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Yacine Mazari
Hi All, Would it make sense to add a "top()" method to the Dataset API? This method would return a Dataset containing the top k elements; the caller could then do further processing on the Dataset or call collect(). This contrasts with RDD's top(), which returns a collected array. In terms of i

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Reynold Xin
For the DataFrame/Dataset API, the optimizer actually rewrites orderBy followed by a take into a priority-queue-based top implementation. On Tue, Jan 30, 2018 at 11:10 PM, Yacine Mazari wrote: > Hi All, > > Would it make sense to add a "top()" method to the Dataset API? > This method would retu
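The priority-queue-based rewrite Reynold refers to can be illustrated with a plain-Scala sketch of the underlying idea: scan the data once while keeping only k elements in a min-heap, so memory stays O(k) regardless of input size. This is an illustration of the technique, not Spark's actual implementation; the `topK` helper is a hypothetical name.

```scala
import scala.collection.mutable

// Bounded-priority-queue top-k: a min-heap holds the current k best,
// so its head is always the smallest element still in contention.
def topK(xs: Iterator[Int], k: Int): Seq[Int] = {
  // Reversed ordering turns Scala's max-heap PriorityQueue into a min-heap.
  val heap = mutable.PriorityQueue.empty[Int](Ordering.Int.reverse)
  xs.foreach { x =>
    if (heap.size < k) heap.enqueue(x)
    else if (x > heap.head) { heap.dequeue(); heap.enqueue(x) }
  }
  // dequeueAll yields ascending order under the reversed ordering;
  // reverse it so the largest element comes first.
  heap.dequeueAll.reverse
}
```

A distributed variant would run this per partition and then merge the per-partition heaps, which is the shape of RDD's top() as well.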

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Yacine Mazari
Thanks for the quick reply and explanation @rxin. So if one does not want to collect()/take() but wants the top k as a Dataset for further transformations, there is no optimized API; that's why I am suggesting adding this "top()" as a public method. If that sounds like a good idea, I will open a

Re: [SQL] [Suggestion] Add top() to Dataset

2018-01-30 Thread Wenchen Fan
You can use `Dataset.limit`, which returns a new `Dataset` instead of an Array. Then you can transform it and still get the top-k optimization from Spark. On Wed, Jan 31, 2018 at 3:39 PM, Yacine Mazari wrote: > Thanks for the quick reply and explanation @rxin. > > So if one does not want to colle
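Wenchen's suggestion can be sketched as follows (a minimal, unverified sketch assuming a `DataFrame` with a numeric "score" column; the column and function names are illustrative, not from the thread). Because `orderBy` followed by `limit` still returns a `Dataset`, further transformations compose lazily, and the physical planner collapses the pair into `TakeOrderedAndProject`, the priority-queue-based top-k mentioned above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.desc

// Hypothetical helper: top 10 rows by "score", kept as a DataFrame
// rather than collected to the driver as an Array.
def topTen(df: DataFrame): DataFrame =
  df.orderBy(desc("score")).limit(10)

// topTen(df).explain() should show TakeOrderedAndProject in the physical plan,
// and topTen(df).select("score") composes like any other transformation.
```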