Re: Automatic PR labeling

2020-04-01 Thread Hyukjin Kwon
@Nicholas Chammas Would you be interested in taking a look? I would love this to be done. On Wed, Mar 25, 2020 at 10:30 AM, Hyukjin Kwon wrote: > That should be cool. There was a bit of discussion about which account > should label. If we can replace it, I think it sounds great! > > On Wed, Mar 25, 2020

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
I didn't point out an actual case "intentionally", because I want to avoid unnecessary debate and make sure we don't decide with bias. Note that the context would include people. I have seen these requests consistently (at least consistently for 1, but I feel I also saw 2 more than a couple of times

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Nicholas Chammas
Probably the discussion here about Improvement JIRA tickets and the "Affects Version" field: https://github.com/apache/spark/pull/27534#issuecomment-588416416 On Wed, Apr 1, 2020 at 9:59 PM Hyukjin Kwon wrote: > > 2) check with older versions to fill up affects version for bug > I don't agree with

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Hyukjin Kwon
> 2) check with older versions to fill up affects version for bug I don't agree with this in general. To me, usually it's "For the type of bug, assign one valid version" instead. > The only place where I can see some amount of investigation being required would be for security issues or correctness issues

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Mridul Muralidharan
I agree with what Sean detailed. The only place where I can see some amount of investigation being required would be for security issues or correctness issues. Knowing the affected versions, particularly if an earlier supported version does not have the bug, will help users understand the broken/in

Re: [DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Sean Owen
I think we discussed this briefly on a PR. It's not as clear what it means for an Improvement to 'affect a version'. Certainly, an improvement to a feature introduced in 1.2.3 can't affect anything earlier, and implicitly affects everything after. It's not wrong to say it affects the latest version

[DISCUSS] filling affected versions on JIRA issue

2020-04-01 Thread Jungtaek Lim
Hi devs, I know we're busy with getting Spark 3.0 out, but I think this topic is good to discuss at any time, and it would actually be better to resolve it sooner rather than later. On the "Contributing to Spark" page, we describe the guidance for "Affects Version" as "For Bugs, assign at least one version that is

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Ryan Blue
-1 (non-binding) I agree with Jungtaek. The change to create datasource tables instead of Hive tables by default (no USING or STORED AS clauses) has created confusing behavior and should either be rolled back or fixed before 3.0. On Wed, Apr 1, 2020 at 5:12 AM Sean Owen wrote: > Those are not per se release blockers
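
A minimal sketch (not from the thread; the table names are hypothetical) of the CREATE TABLE forms whose default behavior is at issue: with no USING or STORED AS clause, the 3.0 change makes the statement produce a native datasource table rather than a Hive SerDe table, while the explicit forms are unaffected.

import org.apache.spark.sql.SparkSession

object CreateTableDefaultSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("create-table-default-sketch")
      .enableHiveSupport() // needed for the STORED AS variant
      .getOrCreate()

    // No USING / STORED AS: this is the form whose default changed in 3.0,
    // now producing a datasource table instead of a Hive table.
    spark.sql("CREATE TABLE plain_tbl (id INT, name STRING)")

    // Explicit forms are unambiguous and unaffected by the default.
    spark.sql("CREATE TABLE ds_tbl (id INT, name STRING) USING parquet")
    spark.sql("CREATE TABLE hive_tbl (id INT, name STRING) STORED AS parquet")

    // The Provider / SerDe rows here show which kind of table was created.
    spark.sql("DESCRIBE TABLE EXTENDED plain_tbl").show(100, truncate = false)

    spark.stop()
  }
}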

Re: Need to order iterator values in spark dataframe

2020-04-01 Thread Ranjan, Abhinav
Enrico, The solution below works, but there is a little glitch. It works fine in spark-shell but fails for skewed keys when doing a spark-submit. While looking into the execution plan, the partitioning value is the same for both repartition and groupByKey and is driven by the value
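
A minimal sketch (not Enrico's actual solution; the Event case class and column names are hypothetical) of the per-key ordering pattern under discussion, and of why skewed keys hurt: each group's values are buffered so they can be sorted.

import org.apache.spark.sql.SparkSession

// Hypothetical record type for the sketch.
case class Event(user: String, ts: Long, value: Double)

object OrderedGroupsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ordered-groups-sketch").getOrCreate()
    import spark.implicits._

    val events = Seq(
      Event("a", 3L, 1.0), Event("a", 1L, 2.0), Event("b", 2L, 3.0)
    ).toDS()

    // For each user, emit that user's events ordered by timestamp.
    val ordered = events
      .groupByKey(_.user)
      .flatMapGroups { (user, rows) =>
        // The group iterator carries no ordering guarantee, so it is
        // materialized and sorted here. For a heavily skewed key this
        // buffer is exactly what grows large in a spark-submit run.
        rows.toSeq.sortBy(_.ts).iterator
      }

    ordered.show(truncate = false)
    spark.stop()
  }
}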

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Sean Owen
Those are not per se release blockers. They are (perhaps important) improvements to functionality. I don't know who is active and able to review that part of the code; I'd look for authors of changes in the surrounding code. The question here isn't so much what one would like to see in this release

Re: [VOTE] Apache Spark 3.0.0 RC1

2020-04-01 Thread Dr. Kent Yao
-1 Do not release this package because v3.0.0 is the 3rd major release since we added Spark on Kubernetes. Can we make it more production-ready, as it has been experimental for more than 2 years? The main practical adoption of Spark on Kubernetes is to take on the role of other cluster managers (ma
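
A minimal sketch (assuming client mode; the API server URL, namespace, and image name are placeholders) of what "taking on the role of other cluster managers" looks like from the application side: the master points at the Kubernetes API server, which then schedules executor pods much as YARN or standalone would schedule containers.

import org.apache.spark.sql.SparkSession

object K8sClientModeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("k8s-client-mode-sketch")
      .master("k8s://https://kubernetes.example.com:6443")             // placeholder API server
      .config("spark.kubernetes.namespace", "spark-apps")              // placeholder namespace
      .config("spark.kubernetes.container.image", "myrepo/spark:3.0.0") // placeholder image
      .config("spark.executor.instances", "2")
      .getOrCreate()

    // Executors run as pods launched by the Kubernetes scheduler, which is
    // the cluster-manager role that YARN/Mesos/standalone play elsewhere.
    spark.range(1000).selectExpr("sum(id)").show()

    spark.stop()
  }
}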