date:20190221

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Ryan Blue

In addition to logical plans, we need SQL support. That requires resolving v2 tables from a catalog and a few other changes like separating v1 plans from SQL parsing (see the earlier dev list thread). I’d also like to add DDL operations for v2. I think it also makes sense to add a new DF write API

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Matt Cheah

To evaluate the amount of work required to get Data Source V2 into Spark 3.0, we should have a list of all the specific SPIPs and patches that are pending that would constitute a successful and usable revamp of that API. Here are the ones I could find and know off the top of my head: Table Catal

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Ryan Blue

I'm all for making releases more often if we want. But this work could really use a target release to motivate getting it done. If we agree that it will block a release, then everyone is motivated to review and get the PRs in. If this work doesn't make it in the 3.0 release, I'm not confident that

Re: [DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Matei Zaharia

How large would the delay be? My 2 cents are that there’s nothing stopping us from making feature releases more often if we want to, so we shouldn’t see this as an “either delay 3.0 or release in >6 months” decision. If the work is likely to get in with a small delay and simplifies our work afte

[DISCUSS] Spark 3.0 and DataSourceV2

2019-02-21 Thread Ryan Blue

Hi everyone, In the DSv2 sync last night, we had a discussion about roadmap and what the goal should be for getting the main features into Spark. We all agreed that 3.0 should be that goal, even if it means delaying the 3.0 release. The possibility of delaying the 3.0 release may be controversial

DataSourceV2 sync notes - 20 Feb 2019

2019-02-21 Thread Ryan Blue

Here are my notes from the DSv2 sync last night. As always, if you have corrections, please reply with them. And if you’d like to be included on the invite to participate in the next sync (6 March), send me an email. Here’s a quick summary of the topics where we had consensus last night: - The

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread DB Tsai

I am cutting a new rc4 with fix from Felix. Thanks. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0359BC9965359766 On Thu, Feb 21, 2019 at 8:57 AM Felix Cheung wrote: > > I merged the fix to 2.4.

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread Felix Cheung

I merged the fix to 2.4. From: Felix Cheung Sent: Wednesday, February 20, 2019 9:34 PM To: DB Tsai; Spark dev list Cc: Cesar Delgado Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2) Could you hold for a bit - I have one more fix to get in

Re: Thoughts on dataframe cogroup?

2019-02-21 Thread Li Jin

I am wondering whether other people have opinions or use cases for cogroup. On Wed, Feb 20, 2019 at 5:03 PM Li Jin wrote: > Alessandro, > > Thanks for the reply. I assume by "equi-join", you mean "equality full > outer join". > > Two issues I see with equality outer join are: > (1) equality outer join will gi
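The thread contrasts cogroup with an equality full outer join. The minimal pure-Python sketch below illustrates the difference in semantics only; it is not Spark's API, and the function names `cogroup` and `full_outer_join` are hypothetical. Cogroup yields exactly one (key, left_rows, right_rows) tuple per distinct key, while a full outer join emits every matching (left, right) pair, so a key with m rows on the left and n on the right becomes m * n output rows.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key):
    """Group both inputs by key: one (key, left_rows, right_rows) tuple per distinct key."""
    left_groups = defaultdict(list)
    right_groups = defaultdict(list)
    for row in left:
        left_groups[key(row)].append(row)
    for row in right:
        right_groups[key(row)].append(row)
    # Sorted only to make the output order deterministic for this example.
    keys = sorted(set(left_groups) | set(right_groups))
    # .get() does not trigger the defaultdict factory; missing sides become [].
    return [(k, left_groups.get(k, []), right_groups.get(k, [])) for k in keys]


def full_outer_join(left, right, key):
    """Equality full outer join: every matching (left, right) pair, padded with None."""
    out = []
    for _, lrows, rrows in cogroup(left, right, key):
        if lrows and rrows:
            out.extend(product(lrows, rrows))  # len(lrows) * len(rrows) pairs
        elif lrows:
            out.extend((lrow, None) for lrow in lrows)
        else:
            out.extend((None, rrow) for rrow in rrows)
    return out


def key_of(row):
    return row[0]


left = [("a", 1), ("a", 2), ("b", 3)]
right = [("a", 10), ("a", 20), ("c", 30)]

grouped = cogroup(left, right, key_of)
joined = full_outer_join(left, right, key_of)
print(len(grouped), len(joined))  # 3 6
```

Key "a" has two rows on each side: cogroup keeps them as one group, whereas the join expands them into four pairs that a user would have to regroup afterwards.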

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread Sean Owen

That looks like a change to restore some behavior that was removed in 2.2. It's not directly relevant to a release vote on 2.4.1. See the existing discussion at https://github.com/apache/spark/pull/22144#issuecomment-432258536 It may indeed be a good thing to change but just continue the discussion

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread Parth Gandhi

Hello, In https://issues.apache.org/jira/browse/SPARK-24935, I am getting requests from people that they were hoping for the fix to be merged in Spark 2.4.1. The concerned PR is here: https://github.com/apache/spark/pull/23778. I do not mind if we do not merge it for 2.4.1 and I do not want

Re: Moving forward with the timestamp proposal

2019-02-21 Thread Zoltan Ivanfi

Hi, We can add these new SQL types by adding support to the file formats first. But the most important and immediate goal is reserving these types for their desired meaning and that can already be done without such support. Of course, eventually the new types need to be implemented as well, and f

Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-21 Thread Xiao Li

+1 This is in the right direction. The resolution rules and catalog APIs need more discussion when we implement it. In the current stage, we can disallow the runtime creation of the catalog. This will complicate the name resolution in a multi-session environment. For example, when one user creates
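As a rough illustration of the name-resolution question raised here, the sketch below splits a multi-part identifier into catalog, namespace, and table against a fixed set of catalog names, in line with the suggestion to disallow runtime catalog creation for now. The rule it encodes (the first part names a catalog only if it matches a registered catalog and more parts follow; otherwise the default catalog applies) is a simplified assumption, and `resolve`, `CATALOGS`, and the `"session"` default are hypothetical names, not Spark APIs.

```python
def resolve(parts, catalogs, default_catalog):
    """Split a multi-part identifier into (catalog, namespace, table).

    Assumed, simplified rule: the first part is a catalog only if it is a
    registered catalog name and more parts follow; otherwise the default
    catalog is used and all parts belong to namespace + table.
    """
    if not parts:
        raise ValueError("identifier must have at least one part")
    if len(parts) > 1 and parts[0] in catalogs:
        catalog, rest = parts[0], list(parts[1:])
    else:
        catalog, rest = default_catalog, list(parts)
    return catalog, tuple(rest[:-1]), rest[-1]


# Catalog names are fixed up front; nothing can register a new one at runtime,
# so every session resolves the same identifier the same way.
CATALOGS = frozenset({"prod", "test"})

print(resolve(["prod", "db", "events"], CATALOGS, "session"))  # ('prod', ('db',), 'events')
print(resolve(["db", "events"], CATALOGS, "session"))          # ('session', ('db',), 'events')
```

If one session could create a catalog named `db` at runtime, the second identifier would silently change meaning for that session only, which is the multi-session complication the message points to.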
