+1 (non-binding)
On Thu, Feb 28, 2019 at 9:11 AM Matt Cheah wrote:
> +1 (non-binding)
>
>
>
> *From: *Jamison Bennett
> *Date: *Thursday, February 28, 2019 at 8:28 AM
> *To: *Ryan Blue , Spark Dev List
> *Subject: *Re: [VOTE] SPIP: Spark API for Table Metadata
>
>
>
> +1 (non-binding)
>
>
> *J
I want to specifically highlight and +1 a point that Ryan brought up:
A commitment binds us to do this and make a reasonable attempt at finishing on
time. If we choose not to commit, or if we choose to commit and don’t make a
reasonable attempt, then we need to ask, “what happened?” Is Spark
I am -1 on this vote for pretty much all the reasons that Mark mentioned.
A major version change gives us an opportunity to remove deprecated
interfaces, stabilize experimental/developer api, drop support for
outdated functionality/platforms and evolve the project with a vision
for foreseeable fu
I'm not worried about rushing. I worry that, without clear parameters for
the amount or types of DSv2 delays that are acceptable, we might end up
holding back 3.0 indefinitely to meet the deadline when we wouldn't have
made that decision de novo. (Or even worse, the PMC eventually feels they
must r
The question is, what does it bind?
I’m not pushing for a binding statement to do this or delay the 3.0 release
because I don’t think that’s a very reasonable thing to do. It may well be
that there is a good reason for missing the goal.
So “what does it bind?” is an apt question.
A commitment bi
Hi there,
Would you be able to give advice on how best to compare a previous row value in
a structured streaming DF with the current one?
Kind regards,
Raphael
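One possible approach (a sketch only, not from this thread): keep the last value
seen per key in arbitrary per-key state with mapGroupsWithState, so each
micro-batch can be compared against the previously seen value. The Reading bean
and its sensorId/value fields below are hypothetical placeholders.

import java.io.Serializable;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.MapGroupsWithStateFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.streaming.GroupState;

public class PreviousRowSketch {

  // Hypothetical input record; any Java bean with a key and a value would do.
  public static class Reading implements Serializable {
    private String sensorId;
    private double value;
    public String getSensorId() { return sensorId; }
    public void setSensorId(String sensorId) { this.sensorId = sensorId; }
    public double getValue() { return value; }
    public void setValue(double value) { this.value = value; }
  }

  // Emits "key: previous=... current=..." for each key updated in a micro-batch.
  static Dataset<String> compareWithPrevious(Dataset<Reading> readings) {
    return readings
        // groupByKey is logical: it only declares the grouping key.
        .groupByKey((MapFunction<Reading, String>) Reading::getSensorId, Encoders.STRING())
        // Per-key state carries the previously seen value across micro-batches.
        .mapGroupsWithState(
            (MapGroupsWithStateFunction<String, Reading, Double, String>)
                (key, values, state) -> {
                  double previous = state.exists() ? state.get() : Double.NaN;
                  double current = previous;
                  while (values.hasNext()) {
                    current = values.next().getValue(); // last value in this batch
                  }
                  state.update(current);                // becomes "previous" next time
                  return key + ": previous=" + previous + " current=" + current;
                },
            Encoders.DOUBLE(),
            Encoders.STRING());
  }
}

A query built this way would typically run with the update output mode;
flatMapGroupsWithState gives more control if you need to emit several rows per key.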
This is a fine thing to VOTE on. Committers (and community,
non-binding) can VOTE on what we like; we just don't do it often where
not required because it's a) overkill overhead over simple lazy
consensus, and b) it can be hard to say what the binding VOTE binds if
it's not a discrete commit or rel
I’m sure we, as a community, will seriously consider any proposal that
Spark would benefit if the PMC delays release X to include changes A, B, C.
Indeed, every release I remember has had a few iterations of “can we hold
the train for a bit because it would be super great to get this PR in”.
Many
I agree that adding new features in a major release is not forbidden, but
that is just not the primary goal of a major release. If we reach the point
where we are happy with the new public API before some new features are in
a satisfactory state to be merged, then I don't want there to be a prior
p
Mark, I disagree. Setting common goals is a critical part of getting things
done.
This doesn't commit the community to push out the release if the goals
aren't met, but does mean that we will, as a community, seriously consider
it. This is also an acknowledgement that this is the most important fe
This should be fine. Dataset.groupByKey is a logical operation, not a
physical one (as in Spark wouldn’t always materialize all the groups in
memory).
On Thu, Feb 28, 2019 at 1:46 AM Etienne Chauchot
wrote:
> Hi all,
>
> I'm migrating RDD pipelines to Dataset and I saw that Combine.PerKey is no
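A minimal, hypothetical illustration of that point (not code from this thread):
because groupByKey only declares the grouping, following it with reduceGroups lets
Spark fold the values for each key pairwise instead of collecting whole groups.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.api.java.function.ReduceFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import scala.Tuple2;

public class GroupByKeySketch {
  // Count occurrences per word; reduceGroups combines the per-key counts pairwise,
  // so the full group never has to be materialized in memory at once.
  static Dataset<Tuple2<String, Long>> countWords(Dataset<String> words) {
    return words
        .groupByKey((MapFunction<String, String>) w -> w, Encoders.STRING())
        .mapValues((MapFunction<String, Long>) w -> 1L, Encoders.LONG())
        .reduceGroups((ReduceFunction<Long>) Long::sum);
  }
}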
Mark, if this goal is adopted, "we" is the Apache Spark community.
On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra
wrote:
> Who is "we" in these statements, such as "we should consider a functional
> DSv2 implementation a blocker for Spark 3.0"? If it means those
> contributing to the DSv2 effort w
Then I'm -1. Setting new features as blockers of major releases is not
proper project management, IMO.
On Thu, Feb 28, 2019 at 10:06 AM Ryan Blue wrote:
> Mark, if this goal is adopted, "we" is the Apache Spark community.
>
> On Thu, Feb 28, 2019 at 9:52 AM Mark Hamstra
> wrote:
>
>> Who is "we
Who is "we" in these statements, such as "we should consider a functional
DSv2 implementation a blocker for Spark 3.0"? If it means those
contributing to the DSv2 effort want to set their own goals, milestones,
etc., then that is fine with me. If you mean that the Apache Spark project
should offici
+1 (non-binding)
From: Jamison Bennett
Date: Thursday, February 28, 2019 at 8:28 AM
To: Ryan Blue , Spark Dev List
Subject: Re: [VOTE] SPIP: Spark API for Table Metadata
+1 (non-binding)
Jamison Bennett
Cloudera Software Engineer
jamison.benn...@cloudera.com
515 Congress Ave, Suite
+1 (non-binding)
Are identifiers and namespaces going to be rolled under one of those six points?
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Thursday, February 28, 2019 at 8:39 AM
To: Spark Dev List
Subject: [VOTE] Functional DataSourceV2 in Spark 3.0
I’d like to call a vote
Thanks for the discussion, everyone. Since there aren't many objections to
the scope and we are aligned on what this commitment would mean, I've
started a vote thread for it.
rb
On Wed, Feb 27, 2019 at 5:32 PM Wenchen Fan wrote:
> I'm good with the list from Ryan, thanks!
>
> On Thu, Feb 28, 20
I’d like to call a vote for committing to getting DataSourceV2 in a
functional state for Spark 3.0.
For more context, please see the discussion thread, but here is a quick
summary about what this commitment means:
- We think that a “functional DSv2” is an achievable goal for the Spark
3.0 r
+1 (non-binding)
Jamison Bennett
Cloudera Software Engineer
jamison.benn...@cloudera.com
515 Congress Ave, Suite 1212 | Austin, TX | 78701
On Thu, Feb 28, 2019 at 10:20 AM Ryan Blue
wrote:
> +1 (non-binding)
>
> On Wed, Feb 27, 2019 at 8:34 PM Russell Spitzer
> wrote:
>
>> +1 (non-
+1 (non-binding)
On Wed, Feb 27, 2019 at 8:34 PM Russell Spitzer
wrote:
> +1 (non-binding)
>
> On Wed, Feb 27, 2019, 6:28 PM Ryan Blue wrote:
>
>> Hi everyone,
>>
>> In the last DSv2 sync, the consensus was that the table metadata SPIP was
>> ready to bring up for a vote. Now that the multi-cat
Hi all,
I'm migrating RDD pipelines to Dataset and I saw that Combine.PerKey is no longer
there in the Dataset API, so I translated it to:
KeyValueGroupedDataset> groupedDataset =
keyedDataset.groupByKey(KVHelpers.extractKey(),
EncoderHelpers.genericEncoder());
Dataset> combinedDataset =
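A hedged sketch of the general pattern only (the String/Long types and the summing
combiner below are placeholders, not the actual pipeline): a Combine.PerKey can
usually be expressed as groupByKey plus a typed Aggregator passed to agg, which
combines values per key incrementally rather than materializing each group.

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.TypedColumn;
import org.apache.spark.sql.expressions.Aggregator;
import scala.Tuple2;

public class CombinePerKeySketch {

  // Placeholder combine function: sum the Long values for each String key.
  static class SumCombiner extends Aggregator<Tuple2<String, Long>, Long, Long> {
    @Override public Long zero() { return 0L; }
    @Override public Long reduce(Long acc, Tuple2<String, Long> kv) { return acc + kv._2(); }
    @Override public Long merge(Long a, Long b) { return a + b; }
    @Override public Long finish(Long acc) { return acc; }
    @Override public Encoder<Long> bufferEncoder() { return Encoders.LONG(); }
    @Override public Encoder<Long> outputEncoder() { return Encoders.LONG(); }
  }

  // Group by the key and apply the combiner as a typed aggregation.
  static Dataset<Tuple2<String, Long>> combinePerKey(Dataset<Tuple2<String, Long>> keyed) {
    TypedColumn<Tuple2<String, Long>, Long> combined = new SumCombiner().toColumn();
    return keyed
        .groupByKey((MapFunction<Tuple2<String, Long>, String>) kv -> kv._1(), Encoders.STRING())
        .agg(combined); // per-key aggregation; groups are not materialized
  }
}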