We will have to fix that before we declare DSv2 stable, because InternalRow is not a stable API. We don’t necessarily need to do it in 3.0.
On Tue, Feb 26, 2019 at 5:10 PM Matt Cheah <mch...@palantir.com> wrote:
> Will that then require an API break down the line? Do we save that for
> Spark 4?
>
> -Matt Cheah
>
> *From: *Ryan Blue <rb...@netflix.com>
> *Reply-To: *"rb...@netflix.com" <rb...@netflix.com>
> *Date: *Tuesday, February 26, 2019 at 4:53 PM
> *To: *Matt Cheah <mch...@palantir.com>
> *Cc: *Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>,
> Xiao Li <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>,
> Spark Dev List <dev@spark.apache.org>
> *Subject: *Re: [DISCUSS] Spark 3.0 and DataSourceV2
>
> That's a good question.
>
> While I'd love to have a solution for that, I don't think it is a good
> idea to delay DSv2 until we have one. That is going to require a lot of
> internal changes and I don't see how we could make the release date if we
> are including an InternalRow replacement.
>
> On Tue, Feb 26, 2019 at 4:41 PM Matt Cheah <mch...@palantir.com> wrote:
>
> Reynold made a note earlier about a proper Row API that isn’t InternalRow
> – is that still on the table?
>
> -Matt Cheah
>
> *From: *Ryan Blue <rb...@netflix.com>
> *Reply-To: *"rb...@netflix.com" <rb...@netflix.com>
> *Date: *Tuesday, February 26, 2019 at 4:40 PM
> *To: *Matt Cheah <mch...@palantir.com>
> *Cc: *Sean Owen <sro...@apache.org>, Wenchen Fan <cloud0...@gmail.com>,
> Xiao Li <lix...@databricks.com>, Matei Zaharia <matei.zaha...@gmail.com>,
> Spark Dev List <dev@spark.apache.org>
> *Subject: *Re: [DISCUSS] Spark 3.0 and DataSourceV2
>
> Thanks for bumping this, Matt. I think we can have the discussion here to
> clarify exactly what we’re committing to and then have a vote thread once
> we’re agreed.
>
> Getting back to the DSv2 discussion, I think we have a good handle on what
> would be added:
>
> · Plugin system for catalogs
>
> · TableCatalog interface (I’ll start a vote thread for this SPIP
> shortly)
>
> · TableCatalog implementation backed by SessionCatalog that can
> load v2 tables
>
> · Resolution rule to load v2 tables using the new catalog
>
> · CTAS logical and physical plan nodes
>
> · Conversions from SQL parsed logical plans to v2 logical plans
>
> Initially, this will always use the v2 catalog backed by SessionCatalog to
> avoid dependence on the multi-catalog work. All of those are already
> implemented and working, so I think it is reasonable that we can get them
> in.
>
> Then we can consider a few stretch goals:
>
> · Get in as much DDL as we can. I think create and drop table
> should be easy.
>
> · Multi-catalog identifier parsing and multi-catalog support
>
> If we get those last two in, it would be great. We can make the call
> closer to release time. Does anyone want to change this set of work?
>
> On Tue, Feb 26, 2019 at 4:23 PM Matt Cheah <mch...@palantir.com> wrote:
>
> What would then be the next steps we'd take to collectively decide on
> plans and timelines moving forward? Might I suggest scheduling a conference
> call with appropriate PMCs to put our ideas together? Maybe such a
> discussion can take place at next week's meeting? Or do we need to have a
> separate formalized voting thread which is guided by a PMC?
>
> My suggestion is to try to make concrete steps forward and to avoid
> letting this slip through the cracks.
>
> I also think there would be merits to having a project plan and estimates
> around how long each of the features we want to complete is going to take
> to implement and review.
>
> -Matt Cheah
>
> On 2/24/19, 3:05 PM, "Sean Owen" <sro...@apache.org> wrote:
>
> Sure, I don't read anyone making these statements though?
> Let's assume
> good intent, that "foo should happen" as "my opinion as a member of
> the community, which is not solely up to me, is that foo should
> happen". I understand it's possible for a person to make their opinion
> over-weighted; this whole style of decision making assumes good actors
> and doesn't optimize against bad ones. Not that it can't happen, just
> not seeing it here.
>
> I have never seen any vote on a feature list, by a PMC or otherwise.
> We can do that if really needed I guess. But that also isn't the
> authoritative process in play here, in contrast.
>
> If there's not a more specific subtext or issue here, which is fine to
> say (on private@ if it's sensitive or something), yes, let's move on
> in good faith.
>
> On Sun, Feb 24, 2019 at 3:45 PM Mark Hamstra <m...@clearstorydata.com>
> wrote:
>
> There is nothing wrong with individuals advocating for what they
> think should or should not be in Spark 3.0, nor should anyone shy away from
> explaining why they think delaying the release for some reason is or isn't
> a good idea. What is a problem, or is at least something that I have a
> problem with, are declarative, pseudo-authoritative statements that 3.0 (or
> some other release) will or won't contain some feature, API, etc. or that
> some issue is or is not a blocker or worth delaying for. When the PMC has not
> voted on such issues, I'm often left thinking, "Wait... what? Who decided
> that, or where did that decision come from?"
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>