No problem if we can't add them; this is experimental anyway, so this release should be more about validating the API and the start of our implementation. I just don't think we can recommend that anyone actually use DataSourceV2 without these patches.
On Wed, Feb 21, 2018 at 9:21 AM, Wenchen Fan <cloud0...@gmail.com> wrote:
> SPARK-23323 adds a new API; I'm not sure we can still do it at this stage of the release. Besides, users can work around it by calling the Spark output coordinator themselves in their data source.
>
> SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard to convince other people that it's safe to add it to the release during the RC phase.
>
> SPARK-23418 depends on the above one.
>
> Generally, they would have been good to have in Spark 2.3 if they had been merged before the RC. I think this is a lesson we should learn from: we should work on stuff we want in the release before the RC, instead of after.
>
> On Thu, Feb 22, 2018 at 1:01 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> What does everyone think about getting some of the newer DataSourceV2 improvements in? It should be low risk because it is a new code path, and v2 isn't very usable without things like support for using the output commit coordinator to deconflict writes.
>>
>> The ones I'd like to get in are:
>> * Use the output commit coordinator: https://issues.apache.org/jira/browse/SPARK-23323
>> * Use immutable trees and the same push-down logic as other read paths: https://issues.apache.org/jira/browse/SPARK-23203
>> * Don't allow users to supply schemas when they aren't supported: https://issues.apache.org/jira/browse/SPARK-23418
>>
>> I think it would make the 2.3.0 release more usable for anyone interested in the v2 read and write paths.
>>
>> Thanks!
>>
>> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <weichen...@databricks.com> wrote:
>>
>>> +1
>>>
>>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>
>>>> Done, thanks!
>>>>
>>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <samee...@apache.org> wrote:
>>>> > Sure, please feel free to backport.
>>>> >
>>>> > On 20 February 2018 at 18:02, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>> >>
>>>> >> Hey Sameer,
>>>> >>
>>>> >> Mind including https://github.com/apache/spark/pull/20643 (SPARK-23468) in the new RC? It's a minor bug since I've only hit it with older shuffle services, but it's pretty safe.
>>>> >>
>>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <samee...@apache.org> wrote:
>>>> >> > This RC has failed due to https://issues.apache.org/jira/browse/SPARK-23470.
>>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow up with an RC5 soon.
>>>> >> >
>>>> >> > On 20 February 2018 at 16:49, Ryan Blue <rb...@netflix.com> wrote:
>>>> >> >>
>>>> >> >> +1
>>>> >> >>
>>>> >> >> Build & tests look fine, checked signature and checksums for src tarball.
>>>> >> >>
>>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote:
>>>> >> >>>
>>>> >> >>> I'm -1 because of the UI regression https://issues.apache.org/jira/browse/SPARK-23470: the All Jobs page may be too slow and cause "read timeout" when there are lots of jobs and stages. This is one of the most important pages because when it's broken, it's pretty hard to use the Spark Web UI.
>>>> >> >>>
>>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaid...@gmail.com> wrote:
>>>> >> >>>>
>>>> >> >>>> +1
>>>> >> >>>>
>>>> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls...@gmail.com>:
>>>> >> >>>>>
>>>> >> >>>>> +1 too
>>>> >> >>>>>
>>>> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ues...@happy-camper.st>:
>>>> >> >>>>>>
>>>> >> >>>>>> +1
>>>> >> >>>>>>
>>>> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>>> >> >>>>>>>
>>>> >> >>>>>>> +1
>>>> >> >>>>>>>
>>>> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> +1
>>>> >> >>>>>>>>
>>>> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <r...@databricks.com> wrote:
>>>> >> >>>>>>>>>
>>>> >> >>>>>>>>> +1
>>>> >> >>>>>>>>>
>>>> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <sameer.a...@gmail.com>, wrote:
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> This file shouldn't be included?
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>> >> >>>>>>>>>
>>>> >> >>>>>>>>> I've now deleted this file.
>>>> >> >>>>>>>>>
>>>> >> >>>>>>>>>> From: Sameer Agarwal <sameer.a...@gmail.com>
>>>> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
>>>> >> >>>>>>>>>> To: Sameer Agarwal
>>>> >> >>>>>>>>>> Cc: dev
>>>> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> I'll start with a +1 once again.
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the builds are healthy.
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal <samee...@apache.org> wrote:
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark version 2.3.0. The vote is open until Thursday, February 22, 2018 at 8:00:00 AM UTC and passes if a majority of at least 3 PMC +1 votes are cast.
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4: https://github.com/apache/spark/tree/v2.3.0-rc4 (44095cb65500739695b0324c177c19dfa1471472)
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> The list of JIRA tickets resolved in this release can be found here: https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> The release files, including signatures, digests, etc., can be found at: https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> Release artifacts are signed with the following key: https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> The documentation corresponding to this release can be found at: https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> FAQ
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> =======================================
>>>> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
>>>> >> >>>>>>>>>>> =======================================
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi. At the time of writing, there are no known release blockers.
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> =========================
>>>> >> >>>>>>>>>>> How can I help test this release?
>>>> >> >>>>>>>>>>> =========================
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and then reporting any regressions.
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> ===========================================
>>>> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>>>> >> >>>>>>>>>>> ===========================================
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Please retarget everything else to 2.3.1 or 2.4.0 as appropriate.
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> ===================
>>>> >> >>>>>>>>>>> Why is my bug not fixed?
>>>> >> >>>>>>>>>>> ===================
>>>> >> >>>>>>>>>>>
>>>> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from 2.2.0. That being said, if there is something which is a regression from 2.2.0 and has not been correctly targeted, please ping me or a committer to help target the issue (you can see the open issues listed as impacting Spark 2.3.0 at https://s.apache.org/WmoI).
>>>> >> >>>>>>>>>>
>>>> >> >>>>>>>>>> --
>>>> >> >>>>>>>>>> Sameer Agarwal
>>>> >> >>>>>>>>>> Computer Science | UC Berkeley
>>>> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>> >> >>>>>>>>>
>>>> >> >>>>>>>>> --
>>>> >> >>>>>>>>> Sameer Agarwal
>>>> >> >>>>>>>>> Computer Science | UC Berkeley
>>>> >> >>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>> >> >>>>>>
>>>> >> >>>>>> --
>>>> >> >>>>>> Takuya UESHIN
>>>> >> >>>>>> Tokyo, Japan
>>>> >> >>>>>>
>>>> >> >>>>>> http://twitter.com/ueshin
>>>> >> >>
>>>> >> >> --
>>>> >> >> Ryan Blue
>>>> >> >> Software Engineer
>>>> >> >> Netflix
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>>
>>>> --
>>>> Marcelo
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Ryan Blue
Software Engineer
Netflix