Barrier mode seems like a high-impact feature touching Spark's core code: is one additional week enough time to properly vet this feature?
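
For reference, a rough sketch of the API SPARK-24374 proposes (names follow the JIRA design and may still change while the feature is in progress; assumes the spark-shell's `sc`):

    import org.apache.spark.BarrierTaskContext

    // Barrier stage sketch per the SPARK-24374 design: all four tasks are
    // scheduled together (or not at all), and each can wait at a global sync
    // point, e.g. before handing control to an external deep-learning framework.
    val result = sc.parallelize(1 to 100, numSlices = 4)
      .barrier()                      // mark this stage as a barrier stage
      .mapPartitions { iter =>
        val ctx = BarrierTaskContext.get()
        // ... start an external worker process here ...
        ctx.barrier()                 // block until every task in the stage reaches this point
        iter
      }
      .collect()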
On Tue, Jul 31, 2018 at 7:10 AM, Joseph Torres <joseph.tor...@databricks.com> wrote:

> Full continuous processing aggregation support ran into unanticipated
> scalability and scheduling problems. We’re planning to overcome those by
> using some of the barrier execution machinery, but since barrier execution
> itself is still in progress, the full support isn’t going to make it into
> 2.4.
>
> Jose
>
> On Tue, Jul 31, 2018 at 6:07 AM Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote:
>
>> Hi,
>>
>> what is the status of Continuous Processing + Aggregations? As far as I
>> remember, Jose Torres said it should be easy to perform aggregations if
>> coalesce(1) works. IIRC it's already merged to master.
>>
>> Is this work in progress? If yes, it would be great to have full
>> aggregation/join support in Spark 2.4 in CP.
>>
>> Pozdrawiam / Best regards,
>>
>> Tomek
>>
>>
>> On 2018-07-31 10:43, Petar Zečević wrote:
>> > This one is important to us:
>> > https://issues.apache.org/jira/browse/SPARK-24020 (Sort-merge join
>> > inner range optimization), but I think it could be useful to others too.
>> >
>> > It is finished and is ready to be merged (was ready a month ago at least).
>> >
>> > Do you think you could consider including it in 2.4?
>> >
>> > Petar
>> >
>> >
>> > Wenchen Fan @ 1970-01-01 01:00 CET:
>> >
>> >> I went through the open JIRA tickets and here is a list that we should
>> >> consider for Spark 2.4:
>> >>
>> >> High Priority:
>> >>
>> >> SPARK-24374: Support Barrier Execution Mode in Apache Spark
>> >> This one is critical to the Spark ecosystem for deep learning. It only
>> >> has a few remaining work items, and I think we should have it in Spark 2.4.
>> >>
>> >> Middle Priority:
>> >>
>> >> SPARK-23899: Built-in SQL Function Improvement
>> >> We've already added a lot of built-in functions in this release, but
>> >> there are a few useful higher-order functions in progress, like
>> >> `array_except`, `transform`, etc. It would be great if we could get them
>> >> into Spark 2.4.
>> >>
>> >> SPARK-14220: Build and test Spark against Scala 2.12
>> >> Very close to finishing; great to have it in Spark 2.4.
>> >>
>> >> SPARK-4502: Spark SQL reads unnecessary nested fields from Parquet
>> >> This one has been there for years (thanks for your patience, Michael!),
>> >> and is also close to finishing. Great to have it in 2.4.
>> >>
>> >> SPARK-24882: Data source v2 API improvement
>> >> This is to improve the data source v2 API based on what we learned
>> >> during this release. From the migration of existing sources and the
>> >> design of new features, we found some problems in the API and want to
>> >> address them. I believe this should be the last significant API change
>> >> to data source v2, so great to have it in Spark 2.4. I'll send a discuss
>> >> email about it later.
>> >>
>> >> SPARK-24252: Add catalog support in Data Source V2
>> >> This is a very important feature for data source v2, and is currently
>> >> being discussed on the dev list.
>> >>
>> >> SPARK-24768: Have a built-in AVRO data source implementation
>> >> Most of it is done, but date/timestamp support is still missing. Great
>> >> to have it in 2.4.
>> >>
>> >> SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
>> >> This is a long-standing correctness bug; great to have it in 2.4.
>> >>
>> >> There are some other important features, like adaptive execution and
>> >> streaming SQL, that are not in the list, since I think we are not able
>> >> to finish them before 2.4.
>> >>
>> >> Feel free to add more things if you think they are important to Spark
>> >> 2.4 by replying to this email.
>> >>
>> >> Thanks,
>> >> Wenchen
>> >>
>> >> On Mon, Jul 30, 2018 at 11:00 PM Sean Owen <sro...@apache.org> wrote:
>> >>
>> >> In theory releases happen on a time-based cadence, so it's pretty much:
>> >> wrap up what's ready by the code freeze and ship it. In practice, the
>> >> cadence slips frequently, and it's very much a negotiation about what
>> >> features should push the code freeze out a few weeks every time. So,
>> >> kind of a hybrid approach here that works OK.
>> >>
>> >> Certainly speak up if you think there's something that really needs to
>> >> get into 2.4. This is that discuss thread.
>> >>
>> >> (BTW I updated the page you mention just yesterday, to reflect the plan
>> >> suggested in this thread.)
>> >>
>> >> On Mon, Jul 30, 2018 at 9:51 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>> >>
>> >> Shouldn't this be a discuss thread?
>> >>
>> >> I'm also happy to see more release managers and agree the time is
>> >> getting close, but we should see what features are in progress, see how
>> >> close things are, and propose a date based on that. Cutting a branch too
>> >> soon just creates more work for committers to push to more branches.
>> >>
>> >> http://spark.apache.org/versioning-policy.html mentioned the code
>> >> freeze and release branch cut in mid-August.
>> >>
>> >> Tom
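As a quick illustration of the higher-order functions mentioned under SPARK-23899, a small local sketch of what `array_except` and `transform` do (semantics as they eventually shipped in Spark 2.4; the session setup here is only for the sketch):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("hof-sketch").getOrCreate()

    // array_except: elements of the first array that are not in the second
    spark.sql("SELECT array_except(array(1, 2, 3), array(2, 3, 4))").show()  // [1]

    // transform: apply a lambda to every element of an array
    spark.sql("SELECT transform(array(1, 2, 3), x -> x + 1)").show()         // [2, 3, 4]

    spark.stop()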