Barrier mode seems like a high-impact feature touching Spark's core code: is one additional week enough time to properly vet this feature?
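
For reference, a rough sketch of the API SPARK-24374 proposes (names follow the JIRA design and may still change while the feature is in progress; assumes the spark-shell's `sc`):

    import org.apache.spark.BarrierTaskContext

    // Barrier stage sketch per the SPARK-24374 design: all four tasks are
    // scheduled together (or not at all), and each can wait at a global sync
    // point, e.g. before handing control to an external deep-learning framework.
    val result = sc.parallelize(1 to 100, numSlices = 4)
      .barrier()                      // mark this stage as a barrier stage
      .mapPartitions { iter =>
        val ctx = BarrierTaskContext.get()
        // ... start an external worker process here ...
        ctx.barrier()                 // block until every task in the stage reaches this point
        iter
      }
      .collect()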
On Tue, Jul 31, 2018 at 7:10 AM, Joseph Torres <joseph.tor...@databricks.com> wrote:

> Full continuous processing aggregation support ran into unanticipated
> scalability and scheduling problems. We’re planning to overcome those by
> using some of the barrier execution machinery, but since barrier execution
> itself is still in progress, the full support isn’t going to make it into
> 2.4.
>
> Jose
>
> On Tue, Jul 31, 2018 at 6:07 AM Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote:
>
>> Hi,
>>
>> what is the status of Continuous Processing + Aggregations? As far as I
>> remember, Jose Torres said it should be easy to perform aggregations if
>> coalesce(1) works. IIRC it's already merged to master.
>>
>> Is this work in progress? If yes, it would be great to have full
>> aggregation/join support in Spark 2.4 in CP.
>>
>> Pozdrawiam / Best regards,
>>
>> Tomek
>>
>>
>> On 2018-07-31 10:43, Petar Zečević wrote:
>> > This one is important to us:
>> > https://issues.apache.org/jira/browse/SPARK-24020 (Sort-merge join
>> > inner range optimization), but I think it could be useful to others too.
>> >
>> > It is finished and is ready to be merged (was ready a month ago at least).
>> >
>> > Do you think you could consider including it in 2.4?
>> >
>> > Petar
>> >
>> >
>> > Wenchen Fan @ 1970-01-01 01:00 CET:
>> >
>> >> I went through the open JIRA tickets and here is a list that we should
>> >> consider for Spark 2.4:
>> >>
>> >> High Priority:
>> >>
>> >> SPARK-24374: Support Barrier Execution Mode in Apache Spark
>> >> This one is critical to the Spark ecosystem for deep learning. It only
>> >> has a few remaining work items, and I think we should have it in Spark 2.4.
>> >>
>> >> Middle Priority:
>> >>
>> >> SPARK-23899: Built-in SQL Function Improvement
>> >> We've already added a lot of built-in functions in this release, but
>> >> there are a few useful higher-order functions in progress, like
>> >> `array_except`, `transform`, etc. It would be great if we could get them
>> >> into Spark 2.4.
>> >>
>> >> SPARK-14220: Build and test Spark against Scala 2.12
>> >> Very close to finishing; great to have it in Spark 2.4.
>> >>
>> >> SPARK-4502: Spark SQL reads unnecessary nested fields from Parquet
>> >> This one has been there for years (thanks for your patience, Michael!),
>> >> and is also close to finishing. Great to have it in 2.4.
>> >>
>> >> SPARK-24882: Data source v2 API improvement
>> >> This is to improve the data source v2 API based on what we learned
>> >> during this release. From the migration of existing sources and the
>> >> design of new features, we found some problems in the API and want to
>> >> address them. I believe this should be the last significant API change
>> >> to data source v2, so great to have it in Spark 2.4. I'll send a discuss
>> >> email about it later.
>> >>
>> >> SPARK-24252: Add catalog support in Data Source V2
>> >> This is a very important feature for data source v2, and is currently
>> >> being discussed on the dev list.
>> >>
>> >> SPARK-24768: Have a built-in AVRO data source implementation
>> >> Most of it is done, but date/timestamp support is still missing. Great
>> >> to have it in 2.4.
>> >>
>> >> SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
>> >> This is a long-standing correctness bug; great to have it in 2.4.
>> >>
>> >> There are some other important features, like adaptive execution and
>> >> streaming SQL, that are not in the list, since I think we are not able
>> >> to finish them before 2.4.
>> >>
>> >> Feel free to add more things if you think they are important to Spark
>> >> 2.4 by replying to this email.
>> >>
>> >> Thanks,
>> >> Wenchen
>> >>
>> >> On Mon, Jul 30, 2018 at 11:00 PM Sean Owen <sro...@apache.org> wrote:
>> >>
>> >> In theory releases happen on a time-based cadence, so it's pretty much:
>> >> wrap up what's ready by the code freeze and ship it. In practice, the
>> >> cadence slips frequently, and it's very much a negotiation about what
>> >> features should push the code freeze out a few weeks every time. So,
>> >> kind of a hybrid approach here that works OK.
>> >>
>> >> Certainly speak up if you think there's something that really needs to
>> >> get into 2.4. This is that discuss thread.
>> >>
>> >> (BTW I updated the page you mention just yesterday, to reflect the plan
>> >> suggested in this thread.)
>> >>
>> >> On Mon, Jul 30, 2018 at 9:51 AM Tom Graves <tgraves...@yahoo.com.invalid> wrote:
>> >>
>> >> Shouldn't this be a discuss thread?
>> >>
>> >> I'm also happy to see more release managers and agree the time is
>> >> getting close, but we should see what features are in progress, see how
>> >> close things are, and propose a date based on that. Cutting a branch too
>> >> soon just creates more work for committers to push to more branches.
>> >>
>> >> http://spark.apache.org/versioning-policy.html mentioned the code
>> >> freeze and release branch cut in mid-August.
>> >>
>> >> Tom
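As a quick illustration of the higher-order functions mentioned under SPARK-23899, a small local sketch of what `array_except` and `transform` do (semantics as they eventually shipped in Spark 2.4; the session setup here is only for the sketch):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("hof-sketch").getOrCreate()

    // array_except: elements of the first array that are not in the second
    spark.sql("SELECT array_except(array(1, 2, 3), array(2, 3, 4))").show()  // [1]

    // transform: apply a lambda to every element of an array
    spark.sql("SELECT transform(array(1, 2, 3), x -> x + 1)").show()         // [2, 3, 4]

    spark.stop()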