Re: [DISCUSS] Flink 1.6 features

sagar loke Sat, 16 Jun 2018 20:12:28 -0700

We are eagerly waiting for

  - Extends Streaming Sinks:
     - Bucketing Sink should support S3 properly (compensate for eventual
consistency), work with Flink's shaded S3 file systems, and efficiently
support formats that compress/index arcoss individual rows (Parquet, ORC,
...)


Especially for ORC and Parquet sinks. Since, We are planning to use
Kafka-jdbc to move data from rdbms to hdfs.

Thanks,

On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <fearsome.lucid...@gmail.com>
wrote:

> One more, since it we have to deal with it often:
>
> - Idling sources (Kafka in particular) and proper watermark propagation:
> FLINK-5018 / FLINK-5479
>
> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
>
>> Since wishes are free:
>>
>> - Standalone cluster job isolation:
>> https://issues.apache.org/jira/browse/FLINK-8886
>> - Proper sliding window joins (not overlapping hoping window joins):
>> https://issues.apache.org/jira/browse/FLINK-6243
>> - Sharing state across operators:
>> https://issues.apache.org/jira/browse/FLINK-6239
>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558
>>
>> Seconded:
>> - Atomic cancel-with-savepoint:
>> https://issues.apache.org/jira/browse/FLINK-7634
>> - Support dynamically changing CEP patterns :
>> https://issues.apache.org/jira/browse/FLINK-7129
>>
>>
>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <se...@apache.org> wrote:
>>
>>> Hi all!
>>>
>>> Thanks for the discussion and good input. Many suggestions fit well with
>>> the proposal above.
>>>
>>> Please bear in mind that with a time-based release model, we would
>>> release whatever is mature by end of July.
>>> The good thing is we could schedule the next release not too far after
>>> that, so that the features that did not quite make it will not be delayed
>>> too long.
>>> In some sense, you could read this as as "*what to do first*" list,
>>> rather than "*this goes in, other things stay out"*.
>>>
>>> Some thoughts on some of the suggestions
>>>
>>> *Kubernetes integration:* An opaque integration with Kubernetes should
>>> be supported through the "as a library" mode. For a deeper integration, I
>>> know that some committers have experimented with some PoC code. I would let
>>> Till add some thoughts, he has worked the most on the deployment parts
>>> recently.
>>>
>>> *Per partition watermarks with idleness:* Good point, could one
>>> implement that on the current interface, with a periodic watermark
>>> extractor?
>>>
>>> *Atomic cancel-with-savepoint:* Agreed, this is important. Making this
>>> work with all sources needs a bit more work. We should have this in the
>>> roadmap.
>>>
>>> *Elastic Bloomfilters:* This seems like an interesting new feature -
>>> the above suggested feature set was more about addressing some longer
>>> standing issues/requests. However, nothing should prevent contributors to
>>> work on that.
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <
>>> yz...@coupang.com> wrote:
>>>
>>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479
>>>> [FLINK-5479] Per-partition watermarks in ...
>>>> <https://issues.apache.org/jira/browse/FLINK-5479>
>>>> issues.apache.org
>>>> Reported in ML:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
>>>> It's normally not a common case to have Kafka partitions not producing any
>>>> data, but it'll probably be good to handle this as well. I ...
>>>>
>>>> ------------------------------
>>>> *From:* Rico Bergmann <i...@ricobergmann.de>
>>>> *Sent:* Tuesday, June 5, 2018 9:12:00 PM
>>>> *To:* Hao Sun
>>>> *Cc:* d...@flink.apache.org; user
>>>> *Subject:* Re: [DISCUSS] Flink 1.6 features
>>>>
>>>> +1 on K8s integration
>>>>
>>>>
>>>>
>>>> Am 06.06.2018 um 00:01 schrieb Hao Sun <ha...@zendesk.com>:
>>>>
>>>> adding my vote to K8S Job mode, maybe it is this?
>>>> > Smoothen the integration in Container environment, like "Flink as a
>>>> Library", and easier integration with Kubernetes services and other 
>>>> proxies.
>>>>
>>>>
>>>>
>>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <yan.xiao.bin.m...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Stephan,
>>>>
>>>> Will  [ https://issues.apache.org/jira/browse/FLINK-5479 ]
>>>> (Per-partition watermarks in FlinkKafkaConsumer should consider idle
>>>> partitions) be included in 1.6? As we are seeing more users with this
>>>> issue on the mailing lists.
>>>>
>>>> Thanks.
>>>> Ben
>>>>
>>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <sh...@us.ibm.com>:
>>>>
>>>> Hi Stephan,
>>>>
>>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included
>>>> in 1.6? There were discussions about possibly including it in 1.6:
>>>>
>>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e
>>>>
>>>> Thanks,
>>>> Shirley Shum
>>>>
>>>> [image: Inactive hide details for Stephan Ewen ---06/04/2018 02:21:47
>>>> AM---Hi Flink Community! The release of Apache Flink 1.5 has happ]Stephan
>>>> Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of Apache
>>>> Flink 1.5 has happened (yay!) - so it is a good time
>>>>
>>>> From: Stephan Ewen <se...@apache.org>
>>>> To: d...@flink.apache.org, user <user@flink.apache.org>
>>>> Date: 06/04/2018 02:21 AM
>>>> Subject: [DISCUSS] Flink 1.6 features
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>> Hi Flink Community!
>>>>
>>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good
>>>> time to start talking about what to do for release 1.6.
>>>>
>>>> *== Suggested release timeline ==*
>>>>
>>>> I would propose to release around *end of July* (that is 8-9 weeks
>>>> from now).
>>>>
>>>> The rational behind that: There was a lot of effort in release testing
>>>> automation (end-to-end tests, scripted stress tests) as part of release
>>>> 1.5. You may have noticed the big set of new modules under
>>>> "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5
>>>> release a bit, and needs to continue as part of the coming release cycle,
>>>> but should help make releasing more lightweight from now on.
>>>>
>>>> (Side note: There are also some nightly stress tests that we created
>>>> and run at data Artisans, and where we are looking whether and in which way
>>>> it would make sense to contribute them to Flink.)
>>>>
>>>> *== Features and focus areas ==*
>>>>
>>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the
>>>> new network stack, recovery, SQL joins and client, ... Following something
>>>> like a "tick-tock-model", I would suggest to focus the next release more on
>>>> integrations, tooling, and reducing user friction.
>>>>
>>>> Of course, this does not mean that no other pull request gets reviewed,
>>>> an no other topic will be examined - it is simply meant as a help to
>>>> understand where to expect more activity during the next release cycle.
>>>> Note that these are really the coarse focus areas - don't read this as a
>>>> comprehensive list.
>>>>
>>>> This list is my first suggestion, based on discussions with committers,
>>>> users, and mailing list questions.
>>>>
>>>>   - Support Java 9 and Scala 2.12
>>>>
>>>>   - Smoothen the integration in Container environment, like "Flink as a
>>>> Library", and easier integration with Kubernetes services and other 
>>>> proxies.
>>>>
>>>>   - Polish the remaing parts of the FLIP-6 rewrite
>>>>
>>>>   - Improve state backends with asynchronous timer snapshots, efficient
>>>> timer deletes, state TTL, and broadcast state support in RocksDB.
>>>>
>>>>   - Extends Streaming Sinks:
>>>>      - Bucketing Sink should support S3 properly (compensate for
>>>> eventual consistency), work with Flink's shaded S3 file systems, and
>>>> efficiently support formats that compress/index arcoss individual rows
>>>> (Parquet, ORC, ...)
>>>>      - Support ElasticSearch's new REST API
>>>>
>>>>   - Smoothen State Evolution to support type conversion on snapshot
>>>> restore
>>>>
>>>>   - Enhance Stream SQL and CEP
>>>>      - Add support for "update by key" Table Sources
>>>>      - Add more table sources and sinks (Kafka, Kinesis, Files, K/V
>>>> stores)
>>>>      - Expand SQL client
>>>>      - Integrate CEP and SQL, through MATCH_RECOGNIZE clause
>>>>      - Improve CEP Performance of SharedBuffer on RocksDB
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
> --
Cheers,
Sagar

Re: [DISCUSS] Flink 1.6 features

Reply via email to