Re: [DISCUSS] Flink 1.6 features

zhangminglei Sun, 17 Jun 2018 06:45:33 -0700

But if we do hive on flink , I think it should be a very big project.


> 在 2018年6月17日，下午9:36，Will Du <will...@gmail.com> 写道：
> 
> Agree, two missing pieces I think could make Flink more competitive against 
> Spark SQL/Stream and Kafka Stream
> 1. Flink over Hive or Flink SQL hive table source and sink
> 2. Flink ML on stream
> 
> 
>> On Jun 17, 2018, at 8:34 AM, zhangminglei <18717838...@163.com> wrote:
>> 
>> Actually, I have been an idea, how about support hive on flink ? Since lots 
>> of business are written by hive sql. And users wants to transform map reduce 
>> to fink without changing the sql.
>> 
>> Zhangminglei
>> 
>> 
>> 
>>> 在 2018年6月17日，下午8:11，zhangminglei <18717838...@163.com> 写道：
>>> 
>>> Hi, Sagar
>>> 
>>> There already has relative JIRAs for ORC and Parquet, you can take a look 
>>> here: 
>>> 
>>> https://issues.apache.org/jira/browse/FLINK-9407 
>>> <https://issues.apache.org/jira/browse/FLINK-9407> and 
>>> https://issues.apache.org/jira/browse/FLINK-9411 
>>> <https://issues.apache.org/jira/browse/FLINK-9411>
>>> 
>>> For ORC format, Currently only support basic data types, such as Long, 
>>> Boolean, Short, Integer, Float, Double, String. 
>>> 
>>> Best
>>> Zhangminglei
>>> 
>>> 
>>> 
>>>> 在 2018年6月17日，上午11:11，sagar loke <sagar...@gmail.com> 写道：
>>>> 
>>>> We are eagerly waiting for 
>>>> 
>>>> - Extends Streaming Sinks:
>>>>   - Bucketing Sink should support S3 properly (compensate for eventual 
>>>> consistency), work with Flink's shaded S3 file systems, and efficiently 
>>>> support formats that compress/index arcoss individual rows (Parquet, ORC, 
>>>> ...)
>>>> 
>>>> Especially for ORC and Parquet sinks. Since, We are planning to use 
>>>> Kafka-jdbc to move data from rdbms to hdfs. 
>>>> 
>>>> Thanks,
>>>> 
>>>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <fearsome.lucid...@gmail.com 
>>>> <mailto:fearsome.lucid...@gmail.com>> wrote:
>>>> One more, since it we have to deal with it often:
>>>> 
>>>> - Idling sources (Kafka in particular) and proper watermark propagation: 
>>>> FLINK-5018 / FLINK-5479
>>>> 
>>>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <fearsome.lucid...@gmail.com 
>>>> <mailto:fearsome.lucid...@gmail.com>> wrote:
>>>> Since wishes are free:
>>>> 
>>>> - Standalone cluster job isolation: 
>>>> https://issues.apache.org/jira/browse/FLINK-8886 
>>>> <https://issues.apache.org/jira/browse/FLINK-8886>
>>>> - Proper sliding window joins (not overlapping hoping window joins): 
>>>> https://issues.apache.org/jira/browse/FLINK-6243 
>>>> <https://issues.apache.org/jira/browse/FLINK-6243>
>>>> - Sharing state across operators: 
>>>> https://issues.apache.org/jira/browse/FLINK-6239 
>>>> <https://issues.apache.org/jira/browse/FLINK-6239>
>>>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 
>>>> <https://issues.apache.org/jira/browse/FLINK-4558>
>>>> 
>>>> Seconded:
>>>> - Atomic cancel-with-savepoint: 
>>>> https://issues.apache.org/jira/browse/FLINK-7634 
>>>> <https://issues.apache.org/jira/browse/FLINK-7634>
>>>> - Support dynamically changing CEP patterns : 
>>>> https://issues.apache.org/jira/browse/FLINK-7129 
>>>> <https://issues.apache.org/jira/browse/FLINK-7129>
>>>> 
>>>> 
>>>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <se...@apache.org 
>>>> <mailto:se...@apache.org>> wrote:
>>>> Hi all!
>>>> 
>>>> Thanks for the discussion and good input. Many suggestions fit well with 
>>>> the proposal above.
>>>> 
>>>> Please bear in mind that with a time-based release model, we would release 
>>>> whatever is mature by end of July.
>>>> The good thing is we could schedule the next release not too far after 
>>>> that, so that the features that did not quite make it will not be delayed 
>>>> too long.
>>>> In some sense, you could read this as as "what to do first" list, rather 
>>>> than "this goes in, other things stay out".
>>>> 
>>>> Some thoughts on some of the suggestions
>>>> 
>>>> Kubernetes integration: An opaque integration with Kubernetes should be 
>>>> supported through the "as a library" mode. For a deeper integration, I 
>>>> know that some committers have experimented with some PoC code. I would 
>>>> let Till add some thoughts, he has worked the most on the deployment parts 
>>>> recently.
>>>> 
>>>> Per partition watermarks with idleness: Good point, could one implement 
>>>> that on the current interface, with a periodic watermark extractor?
>>>> 
>>>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work 
>>>> with all sources needs a bit more work. We should have this in the roadmap.
>>>> 
>>>> Elastic Bloomfilters: This seems like an interesting new feature - the 
>>>> above suggested feature set was more about addressing some longer standing 
>>>> issues/requests. However, nothing should prevent contributors to work on 
>>>> that.
>>>> 
>>>> Best,
>>>> Stephan
>>>> 
>>>> 
>>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <yz...@coupang.com 
>>>> <mailto:yz...@coupang.com>> wrote:
>>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 
>>>> <https://issues.apache.org/jira/browse/FLINK-5479>
>>>> [FLINK-5479] Per-partition watermarks in ... 
>>>> <https://issues.apache.org/jira/browse/FLINK-5479>
>>>> issues.apache.org <http://issues.apache.org/>
>>>> Reported in ML: 
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html
>>>>  
>>>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html>
>>>>  It's normally not a common case to have Kafka partitions not producing 
>>>> any data, but it'll probably be good to handle this as well. I ...
>>>> 
>>>> From: Rico Bergmann <i...@ricobergmann.de <mailto:i...@ricobergmann.de>>
>>>> Sent: Tuesday, June 5, 2018 9:12:00 PM
>>>> To: Hao Sun
>>>> Cc: d...@flink.apache.org <mailto:d...@flink.apache.org>; user
>>>> Subject: Re: [DISCUSS] Flink 1.6 features
>>>> 
>>>> +1 on K8s integration 
>>>> 
>>>> 
>>>> 
>>>> Am 06.06.2018 um 00:01 schrieb Hao Sun <ha...@zendesk.com 
>>>> <mailto:ha...@zendesk.com>>:
>>>> 
>>>>> adding my vote to K8S Job mode, maybe it is this?
>>>>>> Smoothen the integration in Container environment, like "Flink as a 
>>>>>> Library", and easier integration with Kubernetes services and other 
>>>>>> proxies.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <yan.xiao.bin.m...@gmail.com 
>>>>> <mailto:yan.xiao.bin.m...@gmail.com>> wrote:
>>>>> Hi Stephan,
>>>>> 
>>>>> Will  [ https://issues.apache.org/jira/browse/FLINK-5479 
>>>>> <https://issues.apache.org/jira/browse/FLINK-5479> ]  (Per-partition 
>>>>> watermarks in FlinkKafkaConsumer should consider idle partitions) be 
>>>>> included in 1.6? As we are seeing more users with this issue on the 
>>>>> mailing lists.
>>>>> 
>>>>> Thanks.
>>>>> Ben
>>>>> 
>>>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <sh...@us.ibm.com 
>>>>> <mailto:sh...@us.ibm.com>>:
>>>>> Hi Stephan,
>>>>> 
>>>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included 
>>>>> in 1.6? There were discussions about possibly including it in 1.6: 
>>>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e
>>>>>  
>>>>> <http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=ou7gru2o9jtowxn1lc1f7nkcxayn6a3e58kxctb4b50...@mail.gmail.com%3e>
>>>>> 
>>>>> Thanks,
>>>>> Shirley Shum
>>>>> 
>>>>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release 
>>>>> of Apache Flink 1.5 has happened (yay!) - so it is a good time
>>>>> 
>>>>> From: Stephan Ewen <se...@apache.org <mailto:se...@apache.org>>
>>>>> To: d...@flink.apache.org <mailto:d...@flink.apache.org>, user 
>>>>> <user@flink.apache.org <mailto:user@flink.apache.org>>
>>>>> Date: 06/04/2018 02:21 AM
>>>>> Subject: [DISCUSS] Flink 1.6 features
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Flink Community!
>>>>> 
>>>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good 
>>>>> time to start talking about what to do for release 1.6.
>>>>> 
>>>>> == Suggested release timeline ==
>>>>> 
>>>>> I would propose to release around end of July (that is 8-9 weeks from 
>>>>> now).
>>>>> 
>>>>> The rational behind that: There was a lot of effort in release testing 
>>>>> automation (end-to-end tests, scripted stress tests) as part of release 
>>>>> 1.5. You may have noticed the big set of new modules under 
>>>>> "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 
>>>>> release a bit, and needs to continue as part of the coming release cycle, 
>>>>> but should help make releasing more lightweight from now on.
>>>>> 
>>>>> (Side note: There are also some nightly stress tests that we created and 
>>>>> run at data Artisans, and where we are looking whether and in which way 
>>>>> it would make sense to contribute them to Flink.)
>>>>> 
>>>>> == Features and focus areas ==
>>>>> 
>>>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new 
>>>>> network stack, recovery, SQL joins and client, ... Following something 
>>>>> like a "tick-tock-model", I would suggest to focus the next release more 
>>>>> on integrations, tooling, and reducing user friction. 
>>>>> 
>>>>> Of course, this does not mean that no other pull request gets reviewed, 
>>>>> an no other topic will be examined - it is simply meant as a help to 
>>>>> understand where to expect more activity during the next release cycle. 
>>>>> Note that these are really the coarse focus areas - don't read this as a 
>>>>> comprehensive list.
>>>>> 
>>>>> This list is my first suggestion, based on discussions with committers, 
>>>>> users, and mailing list questions.
>>>>> 
>>>>> - Support Java 9 and Scala 2.12
>>>>> 
>>>>> - Smoothen the integration in Container environment, like "Flink as a 
>>>>> Library", and easier integration with Kubernetes services and other 
>>>>> proxies.
>>>>> 
>>>>> - Polish the remaing parts of the FLIP-6 rewrite
>>>>> 
>>>>> - Improve state backends with asynchronous timer snapshots, efficient 
>>>>> timer deletes, state TTL, and broadcast state support in RocksDB.
>>>>> 
>>>>> - Extends Streaming Sinks:
>>>>>   - Bucketing Sink should support S3 properly (compensate for eventual 
>>>>> consistency), work with Flink's shaded S3 file systems, and efficiently 
>>>>> support formats that compress/index arcoss individual rows (Parquet, ORC, 
>>>>> ...)
>>>>>   - Support ElasticSearch's new REST API
>>>>> 
>>>>> - Smoothen State Evolution to support type conversion on snapshot restore
>>>>> 
>>>>> - Enhance Stream SQL and CEP
>>>>>   - Add support for "update by key" Table Sources
>>>>>   - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores)
>>>>>   - Expand SQL client
>>>>>   - Integrate CEP and SQL, through MATCH_RECOGNIZE clause
>>>>>   - Improve CEP Performance of SharedBuffer on RocksDB
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Cheers,
>>>> Sagar
>>> 
>> 
>>

Re: [DISCUSS] Flink 1.6 features

Reply via email to