Unbalanced processing of KeyedStream

2019-01-02 Thread Jozef Vilcek
Hello, I am facing a problem where a KeyedStream is poorly parallelised across workers when the number of keys is close to the parallelism. Some workers process zero keys, some more than one. This is because of `KeyGroupRangeAssignment.assignKeyToParallelOperator()` in `KeyGroupStreamPartitioner`, as
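For context, a minimal sketch of the skew described above, assuming flink-runtime on the classpath and using small Integer keys purely for illustration: keys are hashed into one of maxParallelism key groups, and key groups are then mapped to operator subtasks, so with only a handful of keys several of them can land on the same subtask while others get none.

    import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

    public class KeySkewSketch {
        public static void main(String[] args) {
            int parallelism = 4;
            int maxParallelism = 128; // Flink's default max parallelism for small job parallelism

            // Count how many of the (few) keys each operator subtask receives.
            int[] keysPerSubtask = new int[parallelism];
            for (int key = 0; key < parallelism; key++) {
                int subtask = KeyGroupRangeAssignment
                        .assignKeyToParallelOperator(key, maxParallelism, parallelism);
                keysPerSubtask[subtask]++;
            }
            for (int i = 0; i < parallelism; i++) {
                System.out.println("subtask " + i + " gets " + keysPerSubtask[i] + " key(s)");
            }
        }
    }

Running this for 4 keys typically prints some subtasks with 0 keys and others with 2 or more, which is exactly the imbalance the thread reports.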

Re: subscribe to flink dev maillist

2019-01-02 Thread Chesnay Schepler
To subscribe to the dev mailing list, please send a mail to dev-subscr...@flink.apache.org On 02.01.2019 06:14, Zhang Shaoquan wrote: subscribe to flink dev maillist

Re: Apply for flink contributor permission

2019-01-02 Thread Chesnay Schepler
I've given you contributor permissions. On 02.01.2019 05:37, Haibo Sun wrote: Hi guys, Could anyone kindly give me contributor permission? My JIRA username is sunhaibotb. Thanks, Haibo

[jira] [Created] (FLINK-11248) Support Row/CRow state schema evolution

2019-01-02 Thread boshu Zheng (JIRA)
boshu Zheng created FLINK-11248: --- Summary: Support Row/CRow state schema evolution Key: FLINK-11248 URL: https://issues.apache.org/jira/browse/FLINK-11248 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-11249) FlinkKafkaProducer011 can not be migrated to FlinkKafkaProducer

2019-01-02 Thread Piotr Nowojski (JIRA)
Piotr Nowojski created FLINK-11249: -- Summary: FlinkKafkaProducer011 can not be migrated to FlinkKafkaProducer Key: FLINK-11249 URL: https://issues.apache.org/jira/browse/FLINK-11249 Project: Flink

[jira] [Created] (FLINK-11250) fix thread leak when StreamTask switched from DEPLOYING to CANCELING

2019-01-02 Thread lamber-ken (JIRA)
lamber-ken created FLINK-11250: -- Summary: fix thread leak when StreamTask switched from DEPLOYING to CANCELING Key: FLINK-11250 URL: https://issues.apache.org/jira/browse/FLINK-11250 Project: Flink

[jira] [Created] (FLINK-11251) Incompatible metric name on prometheus reporter

2019-01-02 Thread Wei-Che Wei (JIRA)
Wei-Che Wei created FLINK-11251: --- Summary: Incompatible metric name on prometheus reporter Key: FLINK-11251 URL: https://issues.apache.org/jira/browse/FLINK-11251 Project: Flink Issue Type: Bug

Re: [DISCUSSION] Complete restart after successive failures

2019-01-02 Thread Piotr Nowojski
Hi Gyula, Personally I do not see a problem with providing such an option of “clean restart” after N failures, especially if we set the default value for N to +infinity. However guys working more with Flink’s scheduling systems might have more to say about this. Piotrek > On 29 Dec 2018, at 1
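The "clean restart after N failures" option discussed here did not exist at the time; for reference, a minimal sketch of how restart behaviour is configured with the existing RestartStrategies API (the 3 attempts and 10-second delay are illustrative values, not defaults from this thread):

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import java.util.concurrent.TimeUnit;

    public class RestartConfigSketch {
        public static void main(String[] args) {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Existing behaviour: retry individual failures with a fixed delay.
            // The proposal in this thread would add a separate counter after which
            // the whole job is restarted from a clean state instead.
            env.setRestartStrategy(
                    RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));

            // ... build and execute the job here ...
        }
    }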

[jira] [Created] (FLINK-11252) Download page contains irrelevant "Scala 2.11" column

2019-01-02 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-11252: Summary: Download page contains irrelevant "Scala 2.11" column Key: FLINK-11252 URL: https://issues.apache.org/jira/browse/FLINK-11252 Project: Flink

Re: Unbalanced processing of KeyedStream

2019-01-02 Thread Ken Krugler
FWIW, if you want exactly one record per operator, then this code should generate key values that will be partitioned properly.
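The code Ken refers to is not included in this preview. A commonly used approach, sketched here as an assumption rather than Ken's actual snippet, is to search for key values that KeyGroupRangeAssignment routes to each subtask index and then key the stream by those values (the sketch assumes Integer keys; a real job would have to use the same key type end to end):

    import org.apache.flink.runtime.state.KeyGroupRangeAssignment;

    import java.util.HashMap;
    import java.util.Map;

    public class OperatorKeyFinder {

        // For each subtask index, find an Integer key that Flink's key-group
        // assignment will route to exactly that subtask.
        public static Map<Integer, Integer> findKeys(int parallelism, int maxParallelism) {
            Map<Integer, Integer> subtaskToKey = new HashMap<>();
            for (int candidate = 0; subtaskToKey.size() < parallelism; candidate++) {
                int subtask = KeyGroupRangeAssignment
                        .assignKeyToParallelOperator(candidate, maxParallelism, parallelism);
                subtaskToKey.putIfAbsent(subtask, candidate);
            }
            return subtaskToKey;
        }

        public static void main(String[] args) {
            findKeys(4, 128).forEach((subtask, key) ->
                    System.out.println("subtask " + subtask + " <- key " + key));
        }
    }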

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-02 Thread Zhang, Xuefu
Hi Eron, Happy New Year! Thank you very much for your contribution, especially during the holidays. While I'm encouraged by your work, I'd also like to share my thoughts on how to move forward. First, please note that the design discussion is still finalizing, and we expect some moderate chang

Re: Unbalanced processing of KeyedStream

2019-01-02 Thread Jozef Vilcek
Thanks Ken. Yes, a similar approach is suggested in the post I shared in my question. But to me it feels a bit hack-ish. I would like to know if this is the only solution with Flink, or do I miss something? Can there be more API-ish support for such a use case from Flink? Is there a reason why there is none? Or

Re: Unbalanced processing of KeyedStream

2019-01-02 Thread Ken Krugler
Hi Jozef, Processing just a few keys (# of keys ≅ # of operators) in Flink isn’t common, from what I’ve seen. Another possible option is to broadcast all records, and then in each operator decide what records to process, based on the operator index and the key value. Something like this in you
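The snippet is cut off here; below is a hedged reconstruction of the broadcast-and-filter idea, not Ken's actual code. The Event type and its key field are hypothetical stand-ins for whatever the job actually carries, and the hand-rolled modulo is only a deterministic way to pick one owner per key, not Flink's own key-group routing:

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.util.Collector;

    public class BroadcastAndFilter {

        // Hypothetical record type; stands in for whatever the job actually processes.
        public static class Event {
            public String key;
            public String payload;
        }

        public static DataStream<Event> spreadByKey(DataStream<Event> input) {
            return input
                    .broadcast() // every subtask sees every record
                    .flatMap(new RichFlatMapFunction<Event, Event>() {
                        @Override
                        public void flatMap(Event event, Collector<Event> out) {
                            int subtasks = getRuntimeContext().getNumberOfParallelSubtasks();
                            int thisSubtask = getRuntimeContext().getIndexOfThisSubtask();
                            // Keep only the records whose key maps to this subtask.
                            if (Math.floorMod(event.key.hashCode(), subtasks) == thisSubtask) {
                                out.collect(event);
                            }
                        }
                    });
        }
    }

This trades network traffic (every record is sent to every subtask) for an even spread of keys over subtasks.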

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-02 Thread Eron Wright
I propose that the community review and merge the PRs that I posted, and then evolve the design thru 1.8 and beyond. I think having a basic infrastructure in place now will accelerate the effort, do you agree? Thanks again! On Wed, Jan 2, 2019 at 11:20 AM Zhang, Xuefu wrote: > Hi Eron, > > Ha

Apply for flink contributor permission

2019-01-02 Thread peibin wang
Hi guys, Could anyone kindly give me the contributor permission? My JIRA username is wangpeibin. Thanks, Peibin

[jira] [Created] (FLINK-11253) Incorrect way to stop yarn session described in yarn_setup document

2019-01-02 Thread Tao Yang (JIRA)
Tao Yang created FLINK-11253: Summary: Incorrect way to stop yarn session described in yarn_setup document Key: FLINK-11253 URL: https://issues.apache.org/jira/browse/FLINK-11253 Project: Flink

[jira] [Created] (FLINK-11254) Unify serialization format of savepoint for switching state backends

2019-01-02 Thread Congxian Qiu (JIRA)
Congxian Qiu created FLINK-11254: Summary: Unify serialization format of savepoint for switching state backends Key: FLINK-11254 URL: https://issues.apache.org/jira/browse/FLINK-11254 Project: Flink

[jira] [Created] (FLINK-11255) RemoteStreamEnvironment

2019-01-02 Thread Benjamin Lee (JIRA)
Benjamin Lee created FLINK-11255: Summary: RemoteStreamEnvironment Key: FLINK-11255 URL: https://issues.apache.org/jira/browse/FLINK-11255 Project: Flink Issue Type: Improvement Com

[DISCUSS] Detection Flink Backpressure

2019-01-02 Thread 裴立平
Recently I have wanted to optimize the way to find the positions where backpressure occurs. I read some blogs about Flink backpressure and have a rough idea of it. The method Flink adopts is thread-stack sampling, which is heavy and not continuous. The positions where backpressure occurs are

[jira] [Created] (FLINK-11256) Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large

2019-01-02 Thread Haibo Suen (JIRA)
Haibo Suen created FLINK-11256: -- Summary: Referencing StreamNode objects directly in StreamEdge causes the sizes of JobGraph and TDD to become unnecessarily large Key: FLINK-11256 URL: https://issues.apache.org/jira/

Re: [DISCUSS] Detection Flink Backpressure

2019-01-02 Thread Yun Gao
Hello liping, Thank you for proposing to optimize the backpressure detection! From our previous experience, we think the InputBufferPoolUsageGauge and OutputBufferPoolUsageGauge may be useful for detecting backpressure: for a chain of tasks A ---> B ---> C, if we found that the OutputBuf
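The preview cuts off mid-sentence; under the usual reading of that heuristic (the first task whose upstream output buffer pool is saturated while its own output pool is not is the likely bottleneck), a small sketch applied to the buffers.outPoolUsage values of a linear task chain. The 0.9 threshold is an assumption, not a value from the thread:

    public class BackpressureHeuristic {

        // outPoolUsage[i] is the output buffer pool usage (0.0 - 1.0) of task i in a
        // linear chain task[0] -> task[1] -> ... -> task[n-1], e.g. read from the
        // buffers.outPoolUsage task metric. Returns the index of the suspected
        // bottleneck task, or -1 if no backpressure pattern is found.
        static int findBottleneck(double[] outPoolUsage, double threshold) {
            for (int i = 1; i < outPoolUsage.length; i++) {
                // Upstream cannot ship its output, but this task has room in its own
                // output pool: the slowdown most likely originates here.
                if (outPoolUsage[i - 1] >= threshold && outPoolUsage[i] < threshold) {
                    return i;
                }
            }
            return -1;
        }

        public static void main(String[] args) {
            // A is saturated, B is not: B (index 1) is the suspected bottleneck.
            double[] chain = {0.98, 0.35, 0.10};
            System.out.println("suspected bottleneck: task " + findBottleneck(chain, 0.9));
        }
    }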

[jira] [Created] (FLINK-11257) FlinkKafkaConsumer should support assign partition

2019-01-02 Thread shengjk1 (JIRA)
shengjk1 created FLINK-11257: Summary: FlinkKafkaConsumer should support assign partition Key: FLINK-11257 URL: https://issues.apache.org/jira/browse/FLINK-11257 Project: Flink Issue Type: New F