Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink Committer

2021-01-19 Thread SHI Xiaogang
Congratulations MA! Regards, Xiaogang Yun Tang 于2021年1月20日周三 下午2:24写道: > Congratulations Guowei! > > Best > Yun Tang > > From: Yang Wang > Sent: Wednesday, January 20, 2021 13:59 > To: dev > Subject: Re: Re: [ANNOUNCE] Welcome Guowei Ma as a new Apache Flink >

Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-09 Thread SHI Xiaogang
Hi, I don't think the proposal is a good solution to the problems. I am in favour of using a ProcessFunction chained to the source/sink function to serialize/deserialize the records, instead of embedding (de)serialization schema in source/sink function. Message packing is heavily used in our prod

Re: [DISCUSS] Improve history server with log support

2020-02-17 Thread SHI Xiaogang
ce system you proposed. > > could > > > > you share more information regarding this? > > > > Such as how would the live events being listened; how would the trace > > > being > > > > collected/stored; etc. > > > > > > > >

Re: [DISCUSS] Improve history server with log support

2020-02-12 Thread SHI Xiaogang
Hi Rong Rong, Thanks for the proposal. We are also suffering from some pains brought by history server. To address them, we propose a trace system, which is very similar to the metric system, for historical information. A trace is semi-structured information about events in Flink. Useful traces i

Re: [ANNOUNCE] Jark Wu is now part of the Flink PMC

2019-11-10 Thread SHI Xiaogang
Congratulations, Jark. You make a great job of improving the community. Regards, Xiaogang Benchao Li 于2019年11月10日周日 下午4:14写道: > Congratulations, Jark! > > Yun Tang 于2019年11月10日 周日下午3:25写道: > > > Congratulations, Jark > > > > Best > > Yun Tang > > > > On 11/10/19, 10:40 AM, "vino yang" wrote:

Re: [DISCUSS] Semantic and implementation of per-job mode

2019-10-30 Thread SHI Xiaogang
Hi Thanks for bringing this. The design looks very nice to me in that 1. In the new per-job mode, we don't need to compile user programs in the client and can directly run user programs with user jars. That way, it's easier for resource isolation in multi-tenant platforms and is much safer. 2. Th

Re: [SURVEY] Dropping non Credit-based Flow Control

2019-10-20 Thread SHI Xiaogang
+1 Credit-based flow control has long been used in our production environment as well. It works fine and there seems no reason to use non credit-based implementation. Regards, Xiaogang Zhu Zhu 于2019年10月19日周六 下午3:01写道: > +1 to drop the non credit-based flow control. > We have turned to credit-b

Re: [VOTE] FLIP-74: Flink JobClient API

2019-10-10 Thread SHI Xiaogang
+1. The interface looks fine to me. Regards, Xiaogang Zili Chen 于2019年10月9日周三 下午2:36写道: > Given the ongoing FlinkForward Berlin event, I'm going to extend > this vote thread with a bit of period, said until Oct. 11th(Friday). > > Best, > tison. > > > Zili Chen 于2019年10月7日周一 下午4:15写道: > > > Hi

Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread SHI Xiaogang
Hi Tison, Thanks for bringing this. I think it's fine to break the back compatibility of client API now that ClusterClient is not well designed for public usage. But from my perspective, we should postpone any modification to existing interfaces until we come to an agreement on new client API. Ot

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread SHI Xiaogang
Congratulations! Regards, Xiaogang Guowei Ma 于2019年9月11日周三 下午7:07写道: > Congratulations Zili ! > > Best, > Guowei > > > Fabian Hueske 于2019年9月11日周三 下午7:02写道: > >> Congrats Zili Chen :-) >> >> Cheers, Fabian >> >> Am Mi., 11. Sept. 2019 um 12:48 Uhr schrieb Biao Liu > >: >> >>> Congrats Zili! >>

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-30 Thread SHI Xiaogang
ready started TaskManagers in P2P fashion, not to have a > blocker on HDFS replication. > > Spark job without any tuning exact same size jar with 800 executors, can > start without any issue at the same cluster in less than a minute. > > *Further questions:* > > *@ S

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-29 Thread SHI Xiaogang
Hi Datashov, We faced similar problems in our production clusters. Now both lauching and stopping of containers are performed in the main thread of YarnResourceManager. As containers are launched and stopped one after another, it usually takes long time to boostrap large jobs. Things get worse wh

Re: [DISCUSS] Builder dedicated for testing

2019-08-27 Thread SHI Xiaogang
Hi Tison, Thanks for bringing this up to discussion. I think it's helpful to reducing unnecessary constructors with instance builders in test scope. Now certain classes, e.g., Execution, ExecutionVertex and StateHandle, are instantiated (including mocking and spying) here and there in the test co

Re: [DISCUSS] Enhance Support for Multicast Communication Pattern

2019-08-26 Thread SHI Xiaogang
discussion and very thanks for all the > deep > thoughts! > > For now, I think this discussion contains two scenarios: one if for > iteration library support and the other is for SQL join support. I think > both of the two scenarios are useful but they seem to have differ

Re: [DISCUSS] Enhance Support for Multicast Communication Pattern

2019-08-26 Thread SHI Xiaogang
deep > thoughts! > > For now, I think this discussion contains two scenarios: one if for > iteration library support and the other is for SQL join support. I think > both of the two scenarios are useful but they seem to have different best > suitable solutions. For making the disc

Re: [DISCUSS] Enhance Support for Multicast Communication Pattern

2019-08-25 Thread SHI Xiaogang
Hi all, I also think that multicasting is a necessity in Flink, but more details are needed to be considered. Currently network is tightly coupled with states in Flink to achieve automatic scaling. We can only access keyed states in keyed streams and operator states in all streams. In the concret

Re: [DISCUSS] Use Java's Duration instead of Flink's Time

2019-08-23 Thread SHI Xiaogang
+1 to replace Flink's time with Java's Duration. Besides, i also suggest to use Java's Instant for "point-in-time". It can take care of time units when we calculate Duration between different instants. Regards, Xiaogang Zili Chen 于2019年8月24日周六 上午10:45写道: > Hi vino, > > I agree that it introduc

Re: [DISCUSS] FLIP-52: Remove legacy Program interface.

2019-08-14 Thread SHI Xiaogang
+1 Glad that programming with flink becomes simpler and easier. Regards, Xiaogang Aljoscha Krettek 于2019年8月14日周三 下午11:31写道: > +1 (for the same reasons I posted on the other thread) > > > On 14. Aug 2019, at 15:03, Zili Chen wrote: > > > > +1 > > > > It could be regarded as part of Flink clien

Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-02 Thread SHI Xiaogang
ferent styles. This rule should be also applied > to function arguments. > > BTW, big +1 to options from Chesnay. We should make auto-format possible on > our project. > > 1. > > https://docs.google.com/document/d/1owKfK1DwXA-w6qnx3R7t2D_o0BsFkkukGlRhvl3XXjQ/edit# > > T

Re: [DISCUSS][CODE STYLE] Breaking long function argument lists and chained method calls

2019-08-01 Thread SHI Xiaogang
Hi Andrey, Thanks for bringing this. Personally, I prefer to the following style which (1) puts the right parenthese in the next line (2) a new line for each exception if exceptions can not be put in the same line That way, parentheses are aligned in a similar way to braces and exceptions can be

Re: [DISCUSS] Allow at-most-once delivery in case of failures

2019-07-23 Thread SHI Xiaogang
ned regions > > I think contributors/committers could implements this separate from the > Flink core. The feature would be trial-run it through the community > packages. If it gains a lot of traction, the community could decide to put > in the effort to merge this into the core. >

Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to the Flink project

2019-07-22 Thread SHI Xiaogang
Congratulations Zhijiang! Regards, Xiaogang Guowei Ma 于2019年7月23日周二 上午8:08写道: > Congratulations Zhijiang > > 发自我的 iPhone > > > 在 2019年7月23日,上午12:55,Xuefu Z 写道: > > > > Congratulations, Zhijiang! > > > >> On Mon, Jul 22, 2019 at 7:42 AM Bo WANG > wrote: > >> > >> Congratulations Zhijiang! > >>

Re: About Deprecating split/select for DataStream API

2019-06-18 Thread SHI Xiaogang
gt; > >>> 1. The split/select may have been widely used without touching the > broken part. > >>> 2. Though restricted compared with side output, the semantics for > split/select itself is acceptable since union does not support different > data types either. > >>> 3.

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread SHI Xiaogang
d inside an operator rather than separately. This is what > SideOutputs do, as you define them inside the ProcessFunction, rather > than afterwards. Therefore I am very much in favor of using them for > those cases. Once again if the problem is that they are available only > in the Pr

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-06-17 Thread SHI Xiaogang
Hi Jeff and Flavio, Thanks Jeff a lot for proposing the design document. We are also working on refactoring ClusterClient to allow flexible and efficient job management in our real-time platform. We would like to draft a document to share our ideas with you. I think it's a good idea to have some

Re: About Deprecating split/select for DataStream API

2019-06-17 Thread SHI Xiaogang
/select API. I think if there are some problems with > the API, it's better to fix them instead of deprecating them. > And select/split are straightforward and convenient APIs. It's worth to > have them. > > Regards, > Jark > > On Mon, 17 Jun 2019 at 14:46, vino yang

Re: About Deprecating split/select for DataStream API

2019-06-16 Thread SHI Xiaogang
Hi Xingcan, Thanks for bringing it up for discusson. I agree with you that we should not deprecate the split/select methods. Their semantics are very clear and they are widely adopted by Flink users. We should fix these problems instead of simply deprecating the methods. Regards, Xiaogang Xingc

Re: [DISCUSS] Allow at-most-once delivery in case of failures

2019-06-11 Thread SHI Xiaogang
gt; > It is an interesting topic. > > > > Notice that there is some effort to build a mature mllib of flink these > > days, it could be also possible for some ml cases trade off correctness > for > > timeliness or throughput. Excatly-once delivery excatly makes flink sta

[DISCUSS] Allow at-most-once delivery in case of failures

2019-06-11 Thread SHI Xiaogang
Flink offers a fault-tolerance mechanism to guarantee at-least-once and exactly-once message delivery in case of failures. The mechanism works well in practice and makes Flink stand out among stream processing systems. But the guarantee on at-least-once and exactly-once delivery does not come with

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-03 Thread SHI Xiaogang
Nice feature. Looking forward to having it in Flink. Regards, Xiaogang vino yang 于2019年6月3日周一 下午3:31写道: > Hi all, > > As we mentioned in some conference, such as Flink Forward SF 2019 and QCon > Beijing 2019, our team has implemented "Local aggregation" in our inner > Flink fork. This feature c

Re: [Discuss][FLINK-8297]A solution for FLINK-8297 Timebased RocksDBListState

2019-04-16 Thread SHI Xiaogang
Hi all, I can provide more details about the private solution mentioned by Yun. We noticed that the re-ordering of elements due to internal clock jump will break the semantics of LIST. So we decide not to use timestamp in the keys. Instead, we format the key in a list state as STATE_NAME#SEQUENC

Re: [DISCUSS] Task speculative execution for Flink batch

2018-11-06 Thread SHI Xiaogang
Hi, +1 for the speculative execution. It will be more great if it can work well with exisitng checkpointing and pipelined execution. That way, we can move a further step towards the unification of batch and stream processing. Regards, Xiaogang Jeff Zhang 于2018年11月7日周三 上午9:40写道: > +1 for the s

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-06 Thread SHI Xiaogang
Hi all, Thank you for your replies and comments. I have similar consideration like Piotrek. My opinion is that two APIs are enough for Flink, a declarative one (SQL) and one imperative one (DataStream). From my perspective, most of users prefer SQL at most time and turn to Data Stream when the l

Re: [DISCUSS] Enhancing the functionality and productivity of Table API

2018-11-05 Thread SHI Xiaogang
Hi all, I think it's good to enhance the functionality and productivity of Table API, but still I think SQL + DataStream is a better choice from user experience 1. The unification of batch and stream processing is very attractive, and many our users are moving their batch-processing applications t

Re: Re: [ANNOUNCE] New Flink committer Jincheng Sun

2017-07-11 Thread SHI Xiaogang
Congratulations, Jincheng! 2017-07-12 2:30 GMT+08:00 Tzu-Li (Gordon) Tai : > Congratulations and welcome to committership, Jincheng! > > - Gordon > > > On 12 July 2017 at 1:06:15 AM, jincheng sun (sunjincheng...@gmail.com) > wrote: > > Thanks a lot guys! > I hoping to help out the community as mu

Re: [DISCUSS] FLIP-22: Eager State Declaration

2017-07-04 Thread SHI Xiaogang
Hi Tzu-Li, Thanks for the proposal. The changes are great. I have several questions about some details. First, do you have any plan to provide a method to remove states? Now states can only be created (either lazily or eagerly), but cannot be removed. We cannot remove those states not registered

Re: [ANNOUNCE] New Flink committer Shaoxuan Wang

2017-06-21 Thread SHI Xiaogang
Congrats, Shaoxuan Regards, Xiaogang 2017-06-22 9:08 GMT+08:00 jincheng sun : > Congratulations Shaoxuan. > > > 2017-06-22 8:56 GMT+08:00 Zhangrucong : > > > Congrats Shaoxuan! > > > > -邮件原件- > > 发件人: Fabian Hueske [mailto:fhue...@gmail.com] > > 发送时间: 2017年6月22日 4:19 > > 收件人: dev@flink.a

Re: [ANNOUNCE] New committer: Dawid Wysakowicz

2017-06-19 Thread SHI Xiaogang
Congrats Dawid. Great thanks for your contribution! Xiaogang 2017-06-19 18:52 GMT+08:00 Dawid Wysakowicz : > Thank you all for the warm welcome. I will do my best to be as helpful as > possible. >

Re: Incremental checkpoint branch

2017-03-03 Thread SHI Xiaogang
Hi Vinshnu, We have obtained an initial design of incremental checkpointing [1] and will start working on incremental checkpointing the next week. You can watch the issue FLINK-5053 [2] to get timely notification of the updates. All suggestions are welcome. [1] https://docs.google.com/document/d

[DISCUSS] Support Incremental Checkpointing in Flink

2017-02-13 Thread SHI Xiaogang
Hi all, Incremental checkpointing can help a lot in improving the efficiency of fault tolerance and recovery in Flink. I wrote an initial design of incremental checkpointing in Flink, and am looking forwards for your comments. https://docs.google.com/document/d/1VvvPp09gGdVb9D2wHx6NX99yQK0jSUMW

Re: [jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB

2017-01-17 Thread SHI Xiaogang
Hi Florian, The memory usage depends on the types of keys and namespaces. We have not experienced that many concurrently open windows. But given that each open window needs several bytes for its timer, 2M open windows may cost up to hundreds of MB. Regards Xiaogang 2017-01-18 14:45 GMT+08:00 F

Re: [DISCUSS] FLIP-14: Loops API and Termination

2016-11-11 Thread SHI Xiaogang
tusUpdate' from > runtime (see 2.1 in the steps). > - RS and Heads will broadcast StatusUpdate event and will not notify its > status. > - When StatusUpdate event gets back to the head it will notify its > WORKING status. > > Hope that answers your concern. > &g

Re: [DISCUSS] FLIP-14: Loops API and Termination

2016-11-10 Thread SHI Xiaogang
Hi Paris I have several concerns about the correctness of the termination protocol. I think the termination protocol put an end to the computation even when the computation has not converged. Suppose there exists a loop context constructed by a OP operator, a Head operator and a Tail operator (il

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
> Hi Xiaogang, > > I think maybe return Set> is better than > Iterator>. > Because users can use foreach on Set but not Iterator, and can use > iterator access via set.iterator(). > Maybe Map.entrySet() is a more familiar way to users. > > > - Jark Wu &

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
ur interface). The Entry interface not > only allows to get access to the key and value of the map entry but also > allows to set a value for the respective key via setValue (even though it's > an optional operation). Maybe we want to offer something similar when > getting acces

Re: Add MapState for keyed streams

2016-10-19 Thread SHI Xiaogang
offer something similar when > getting access to the entry set via the iterator method. > > Cheers, > Till > > On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang > wrote: > > > Hi, all. I created the JIRA https://issues.apache.org/ > > jira/browse/FLINK-4856 to >

Add MapState for keyed streams

2016-10-18 Thread SHI Xiaogang
Hi, all. I created the JIRA https://issues.apache.org/jira/browse/FLINK-4856 to propose adding MapStates into Flink. MapStates are very useful in our daily jobs. For example, when implementing DistinctCount, we store the values into a MapState and the result of each group(key) is exactly the numbe

[DISCUSS] Support Suspending and Resuming of Flink Jobs

2016-10-12 Thread SHI Xiaogang
Hi all, Currently, savepoints are exactly the completed checkpoints, and Flink provides commands (save/run) to allow saving and restoring jobs. But in the near future, savepoints will be very different from checkpoints because they will have common serialization formats and allow recover from majo