Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Hequn Cheng
Thanks a lot for the release and your great work, Gordon! Also thanks to everyone who made this release possible! Best, Hequn On Tue, Apr 7, 2020 at 8:58 PM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions 2.0.0. >

Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-25 Thread Hequn Cheng
@Dian, thanks a lot for the release and for being the release manager. Also thanks to everyone who made this release possible! Best, Hequn On Sat, Apr 25, 2020 at 7:57 PM Dian Fu wrote: > Hi everyone, > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.9.3,

Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Hequn Cheng
Thanks Dian for the great work and thanks to everyone who makes this release possible! Best, Hequn On Wed, Jul 22, 2020 at 4:40 PM Jark Wu wrote: > Congratulations! Thanks Dian for the great work and to be the release > manager! > > Best, > Jark > > On Wed, 22 Jul 2020 at 15:45, Yangze Guo wro

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-31 Thread Hequn Cheng
Hi Jincheng, Thanks a lot for raising the discussion. +1 for the FLIP. I think this will bring big benefits for PyFlink users. Currently, the Python Table API documentation is hidden deep under the Table API & SQL tab, which makes it hard to find. Also, the PyFlink documentation is mixed with Jav

Re: Implementing a low level join

2019-08-13 Thread Hequn Cheng
Hi Felipe, > I want to implement a join operator which can use different strategies for joining tuples. Not all kinds of join strategies can be applied to streaming jobs. Take sort-merge join as an example, it's impossible to sort an unbounded data. However, you can perform a window join and use t

Re: Implementing a low level join

2019-08-14 Thread Hequn Cheng
tions. My question is more about if I can have an operator > which decides beteween broadcast and regular join dynamically. I suppose I > will have to extend the generic TwoInputStreamOperator in Flink. Do you > have any suggestion? > > Thanks > > On Wed, 14 Aug 2019, 03

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-15 Thread Hequn Cheng
Congratulations Andrey! On Thu, Aug 15, 2019 at 3:30 PM Fabian Hueske wrote: > Congrats Andrey! > > Am Do., 15. Aug. 2019 um 07:58 Uhr schrieb Gary Yao : > > > Congratulations Andrey, well deserved! > > > > Best, > > Gary > > > > On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote: > > > > > Congrat

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Hequn Cheng
Congratulations! Best, Hequn On Thu, Sep 12, 2019 at 9:24 AM Jark Wu wrote: > Congratulations Zili! > > Best, > Jark > > On Wed, 11 Sep 2019 at 23:06, wrote: > > > Congratulations, Zili. > > > > > > > > Best, > > > > Xingcan > > > > > > > > *From:* SHI Xiaogang > > *Sent:* Wednesday, Septembe

Re: [External] Re: From Kafka Stream to Flink

2019-09-19 Thread Hequn Cheng
://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql/SplitAggregateITCase.scala#L228 On Thu, Sep 19, 2019 at 9:40 PM Casado Tejedor, Rubén < ruben.casado.teje...@accenture.com> wrote: > Thanks Fabian. @Hequn Che

Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread Hequn Cheng
Hi Stephan, Big +1 for adding this to Apache Flink! As for the question of whether this should be added to the Flink main repository, from my side, I prefer to put it in the main repository. Not only does Stateful Functions share a very close relationship with the current Flink, but also other libs or modu

Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Hequn Cheng
Thanks a lot to Jark, Jincheng, and everyone who made this release possible. Best, Hequn On Sat, Oct 19, 2019 at 10:29 PM Zili Chen wrote: > Thanks a lot for being release manager Jark. Great work! > > Best, > tison. > > > Till Rohrmann 于2019年10月19日周六 下午10:15写道: > >> Thanks a lot for being ou

[ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Hequn Cheng
Hi, The Apache Flink community is very happy to announce the release of Apache Flink 1.8.3, which is the third bugfix release for the Apache Flink 1.8 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streamin

Re: [ANNOUNCE] Weekly Community Update 2019/50

2019-12-15 Thread Hequn Cheng
Hi Konstantin, Happy holidays and thanks a lot for continuously working on the updates. They make it much easier for us to catch up with what's going on in the community, which I think is quite helpful. I'm wondering if I can help out and cover this during your vacation. :) Best

Re: [ANNOUNCE] Weekly Community Update 2019/50

2019-12-17 Thread Hequn Cheng
king forward to your updates! > > Cheers, > > Konstantin > > On Mon, Dec 16, 2019 at 5:53 AM Hequn Cheng wrote: > >> Hi Konstantin, >> >> Happy holidays and thanks a lot for your great job on the updates >> continuously. >> With the updates,

[ANNOUNCE] Weekly Community Update 2019/51

2019-12-22 Thread Hequn Cheng
Dear community, Happy to share this week's brief community digest with updates on Flink 1.10 and Flink 1.9.2, a proposal to integrate Flink Docker image publication into Flink release process, a discussion on new features of PyFlink and a couple of blog posts. Enjoy. Flink Development ===

[ANNOUNCE] Weekly Community Update 2019/52

2019-12-29 Thread Hequn Cheng
Dear community, Happy to share a short community update this week. Due to the holiday, the dev@ mailing list is pretty quiet these days. Flink Development == * [sql] Jark proposes to correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL before 1.10 is

[ANNOUNCE] Weekly Community Update 2020/01

2020-01-04 Thread Hequn Cheng
Dear community, Happy new year! Wishing you a new year rich with the blessings of love, joy, warmth, and laughter. Wish Flink will get better and better. Nice to share this week’s community digest with an update on Flink-1.10.0, a proposal to set blink planner as the default planner for SQL Clien

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-04 Thread Hequn Cheng
+1 to make the blink planner the default planner for the SQL Client, so we can give the blink planner a bit more exposure. Best, Hequn On Fri, Jan 3, 2020 at 6:32 PM Jark Wu wrote: > Hi Benoît, > > Thanks for the reminder. I will look into the issue and hopefully we can > target it into 1.9.2 and

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Hequn Cheng
Congratulations, Dian. Well deserved! Best, Hequn On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu wrote: > Congratulations! Dian Fu > > Best, > Leonard > > 在 2020年1月16日,18:00,Jeff Zhang 写道: > > Congrats Dian Fu ! > > jincheng sun 于2020年1月16日周四 下午5:58写道: > >> Hi everyone, >> >> I'm very happy to a

[ANNOUNCE] Apache Flink 1.9.2 released

2020-01-31 Thread Hequn Cheng
Hi everyone, The Apache Flink community is very happy to announce the release of Apache Flink 1.9.2, which is the second bugfix release for the Apache Flink 1.9 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate dat

Re: [ANNOUNCE] Apache Flink 1.9.2 released

2020-01-31 Thread Hequn Cheng
> > Best, > Jincheng > > Hequn Cheng 于2020年1月31日周五 下午6:55写道: > >> Hi everyone, >> >> The Apache Flink community is very happy to announce the release of >> Apache Flink 1.9.2, which is the second bugfix release for the Apache Flink >> 1.9 series. &

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Hequn Cheng
Hi Jincheng, +1 for this proposal. From the perspective of users, I think it would be nice to have PyFlink on PyPI, which makes it much easier to install PyFlink. Best, Hequn On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang wrote: > +1 > > > Xingbo Huang 于2020年2月4日周二 下午1:07写道: > >> Hi Jincheng, >> >> T

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-11 Thread Hequn Cheng
+1 (non-binding) - Check signature and checksum. - Install package successfully with Pip under Python 3.7.4. - Run wordcount example successfully under Python 3.7.4. Best, Hequn On Tue, Feb 11, 2020 at 12:17 PM Dian Fu wrote: > +1 (non-binding) > > - Verified the signature and checksum > - Pip

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Hequn Cheng
Great thanks to Yu & Gary for being the release manager! Also thanks to everyone who made this release possible! Best, Hequn On Thu, Feb 13, 2020 at 9:54 AM Rong Rong wrote: > Congratulations, a big thanks to the release managers for all the hard > works!! > > -- > Rong > > On Wed, Feb 12, 2020

Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-13 Thread Hequn Cheng
Thanks a lot for the release, Jincheng! Also thanks to everyone who made this release possible! Best, Hequn On Thu, Feb 13, 2020 at 2:18 PM Dian Fu wrote: > Thanks for the great work, Jincheng. > > Regards, > Dian > > 在 2020年2月13日,下午1:32,jincheng sun 写道: > > Hi everyone, > > The Apache Flink

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Hequn Cheng
Congratulations Jingsong! Well deserved. Best, Hequn On Fri, Feb 21, 2020 at 2:42 PM Yang Wang wrote: > Congratulations!Jingsong. Well deserved. > > > Best, > Yang > > Zhijiang 于2020年2月21日周五 下午1:18写道: > >> Congrats Jingsong! Welcome on board! >> >> Best, >> Zhijiang >> >> -

Re: DataStreamCalcRule grows beyond 64 KB

2018-06-11 Thread Hequn Cheng
Hi rakeshchalasani, At the moment Flink only splits generated methods by fields to avoid the 64 KB problem, so the current implementation will still reach the limit if a single field becomes too large. The Flink community has already planned to solve the problem, see [1]. As a workaround, you can define your own udf to avoid th

Re: What does the number in front of ">" mean when I print a DataStream

2018-06-13 Thread Hequn Cheng
Hi chris, The number is the index of the parallel subtask that emitted the record; there are four parallel subtasks here, and each outputs a record. You can use env.setParallelism() to change the default value (i.e., 4) to other values. Best, Hequn On Thu, Jun 14, 2018 at 9:09 AM, chrisr123 wrote: > > What does the number in front of the ">" character mean when cal

Re: Simple stdout sink for testing Table API?

2018-06-23 Thread Hequn Cheng
Hi chrisr, It seems there are no "single line" ways to solve your problem. To print results on screen, you can use the DataStream.print() / DataSet.print() method, and to limit the output you can add a FilterFunction. The code looks like: Table projection1 = customers > .select("id,last_n

Re: Measure Latency from source to sink

2018-06-25 Thread Hequn Cheng
Hi antonio, I see two options to solve your problem. 1. Enable latency tracking[1]. But you have to pay attention to its mechanism, for example, a) the sources only *periodically* emit a special record and b) the latency markers do not account for the time user records spend in operators

Re: Restore state from save point with add new flink sql

2018-06-26 Thread Hequn Cheng
Hi, I'm not sure about the answer. My feeling is that if we only append new code after the old code, the uids will not be changed. On Tue, Jun 26, 2018 at 3:06 PM, Till Rohrmann wrote: > I think so. Maybe Fabian or Timo can correct me if I'm wrong here. > > O

Re: Measure Latency from source to sink

2018-06-26 Thread Hequn Cheng
Hi antonio, Latency is exposed via a metric. You can find each operator's latency through the Flink UI (Overview -> Task Metrics -> select the task, for example select the sink -> Add metric -> find the latency metric). On Tue, Jun 26, 2018 at 11:18 PM, antonio saldivar wrote: > Hello thank you > > I also w

Re: Streaming

2018-06-27 Thread Hequn Cheng
Hi aitozi, 1> CountDistinct Currently (Flink 1.5), CountDistinct is supported in SQL only under a window, as RongRong described. There are ways to implement non-window CountDistinct, for example: a) you can write a CountDistinct udaf using MapView, or b) use two groupBys to achieve it. The first groupB
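The two-groupBy idea can be illustrated outside Flink with plain Java collections. This is a standalone sketch, not Flink code, and the row type is invented for illustration: the first "grouping" deduplicates (key, value) pairs, and the second counts the remaining rows per key, which yields the distinct count.

```java
import java.util.*;
import java.util.stream.*;

public class TwoPhaseDistinctCount {
    // Phase 1: "groupBy(key, value)" collapses duplicate pairs.
    // Phase 2: "groupBy(key)" counts rows, i.e. distinct values per key.
    public static Map<String, Long> countDistinct(List<Map.Entry<String, String>> rows) {
        Set<Map.Entry<String, String>> firstPhase = new HashSet<>(rows);
        return firstPhase.stream()
                .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = Arrays.asList(
                new AbstractMap.SimpleEntry<>("a", "x"),
                new AbstractMap.SimpleEntry<>("a", "x"),   // duplicate, collapsed in phase 1
                new AbstractMap.SimpleEntry<>("a", "y"),
                new AbstractMap.SimpleEntry<>("b", "x"));
        Map<String, Long> counts = countDistinct(rows);
        System.out.println(counts.get("a") + " " + counts.get("b")); // prints "2 1"
    }
}
```

In a streaming job the first aggregation would be an incremental stateful operator rather than a batch HashSet, but the shape of the computation is the same.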

Re: Streaming

2018-06-27 Thread Hequn Cheng
For the above two non-window approaches, the second one achieves a better performance. => For the above two non-window approaches, the second one achieves a better performance in most cases especially when there are many same rows. On Thu, Jun 28, 2018 at 12:25 AM, Hequn Cheng wrote: &

Re: Over Window Not Processing Messages

2018-06-27 Thread Hequn Cheng
Hi Gregory, Since you are using a rowtime over window, it is probably a watermark problem. The window only outputs when the watermark makes progress. You can use processing time (instead of row time) to verify the assumption. Also, make sure there is data in each of your source partitions, the watermark
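The watermark behavior described above can be sketched standalone. This is a simplified illustration, not Flink's actual watermark generator; the 2-second lag is an assumed value. The watermark trails the largest timestamp seen so far, and an event-time window fires only once the watermark passes the window end, which is why no new input means no output.

```java
public class WatermarkSketch {
    static final long LAG_MS = 2_000;          // assumed bounded out-of-orderness
    static long maxSeenTs = Long.MIN_VALUE;

    // Advance the watermark as events arrive; it trails the max timestamp by LAG_MS.
    static long onEvent(long eventTs) {
        maxSeenTs = Math.max(maxSeenTs, eventTs);
        return maxSeenTs - LAG_MS;
    }

    // An event-time window [start, windowEnd) may only fire once the
    // watermark has reached windowEnd.
    static boolean windowFires(long windowEnd, long watermark) {
        return watermark >= windowEnd;
    }

    public static void main(String[] args) {
        long wm = onEvent(9_500);
        System.out.println(windowFires(10_000, wm)); // false: watermark is 7500
        wm = onEvent(12_100);
        System.out.println(windowFires(10_000, wm)); // true: watermark is 10100
    }
}
```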

Re: Over Window Not Processing Messages

2018-06-27 Thread Hequn Cheng
> On Wed, Jun 27, 2018 at 6:49 PM, Hequn Cheng wrote: > >> Hi Gregory, >> >> As you are using the rowtime over window. It is probably a watermark >> problem. The window only output when watermarks make a progress. You can >> use processing-time(instead of row-

Re: Streaming

2018-06-28 Thread Hequn Cheng
Hi aitozi, 1> how will sql translated into a datastream job? The Table API and SQL leverage Apache Calcite for parsing, validation, and query optimization. After optimization, the logical plan of the job will be translated into a datastream job. The logical plan contains many different logical ope

Re: Data Type of timestamp in Streaming SQL Result? Long instead of timestamp?

2018-06-28 Thread Hequn Cheng
Hi chrisr, > *Is there a way to retrieve this from the query as a long instead?* You have to convert the timestamp type to long type. It seems there are no internal udf to convert timestamp to unix timestamp, however you can write one and used in your SQL. > *Question: how do I express that I wan
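The suggested timestamp-to-long conversion could be the body of such a user-defined scalar function. The helper below is a hypothetical standalone version of that UDF's logic (the method name and the UTC assumption are mine, not from the original thread):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TsToLong {
    // Hypothetical eval() body of a scalar UDF: SQL TIMESTAMP -> unix epoch millis.
    // Assumes the timestamp is to be interpreted as UTC.
    public static long toUnixMillis(LocalDateTime ts) {
        return ts.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.parse("1970-01-01T00:00:01");
        System.out.println(toUnixMillis(ts)); // prints 1000
    }
}
```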

Re: First try running I am getting org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.

2018-06-29 Thread Hequn Cheng
Hi Mich, Your ports do not match: you start netcat with "nc -l 2219 &" but run the Flink job with "--port 2199". On Fri, Jun 29, 2018 at 8:40 PM, Mich Talebzadeh wrote: > Hi, > > I have installed flink 1.5 in standalone mode and trying a basic run as > per this example > > https://ci.apache.org/pr

Re: First try running I am getting org.apache.flink.client.program.ProgramInvocationException: Could not retrieve the execution result.

2018-06-29 Thread Hequn Cheng
wn risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, dam

Re: Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories.

2018-07-02 Thread Hequn Cheng
Hi Mich, It seems the writeMode has not been set correctly. Have you tried > .writeAsText("/tmp/md_streaming.txt", FileSystem.WriteMode.OVERWRITE); On Mon, Jul 2, 2018 at 10:44 PM, Mich Talebzadeh wrote: > Flink 1.5 > > This streaming data written to a file > > val stream = env >

Re: How to set global config in the rich functions

2018-07-04 Thread Hequn Cheng
Hi zhen, Global configs cannot be passed like this. You can set global configs through the ExecutionConfig; more details here[1]. [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/best_practices.html#register-the-parameters-globally On Wed, Jul 4, 2018 at 8:27 PM, zhen li wrote:

Re: Facing issue in RichSinkFunction

2018-07-05 Thread Hequn Cheng
Hi Amol, The implementation of the RichSinkFunction probably contains a field that is not serializable. To avoid the serialization exception, you can: 1. Mark the field as transient. This makes the serialization mechanism skip the field. 2. If the field is part of the object's persistent state, the
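Both suggestions can be demonstrated with plain Java serialization, without Flink. In this sketch the `connection` field is a stand-in for a non-serializable resource such as a JDBC Connection (a bare `Object` is used because `Object` does not implement `Serializable`); marking it transient makes serialization skip it, and `readObject` recreates it after deserialization:

```java
import java.io.*;

public class TransientFieldDemo implements Serializable {
    // Without 'transient', writeObject would throw NotSerializableException,
    // because Object does not implement Serializable.
    transient Object connection = new Object();
    final String name;

    TransientFieldDemo(String name) { this.name = name; }

    // Recreate the transient field after deserialization (the "restore the
    // state manually" option from the email).
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        connection = new Object();
    }

    static TransientFieldDemo roundTrip(TransientFieldDemo obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(obj);
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (TransientFieldDemo) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        TransientFieldDemo copy = roundTrip(new TransientFieldDemo("sink"));
        System.out.println(copy.name);           // prints "sink"
        System.out.println(copy.connection != null); // prints "true": rebuilt in readObject
    }
}
```

In a Flink RichSinkFunction the idiomatic place to recreate such a resource is `open()`, which runs on the task manager after deserialization.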

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
Hi francois, If I understand correctly, you can use sql or table-api to solve your problem. As you want to project a subset of columns from the source, a columnar storage format like parquet/orc would be efficient. Currently, the ORC table source is supported in Flink; you can find more details here[1]. Also, there a

Re: Filter columns of a csv file with Flink

2018-07-06 Thread Hequn Cheng
chema? > > Big thank to put on the table-api's way :) > > Best R > > François Lacombe > > > > 2018-07-06 16:53 GMT+02:00 Hequn Cheng : > >> Hi francois, >> >> If I understand correctly, you can use sql or table-api to solve you >> proble

Re: Pass a method as parameter

2018-07-08 Thread Hequn Cheng
Hi Soheil, What do you mean by "give it a written function's name B" and "function A will apply function B"? Do you mean function A overrides B? Perhaps the DataStream API guide[1] may give you some guidance. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_a

Re: Filter columns of a csv file with Flink

2018-07-09 Thread Hequn Cheng
at describe them. >>> As my users can send erroneous files, inconsistent with schemas, I want >>> to check if files structure is right before processing them. >>> I see that CsvTableSource allows to define csv fields. Then, will it >>> check if columns actu

Re: How to trigger a function on the state periodically?

2018-07-09 Thread Hequn Cheng
Hi anna, > I need to trigger a function once every day If you want to trigger from within the function itself, you can use a Timer[1]. Both types of timers (processing-time and event-time) are internally maintained by the TimerService, and the onTimer() method will be called once a timer fires. If you want t

Re: How to trigger a function on the state periodically?

2018-07-09 Thread Hequn Cheng
e of the > stream volume and window size. > I want to store in the state, for every user the last activity date and > process them once daily. > > I want to make sure I am heading in the right direction. Thank you for > your suggestions. > > -Anna > > On Mon, Ju

Re: How to trigger a function on the state periodically?

2018-07-09 Thread Hequn Cheng
, 2018 at 1:54 PM, anna stax wrote: > Thanks Hequn. I think so too, the large number of timers could be a > problem. > > On Mon, Jul 9, 2018 at 10:23 PM, Hequn Cheng wrote: > >> Hi anna, >> >> According to your description, I think we can use the Timer to solve

Re: Confusions About JDBCOutputFormat

2018-07-10 Thread Hequn Cheng
Hi wangsan, I agree with you. It would be kind of you to open a jira to check the problem. For the first problem, I think we need to establish the connection each time we execute a batch write, and it is better to get the connection from a connection pool. For the second problem, to avoid multithread pro

Re: Access the data in a stream after writing to a sink

2018-07-10 Thread Hequn Cheng
Hi Teena, It seems that a sink cannot output data into another sink. Maybe we can implement a combined user-defined sink. In the combined sink, only write to the next sink if the first write is successful. On Tue, Jul 10, 2018 at 3:23 PM, Teena Kappen // BPRISE < teena.kap...@bprise.com> wrote:
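The "combined sink" idea can be sketched in plain Java. The class and method names below are invented for illustration; in Flink this logic would live inside a single SinkFunction's invoke(). The second write is reached only if the first one did not throw:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a combined sink: write to the secondary sink only if the
// primary write succeeded (i.e., did not throw).
public class CombinedSink<T> {
    final Consumer<T> primary;
    final Consumer<T> secondary;

    CombinedSink(Consumer<T> primary, Consumer<T> secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void invoke(T value) {
        primary.accept(value);   // throws on failure...
        secondary.accept(value); // ...so this line only runs on success
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        CombinedSink<String> sink = new CombinedSink<>(a::add, b::add);
        sink.invoke("record-1");
        System.out.println(a.size() + " " + b.size()); // prints "1 1"
    }
}
```

Note this gives at-most-once coupling between the two writes, not a transaction; if the secondary write fails after the primary succeeded, the two systems diverge.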

Re: Filter columns of a csv file with Flink

2018-07-10 Thread Hequn Cheng
; wrote: > Hi Hequn, > > 2018-07-10 3:47 GMT+02:00 Hequn Cheng : > >> Maybe I misunderstand you. So you don't want to skip the whole file? >> > Yes I do > By skipping the whole file I mean "throw an Exception to stop the process > and inform user that file i

Re: Confusions About JDBCOutputFormat

2018-07-10 Thread Hequn Cheng
connection is closed? > > May be we could use a Timer to test the connection periodically and keep > it alive. What do you think? > > I will open a jira and try to work on that issue. > > Best, > wangsan > > > > On Jul 10, 2018, at 8:38 PM, Hequn Cheng wrote: >

Re: Confusions About JDBCOutputFormat

2018-07-11 Thread Hequn Cheng
> connection pools. Test and refresh the connection periodically can simply > solve this problem. I’ve implemented this at https://github.com/apache/ > flink/pull/6301, It would be kind of you to review this. > > Best, > wangsan > > > > On Jul 11, 2018, at 2:25 PM, Hequ

Re: How to create User Notifications/Reminder ?

2018-07-11 Thread Hequn Cheng
Hi shyla, There is a similar question[1] asked two days ago. Maybe it is helpful for you. Let me know if you have any other concerns. Best, Hequn [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-trigger-a-function-on-the-state-periodically-td21311.html On Thu, Jul 12,

Re: Need to better way to create JSON If we have TableSchema and Row

2018-07-11 Thread Hequn Cheng
Hi shivam, It seems there is no such function, but you can write one yourself, e.g. using com.fasterxml.jackson.databind.ObjectMapper. Best, Hequn On Thu, Jul 12, 2018 at 1:56 AM, Shivam Sharma <28shivamsha...@gmail.com> wrote: > Hi All, > > I have TableSchema >
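The idea can be sketched without Jackson by pairing schema field names with row values by position. This is an illustration only (all names are made up); a real implementation should use Jackson's ObjectMapper as suggested and handle string escaping, nesting, and nulls:

```java
import java.util.List;

public class RowToJson {
    // Minimal sketch: zip field names (from a table schema) with row values.
    // Numbers are emitted bare, everything else is quoted; no escaping is done.
    public static String toJson(List<String> fieldNames, List<Object> row) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < fieldNames.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append('"').append(fieldNames.get(i)).append("\":");
            Object v = row.get(i);
            if (v instanceof Number) sb.append(v);
            else sb.append('"').append(v).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(List.of("name", "age"), List.of("flink", 8)));
        // prints {"name":"flink","age":8}
    }
}
```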

Re: How to create User Notifications/Reminder ?

2018-07-12 Thread Hequn Cheng
shyla deshpande wrote: > Hi Hequen, > > I was more interested in solving using CEP. > I want to have a window of 2 weeks and in the Timeout Handler I want to > create Notification/Reminder. > Is this doable in Flink 1.4.2.? > > Thanks > > > On Wed, Jul 11, 2018 at 6:14

Re: 答复: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-12 Thread Hequn Cheng
Hi Yuan, Haven't heard about this before. Which Flink version do you use? The cause may be: 1. The userId is not 100% identical, for example it contains invisible characters. 2. The machine clock jumped. Otherwise, there may be a bug we don't know about. Best, Hequn On Thu, Jul 12, 2018 at 8:00 PM, Yuan,Yo

Re: TimeWindow doesn't trigger reduce function

2018-07-13 Thread Hequn Cheng
Hi Soheil, It seems your job stops within 1 second? A processing-time window doesn't output data if time hasn't reached the window end, while in event time a final watermark is emitted during close() to avoid this problem. You can try to increase the running time of your job to get the output

Re: [ANNOUNCE] Apache Flink 1.5.1 released

2018-07-13 Thread Hequn Cheng
Cool, thanks to Chesnay! Best, Hequn On Fri, Jul 13, 2018 at 8:25 PM, vino yang wrote: > Thanks Chesnay, great job! > > Thanks, > Vino > > 2018-07-13 20:20 GMT+08:00 Till Rohrmann : > >> Great to hear. Big thank you to the community for the hard work and to >> Chesnay for being our release mana

Re: 答复: 答复: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-13 Thread Hequn Cheng
;:153144840,"cnt":1593435,"userId":"user01","min_ts": > *1531448399978*,"max_ts":1531448519978} > > > > jobB(4-minute window) output > > {"timestamp":153144792,"cnt":3306838,"userId":"use

Re: Filtering and mapping data after window opertator

2018-07-14 Thread Hequn Cheng
Hi Soheil, We can't apply FilterFunction or MapFunction on WindowedStream. It is recommended to do these operations on DataStream, for example, temp.filter().map().keyBy(0).timeWindow(). Best, Hequn On Sat, Jul 14, 2018 at 9:14 PM, Soheil Pourbafrani wrote: > Hi, I'm getting data stream from a

Re: reduce a data stream to other type

2018-07-15 Thread Hequn Cheng
Hi Soheil, Yes, a reduce function doesn't allow this. A ReduceFunction specifies how two elements from the input are combined to produce an output element of the same type. You can use an AggregateFunction or FoldFunction instead. More details here[1]. Best, Hequn [1] https://ci.apache.org/projects/flink/fli
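The accumulator style of Flink's AggregateFunction (createAccumulator / add / merge / getResult) lets the output type differ from the input type. The sketch below mirrors that shape in plain Java so it runs standalone; it does not use the Flink interface itself, and computes an average of longs as a double:

```java
public class AverageAggregateSketch {
    // Accumulator playing the same role as in Flink's AggregateFunction:
    // input type long, accumulator type Acc, output type double.
    static class Acc { long sum; long count; }

    static Acc createAccumulator() { return new Acc(); }

    static Acc add(long value, Acc acc) {
        acc.sum += value;
        acc.count += 1;
        return acc;
    }

    // Needed when partial accumulators from different windows/sessions combine.
    static Acc merge(Acc a, Acc b) {
        a.sum += b.sum;
        a.count += b.count;
        return a;
    }

    static double getResult(Acc acc) { return (double) acc.sum / acc.count; }

    public static void main(String[] args) {
        Acc acc = createAccumulator();
        for (long v : new long[] {1, 2, 3, 6}) acc = add(v, acc);
        System.out.println(getResult(acc)); // prints 3.0
    }
}
```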

Re: specifying prefix for print(), printToErr() ?

2018-07-15 Thread Hequn Cheng
Hi chrisr, The document is misleading: only the DataSet API supports a prefixed print right now. I created a jira for DataStream[1]. For now, you can use a custom SinkFunction to achieve this. [1]. https://issues.apache.org/jira/browse/FLINK-9850 On Sat, Jul 14, 2018 at 10:34 PM, chrisr123 wrote: > > The

Re: Flink Query Optimizer

2018-07-15 Thread Hequn Cheng
Hi Albert, Apache Flink leverages Apache Calcite to optimize and translate queries. The optimizations currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. Flink does not yet optimize the order of joins[1]. I agree with you it is va

Re: Persisting Table in Flink API

2018-07-15 Thread Hequn Cheng
Hi Shivam, Currently, Flink sql/table-api supports window join and non-window join[1]. If your requirements are not met by sql/table-api, you can also use the DataStream API to implement your own logic. You can refer to the non-window join implementation as an example[2][3]. Best, Hequn [1] https://c

Re: Persisting Table in Flink API

2018-07-16 Thread Hequn Cheng
-dimension table join and the table is huge, use Redis or >> some cache based on memory, can help to process your problem. And you can >> customize the flink's physical plan (like Hequn said) and use async >> operator to optimize access to the third-party system. >> &g

Re: TumblingProcessingTimeWindow emits extra results for a same window

2018-07-16 Thread Hequn Cheng
at will happen if more > elements arrive *at the last millisecond, but AFTER the window is fired*? > > Thanks, > Youjun > *发件人**:* Hequn Cheng > *发送时间:* Friday, July 13, 2018 9:44 PM > *收件人:* Yuan,Youjun > *抄送:* Timo Walther ; user@flink.apache.org > *主题:* Re: 答复: 答复: Tumblin

Re: clear method on Window Trigger

2018-07-17 Thread Hequn Cheng
Hi Soheil, The clear() method performs any action needed upon removal of the corresponding window and is called when a window is purged. The difference between FIRE and FIRE_AND_PURGE is that FIRE only triggers the computation, while FIRE_AND_PURGE triggers the computation and clears the elements in the

Re: Why data didn't enter the time window in EventTime mode

2018-07-18 Thread Hequn Cheng
Hi Soheil, > wait 8 milliseconds (according to my code) to see if any other data with the same key received or not and after 8 millisecond it will be triggered. Yes, but the time is event time, so if there is no data from the source, the time won't advance. There are some reasons why the event tim

Re: Why data didn't enter the time window in EventTime mode

2018-07-19 Thread Hequn Cheng
: > Hi Soheil, > > Hequn is right. This might be an issue with advancing event-time. > You can monitor that by checking the watermarks in the web dashboard or > print-debug it with a ProcessFunction which can lookup the current > watermark. > > Best, Fabian > > 2018-07

Re: working with flink Session Windows

2018-07-20 Thread Hequn Cheng
Hi antonio, I think it is worth a try to test the performance in your scenario, since job performance can be affected by a number of factors (say, your WindowFunction). Best, Hequn On Sat, Jul 21, 2018 at 2:59 AM, antonio saldivar wrote: > Hello > > I am building an app but for this UC I want to te

Re: streaming predictions

2018-07-22 Thread Hequn Cheng
Hi Cederic, I am not familiar with SVM or machine learning, but I think we can work it out together. What problems have you met when trying to implement this function? From my point of view, we can rebuild the model in the flatMap function and use it to predict the input data. There are some flatMa

Re: Behaviour of triggers in Flink

2018-07-22 Thread Hequn Cheng
Hi Harshvardhan, By specifying a trigger using trigger() you are overwriting the default trigger of a WindowAssigner. For example, if you specify a CountTrigger for TumblingEventTimeWindows you will no longer get window firings based on the progress of time but only by count. Right now, you have t

Re: Triggers for late elements

2018-07-22 Thread Hequn Cheng
Hi harshvardhan, No, by default late elements are thrown away. There is documentation about windows here[1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#windows On Mon, Jul 23, 2018 at 1:34 AM, Harshvardhan Agrawal < harshvardhan.ag

Re: override jvm params

2018-07-25 Thread Hequn Cheng
Hi Cussac, If I understand correctly, you want to pass rules.consumer.topic=test and rules.consumer.topic=test to flink jvm. I think you can try: flink run -m $HOSTPORT -yD rules.consumer.topic=test -yD rules.consumer.topic=test Hope this helps. Hequn On Wed, Jul 25, 2018 at 3:26 PM, Cussac, Fran

Re: How to connect more than 2 hetrogenous Streams!!

2018-07-27 Thread Hequn Cheng
Hi Puneet, Flink doesn't support connecting more than 2 streams with different schemas. Two ways that might help you: 1. unify the schemas and use union. 2. use multiple two-stream joins to join the different streams. Hope this helps. Hequn On Thu, Jul 26, 2018 at 2:29 PM, Puneet Kinra < puneet.ki...@custom

Re: Flink Cluster and Pipeline Version Compatibility?

2018-07-27 Thread Hequn Cheng
Hi, jlist9 > Is it so that the pipeline jars must be build with the same version of the cluster they'll be running on? Most interfaces are kept backward compatible, and the closer the Flink versions are, the smaller the differences between interfaces. However, this is not guaranteed. Hence,

Re: Order of events in a Keyed Stream

2018-07-27 Thread Hequn Cheng
Hi Harshvardhan, There are a number of factors to consider. 1. the consecutive Kafka messages must be in the same Kafka topic. 2. the data should not be rebalanced; for example, operators should be chained in order to avoid rebalancing. 3. if you perform keyBy(), you should keyBy on a field
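Why keying on the same field preserves per-key order can be illustrated with a simplified routing function. This is not Flink's real implementation (Flink assigns keys to key groups using a murmur hash), but the property it shows is the same: the routing is a pure function of the key, so records with the same key always reach the same downstream subtask and keep their relative order:

```java
public class KeyRoutingSketch {
    // Simplified stand-in for keyBy() routing: same key -> same subtask index.
    // Flink actually hashes keys into key groups with a murmur hash, but the
    // deterministic-routing property illustrated here is identical.
    static int subtaskFor(String key, int parallelism) {
        return Math.abs(key.hashCode() % parallelism);
    }

    public static void main(String[] args) {
        int p = 4;
        // Every record for "user-42" is routed to the same subtask,
        // so their relative order is preserved there.
        System.out.println(subtaskFor("user-42", p) == subtaskFor("user-42", p)); // prints true
    }
}
```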

Re: Order of events in a Keyed Stream

2018-07-28 Thread Hequn Cheng
ages will exist on the same topic. I intend to keyby on the same > field. The question is that will the two messages be mapped to the same > task manager and on the same slot. Also will they be processed in correct > order given they have the same keys? > > On Fri, Jul 27, 2018 at 21:

Re: Detect late data in processing time

2018-07-30 Thread Hequn Cheng
Hi Soheil, No, we can't set watermarks when using processing time, and there is no late data for processing-time windows. So the question is: what counts as bad data when you use processing time? Maybe there are other ways to solve your problem. Best, Hequn On Mon, Jul 30, 2018 at 8:22 PM, Sohei

Re: Counting elements that appear "behind" the watermark

2018-07-30 Thread Hequn Cheng
Hi Julio, If I understand correctly, you want to adjust your watermarks automatically? It is true that there is no direct way to get a metric from the AssignerWithPeriodicWatermarks. Adding a ProcessFunction before assignTimestampsAndWatermarks seems like a solution. In the ProcessFunction, you can count

Re: scala IT

2018-07-30 Thread Hequn Cheng
Hi Nicos, That is weird. Have you updated your code? I checked your code and the function implements ResultTypeQueryable, so it should work well. Best, Hequn On Tue, Jul 31, 2018 at 6:30 AM, Nicos Maris wrote: > Hi all, > > > the integration test in scala documented at the testing section

Re: Event time didn't advance because of some idle slots

2018-07-31 Thread Hequn Cheng
Hi Soheil, You can set parallelism to 1 to solve the problem. Or use markAsTemporarilyIdle() as Fabian said(the link maybe is https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.jav

Re: Converting a DataStream into a Table throws error

2018-07-31 Thread Hequn Cheng
Hi, Mich You can try adding "import org.apache.flink.table.api.scala._", so that the Symbol can be recognized as an Expression. Best, Hequn On Wed, Aug 1, 2018 at 6:16 AM, Mich Talebzadeh wrote: > Hi, > > I am following this example > > https://ci.apache.org/projects/flink/flink-docs- > releas

Re: Access to Kafka Event Time

2018-08-01 Thread Hequn Cheng
Hi Vishal, > We have a use case where multiple topics are streamed to hdfs and we would want to created buckets based on ingestion time If I understand correctly, you want to create buckets based on event time. Maybe you can use window[1]. For example, a tumbling window of 5 minutes groups rows in

Re: potential software engineering problems

2018-08-04 Thread Hequn Cheng
Hi Stephen, I think the design part is the most difficult. Sometimes the design needs to be discussed again and again. This is the same as when developing other applications. Best, Hequn On Sat, Aug 4, 2018 at 8:28 AM, Stephen wrote: > Hi, everyone. I'm a graduate student on software engineering. I fo

Re: Event Time Session Window does not trigger..

2018-08-04 Thread Hequn Cheng
Hi shyla, I answered a similar question on stackoverflow[1], you can take a look first. Best, Hequn [1] https://stackoverflow.com/questions/51691269/event-time-window-in-flink-does-not-trigger On Sun, Aug 5, 2018 at 11:24 AM, shyla deshpande wrote: > Hi, > > I used PopularPlacesFromKafka from
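A common reason an event-time session window never triggers is that the watermark never passes the window's end, which for session windows is the last element's timestamp plus the gap. A hedged, Flink-agnostic Python sketch of how sessions are derived from sorted event timestamps (function name and return layout are illustrative assumptions):

```python
def session_windows(timestamps, gap):
    """Merge sorted event timestamps into session windows: a new session
    starts when the next event is more than `gap` after the previous one.
    A window [start, end) only fires once the watermark passes its end."""
    sessions = []
    start = prev = timestamps[0]
    for ts in timestamps[1:]:
        if ts - prev > gap:
            sessions.append((start, prev + gap))  # end = last event + gap
            start = ts
        prev = ts
    sessions.append((start, prev + gap))
    return sessions

print(session_windows([1, 2, 10, 11], gap=3))  # -> [(1, 5), (10, 14)]
```

If no further data arrives to advance the watermark past `end`, the last session sits unfired, which matches the symptom in the thread.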

Re: Event Time Session Window does not trigger..

2018-08-05 Thread Hequn Cheng
re question. >> >> Can I set the TimeCharacteristic to the stream level instead on the >> application level? >> Can I use both TimeCharacteristic.ProcessingTime and >> TimeCharacteristic.EventTime in an application. >> >> Thank you >> >> On Sat, Aug 4, 2018

Re: Converting a DataStream into a Table throws error

2018-08-06 Thread Hequn Cheng

Re: Accessing source table data from hive/Presto

2018-08-06 Thread Hequn Cheng
Hi srimugunthan, I found a related link[1]. Hope it helps. [1] https://stackoverflow.com/questions/41683108/flink-1-1-3-interact-with-hive-2-1-0 On Tue, Aug 7, 2018 at 2:35 AM, srimugunthan dhandapani < srimugunthan.dhandap...@gmail.com> wrote: > Hi all, > I read the Flink documentation and ca

Re: Running SQL to print to Std Out

2018-08-06 Thread Hequn Cheng
Hi Mich, When you print to stdout on cluster, you have to look at the taskmanager .out file (also available in the UI). Best, Hequn On Tue, Aug 7, 2018 at 4:48 AM, Mich Talebzadeh wrote: > Hi, > > This is the streaming program I have for trade prices following the doc > for result set for tabl

Re: Passing the individual table column values to the local variables

2018-08-07 Thread Hequn Cheng
Hi Mich, We can't convert a DataStream into a single value. There are some options: 1. Use a TableSink[1] to write the data into HBase. 2. Use a UDF[2]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sourceSinks.html#define-a-tablesink [2] https://ci.apache.org/projects/flin

Re: flink requires table key when insert into upsert table sink

2018-08-10 Thread Hequn Cheng
Hi, *> Could you give an example that the query has a unique key?* Consider the following SQL: SELECT a, SUM(b) as d > FROM Orders > GROUP BY a The result table has a unique key on column a. The document about Streaming Concepts[1] may be helpful for you. *> What is the mechanism flink infer which
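Why GROUP BY yields a key an upsert sink can use: the aggregate emits at most one current row per grouping value, so the sink can overwrite by that key. A toy Python illustration with a plain dict standing in for the sink's keyed state (names are illustrative):

```python
def upsert(result_rows):
    """GROUP BY a yields a result keyed by `a`; an upsert sink keeps only
    the latest aggregate row per key, illustrated with a plain dict."""
    table = {}
    for a, sum_b in result_rows:
        table[a] = sum_b  # later results for the same key overwrite earlier ones
    return table

# the aggregate for key 'x' is updated from 3 to 7 as more rows arrive
print(upsert([("x", 3), ("y", 5), ("x", 7)]))  # -> {'x': 7, 'y': 5}
```

Without such a unique key, the sink cannot tell which earlier row an update replaces, which is why the planner rejects the insert.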

Re: How to do test in Flink?

2018-08-12 Thread Hequn Cheng
Hi Chang, There are some harness tests which can be used to test your function. It is also a common way to test function or operator in flink internal tests. Currently, the harness classes mainly include: - KeyedOneInputStreamOperatorTestHarness - KeyedTwoInputStreamOperatorTestHarness -

Re: How to do test in Flink?

2018-08-13 Thread Hequn Cheng
rds/祝好, > > Chang Liu 刘畅 > > > On 13 Aug 2018, at 04:01, Hequn Cheng wrote: > > Hi Chang, > > There are some harness tests which can be used to test your function. It > is also a common way to test function or operator in flink internal tests. > Currently, the h

Re: Flink 1.6 ExecutionJobVertex.getTaskInformationOrBlobKey OutOfMemoryError

2018-08-13 Thread Hequn Cheng
Hi, Have you ever increased the memory of the job master? If you run a Flink job on YARN, you can increase the job master's memory with "-yjm 1024m"[1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/yarn_setup.html#run-a-flink-job-on-yarn On Mon, Aug 13, 2018 at 10

Re: Tuning checkpoint

2018-08-13 Thread Hequn Cheng
Hi mingliang, Regarding your first question: I answered it on Stack Overflow[1]. Hope it helps. Best, Hequn [1] https://stackoverflow.com/questions/51832577/what-may-probably-cause-large-alignment-duration-for-flink-job On Tue, Aug 14, 2018 at 10:16 AM, 祁明良 wrote: > Thank you for this grea
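For context on the metric discussed in that thread: a task's checkpoint alignment duration is the time between the first and the last checkpoint barrier arriving on its input channels, so one slow or backpressured channel inflates it. A minimal Python sketch of that definition (function name is illustrative):

```python
def alignment_duration(barrier_arrival_times):
    """Alignment duration for a task = time between the first and the last
    barrier arriving on its input channels; one slow channel (data skew,
    backpressure) makes this large."""
    return max(barrier_arrival_times) - min(barrier_arrival_times)

# barriers arrive at t=1000, 1005, 1900 on three channels
print(alignment_duration([1000, 1005, 1900]))  # -> 900
```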

Re: Flink SQL does not support rename after cast type

2018-08-13 Thread Hequn Cheng
Hi Henry, Flink does support renaming a column after casting. The exception is not caused by the cast; it is caused by mixing types. For example, the query > "CASE 1 WHEN 1 THEN *true* WHEN 2 THEN *'string'* ELSE NULL END" will throw the same exception, since the types of *true* and *'string'* are not the same.
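The rule the planner enforces is that all THEN/ELSE branches of a CASE expression must share one result type. A toy Python version of that check, purely to illustrate the idea (not Flink's actual type-coercion logic, which also allows compatible numeric types):

```python
def check_case_branch_types(branch_values):
    """Toy planner check: all non-NULL CASE branches must share one type,
    otherwise planning fails with a type error."""
    types = {type(v) for v in branch_values if v is not None}
    if len(types) > 1:
        raise TypeError(f"incompatible CASE branch types: {types}")
    return True

print(check_case_branch_types([True, False, None]))  # -> True: all boolean
# check_case_branch_types([True, "string"]) would raise TypeError,
# mirroring the exception from mixing *true* and *'string'* branches
```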
