Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-31 Thread Hequn Cheng
Hi Jincheng, Thanks a lot for raising the discussion. +1 for the FLIP. I think this will bring big benefits for the PyFlink users. Currently, the Python TableAPI document is hidden deeply under the TableAPI&SQL tab which makes it quite unreadable. Also, the PyFlink documentation is mixed with Jav

Re: [ANNOUNCE] Apache Flink 1.11.1 released

2020-07-22 Thread Hequn Cheng
Thanks Dian for the great work and thanks to everyone who made this release possible! Best, Hequn On Wed, Jul 22, 2020 at 4:40 PM Jark Wu wrote: > Congratulations! Thanks Dian for the great work and to be the release > manager! > > Best, > Jark > > On Wed, 22 Jul 2020 at 15:45, Yangze Guo wro

Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-25 Thread Hequn Cheng
@Dian, thanks a lot for the release and for being the release manager. Also thanks to everyone who made this release possible! Best, Hequn On Sat, Apr 25, 2020 at 7:57 PM Dian Fu wrote: > Hi everyone, > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.9.3,

Re: [ANNOUNCE] Apache Flink Stateful Functions 2.0.0 released

2020-04-07 Thread Hequn Cheng
Thanks a lot for the release and your great job, Gordon! Also thanks to everyone who made this release possible! Best, Hequn On Tue, Apr 7, 2020 at 8:58 PM Tzu-Li (Gordon) Tai wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions 2.0.0. >

Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-20 Thread Hequn Cheng
Congratulations Jingsong! Well deserved. Best, Hequn On Fri, Feb 21, 2020 at 2:42 PM Yang Wang wrote: > Congratulations!Jingsong. Well deserved. > > > Best, > Yang > > Zhijiang 于2020年2月21日周五 下午1:18写道: > >> Congrats Jingsong! Welcome on board! >> >> Best, >> Zhijiang >> >> -

Re: [ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-13 Thread Hequn Cheng
Thanks a lot for the release, Jincheng! Also thanks to everyone who made this release possible! Best, Hequn On Thu, Feb 13, 2020 at 2:18 PM Dian Fu wrote: > Thanks for the great work, Jincheng. > > Regards, > Dian > > 在 2020年2月13日,下午1:32,jincheng sun 写道: > > Hi everyone, > > The Apache Flink

Re: [ANNOUNCE] Apache Flink 1.10.0 released

2020-02-12 Thread Hequn Cheng
Great thanks to Yu & Gary for being the release manager! Also thanks to everyone who made this release possible! Best, Hequn On Thu, Feb 13, 2020 at 9:54 AM Rong Rong wrote: > Congratulations, a big thanks to the release managers for all the hard > works!! > > -- > Rong > > On Wed, Feb 12, 2020

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-11 Thread Hequn Cheng
+1 (non-binding) - Check signature and checksum. - Install package successfully with Pip under Python 3.7.4. - Run wordcount example successfully under Python 3.7.4. Best, Hequn On Tue, Feb 11, 2020 at 12:17 PM Dian Fu wrote: > +1 (non-binding) > > - Verified the signature and checksum > - Pip

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread Hequn Cheng
Hi Jincheng, +1 for this proposal. From the perspective of users, I think it would be nice to have PyFlink on PyPI, which makes it much easier to install PyFlink. Best, Hequn On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang wrote: > +1 > > > Xingbo Huang 于2020年2月4日周二 下午1:07写道: > >> Hi Jincheng, >> >> T

Re: [ANNOUNCE] Apache Flink 1.9.2 released

2020-01-31 Thread Hequn Cheng
> > Best, > Jincheng > > Hequn Cheng 于2020年1月31日周五 下午6:55写道: > >> Hi everyone, >> >> The Apache Flink community is very happy to announce the release of >> Apache Flink 1.9.2, which is the second bugfix release for the Apache Flink >> 1.9 series. &

[ANNOUNCE] Apache Flink 1.9.2 released

2020-01-31 Thread Hequn Cheng
Hi everyone, The Apache Flink community is very happy to announce the release of Apache Flink 1.9.2, which is the second bugfix release for the Apache Flink 1.9 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate dat

Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Hequn Cheng
Congratulations, Dian. Well deserved! Best, Hequn On Thu, Jan 16, 2020 at 6:08 PM Leonard Xu wrote: > Congratulations! Dian Fu > > Best, > Leonard > > 在 2020年1月16日,18:00,Jeff Zhang 写道: > > Congrats Dian Fu ! > > jincheng sun 于2020年1月16日周四 下午5:58写道: > >> Hi everyone, >> >> I'm very happy to a

Re: [DISCUSS] Set default planner for SQL Client to Blink planner in 1.10 release

2020-01-04 Thread Hequn Cheng
+1 to make the Blink planner the default planner for SQL Client, so that we can give the Blink planner a bit more exposure. Best, Hequn On Fri, Jan 3, 2020 at 6:32 PM Jark Wu wrote: > Hi Benoît, > > Thanks for the reminder. I will look into the issue and hopefully we can > target it into 1.9.2 and

[ANNOUNCE] Weekly Community Update 2020/01

2020-01-04 Thread Hequn Cheng
Dear community, Happy new year! Wishing you a new year rich with the blessings of love, joy, warmth, and laughter. Wish Flink will get better and better. Nice to share this week’s community digest with an update on Flink-1.10.0, a proposal to set blink planner as the default planner for SQL Clien

[ANNOUNCE] Weekly Community Update 2019/52

2019-12-29 Thread Hequn Cheng
Dear community, Happy to share a short community update this week. Due to the holiday, the dev@ mailing list is pretty quiet these days. Flink Development == * [sql] Jark proposes to correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL before 1.10 is

[ANNOUNCE] Weekly Community Update 2019/51

2019-12-22 Thread Hequn Cheng
Dear community, Happy to share this week's brief community digest with updates on Flink 1.10 and Flink 1.9.2, a proposal to integrate Flink Docker image publication into Flink release process, a discussion on new features of PyFlink and a couple of blog posts. Enjoy. Flink Development ===

Re: [ANNOUNCE] Weekly Community Update 2019/50

2019-12-17 Thread Hequn Cheng
king forward to your updates! > > Cheers, > > Konstantin > > On Mon, Dec 16, 2019 at 5:53 AM Hequn Cheng wrote: > >> Hi Konstantin, >> >> Happy holidays and thanks a lot for your great job on the updates >> continuously. >> With the updates,

Re: [ANNOUNCE] Weekly Community Update 2019/50

2019-12-15 Thread Hequn Cheng
Hi Konstantin, Happy holidays and thanks a lot for your continuous great work on the updates. With the updates, it is easier for us to catch up with what's going on in the community, which I think is quite helpful. I'm wondering if I can help out and cover this during your vacation. :) Best

[ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Hequn Cheng
Hi, The Apache Flink community is very happy to announce the release of Apache Flink 1.8.3, which is the third bugfix release for the Apache Flink 1.8 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streamin

Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Hequn Cheng
Thanks a lot to Jark, Jincheng, and everyone who made this release possible. Best, Hequn On Sat, Oct 19, 2019 at 10:29 PM Zili Chen wrote: > Thanks a lot for being release manager Jark. Great work! > > Best, > tison. > > > Till Rohrmann 于2019年10月19日周六 下午10:15写道: > >> Thanks a lot for being ou

Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread Hequn Cheng
Hi Stephan, Big +1 for adding this to Apache Flink! As for the problem of whether this should be added to the Flink main repository, from my side, I prefer to put it in the main repository. Not only does Stateful Functions share very close relations with the current Flink, but also other libs or modu

Re: [External] Re: From Kafka Stream to Flink

2019-09-19 Thread Hequn Cheng
://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql/SplitAggregateITCase.scala#L228 On Thu, Sep 19, 2019 at 9:40 PM Casado Tejedor, Rubén < ruben.casado.teje...@accenture.com> wrote: > Thanks Fabian. @Hequn Che

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Hequn Cheng
Congratulations! Best, Hequn On Thu, Sep 12, 2019 at 9:24 AM Jark Wu wrote: > Congratulations Zili! > > Best, > Jark > > On Wed, 11 Sep 2019 at 23:06, wrote: > > > Congratulations, Zili. > > > > > > > > Best, > > > > Xingcan > > > > > > > > *From:* SHI Xiaogang > > *Sent:* Wednesday, Septembe

Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-15 Thread Hequn Cheng
Congratulations Andrey! On Thu, Aug 15, 2019 at 3:30 PM Fabian Hueske wrote: > Congrats Andrey! > > Am Do., 15. Aug. 2019 um 07:58 Uhr schrieb Gary Yao : > > > Congratulations Andrey, well deserved! > > > > Best, > > Gary > > > > On Thu, Aug 15, 2019 at 7:50 AM Bowen Li wrote: > > > > > Congrat

Re: Implementing a low level join

2019-08-14 Thread Hequn Cheng
tions. My question is more about if I can have an operator > which decides between broadcast and regular join dynamically. I suppose I > will have to extend the generic TwoInputStreamOperator in Flink. Do you > have any suggestion? > > Thanks > > On Wed, 14 Aug 2019, 03

Re: Implementing a low level join

2019-08-13 Thread Hequn Cheng
Hi Felipe, > I want to implement a join operator which can use different strategies for joining tuples. Not all kinds of join strategies can be applied to streaming jobs. Take sort-merge join as an example, it's impossible to sort unbounded data. However, you can perform a window join and use t

Re: Is it possible to decide the order of where conditions in Flink SQL

2019-07-28 Thread Hequn Cheng
Hi Tony, There is no order guarantee for filter conditions. The conditions would be pushed down or merged during query optimization. However, you can use CASE WHEN[1] to achieve what you want. The code looks like: CASE WHEN !user.is_robot THEN true WHEN UDF_NEED_TO_QUERY_DB(user) THEN true EL
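The CASE WHEN workaround in the snippet above relies on WHEN branches being evaluated in order with short-circuiting, so the expensive DB-querying UDF only runs when the cheap condition fails. A plain-Java sketch of that evaluation order (the `udfNeedToQueryDb` stand-in and its behavior are hypothetical, purely for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CaseWhenOrder {
    public static final AtomicInteger dbCalls = new AtomicInteger();

    // Hypothetical stand-in for UDF_NEED_TO_QUERY_DB(user): counts invocations
    // so we can observe when it is (and is not) evaluated.
    public static boolean udfNeedToQueryDb(String user) {
        dbCalls.incrementAndGet();
        return user.startsWith("u");
    }

    // Mirrors: CASE WHEN !user.is_robot THEN true
    //               WHEN UDF_NEED_TO_QUERY_DB(user) THEN true
    //               ELSE false END
    public static boolean keep(boolean isRobot, String user) {
        if (!isRobot) {
            return true; // first branch matches: the UDF is never called
        }
        return udfNeedToQueryDb(user); // only evaluated for robots
    }

    public static void main(String[] args) {
        keep(false, "alice");   // non-robot: short-circuits before the UDF
        keep(true, "user42");   // robot: falls through to the UDF
        System.out.println("dbCalls=" + dbCalls.get());
    }
}
```

The same ordering guarantee does not hold for plain AND-ed filter conditions, which the optimizer may reorder, which is exactly why the reply suggests CASE WHEN.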

Re: GroupBy result delay

2019-07-24 Thread Hequn Cheng
have parallelism = 32 and only one task has the record. Can you > please elaborate more on why this would affect the watermark advancement? > 3. Event create time is in ms > 4. data span time > window time. I don't quite understand why this > matters. > > Thanks, > Fanbi

Re: GroupBy result delay

2019-07-23 Thread Hequn Cheng
Hi Fanbin, Fabian is right, it should be a watermark problem. Probably, some tasks of the source don't have enough data to advance the watermark. Furthermore, you could also monitor event time through Flink web interface. I have answered a similar question on stackoverflow, see more details here[1

Re: table toRetractStream missing last record and adding extra column (True)

2019-07-17 Thread Hequn Cheng
Hi Sri, Question1: You can use a map to filter the "true", i.e., ds.map(_._2). Note, it's ok to remove the "true" flag for distinct as it does not generate updates. For other queries that contain updates, such as a non-window group by, we should not filter the flag or the result will not be correct. Question

Re: org.apache.flink.table.api.TableException: Only tables that originate from Scala DataStreams can be converted to Scala DataStreams.

2019-07-17 Thread Hequn Cheng
Hi Sri, For scala jobs, we should import the corresponding scala Environment and DataStream. e.g, import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment} import org.apache.flink.table.api.scala.StreamTableEnvironment See example here[1]. Best, Hequn [1] https://gith

Re: [flink 1.8.1]window closed unexpectedly and data drop

2019-07-13 Thread Hequn Cheng
Hi Ever, The window only fires when the watermark passes the end of a window. > now the fourth data came with timestamp:03:17:55, at that time, a new window should be open, and the previous window should closed The previous window may not close if the watermark hasn't passed the end of the wind

Re: Table API and ProcessWindowFunction

2019-07-11 Thread Hequn Cheng
ining/exercises/datastream_java/datatypes/TaxiRide.java > > On Thu, Jul 11, 2019 at 10:01 AM Flavio Pompermaier > wrote: > >> Thanks Hequn, I'll give it a try! >> >> Best, Flavio >> >> On Thu, Jul 11, 2019 at 3:38 AM Hequn Cheng wrote:

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Hequn Cheng
Congratulations Rong! Best, Hequn On Fri, Jul 12, 2019 at 12:19 PM Jeff Zhang wrote: > Congrats, Rong! > > > vino yang 于2019年7月12日周五 上午10:08写道: > >> congratulations Rong Rong! >> >> Fabian Hueske 于2019年7月11日周四 下午10:25写道: >> >>> Hi everyone, >>> >>> I'm very happy to announce that Rong Rong ac

Re: Table API and ProcessWindowFunction

2019-07-10 Thread Hequn Cheng
nt this? > > On Tue, Jul 9, 2019 at 4:15 AM Hequn Cheng wrote: > >> Hi Flavio, >> >> Thanks for your information. >> >> From your description, it seems that you only use the window to get the >> start and end time. There are no aggregations happen. If

Re: Add Logical filter on a query plan from Flink Table API

2019-07-09 Thread Hequn Cheng
cite/blob/master/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Tue, Jul 9, 20

Re: Add Logical filter on a query plan from Flink Table API

2019-07-09 Thread Hequn Cheng
org/apache/calcite/rel/core/RelFactories.html#LOGICAL_BUILDER > *--* > *-- Felipe Gutierrez* > > *-- skype: felipe.o.gutierrez* > *--* *https://felipeogutierrez.blogspot.com > <https://felipeogutierrez.blogspot.com>* > > > On Tue, Jul 9, 2019 at 5:06 AM Hequn Cheng wrote: >

Re: Add Logical filter on a query plan from Flink Table API

2019-07-08 Thread Hequn Cheng
Hi Felipe, > I would like to create a logical filter if there is no filter set on the logical query. How should I implement it? Do you mean you want to add a LogicalFilter node if the query even doesn't contain filter? Logically, this can be done through a rule. However, it sounds a little hack an

Re: Table API and ProcessWindowFunction

2019-07-08 Thread Hequn Cheng
the UserId. The > problem here is that I need to "materialize" them using Debezium (or > similar) via Kafka and dynamic tables..is there any example of keeping > multiple tables synched with Flink state through Debezium (without the need > of rewriting all the logic for managing UPD

Re: Table API and ProcessWindowFunction

2019-07-08 Thread Hequn Cheng
Hi Flavio, Nice to hear your ideas on Table API! Could you be more specific about your requirements? A detailed scenario would be quite helpful. For example, do you want to emit multi records through the collector or do you want to use the timer? BTW, Table API introduces flatAggregate recently(

Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread Hequn Cheng
Thanks for being the release manager and for the great work, Jincheng! Also thanks to Gordon and the community for making this release possible! Best, Hequn On Wed, Jul 3, 2019 at 9:40 AM jincheng sun wrote: > Hi, > > The Apache Flink community is very happy to announce the release of Apache > Flink 1.8

Re: Idle windows

2019-06-21 Thread Hequn Cheng
Hi Ustinov, I guess you have mixed up the concepts of remainder and parallelism, i.e., data with remainder 0 will not necessarily be processed by the 0th task after keyBy. Flink will perform a Hash function on the key you have provided, and partition the record based on the key group range.

Re: How can I improve this Flink application for "Distinct Count of elements" in the data stream?

2019-06-12 Thread Hequn Cheng
Hi Felipe, From your code, I think you want to get the "count distinct" result instead of the "distinct count". They contain a different meaning. To improve the performance, you can replace your DistinctProcessWindowFunction with a DistinctProcessReduceFunction. A ReduceFunction can aggregate the

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Hequn Cheng
+1 on the proposal! Maintaining only one Python API is helpful for users and contributors. Best, Hequn On Wed, Jun 12, 2019 at 9:41 AM Jark Wu wrote: > +1 and looking forward to the new Python API world. > > Best, > Jark > > On Wed, 12 Jun 2019 at 09:22, Becket Qin wrote: > >> +1 on deprecatin

Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-27 Thread Hequn Cheng
Hi Shaoxuan, Thanks a lot for driving this. +1 to remove the module. The git log of this module shows that it has been inactive for a long time. I think it's ok to remove it for now. It would also be good to switch to the new interface earlier. Best, Hequn On Mon, May 27, 2019 at 8:58 PM Becket

Re: Reconstruct object through partial select query

2019-05-12 Thread Hequn Cheng
Hi shahar, An easier way to solve your problem is to use a Row to store your data instead of the `TaggedEvent `. I think this is what Fabian means. In this way, you don't have to define the user-defined TypeFactory and use the Row type directly. Take `TaggedEvent` as an example, the corresponding

Re: Rich and incrementally aggregating window functions

2019-05-08 Thread Hequn Cheng
Hi, There is a discussion about this before, you can take a look at it[1]. Best, Hequn [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Regarding-implementation-of-aggregate-function-using-a-ProcessFunction-td23473.html#a23531 On Thu, May 9, 2019 at 5:14 AM an0 wrote: >

Re: 回复:Is it possible to handle late data when using table API?

2019-04-16 Thread Hequn Cheng
Hi Lasse, > some devices can deliver data days back in time and I would like to have the results as fast as possible. What JingsongLee said is correct. However, it's possible to handle your problem with Table API according to your description above. You can use the non-window(or unbounded) aggre

Re: Identify orphan records after joining two streams

2019-04-15 Thread Hequn Cheng
Hi Averell, > I feel that it's over-complicated I think a Table API or SQL[1] job can also achieve what you want. It is probably simpler and takes less code. The workflow looks like: 1. union all two source tables. You may need to unify the schema of the two tables as union all can only be used to u

Re: Join of DataStream and DataSet

2019-04-14 Thread Hequn Cheng
Hi Reminia, Currently, we can't join a DataStream with a DataSet in Flink. However, the DataSet is actually a kind of bounded stream. From this point of view, you can use a streaming job to achieve your goal. Flink Table API & SQL support different kinds of join[1]. You can take a closer look

Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Hequn Cheng
Thanks a lot for the great release Aljoscha! Also thanks for the work by the whole community. :-) Best, Hequn On Wed, Apr 10, 2019 at 6:03 PM Fabian Hueske wrote: > Congrats to everyone! > > Thanks Aljoscha and all contributors. > > Cheers, Fabian > > Am Mi., 10. Apr. 2019 um 11:54 Uhr schrieb

Re: Use different functions for different signal values

2019-04-02 Thread Hequn Cheng
Hi Marke, Ken is right. We can use split and select to achieve what you want. Besides, I am wondering: if there are a lot of incoming signals with unique IDs, why not use one function and handle the different logic inside it. For example, we can call different methods in the function according

Re: How to join stream and dimension data in Flink?

2019-03-13 Thread Hequn Cheng
currency * > > 6. How to use temporal table join to do left join? > > > Best > Henry > > 在 2019年3月13日,上午12:02,Hequn Cheng 写道: > > Hi Henry, > > Yes, you are correct. Basically, there are two ways you can use to join a > Temporal Table. One is provided in Flink

Re: How to join stream and dimension data in Flink?

2019-03-12 Thread Hequn Cheng
e is a > new feature named Temporal Tables delivered by Flink1.7, I think it maybe > could be used to achieve the join between stream and dimension table. But I > am not sure about that. Could anyone help me about it? > Thanks a lot for your help. > > Best > Henry > > 在 2

Re: Flink window triggering and timing on connected streams

2019-02-25 Thread Hequn Cheng
Hi Andrew, > I have an “end session” event that I want to cause the window to fire and purge. Do you want to fire the window only by the 'end session' event? I see one option to solve the problem. You can use a tumbling window (say 5s) and set your timestamp to t'+5s each time you receive an 'end se
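To make the t'+5s trick above concrete: an event-time tumbling window [start, start + size) only fires once the watermark passes its end, so bumping the "end session" event's timestamp by the window size guarantees the watermark derived from it passes the end of the window containing t. A dependency-free sketch of the window arithmetic (the `windowStart` formula mirrors the usual `timestamp - (timestamp mod size)` assignment, not Flink's actual classes; the 5s size comes from the example):

```java
public class TumblingWindows {
    // Start of the tumbling window (offset 0) that contains `timestamp`.
    public static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    public static void main(String[] args) {
        long size = 5_000L;                 // 5s tumbling window
        long t = 12_345L;                   // an in-session event time
        long start = windowStart(t, size);  // 10_000
        long end = start + size;            // 15_000: window fires when watermark passes this
        long endSession = t + size;         // 17_345: past the end, so the window can fire
        System.out.println(endSession > end);
    }
}
```

Printing `endSession > end` yields `true`: advancing the end-session timestamp by one window size always lands beyond the current window's end.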

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-21 Thread Hequn Cheng
Hi Stephan, Thanks for summarizing the great roadmap! It is very helpful for users and developers to track the direction of Flink. +1 for putting the roadmap on the website and update it per release. Besides, would be great if the roadmap can add the UpsertSource feature(maybe put it under 'Batch

Re: [ANNOUNCE] Apache Flink 1.7.2 released

2019-02-17 Thread Hequn Cheng
Thanks a lot for the great release @Gordon. Also thanks for the work by the whole community. :-) Best, Hequn On Mon, Feb 18, 2019 at 2:12 PM jincheng sun wrote: > Thanks a lot for being our release manager Gordon , > Great job! > And also a big thanks to the community for making this release

Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread Hequn Cheng
Congrats Thomas! Best, Hequn On Tue, Feb 12, 2019 at 6:53 PM Stefan Richter wrote: > Congrats Thomas!, > > Best, > Stefan > > Am 12.02.2019 um 11:20 schrieb Stephen Connolly < > stephen.alan.conno...@gmail.com>: > > Congratulations to Thomas. I see that this is not his first time in the > PMC

Re: How to create schema for Flink table

2019-01-26 Thread Hequn Cheng
Hi Soheil, DataSet can be converted to or from a Table. More details here: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#convert-a-datastream-or-dataset-into-a-table Let me know if you have any questions. Best, Hequn On Sun, Jan 27, 2019 at 5:37 AM Soheil Pourbafra

Re: Query on retract stream

2019-01-26 Thread Hequn Cheng
difference > from performance perspective in choosing between the two or both should be > equally performant? > > Gagan > > On Sat, Jan 26, 2019 at 11:50 AM Hequn Cheng wrote: > >> Hi Gagan, >> >> Time attribute fields will be materialized by the unbounded groupby

Re: Print table contents

2019-01-25 Thread Hequn Cheng
Hi Soheil, There is no print() or show() method in Table. As a workaround, you can convert[1] the Table into a DataSet and perform print() or collect() on the DataSet. You have to pay attention to the differences between DataSet.print() and DataSet.collect(). DataSet.print() prints the elemen

Re: [DISCUSS] Towards a leaner flink-dist

2019-01-25 Thread Hequn Cheng
Hi Chesnay, Thanks a lot for the proposal! +1 for a leaner flink-dist and improve the "Download" page. I think a leaner flink-dist would be very helpful. If we bundle all jars into a single one, this will easily cause class conflict problem. Best, Hequn On Fri, Jan 25, 2019 at 2:48 PM jincheng

Re: Query on retract stream

2019-01-25 Thread Hequn Cheng
are getting generated from mysql bin >> log where I have seen multiple event updates getting generated with same >> timestamp (may be because they are part of same transaction) and hence will >> need bin log offset along with timestamp to be able to sort them correctly. >> So

Re: Is there a way to get all flink build-in SQL functions

2019-01-24 Thread Hequn Cheng
Hi yinhua, As Chesnay suggested, the documentation is a good way. You can find descriptions and an example for each UDF. If you only want to get a list of names, you can also take a look at the Flink code (i.e., the BasicOperatorTable.builtInSqlOperators

Re: Query on retract stream

2019-01-21 Thread Hequn Cheng
Hi Gagan, > But I also have a requirement for event time based sliding window aggregation Yes, you can achieve this with Flink TableAPI/SQL. However, currently, sliding windows don't support early fire, i.e., only output results when event time reaches the end of the window. Once window fires, th

Re: Query on retract stream

2019-01-21 Thread Hequn Cheng
Hi Gagan, Yes, you can achieve this with Flink TableAPI/SQL. However, you have to pay attention to the following things: 1) Currently, Flink only ingests append streams. In order to ingest upsert streams (streams with keys), you can use groupBy with a user-defined LAST_VALUE aggregate function. For i
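A minimal sketch of the idea behind the user-defined LAST_VALUE aggregate mentioned above: grouping by key and keeping the most recently seen value per key turns an append stream into an upsert view. Plain Java maps here, not Flink's actual AggregateFunction API; the names are illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LastValueUpsert {
    // Each record is a {key, value} pair. Later records for the same key
    // overwrite earlier ones, which is exactly what a LAST_VALUE aggregate
    // does under a groupBy on the key.
    public static Map<String, String> lastValueByKey(String[][] appendStream) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] record : appendStream) {
            state.put(record[0], record[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        String[][] events = {{"k1", "v1"}, {"k2", "a"}, {"k1", "v2"}};
        System.out.println(lastValueByKey(events)); // {k1=v2, k2=a}
    }
}
```

In the real streaming setting the "last" value depends on arrival order per key, which is why the surrounding thread also discusses event-time ordering of the change log.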

Re: There is no classloader in TableFactoryUtil using ConnectTableDescriptor to registerTableSource

2019-01-15 Thread Hequn Cheng
Hi Joshua, Could you use `TableFactoryService` directly to register TableSource? The code looks like: final TableSource tableSource = > TableFactoryService.find(StreamTableSourceFactory.class, > streamTableDescriptor, classloader) > .createStreamTableSource(propertiesMap); > tableEnv.registerTabl

Re: Multiple select single result

2019-01-13 Thread Hequn Cheng
Hi dhanuka, > I am trying to deploy 200 SQL unions and it seems all the tasks getting failing after some time. Would be great if you can show us some information(say exception stack) about the failure. Is it caused by OOM of job manager? > How do i allocate memory for task manager and job manager

Re: breaking the pipe line

2019-01-12 Thread Hequn Cheng
Hi Alieh, Which kind of API do you use? TableApi or SQL or DataStream or DataSet. Would be great if you can show us some information about your pipeline or provide a way to reproduce the problem. Best, Hequn On Sat, Jan 12, 2019 at 1:58 AM Alieh wrote: > Hello all, > > I have a very very long

Re: Is the eval method invoked in a thread safe manner in Fink UDF

2019-01-12 Thread Hequn Cheng
Hi Anil, It is thread-safe. Each UDF instance will only run in one task. And each UDF processes data synchronously, i.e., the next record will not be processed until the current record has been processed. Best, Hequn On Sat, Jan 12, 2019 at 3:12 AM Anil wrote: > Is the eval method invoked in

Re: onTimer function is not getting executed and job is marked as finished.

2019-01-08 Thread Hequn Cheng
t can reproduce your problem easily. Best, Hequn On Tue, Jan 8, 2019 at 11:44 AM Puneet Kinra < puneet.ki...@customercentria.com> wrote: > Hi hequan > > Weird behaviour when i m calling ctx.timeservice() function is getting > exited even not throwing error > > On Tuesday,

Re: How to get the temp result of each dynamic table when executing Flink-SQL?

2019-01-08 Thread Hequn Cheng
Hi, A print user-defined table sink is helpful. I think a print user-defined UDF is another workaround. Hope this helps. Best, Hequn On Tue, Jan 8, 2019 at 1:45 PM yinhua.dai wrote: > In our case, we wrote a console table sink which print everything on the > console, and use "insert into" to w

Re: onTimer function is not getting executed and job is marked as finished.

2019-01-07 Thread Hequn Cheng
Timo >> >> Am 07.01.19 um 14:15 schrieb Puneet Kinra: >> >> Hi Hequn >> >> Its a streaming job . >> >> On Mon, Jan 7, 2019 at 5:51 PM Hequn Cheng wrote: >> >>> Hi Puneet, >>> >>> The value of the registered timer sh

Re: onTimer function is not getting executed and job is marked as finished.

2019-01-07 Thread Hequn Cheng
Hi Puneet, The value of the registered timer should be within the start time and end time of your job. For example, the job starts at processing time t1 and stops at processing time t2. You have to make sure t1 < `parseLong + 5000` < t2. Best, Hequn On Mon, Jan 7, 2019 at 5:50 PM Puneet Kinra < puneet.ki...@c

Re: Flink SQL client always cost 2 minutes to submit job to a local cluster

2019-01-03 Thread Hequn Cheng
Hi yinhua, Could you help to reproduce the problem? I can help to figure out the root cause. Best, Hequn On Fri, Jan 4, 2019 at 11:37 AM yinhua.dai wrote: > Hi Fabian, > > It's the submission of the jar file cost too long time. > And yes Hequn and your suggestion is working, but just curious

Re: Flink SQL client always cost 2 minutes to submit job to a local cluster

2018-12-28 Thread Hequn Cheng
Hi, yinhua Thanks for looking into the problem. I'm not familiar with the code of this part. As a workaround, you can put your jars into the flink lib folder or add your jars into the classpath. Hope this helps. Best, Hequn On Fri, Dec 28, 2018 at 11:52 AM yinhua.dai wrote: > I am using Flin

Re: Timestamp conversion problem in Flink Table/SQL

2018-12-28 Thread Hequn Cheng
Hi Jiayichao, The two methods do not have to appear in pairs, so you can't use timestamp.getTime() directly. Currently, Flink doesn't support time zone configuration. The timestamp (a time of type Timestamp) always means the time in UTC+0. So in the test of your pr[1], the output timestamp means a ti
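As an illustration of the UTC+0 convention described above, rendering the underlying epoch millis with an explicit UTC zone gives output that is independent of the JVM's default time zone. This is plain java.time, not Flink code, offered as a hedged sketch of the principle:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class UtcTimestamps {
    // Format epoch millis as a wall-clock string in UTC+0. Without the
    // explicit withZone(...), formatting would depend on the machine's
    // default time zone, which is the source of the confusion in the thread.
    public static String renderUtc(long epochMillis) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(renderUtc(0L)); // 1970-01-01 00:00:00, on any machine
    }
}
```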

Re: Flink 1.7 doesn't work with Kafka Table Source Descriptors

2018-12-23 Thread Hequn Cheng
> org.apache.flink.table.descriptors.ConnectTableDescriptor.registerTableSource(ConnectTableDescriptor.scala:46) > at > org.monitoring.stream.analytics.FlinkTableSourceLatest.main(FlinkTableSourceLatest.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMeth

Re: Flink 1.7 doesn't work with Kafka Table Source Descriptors

2018-12-22 Thread Hequn Cheng
Hi dhanuka, I failed to reproduce your error with release-1.7.0. It seems Kafka.toConnectorProperties() should be called instead of ConnectorDescriptor.toConnectorProperties(), the latter one is an abstract method, which leads to the AbstractMethodError. From the picture uploaded, it is strange th

Re: Watermark not firing to push data

2018-12-15 Thread Hequn Cheng
Hi Vijay, Could you provide more information about your problem? For example - Which kind of window do you use? - What's the window size? - A relatively complete code example is better :-) As for the problem, it is probably that the event time has not reached the end of the window. You can monitor the waterma

Re: Question about key group / key state & parallelism

2018-12-12 Thread Hequn Cheng
> How can I deal with that? > > Best Regard and many thanks ! > Bastien > -- > > Bastien DINE > Data Architect / Software Engineer / Sysadmin > bastiendine.io > > > Le mer. 12 déc. 2018 à 13:39, Hequn Cheng a écrit : > >> Hi Bastien, >>

Re: Question about key group / key state & parallelism

2018-12-12 Thread Hequn Cheng
Hi Bastien, Each key “belongs” to exactly one parallel instance of a keyed operator, and each parallel instance contains one or more Key Groups. Keys are hashed into the corresponding key group deterministically, based on the key's value rather than its position among the total records. Different k
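A simplified, dependency-free sketch of the assignment described above. Real Flink additionally applies a murmur hash on top of `hashCode()` in its `KeyGroupRangeAssignment` utilities; the plain `floorMod` here is an illustrative stand-in, so actual key-to-instance placements will differ:

```java
public class KeyGroups {
    // Deterministically map a key's value to a key group in [0, maxParallelism).
    // (Flink applies a murmur hash on top of hashCode(); omitted for brevity.)
    public static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Each parallel operator instance owns a contiguous range of key groups.
    public static int operatorIndex(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128, parallelism = 4;
        int kg = keyGroup("user-42", maxParallelism);
        int idx = operatorIndex(kg, maxParallelism, parallelism);
        // The same key always lands on the same instance, but a key whose
        // "remainder" is 0 does not necessarily go to instance 0: placement
        // depends on the hash of the value, not on record position.
        System.out.println("keyGroup=" + kg + " operator=" + idx);
    }
}
```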

Re: sql program throw exception when new kafka with csv format

2018-12-11 Thread Hequn Cheng
Hi Marvin, I had taken a look at the Flink code. It seems we can't use CSV format for Kafka. You can use JSON instead. As the exception shows, Flink can't find a suitable DeserializationSchemaFactory. Currently, only JSON and Avro support DeserializationSchemaFactory. Best, Hequn On Tue, Dec 11,

Re: How flink table api to join with mysql dimtable

2018-11-14 Thread Hequn Cheng
Hi yelun, Currently, there is no direct way to dynamically load data and do joins in Flink SQL; as a workaround, you can implement your logic with a UDTF. In the UDTF, you can load the data into a cache and update it according to your requirements. Best, Hequn On Wed, Nov 14, 2018 at 10:34 AM ye

Re: Field could not be resolved by the field mapping when using kafka connector

2018-11-13 Thread Hequn Cheng
Hi jeff, We need a different field name for the rowtime indicator, something looks like: > new Schema() > .field("status", Types.STRING) > .field("direction", Types.STRING) > .field("rowtime", Types.SQL_TIMESTAMP).rowtime( > new > Rowtime().timestampsFromFiel

Re: Confused window operation

2018-11-13 Thread Hequn Cheng
Hi Jeff, The window is not a global window. It is related to a specified key. You would have 6 windows after flatMap() and keyBy(). key: hello with 3 windows key: world with 1 window key: flink with 1 window key: hadoop with 1 window Best, Hequn On Wed, Nov 14, 2018 at 10:31 AM Jeff Zhang wrot

Re: DataStream with one DataSource and two different Sinks with diff. schema.

2018-11-09 Thread Hequn Cheng
Hi Marke, You can use split() and select() as is shown here[1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#datastream-transformations On Sat, Nov 10, 2018 at 12:23 AM Marke Builder wrote: > Hi, > > what is the recommended way to implement the
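The `split()`/`select()` idea — one source, records routed by a predicate to two differently-shaped sinks — can be sketched in plain Java. This models only the routing semantics; it is not the Flink API (and in newer Flink versions side outputs replace `split()`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class StreamSplitSketch {
    // Route every element of one source to exactly one of two named "sinks".
    static <T> Map<String, List<T>> split(List<T> source, Predicate<T> toFirstSink) {
        Map<String, List<T>> sinks = new HashMap<>();
        sinks.put("first", new ArrayList<>());
        sinks.put("second", new ArrayList<>());
        for (T element : source) {
            sinks.get(toFirstSink.test(element) ? "first" : "second").add(element);
        }
        return sinks;
    }
}
```

Each selected sub-stream can then be mapped to its own schema before being written to its sink.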

Re: ProcessFunction's Event Timer not firing

2018-11-08 Thread Hequn Cheng
Hi Fritz, Watermarks are merged on stream shuffles. If one input's watermark is not progressing, the merged watermark will not advance event time at the downstream operators. I think you should decrease the parallelism of the source and make sure there is data in each of your source partitions. Note that the Kafka sourc
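The merging rule above is simply a minimum: an operator's event time is the lowest watermark among its inputs, so one idle input holds everything back. A minimal sketch:

```java
import java.util.Arrays;

public class WatermarkMerge {
    // On a shuffle, the operator's watermark is the minimum of all input
    // channels' watermarks; a channel that never advances pins it down.
    static long mergedWatermark(long[] inputWatermarks) {
        return Arrays.stream(inputWatermarks).min().orElse(Long.MIN_VALUE);
    }
}
```

This is why an empty Kafka partition (whose channel stays at the initial watermark) prevents event-time timers from firing downstream.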

Re: Kinesis Connector

2018-11-02 Thread Hequn Cheng
Hi Steve, I think we can check the following things step by step: 1. Confirm that the data source has data. 2. Take a look at the TaskManager or JobManager logs and check whether there are exceptions. 3. Take a thread dump to see what the TaskManager was doing. Best, Hequn On Fri, Nov 2, 2018 at

Re: Flink SQL questions

2018-11-01 Thread Hequn Cheng
Hi Michael, There are some test cases in Flink git, such as[1] which I think may help you. Best, Hequn [1] https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/test/java/org/apache/flink/table/runtime/stream/sql/JavaSqlITCase.java On Fri, Nov 2, 2018 at 7:46 AM TechnoMage

Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-01 Thread Hequn Cheng
this doesn't seem to be too helpful as the keyBy KeyedStream is lost in > the next step: > > .keyBy(*d._1,d._2*) > .timeWindowAll(org.apache.flink.streaming.api.windowing.time.Time.of(FIVE, > TimeUnit.SECONDS)) > .process(new WindowProcessing(FIVE_SECONDS)) > > > TIA, > &

Re: Ask about counting elements per window

2018-10-31 Thread Hequn Cheng
Hi Rad, You can take a look at the group window[1] of SQL. I think it may help you. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql.html#aggregations On Thu, Nov 1, 2018 at 12:53 AM Rad Rad wrote: > Hi All, > > I have a GPS stream consumed by FlinkKafkaCons
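Counting elements per tumbling window — what the linked SQL group window (e.g. `GROUP BY TUMBLE(rowtime, INTERVAL '5' SECOND)`) computes — can be modeled in a few lines of plain Java. This is a sketch of the semantics under the assumption of event timestamps in milliseconds, not Flink code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TumbleCount {
    // Count elements per tumbling window, keyed by the window's start time.
    static Map<Long, Long> countPerWindow(List<Long> timestamps, long sizeMillis) {
        return timestamps.stream().collect(Collectors.groupingBy(
                ts -> ts - (ts % sizeMillis), TreeMap::new, Collectors.counting()));
    }
}
```

Each timestamp is truncated down to its window start, and the counts are grouped by that start time.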

Re: Table API and AVG on dates

2018-10-30 Thread Hequn Cheng
Hi Flavio, You are right. Avg on dates is not supported; it requires numeric types. As a workaround, you can transform the datetime into a numeric type using a UDF. Best, Hequn On Wed, Oct 31, 2018 at 1:51 AM Flavio Pompermaier wrote: > Hi to all, > I'm using Flink 1.6.1 and it seems that ave
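The workaround — convert the datetime to a numeric type, average, convert back — can be sketched like this. A minimal illustration of the idea, not the actual UDF registration code:

```java
import java.sql.Timestamp;
import java.util.List;

public class AvgTimestamp {
    // AVG over timestamps via epoch millis: map to long, average, map back.
    static Timestamp avg(List<Timestamp> timestamps) {
        long sum = 0;
        for (Timestamp t : timestamps) {
            sum += t.getTime();
        }
        return new Timestamp(sum / timestamps.size());
    }
}
```

In Table API terms, the same conversion would live in a scalar UDF applied before the built-in numeric `avg`.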

Re: How to tune Hadoop version in flink shaded jar to Hadoop version actually used?

2018-10-29 Thread Hequn Cheng
Hi Henry, You can specify a specific Hadoop version to build against: > mvn clean install -DskipTests -Dhadoop.version=2.6.1 More details here[1]. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/flinkDev/building.html#hadoop-versions On Tue, Oct 30, 2018 at 10:02 AM vi

Re: Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Hequn Cheng
Hi Vijay, Option 1 is the right answer. `keyBy1` and `keyBy2` contain all data in `inputStream`. While option 2 replicate all data to each task and option 3 split data into smaller groups without duplication. Best, Hequn On Fri, Oct 26, 2018 at 7:01 AM Vijay Balakrishnan wrote: > Hi, > I need

Re: Accumulating a batch

2018-10-25 Thread Hequn Cheng
Hi Austin, You can use GroupBy Window[1], such as TUMBLE Window. The size of the window either as time or row-count interval. You can also define your own User-Defined Aggregate Functions[2] to be used in window. Best, Hequn [1] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/ta

Re: Checkpoint acknowledge takes too long

2018-10-25 Thread Hequn Cheng
so I tried > to improve the sink write. After that the end to end duration is below 1s > and the checkpoint timeout is fixed. > > Best > Henry > > > On Oct 24, 2018, at 10:43 PM, 徐涛 wrote: > > Hequn & Kien, > Thanks a lot for your help, I will try it later. > > Best >

Re: Checkpoint acknowledge takes too long

2018-10-24 Thread Hequn Cheng
Hi Henry, @Kien is right. Take a thread dump to see what the TaskManager was doing. Also check whether GC happens frequently. Best, Hequn On Wed, Oct 24, 2018 at 5:03 PM 徐涛 wrote: > Hi > I am running a flink application with parallelism 64, I left the > checkpoint timeout default v

Re: Dynamically Generated Classes - Cannot load user class

2018-10-22 Thread Hequn Cheng
Hi shkob > i tried to use getRuntimeContext().getUserCodeClassLoader() as the loader to use for Byte Buddy - but doesn't seem to be enough. From the log, it seems that the user class can not be found in the classloader. > Cannot load user class: commodel.MyGeneratedClass Have you ever tried

Re: Trigger Firing for Late Window Elements

2018-10-19 Thread Hequn Cheng
Hi Scott, Yes, the window trigger fires for every single late element. If you only want the window to be triggered once, you can: - Remove the allowedLateness() - Use BoundedOutOfOrdernessTimestampExtractor to emit watermarks that lag behind the elements. The code(scala) looks like: > c
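The semantics of a bounded-out-of-orderness extractor can be sketched as follows: the watermark trails the maximum timestamp seen so far by a fixed bound, so elements up to that bound late still count as on time. A plain-Java model of the idea, not Flink's `BoundedOutOfOrdernessTimestampExtractor` itself:

```java
public class BoundedOutOfOrderness {
    private long maxTs = Long.MIN_VALUE;
    private final long bound;

    BoundedOutOfOrderness(long bound) {
        this.bound = bound;
    }

    // Observe an event timestamp and return the current watermark.
    // The watermark never regresses, because maxTs only grows.
    long onEvent(long ts) {
        maxTs = Math.max(maxTs, ts);
        return maxTs - bound;
    }
}
```

An element with timestamp 4500 arriving after one with 5000 is still ahead of the watermark 4000 (with a 1-second bound), so its window has not yet fired.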
