Re: When using the batch api, the sink task is always in the created state.

2021-09-07 Thread lec ssmi
quot; (Flink >= 1.12). > > Caizhi Weng 于2021年9月7日周二 下午2:47写道: > >> Hi! >> >> If you mean batch SQL then you'll need to prepare enough task slots for >> all subtasks. The number of task slots needed is the sum of parallelism of >> all subtasks as there is

Re: When using the batch api, the sink task is always in the created state.

2021-09-06 Thread lec ssmi
And My flink version is 1.11.0 lec ssmi 于2021年9月7日周二 下午2:11写道: > Hi: >I'm not familar with batch api .And I write a program just like > "insert into tab_a select * from tab_b". >From the picture, there are only two tasks, one is the source task > whic

When using the batch api, the sink task is always in the created state.

2021-09-06 Thread lec ssmi
Hi: I'm not familar with batch api .And I write a program just like "insert into tab_a select * from tab_b". From the picture, there are only two tasks, one is the source task which is in RUNNING state. And the other one is sink task which is always in CREATE state. According to logs, I

Re: Re: checkpoint delay consume message

2020-12-23 Thread lec ssmi
Checkpoint can be done synchronously and asynchronously, the latter is the default . If you chooese the synchronous way , it may cause this problem. nick toker 于2020年12月23日周三 下午3:53写道: > Hi Yun, > > Sorry but we didn't understand your questions. > The delay we are experiencing is on the *read

Re: checkpoint interval and hdfs file capacity

2020-11-09 Thread lec ssmi
Hi > No matter what interval you set, Flink will take care of the > checkpoints(remove the useless checkpoint when it can), but when you set a > very small checkpoint interval, there may be much high pressure for the > storage system(here is RPC pressure of HDFS NN). > > Best, >

checkpoint interval and hdfs file capacity

2020-11-09 Thread lec ssmi
Hi, if I set the checkpoint interval to be very small, such as 5 seconds, will there be a lot of state files on HDFS? In theory, no matter what the interval is set, every time you checkpoint, the old file will be deleted and new file will be written, right?

Re: runtime memory management

2020-08-31 Thread lec ssmi
jobs are pre-planned for each > operator. > > Thank you~ > > Xintong Song > > > > On Mon, Aug 31, 2020 at 1:33 PM lec ssmi wrote: > >> HI: >> Generally speaking, when we submitting the flink program, the number of >> taskmanager and the memory of each tn

runtime memory management

2020-08-30 Thread lec ssmi
HI: Generally speaking, when we submitting the flink program, the number of taskmanager and the memory of each tn will be specified. And the smallest real execution unit of flink should be operator. Since the calculation logic corresponding to each operator is different, some need to save the

Re: Mongodb_sink

2020-07-14 Thread lec ssmi
First of all, mongodb itself does not support transactions. But you can init a buffer to save each row when transaction begins, and save all the buffered records to database once when transaction completes. C DINESH 于2020年7月15日周三 上午9:28写道: > Hello all, > > Can we implement TwoPhaseCommitPro

Re: the group key is retracted

2020-06-30 Thread lec ssmi
The old value is already counted in a partition, and when the above update occurs, will the count value of the old partition be subtracted by 1, and then added to the new partition? Benchao Li 于2020年7月1日周三 下午1:11写道: > Hi lec ssmi, > > > If the type value of a record is up

Re: the group key is retracted

2020-06-30 Thread lec ssmi
I think the old value will not retract, because the type value update, it > will be calculate in new value, the old value will not be updated > > > -原始邮件----- > *发件人:*"lec ssmi" > *发送时间:*2020-07-01 09:53:39 (星期三) > *收件人:* flink-user > *抄送:* > *主题:* the g

the group key is retracted

2020-06-30 Thread lec ssmi
Hi: When we use sql for aggregation operation, for example, the following sql >select count( distinct name) cnt, type from table group by type Source data can be regarded as bin log data. If the type value of a record is updated in the database, the values before and after the

The trigger of State TTL

2020-06-02 Thread lec ssmi
As we known, flink can set certain TTL config for the state, so that the state can be automatically cleared after being idle for a period of time. But if there is no new record coming in after setting the TTL config , will the state be automatically cleared after a certain time? Or does it re

Pojo List and Map Data Type in UDFs

2020-05-24 Thread lec ssmi
Hi: I received a java pojo serialized json string from kafka, and I want to use UDTF to restore it to a table with a similar structure to pojo. Some member variables of pojo use the List type or Map type whose generic type is also a pojo . The sample code as bellow: public class Car {

The order of Retract Record

2020-05-18 Thread lec ssmi
Hi: When encountering Retract, there was a following sql : *select count(1) count, word group by word* Suppose the current aggregation result is : * 'hello'->3* When there is record to come again, the count of 'hello' will be changed to 4. The following two records will be generated i

Re: TableConfig TTL and Watermark TTL

2020-05-12 Thread lec ssmi
operators. > > lec ssmi 于2020年5月12日周二 下午8:43写道: > >> Then if I don't write time constraints, >> will it expire with the TTL time configured by TableConfig? >> Benchao Li 于 2020年5月12日周二 20:27写道: >> >>> The state will be cleaned with watermark movement. &

Re: TableConfig TTL and Watermark TTL

2020-05-12 Thread lec ssmi
Then if I don't write time constraints, will it expire with the TTL time configured by TableConfig? Benchao Li 于 2020年5月12日周二 20:27写道: > The state will be cleaned with watermark movement. > > lec ssmi 于2020年5月12日周二 下午5:55写道: > >> Hi: >> If I join two streams in S

TableConfig TTL and Watermark TTL

2020-05-12 Thread lec ssmi
Hi: If I join two streams in SQL, the time range is used as a condition, similar to the time interval join in DataStream. So, will this join state expire as the watermark moves, or will it expire with the TTL time configured by TableConfig? Or both? Best Lec Ssmi

async IO in UDFs

2020-05-07 Thread lec ssmi
Hi: Is there any way to implements async IO in UDFs (scalar function, table function, aggregate function)?

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
ime attribute, the planner will give you > an Exception if you did that. > > > lec ssmi 于2020年5月5日周二 下午8:34写道: > >> As you said, if I select all the time attribute fields from >> both , which will be the final one? >> >> Benchao Li 于 2020年5月5日周二

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
> from one of the input, then it will be time attribute automatically. > > lec ssmi 于2020年5月5日周二 下午4:42写道: > >> But I have not found there is any syntax to specify time >> attribute field and watermark again with pure sql. >> >> Fabian Hueske 于

Re: multiple joins in one job

2020-05-05 Thread lec ssmi
here's no need to feed data back to Kafka just to inject it again to > assign new watermarks. > > Am Di., 5. Mai 2020 um 01:45 Uhr schrieb lec ssmi >: > >> I mean using pure sql statement to make it . Can it be possible? >> >> Fabian Hueske 于2020年5月4日周一 下午4:04写道

Re: multiple joins in one job

2020-05-04 Thread lec ssmi
nsures that the watermark will be aligned with both of them. > > Best, Fabian > > Am Mo., 4. Mai 2020 um 00:48 Uhr schrieb lec ssmi >: > >> Thanks for your replay. >> But as I known, if the time attribute will be retained and the time >> attribute field of both stre

Re: multiple joins in one job

2020-05-03 Thread lec ssmi
e will be preserved after time interval join. > Could you share your DDL and SQL queries with us? > > lec ssmi 于2020年4月30日周四 下午5:48写道: > >> Hi: >>I need to join multiple stream tables using time interval join. The >> problem is that the time attribute will disa

Re: join in sql without time interval

2020-04-30 Thread lec ssmi
Thanks, but is the bottom layer of the table API really implemented like this? Konstantin Knauf 于 2020年4月30日周四 22:02写道: > Hi Lec Ssmi, > > yes, Dastream#connect on two streams both keyed on the productId with a > KeyedCoProcessFunction is the way to go. > > Cheers, > &g

multiple joins in one job

2020-04-30 Thread lec ssmi
Hi: I need to join multiple stream tables using time interval join. The problem is that the time attribute will disappear after the jon , and pure sql cannot declare the time attribute field again . So, to make is success, I need to insert the last result of join to kafka ,and consume it

Re: join in sql without time interval

2020-04-30 Thread lec ssmi
Maybe, the connect method? lec ssmi 于2020年4月30日周四 下午3:59写道: > Hi: > As the following sql: > >SELECT * FROM Orders INNER JOIN Product ON Orders.productId = > Product.id > > If we use datastream API instead of sql, how should it be achieved? > Because the APIs

join in sql without time interval

2020-04-30 Thread lec ssmi
above state capacity problem in sql is using TableConfig. But TableConfig itself can only solve the state ttl problem of non-time operators. So I think the above sql's implementation is neither tumble window join, nor sliding window join and interval join. Best Regards Lec Ssmi

Late data acquisition

2020-04-28 Thread lec ssmi
Hi: can we get data later than watermark in sql ? Best Lec Ssmi

join state TTL

2020-04-27 Thread lec ssmi
and TableConfig? If I use StateTtlConfig and program with TableAPI, can the configuration take effect? Best regards Lec Ssmi

Re: define WATERMARKS in queries/views?

2020-04-26 Thread lec ssmi
According to my practice, it seems that declaring the timestamp and watermark once again will not work, sql seems to have no such syntax. But it's supported in DataStream API. Leonard Xu 于2020年4月26日周日 下午6:30写道: > > > 在 2020年4月23日,22:03,lec ssmi 写道: > > can assignT

Re: define WATERMARKS in queries/views?

2020-04-25 Thread lec ssmi
rmark, then the new > `assignTimestampAndWatermark` will override the previous watermark. > > Best, > Jark > > On Thu, 23 Apr 2020 at 22:03, lec ssmi wrote: > >> can assignTimestampAndWatermark again on a watermarked table? >> >> Jark Wu 于 2020年4月23日周四 2

Re: define WATERMARKS in queries/views?

2020-04-23 Thread lec ssmi
can assignTimestampAndWatermark again on a watermarked table? Jark Wu 于 2020年4月23日周四 20:18写道: > Hi Matyas, > > You can create a new table based on the existing table using LIKE syntax > [1] in the upcoming 1.11 version, e.g. > > CREATE TABLE derived_table ( > WATERMARK FOR tstmp AS tsmp

how to enable retract?

2020-04-22 Thread lec ssmi
Hi: Is there an aggregation operation or window operation, the result is with retract characteristics?

Re: How watermark is generated in sql DDL statement

2020-04-17 Thread lec ssmi
Watermark is updated > - if currentWatermark - lastWatermark > watermarkInterval > - emit watermark to downstream, and update lastWatermark > > lec ssmi 于2020年4月17日周五 下午4:50写道: > >> Maybe you are all right. I was more confused . >> As the cwiki said, flink could us

Re: How watermark is generated in sql DDL statement

2020-04-17 Thread lec ssmi
Maybe you are all right. I was more confused . As the cwiki said, flink could use BoundedOutOfOrderTimestamps , [image: image.png] but I have heard about WatermarkAssignerOperator from Blink developers. Benchao Li 于2020年4月17日周五 下午4:33写道: > Hi lec ssmi, > > It's a good ques

How watermark is generated in sql DDL statement

2020-04-17 Thread lec ssmi
Hi: In sql API , the declaration of watermark is realized by ddl statement . But which way is it implemented? * PeriodicWatermark * or *PunctuatedWatermark*? There seems to be no explanation on the official website. Thanks.

Re: instance number of user defined function

2020-04-16 Thread lec ssmi
appreciating our reply.

instance number of user defined function

2020-04-16 Thread lec ssmi
Hi: I always wonder how much instance has been initialized in the whole flink application. Suppose there is such a scenario: I have a UDTF called '*mongo_join'* through which the flink table can join with external different mongo table according to the parameters passed in.

time column used by timer

2020-02-02 Thread lec ssmi
In KeyedProcessFunction, we can register a timer based on EventTime, but in the register method, we don't need to specify the time column. So if the elements of this KeyedStream are not the classes that originally specified timestamp and watermark, can this timer run normally?