Why my flink sql job on yarn keep crash

2023-04-14 Thread Si-li Liu
My job read data from mysql and write to doris. It will crash after 20 mins ~ 1 hour after start. org.apache.flink.runtime.JobException: Recovery is suppressed by FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=10, backoffTimeMS=1) at org.apache.flink.runtime.executiongraph.failo

Re: How to auto scale yarn session job

2023-03-20 Thread Si-li Liu
affect running jobs. > > Could you provide the failure log? > > Best, > Weihua > > > On Tue, Mar 21, 2023 at 11:57 AM Si-li Liu wrote: > >> I use this command to launch a flink yarn session. >> >> yarn-session.sh -s 6 -jm 2048 -tm 4096 -nm sql-client-sess

How to auto scale yarn session job

2023-03-20 Thread Si-li Liu
I use this command to launch a flink yarn session. yarn-session.sh -s 6 -jm 2048 -tm 4096 -nm sql-client-session -m yarn-cluster -d And all my flink sql job has 2 parallelism, and I found that my yarn session can only have 3 pipelines. If my session doesn't have free slot, submit to this session

Re: How to use lookup join sql hint

2022-11-13 Thread Si-li Liu
eption? You can find it in > FLINK_HOME/log > > Best regards, > Yuxia > > ---------- > *发件人: *"Si-li Liu" > *收件人: *"User" > *发送时间: *星期六, 2022年 11 月 12日 下午 4:27:54 > *主题: *How to use lookup join sql hint > > I try to use this

why select limit so slow on yarn cluster

2022-11-01 Thread Si-li Liu
I created a table using Flink SQL on yarn session. CREATE TEMPORARY TABLE `scrm_admin_role` ( > `id` bigint, > `role_name` string, > `sort` int, > `type` tinyint, > `status` tinyint, > `tenant_id` bigint, > `deleted` BOOLEAN, > `create_time` TIMESTAMP, > `update_time` TIMESTAMP,

Re: NPE when aggregate window.

2021-06-15 Thread Si-li Liu
The key used in the keyBy function. HaochengWang 于2021年6月12日周六 下午11:29写道: > Hi, > I meet the same exception, and find your suggestion here. I'm confused > about > the word 'grouping key', is that refers to the key of the accumulating hash > map, or the key that separate the stream by some inform

Re: NPE when aggregate window.

2021-04-13 Thread Si-li Liu
Tue, Apr 13, 2021 at 4:41 PM Si-li Liu wrote: > >> Hi,Dawid, >> >> Thanks for your help. I use com.google.common.base.Objects.hashCode, pass >> all fields to it and generate a hashcode, and the equal method also compare >> all the fields. >> >> Dawid Wys

Re: NPE when aggregate window.

2021-04-13 Thread Si-li Liu
hcode and equals? > It is very likely caused by an unstable hashcode and that a record with an > incorrect key ends up on a wrong task manager. > > Best, > > Dawid > On 13/04/2021 08:47, Si-li Liu wrote: > > Hi, > > I encounter a weird NPE when try to do aggregate on a

NPE when aggregate window.

2021-04-12 Thread Si-li Liu
Hi, I encounter a weird NPE when try to do aggregate on a fixed window. If I set a small parallism number the whole job uses only one TaskManager, this NPE will not happen. But when the job scales to two TaskManagers, the TaskManager will crash at Create stage. The Flink version I use is 1.11.1.

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-23 Thread Si-li Liu
Thanks for your reply. The source will poll the state of T operator periodicly. The it find the offset is 0 then it can fallback to latest committed offset. Till Rohrmann 于2020年11月23日周一 下午9:35写道: > Hi Si-li Liu, > > if you want to run T with a parallelism of 1, then your paralle

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-23 Thread Si-li Liu
e more complicated. > > Let's see if anyone has an idea on the co-location topic. > > [1] > https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html > > On Sat, Nov 21, 2020 at 3:43 AM Si-li Liu wrote: > >> Thanks for your reply! &

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-20 Thread Si-li Liu
ven runs within > the same thread. > > On Fri, Nov 20, 2020 at 4:02 PM Si-li Liu wrote: > >> Thanks for your reply. >> >> I want to join two stream A and stream B. Items in stream A come in first >> then I keep them in memory cache, as join key and item, the

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-20 Thread Si-li Liu
the third subtask of the join (let's call it J3). > > Remember that through the shuffling before the join there is no clear > correlation between any subtask of A or B to J... > > On Fri, Nov 20, 2020 at 3:58 AM Si-li Liu wrote: > >> Thanks for your help! >&g

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-19 Thread Si-li Liu
or the records that have been > actually processed in the join, you could also replay the data from timestamp> - . > > On Mon, Nov 16, 2020 at 8:39 AM Si-li Liu wrote: > >> Thanks, I'll try it. >> >> Matthias Pohl 于2020年11月14日周六 上午12:53写道: >> >>>

How to achieve co-location constraints in Flink 1.9.1

2020-11-19 Thread Si-li Liu
Hi Flink only have slotSharingGroup API on DataStream class, I can't find any public API to achieve co-location constraints. Could anyone provide me an example? Another question is that if I use slotSharing group, Flink will schedule two sub tasks to same slot is possible. I think such schedule w

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-15 Thread Si-li Liu
g in the same JVM reducing the > amount of memory each operator can utilize. > > Best, > Matthias > > On Mon, Nov 9, 2020 at 4:23 AM Si-li Liu wrote: > >> Thanks for your reply. >> >> It's a streaming job. The join operator is doing join work, such as join. >

Re: Is possible that make two operators always locate in same taskmanager?

2020-11-08 Thread Si-li Liu
message" are we talking > about? Why are you thinking of using a static variable, instead of just > treating this message as part of the data(set/stream)? > > On 11/5/2020 12:55 PM, Si-li Liu wrote: > > Currently I use Flink 1.9.1. The actual thing I want to do is send some >

Is possible that make two operators always locate in same taskmanager?

2020-11-05 Thread Si-li Liu
Currently I use Flink 1.9.1. The actual thing I want to do is send some messages from downstream operators to upstream operators, which I consider use static variable. But it makes me have to make sure in one taskmanager process it always has these two operators, can I use CoLocationGroup to solve

Re: How to get flink JobId in runtime

2020-07-21 Thread Si-li Liu
value properly. Maybe > you can try > `RuntimeContext.getMetricGroup().getAllVariables().get("")`. > > Best, > Congxian > > > Si-li Liu 于2020年7月20日周一 下午7:38写道: > >> Hi >> >> I want to retrieve flink JobId in runtime, for example, during >>

How to get flink JobId in runtime

2020-07-20 Thread Si-li Liu
Hi I want to retrieve flink JobId in runtime, for example, during RichFunction's open method. Is there anyway to do it? I checked the methods in RuntimeContext and ExecutionConfig, seems I can't get this information from them. Thanks! -- Best regards Sili Liu

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-12 Thread Si-li Liu
uot;. Am I missing something here? > > Best, > Congxian > > > Si-li Liu 于2020年7月10日周五 下午6:06写道: > >> Sorry >> >> I can't reproduce it with reduce/aggregate/fold/apply and due to some >> limitations in my working environment, I can't use flink 1.1

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-10 Thread Si-li Liu
with a > reduce/aggregate/fold/apply() > function to see what happens? -- this wants to narrow down the problem. > > Best, > Congxian > > > Si-li Liu 于2020年7月3日周五 下午6:44写道: > >> Thanks for your help >> >> 1. I started the job from scr

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-03 Thread Si-li Liu
gt; > > On Fri, Jul 3, 2020 at 6:38 AM Si-li Liu wrote: > >> Hi, Thanks for your help. >> >> The checkpoint configuration is >> >> checkpoint.intervalMS=30 >> checkpoint.timeoutMS=30 >> >> The error callstack is from JM's log, whi

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-02 Thread Si-li Liu
hat checkpointing interval do you use? > > Regards, > Roman > > > On Thu, Jul 2, 2020 at 1:57 PM Si-li Liu wrote: > >> Hi, this is our production code so I have to modify it a little bit, such >> as variable name and function name. I think 3 classes I provide here

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-02 Thread Si-li Liu
Khachatryan Roman 于2020年7月2日周四 下午7:18写道: > Thanks for the clarification. > > Can you also share the code of other parts, particularly MyFunction? > > Regards, > Roman > > > On Thu, Jul 2, 2020 at 12:49 PM Si-li Liu wrote: > >> Rocksdb backend has the same

Re: Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-02 Thread Si-li Liu
her > investigate it. > > Regards, > Roman > > > On Thu, Jul 2, 2020 at 10:52 AM Si-li Liu wrote: > >> I'm using flink 1.9 on Mesos and I try to use my own trigger and evictor. >> The state is stored to memory. >> >> input.setParallelism(process

Checkpoint failed because of TimeWindow cannot be cast to VoidNamespace

2020-07-02 Thread Si-li Liu
I'm using flink 1.9 on Mesos and I try to use my own trigger and evictor. The state is stored to memory. input.setParallelism(processParallelism) .assignTimestampsAndWatermarks(new UETimeAssigner) .keyBy(_.key) .window(TumblingEventTimeWindows.of(Time.minutes(20)))

How to configure TaskManager's JVM options through cmdline?

2018-10-23 Thread Si-li Liu
Hi I'm running a flink job on Mesos and I'm trying to change my TaskManager's JVM options. Because our flink-conf.yaml comes from unify image so I can't modify it. I try to put it in environment variable JVM_ARGS, here it my setting: JVM_ARGS=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFra

Re: Flink batch processing fault tolerance

2017-02-16 Thread Si-li Liu
Hi, It's the reason why I gave up use Flink for my current project and pick up traditional Hadoop Framework again. 2017-02-17 10:56 GMT+08:00 Renjie Liu : > https://cwiki.apache.org/confluence/display/FLINK/FLIP- > 1+%3A+Fine+Grained+Recovery+from+Task+Failures > This FLIP may help. > > On Thu,

Flink failed when can not connect to BlobServer

2016-11-07 Thread Si-li Liu
Hi, all I use Flink DataSet API to do some batch job, read some log then group and sort them. Our cluster has almost 2000 servers, we get used to use traditional MR job, then I tried Flink to do some experiment job, but I counter this error and can not continue, does anyone can help with it? Our