Hi David,

Yes, we have multiple disjoint DAGs in one job. We want better CPU utilization. Open-source Flink has a scheduling issue with this type of job. I made a fix on 1.13 with the adaptive scheduler (AS), and now tasks are scheduled evenly across all DAGs. However, when one DAG gets an exception, we don't want it to affect the others; we want to restart only the DAG that hit the exception. I believe the pipelined-region model that the default scheduler has is a good fit for this.
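
For reference, a minimal sketch of the settings this refers to, assuming a standard flink-conf.yaml. The region failover strategy is what lets the default scheduler restart only the pipelined region containing the failed task. The commented-out line shows how reactive mode is switched on; as noted further down the thread, that mode currently restarts the whole job.

```yaml
# Default scheduler: restart only the pipelined region that contains the failed task.
jobmanager.execution.failover-strategy: region

# Reactive mode (runs on the adaptive scheduler). Per this thread, it restarts
# the whole job on failure, so it is left disabled in this sketch.
# scheduler-mode: reactive
```
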
Do you have anything in mind, other than pipelined regions, that would address the restart behavior?

Thanks

On Tue, Apr 18, 2023 at 12:19 AM David Morávek <david.mora...@gmail.com> wrote:

> > Our DAG has multiple sources which are not connected to each other.
>
> To double-check, are you saying the job consists of multiple disjoint DAGs?
>
> > Do you think somehow the adaptive scheduler supports region pipelines
> > for streaming jobs ?
>
> It's doable but might not be straightforward since the AS recycles
> ExecutionGraph during restart. It has been a low priority so far because
> it's mainly valuable for batch jobs, but we might reconsider it if there
> are enough use cases.
>
> Best,
> D.
>
> On Sat, Apr 15, 2023 at 8:15 AM Talat Uyarer <tuya...@paloaltonetworks.com>
> wrote:
>
>> Thanks David and others.
>>
>> Our DAG has multiple sources which are not connected to each other. If
>> one of them fails, I believe Flink can restart a single region for
>> defaultscheduler. but it is not the same case for adaptive scheduler. Do
>> you think somehow the adaptive scheduler supports region pipelines for
>> streaming jobs ? If it can, local recovery makes sense at that time I
>> believe.
>>
>> Thanks
>>
>>
>> On Wed, Apr 12, 2023 at 2:15 AM David Morávek <david.mora...@gmail.com>
>> wrote:
>>
>>> Hi Talat,
>>>
>>> For most streaming pipelines, we have to restart the whole pipeline no
>>> matter the scheduler used because they're a single pipelined region. One
>>> limitation of AdaptiveScheduler is the lack of support for local recovery.
>>> This will be addressed in Flink 1.18 [1].
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-21450
>>>
>>> Best,
>>> D.
>>>
>>> On Tue, Apr 11, 2023 at 4:35 AM Weihua Hu <huweihua....@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> AFAIK, the reactive mode always restarts the whole pipeline now.
>>>>
>>>> Best,
>>>> Weihua
>>>>
>>>>
>>>> On Tue, Apr 11, 2023 at 8:38 AM Talat Uyarer via user <
>>>> user@flink.apache.org> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We use Flink 1.13 with reactive mode for our streaming jobs. When we
>>>>> have an issue/exception on our pipeline. Flink rescheduled all tasks. Is
>>>>> there any way to reschedule only task that had exceptions ?
>>>>>
>>>>> Thanks
>>>>>
>>>>