[GitHub] [flink] sjwiesman commented on a change in pull request #16561: [FLINK-21926][doc] Add docs for fine-grained resource management

GitBox Mon, 02 Aug 2021 13:12:19 -0700


sjwiesman commented on a change in pull request #16561:
URL: https://github.com/apache/flink/pull/16561#discussion_r681234695




##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+---
+title: Fine-Grained Resource Management
+weight: 5
+type: docs
+
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Fine-Grained Resource Management
+
+Apache Flink allows you to control the resource consumption of your workload in a finer granularity, namely **fine-grained resource management**.
+It provides means for users to further improve Flink’s resource efficiency with knowledge of their specific scenarios.
+
+This page describes the fine-grained resource management’s usage, applicable scenarios, and how it works.
+
+{{< hint warning >}}
+**Note:** This feature is currently an MVP (“minimum viable product”) feature and only available to [DataStream API]({{< ref "docs/dev/datastream/overview" >}}).
+{{< /hint >}}
+
+## Applicable Scenarios
+
+Typical scenarios that potentially benefit from fine-grained resource management are where:
+
+  - Tasks have significantly different parallelisms.
+
+  - The resource needed for an entire pipeline is too much to fit into a single slot/task manager.
+
+  - Batch jobs where resources needed for tasks of different stages are significantly different.
+
+An in-depth discussion on why fine-grained resource management can improve resource efficiency for the above scenarios is presented in [How it improves resource efficiency](#how-it-improves-resource-efficiency).
+
+## How it works
+
+As described in [Flink Architecture]({{< ref "docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster),
+the resource for task execution in TaskManager is split into a bunch of slots, where job tasks are scheduled.
+The slot is the basic unit of both resource scheduling and resource requirement in Flink runtime.
+
+{{< img src="/fig/dynamic_slot_alloc.png" class="center" >}}
+
+Previously in Flink, the resource requirement only contained the number of the required slots, without fine-grained resource
+profiles, namely **coarse-grained resource management**. The TaskManager contained a fixed number of identical slots to fulfill those requirements.

Review comment:
       ```suggestion
   ```
   
   I moved this down

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+In fine-grained resource management, the slots are requested with specific resource profiles, which can be specified by users.
+Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available
+resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it.
+
+For a resource requirement without a specified resource profile, Flink automatically decides its resource profile.
+Currently, that profile is calculated from [TaskManager’s total resource]({{< ref "docs/deployment/memory/mem_setup_tm" >}})
+and [taskmanager.numberOfTaskSlots]({{< ref "docs/deployment/config" >}}#taskmanager-numberoftaskslots), just
+like in coarse-grained resource management. As shown above, the total resource of the TaskManager is 1 Core and 4 GB memory and the number of task slots
+is set to 2, so *Slot 2* is created with 0.5 Core and 2 GB memory for the requirement without a specified resource profile.
+
+After the allocation of *Slot 1* and *Slot 2*, there is 0.25 Core and 1 GB memory remaining as the free resources in the
+TaskManager. These free resources can be further partitioned to fulfill the following resource requirements.
+
+Please refer to [Resource Allocation Strategy](#resource-allocation-strategy) for more details.
+
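The slot-cutting arithmetic in the example above can be replayed with a short sketch. It is written in Python for illustration only. The `Resources` class and the `default_slot_profile` and `cut_slot` helpers are hypothetical and are not part of Flink's API. The sketch assumes 1 GB = 1024 MB. It only checks that each request fits into the free resources and does not model how Flink picks a TaskManager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource profile: CPU cores plus memory in MB."""
    cpu_cores: float
    memory_mb: int

    def fits_in(self, other: "Resources") -> bool:
        return self.cpu_cores <= other.cpu_cores and self.memory_mb <= other.memory_mb

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_cores - other.cpu_cores, self.memory_mb - other.memory_mb)


def default_slot_profile(total: Resources, num_task_slots: int) -> Resources:
    """Profile for a request without one: total resource / taskmanager.numberOfTaskSlots."""
    return Resources(total.cpu_cores / num_task_slots, total.memory_mb // num_task_slots)


def cut_slot(free: Resources, request: Resources) -> Resources:
    """Cut an exactly-matched slot out of the free resources and return what is left."""
    if not request.fits_in(free):
        raise ValueError(f"{request} does not fit into {free}")
    return free.minus(request)


total = Resources(cpu_cores=1.0, memory_mb=4096)        # TaskManager: 1 core, 4 GB
slot_1 = Resources(cpu_cores=0.25, memory_mb=1024)      # user-specified: 0.25 core, 1 GB
slot_2 = default_slot_profile(total, num_task_slots=2)  # unspecified: 0.5 core, 2 GB

free = cut_slot(total, slot_1)
free = cut_slot(free, slot_2)
print(slot_2)  # Resources(cpu_cores=0.5, memory_mb=2048)
print(free)    # Resources(cpu_cores=0.25, memory_mb=1024)
```

The leftover 0.25 core and 1 GB match the free resources described in the paragraph above.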
+## Usage
+
+To use fine-grained resource management, you need to:
+
+  - Configure to enable fine-grained resource management.
+
+  - Specify the resource requirement.
+
+### Enable Fine-Grained Resource Management
+
+To enable fine-grained resource management, you need to configure the [cluster.fine-grained-resource-management.enabled]({{< ref "docs/deployment/config" >}}#cluster-fine-grained-resource-management-enabled) to true.
+
+{{< hint danger >}}
+Without this configuration, the Flink runtime cannot schedule the slots with your specified resource requirement and the job will fail with an exception.
+{{< /hint >}}
+
+### Specify Resource Requirement for Slot Sharing Group
+
+Fine-grained resource requirements are defined on slot sharing groups. A slot sharing group is a hint that tells the JobManager that the operators/tasks in it CAN be put into the same slot.
+
+For specifying the resource requirement, you need to:
+
+  - Define the slot sharing group and the operators it contains.
+
+  - Specify the resource of the slot sharing group.
+
+There are two approaches to define the slot sharing group and the operators it contains:
+
+  - You can define a slot sharing group only by its name and attach it to an operator through the [slotSharingGroup(String name)]({{< ref "docs/dev/datastream/operators/overview" >}}#set-slot-sharing-group).
+
+  - You can construct a `SlotSharingGroup` instance, which contains the name and an optional resource profile of the slot sharing group. The `SlotSharingGroup` can be attached to an operator through `slotSharingGroup(SlotSharingGroup ssg)`.
+
+You can specify the resource profile for your slot sharing groups:
+
+  - If you set the slot sharing group through `slotSharingGroup(SlotSharingGroup ssg)`, you can specify the resource profile when constructing the `SlotSharingGroup` instance.
+
+  - If you only set the name of the slot sharing group with [slotSharingGroup(String name)]({{< ref "docs/dev/datastream/operators/overview" >}}#set-slot-sharing-group), you can construct a `SlotSharingGroup` instance with the same name along with the resource profile and register its resource with `StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup ssg)`.
+
+{{< tabs "configure-ssg" >}}
+{{< tab "Java" >}}
+```java
+final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+SlotSharingGroup ssgA = SlotSharingGroup.newBuilder("a")
+  .setCpuCores(1.0)
+  .setTaskHeapMemoryMB(100)
+  .build();
+
+SlotSharingGroup ssgB = SlotSharingGroup.newBuilder("b")
+  .setCpuCores(0.5)
+  .setTaskHeapMemoryMB(100)
+  .build();
+
+someStream.filter(...).slotSharingGroup("a") // Set the slot sharing group with name “a”
+.map(...).slotSharingGroup(ssgB); // Directly set the slot sharing group with name and resource.
+
+env.registerSlotSharingGroup(ssgA); // Then register the resource of group “a”
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+
+val ssgA = SlotSharingGroup.newBuilder("a")
+  .setCpuCores(1.0)
+  .setTaskHeapMemoryMB(100)
+  .build()
+
+val ssgB = SlotSharingGroup.newBuilder("b")
+  .setCpuCores(0.5)
+  .setTaskHeapMemoryMB(100)
+  .build()
+
+someStream.filter(...).slotSharingGroup("a") // Set the slot sharing group with name “a”
+.map(...).slotSharingGroup(ssgB) // Directly set the slot sharing group with name and resource.
+
+env.registerSlotSharingGroup(ssgA) // Then register the resource of group “a”
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+```python
+env = StreamExecutionEnvironment.get_execution_environment()
+
+ssg_a = SlotSharingGroup.builder('a') \
+            .set_cpu_cores(1.0) \
+            .set_task_heap_memory_mb(100) \
+            .build()
+ssg_b = SlotSharingGroup.builder('b') \
+            .set_cpu_cores(0.5) \
+            .set_task_heap_memory_mb(100) \
+            .build()
+
+(some_stream.filter(...).slot_sharing_group('a')  # Set the slot sharing group with name "a"
+    .map(...).slot_sharing_group(ssg_b))  # Directly set the slot sharing group with name and resource.
+
+env.register_slot_sharing_group(ssg_a)  # Then register the resource of group "a"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+<div class="alert alert-info">
+     <strong>Note:</strong> Each slot sharing group can only be attached to one specified resource; any conflict will fail the compilation of your job.
+</div>
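The rule in this note can be illustrated with a rough sketch. The sketch merges (name, profile) declarations the way the two approaches above combine. A name-only declaration can be paired with one resource profile, but two different profiles under the same name are rejected. The `resolve_profiles` helper and the tuple-based profile are hypothetical, and this is not Flink's job-compilation code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical profile: (cpu_cores, task_heap_mb); None means "name only, no profile".
Profile = Optional[Tuple[float, int]]


def resolve_profiles(declarations: Iterable[Tuple[str, Profile]]) -> Dict[str, Profile]:
    """Merge (group name, profile) declarations, rejecting conflicting profiles."""
    resolved: Dict[str, Profile] = {}
    for name, profile in declarations:
        if profile is None:
            # A name-only declaration never conflicts; keep any profile seen so far.
            resolved.setdefault(name, None)
            continue
        previous = resolved.get(name)
        if previous is not None and previous != profile:
            raise ValueError(
                f"slot sharing group '{name}' has conflicting resources: {previous} vs {profile}")
        resolved[name] = profile
    return resolved


# Group "a" is attached by name and registered once with a profile: accepted.
accepted = resolve_profiles([("a", None), ("a", (1.0, 100)), ("b", (0.5, 100))])
print(accepted)  # {'a': (1.0, 100), 'b': (0.5, 100)}

# Group "a" declared with two different profiles: rejected.
try:
    resolve_profiles([("a", (1.0, 100)), ("a", (0.5, 100))])
    conflict_detected = False
except ValueError:
    conflict_detected = True
print(conflict_detected)  # True
```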
+
+In constructing the `SlotSharingGroup`, you can set the following resource components for the slot sharing group:
+  - **CPU Cores**. Defines how many CPU cores are needed. Required to be explicitly configured with a positive value.
+  - **[Task Heap Memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#task-operator-heap-memory)**. Defines how much task heap memory is needed. Required to be explicitly configured with a positive value.
+  - **[Task Off-Heap Memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#configure-off-heap-memory-direct-or-native)**. Defines how much task off-heap memory is needed, can be 0.
+  - **[Managed Memory]({{< ref "docs/deployment/memory/mem_setup_tm" >}}#managed-memory)**. Defines how much task managed memory is needed, can be 0.
+  - **[External Resources]({{< ref "docs/deployment/advanced/external_resources" >}})**. Defines the external resources needed, can be empty.
+
+{{< tabs "configure-resource" >}}
+{{< tab "Java" >}}
+```java
+// Directly build a slot sharing group with specific resource
+SlotSharingGroup ssgWithResource =
+    SlotSharingGroup.newBuilder("ssg")
+        .setCpuCores(1.0) // required
+        .setTaskHeapMemoryMB(100) // required
+        .setTaskOffHeapMemoryMB(50)
+        .setManagedMemory(MemorySize.ofMebiBytes(200))
+        .setExternalResource("gpu", 1.0)
+        .build();
+
+// Build a slot sharing group without specific resource and then register the resource of it in StreamExecutionEnvironment
+SlotSharingGroup ssgWithName = SlotSharingGroup.newBuilder("ssg").build();
+env.registerSlotSharingGroup(ssgWithResource);
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+// Directly build a slot sharing group with specific resource
+val ssgWithResource =
+    SlotSharingGroup.newBuilder("ssg")
+        .setCpuCores(1.0) // required
+        .setTaskHeapMemoryMB(100) // required
+        .setTaskOffHeapMemoryMB(50)
+        .setManagedMemory(MemorySize.ofMebiBytes(200))
+        .setExternalResource("gpu", 1.0)
+        .build()
+
+// Build a slot sharing group without specific resource and then register the resource of it in StreamExecutionEnvironment
+val ssgWithName = SlotSharingGroup.newBuilder("ssg").build()
+env.registerSlotSharingGroup(ssgWithResource)
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+```python
+# Directly build a slot sharing group with specific resource
+ssg_with_resource = SlotSharingGroup.builder('ssg') \
+            .set_cpu_cores(1.0) \
+            .set_task_heap_memory_mb(100) \
+            .set_task_off_heap_memory_mb(50) \
+            .set_managed_memory(MemorySize.of_mebi_bytes(200)) \
+            .set_external_resource('gpu', 1.0) \
+            .build()
+
+# Build a slot sharing group without specific resource and then register the resource of it in StreamExecutionEnvironment
+ssg_with_name = SlotSharingGroup.builder('ssg').build()
+env.register_slot_sharing_group(ssg_with_resource)
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+{{< hint warning >}}
+**Note:** You can construct a SlotSharingGroup with or without specifying its resource profile.
+If you specify the resource profile, you need to explicitly set the **CPU cores** and **Task Heap Memory** to positive values; other components are optional.
+{{< /hint >}}
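The rule in this hint can be restated as a small validation sketch. `validate_profile` is a hypothetical helper with memory sizes in MB. It is not the `SlotSharingGroup` builder. It only uses external resources to decide whether a profile was specified at all.

```python
from typing import Dict, List, Optional


def validate_profile(cpu_cores: Optional[float] = None,
                     task_heap_mb: Optional[int] = None,
                     task_off_heap_mb: Optional[int] = None,
                     managed_mb: Optional[int] = None,
                     external: Optional[Dict[str, float]] = None) -> List[str]:
    """Return the rule violations for a hypothetical resource profile (empty list = valid)."""
    anything_set = bool(external) or any(
        value is not None
        for value in (cpu_cores, task_heap_mb, task_off_heap_mb, managed_mb))
    if not anything_set:
        # Name-only slot sharing group: no resource profile, so nothing to check.
        return []
    errors = []
    # With a profile, CPU cores and task heap memory are mandatory and must be positive.
    if cpu_cores is None or cpu_cores <= 0:
        errors.append("CPU cores must be set to a positive value")
    if task_heap_mb is None or task_heap_mb <= 0:
        errors.append("task heap memory must be set to a positive value")
    # The remaining memory components are optional and may be 0.
    for label, value in (("task off-heap memory", task_off_heap_mb),
                         ("managed memory", managed_mb)):
        if value is not None and value < 0:
            errors.append(f"{label} must not be negative")
    return errors


print(validate_profile())  # []
print(validate_profile(cpu_cores=1.0, task_heap_mb=100, task_off_heap_mb=50,
                       managed_mb=200, external={"gpu": 1.0}))  # []
print(validate_profile(task_off_heap_mb=50))  # reports both mandatory components
```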
+
+## Limitations
+
+Since fine-grained resource management is a new, experimental feature, not all features supported by the default
+scheduler are also available with it. The Flink community is working on addressing these limitations.
+
+  - **No support for the [Elastic Scaling]({{< ref "docs/deployment/elastic_scaling" >}})**. Elastic scaling only supports slot requests without specified resources at the moment.
+
+  - **No support for task manager redundancy**. The [slotmanager.redundant-taskmanager-num]({{< ref "docs/deployment/config" >}}#slotmanager-redundant-taskmanager-num) is used to start redundant TaskManagers to speed up job recovery. This config option will not take effect in fine-grained resource management at the moment.
+
+  - **No support for evenly spread out slot strategy**. This strategy tries to spread out the slots evenly across all available TaskManagers. The strategy is not supported in the first version of fine-grained resource management and [cluster.evenly-spread-out-slots]({{< ref "docs/deployment/config" >}}#cluster-evenly-spread-out-slots) will not take effect in it at the moment.
+
+  - **Limited integration with Flink’s Web UI**. Slots in fine-grained resource management can have different resource specs. The web UI only shows the slot number without its details at the moment.
+
+  - **Limited integration with batch jobs**. At the moment, fine-grained resource management requires batch workloads to be executed with types of all edges being BLOCKING. To do that, you need to configure [fine-grained.shuffle-mode.all-blocking]({{< ref "docs/deployment/config" >}}#fine-grained-shuffle-mode-all-blocking) to `true`. Notice that this may affect the performance. See [FLINK-20865](https://issues.apache.org/jira/browse/FLINK-20865) for more details.
+
+  - **Hybrid resource requirements are not recommended**. It is not recommended to specify the resource requirements only for some parts of the job and leave the requirements for the rest unspecified. Currently, the unspecified requirement can be fulfilled with slots of any resource. The actual resource acquired by it can be inconsistent across different job executions or failovers.
+
+## Notice
+
+  - **Setting the slot sharing group may change the performance**. Setting chain-able operators to different slot sharing groups may break [operator chains]({{< ref "docs/dev/datastream/operators/overview" >}}#task-chaining-and-resource-groups), and thus change the performance.
+
+  - **Slot sharing group will not restrict the scheduling of operators**. The slot sharing group only hints to the scheduler that the grouped operators CAN be deployed into a shared slot. There's no guarantee that the scheduler always deploys the grouped operators together. In cases where grouped operators are deployed into separate slots, the slot resources will be derived from the specified group requirement.
+
+## Deep Dive
+
+### How it improves resource efficiency
+
+In this section, we deep dive into how fine-grained resource management improves resource efficiency, which can help you to understand whether it can benefit your jobs.
+
+Previously, Flink adopted a coarse-grained resource management approach, where tasks are deployed into predefined,
+usually identical slots without the notion of how many resources each slot contains. For many jobs, using coarse-grained
+resource management and simply putting all tasks into one [slot sharing group]({{< ref "docs/dev/datastream/operators/overview" >}}#set-slot-sharing-group) works well enough in terms of resource utilization.
+
+  - For many streaming jobs in which all tasks have the same parallelism, each slot will contain an [entire pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions). Ideally, all pipelines should use roughly the same resources, which can be satisfied easily by tuning the resources of the identical slots.
+
+  - Resource consumption of tasks varies over time. When consumption of a task decreases, the extra resources can be used by another task whose consumption is increasing. This, known as the peak shaving and valley filling effect, reduces the overall resource needed.
+
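Here is a small worked example of the peak shaving and valley filling effect. It uses made-up memory figures for two tasks whose usage moves in opposite directions. The numbers are illustrative only and are not measurements of Flink.

```python
# Made-up memory usage (MB) of two tasks, sampled at four points in time.
task_a = [300, 200, 100, 200]
task_b = [100, 200, 300, 200]

# Reserving resources per task means covering each task's own peak.
separate_reservation = max(task_a) + max(task_b)

# Sharing one slot only requires covering the peak of the combined usage,
# because task_b's growth is absorbed by task_a's decline (and vice versa).
combined = [a + b for a, b in zip(task_a, task_b)]
shared_reservation = max(combined)

print(separate_reservation)  # 600
print(shared_reservation)    # 400
```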
+However, there are cases where coarse-grained resource management does not work well.
+
+  - Tasks may have different parallelisms. Sometimes, such different parallelisms cannot be avoided. E.g., the parallelism of source/sink/lookup tasks might be constrained by the partitions and IO load of the external upstream/downstream system. In such cases, slots with fewer tasks would need fewer resources than those with the [entire pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions) of tasks.
+
+  - Sometimes the resource needed for the [entire pipeline](https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#pipelined-regions) might be too much to be put into a single slot/TaskManager. In such cases, the pipeline needs to be split into multiple slot sharing groups (SSGs), which may not always have the same resource requirement.
+
+  - For batch jobs, not all the tasks can be executed at the same time. Thus, the instantaneous resource requirement of the pipeline changes over time.
+
+Trying to execute all tasks with identical slots can result in non-optimal resource utilization. The resource of the identical slots
+has to be able to fulfill the highest resource requirement, which will be wasteful for other requirements. When expensive external resources
+like GPU are involved, such waste can become even harder to afford. The fine-grained resource management leverages slots of different resources
+to improve resource utilization in such scenarios.
+
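The sketch below puts numbers on the waste described above. It compares the total resources reserved when every slot is sized for the largest requirement with the total when each slot gets exactly what it needs. The slot requirements are hypothetical. Two slots hold the entire pipeline, and four slots hold only subtasks of a source that runs with a higher parallelism.

```python
# Hypothetical per-slot requirements as (cpu_cores, memory_mb):
# two slots hold the entire pipeline, four slots hold only a source subtask.
requirements = [(1.0, 4096)] * 2 + [(0.25, 512)] * 4


def total(reqs):
    """Sum CPU cores and memory over a list of slot requirements."""
    return (sum(cpu for cpu, _ in reqs), sum(mem for _, mem in reqs))


# Coarse-grained: identical slots, each large enough for the largest requirement.
largest = (max(cpu for cpu, _ in requirements), max(mem for _, mem in requirements))
coarse = (largest[0] * len(requirements), largest[1] * len(requirements))

# Fine-grained: each slot is cut to its own requirement.
fine = total(requirements)

print(coarse)  # (6.0, 24576)
print(fine)    # (3.0, 10240)
```

With these illustrative figures, identical slots reserve twice the CPU and more than twice the memory that the workload actually requests.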
+### Resource Allocation Strategy
+
+In this section, we talk about the slot partitioning mechanism in Flink runtime and the resource allocation strategy, including how
+the Flink runtime selects a TaskManager to cut slots and allocates TaskManagers on [Native Kubernetes]({{< ref "docs/deployment/resource-providers/native_kubernetes" >}})
+and [YARN]({{< ref "docs/deployment/resource-providers/yarn" >}}). Note that the resource allocation strategy is pluggable in
+Flink runtime and here we introduce its default implementation in the first step of fine-grained resource
+management. In the future, there might be various strategies that can be selected for different scenarios.

Review comment:
       ```suggestion
   management. In the future, there might be various strategies that users can select for different scenarios.
   ```

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+In fine-grained resource management, the slots are requested with specific resource profiles, which can be specified by users.
+Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available
+resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory and Flink allocates *Slot 1* for it.

Review comment:
       ```suggestion
   With fine-grained resource management, the slot requests contain specific resource profiles, which users can specify.
   Flink will respect those user-specified resource requirements and dynamically cut an exactly-matched slot out of the TaskManager’s available
   resources. As shown above, there is a requirement for a slot with 0.25 Core and 1GB memory, and Flink allocates *Slot 1* for it.
   ```

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+Apache Flink allows you to control the resource consumption of your workload in a finer granularity, namely **fine-grained resource management**.
+It provides means for users to further improve Flink’s resource efficiency with knowledge of their specific scenarios.

Review comment:
       ```suggestion
   Apache Flink works hard to auto-derive sensible default resource requirements for all applications out of the box.
   For users who wish to fine-tune their resource consumption, based on knowledge of their specific scenarios, Flink offers **fine-grained resource management**.
   ```
   
   I want to make it clear to new users that you don't have to do this. 

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+---
+title: Fine-Grained Resource Management
+weight: 5
+type: docs
+
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Fine-Grained Resource Management
+
+Apache Flink allows you to control the resource consumption of your workload 
in a finer granularity, namely **fine-grained resource management**.
+It provides means for users to further improve Flink’s resource efficiency 
with knowledge of their specific scenarios. 
+
+This page describes the fine-grained resource management’s usage, applicable 
scenarios, and how it works.
+
+{{< hint warning >}}
+**Note:** This feature is currently an MVP (“minimum viable product”) feature 
and only available to [DataStream API]({{< ref "docs/dev/datastream/overview" 
>}}).
+{{< /hint >}}
+
+## Applicable Scenarios
+
+Typical scenarios that potentially benefit from fine-grained resource 
management are where:
+
+  - Tasks have significantly different parallelisms.
+
+  - The resource needed for an entire pipeline is too much to fit into a 
single slot/task manager.
+
+  - Batch jobs where resources needed for tasks of different stages are 
significantly different
+
+An in-depth discussion on why fine-grained resource management can improve 
resource efficiency for the above scenarios is presented in [How it improves 
resource efficiency](#how-it-improves-resource-efficiency).
+
+## How it works
+
+As described in [Flink Architecture]({{< ref 
"docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster),
+the resource for task execution in TaskManager is split into a bunch of slots, 
where job tasks are scheduled.
+The slot is the basic unit of both resource scheduling and resource 
requirement in Flink runtime.
+
+{{< img src="/fig/dynamic_slot_alloc.png" class="center" >}}
+
+Previously in Flink, the resource requirement only contained the number of the 
required slots, without fine-grained resource
+profiles, namely **coarse-grained resource management**. The TaskManager 
contained a fixed number of identical slots to fulfill those requirements.
+
+In fine-grained resource management, the slots are requested with specific 
resource profiles, which can be specified by users.
+Flink will respect those user-specified resource requirements and dynamically 
cut an exactly-matched slot out of the TaskManager’s available
+resources. As shown above, there is a requirement for a slot with 0.25 Core 
and 1GB memory and Flink allocates *Slot 1* for it.
+
+For the resource requirement without a specified resource profile, Flink will 
automatically decide the resource profile of it.
+Currently, the resource profile of it is calculated from [TaskManager’s total 
resource]({{< ref "docs/deployment/memory/mem_setup_tm" >}})
+and [taskmanager.numberOfTaskSlots]({{< ref "docs/deployment/config" 
>}}#taskmanager-numberoftaskslots), just
+like in coarse-grained resource management. As shown above, the total resource 
of TaskManager is 1 Core and 4 GB memory and the number of task slots
+is set to 2, *Slot 2* is created with 0.5 Core and 2 GB memory for the 
requirement without a specified resource profile.
+
+After the allocation of *Slot 1* and *Slot 2*, there is 0.25 Core and 1 GB 
memory remaining as the free resources in the
+TaskManager. These free resources can be further partitioned to fulfill the 
following resource requirements.
+
+Please refer to [Resource Allocation Strategy](#resource-allocation-strategy) 
for more details.
+
+## Usage
+
+To use fine-grained resource management, you need to:
+
+  - Configure to enable fine-grained resource management.
+
+  - Specify the resource requirement.
+
+### Enable Fine-Grained Resource Management
+
+To enable fine-grained resource management, you need to configure the 
[cluster.fine-grained-resource-management.enabled]({{< ref 
"docs/deployment/config" >}}#cluster-fine-grained-resource-management-enabled) 
to true.
+
+{{< hint danger >}}
+Without this configuration, the Flink runtime cannot schedule slots with your specified resource requirements, and the job will fail with an exception.
+{{< /hint >}}
+
+### Specify Resource Requirement for Slot Sharing Group
+
+Fine-grained resource requirements are defined on slot sharing groups. A slot sharing group is a hint that tells the JobManager that the operators/tasks in it CAN be put into the same slot.
+
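To estimate what a job will request, recall that with Flink's slot sharing a group needs as many slots as the highest parallelism among its operators, and under fine-grained resource management each of those slots carries the group's resource profile. The sketch below applies that rule for a streaming job whose tasks all run at once (batch jobs scheduled region by region may need fewer slots at a time). The operator list, the `(cpu_cores, heap_mb)` tuples and `requested_resources` are hypothetical inputs and names, not a Flink API.

```python
from collections import defaultdict


def requested_resources(operators, group_profiles):
    """Estimate the slots and resources a job requests.

    operators: list of (operator_name, slot_sharing_group, parallelism)
    group_profiles: slot sharing group name -> (cpu_cores, task_heap_mb)
    """
    # Each group needs as many slots as its most parallel operator.
    slots_per_group = defaultdict(int)
    for _name, group, parallelism in operators:
        slots_per_group[group] = max(slots_per_group[group], parallelism)

    # Every slot of a group carries that group's resource profile.
    total_cpu, total_heap_mb = 0.0, 0
    for group, slots in slots_per_group.items():
        cpu, heap_mb = group_profiles[group]
        total_cpu += slots * cpu
        total_heap_mb += slots * heap_mb
    return dict(slots_per_group), total_cpu, total_heap_mb


# Profiles mirror groups "a" and "b" from the example on this page.
operators = [("filter", "a", 2), ("map", "b", 4), ("sink", "b", 1)]
group_profiles = {"a": (1.0, 100), "b": (0.5, 100)}
slots, cpu, heap_mb = requested_resources(operators, group_profiles)
print(slots, cpu, heap_mb)  # {'a': 2, 'b': 4} 4.0 600
```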
+To specify the resource requirements, you need to:
+
+  - Define the slot sharing group and the operators it contains.
+
+  - Specify the resources of the slot sharing group.
+
+There are two approaches to define the slot sharing group and the operators it contains:
+
+  - You can define a slot sharing group only by its name and attach it to an operator through [slotSharingGroup(String name)]({{< ref "docs/dev/datastream/operators/overview" >}}#set-slot-sharing-group).
+
+  - You can construct a `SlotSharingGroup` instance, which contains the name and an optional resource profile of the slot sharing group. The `SlotSharingGroup` can be attached to an operator through `slotSharingGroup(SlotSharingGroup ssg)`.
+
+You can specify the resource profile for your slot sharing groups:
+
+  - If you set the slot sharing group through `slotSharingGroup(SlotSharingGroup ssg)`, you can specify the resource profile when constructing the `SlotSharingGroup` instance.
+
+  - If you only set the name of the slot sharing group with [slotSharingGroup(String name)]({{< ref "docs/dev/datastream/operators/overview" >}}#set-slot-sharing-group), you can construct a `SlotSharingGroup` instance with the same name along with the resource profile, and register its resources with `StreamExecutionEnvironment#registerSlotSharingGroup(SlotSharingGroup ssg)`.
+
+{{< tabs "configure-ssg" >}}
+{{< tab "Java" >}}
+```java
+final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+SlotSharingGroup ssgA = SlotSharingGroup.newBuilder("a")
+  .setCpuCores(1.0)
+  .setTaskHeapMemoryMB(100)
+  .build();
+
+SlotSharingGroup ssgB = SlotSharingGroup.newBuilder("b")
+  .setCpuCores(0.5)
+  .setTaskHeapMemoryMB(100)
+  .build();
+
+someStream.filter(...).slotSharingGroup("a") // Set the slot sharing group with name "a"
+.map(...).slotSharingGroup(ssgB); // Directly set the slot sharing group with name and resource.
+
+env.registerSlotSharingGroup(ssgA); // Then register the resource of group “a”
+```
+{{< /tab >}}
+{{< tab "Scala" >}}
+```scala
+val env = StreamExecutionEnvironment.getExecutionEnvironment
+
+val ssgA = SlotSharingGroup.newBuilder("a")
+  .setCpuCores(1.0)
+  .setTaskHeapMemoryMB(100)
+  .build()
+
+val ssgB = SlotSharingGroup.newBuilder("b")
+  .setCpuCores(0.5)
+  .setTaskHeapMemoryMB(100)
+  .build()
+
+someStream.filter(...).slotSharingGroup("a") // Set the slot sharing group with name "a"
+.map(...).slotSharingGroup(ssgB) // Directly set the slot sharing group with name and resource.
+
+env.registerSlotSharingGroup(ssgA) // Then register the resource of group “a”
+```
+{{< /tab >}}
+{{< tab "Python" >}}
+```python
+env = StreamExecutionEnvironment.get_execution_environment()
+
+ssg_a = SlotSharingGroup.builder('a') \
+            .set_cpu_cores(1.0) \
+            .set_task_heap_memory_mb(100) \
+            .build()
+ssg_b = SlotSharingGroup.builder('b') \
+            .set_cpu_cores(0.5) \
+            .set_task_heap_memory_mb(100) \
+            .build()
+
+# Set the slot sharing group with name "a" on the filter, then directly set
+# the slot sharing group with name and resource on the map.
+some_stream.filter(...).slot_sharing_group('a') \
+    .map(...).slot_sharing_group(ssg_b)
+
+env.register_slot_sharing_group(ssg_a) # Then register the resource of group "a"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+<div class="alert alert-info">
+     <strong>Note:</strong> Each slot sharing group can only attach to one 
specified resource, any conflict will fail the compiling of your job.
+</div>

Review comment:
       ```suggestion
   {{< hint warning >}}
   **Note:** Each slot sharing group can only be attached to one specified resource; any conflict will fail the compilation of your job.
   {{< /hint >}}
   ```
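The note above says a slot sharing group may carry only one resource profile. That rule can be modeled as a registry keyed by group name that rejects a second, different profile under the same name. This is a hypothetical sketch, not Flink's implementation: `SlotSharingGroupSpec`, `ConflictingResourceError` and `register` are illustrative names, and accepting an identical profile twice is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SlotSharingGroupSpec:
    """Hypothetical stand-in for a named slot sharing group with a resource profile."""
    name: str
    cpu_cores: float
    task_heap_memory_mb: int


class ConflictingResourceError(Exception):
    """Raised when one group name is given two different resource profiles."""


def register(registry: Dict[str, SlotSharingGroupSpec], spec: SlotSharingGroupSpec) -> None:
    """Record spec under its name; fail if the name already has a different profile."""
    existing = registry.get(spec.name)
    if existing is not None and existing != spec:
        raise ConflictingResourceError(
            f"slot sharing group '{spec.name}' has conflicting resources: "
            f"{existing} vs {spec}")
    registry[spec.name] = spec


registry: Dict[str, SlotSharingGroupSpec] = {}
register(registry, SlotSharingGroupSpec("a", 1.0, 100))
register(registry, SlotSharingGroupSpec("a", 1.0, 100))  # same profile: accepted here
try:
    register(registry, SlotSharingGroupSpec("a", 0.5, 100))  # different profile
except ConflictingResourceError as err:
    print(err)
```

Failing at registration time mirrors the note's point that the conflict surfaces while the job is compiled, before any slot is requested.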

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+## How it works
+
+As described in [Flink Architecture]({{< ref 
"docs/concepts/flink-architecture" >}}#anatomy-of-a-flink-cluster),
+the resource for task execution in TaskManager is split into a bunch of slots, 
where job tasks are scheduled.

Review comment:
       ```suggestion
   task execution resources in a TaskManager are split into many slots.
   ```

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+The slot is the basic unit of both resource scheduling and resource 
requirement in Flink runtime.

Review comment:
       ```suggestion
   The slot is the basic unit of both resource scheduling and resource requirement in Flink's runtime.
   ```
   

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+Previously in Flink, the resource requirement only contained the number of the 
required slots, without fine-grained resource
+profiles, namely **coarse-grained resource management**. The TaskManager 
contained a fixed number of identical slots to fulfill those requirements.
+
+In fine-grained resource management, the slots are requested with specific 
resource profiles, which can be specified by users.
+Flink will respect those user-specified resource requirements and dynamically 
cut an exactly-matched slot out of the TaskManager’s available
+resources. As shown above, there is a requirement for a slot with 0.25 Core 
and 1GB memory and Flink allocates *Slot 1* for it.
+

Review comment:
       ```suggestion
   {{< hint info >}}
   Previously in Flink, the resource requirement only contained the number of 
the required slots, without fine-grained resource
   profiles, namely **coarse-grained resource management**. The TaskManager 
contained a fixed number of identical slots to fulfill those requirements.
   {{< /hint >}}
   ```

##########
File path: docs/content/docs/deployment/finegrained_resource.md
##########
@@ -0,0 +1,320 @@
+For the resource requirement without a specified resource profile, Flink will 
automatically decide the resource profile of it.

Review comment:
       ```suggestion
   For the resource requirement without a specified resource profile, Flink 
will automatically decide a resource profile.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
