This is an automated email from the ASF dual-hosted git repository.
wilfreds pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git
The following commit(s) were added to refs/heads/master by this push:
new 686ab877d4 [YUNIKORN-2900] solve broken anchors
686ab877d4 is described below
commit 686ab877d4aecc8d801aa99db12180afe72589bc
Author: pohanhuangtw <[email protected]>
AuthorDate: Thu Nov 27 14:56:05 2025 +1100
[YUNIKORN-2900] solve broken anchors
Fix the broken anchor and some minor spacing issues in the text.
Leaving 1.6.3 updates as part of this change.
Closes: #525
Signed-off-by: hhcs9527 <[email protected]>
Signed-off-by: Wilfred Spiegelenburg <[email protected]>
---
docs/api/system.md | 6 ++---
docs/archived_design/k8shim.md | 4 +--
docs/design/gang_scheduling.md | 26 +++++++++----------
docs/design/scheduler_configuration.md | 13 +++++-----
docs/developer_guide/dependencies.md | 8 +++---
docs/performance/performance_tutorial.md | 28 ++++++++++----------
docs/user_guide/observability/prometheus.md | 18 ++++++-------
src/pages/community/how_to_contribute.md | 20 +++++++--------
versioned_docs/version-1.6.3/api/system.md | 6 ++---
.../version-1.6.3/archived_design/k8shim.md | 4 +--
.../version-1.6.3/design/gang_scheduling.md | 30 +++++++++++-----------
.../design/scheduler_configuration.md | 13 +++++-----
.../version-1.6.3/developer_guide/dependencies.md | 8 +++---
.../performance/performance_tutorial.md | 28 ++++++++++----------
.../user_guide/observability/prometheus.md | 18 ++++++-------
15 files changed, 116 insertions(+), 114 deletions(-)
diff --git a/docs/api/system.md b/docs/api/system.md
index cf7b82fc4b..a74cc6632d 100644
--- a/docs/api/system.md
+++ b/docs/api/system.md
@@ -66,7 +66,7 @@ Note that this list is not guaranteed to remain stable and can change from relea
**Content examples**
-The output of this REST query can be rather large, and it is a combination of those which have already been documented as part of the [scheduler API](scheduler.md#Overview).
+The output of this REST query can be rather large, and it is a combination of those which have already been documented as part of the [scheduler API](scheduler.md).
The `RMDiagnostics` shows the content of the K8Shim cache. The exact content is version dependent and is not stable.
The current content shows the cached objects:
@@ -81,7 +81,7 @@ The current content shows the cached objects:
## Go routine info
-Dumps the stack traces of the currently running goroutines. This is a similar view as provided in the [pprof goroutine](#pprof-goroutine) in a human-readable form.
+Dumps the stack traces of the currently running goroutines. This is a similar view as provided in the [pprof goroutine](#pprof-goroutine) in a human-readable form.
**URL** : `/debug/stack`
@@ -317,7 +317,7 @@ trace: A trace of execution of the current program. You can specify the duration
num_symbols: 1
```
-## pprof trace
+## pprof trace
**URL** : `/debug/pprof/trace`
diff --git a/docs/archived_design/k8shim.md b/docs/archived_design/k8shim.md
index 01f08db2d9..7b1a4d3248 100644
--- a/docs/archived_design/k8shim.md
+++ b/docs/archived_design/k8shim.md
@@ -55,7 +55,7 @@ and a [validation webhook](https://kubernetes.io/docs/reference/access-authn-aut
to immediately transition from the `Starting` to `Running` state so that it will not block other applications.
2. The `validation webhook` validates the configuration set in the configmap
- This is used to prevent writing malformed configuration into the configmap.
- - The validation webhook calls scheduler [validation REST API](api/scheduler.md#configuration-validation) to validate configmap updates.
+ - The validation webhook calls scheduler [validation REST API](api/cluster.md#configuration-validation) to validate configmap updates.
### Admission controller deployment
@@ -66,7 +66,7 @@ On startup, the admission controller performs a series of tasks to ensure that i
2. If the secret cannot be found or either CA certificate is within 90 days of expiration, generates new certificate(s). If a certificate is expiring, a new one is generated with an expiration of 12 months in the future. If both certificates are missing or expiring, the second certificate is generated with an expiration of 6 months in the future. This ensures that both certificates do not expire at the same time, and that there is an overlap of trusted certificates.
3. If the CA certificates were created or updated, writes the secrets back to Kubernetes.
4. Generates an ephemeral TLS server certificate signed by the CA certificate with the latest expiration date.
-5. Validates, and if necessary, creates or updates the Kubernetes webhook configurations named `yunikorn-admission-controller-validations` and `yunikorn-admission-controller-mutations`. If the CA certificates have changed, the webhooks will also be updated. These webhooks allow the Kubernetes API server to connect to the admission controller service to perform configmap validations and pod mutations.
+5. Validates, and if necessary, creates or updates the Kubernetes webhook configurations named `yunikorn-admission-controller-validations` and `yunikorn-admission-controller-mutations`. If the CA certificates have changed, the webhooks will also be updated. These webhooks allow the Kubernetes API server to connect to the admission controller service to perform configmap validations and pod mutations.
6. Starts up the admission controller HTTPS server.
Additionally, the admission controller also starts a background task to wait for CA certificates to expire. Once either certificate is expiring within the next 30 days, new CA and server certificates are generated, the webhook configurations are updated, and the HTTPS server is quickly restarted. This ensures that certificates rotate properly without downtime.
diff --git a/docs/design/gang_scheduling.md b/docs/design/gang_scheduling.md
index d73421e3bd..54d3ba1b39 100644
--- a/docs/design/gang_scheduling.md
+++ b/docs/design/gang_scheduling.md
@@ -167,15 +167,15 @@ For gang scheduling we have a simple one new to one release relation in the case
The scheduler processes the AllocationAsk as follows:
1. Check if the application has an unreleased allocation for a placeholder allocation with the same _taskGroupName._ If no placeholder allocations are found a normal allocation cycle will be used to allocate the request.
-2. A placeholder allocation is selected and marked for release. A request to release the placeholder allocation is communicated to the shim. This must be an async process as the shim release process is dependent on the underlying K8s response which might not be instantaneous.
+2. A placeholder allocation is selected and marked for release. A request to release the placeholder allocation is communicated to the shim. This must be an async process as the shim release process is dependent on the underlying K8s response which might not be instantaneous.
NOTE: no allocations are released in the core at this point in time.
-3. The core “parks” the processing of the real AllocationAsk until the shim has responded with a confirmation that the placeholder allocation has been released.
+3. The core “parks” the processing of the real AllocationAsk until the shim has responded with a confirmation that the placeholder allocation has been released.
NOTE: locks are released to allow scheduling to continue
4. After the confirmation of the release is received from the shim the “parked” AllocationAsk processing is finalised.
5. The AllocationAsk is allocated on the same node as the placeholder used.
The removal of the placeholder allocation is finalised in either case. This all needs to happen as one update to the application, queue and node.
* On success: a new Allocation is created.
- * On Failure: try to allocate on a different node, if that fails the AllocationAsk becomes unschedulable triggering scale up.
+ * On Failure: try to allocate on a different node, if that fails the AllocationAsk becomes unschedulable triggering scale up.
6. Communicate the allocation back to the shim (if applicable, based on step 5)
## Application completion
@@ -196,7 +196,7 @@ The time out of the _waiting_ state is new functionality.
Placeholders are not considered active allocations.
Placeholder asks are considered pending resource asks.
-These cases will be handled in the [Cleanup](#Cleanup) below.
+These cases will be handled in the [Cleanup](#cleanup) below.
### Cleanup
When we look at gang scheduling there is a further issue around unused placeholders, placeholder asks and their cleanup.
@@ -219,7 +219,7 @@ Processing in the core thus needs to consider two cases that will impact the tra
1. Placeholder asks pending (exit from _accepted_)
2. Placeholders allocated (exit from _waiting_)
-Placeholder asks pending:
+Placeholder asks pending:
Pending placeholder asks are handled via a timeout.
An application must only spend a limited time waiting for all placeholders to be allocated.
This timeout is needed because an application’s partial placeholders allocation may occupy cluster resources without really using them.
@@ -259,7 +259,7 @@ Combined flow for the shim and core during timeout of placeholder:
* After the placeholder Allocations and Asks are released the core moves the application to the killed state removing it from the queue (4).
* The state change is finalised in the core and shim. (5)
-Allocated placeholders:
+Allocated placeholders:
Leftover placeholders need to be released by the core.
The shim needs to be informed to remove them. This must be triggered on entry of the _completed_ state.
After the placeholder release is requested by the core the state transition of the application can proceed.
@@ -429,14 +429,14 @@ In patched message form that would look like:
message UpdateResponse {
...
// Released allocation(s), allocations can be released by either the RM or scheduler.
- // The TerminationType defines which side needs to act and process the message.
+ // The TerminationType defines which side needs to act and process the message.
repeated AllocationRelease releasedAllocations = 3;
...
}
message AllocationReleasesRequest {
// Released allocation(s), allocations can be released by either the RM or scheduler.
- // The TerminationType defines which side needs to act and process the message.
+ // The TerminationType defines which side needs to act and process the message.
repeated AllocationRelease releasedAllocations = 1;
...
}
@@ -469,7 +469,7 @@ In patched message form that would look like:
message AllocationRelease {
enum TerminationType {
STOPPED_BY_RM = 0;
- TIMEOUT = 1;
+ TIMEOUT = 1;
PREEMPTED_BY_SCHEDULER = 2;
PLACEHOLDER_REPLACED = 3;
}
@@ -481,7 +481,7 @@ message AllocationRelease {
// The UUID of the allocation to release, if not set all allocations are released for
// the applicationID
string UUID = 3;
- // The termination type as described above
+ // The termination type as described above
TerminationType terminationType = 4;
// human-readable message
string message = 5;
@@ -525,7 +525,7 @@ message AllocationReleasesRequest {
...
// Released allocationask(s), allocationasks can be released by either the RM or
// scheduler. The TerminationType defines which side needs to act and process the
- // message.
+ // message.
repeated AllocationAskRelease allocationAsksToRelease = 2;
}
```
@@ -536,12 +536,12 @@ In patched message form that would look like:
message AllocationAskRelease {
enum TerminationType {
STOPPED_BY_RM = 0;
- TIMEOUT = 1;
+ TIMEOUT = 1;
PREEMPTED_BY_SCHEDULER = 2;
PLACEHOLDER_REPLACED = 3;
}
...
- // The termination type as described above
+ // The termination type as described above
TerminationType terminationType = 4;
...
}
diff --git a/docs/design/scheduler_configuration.md b/docs/design/scheduler_configuration.md
index 13a0759097..0e7aa26a0e 100644
--- a/docs/design/scheduler_configuration.md
+++ b/docs/design/scheduler_configuration.md
@@ -54,7 +54,7 @@ Configuration to consider:
## Queue Configuration
### Queue Definition
On startup the scheduler will load the configuration for the queues from the provided configuration file after initialising the service. If there is no queue configuration provided the scheduler should start up with a simple default configuration which performs a well documented default behaviour.
-Based on the kubernetes definition this configuration could be a configMap <sup id="s1">[1](#f1)</sup> but not a CRD.
+Based on the kubernetes definition this configuration could be a [configMap](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource) but not a CRD.
The queue configuration is dynamic. Changing the queue configuration must not require a scheduler restart.
Changes should be allowed by either calling the GO based API, the REST based API or by updating the configuration file. Changes made through the API must be persisted in the configuration file. Making changes through an API is not a high priority requirement and could be postponed to a later release.
@@ -166,7 +166,7 @@ Defining placement rules in the configuration requires the following information
* Create (boolean)
* Filter:
* A regular expression or list of users/groups to apply the rule to.
-
+
The filter can be used to allow the rule to be used (default behaviour) or deny the rule to be used. User or groups matching the filter will be either allowed or denied.
The filter is defined as follow:
* Type:
@@ -213,7 +213,7 @@ Base point to make: a changed configuration should not impact the currently runn
### Access Control Lists
The scheduler ACL is independent of the queue ACLs. A scheduler administrator is not by default allowed to submit an application or administer the queues in the system.
-All ACL types should use the same definition pattern. We should allow at least POSIX user and group names which uses the portable filename character set <sup id="s2">[2](#f2)</sup>. However we should take into account that we could have domain specifiers based on the environment that the system runs in (@ sign as per HADOOP-12751).
+All ACL types should use the same definition pattern. We should allow at least POSIX user and group names which uses the portable filename character set <a href="#footnote1"><sup>[1]</sup></a>. However we should take into account that we could have domain specifiers based on the environment that the system runs in (@ sign as per HADOOP-12751).
By default access control is enabled and access is denied. The only special case is for the core scheduler which automatically adds the system user, the scheduler process owner, to the scheduler ACL. The scheduler process owner is allowed to make sure that the process owner can use the API to call any administrative actions.
@@ -241,6 +241,7 @@ The full configuration of the K8s shim is still under development.
The full configuration of the YARN shim is still under development.
---
-<br/><b id="f1"></b>1: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource. [↩](#s1)
-<br/><b id="f2"></b>2: The set of characters from which portable filenames are constructed. [↩](#s2)
-<br/>`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ -`
+<p id="footnote1">
+<strong>1.</strong> The set of characters from which portable filenames are constructed.<br/>
+<code>A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ -</code>
+</p>
+</p>
\ No newline at end of file
diff --git a/docs/developer_guide/dependencies.md b/docs/developer_guide/dependencies.md
index d458798df2..446bc7ffc9 100644
--- a/docs/developer_guide/dependencies.md
+++ b/docs/developer_guide/dependencies.md
@@ -47,8 +47,8 @@ require (
)
```
Release branches **must** not use pseudo versions.
-During the creation of a release, [tags](/community/release_procedure#tag-and-update-release-for-version) will be created.
-These tags will be used as the reference in the go.mod files for the release.
+During the creation of a release, [tags](/community/release_procedure#tag-for-release) will be created.
+These tags will be used as the reference in the go.mod files for the release.
## Enforcement of pseudo version
In the pull request checks for the `yunikorn-core` and `yunikorn-k8shim` repositories enforce the format of the versions.
@@ -57,7 +57,7 @@ repositories in the `master` branch is not a pseudo version.
The check enforces that the start of the version reference is `v.0.0.0-`
-Pseudo versions are not enforced in the release branches as per [why a pseudo version](#why-a-pseudo-version) explanation above.
+Pseudo versions are not enforced in the release branches as per [why a pseudo version](#why-a-pseudo-version) explanation above.
## Updating the core dependency
Before updating the core dependency must make sure that the scheduler interface changes are finalised.
@@ -100,7 +100,7 @@ It is therefor that steps 5 and 8 are performed to make sure there is no regress
## Generating a pseudo version
A pseudo references for use in a go.mod file is based on the commit hash and timestamp.
-It is simple to generate one using the following steps:
+It is simple to generate one using the following steps:
1. Change to the repository for which the new pseudo version needs to be generated.
2. Update the local checked out code for the master branch to get the latest commits
diff --git a/docs/performance/performance_tutorial.md b/docs/performance/performance_tutorial.md
index f45bbf2648..c2c3e7330e 100644
--- a/docs/performance/performance_tutorial.md
+++ b/docs/performance/performance_tutorial.md
@@ -83,12 +83,12 @@ root hard nofile 50000
Before going into the details, here are the general steps used in our tests:
-- [Step 1](#Kubernetes): Properly configure Kubernetes API server and controller manager, then add worker nodes.
-- [Step 2](#Setup-Kubemark): Deploy hollow pods,which will simulate worker nodes, name hollow nodes. After all hollow nodes in ready status, we need to cordon all native nodes, which are physical presence in the cluster, not the simulated nodes, to avoid we allocated test workload pod to native nodes.
-- [Step 3](#Deploy-YuniKorn): Deploy YuniKorn using the Helm chart on the master node, and scale down the Deployment to 0 replica, and [modify the port](#Setup-Prometheus) in `prometheus.yml` to match the port of the service.
-- [Step 4](#Run-tests): Deploy 50k Nginx pods for testing, and the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replica, all Nginx pods will be stuck in pending.
+- [Step 1](#kubernetes): Properly configure Kubernetes API server and controller manager, then add worker nodes.
+- [Step 2](#setup-kubemark): Deploy hollow pods,which will simulate worker nodes, name hollow nodes. After all hollow nodes in ready status, we need to cordon all native nodes, which are physical presence in the cluster, not the simulated nodes, to avoid we allocated test workload pod to native nodes.
+- [Step 3](#deploy-yunikorn): Deploy YuniKorn using the Helm chart on the master node, and scale down the Deployment to 0 replica, and [modify the port](#setup-prometheus) in `prometheus.yml` to match the port of the service.
+- [Step 4](#run-tests): Deploy 50k Nginx pods for testing, and the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replica, all Nginx pods will be stuck in pending.
- [Step 5](../user_guide/troubleshooting.md#restart-the-scheduler): Scale up The YuniKorn Deployment back to 1 replica, and cordon the master node to avoid YuniKorn allocating Nginx pods there. In this step, YuniKorn will start collecting the metrics.
-- [Step 6](#Collect-and-Observe-YuniKorn-metrics): Observe the metrics exposed in Prometheus UI.
+- [Step 6](#collect-and-observe-yunikorn-metrics): Observe the metrics exposed in Prometheus UI.
---
## Setup Kubemark
@@ -166,12 +166,12 @@ spec:
name: hollow-node
spec:
nodeSelector: # leverage label to allocate to native node
- tag: tagName
+ tag: tagName
initContainers:
- name: init-inotify-limit
image: docker.io/busybox:latest
imagePullPolicy: IfNotPresent
- command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set as same as max_user_instance in actual node
+ command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set as same as max_user_instance in actual node
securityContext:
privileged: true
volumes:
@@ -183,7 +183,7 @@ spec:
path: /var/log
containers:
- name: hollow-kubelet
- image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
+ image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
imagePullPolicy: IfNotPresent
ports:
- containerPort: 4194
@@ -215,7 +215,7 @@ spec:
securityContext:
privileged: true
- name: hollow-proxy
- image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
+ image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
imagePullPolicy: IfNotPresent
env:
- name: NODE_NAME
@@ -341,7 +341,7 @@ scrape_configs:
scrape_interval: 1s
metrics_path: '/ws/v1/metrics'
static_configs:
- - targets: ['docker.for.mac.host.internal:9080']
+ - targets: ['docker.for.mac.host.internal:9080']
# 9080 is internal port, need port forward or modify 9080 to service's port
```
@@ -355,7 +355,7 @@ scrape_configs:
Once the environment is setup, you are good to run workloads and collect results. YuniKorn community has some useful tools to run workloads and collect metrics, more details will be published here.
-### 1. Scenarios
+### 1. Scenarios
In performance tools, there are three types of tests and feedbacks.
| Test type | Description | Diagram | Log |
@@ -364,7 +364,7 @@ In performance tools, there are three types of tests and feedbacks.
| thourghput | Measure schedulers' throughput by calculating how many pods are allocated per second based on the pod start time | Exist | None |
### 2. Build tool
-The performance tool is available in [yunikorn release repo](https://github.com/apache/yunikorn-release.git),clone the repo to your workspace.
+The performance tool is available in [yunikorn release repo](https://github.com/apache/yunikorn-release.git),clone the repo to your workspace.
```
git clone https://github.com/apache/yunikorn-release.git
```
@@ -388,7 +388,7 @@ If you set these fields with large number to cause timeout problem, increase val
| --- | --- |
| SchedulerNames | List of scheduler will run the test |
| ShowNumOfLastTasks | Show metadata of last number of pods |
-| CleanUpDelayMs | Controll period to refresh deployments status and print log |
+| CleanUpDelayMs | Controll period to refresh deployments status and print log |
| RequestConfigs | Set resource request and decide number of deployments and pods per deployment with `repeat` and `numPods` |
In this case,yunikorn and default scheduler will sequentially separately create ten deployments which contains fifty pods.
@@ -493,7 +493,7 @@ In the Kubernetes API server, we need to modify two parameters: `max-mutating-re
#### Controller-Manager
-In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server side request bandwidth. `node-cidr-mask-size` represents the node CIDR. it needs to be increased as well in order to scale up to thousands of nodes.
+In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server side request bandwidth. `node-cidr-mask-size` represents the node CIDR. it needs to be increased as well in order to scale up to thousands of nodes.
Modify `/etc/kubernetes/manifest/kube-controller-manager.yaml`:
diff --git a/docs/user_guide/observability/prometheus.md b/docs/user_guide/observability/prometheus.md
index b5acf7f446..72b47e355d 100644
--- a/docs/user_guide/observability/prometheus.md
+++ b/docs/user_guide/observability/prometheus.md
@@ -26,7 +26,7 @@ YuniKorn exposes its scheduling metrics via Prometheus. Thus, we need to set up
We will provide two methods for building Prometheus: either running it locally or using Helm to deploy it in your cluster. Additionally, in the Helm version, we will explain how to integrate it with Grafana and provide generic Grafana Dashboards for monitoring Yunikorn's metrics and observing the changes over time.
-If you don't know what metric can be used, you can use [REST API](../../api/scheduler.md#metrics).
+If you don't know what metric can be used, you can use [REST API](../../api/cluster.md#metrics).
## Run Prometheus locally
@@ -55,7 +55,7 @@ scrape_configs:
scrape_interval: 1s
metrics_path: '/ws/v1/metrics'
static_configs:
- - targets: ['localhost:9080']
+ - targets: ['localhost:9080']
# 9080 is internal port, need port forward or modify 9080 to service's port
```
@@ -67,7 +67,7 @@ Port forwarding for the core's web service on the standard port can be turned on
kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
```
-`9080`is the default port for core's web service.
+`9080`is the default port for core's web service.
### 4. Execute prometheus
@@ -88,13 +88,13 @@ You can also verify that Prometheus is serving metrics by navigating to its metr
## Deploy Prometheus and Grafana in a cluster.
### 1. Add Prometheus repository to helm
-
+
```yaml
# add helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```
-
+
### 2. Use helm to create Prometheus
```yaml
@@ -137,7 +137,7 @@ kubectl apply -f yunikorn-service-monitor.yaml
```
### 4. Access the Prometheus Web UI
-
+
```shell
kubectl port-forward -n prometheus svc/prometheus-kube-prometheus-prometheus 9090:9090
```
@@ -159,14 +159,14 @@ kubectl port-forward -n prometheus svc/prometheus-grafana 7070:80
After running port-forward, you can enter [localhost:7070](http://localhost:7070) to access grafana, and in the login page, enter account:`admin` ,password:`prom-operator`.

-
+
### Download JSON files for Yunikorn Dashboard
-
+
A dashboard consists of multiple panels that are organized and arranged in rows. Each panel has the ability to interact with data from any Grafana data source that has been configured. For more detailed information, please refer to the [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards).
We provide a sample dashboard JSON file. To access it, you can navigate to the [`/deployments/grafana-dashboard` directory](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/grafana-dashboard) in the Yunikorn-k8shim repository.
-You can refer to the [REST API](../../api/scheduler.md#metrics) to build your own custom Dashboard.
+You can refer to the [REST API](../../api/cluster.md#metrics) to build your own custom Dashboard.
### Import the JSON files in the Dashboard
diff --git a/src/pages/community/how_to_contribute.md b/src/pages/community/how_to_contribute.md
index c2dd772a44..0cd6c35853 100644
--- a/src/pages/community/how_to_contribute.md
+++ b/src/pages/community/how_to_contribute.md
@@ -56,12 +56,12 @@ JIRAs that have a pull requests linked will have the label `pull-request-availab
For anything that is more than a trivial change, like a typo or one line code change, it’s a good idea to discuss your intended approach on the issue.
You are much more likely to have your patch reviewed and committed if you’ve already got buy-in from the YuniKorn community before you start writing the fix.
-If you cannot assign the JIRA to yourself ask the community to help assign it and add you to the contributors list in JIRA.
+If you cannot assign the JIRA to yourself ask the community to help assign it and add you to the contributors list in JIRA.
## Fix an issue
Fixes or improvement must be created on the `master` branch.
Fork the relevant YuniKorn project into your own project and checkout the `master` branch.
-If the same issue exist in an earlier release branch it can be back ported after the fix has been added to master.
+If the same issue exist in an earlier release branch it can be back ported after the fix has been added to master.
Make sure that you have an up-to-date code revision checked out before you start. Use `git status` to see if you are up-to-date.
Create a branch to work on, a good name to use is the JIRA ID you are working on.
@@ -77,14 +77,14 @@ In general, if you find a bug while working on a specific feature, file a JIRA f
This helps us to differentiate between bug fixes and features and allows us to build stable maintenance releases.
Make sure you have observed the recommendations in the [coding guidelines](/community/coding_guidelines).
-Before you commit your changes and create a pull request based on your changes you should run the code checks.
+Before you commit your changes and create a pull request based on your changes you should run the code checks.
These same checks are run as part of the pull request workflow.
The pull request workflow performs the following checks:
* Apache license check: `make license-check`.
* Go lint check: `make lint`.
* Full unit test suite: `make test`.
-These three checks should pass locally before opening a pull request.
+These three checks should pass locally before opening a pull request.
As part of the pull request workflow all checks must pass before the change is committed.
For first time contributors to a repository the automated pull request workflow must be approved by a committer.
Once the workflow has run and has given a pass (a +1 vote) a committer will review the patch.
@@ -97,7 +97,7 @@ They can be executed locally via `yunikorn-k8shim/scripts/run-e2e-tests.sh`.
Finally, please write a good, clear commit message, with a short, descriptive title.
The descriptive title must start with the JIRA ID you are working on.
An example is: `[YUNIKORN-2] Support Gang Scheduling`
-The body of the commit message is used to describe the change made.
+The body of the commit message is used to describe the change made.
The whole commit message will be used to pre-fill the pull request information.
The body of the first commit message will be added to the PR Template.
The JIRA ID in the message will automatically link the pull request and the JIRA this is an essential part of tracking JIRA progress.
@@ -108,7 +108,7 @@ The dependencies are only relevant for the code repositories.
| repository | depends on |
|------------------------------|---------------------------------------------|
-| yunikorn-core | yunikorn-scheduler-interface |
+| yunikorn-core | yunikorn-scheduler-interface |
| yunikorn-k8shim | yunikorn-scheduler-interface, yunikorn-core |
| yunikorn-scheduler-interface | none |
| yunikorn-web | yunikorn-core |
@@ -124,7 +124,7 @@ We follow the version numbering as described in the [version numbering](https://
The master branch *must* use a pseudo version.
See the [Go module dependencies](/docs/next/developer_guide/dependencies) for updating the pseudo version.
The release branches *must* use branch version.
-See the [release procedure](/community/release_procedure#tag-and-update-release-for-version) on when and how to update the tags and references during the release creation.
+See the [release procedure](/community/release_procedure#tag-for-release) on when and how to update the tags and references during the release creation.
## Documentation updates
Documentation is published and maintained as part of the website.
@@ -164,7 +164,7 @@ After making the changes you can build the website locally in development mode w
```shell script
./local-build.sh run
```
-The only requirement is that you have docker installed as the build and server will be run inside a docker container.
+The only requirement is that you have docker installed as the build and server will be run inside a docker container.
## Create a pull request
Please create a pull request on github with your patch.
@@ -192,7 +192,7 @@ One for the upcoming version for the master and one for each branch the fix is p
There are three options for committing a change:
* use the script (recommended)
* manually using the git command line
-* use the GitHub web UI "squash and merge" button
+* use the GitHub web UI "squash and merge" button
A [simple shell script](https://github.com/apache/yunikorn-release/tree/master/tools/merge_pr.sh) to help with a squash and merge is part of the release repository.
The script handles the checkout, merge and commit message preparation.
@@ -211,7 +211,7 @@ this can cause author names like "username \<[email protected]
:::
Commit messages **must** comply to a simple set of rules:
-* Subject line is the title of the change formatted as follows: `[JIRA reference] subject (#PR ID)`.
+* Subject line is the title of the change formatted as follows: `[JIRA reference] subject (#PR ID)`.
* Second line must be empty, separates the subject from the body.
* The body of the message contains the description of the change.
* All lines in the commit message should be wrapped at 72 characters.
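The rules above can be checked mechanically before pushing. A minimal shell sketch (the message text, JIRA ID, and PR number below are made-up examples, not part of this change):

```shell
# Hypothetical commit message following the rules above:
# subject on line 1, blank line 2, body wrapped at 72 characters.
MSG='[YUNIKORN-2] Support Gang Scheduling (#123)

The body of the commit message describes the change made. Every line
is wrapped at 72 characters.'
# Count lines longer than 72 characters; the rules require zero.
OVER=$(printf '%s\n' "$MSG" | awk 'length($0) > 72 { n++ } END { print n+0 }')
[ "$OVER" -eq 0 ] && echo "commit message lines fit in 72 characters"
```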
diff --git a/versioned_docs/version-1.6.3/api/system.md b/versioned_docs/version-1.6.3/api/system.md
index cf7b82fc4b..a74cc6632d 100644
--- a/versioned_docs/version-1.6.3/api/system.md
+++ b/versioned_docs/version-1.6.3/api/system.md
@@ -66,7 +66,7 @@ Note that this list is not guaranteed to remain stable and can change from relea
**Content examples**
-The output of this REST query can be rather large, and it is a combination of those which have already been documented as part of the [scheduler API](scheduler.md#Overview).
+The output of this REST query can be rather large, and it is a combination of those which have already been documented as part of the [scheduler API](scheduler.md).
The `RMDiagnostics` shows the content of the K8Shim cache. The exact content is version dependent and is not stable.
The current content shows the cached objects:
@@ -81,7 +81,7 @@ The current content shows the cached objects:
## Go routine info
-Dumps the stack traces of the currently running goroutines. This is a similar view as provided in the [pprof goroutine](#pprof-goroutine) in a human-readable form.
+Dumps the stack traces of the currently running goroutines. This is a similar view as provided in the [pprof goroutine](#pprof-goroutine) in a human-readable form.
**URL** : `/debug/stack`
@@ -317,7 +317,7 @@ trace: A trace of execution of the current program. You can specify the duration
num_symbols: 1
```
-## pprof trace
+## pprof trace
**URL** : `/debug/pprof/trace`
diff --git a/versioned_docs/version-1.6.3/archived_design/k8shim.md b/versioned_docs/version-1.6.3/archived_design/k8shim.md
index 01f08db2d9..7b1a4d3248 100644
--- a/versioned_docs/version-1.6.3/archived_design/k8shim.md
+++ b/versioned_docs/version-1.6.3/archived_design/k8shim.md
@@ -55,7 +55,7 @@ and a [validation webhook](https://kubernetes.io/docs/reference/access-authn-aut
to immediately transition from the `Starting` to `Running` state so that it will not block other applications.
2. The `validation webhook` validates the configuration set in the configmap
- This is used to prevent writing malformed configuration into the configmap.
- - The validation webhook calls scheduler [validation REST API](api/scheduler.md#configuration-validation) to validate configmap updates.
+ - The validation webhook calls scheduler [validation REST API](api/cluster.md#configuration-validation) to validate configmap updates.
### Admission controller deployment
@@ -66,7 +66,7 @@ On startup, the admission controller performs a series of tasks to ensure that i
2. If the secret cannot be found or either CA certificate is within 90 days of expiration, generates new certificate(s). If a certificate is expiring, a new one is generated with an expiration of 12 months in the future. If both certificates are missing or expiring, the second certificate is generated with an expiration of 6 months in the future. This ensures that both certificates do not expire at the same time, and that there is an overlap of trusted certificates.
3. If the CA certificates were created or updated, writes the secrets back to Kubernetes.
4. Generates an ephemeral TLS server certificate signed by the CA certificate with the latest expiration date.
-5. Validates, and if necessary, creates or updates the Kubernetes webhook configurations named `yunikorn-admission-controller-validations` and `yunikorn-admission-controller-mutations`. If the CA certificates have changed, the webhooks will also be updated. These webhooks allow the Kubernetes API server to connect to the admission controller service to perform configmap validations and pod mutations.
+5. Validates, and if necessary, creates or updates the Kubernetes webhook configurations named `yunikorn-admission-controller-validations` and `yunikorn-admission-controller-mutations`. If the CA certificates have changed, the webhooks will also be updated. These webhooks allow the Kubernetes API server to connect to the admission controller service to perform configmap validations and pod mutations.
6. Starts up the admission controller HTTPS server.
Additionally, the admission controller also starts a background task to wait for CA certificates to expire. Once either certificate is expiring within the next 30 days, new CA and server certificates are generated, the webhook configurations are updated, and the HTTPS server is quickly restarted. This ensures that certificates rotate properly without downtime.
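The rotation rule in the hunk above (regenerate once a certificate is within 30 days of expiry) reduces to a simple time comparison; a sketch with assumed epoch-second values, not real certificate data:

```shell
# Sketch of the 30-day rotation check; NOW and EXPIRY are made-up examples.
NOW=1700000000
EXPIRY=$((NOW + 20 * 24 * 3600))   # certificate expires in 20 days
WINDOW=$((30 * 24 * 3600))         # rotate when within 30 days of expiry
if [ $((EXPIRY - NOW)) -lt "$WINDOW" ]; then
  echo "rotate: certificate expires within 30 days"
fi
```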
diff --git a/versioned_docs/version-1.6.3/design/gang_scheduling.md b/versioned_docs/version-1.6.3/design/gang_scheduling.md
index 44560ef11b..54d3ba1b39 100644
--- a/versioned_docs/version-1.6.3/design/gang_scheduling.md
+++ b/versioned_docs/version-1.6.3/design/gang_scheduling.md
@@ -60,9 +60,9 @@ Combined flow for the shim and core during startup of an application:
* The placeholder AllocationAsk’s are scheduled by the core as if they were normal AllocationAsk’s. (5)
* All Allocations, even if they are the result of the placeholder AllocationAsks being allocated by the scheduler, are communicated back to the shim.
* The original real pod is passed to the core as an AllocationAsk. (6)
-* After the real pod and all all the placeholder pods are scheduled the shim starts the real pod that triggered the application creation. (7)
+* After the real pod and all the placeholder pods are scheduled the shim starts the real pod that triggered the application creation. (7)
-After the first, real, pod is started the following pods should all be handled in the same way (8):
+After the first real pod is started, the following pods should all be handled in the same way (8):
* A real pod is created on k8s.
* The pod is processed and an AllocationAsk is created.
* The scheduler processes the AllocationAsk (more detail below) and replaces a placeholder with the real allocation.
@@ -167,15 +167,15 @@ For gang scheduling we have a simple one new to one release relation in the case
The scheduler processes the AllocationAsk as follows:
1. Check if the application has an unreleased allocation for a placeholder allocation with the same _taskGroupName._ If no placeholder allocations are found a normal allocation cycle will be used to allocate the request.
-2. A placeholder allocation is selected and marked for release. A request to release the placeholder allocation is communicated to the shim. This must be an async process as the shim release process is dependent on the underlying K8s response which might not be instantaneous.
+2. A placeholder allocation is selected and marked for release. A request to release the placeholder allocation is communicated to the shim. This must be an async process as the shim release process is dependent on the underlying K8s response which might not be instantaneous.
NOTE: no allocations are released in the core at this point in time.
-3. The core “parks” the processing of the real AllocationAsk until the shim has responded with a confirmation that the placeholder allocation has been released.
+3. The core “parks” the processing of the real AllocationAsk until the shim has responded with a confirmation that the placeholder allocation has been released.
NOTE: locks are released to allow scheduling to continue
4. After the confirmation of the release is received from the shim the “parked” AllocationAsk processing is finalised.
5. The AllocationAsk is allocated on the same node as the placeholder used.
The removal of the placeholder allocation is finalised in either case. This all needs to happen as one update to the application, queue and node.
* On success: a new Allocation is created.
-* On Failure: try to allocate on a different node, if that fails the AllocationAsk becomes unschedulable triggering scale up.
+* On Failure: try to allocate on a different node, if that fails the AllocationAsk becomes unschedulable triggering scale up.
6. Communicate the allocation back to the shim (if applicable, based on step 5)
## Application completion
@@ -196,7 +196,7 @@ The time out of the _waiting_ state is new functionality.
Placeholders are not considered active allocations.
Placeholder asks are considered pending resource asks.
-These cases will be handled in the [Cleanup](#Cleanup) below.
+These cases will be handled in the [Cleanup](#cleanup) below.
### Cleanup
When we look at gang scheduling there is a further issue around unused placeholders, placeholder asks and their cleanup.
@@ -219,7 +219,7 @@ Processing in the core thus needs to consider two cases that will impact the tra
1. Placeholder asks pending (exit from _accepted_)
2. Placeholders allocated (exit from _waiting_)
-Placeholder asks pending:
+Placeholder asks pending:
Pending placeholder asks are handled via a timeout.
An application must only spend a limited time waiting for all placeholders to be allocated.
This timeout is needed because an application’s partial placeholders allocation may occupy cluster resources without really using them.
@@ -259,7 +259,7 @@ Combined flow for the shim and core during timeout of placeholder:
* After the placeholder Allocations and Asks are released the core moves the application to the killed state removing it from the queue (4).
* The state change is finalised in the core and shim. (5)
-Allocated placeholders:
+Allocated placeholders:
Leftover placeholders need to be released by the core.
The shim needs to be informed to remove them. This must be triggered on entry of the _completed_ state.
After the placeholder release is requested by the core the state transition of the application can proceed.
@@ -429,14 +429,14 @@ In patched message form that would look like:
message UpdateResponse {
...
// Released allocation(s), allocations can be released by either the RM or scheduler.
- // The TerminationType defines which side needs to act and process the message.
+ // The TerminationType defines which side needs to act and process the message.
repeated AllocationRelease releasedAllocations = 3;
...
}
message AllocationReleasesRequest {
// Released allocation(s), allocations can be released by either the RM or scheduler.
- // The TerminationType defines which side needs to act and process the message.
+ // The TerminationType defines which side needs to act and process the message.
repeated AllocationRelease releasedAllocations = 1;
...
}
@@ -469,7 +469,7 @@ In patched message form that would look like:
message AllocationRelease {
enum TerminationType {
STOPPED_BY_RM = 0;
- TIMEOUT = 1;
+ TIMEOUT = 1;
PREEMPTED_BY_SCHEDULER = 2;
PLACEHOLDER_REPLACED = 3;
}
@@ -481,7 +481,7 @@ message AllocationRelease {
// The UUID of the allocation to release, if not set all allocations are released for
// the applicationID
string UUID = 3;
- // The termination type as described above
+ // The termination type as described above
TerminationType terminationType = 4;
// human-readable message
string message = 5;
@@ -525,7 +525,7 @@ message AllocationReleasesRequest {
...
// Released allocationask(s), allocationasks can be released by either the RM or
// scheduler. The TerminationType defines which side needs to act and process the
- // message.
+ // message.
repeated AllocationAskRelease allocationAsksToRelease = 2;
}
```
@@ -536,12 +536,12 @@ In patched message form that would look like:
message AllocationAskRelease {
enum TerminationType {
STOPPED_BY_RM = 0;
- TIMEOUT = 1;
+ TIMEOUT = 1;
PREEMPTED_BY_SCHEDULER = 2;
PLACEHOLDER_REPLACED = 3;
}
...
- // The termination type as described above
+ // The termination type as described above
TerminationType terminationType = 4;
...
}
diff --git a/versioned_docs/version-1.6.3/design/scheduler_configuration.md b/versioned_docs/version-1.6.3/design/scheduler_configuration.md
index 13a0759097..0e7aa26a0e 100644
--- a/versioned_docs/version-1.6.3/design/scheduler_configuration.md
+++ b/versioned_docs/version-1.6.3/design/scheduler_configuration.md
@@ -54,7 +54,7 @@ Configuration to consider:
## Queue Configuration
### Queue Definition
On startup the scheduler will load the configuration for the queues from the provided configuration file after initialising the service. If there is no queue configuration provided the scheduler should start up with a simple default configuration which performs a well documented default behaviour.
-Based on the kubernetes definition this configuration could be a configMap <sup id="s1">[1](#f1)</sup> but not a CRD.
+Based on the kubernetes definition this configuration could be a [configMap](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource) but not a CRD.
The queue configuration is dynamic. Changing the queue configuration must not require a scheduler restart.
Changes should be allowed by either calling the GO based API, the REST based API or by updating the configuration file. Changes made through the API must be persisted in the configuration file. Making changes through an API is not a high priority requirement and could be postponed to a later release.
@@ -166,7 +166,7 @@ Defining placement rules in the configuration requires the following information
* Create (boolean)
* Filter:
* A regular expression or list of users/groups to apply the rule to.
-
+
The filter can be used to allow the rule to be used (default behaviour) or deny the rule to be used. User or groups matching the filter will be either allowed or denied.
The filter is defined as follow:
* Type:
@@ -213,7 +213,7 @@ Base point to make: a changed configuration should not impact the currently runn
### Access Control Lists
The scheduler ACL is independent of the queue ACLs. A scheduler administrator is not by default allowed to submit an application or administer the queues in the system.
-All ACL types should use the same definition pattern. We should allow at least POSIX user and group names which uses the portable filename character set <sup id="s2">[2](#f2)</sup>. However we should take into account that we could have domain specifiers based on the environment that the system runs in (@ sign as per HADOOP-12751).
+All ACL types should use the same definition pattern. We should allow at least POSIX user and group names which uses the portable filename character set <a href="#footnote1"><sup>[1]</sup></a>. However we should take into account that we could have domain specifiers based on the environment that the system runs in (@ sign as per HADOOP-12751).
By default access control is enabled and access is denied. The only special case is for the core scheduler which automatically adds the system user, the scheduler process owner, to the scheduler ACL. The scheduler process owner is allowed to make sure that the process owner can use the API to call any administrative actions.
@@ -241,6 +241,7 @@ The full configuration of the K8s shim is still under development.
The full configuration of the YARN shim is still under development.
---
-<br/><b id="f1"></b>1: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource. [↩](#s1)
-<br/><b id="f2"></b>2: The set of characters from which portable filenames are constructed. [↩](#s2)
-<br/>`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ -`
+<p id="footnote1">
+<strong>1.</strong> The set of characters from which portable filenames are constructed.<br/>
+<code>A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ -</code>
+</p>
\ No newline at end of file
diff --git a/versioned_docs/version-1.6.3/developer_guide/dependencies.md b/versioned_docs/version-1.6.3/developer_guide/dependencies.md
index d458798df2..446bc7ffc9 100644
--- a/versioned_docs/version-1.6.3/developer_guide/dependencies.md
+++ b/versioned_docs/version-1.6.3/developer_guide/dependencies.md
@@ -47,8 +47,8 @@ require (
)
```
Release branches **must** not use pseudo versions.
-During the creation of a release, [tags](/community/release_procedure#tag-and-update-release-for-version) will be created.
-These tags will be used as the reference in the go.mod files for the release.
+During the creation of a release, [tags](/community/release_procedure#tag-for-release) will be created.
+These tags will be used as the reference in the go.mod files for the release.
## Enforcement of pseudo version
In the pull request checks for the `yunikorn-core` and `yunikorn-k8shim` repositories enforce the format of the versions.
@@ -57,7 +57,7 @@ repositories in the `master` branch is not a pseudo version.
The check enforces that the start of the version reference is `v.0.0.0-`
-Pseudo versions are not enforced in the release branches as per [why a pseudo version](#why-a-pseudo-version) explanation above.
+Pseudo versions are not enforced in the release branches as per [why a pseudo version](#why-a-pseudo-version) explanation above.
## Updating the core dependency
Before updating the core dependency must make sure that the scheduler interface changes are finalised.
@@ -100,7 +100,7 @@ It is therefor that steps 5 and 8 are performed to make sure there is no regress
## Generating a pseudo version
A pseudo references for use in a go.mod file is based on the commit hash and timestamp.
-It is simple to generate one using the following steps:
+It is simple to generate one using the following steps:
1. Change to the repository for which the new pseudo version needs to be generated.
2. Update the local checked out code for the master branch to get the latest commits
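The steps above boil down to formatting the latest commit's UTC timestamp and abbreviated hash into the pseudo-version string. A sketch with assumed values (a real version would be derived from `git log` in the repository):

```shell
# Go pseudo-version layout: v0.0.0-<UTC commit time>-<12-char commit hash>.
# The timestamp and hash below are assumed example values.
COMMIT_TIME='20251127035605'  # commit time in UTC, yyyymmddhhmmss
COMMIT_HASH='686ab877d4ae'    # first 12 characters of the commit hash
PSEUDO="v0.0.0-${COMMIT_TIME}-${COMMIT_HASH}"
echo "$PSEUDO"
```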
diff --git a/versioned_docs/version-1.6.3/performance/performance_tutorial.md b/versioned_docs/version-1.6.3/performance/performance_tutorial.md
index f45bbf2648..c2c3e7330e 100644
--- a/versioned_docs/version-1.6.3/performance/performance_tutorial.md
+++ b/versioned_docs/version-1.6.3/performance/performance_tutorial.md
@@ -83,12 +83,12 @@ root hard nofile 50000
Before going into the details, here are the general steps used in our tests:
-- [Step 1](#Kubernetes): Properly configure Kubernetes API server and controller manager, then add worker nodes.
-- [Step 2](#Setup-Kubemark): Deploy hollow pods,which will simulate worker nodes, name hollow nodes. After all hollow nodes in ready status, we need to cordon all native nodes, which are physical presence in the cluster, not the simulated nodes, to avoid we allocated test workload pod to native nodes.
-- [Step 3](#Deploy-YuniKorn): Deploy YuniKorn using the Helm chart on the master node, and scale down the Deployment to 0 replica, and [modify the port](#Setup-Prometheus) in `prometheus.yml` to match the port of the service.
-- [Step 4](#Run-tests): Deploy 50k Nginx pods for testing, and the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replica, all Nginx pods will be stuck in pending.
+- [Step 1](#kubernetes): Properly configure Kubernetes API server and controller manager, then add worker nodes.
+- [Step 2](#setup-kubemark): Deploy hollow pods,which will simulate worker nodes, name hollow nodes. After all hollow nodes in ready status, we need to cordon all native nodes, which are physical presence in the cluster, not the simulated nodes, to avoid we allocated test workload pod to native nodes.
+- [Step 3](#deploy-yunikorn): Deploy YuniKorn using the Helm chart on the master node, and scale down the Deployment to 0 replica, and [modify the port](#setup-prometheus) in `prometheus.yml` to match the port of the service.
+- [Step 4](#run-tests): Deploy 50k Nginx pods for testing, and the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replica, all Nginx pods will be stuck in pending.
- [Step 5](../user_guide/troubleshooting.md#restart-the-scheduler): Scale up The YuniKorn Deployment back to 1 replica, and cordon the master node to avoid YuniKorn allocating Nginx pods there. In this step, YuniKorn will start collecting the metrics.
-- [Step 6](#Collect-and-Observe-YuniKorn-metrics): Observe the metrics exposed in Prometheus UI.
+- [Step 6](#collect-and-observe-yunikorn-metrics): Observe the metrics exposed in Prometheus UI.
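Most of the fixes in this hunk lowercase the anchor fragments; the docs site derives an anchor slug from a heading roughly by lowercasing it and replacing spaces with hyphens. A rough shell illustration of that mapping (a simplification of the real slugger, which also strips punctuation):

```shell
# Rough model of heading -> anchor slug: lowercase, spaces to hyphens.
heading='Collect and Observe YuniKorn metrics'
slug=$(printf '%s' "$heading" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')
echo "$slug"
```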
---
## Setup Kubemark
@@ -166,12 +166,12 @@ spec:
name: hollow-node
spec:
nodeSelector: # leverage label to allocate to native node
- tag: tagName
+ tag: tagName
initContainers:
- name: init-inotify-limit
image: docker.io/busybox:latest
imagePullPolicy: IfNotPresent
- command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set as same as max_user_instance in actual node
+ command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set as same as max_user_instance in actual node
securityContext:
privileged: true
volumes:
@@ -183,7 +183,7 @@ spec:
path: /var/log
containers:
- name: hollow-kubelet
- image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
+ image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
imagePullPolicy: IfNotPresent
ports:
- containerPort: 4194
@@ -215,7 +215,7 @@ spec:
securityContext:
privileged: true
- name: hollow-proxy
- image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
+ image: 0yukali0/kubemark:1.20.10 # the kubemark image you build
imagePullPolicy: IfNotPresent
env:
- name: NODE_NAME
@@ -341,7 +341,7 @@ scrape_configs:
scrape_interval: 1s
metrics_path: '/ws/v1/metrics'
static_configs:
- - targets: ['docker.for.mac.host.internal:9080']
+ - targets: ['docker.for.mac.host.internal:9080']
# 9080 is internal port, need port forward or modify 9080 to service's port
```
@@ -355,7 +355,7 @@ scrape_configs:
Once the environment is setup, you are good to run workloads and collect results. YuniKorn community has some useful tools to run workloads and collect metrics, more details will be published here.
-### 1. Scenarios
+### 1. Scenarios
In performance tools, there are three types of tests and feedbacks.
| Test type | Description | Diagram | Log |
@@ -364,7 +364,7 @@ In performance tools, there are three types of tests and feedbacks.
| thourghput | Measure schedulers' throughput by calculating how many pods are allocated per second based on the pod start time | Exist | None |
### 2. Build tool
-The performance tool is available in [yunikorn release repo](https://github.com/apache/yunikorn-release.git),clone the repo to your workspace.
+The performance tool is available in [yunikorn release repo](https://github.com/apache/yunikorn-release.git),clone the repo to your workspace.
```
git clone https://github.com/apache/yunikorn-release.git
```
@@ -388,7 +388,7 @@ If you set these fields with large number to cause timeout problem, increase val
| --- | --- |
| SchedulerNames | List of scheduler will run the test |
| ShowNumOfLastTasks | Show metadata of last number of pods |
-| CleanUpDelayMs | Controll period to refresh deployments status and print log |
+| CleanUpDelayMs | Controll period to refresh deployments status and print log |
| RequestConfigs | Set resource request and decide number of deployments and pods per deployment with `repeat` and `numPods` |
In this case,yunikorn and default scheduler will sequentially separately create ten deployments which contains fifty pods.
@@ -493,7 +493,7 @@ In the Kubernetes API server, we need to modify two parameters: `max-mutating-re
#### Controller-Manager
-In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server side request bandwidth. `node-cidr-mask-size` represents the node CIDR. it needs to be increased as well in order to scale up to thousands of nodes.
+In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server side request bandwidth. `node-cidr-mask-size` represents the node CIDR. it needs to be increased as well in order to scale up to thousands of nodes.
Modify `/etc/kubernetes/manifest/kube-controller-manager.yaml`:
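For orientation, the three flags live under the container command in that static-pod manifest. A hedged fragment of what the edited file might look like (flag values are illustrative examples only, not recommendations from this change):

```yaml
# Illustrative fragment; values are examples, not recommendations.
spec:
  containers:
  - command:
    - kube-controller-manager
    - --node-cidr-mask-size=21
    - --kube-api-burst=3000
    - --kube-api-qps=3000
```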
diff --git a/versioned_docs/version-1.6.3/user_guide/observability/prometheus.md b/versioned_docs/version-1.6.3/user_guide/observability/prometheus.md
index b5acf7f446..72b47e355d 100644
--- a/versioned_docs/version-1.6.3/user_guide/observability/prometheus.md
+++ b/versioned_docs/version-1.6.3/user_guide/observability/prometheus.md
@@ -26,7 +26,7 @@ YuniKorn exposes its scheduling metrics via Prometheus. Thus, we need to set up
We will provide two methods for building Prometheus: either running it locally or using Helm to deploy it in your cluster. Additionally, in the Helm version, we will explain how to integrate it with Grafana and provide generic Grafana Dashboards for monitoring Yunikorn's metrics and observing the changes over time.
-If you don't know what metric can be used, you can use [REST API](../../api/scheduler.md#metrics).
+If you don't know what metric can be used, you can use [REST API](../../api/cluster.md#metrics).
## Run Prometheus locally
@@ -55,7 +55,7 @@ scrape_configs:
scrape_interval: 1s
metrics_path: '/ws/v1/metrics'
static_configs:
- - targets: ['localhost:9080']
+ - targets: ['localhost:9080']
# 9080 is internal port, need port forward or modify 9080 to service's port
```
@@ -67,7 +67,7 @@ Port forwarding for the core's web service on the standard port can be turned on
kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
```
-`9080`is the default port for core's web service.
+`9080`is the default port for core's web service.
### 4. Execute prometheus
@@ -88,13 +88,13 @@ You can also verify that Prometheus is serving metrics by navigating to its metr
## Deploy Prometheus and Grafana in a cluster.
### 1. Add Prometheus repository to helm
-
+
```yaml
# add helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
```
-
+
### 2. Use helm to create Prometheus
```yaml
@@ -137,7 +137,7 @@ kubectl apply -f yunikorn-service-monitor.yaml
```
### 4. Access the Prometheus Web UI
-
+
```shell
kubectl port-forward -n prometheus svc/prometheus-kube-prometheus-prometheus 9090:9090
```
@@ -159,14 +159,14 @@ kubectl port-forward -n prometheus svc/prometheus-grafana 7070:80
After running port-forward, you can enter [localhost:7070](http://localhost:7070) to access grafana, and in the login page, enter account:`admin` ,password:`prom-operator`.

-
+
### Download JSON files for Yunikorn Dashboard
-
+
A dashboard consists of multiple panels that are organized and arranged in rows. Each panel has the ability to interact with data from any Grafana data source that has been configured. For more detailed information, please refer to the [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards).
We provide a sample dashboard JSON file. To access it, you can navigate to the [`/deployments/grafana-dashboard` directory](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/grafana-dashboard) in the Yunikorn-k8shim repository.
-You can refer to the [REST API](../../api/scheduler.md#metrics) to build your own custom Dashboard.
+You can refer to the [REST API](../../api/cluster.md#metrics) to build your own custom Dashboard.
### Import the JSON files in the Dashboard