This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git
The following commit(s) were added to refs/heads/master by this push: new 3494c89 [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656) 3494c89 is described below commit 3494c8973b91d725cff1c4c20e87bcb1e6f4f300 Author: Mingyu Chen <morningman....@gmail.com> AuthorDate: Tue Jan 18 10:26:36 2022 +0800 [improvement](colocation) Add a new config to delay the relocation of colocation group (#7656) 1. Add a new FE config `colocate_group_relocate_delay_second` The relocation of a colocation group may involve a large number of tablets moving within the cluster. Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. Relocation usually occurs after a BE node goes offline or goes down. This config is used to delay the determination of BE node unavailability. The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group will not be triggered. 2. Change the priority of colocate tablet repair and balance task from HIGH to NORMAL 3. Add a new FE config allow_replica_on_same_host If set to true, when creating table, Doris will allow to locate replicas of a tablet on same host. And also the tablet repair and balance will be disabled. This is only for local test, so that we can deploy multi BE on same host and create table with multi replicas. 
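For reference, since `colocate_group_relocate_delay_second` is mutable and master-only, it can be adjusted at runtime on the Master FE. A sketch, using the same `ADMIN SET FRONTEND CONFIG` command that the docs added by this commit demonstrate:

```
ADMIN SET FRONTEND CONFIG ("colocate_group_relocate_delay_second" = "3600");
```

With the value above, a BE would have to stay down for an hour (rather than the default 30 minutes) before colocation group relocation is triggered.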
--- docs/en/administrator-guide/config/fe_config.md | 22 ++++++ .../operation/tablet-repair-and-balance.md | 88 ++++++++++++++++++++++ docs/zh-CN/administrator-guide/config/fe_config.md | 25 +++++- .../operation/tablet-repair-and-balance.md | 86 +++++++++++++++++++++ .../main/java/org/apache/doris/catalog/Tablet.java | 47 ++++++------ .../clone/ColocateTableCheckerAndBalancer.java | 28 +++---- .../java/org/apache/doris/clone/TabletChecker.java | 9 +-- .../org/apache/doris/clone/TabletScheduler.java | 7 +- .../main/java/org/apache/doris/common/Config.java | 21 ++++++ .../main/java/org/apache/doris/system/Backend.java | 26 ++++--- .../org/apache/doris/system/SystemInfoService.java | 32 ++++---- .../java/org/apache/doris/catalog/BackendTest.java | 14 ++-- .../clone/ColocateTableCheckerAndBalancerTest.java | 18 ++--- .../doris/clone/TabletRepairAndBalanceTest.java | 1 + 14 files changed, 338 insertions(+), 86 deletions(-) diff --git a/docs/en/administrator-guide/config/fe_config.md b/docs/en/administrator-guide/config/fe_config.md index bf6a8d2..69d611a 100644 --- a/docs/en/administrator-guide/config/fe_config.md +++ b/docs/en/administrator-guide/config/fe_config.md @@ -2099,3 +2099,25 @@ Default: true IsMutable:true MasterOnly: true If set to true, the replica with slower compaction will be automatically detected and migrated to other machines. The detection condition is that the version difference between the fastest and slowest replica exceeds 100, and the difference exceeds 30% of the fastest replica + +### colocate_group_relocate_delay_second + +Default: 1800 + +Dynamically configured: true + +Only for Master FE: true + +The relocation of a colocation group may involve a large number of tablets moving within the cluster. Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. +Relocation usually occurs after a BE node goes offline or goes down. 
This parameter is used to delay the determination of BE node unavailability. The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group will not be triggered. + +### allow_replica_on_same_host + +Default: false + +Dynamically configured: false + +Only for Master FE: false + +Whether to allow multiple replicas of the same tablet to be distributed on the same host. This parameter is mainly used for local testing, to facilitate building multiple BEs to test certain multi-replica situations. Do not use it for non-test environments. + diff --git a/docs/en/administrator-guide/operation/tablet-repair-and-balance.md b/docs/en/administrator-guide/operation/tablet-repair-and-balance.md index 1593cec..e924e62 100644 --- a/docs/en/administrator-guide/operation/tablet-repair-and-balance.md +++ b/docs/en/administrator-guide/operation/tablet-repair-and-balance.md @@ -684,3 +684,91 @@ The following parameters do not support modification for the time being, just fo * In some cases, the default replica repair and balancing strategy may cause the network to be full (mostly in the case of gigabit network cards and a large number of disks per BE). At this point, some parameters need to be adjusted to reduce the number of simultaneous balancing and repair tasks. * Current balancing strategies for copies of Colocate Table do not guarantee that copies of the same Tablet will not be distributed on the BE of the same host. However, the repair strategy of the copy of Colocate Table detects this distribution error and corrects it. However, it may occur that after correction, the balancing strategy regards the replicas as unbalanced and rebalances them. As a result, the Colocate Group cannot achieve stability because of the continuous alternation betwe [...] 
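Since `allow_replica_on_same_host` is not dynamically configurable, it would have to be set in the FE configuration file before startup. A sketch, assuming the standard `fe.conf` file:

```
# fe.conf -- for local test environments only; do not use in production
allow_replica_on_same_host = true
```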
+ +## Best Practices + +### Control and manage the progress of replica repair and balancing of clusters + +In most cases, Doris can automatically perform replica repair and cluster balancing with the default parameter configuration. In some cases, however, we need to intervene manually and adjust the parameters to achieve special goals, such as prioritizing the repair of a table or partition, disabling cluster balancing to reduce cluster load, or prioritizing the repair of non-colocation table data. + +This section describes how to control and manage the progress of replica repair and balancing of the cluster by modifying these parameters. + +1. Deleting corrupt replicas + + In some cases, Doris may not be able to automatically detect corrupt replicas, resulting in frequent query or import errors on those replicas. In this case, we need to delete the corrupted replicas manually. This method can be used to delete a replica with too many versions causing a -235 error, a replica whose files are corrupted, and so on. + + First, find the tablet id of the corresponding replica, let's say 10001. Use `show tablet 10001;` and execute the `show proc` statement from its output to see the details of each replica of the tablet. + + Assuming that the backend id of the replica to be deleted is 20001, execute the following statement to mark the replica as `bad`: + + ``` + ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10001", "backend_id" = "20001", "status" = "bad"); + ``` + + At this point, the `show proc` statement shows that the `IsBad` column of the corresponding replica now has the value `true`. + + A replica marked as `bad` no longer participates in imports or queries, and the replica repair logic automatically replenishes a new replica. + +2. Prioritize repairing a table or partition + + Run `help admin repair table;` to view the help. This command tries to repair the tablets of the specified table or partition first. + +3. 
Stop the balancing task + + Balancing tasks take up some network bandwidth and IO resources. If you wish to stop the generation of new balancing tasks, use the following command: + + ``` + ADMIN SET FRONTEND CONFIG ("disable_balance" = "true"); + ``` + +4. Stop all replica scheduling tasks + + Replica scheduling tasks include balancing and repair tasks. These tasks take up some network bandwidth and IO resources. All replica scheduling tasks (not including those already running; covering both colocation tables and regular tables) can be stopped with the following command: + + ``` + ADMIN SET FRONTEND CONFIG ("disable_tablet_scheduler" = "true"); + ``` + +5. Stop the replica scheduling tasks for all colocation tables + + Replica scheduling for colocation tables runs separately and independently from that of regular tables. In some cases, users may wish to stop the balancing and repair of colocation tables first and devote cluster resources to repairing regular tables, using the following command: + + ``` + ADMIN SET FRONTEND CONFIG ("disable_colocate_balance" = "true"); + ``` + +6. Repair replicas using a more conservative strategy + + Doris automatically repairs replicas when it detects missing replicas, BE downtime, etc. To reduce errors caused by jitter (e.g., a BE being down briefly), Doris delays triggering these tasks. + + * The `tablet_repair_delay_factor_second` parameter. Default 60 seconds. Depending on the priority of the repair task, the trigger is delayed by 60, 120, or 180 seconds. This time can be extended with the following command, so that longer exceptions are tolerated and unnecessary repair tasks are not triggered: + + ``` + ADMIN SET FRONTEND CONFIG ("tablet_repair_delay_factor_second" = "120"); + ``` + +7. Use a more conservative strategy to trigger redistribution of colocation groups + + Redistribution of colocation groups may be accompanied by a large number of tablet migrations. 
`colocate_group_relocate_delay_second` is used to control the redistribution trigger delay. The default is 1800 seconds. If a BE node is likely to be offline for a long time, you can try to increase this parameter to avoid unnecessary redistribution: + + ``` + ADMIN SET FRONTEND CONFIG ("colocate_group_relocate_delay_second" = "3600"); + ``` + +8. Faster replica balancing + + Doris' replica balancing logic first adds a new replica and then deletes the old one to accomplish replica migration. When deleting the old replica, Doris waits for the import tasks already started on that replica to finish, so that the balancing task does not affect them. However, this slows down the execution of the balancing logic. In this case, you can make Doris skip this wait and delete the old replica directly by modifying the foll [...] + + ``` + ADMIN SET FRONTEND CONFIG ("enable_force_drop_redundant_replica" = "true"); + ``` + + This operation may cause some import tasks to fail during balancing (they need to be retried), but it will speed up balancing significantly. + +Overall, when we need to bring the cluster back to a normal state quickly, consider handling it along the following lines: + +1. Find the tablet that is causing a high-priority task to report errors, and mark the problematic replica as bad. +2. Repair certain tables first with the `admin repair` statement. +3. Stop the replica balancing logic to avoid taking up cluster resources, and turn it on again after the cluster is restored. +4. Use a more conservative strategy to trigger repair tasks, to deal with the avalanche effect caused by frequent BE downtime. +5. Turn off scheduling tasks for colocation tables on demand, and focus cluster resources on repairing other high-priority data. 
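The delay rule in item 7 can be sketched in a few lines of Java mirroring the condition this commit adds to `checkBackendAvailable` in `ColocateTableCheckerAndBalancer` (`(!be.isAlive() && (currTime - be.getLastUpdateMs()) > delaySecond * 1000L) || be.isDecommissioned()`). The class and method names below are illustrative, not part of the Doris codebase:

```java
// Sketch of the delayed-unavailability rule: a BE that is not alive is still
// treated as available for colocation scheduling until delaySecond has passed
// since its last heartbeat; a decommissioned BE is never available.
public class Main {
    public static boolean isAvailable(boolean alive, boolean decommissioned,
                                      long lastUpdateMs, long currTimeMs,
                                      long delaySecond) {
        if (decommissioned) {
            return false; // a decommissioning BE is never available
        }
        if (!alive && (currTimeMs - lastUpdateMs) > delaySecond * 1000L) {
            return false; // dead longer than the configured delay
        }
        // alive, or dead for less than delaySecond: do not trigger relocation yet
        return true;
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        // BE down for 10 minutes with the default 1800s delay: still available.
        System.out.println(isAvailable(false, false, now - 600_000L, now, 1800L));
        // BE down for 31 minutes: unavailable, relocation may be triggered.
        System.out.println(isAvailable(false, false, now - 1_860_000L, now, 1800L));
    }
}
```

This is why increasing `colocate_group_relocate_delay_second` makes relocation more conservative: only the threshold in the elapsed-time comparison changes.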
\ No newline at end of file diff --git a/docs/zh-CN/administrator-guide/config/fe_config.md b/docs/zh-CN/administrator-guide/config/fe_config.md index 9fe6cad..3135a98 100644 --- a/docs/zh-CN/administrator-guide/config/fe_config.md +++ b/docs/zh-CN/administrator-guide/config/fe_config.md @@ -2099,7 +2099,7 @@ load 标签清理器将每隔 `label_clean_interval_second` 运行一次以清 是否可以动态配置:true -是否为 Master FE 节点独有的配置项:false +是否为 Master FE 节点独有的配置项:true 如果设置为true,将关闭副本修复和均衡逻辑。 @@ -2109,7 +2109,7 @@ load 标签清理器将每隔 `label_clean_interval_second` 运行一次以清 是否可以动态配置:true -是否为 Master FE 节点独有的配置项:false +是否为 Master FE 节点独有的配置项:true 如果设置为true,系统会在副本调度逻辑中,立即删除冗余副本。这可能导致部分正在对对应副本写入的导入作业失败,但是会加速副本的均衡和修复速度。 当集群中有大量等待被均衡或修复的副本时,可以尝试设置此参数,以牺牲部分导入成功率为代价,加速副本的均衡和修复。 @@ -2123,3 +2123,24 @@ load 标签清理器将每隔 `label_clean_interval_second` 运行一次以清 是否为 Master FE 节点独有的配置项:true 如果设置为true,会自动检测compaction比较慢的副本,并将迁移到其他机器,检测条件是 最快和最慢副本版本差异超过100, 且差异超过最快副本的30% + +### colocate_group_relocate_delay_second + +默认值:1800 + +是否可以动态配置:true + +是否为 Master FE 节点独有的配置项:true + +重分布一个 Colocation Group 可能涉及大量的tablet迁移。因此,我们需要一个更保守的策略来避免不必要的Colocation 重分布。 +重分布通常发生在 Doris 检测到有 BE 节点宕机后。这个参数用于推迟对BE宕机的判断。如默认参数下,如果 BE 节点能够在 1800 秒内恢复,则不会触发 Colocation 重分布。 + +### allow_replica_on_same_host + +默认值:false + +是否可以动态配置:false + +是否为 Master FE 节点独有的配置项:false + +是否允许同一个 tablet 的多个副本分布在同一个 host 上。这个参数主要用于本地测试是,方便搭建多个 BE 已测试某些多副本情况。不要用于非测试环境。 diff --git a/docs/zh-CN/administrator-guide/operation/tablet-repair-and-balance.md b/docs/zh-CN/administrator-guide/operation/tablet-repair-and-balance.md index ecfcb37..8164349 100644 --- a/docs/zh-CN/administrator-guide/operation/tablet-repair-and-balance.md +++ b/docs/zh-CN/administrator-guide/operation/tablet-repair-and-balance.md @@ -683,7 +683,93 @@ TabletScheduler 在每轮调度时,都会通过 LoadBalancer 来选择一定 * 目前针对 Colocate Table 的副本的均衡策略无法保证同一个 Tablet 的副本不会分布在同一个 host 的 BE 上。但 Colocate Table 的副本的修复策略会检测到这种分布错误并校正。但可能会出现,校正后,均衡策略再次认为副本不均衡而重新均衡。从而导致在两种状态间不停交替,无法使 Colocate Group 达成稳定。针对这种情况,我们建议在使用 Colocate 
属性时,尽量保证集群是同构的,以减小副本分布在同一个 host 上的概率。 +## 最佳实践 +### 控制并管理集群的副本修复和均衡进度 + +在大多数情况下,通过默认的参数配置,Doris 都可以自动的进行副本修复和集群均衡。但是某些情况下,我们需要通过人工介入调整参数,来达到一些特殊的目的。如优先修复某个表或分区、禁止集群均衡以降低集群负载、优先修复非 colocation 的表数据等等。 + +本小节主要介绍如何通过修改参数,来控制并管理集群的副本修复和均衡进度。 + +1. 删除损坏副本 + + 某些情况下,Doris 可能无法自动检测某些损坏的副本,从而导致查询或导入在损坏的副本上频繁报错。此时我们需要手动删除已损坏的副本。该方法可以适用于:删除版本数过高导致 -235 错误的副本、删除文件已损坏的副本等等。 + + 首先,找到副本对应的 tablet id,假设为 10001。通过 `show tablet 10001;` 并执行其中的 `show proc` 语句可以查看对应的 tablet 的各个副本详情。 + + 假设需要删除的副本的 backend id 是 20001。则执行以下语句将副本标记为 `bad`: + + ``` + ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "10001", "backend_id" = "20001", "status" = "bad"); + ``` + + 此时,再次通过 `show proc` 语句可以看到对应的副本的 `IsBad` 列值为 `true`。 + + 被标记为 `bad` 的副本不会再参与导入和查询。同时副本修复逻辑会自动补充一个新的副本。 + +2. 优先修复某个表或分区 + + `help admin repair table;` 查看帮助。该命令会尝试优先修复指定表或分区的tablet。 + +3. 停止均衡任务 + + 均衡任务会占用一定的网络带宽和IO资源。如果希望停止新的均衡任务的产生,可以通过以下命令: + + ``` + ADMIN SET FRONTEND CONFIG ("disable_balance" = "true"); + ``` + +4. 停止所有副本调度任务 + + 副本调度任务包括均衡和修复任务。这些任务都会占用一定的网络带宽和IO资源。可以通过以下命令停止所有副本调度任务(不包括已经在运行的,包括 colocation 表和普通表): + + ``` + ADMIN SET FRONTEND CONFIG ("disable_tablet_scheduler" = "true"); + ``` + +5. 停止所有 colocation 表的副本调度任务。 + + colocation 表的副本调度和普通表是分开独立运行的。某些情况下,用户可能希望先停止对 colocation 表的均衡和修复工作,而将集群资源用于普通表的修复,则可以通过以下命令: + + ``` + ADMIN SET FRONTEND CONFIG ("disable_colocate_balance" = "true"); + ``` + +6. 使用更保守的策略修复副本 + + Doris 在检测到副本缺失、BE宕机等情况下,会自动修复副本。但为了减少一些抖动导致的错误(如BE短暂宕机),Doris 会延迟触发这些任务。 + + * `tablet_repair_delay_factor_second` 参数。默认 60 秒。根据修复任务优先级的不同,会推迟 60秒、120秒、180秒后开始触发修复任务。可以通过以下命令延长这个时间,这样可以容忍更长的异常时间,以避免触发不必要的修复任务: + + ``` + ADMIN SET FRONTEND CONFIG ("tablet_repair_delay_factor_second" = "120"); + ``` + +7. 使用更保守的策略触发 colocation group 的重分布 + + colocation group 的重分布可能伴随着大量的 tablet 迁移。`colocate_group_relocate_delay_second` 用于控制重分布的触发延迟。默认 1800秒。如果某台 BE 节点可能长时间下线,可以尝试调大这个参数,以避免不必要的重分布: + + ``` + ADMIN SET FRONTEND CONFIG ("colocate_group_relocate_delay_second" = "3600"); + ``` + +8. 
更快速的副本均衡 + + Doris 的副本均衡逻辑会先增加一个正常副本,然后在删除老的副本,已达到副本迁移的目的。而在删除老副本时,Doris会等待这个副本上已经开始执行的导入任务完成,以避免均衡任务影响导入任务。但这样会降低均衡逻辑的执行速度。此时可以通过修改以下参数,让 Doris 忽略这个等待,直接删除老副本: + + ``` + ADMIN SET FRONTEND CONFIG ("enable_force_drop_redundant_replica" = "true"); + ``` + + 这种操作可能会导致均衡期间部分导入任务失败(需要重试),但会显著加速均衡速度。 + +总体来讲,当我们需要将集群快速恢复到正常状态时,可以考虑按照以下思路处理: + +1. 找到导致高优任务报错的tablet,将有问题的副本置为 bad。 +2. 通过 `admin repair` 语句高优修复某些表。 +3. 停止副本均衡逻辑以避免占用集群资源,等集群恢复后,再开启即可。 +4. 使用更保守的策略触发修复任务,以应对 BE 频繁宕机导致的雪崩效应。 +5. 按需关闭 colocation 表的调度任务,集中集群资源修复其他高优数据。 diff --git a/fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java b/fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java index ffc5ba6..a3d8967 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java +++ b/fe/fe-core/src/main/java/org/apache/doris/catalog/Tablet.java @@ -64,7 +64,7 @@ public class Tablet extends MetaObject implements Writable { REPLICA_MISSING_IN_CLUSTER, // not enough healthy replicas in correct cluster. REPLICA_MISSING_FOR_TAG, // not enough healthy replicas in backend with specified tag. FORCE_REDUNDANT, // some replica is missing or bad, but there is no other backends for repair, - // at least one replica has to be deleted first to make room for new replica. + // at least one replica has to be deleted first to make room for new replica. COLOCATE_MISMATCH, // replicas do not all locate in right colocate backends set. COLOCATE_REDUNDANT, // replicas match the colocate backends set, but redundant. NEED_FURTHER_REPAIR, // one of replicas need a definite repair. @@ -86,28 +86,28 @@ public class Tablet extends MetaObject implements Writable { // last time that the tablet checker checks this tablet. 
// no need to persist private long lastStatusCheckTime = -1; - + public Tablet() { this(0L, new ArrayList<>()); } - + public Tablet(long tabletId) { this(tabletId, new ArrayList<>()); } - + public Tablet(long tabletId, List<Replica> replicas) { this.id = tabletId; this.replicas = replicas; if (this.replicas == null) { this.replicas = new ArrayList<>(); } - + checkedVersion = -1L; checkedVersionHash = -1L; isConsistent = true; } - + public void setIdForRestore(long tabletId) { this.id = tabletId; } @@ -115,7 +115,7 @@ public class Tablet extends MetaObject implements Writable { public long getId() { return this.id; } - + public long getCheckedVersion() { return this.checkedVersion; } @@ -163,7 +163,7 @@ public class Tablet extends MetaObject implements Writable { } } } - + public void addReplica(Replica replica) { addReplica(replica, false); } @@ -171,7 +171,7 @@ public class Tablet extends MetaObject implements Writable { public List<Replica> getReplicas() { return this.replicas; } - + public Set<Long> getBackendIds() { Set<Long> beIds = Sets.newHashSet(); for (Replica replica : replicas) { @@ -180,7 +180,6 @@ public class Tablet extends MetaObject implements Writable { return beIds; } - // for loading data public List<Long> getNormalReplicaBackendIds() { List<Long> beIds = Lists.newArrayList(); SystemInfoService infoService = Catalog.getCurrentSystemInfo(); @@ -188,7 +187,7 @@ public class Tablet extends MetaObject implements Writable { if (replica.isBad()) { continue; } - + ReplicaState state = replica.getState(); if (infoService.checkBackendAlive(replica.getBackendId()) && state.canLoad()) { beIds.add(replica.getBackendId()); @@ -198,6 +197,7 @@ public class Tablet extends MetaObject implements Writable { } // return map of (BE id -> path hash) of normal replicas + // for load plan. 
public Multimap<Long, Long> getNormalReplicaBackendPathMap() { Multimap<Long, Long> map = HashMultimap.create(); SystemInfoService infoService = Catalog.getCurrentSystemInfo(); @@ -246,7 +246,7 @@ public class Tablet extends MetaObject implements Writable { } return null; } - + public Replica getReplicaByBackendId(long backendId) { for (Replica replica : replicas) { if (replica.getBackendId() == backendId) { @@ -255,7 +255,7 @@ public class Tablet extends MetaObject implements Writable { } return null; } - + public boolean deleteReplica(Replica replica) { if (replicas.contains(replica)) { replicas.remove(replica); @@ -264,7 +264,7 @@ public class Tablet extends MetaObject implements Writable { } return false; } - + public boolean deleteReplicaByBackendId(long backendId) { Iterator<Replica> iterator = replicas.iterator(); while (iterator.hasNext()) { @@ -277,7 +277,7 @@ public class Tablet extends MetaObject implements Writable { } return false; } - + @Deprecated public Replica deleteReplicaById(long replicaId) { Iterator<Replica> iterator = replicas.iterator(); @@ -297,7 +297,7 @@ public class Tablet extends MetaObject implements Writable { public void clearReplica() { this.replicas.clear(); } - + public void setTabletId(long tabletId) { this.id = tabletId; } @@ -306,7 +306,7 @@ public class Tablet extends MetaObject implements Writable { // sort replicas by version. 
higher version in the tops replicas.sort(Replica.VERSION_DESC_COMPARATOR); } - + @Override public String toString() { return "tabletId=" + this.id; @@ -327,6 +327,7 @@ public class Tablet extends MetaObject implements Writable { out.writeLong(checkedVersionHash); out.writeBoolean(isConsistent); } + @Override public void readFields(DataInput in) throws IOException { super.readFields(in); @@ -346,13 +347,13 @@ public class Tablet extends MetaObject implements Writable { isConsistent = in.readBoolean(); } } - + public static Tablet read(DataInput in) throws IOException { Tablet tablet = new Tablet(); tablet.readFields(in); return tablet; } - + @Override public boolean equals(Object obj) { if (this == obj) { @@ -361,9 +362,9 @@ public class Tablet extends MetaObject implements Writable { if (!(obj instanceof Tablet)) { return false; } - + Tablet tablet = (Tablet) obj; - + if (replicas != tablet.replicas) { if (replicas.size() != tablet.replicas.size()) { return false; @@ -554,10 +555,10 @@ public class Tablet extends MetaObject implements Writable { * 1. 
Mismatch: * backends set: 1,2,3 * tablet replicas: 1,2,5 - * + * * backends set: 1,2,3 * tablet replicas: 1,2 - * + * * backends set: 1,2,3 * tablet replicas: 1,2,4,5 * diff --git a/fe/fe-core/src/main/java/org/apache/doris/clone/ColocateTableCheckerAndBalancer.java b/fe/fe-core/src/main/java/org/apache/doris/clone/ColocateTableCheckerAndBalancer.java index 36f473e..8e9d81f 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/clone/ColocateTableCheckerAndBalancer.java +++ b/fe/fe-core/src/main/java/org/apache/doris/clone/ColocateTableCheckerAndBalancer.java @@ -68,6 +68,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { } private static volatile ColocateTableCheckerAndBalancer INSTANCE = null; + public static ColocateTableCheckerAndBalancer getInstance() { if (INSTANCE == null) { synchronized (ColocateTableCheckerAndBalancer.class) { @@ -87,7 +88,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { * and after all unavailable has been replaced, balance the group * * 2. Match group: - * If replica mismatch backends in a group, that group will be marked as unstable, and pass that + * If replica mismatch backends in a group, that group will be marked as unstable, and pass that * tablet to TabletScheduler. * Otherwise, mark the group as stable */ @@ -223,7 +224,8 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { } String unstableReason = null; - OUT: for (Long tableId : tableIds) { + OUT: + for (Long tableId : tableIds) { OlapTable olapTable = (OlapTable) db.getTableNullable(tableId); if (olapTable == null || !colocateIndex.isColocateTable(olapTable.getId())) { continue; @@ -249,7 +251,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { unstableReason = String.format("get unhealthy tablet %d in colocate table. 
status: %s", tablet.getId(), st); LOG.debug(unstableReason); - if (!tablet.readyToBeRepaired(Priority.HIGH)) { + if (!tablet.readyToBeRepaired(Priority.NORMAL)) { continue; } @@ -260,8 +262,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { System.currentTimeMillis()); // the tablet status will be set again when being scheduled tabletCtx.setTabletStatus(st); - // using HIGH priority, cause we want to stabilize the colocate group as soon as possible - tabletCtx.setOrigPriority(Priority.HIGH); + tabletCtx.setOrigPriority(Priority.NORMAL); tabletCtx.setTabletOrderIdx(idx); AddResult res = tabletScheduler.addTablet(tabletCtx, false /* not force */); @@ -299,7 +300,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { * TagA B C A B * TagB D D D D * - * First, we will hanlde resource group of TagA, then TagB. + * First, we will handle resource group of TagA, then TagB. * * For a single resource group, the balance logic is as follow * (Suppose there is only one resource group with 3 replicas): @@ -505,7 +506,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { .stream() .sorted((entry1, entry2) -> { if (!entry1.getValue().equals(entry2.getValue())) { - return (int)(entry2.getValue() - entry1.getValue()); + return (int) (entry2.getValue() - entry1.getValue()); } BackendLoadStatistic beStat1 = statistic.getBackendLoadStatistic(entry1.getKey()); BackendLoadStatistic beStat2 = statistic.getBackendLoadStatistic(entry2.getKey()); @@ -544,7 +545,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { Set<Long> backends = colocateIndex.getBackendsByGroup(groupId, tag); Set<Long> unavailableBeIds = Sets.newHashSet(); for (Long backendId : backends) { - if (!checkBackendAvailable(backendId, tag, Sets.newHashSet(), infoService)) { + if (!checkBackendAvailable(backendId, tag, Sets.newHashSet(), infoService, Config.colocate_group_relocate_delay_second)) { unavailableBeIds.add(backendId); } } @@ -557,7 
+558,7 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { List<Long> allBackendIds = infoService.getClusterBackendIds(cluster, false); List<Long> availableBeIds = Lists.newArrayList(); for (Long backendId : allBackendIds) { - if (checkBackendAvailable(backendId, tag, excludedBeIds, infoService)) { + if (checkBackendAvailable(backendId, tag, excludedBeIds, infoService, Config.colocate_group_relocate_delay_second)) { availableBeIds.add(backendId); } } @@ -566,9 +567,10 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { /** * check backend available - * backend stopped for a short period of time is still considered available + * backend stopped within "delaySecond" is still considered available */ - private boolean checkBackendAvailable(Long backendId, Tag tag, Set<Long> excludedBeIds, SystemInfoService infoService) { + private boolean checkBackendAvailable(Long backendId, Tag tag, Set<Long> excludedBeIds, + SystemInfoService infoService, long delaySecond) { long currTime = System.currentTimeMillis(); Backend be = infoService.getBackend(backendId); if (be == null) { @@ -576,9 +578,9 @@ public class ColocateTableCheckerAndBalancer extends MasterDaemon { } else if (!be.getTag().equals(tag) || excludedBeIds.contains(be.getId())) { return false; } else if (!be.isScheduleAvailable()) { - // 1. BE is dead for a long time + // 1. BE is dead longer than "delaySecond" // 2. 
BE is under decommission - if ((!be.isAlive() && (currTime - be.getLastUpdateMs()) > Config.tablet_repair_delay_factor_second * 1000 * 2) + if ((!be.isAlive() && (currTime - be.getLastUpdateMs()) > delaySecond * 1000L) || be.isDecommissioned()) { return false; } diff --git a/fe/fe-core/src/main/java/org/apache/doris/clone/TabletChecker.java b/fe/fe-core/src/main/java/org/apache/doris/clone/TabletChecker.java index 3a01e83..f7e4f3c 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/clone/TabletChecker.java +++ b/fe/fe-core/src/main/java/org/apache/doris/clone/TabletChecker.java @@ -70,7 +70,7 @@ public class TabletChecker extends MasterDaemon { private TabletScheduler tabletScheduler; private TabletSchedulerStat stat; - HashMap<String, AtomicLong> tabletCountByStatus = new HashMap<String, AtomicLong>(){{ + HashMap<String, AtomicLong> tabletCountByStatus = new HashMap<String, AtomicLong>() {{ put("total", new AtomicLong(0L)); put("unhealthy", new AtomicLong(0L)); put("added", new AtomicLong(0L)); @@ -81,7 +81,7 @@ public class TabletChecker extends MasterDaemon { // db id -> (tbl id -> PrioPart) // priority of replicas of partitions in this table will be set to VERY_HIGH if not healthy private com.google.common.collect.Table<Long, Long, Set<PrioPart>> prios = HashBasedTable.create(); - + // represent a partition which need to be repaired preferentially public static class PrioPart { public long partId; @@ -125,7 +125,7 @@ public class TabletChecker extends MasterDaemon { } public TabletChecker(Catalog catalog, SystemInfoService infoService, TabletScheduler tabletScheduler, - TabletSchedulerStat stat) { + TabletSchedulerStat stat) { super("tablet checker", FeConstants.tablet_checker_interval_ms); this.catalog = catalog; this.infoService = infoService; @@ -187,7 +187,6 @@ public class TabletChecker extends MasterDaemon { tblMap.remove(repairTabletInfo.tblId); } } - } /* @@ -405,7 +404,7 @@ public class TabletChecker extends MasterDaemon { if (prioPartIsHealthy && 
isInPrios) { // if all replicas in this partition are healthy, remove this partition from // priorities. - LOG.debug("partition is healthy, remove from prios: {}-{}-{}", + LOG.info("partition is healthy, remove from prios: {}-{}-{}", db.getId(), tbl.getId(), partition.getId()); removePrios(new RepairTabletInfo(db.getId(), tbl.getId(), Lists.newArrayList(partition.getId()))); diff --git a/fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java b/fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java index c941817..1215da2 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java +++ b/fe/fe-core/src/main/java/org/apache/doris/clone/TabletScheduler.java @@ -371,6 +371,10 @@ public class TabletScheduler extends MasterDaemon { AgentBatchTask batchTask = new AgentBatchTask(); for (TabletSchedCtx tabletCtx : currentBatch) { try { + if (Config.disable_tablet_scheduler) { + // do not schedule more tablet is tablet scheduler is disabled. + throw new SchedException(Status.FINISHED, "tablet scheduler is disabled"); + } scheduleTablet(tabletCtx, batchTask); } catch (SchedException e) { tabletCtx.increaseFailedSchedCounter(); @@ -403,7 +407,7 @@ public class TabletScheduler extends MasterDaemon { dynamicAdjustPrioAndAddBackToPendingTablets(tabletCtx, e.getMessage()); } } else if (e.getStatus() == Status.FINISHED) { - // schedule redundant tablet will throw this exception + // schedule redundant tablet or scheduler disabled will throw this exception stat.counterTabletScheduledSucceeded.incrementAndGet(); finalizeTabletCtx(tabletCtx, TabletSchedCtx.State.FINISHED, e.getMessage()); } else { @@ -1310,7 +1314,6 @@ public class TabletScheduler extends MasterDaemon { LOG.info("remove the tablet {}. because: {}", tabletCtx.getTabletId(), reason); } - // get next batch of tablets from queue. 
private synchronized List<TabletSchedCtx> getNextTabletCtxBatch() { List<TabletSchedCtx> list = Lists.newArrayList(); diff --git a/fe/fe-core/src/main/java/org/apache/doris/common/Config.java b/fe/fe-core/src/main/java/org/apache/doris/common/Config.java index 4e0ab78..3194d1a 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/common/Config.java +++ b/fe/fe-core/src/main/java/org/apache/doris/common/Config.java @@ -1561,4 +1561,25 @@ public class Config extends ConfigBase { */ @ConfField(mutable = true, masterOnly = true) public static boolean repair_slow_replica = true; + + /* + * The relocation of a colocation group may involve a large number of tablets moving within the cluster. + * Therefore, we should use a more conservative strategy to avoid relocation of colocation groups as much as possible. + * Reloaction usually occurs after a BE node goes offline or goes down. + * This parameter is used to delay the determination of BE node unavailability. + * The default is 30 minutes, i.e., if a BE node recovers within 30 minutes, relocation of the colocation group + * will not be triggered. + */ + @ConfField(mutable = true, masterOnly = true) + public static long colocate_group_relocate_delay_second = 1800; // 30 min + + /* + * If set to true, when creating table, Doris will allow to locate replicas of a tablet + * on same host. And also the tablet repair and balance will be disabled. + * This is only for local test, so that we can deploy multi BE on same host and create table + * with multi replicas. + * DO NOT use it for production env. 
+ */ + @ConfField + public static boolean allow_replica_on_same_host = false; } diff --git a/fe/fe-core/src/main/java/org/apache/doris/system/Backend.java b/fe/fe-core/src/main/java/org/apache/doris/system/Backend.java index 4ae145f..8fa2401 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/system/Backend.java +++ b/fe/fe-core/src/main/java/org/apache/doris/system/Backend.java @@ -104,7 +104,7 @@ public class Backend implements Writable { private volatile ImmutableMap<String, DiskInfo> disksRef; private String heartbeatErrMsg = ""; - + // This is used for the first time we init pathHashToDishInfo in SystemInfoService. // after init it, this variable is set to true. private boolean initPathInfo = false; @@ -194,19 +194,25 @@ public class Backend implements Writable { return heartbeatErrMsg; } - public long getLastStreamLoadTime() { return this.backendStatus.lastStreamLoadTime; } + public long getLastStreamLoadTime() { + return this.backendStatus.lastStreamLoadTime; + } public void setLastStreamLoadTime(long lastStreamLoadTime) { this.backendStatus.lastStreamLoadTime = lastStreamLoadTime; } - public boolean isQueryDisabled() { return backendStatus.isQueryDisabled; } + public boolean isQueryDisabled() { + return backendStatus.isQueryDisabled; + } public void setQueryDisabled(boolean isQueryDisabled) { this.backendStatus.isQueryDisabled = isQueryDisabled; } - public boolean isLoadDisabled() {return backendStatus.isLoadDisabled; } + public boolean isLoadDisabled() { + return backendStatus.isLoadDisabled; + } public void setLoadDisabled(boolean isLoadDisabled) { this.backendStatus.isLoadDisabled = isLoadDisabled; @@ -319,7 +325,7 @@ public class Backend implements Writable { /** * backend belong to some cluster - * + * * @return */ public boolean isUsedByCluster() { @@ -328,7 +334,7 @@ public class Backend implements Writable { /** * backend is free, and it isn't belong to any cluster - * + * * @return */ public boolean isFreeFromCluster() { @@ -338,7 +344,7 @@ 
public class Backend implements Writable { /** * backend execute discommission in cluster , and backendState will be free * finally - * + * * @return */ public boolean isOffLineFromCluster() { @@ -609,7 +615,7 @@ public class Backend implements Writable { @Override public String toString() { return "Backend [id=" + id + ", host=" + host + ", heartbeatPort=" + heartbeatPort + ", alive=" + isAlive.get() - + "]"; + + ", tag: " + tag + "]"; } public String getOwnerClusterName() { @@ -619,7 +625,7 @@ public class Backend implements Writable { public void setOwnerClusterName(String name) { ownerClusterName = name; } - + public void clearClusterName() { ownerClusterName = ""; } @@ -638,7 +644,7 @@ public class Backend implements Writable { public void setDecommissionType(DecommissionType type) { decommissionType = type.ordinal(); } - + public DecommissionType getDecommissionType() { if (decommissionType == DecommissionType.ClusterDecommission.ordinal()) { return DecommissionType.ClusterDecommission; diff --git a/fe/fe-core/src/main/java/org/apache/doris/system/SystemInfoService.java b/fe/fe-core/src/main/java/org/apache/doris/system/SystemInfoService.java index 726cb18..3da10a4 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/system/SystemInfoService.java +++ b/fe/fe-core/src/main/java/org/apache/doris/system/SystemInfoService.java @@ -24,6 +24,7 @@ import org.apache.doris.catalog.DiskInfo; import org.apache.doris.catalog.ReplicaAllocation; import org.apache.doris.cluster.Cluster; import org.apache.doris.common.AnalysisException; +import org.apache.doris.common.Config; import org.apache.doris.common.DdlException; import org.apache.doris.common.FeConstants; import org.apache.doris.common.FeMetaVersion; @@ -113,7 +114,7 @@ public class SystemInfoService { private volatile ImmutableMap<Long, DiskInfo> pathHashToDishInfoRef; // sort host backends list by num of backends, descending - private static final Comparator<List<Backend>> hostBackendsListComparator = new 
Comparator<List<Backend>> (){ + private static final Comparator<List<Backend>> hostBackendsListComparator = new Comparator<List<Backend>>() { @Override public int compare(List<Backend> list1, List<Backend> list2) { if (list1.size() > list2.size()) { @@ -125,8 +126,8 @@ public class SystemInfoService { }; public SystemInfoService() { - idToBackendRef = ImmutableMap.<Long, Backend> of(); - idToReportVersionRef = ImmutableMap.<Long, AtomicLong> of(); + idToBackendRef = ImmutableMap.<Long, Backend>of(); + idToReportVersionRef = ImmutableMap.<Long, AtomicLong>of(); lastBackendIdForCreationMap = new ConcurrentHashMap<String, Long>(); lastBackendIdForOtherMap = new ConcurrentHashMap<String, Long>(); @@ -272,9 +273,9 @@ public class SystemInfoService { // only for test public void dropAllBackend() { // update idToBackend - idToBackendRef = ImmutableMap.<Long, Backend> of(); + idToBackendRef = ImmutableMap.<Long, Backend>of(); // update idToReportVersion - idToReportVersionRef = ImmutableMap.<Long, AtomicLong> of(); + idToReportVersionRef = ImmutableMap.<Long, AtomicLong>of(); } public Backend getBackend(long backendId) { @@ -384,8 +385,8 @@ public class SystemInfoService { public List<Long> createCluster(String clusterName, int instanceNum) { final List<Long> chosenBackendIds = Lists.newArrayList(); final Map<String, List<Backend>> hostBackendsMap = getHostBackendsMap(true /* need alive*/, - true /* need free */, - false /* can not be in decommission*/); + true /* need free */, + false /* can not be in decommission*/); LOG.info("begin to create cluster {} with instance num: {}", clusterName, instanceNum); int availableBackendsCount = 0; @@ -541,8 +542,8 @@ public class SystemInfoService { ImmutableMap<Long, Backend> idToBackends = idToBackendRef; // host -> backends final Map<String, List<Backend>> hostBackendsMap = getHostBackendsMap(true /* need alive*/, - true /* need free */, - false /* can not be in decommission */); + true /* need free */, + false /* can not be in 
decommission */); final List<Long> clusterBackends = getClusterBackendIds(clusterName); // hosts not in cluster @@ -585,7 +586,7 @@ public class SystemInfoService { hostIsEmpty[i] = false; } int numOfHost = hostsNotInCluster.size(); - for (int i = 0;; i = ++i % hostsNotInCluster.size()) { + for (int i = 0; ; i = ++i % hostsNotInCluster.size()) { if (hostsNotInCluster.get(i).size() > 0) { chosenBackendIds.add(hostsNotInCluster.get(i).remove(0).getId()); } else { @@ -608,7 +609,7 @@ public class SystemInfoService { hostIsEmpty[i] = false; } int numOfHost = hostsInCluster.size(); - for (int i = 0;; i = ++i % hostsInCluster.size()) { + for (int i = 0; ; i = ++i % hostsInCluster.size()) { if (hostsInCluster.get(i).size() > 0) { chosenBackendIds.add(hostsInCluster.get(i).remove(0).getId()); } else { @@ -680,7 +681,7 @@ public class SystemInfoService { if (needAlive) { for (Backend backend : copiedBackends.values()) { if (backend != null && name.equals(backend.getOwnerClusterName()) - && backend.isAlive()) { + && backend.isAlive()) { ret.add(backend); } } @@ -734,7 +735,7 @@ public class SystemInfoService { if (needAlive) { for (Backend backend : copiedBackends.values()) { if (backend != null && clusterName.equals(backend.getOwnerClusterName()) - && backend.isAlive()) { + && backend.isAlive()) { ret.add(backend.getId()); } } @@ -869,7 +870,7 @@ public class SystemInfoService { // if more than one backend exists in same host, select a backend at random List<Backend> backends = Lists.newArrayList(); for (List<Backend> list : backendMaps.values()) { - if (FeConstants.runningUnitTest) { + if (FeConstants.runningUnitTest || Config.allow_replica_on_same_host) { backends.addAll(list); } else { Collections.shuffle(list); @@ -1201,7 +1202,7 @@ public class SystemInfoService { * Check if the specified disks' capacity has reached the limit. * bePathsMap is (BE id -> list of path hash) * If floodStage is true, it will check with the floodStage threshold. 
- * + * * return Status.OK if not reach the limit */ public Status checkExceedDiskCapacityLimit(Multimap<Long, Long> bePathsMap, boolean floodStage) { @@ -1311,3 +1312,4 @@ public class SystemInfoService { } } + diff --git a/fe/fe-core/src/test/java/org/apache/doris/catalog/BackendTest.java b/fe/fe-core/src/test/java/org/apache/doris/catalog/BackendTest.java index c4f6913..460d8d6 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/catalog/BackendTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/catalog/BackendTest.java @@ -100,7 +100,7 @@ public class BackendTest { // first update backend.updateDisks(diskInfos); Assert.assertEquals(disk1.getDiskTotalCapacity() + disk2.getDiskTotalCapacity(), - backend.getTotalCapacityB()); + backend.getTotalCapacityB()); Assert.assertEquals(1, backend.getAvailableCapacityB()); // second update @@ -118,7 +118,7 @@ public class BackendTest { File file = new File("./backendTest"); file.createNewFile(); DataOutputStream dos = new DataOutputStream(new FileOutputStream(file)); - + List<Backend> list1 = new LinkedList<Backend>(); List<Backend> list2 = new LinkedList<Backend>(); @@ -144,7 +144,7 @@ public class BackendTest { } dos.flush(); dos.close(); - + // 2. 
Read objects from file DataInputStream dis = new DataInputStream(new FileInputStream(file)); for (int count = 0; count < 200; ++count) { @@ -176,26 +176,26 @@ public class BackendTest { Assert.assertFalse(list1.get(1).equals(list1.get(2))); Assert.assertFalse(list1.get(1).equals(this)); Assert.assertTrue(list1.get(1).equals(list1.get(1))); - + Backend back1 = new Backend(1, "a", 1); back1.updateOnce(1, 1, 1); Backend back2 = new Backend(2, "a", 1); back2.updateOnce(1, 1, 1); Assert.assertFalse(back1.equals(back2)); - + back1 = new Backend(1, "a", 1); back1.updateOnce(1, 1, 1); back2 = new Backend(1, "b", 1); back2.updateOnce(1, 1, 1); Assert.assertFalse(back1.equals(back2)); - + back1 = new Backend(1, "a", 1); back1.updateOnce(1, 1, 1); back2 = new Backend(1, "a", 2); back2.updateOnce(1, 1, 1); Assert.assertFalse(back1.equals(back2)); - Assert.assertEquals("Backend [id=1, host=a, heartbeatPort=1, alive=true]", back1.toString()); + Assert.assertEquals("Backend [id=1, host=a, heartbeatPort=1, alive=true, tag: {\"location\" : \"default\"}]", back1.toString()); // 3. 
delete files dis.close(); diff --git a/fe/fe-core/src/test/java/org/apache/doris/clone/ColocateTableCheckerAndBalancerTest.java b/fe/fe-core/src/test/java/org/apache/doris/clone/ColocateTableCheckerAndBalancerTest.java index f1a17cc..b5f00c8 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/clone/ColocateTableCheckerAndBalancerTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/clone/ColocateTableCheckerAndBalancerTest.java @@ -31,14 +31,14 @@ import org.apache.doris.resource.Tag; import org.apache.doris.system.Backend; import org.apache.doris.system.SystemInfoService; -import org.junit.Assert; -import org.junit.Before; -import org.junit.Test; - import com.google.common.collect.Lists; import com.google.common.collect.Maps; import com.google.common.collect.Sets; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + import java.util.HashSet; import java.util.List; import java.util.Map; @@ -306,7 +306,7 @@ public class ColocateTableCheckerAndBalancerTest { public final class FakeBackendLoadStatistic extends BackendLoadStatistic { public FakeBackendLoadStatistic(long beId, String clusterName, SystemInfoService infoService, - TabletInvertedIndex invertedIndex) { + TabletInvertedIndex invertedIndex) { super(beId, clusterName, Tag.DEFAULT_BACKEND_TAG, infoService, invertedIndex); } @@ -320,7 +320,7 @@ public class ColocateTableCheckerAndBalancerTest { public void testGetBeSeqIndexes() { List<Long> flatBackendsPerBucketSeq = Lists.newArrayList(1L, 2L, 2L, 3L, 4L, 2L); List<Integer> indexes = Deencapsulation.invoke(balancer, "getBeSeqIndexes", flatBackendsPerBucketSeq, 2L); - Assert.assertArrayEquals(new int[]{1, 2, 5}, indexes.stream().mapToInt(i->i).toArray()); + Assert.assertArrayEquals(new int[]{1, 2, 5}, indexes.stream().mapToInt(i -> i).toArray()); System.out.println("backend1 id is " + backend1.getId()); } @@ -331,7 +331,7 @@ public class ColocateTableCheckerAndBalancerTest { @Mocked Backend myBackend3, @Mocked Backend myBackend4, 
@Mocked Backend myBackend5 - ) { + ) { GroupId groupId = new GroupId(10000, 10001); Tag tag = Tag.DEFAULT_BACKEND_TAG; Set<Long> allBackendsInGroup = Sets.newHashSet(1L, 2L, 3L, 4L, 5L); @@ -363,7 +363,7 @@ public class ColocateTableCheckerAndBalancerTest { result = false; minTimes = 0; myBackend3.getLastUpdateMs(); - result = System.currentTimeMillis() - Config.tablet_repair_delay_factor_second * 1000 * 20; + result = System.currentTimeMillis() - (Config.colocate_group_relocate_delay_second + 20) * 1000; minTimes = 0; myBackend3.getTag(); result = Tag.DEFAULT_BACKEND_TAG; @@ -456,7 +456,7 @@ public class ColocateTableCheckerAndBalancerTest { result = false; minTimes = 0; myBackend3.getLastUpdateMs(); - result = System.currentTimeMillis() - Config.tablet_repair_delay_factor_second * 1000 * 20; + result = System.currentTimeMillis() - (Config.colocate_group_relocate_delay_second + 20) * 1000; minTimes = 0; myBackend3.getTag(); result = Tag.DEFAULT_BACKEND_TAG; diff --git a/fe/fe-core/src/test/java/org/apache/doris/clone/TabletRepairAndBalanceTest.java b/fe/fe-core/src/test/java/org/apache/doris/clone/TabletRepairAndBalanceTest.java index 65a9a24..71f6e0f 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/clone/TabletRepairAndBalanceTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/clone/TabletRepairAndBalanceTest.java @@ -108,6 +108,7 @@ public class TabletRepairAndBalanceTest { FeConstants.runningUnitTest = true; FeConstants.tablet_checker_interval_ms = 1000; Config.tablet_repair_delay_factor_second = 1; + Config.colocate_group_relocate_delay_second = 1; // 5 backends: // 127.0.0.1 // 127.0.0.2
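The effect of the new `colocate_group_relocate_delay_second` config added in this patch can be sketched as follows. This is a minimal standalone illustration, not the actual Doris FE code: the class `RelocationDelay` and method `shouldRelocate` are hypothetical names, and the real decision in `ColocateTableCheckerAndBalancer` involves more state (tags, decommission status, replica allocation). The sketch only shows the core idea: a dead BE does not trigger relocation until it has been unreachable longer than the configured delay.

```java
// Hypothetical sketch of how a relocation-delay config gates the
// "is this backend unavailable?" decision (NOT the real Doris code).
public class RelocationDelay {

    // Mirrors the FE config default of 1800 seconds (30 minutes).
    static long colocateGroupRelocateDelaySecond = 1800;

    // A BE only counts as unavailable for colocate relocation once it has
    // been down longer than the configured delay; a BE that recovers
    // within the window never triggers tablet movement.
    static boolean shouldRelocate(boolean alive, long lastUpdateMs, long nowMs) {
        if (alive) {
            return false; // live backends never trigger relocation
        }
        return nowMs - lastUpdateMs > colocateGroupRelocateDelaySecond * 1000L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // BE down for 10 minutes: still inside the 30-minute window.
        System.out.println(shouldRelocate(false, now - 10 * 60 * 1000L, now));
        // BE down for 40 minutes: window exceeded, relocation may start.
        System.out.println(shouldRelocate(false, now - 40 * 60 * 1000L, now));
    }
}
```

This is also why the updated unit tests mock `getLastUpdateMs()` to return `System.currentTimeMillis() - (Config.colocate_group_relocate_delay_second + 20) * 1000`: the backend must appear down for longer than the delay before the balancer reacts.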
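The second config, `allow_replica_on_same_host`, changes how candidate backends are chosen per host in `SystemInfoService`. Below is a simplified, hypothetical sketch of that selection rule (the class `ReplicaHostPolicy` and its method are invented for illustration; the real code works on `Backend` objects and shuffles the per-host list). Normally at most one BE per host is a candidate, so replicas of a tablet land on distinct hosts; with the config enabled (local testing only), every BE is a candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of per-host backend selection (NOT the real
// SystemInfoService code): one BE per host by default, all BEs when
// replicas are allowed to share a host.
public class ReplicaHostPolicy {

    static List<Long> candidateBackends(Map<String, List<Long>> backendsByHost,
                                        boolean allowReplicaOnSameHost,
                                        Random rand) {
        List<Long> candidates = new ArrayList<>();
        for (List<Long> beIdsOnHost : backendsByHost.values()) {
            if (allowReplicaOnSameHost) {
                // local-test mode: every BE on the host is usable
                candidates.addAll(beIdsOnHost);
            } else {
                // production rule: pick one BE per host at random
                candidates.add(beIdsOnHost.get(rand.nextInt(beIdsOnHost.size())));
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        Map<String, List<Long>> hosts = new LinkedHashMap<>();
        hosts.put("127.0.0.1", Arrays.asList(1L, 2L, 3L)); // 3 BEs, one host
        hosts.put("127.0.0.2", Arrays.asList(4L));
        // default: 2 hosts -> 2 candidates
        System.out.println(candidateBackends(hosts, false, new Random()).size());
        // allow_replica_on_same_host: all 4 BEs are candidates
        System.out.println(candidateBackends(hosts, true, new Random()).size());
    }
}
```

Because enabling this flag also disables tablet repair and balance, it is useful only for deploying several BEs on one machine in a test environment, as the comment in `Config.java` warns.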