huyangg opened a new issue #8446:
URL: https://github.com/apache/incubator-doris/issues/8446
### Problem description:
```
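Copying the entire directory of an existing Doris environment and starting it in a new environment causes the BE service of the original environment to go down.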
```
### Steps to reproduce:
```
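Copy the entire directory of an existing Doris environment, start it in a new environment, and observe the BE status in the original environment. Precondition: the new and original environments can reach each other over the network.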
```
### Doris version:
```
Palo version 0.14.13.1-Unknown
```
### Doris cluster information:
```
Single-node and multi-node environments. In the new environment, the following setting was added to fe.conf: metadata_failure_recovery=true.
```
### Error information:
```
dmesg -T shows no OOM messages.
The BE in the original environment reports Alive = false, with ErrMsg "epoch is not greater than local. ignore heartbeat."
MySQL [(none)]> SHOW PROC '/backends';
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
| BackendId | Cluster         | IP            | HostName      | HeartbeatPort | BePort | HttpPort | BrpcPort | LastStartTime       | LastHeartbeat       | Alive | SystemDecommissioned | ClusterDecommissioned | TabletNum | DataUsedCapacity | AvailCapacity | TotalCapacity | UsedPct | MaxDiskUsedPct | ErrMsg                                             | Version           | Status                                                                                    |
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
| 10003     | default_cluster | 172.16.1.201  | 172.16.1.201  | 9050          | 9060   | 8040     | 8060     | 2021-10-20 21:34:31 | 2022-03-10 15:27:53 | false | false                | false                 | 837       | 1.442 GB         | 138.925 GB    | 191.024 GB    | 27.27 % | 27.27 %        | epoch is not greater than local. ignore heartbeat. | 0.14.13.1-Unknown | {"lastSuccessReportTabletsTime":"2022-03-10 15:27:22","lastStreamLoadTime":1645584570768} |
+-----------+-----------------+---------------+---------------+---------------+--------+----------+----------+---------------------+---------------------+-------+----------------------+-----------------------+-----------+------------------+---------------+---------------+---------+----------------+----------------------------------------------------+-------------------+-------------------------------------------------------------------------------------------+
be.info.log output:
I0310 15:27:53.379778 3011 plan_fragment_executor.cpp:583] Close() fragment_instance_id=65ac48000aed4ecc-9b947eb57686de25
I0310 15:27:54.298195 10114 heartbeat_server.cpp:58] get heartbeat from FE.host:172.16.2.113, port:9020, cluster id:138756675, counter:2424613
I0310 15:27:54.298223 10114 heartbeat_server.cpp:120] master change. new master host: 172.16.2.113. port: 9020. epoch: 8
I0310 15:27:54.298228 10114 heartbeat_server.cpp:166] Master FE is changed or restarted. report tablet and disk info immediately
I0310 15:27:54.298241 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_DISK_STATE
I0310 15:27:54.298250 10114 task_worker_pool.cpp:258] notify task worker pool: TaskWorkerPool.REPORT_OLAP_TABLE
I0310 15:27:54.298363 3128 data_dir.cpp:837] path: /root/DORIS-0.14.7-release/be/storage total capacity: 1064086802432, available capacity: 953151504384
I0310 15:27:54.299175 3129 tablet_manager.cpp:880] begin to build all report tablets info
I0310 15:27:54.299291 3129 tablet_manager.cpp:885] find expired transactions for 0 tablets
I0310 15:27:54.299764 3128 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
I0310 15:27:54.300318 10115 backend_service.cpp:325] get_batch stream_load_record rocksdb successfully. records size: 0, last_stream_load_timestamp: 1645584507086
I0310 15:27:54.305917 3129 tablet_manager.cpp:922] success to build all report tablets info. tablet_count=2087
I0310 15:27:54.353857 3128 task_worker_pool.cpp:1587] finish report DISK. master host: 172.16.2.113, port: 9020
I0310 15:27:54.361510 3129 task_worker_pool.cpp:1587] finish report TABLET. master host: 172.16.2.113, port: 9020
I0310 15:27:57.650785 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
I0310 15:27:57.753644 3063 storage_engine.cpp:625] start trash and snapshot sweep.
I0310 15:27:57.755581 3063 storage_engine.cpp:373] get root path info cost: 1 ms. tablet counter: 2087
I0310 15:27:57.755627 3063 storage_engine.cpp:647] Start to sweep path /root/DORIS-0.14.7-release/be/storage
W0310 15:27:58.920964 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:03.929098 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
I0310 15:28:07.652289 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
W0310 15:28:08.936048 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:13.945868 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
I0310 15:28:17.653152 3127 task_worker_pool.cpp:1587] finish report TASK. master host: 172.16.2.113, port: 9020
I0310 15:28:18.727761 3061 load_channel_mgr.cpp:241] cleaning timed out load channels
I0310 15:28:18.727794 3061 load_channel_mgr.cpp:274] load mem consumption(bytes). limit: 86418309775, current: 0, peak: 1241120388
W0310 15:28:18.952822 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
W0310 15:28:23.958277 3247 heartbeat_server.cpp:125] epoch is not greater than local. ignore heartbeat. host: 172.16.2.113 port: 9020 local epoch: 8 received epoch: 7
```
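The log sequence above (a "master change" to epoch 8, followed by repeated "epoch is not greater than local" warnings for epoch 7) is consistent with an epoch comparison along the following lines. This is only a minimal Python sketch inferred from the log messages, not the actual logic in heartbeat_server.cpp. The names `MasterInfo` and `HeartbeatState` and the placeholder host `fe-original` are hypothetical, and the assumption that the original FE was at epoch 7 is taken from the "received epoch: 7" warnings.

```python
from dataclasses import dataclass


@dataclass
class MasterInfo:
    host: str
    port: int
    epoch: int


class HeartbeatState:
    """Hypothetical model of the master info a BE keeps between heartbeats."""

    IGNORED = "epoch is not greater than local. ignore heartbeat."

    def __init__(self, master: MasterInfo):
        self.master = master

    def on_heartbeat(self, host: str, port: int, epoch: int) -> str:
        # Heartbeat from the FE the BE already treats as master: accept it.
        if (host, port) == (self.master.host, self.master.port):
            return "ok"
        # A different FE with a strictly larger epoch takes over as master.
        if epoch > self.master.epoch:
            self.master = MasterInfo(host, port, epoch)
            return "master change"
        # Any other FE with an equal or smaller epoch is rejected.
        return self.IGNORED


# Assumed starting point: the BE follows its original FE at epoch 7.
state = HeartbeatState(MasterInfo("fe-original", 9020, 7))
# The copied FE (172.16.2.113 in the log) arrives with epoch 8 and is accepted.
first = state.on_heartbeat("172.16.2.113", 9020, 8)
# The original FE keeps sending epoch 7 and is now ignored.
second = state.on_heartbeat("fe-original", 9020, 7)
print(first)
print(second)
```

Under these assumptions, once the copied FE has been accepted with the higher epoch, every later heartbeat from the original FE is rejected, which would explain why the original environment reports the BE as not alive even though the BE process itself keeps running.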
### Solution (replies from community developers or other users)
```
```