[jira] [Updated] (KUDU-3325) When wal is deleted, fault recovery and load balancing are abnormal

yejiabao_h (Jira) Wed, 06 Oct 2021 00:40:09 -0700


     [ 
https://issues.apache.org/jira/browse/KUDU-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


yejiabao_h updated KUDU-3325:
-----------------------------
     Attachment: image-2021-10-06-15-36-40-996.png
                 image-2021-10-06-15-36-53-813.png
                 image-2021-10-06-15-37-09-520.png
                 image-2021-10-06-15-37-24-776.png
                 image-2021-10-06-15-37-42-533.png
                 image-2021-10-06-15-37-54-782.png
                 image-2021-10-06-15-38-06-575.png
                 image-2021-10-06-15-38-17-388.png
                 image-2021-10-06-15-38-29-176.png
                 image-2021-10-06-15-38-39-852.png
                 image-2021-10-06-15-38-53-343.png
                 image-2021-10-06-15-39-03-296.png
    Component/s: consensus
    Description: 
h3. 1. Use kudu tablet leader_step_down to generate multiple WAL messages
./kudu tablet leader_step_down  $MASTER_IP   1299f5a939d2453c83104a6db0cae3e7 
h4. wal

!image-2021-10-06-15-36-40-996.png!
h4. cmeta

!image-2021-10-06-15-36-53-813.png!
h3. 2. Stop one tserver to trigger tablet recovery, so that opid_index is 
flushed to cmeta

!image-2021-10-06-15-37-09-520.png!
h4. wal

!image-2021-10-06-15-37-24-776.png!
h4. cmeta

!image-2021-10-06-15-37-42-533.png!
h3. 3. Stop all tservers and delete the tablet WAL

!image-2021-10-06-15-37-54-782.png!
h3. 4. Start all tservers
The index in the WAL starts counting from 1 again, but the opid_index recorded 
in cmeta is still 20, the value from before the WAL was deleted.
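
A minimal sketch of the mismatch described above, assuming the WAL entry indexes and the cmeta opid_index have already been read from dumps like the ones in the screenshots. The function name and the sample WAL indexes are hypothetical; only the value 20 comes from this report, and this is not Kudu code.

```python
def cmeta_is_ahead_of_wal(wal_indexes, cmeta_opid_index):
    """Return True when cmeta records a config index the WAL has not reached."""
    # An empty WAL is treated as having a last index of 0.
    last_wal_index = max(wal_indexes, default=0)
    return cmeta_opid_index > last_wal_index


# Hypothetical values: a recreated WAL holding entries 1..3,
# while cmeta still records opid_index 20 from before the deletion.
print(cmeta_is_ahead_of_wal([1, 2, 3], 20))

# A WAL that was never deleted and has advanced past index 20.
print(cmeta_is_ahead_of_wal(range(1, 26), 20))
```

The first call reports the inconsistent state reached in this step; the second shows the state expected when the WAL is intact.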
 
h4. wal

!image-2021-10-06-15-38-06-575.png!

 
h4. cmeta

!image-2021-10-06-15-38-17-388.png!

 
h3. 5. Stop a tserver to trigger fault recovery

!image-2021-10-06-15-38-29-176.png!
When the leader recovers a replica and the master requests a Raft config change 
to add the new replica to the new Raft config, the leader replica ignores the 
change because its opid index is smaller than the opid_index recorded in cmeta.
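
The behaviour in this step can be modelled roughly as follows. This is a simplified sketch, not Kudu's implementation: the class and method names are hypothetical, and the rule "ignore a config change whose index is not greater than the opid_index in cmeta" is an assumption taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaModel:
    """Toy model of a leader replica: a committed config plus a WAL index."""
    cmeta_opid_index: int                      # opid_index of the config stored in cmeta
    peers: set = field(default_factory=set)    # current Raft config members
    last_wal_index: int = 0                    # 0 means the WAL is empty

    def change_config(self, new_peer):
        """Append a config-change op; apply it only if it is newer than cmeta."""
        self.last_wal_index += 1
        op_index = self.last_wal_index
        if op_index <= self.cmeta_opid_index:
            # Assumed rule: an op older than the stored config is ignored.
            return False
        self.peers.add(new_peer)
        self.cmeta_opid_index = op_index
        return True


# After the WAL is deleted the log restarts from 1, but cmeta still holds 20.
leader = ReplicaModel(cmeta_opid_index=20, peers={"A", "B"})
print(leader.change_config("C"), sorted(leader.peers))
```

In this model the new replica "C" is never added while the WAL index stays at or below 20, which matches the stuck recovery shown in the screenshot.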
 
h3. 6. Delete all WALs

!image-2021-10-06-15-38-39-852.png!
h3. 7. Run kudu cluster rebalance
./kudu cluster rebalance $MASTER_IP
!image-2021-10-06-15-38-53-343.png!

!image-2021-10-06-15-39-03-296.png!
The rebalance also fails at the Raft config change step.
        Summary: When wal is deleted, fault recovery and load balancing are 
abnormal  (was: when wal)

> When wal is deleted, fault recovery and load balancing are abnormal
> -------------------------------------------------------------------
>
>                 Key: KUDU-3325
>                 URL: https://issues.apache.org/jira/browse/KUDU-3325
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>            Reporter: yejiabao_h
>            Priority: Major
>         Attachments: image-2021-10-06-15-36-40-996.png, 
> image-2021-10-06-15-36-53-813.png, image-2021-10-06-15-37-09-520.png, 
> image-2021-10-06-15-37-24-776.png, image-2021-10-06-15-37-42-533.png, 
> image-2021-10-06-15-37-54-782.png, image-2021-10-06-15-38-06-575.png, 
> image-2021-10-06-15-38-17-388.png, image-2021-10-06-15-38-29-176.png, 
> image-2021-10-06-15-38-39-852.png, image-2021-10-06-15-38-53-343.png, 
> image-2021-10-06-15-39-03-296.png



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
