
Gargi Jaiswal updated HDDS-13023:
---------------------------------
    Description: 
After the changes introduced in 
[HDDS-12233|https://github.com/apache/ozone/pull/7934], DiskBalancer is failing 
to move containers due to a checksum mismatch error during container import.

The root cause is a regression introduced in the 
*{{KeyValueContainer#importContainerData}}* method. The method was modified to 
call *{{KeyValueContainerUtil.parseKVContainerData(...)}} before* setting the 
container state and metadata (which includes the expected checksum).

Because the checksum is not yet populated in {{containerData}} when parsing 
runs, the call to {{ContainerUtils.verifyChecksum(...)}} inside 
{{parseKVContainerData}} compares an empty stored checksum against the 
expected one and fails with the following error:


{code:java}
2025-04-10 15:58:41 2025-04-10 10:28:41,115 [DiskBalancerService#0] WARN diskbalancer.DiskBalancerService: Failed to move container KeyValueContainerData #8 (CLOSED, non-empty, ri=0, origin=[dn_b916c228-6ce7-4f89-8de3-b29b1d967af1, pipeline_efeea169-e8f7-43fb-9003-270fa058adb2]): {}
2025-04-10 15:58:41 org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Container checksum error for ContainerID: 8.
2025-04-10 15:58:41 Stored Checksum:
2025-04-10 15:58:41 Expected Checksum: 7a0ec671d9f43c8d6dd303961776f383ce767bfa2e365de7119d1c6c7d2c1359
2025-04-10 15:58:41     at org.apache.hadoop.ozone.container.common.helpers.ContainerUtils.verifyChecksum(ContainerUtils.java:214)
2025-04-10 15:58:41     at org.apache.hadoop.ozone.container.keyvalue.helpers.KeyValueContainerUtil.parseKVContainerData(KeyValueContainerUtil.java:210)
2025-04-10 15:58:41     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.importContainerData(KeyValueContainer.java:683)
2025-04-10 15:58:41     at org.apache.hadoop.ozone.container.keyvalue.KeyValueContainer.importContainerData(KeyValueContainer.java:708)
{code}
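
The ordering problem described above can be illustrated with a small, self-contained sketch. It does not use the Ozone classes: {{ToyContainerData}}, {{parse}}, {{applyMetadata}} and both {{import...}} methods are hypothetical stand-ins, and the SHA-256 digest over a string payload is an assumption made only to keep the example runnable. It shows why a parse step that verifies the checksum fails when it runs before the checksum is populated, and passes when the metadata is applied first. How the actual fix is implemented is not stated in this description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-ins for illustration only; these are NOT the Ozone classes.
class ImportOrderSketch {

  /** Toy container metadata: a payload plus the checksum recorded for it. */
  static final class ToyContainerData {
    final String payload;
    String storedChecksum = ""; // stays empty until applyMetadata() runs

    ToyContainerData(String payload) {
      this.payload = payload;
    }
  }

  /** Hex-encoded SHA-256 of the payload (the digest choice is an assumption). */
  static String sha256Hex(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Mimics a parse step that verifies the checksum as part of parsing. */
  static void parse(ToyContainerData data) {
    String expected = sha256Hex(data.payload);
    if (!expected.equals(data.storedChecksum)) {
      throw new IllegalStateException("Container checksum error. Stored Checksum: "
          + data.storedChecksum + " Expected Checksum: " + expected);
    }
  }

  /** Mimics copying the checksum from the original container's metadata. */
  static void applyMetadata(ToyContainerData data, String checksumFromOriginal) {
    data.storedChecksum = checksumFromOriginal;
  }

  /** Failing order: parse (and verify) before the checksum is populated. */
  static boolean importParseFirst(String payload) {
    ToyContainerData data = new ToyContainerData(payload);
    try {
      parse(data);
      applyMetadata(data, sha256Hex(payload));
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  /** Working order: populate the checksum first, then parse (and verify). */
  static boolean importMetadataFirst(String payload) {
    ToyContainerData data = new ToyContainerData(payload);
    try {
      applyMetadata(data, sha256Hex(payload));
      parse(data);
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("parse first succeeded: " + importParseFirst("block data"));
    System.out.println("metadata first succeeded: " + importMetadataFirst("block data"));
  }
}
```

In this sketch the parse-first path fails with an empty stored checksum, which matches the empty "Stored Checksum:" field in the log above.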



> [DiskBalancer] Fix DiskBalancer failure due to checksum mismatch during 
> container import
> ----------------------------------------------------------------------------------------
>
>                 Key: HDDS-13023
>                 URL: https://issues.apache.org/jira/browse/HDDS-13023
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Gargi Jaiswal
>            Assignee: Gargi Jaiswal
>            Priority: Major


