[ https://issues.apache.org/jira/browse/HDDS-11261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Juncevich updated HDDS-11261:
----------------------------------
    Description: 
How to reproduce:
 # Create a cluster with 9 datanodes
 # Create a volume (ozone sh volume create /data)
 # Create a bucket with EC replication rs-6-3-1024k (ozone sh bucket create data/test-bucket --type EC --replication rs-6-3-1024k)
 # Create a file, e.g. 10 MB (fallocate -l 10M small_file_1)
 # Put the file into the bucket (ozone sh key put data/test-bucket/small_file_1 small_file_1 --type EC --replication rs-6-3-1024k)
 # Disable 4 nodes
 # Try to get the file from the bucket (ozone sh key get /data/test-bucket/small_file_1 /tmp/sm_1_1)
 # You will get "There are insufficient datanodes to read the EC block". This is expected: at least 6 nodes are required.
 # Enable 1 node and, as quickly as possible, try to get the file again.
 # You will again get "There are insufficient datanodes to read the EC block". This is not expected: there are now 6 nodes.
 # You can retry getting the file a minute later and still get this error.

I reproduced it via docker-compose, with fixed node IP addresses (this is important, because docker-compose can change IP addresses if they are not pinned).

Why does this happen? The getKey command in Ozone Manager has a cache, and in this case the cache is stale. When we try to get the file again and again, OM keeps returning a list of 5 nodes instead of 6.

I solved it by refreshing the cache and recreating the input and output streams.
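The fix described above (refresh the stale location cache and recreate the stream before retrying) can be sketched generically. All class and method names below are hypothetical and chosen for illustration only; they are not Ozone's actual client API.

```java
import java.util.List;

// Hypothetical sketch: on an "insufficient datanodes" condition, refresh
// the cached block-location list and retry with a fresh lookup, instead of
// failing repeatedly on the stale cached entry.
public class RefreshOnReadFailure {

    // Simulates a cached block-location lookup that may be stale.
    interface LocationCache {
        List<String> getNodes();   // cached datanode list for the EC block
        void refresh();            // re-fetch locations (e.g. from OM)
    }

    // If the cached list is too short for the EC read (rs-6-3 needs 6 data
    // nodes), refresh once and retry with a freshly fetched list.
    static List<String> readWithRefresh(LocationCache cache, int required) {
        List<String> nodes = cache.getNodes();
        if (nodes.size() < required) {
            cache.refresh();              // drop the stale entry
            nodes = cache.getNodes();     // re-resolve, as if recreating the stream
            if (nodes.size() < required) {
                throw new IllegalStateException(
                    "There are insufficient datanodes to read the EC block");
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Stale cache sees only 5 nodes; after refresh, 6 are visible.
        LocationCache cache = new LocationCache() {
            boolean refreshed = false;
            public List<String> getNodes() {
                return refreshed
                    ? List.of("dn1", "dn2", "dn3", "dn4", "dn5", "dn6")
                    : List.of("dn1", "dn2", "dn3", "dn4", "dn5");
            }
            public void refresh() { refreshed = true; }
        };
        System.out.println(readWithRefresh(cache, 6).size()); // prints 6
    }
}
```

With this pattern the first retry after the re-enabled node rejoins succeeds, rather than every subsequent read being served from the 5-node cached list.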

  was:
How to reproduce:
 # Create a cluster with 9 datanodes
 # Create a volume
 # Create a bucket with EC replication rs-6-3-1024k
 # Create a file, e.g. 10 MB
 # Put the file into the bucket
 # Disable 4 nodes
 # Try to get the file from the bucket
 # You will get "There are insufficient datanodes to read the EC block". This is expected: at least 6 nodes are required.
 # Enable 1 node and, as quickly as possible, try to get the file again.
 # You will again get "There are insufficient datanodes to read the EC block". This is not expected: there are now 6 nodes.
 # You can retry getting the file a minute later and still get this error.

Why does this happen? The getKey command in Ozone Manager has a cache, and in this case the cache is stale. When we try to get the file again and again, OM keeps returning a list of 5 nodes instead of 6.

I solved it by refreshing the cache and recreating the input and output streams.


> Get key answers with "There are insufficient datanodes to read the EC block" 
> even when the number of nodes is sufficient
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-11261
>                 URL: https://issues.apache.org/jira/browse/HDDS-11261
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Alex Juncevich
>            Assignee: Alex Juncevich
>            Priority: Major
>
> How to reproduce:
>  # Create a cluster with 9 datanodes
>  # Create a volume (ozone sh volume create /data)
>  # Create a bucket with EC replication rs-6-3-1024k (ozone sh bucket create 
> data/test-bucket --type EC --replication rs-6-3-1024k)
>  # Create a file, e.g. 10 MB (fallocate -l 10M small_file_1)
>  # Put the file into the bucket (ozone sh key put data/test-bucket/small_file_1 
> small_file_1 --type EC --replication rs-6-3-1024k)
>  # Disable 4 nodes
>  # Try to get the file from the bucket (ozone sh key get 
> /data/test-bucket/small_file_1 /tmp/sm_1_1)
>  # You will get "There are insufficient datanodes to read the EC block". This 
> is expected: at least 6 nodes are required.
>  # Enable 1 node and, as quickly as possible, try to get the file again.
>  # You will again get "There are insufficient datanodes to read the EC block". 
> This is not expected: there are now 6 nodes.
>  # You can retry getting the file a minute later and still get this error.
> I reproduced it via docker-compose, with fixed node IP addresses (this is 
> important, because docker-compose can change IP addresses if they are not pinned).
> Why does this happen? The getKey command in Ozone Manager has a cache, and in 
> this case the cache is stale. When we try to get the file again and again, OM 
> keeps returning a list of 5 nodes instead of 6.
> I solved it by refreshing the cache and recreating the input and output streams.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
