Hi, 

I ran into a problem with an application (Seafile) that uses a Ceph backend via 
librados. 
The corresponding pools are defined with size=3, and each object replica is on a 
different host. 
The cluster health is OK: all the monitors can see all the hosts. 

Now suppose a network problem occurs between my RADOS client and a single host. 
When my application/client tries to access an object located on the 
unreachable host (the primary for the corresponding PG), 
it does not fail over to another copy/host, and my application eventually 
crashes because, after a while and many requests, too many file descriptors 
are open on Linux. 
Is this the normal behavior? My storage is resilient (great!) but its access 
is not... 
If, on that host, I stop the OSDs or set their primary affinity to zero, the 
problem goes away, so it seems librados simply checks and trusts the osdmap. 
A tcpdump also shows the client keeps retrying the same OSD without any 
timeout. 
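
As a possible workaround (a sketch based on my understanding of the client 
options, not something the docs promise solves this): librados has an 
`rados_osd_op_timeout` option which defaults to 0, i.e. wait forever, which 
would explain the endless retries I see. Setting it on the client side should 
at least make the stuck requests fail fast instead of piling up open files: 

```ini
# ceph.conf on the client side -- illustrative values, untested in production
[client]
# Fail an OSD op if no reply within 30s (default 0 = block/retry forever)
rados_osd_op_timeout = 30
# Same idea for monitor ops
rados_mon_op_timeout = 30
```

The application would then get an error (ETIMEDOUT) it can handle, rather than 
hanging, though as I understand it this does not by itself redirect the read 
to another replica. 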

It can easily be reproduced by defining a netfilter rule on a host that drops 
packets coming from the client. 
Note: I am still on Luminous (on both the client and cluster sides). 
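
For reference, a minimal rule of the kind I mean (the client address 
192.0.2.10 is a placeholder; run on the OSD host as root): 

```shell
# Silently drop all packets from the RADOS client -- simulates the network fault
iptables -I INPUT -s 192.0.2.10 -j DROP

# To undo the reproduction afterwards:
iptables -D INPUT -s 192.0.2.10 -j DROP
```
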

Thanks for reading. 

D. 



_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
