Hello,
We use python librados bindings for object operations on our cluster.
For a long time we've been using 2 ec pools with k=4 m=1 and a fixed 4MB
read/write size with the python bindings. During preparations for
migrating all of our data to a k=6 m=2 pool, we discovered that the ec
pool alignment size is dynamic and that the librados bindings for python
and go fail to write objects because they are not aware of the pool
alignment size and therefore cannot adjust the write block size to be a
multiple of it. The ec pool alignment size seems to be (k * 4K)
on new pools, but is only 4K on old pools from the hammer days. We
haven't been able to find much useful documentation for this pool
alignment setting other than the librados docs
(http://docs.ceph.com/docs/master/rados/api/librados) for
rados_ioctx_pool_requires_alignment,
rados_ioctx_pool_requires_alignment2,
rados_ioctx_pool_required_alignment, and
rados_ioctx_pool_required_alignment2. After going through the rados
binary source, we found that the binary rounds the write op size for an
ec pool up to a multiple of the pool alignment size (line ~1945,
https://github.com/ceph/ceph/blob/master/src/tools/rados/rados.cc#L1945).
The minimum write op size can be discovered by writing to an ec pool
with a deliberately small block size, e.g. `rados -b 1k -p $pool put
.....`, which makes the binary round the size up and print the value it
used. So all of the support for being alignment aware clearly exists in
librados, but it simply isn't exposed in the bindings (we've only tested
python and go).
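
For reference, the rounding rados.cc does boils down to integer
arithmetic along these lines; this is just a minimal Python sketch of
the same calculation, and the k=6 example numbers are ours rather than
anything printed by the binary:

    def align_op_size(op_size, alignment):
        """Round a write op size up to a multiple of the pool's required
        alignment, mirroring what src/tools/rados/rados.cc does."""
        if alignment == 0:
            return op_size
        return ((op_size + alignment - 1) // alignment) * alignment

    # e.g. our usual 4MB block against a k=6 m=2 pool (alignment 6 * 4K = 24576)
    print(align_op_size(4 * 1024 * 1024, 6 * 4096))  # -> 4202496
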
We've gone ahead and submitted a patch and pull request to the pycradox
project, which appears to be what was merged into the ceph project for
the python bindings: https://github.com/sileht/pycradox/pull/4. It
exposes the pool alignment size in the python bindings so that we can
then calculate the proper op sizes for writing to a pool.
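
With the alignment exposed, our write path ends up looking roughly like
the sketch below. The required_alignment() call is only a stand-in for
whatever accessor the patched bindings provide (the real method name may
differ), and the pool, object, and file names are placeholders:

    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('mypool')

    # Stand-in for the accessor added by the bindings patch; real name may differ.
    alignment = ioctx.required_alignment()

    # Round our preferred 4MB block size up to a multiple of the pool alignment,
    # then write the object in aligned chunks (only the final chunk may be short).
    op_size = ((4 * 1024 * 1024 + alignment - 1) // alignment) * alignment

    with open('/path/to/source/file', 'rb') as f:
        offset = 0
        while True:
            chunk = f.read(op_size)
            if not chunk:
                break
            ioctx.write('myobject', chunk, offset)
            offset += len(chunk)

    ioctx.close()
    cluster.shutdown()
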
We find it hard to believe that we're the only ones to have run into
this problem when using the bindings. Have we missed something obvious
for cluster configuration? Or maybe we're just doing things differently
compared to most users... Any insight would be appreciated, as we'd
prefer to use an official solution rather than our own bindings fix for
long term use.
Tested on Luminous 12.2.2 and 12.2.4.
Thanks,
Kevin
--
Kevin Hrpcek
Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com