On Sun, 2016-01-24 at 13:44 +0100, Stefan Rogge wrote: 

> Hi,
> we are using Ceph with RadosGW and S3.
> As the number of objects in the storage grows, the write speed slows
> down significantly. With 5 million objects in the storage we had a
> write speed of 10 MB/s. With 10 million objects it's only 5 MB/s.
> Is this a common issue?
> Is RadosGW suitable for a large number of objects, or would you
> recommend not using it at this scale?
> 
> 
> Thank you.
> 
> 
> Stefan
> 
> 
> I also found a ticket on the Ceph tracker about the same issue:
> 
> 
> http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
> 

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I
wasn't aware of it, and now the only way to improve things is to
create a new bucket with bucket index sharding (see the config sketch
below) or to change the way our apps store data into buckets. And of
course copy tons of data :(
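
For anyone hitting this later: on Hammer and newer, the shard count
for newly created buckets can be raised in ceph.conf. A minimal
sketch, assuming a single-zone setup; the value 16 is just an example,
and existing buckets are not affected:

[global]
rgw override bucket index max shards = 16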

In my case something also happened to the leveldb files, and now I
cannot even run some radosgw-admin commands like:

radosgw-admin bucket check -b ....

Running them causes OSD daemon flapping and process timeout messages
in the logs. PGs containing .rgw.bucket.index can't even be
backfilled to other OSDs, as the OSD process dies with messages like:


> [...]
> 2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy 
> 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
> 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In 
> function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, 
> const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
> common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
> 
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x85) [0x7f7a019f4be5]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, 
> long)+0x2d9) [0x7f7a019343b9]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>  5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>  6: (()+0x7df5) [0x7f79ffa8fdf5]
>  7: (clone()+0x6d) [0x7f79fe3381ad]
> 
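
The affected PGs can be listed with e.g.:

ceph pg ls-by-pool .rgw.bucket.index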


I don't know - maybe it's because of the number of leveldb files in
the omap folder (5.1 GB in total). I read somewhere that things can
be improved by setting 'leveldb_compression' to false and
'leveldb_compact_on_mount' to true, but I don't know whether these
options have any effect in 9.2.0, as they are not documented for this
release. I tried 'leveldb_compression' but saw no visible effect, and
I wasn't brave enough to try 'leveldb_compact_on_mount' on the live
cluster. But setting it to true on my 0.94.5 test cluster makes the
OSD fail on restart.
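
For reference, the omap directory size can be checked like this
(FileStore layout; the path may differ per deployment):

du -sh /var/lib/ceph/osd/ceph-*/current/omap

and the two options above would go into the [osd] section of
ceph.conf, as a sketch (with an OSD restart afterwards):

[osd]
leveldb compression = false
leveldb compact on mount = true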

Kind regards -
Krzysztof Księżyk 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
