Are those settings fine to set globally even if not all OSDs on a node have rocksdb as the backend? Or will I need to convert all OSDs on a node at the same time?
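
To make the question concrete, this is the kind of scoping I have in mind (just a sketch -- the [osd.12] section is a made-up example, and I haven't confirmed whether leaving these set globally is harmless for the OSDs still on leveldb):

    # Option A: set globally for every filestore OSD on the node
    [osd]
        filestore_rocksdb_options = "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
        filestore_omap_backend = rocksdb

    # Option B: scoped to a single converted OSD (osd.12 is a hypothetical id)
    [osd.12]
        filestore_rocksdb_options = "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
        filestore_omap_backend = rocksdb
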
On Tue, Sep 18, 2018 at 5:02 PM Pavan Rallabhandi <prallabha...@walmartlabs.com> wrote:

> The steps that were outlined for conversion are correct. Have you tried
> setting some of the relevant ceph conf values too:
>
> filestore_rocksdb_options =
> "max_background_compactions=8;compaction_readahead_size=2097152;compression=kNoCompression"
>
> filestore_omap_backend = rocksdb
>
> Thanks,
> -Pavan.
>
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of David Turner <drakonst...@gmail.com>
> Date: Tuesday, September 18, 2018 at 4:09 PM
> To: ceph-users <ceph-users@lists.ceph.com>
> Subject: EXT: [ceph-users] Any backfill in our cluster makes the cluster unusable and takes forever
>
> I've finally learned enough about the OSD backend to track this issue down
> to what I believe is the root cause. LevelDB compaction is the common thread
> every time we move data around our cluster. I've ruled out PG subfolder
> splitting, EC doesn't seem to be the root cause, and it is cluster-wide
> as opposed to specific hardware.
>
> One of the first things I found after digging into leveldb omap compaction
> was [1] this article with a heading "RocksDB instead of LevelDB", which
> mentions that leveldb was replaced with rocksdb as the default db backend
> for filestore OSDs and was even backported to Jewel because of the
> performance improvements.
>
> I figured there must be a way to upgrade an OSD from leveldb to rocksdb
> without needing to fully backfill the entire OSD. There is [2] this article,
> but you need an active service account with RedHat to access it. I
> eventually came across [3] this article about optimizing Ceph object
> storage, which mentions migrating to rocksdb as the resolution for OSDs
> flapping due to omap compaction. It links to the RedHat article, but also
> has [4] these steps outlined in it. I tried to follow the steps, but the
> OSD I tested this on was unable to start with [5] this segfault. And then
> trying to move the OSD back to the original LevelDB omap folder resulted
> in [6] this in the log. I apologize that all of my logging is at log
> level 1. If needed I can get some higher log levels.
>
> My Ceph version is 12.2.4. Does anyone have any suggestions for how I can
> update my filestore backend from leveldb to rocksdb? Or if that's the
> wrong direction and I should be looking elsewhere? Thank you.
>
> [1] https://ceph.com/community/new-luminous-rados-improvements/
> [2] https://access.redhat.com/solutions/3210951
> [3] https://hubb.blob.core.windows.net/c2511cea-81c5-4386-8731-cc444ff806df-public/resources/Optimize Ceph object storage for production in multisite clouds.pdf
>
> [4]
> ■ Stop the OSD
> ■ mv /var/lib/ceph/osd/ceph-/current/omap /var/lib/ceph/osd/ceph-/omap.orig
> ■ ulimit -n 65535
> ■ ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-/omap.orig store-copy /var/lib/ceph/osd/ceph-/current/omap 10000 rocksdb
> ■ ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-/current/omap --command check
> ■ sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-/superblock
> ■ chown ceph.ceph /var/lib/ceph/osd/ceph-/current/omap -R
> ■ cd /var/lib/ceph/osd/ceph-; rm -rf omap.orig
> ■ Start the OSD
>
> [5]
> 2018-09-17 19:23:10.826227 7f1f3f2ab700 -1 abort: Corruption: Snappy not supported or corrupted Snappy compressed block contents
> 2018-09-17 19:23:10.830525 7f1f3f2ab700 -1 *** Caught signal (Aborted) **
>
> [6]
> 2018-09-17 19:27:34.010125 7fcdee97cd80 -1 osd.0 0 OSD:init: unable to mount object store
> 2018-09-17 19:27:34.010131 7fcdee97cd80 -1 ** ERROR: osd init failed: (1) Operation not permitted
> 2018-09-17 19:27:54.225941 7f7f03308d80  0 set uid:gid to 167:167 (ceph:ceph)
> 2018-09-17 19:27:54.225975 7f7f03308d80  0 ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable), process (unknown), pid 361535
> 2018-09-17 19:27:54.231275 7f7f03308d80  0 pidfile_write: ignore empty --pid-file
> 2018-09-17 19:27:54.260207 7f7f03308d80  0 load: jerasure load: lrc load: isa
> 2018-09-17 19:27:54.260520 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261135 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) backend xfs (magic 0x58465342)
> 2018-09-17 19:27:54.261750 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2018-09-17 19:27:54.261757 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
> 2018-09-17 19:27:54.261758 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: splice() is disabled via 'filestore splice' config option
> 2018-09-17 19:27:54.286454 7f7f03308d80  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
> 2018-09-17 19:27:54.286572 7f7f03308d80  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-0) detect_feature: extsize is disabled by conf
> 2018-09-17 19:27:54.287119 7f7f03308d80  0 filestore(/var/lib/ceph/osd/ceph-0) start omap initiation
> 2018-09-17 19:27:54.287527 7f7f03308d80 -1 filestore(/var/lib/ceph/osd/ceph-0) mount(1723): Error initializing leveldb : Corruption: VersionEdit: unknown tag
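
For reference, here is how the steps in [4] would look written out end to end (a sketch only -- ${OSD} is a placeholder for the OSD id, and it assumes the default /var/lib/ceph/osd layout with systemd-managed OSDs):

    OSD=0   # hypothetical OSD id, substitute the OSD being converted
    systemctl stop ceph-osd@${OSD}
    mv /var/lib/ceph/osd/ceph-${OSD}/current/omap /var/lib/ceph/osd/ceph-${OSD}/omap.orig
    ulimit -n 65535
    ceph-kvstore-tool leveldb /var/lib/ceph/osd/ceph-${OSD}/omap.orig store-copy \
        /var/lib/ceph/osd/ceph-${OSD}/current/omap 10000 rocksdb
    ceph-osdomap-tool --omap-path /var/lib/ceph/osd/ceph-${OSD}/current/omap --command check
    sed -i s/leveldb/rocksdb/g /var/lib/ceph/osd/ceph-${OSD}/superblock
    chown -R ceph.ceph /var/lib/ceph/osd/ceph-${OSD}/current/omap
    cd /var/lib/ceph/osd/ceph-${OSD} && rm -rf omap.orig
    systemctl start ceph-osd@${OSD}

If the compression=kNoCompression part of filestore_rocksdb_options needs to be in ceph.conf before the OSD first opens the converted omap, that might explain the Snappy corruption abort in [5], but that's a guess on my part.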