Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-23 Thread Anton Dmitriev
This problem occurs not only on my cluster, but also on some others. Is there any workaround for it? How can I disable the leveldb compact on every OSD start, other than "leveldb_compact_on_mount": "false"? In my opinion this option does not work correctly in 10.2.9. On 19.07.2017 09:12, Anton Dmitri
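For reference, a sketch of the setting being discussed, as it would appear in ceph.conf; whether it is actually honored at mount time in 10.2.9 is exactly what this thread questions:

```ini
# Sketch, assuming the option name as reported in this thread (jewel/filestore).
# Note: the thread reports this does NOT stop the compaction seen on startup,
# because leveldb may still compact on its own when it decides it needs to.
[osd]
leveldb_compact_on_mount = false
```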

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-18 Thread Anton Dmitriev
root@storage07:~$ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 14.04.5 LTS Release: 14.04 Codename: trusty root@storage07:~$ uname -a Linux storage07 4.4.0-83-generic #106~14.04.1-Ubuntu SMP Mon Jun 26 18:10:19 UTC 2017 x86_64 x86_64 x86

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-18 Thread Josh Durgin
On 07/17/2017 10:04 PM, Anton Dmitriev wrote: My cluster stores more than 1.5 billion objects in RGW; cephfs I don't use. The bucket index pool is stored on a separate SSD placement. But compaction occurs on all OSDs, also on those which don't contain bucket indexes. After restarting 5 times every OSD

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Anton Dmitriev
My cluster stores more than 1.5 billion objects in RGW; cephfs I don't use. The bucket index pool is stored on a separate SSD placement. But compaction occurs on all OSDs, also on those which don't contain bucket indexes. After restarting every OSD 5 times nothing changed; each of them is doing compact ag

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Josh Durgin
Both of you are seeing leveldb perform compaction when the osd starts up. This can take a while for large amounts of omap data (created by things like cephfs directory metadata or rgw bucket indexes). The 'leveldb_compact_on_mount' option wasn't changed in 10.2.9, but leveldb will compact automa
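Since compaction time scales with the amount of omap data, one way to gauge which OSDs will take longest on restart is to look at the size of each OSD's omap leveldb directory. A rough sketch, assuming the default jewel filestore layout (`current/omap` under the OSD data directory):

```shell
# Rough sketch: report the size of each OSD's omap leveldb on this host.
# Larger directories generally mean longer compaction at startup.
# The path is an assumption based on the default filestore layout.
for d in /var/lib/ceph/osd/ceph-*/current/omap; do
  if [ -d "$d" ]; then
    du -sh "$d"
  fi
done
```

OSDs with many gigabytes of omap (e.g. those holding rgw bucket index PGs) are the ones most likely to show multi-minute startup compactions.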

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-17 Thread Lincoln Bryant
Hi Anton, We observe something similar on our OSDs going from 10.2.7 to 10.2.9 (see thread "some OSDs stuck down after 10.2.7 -> 10.2.9 update"). Some of our OSDs are not working at all on 10.2.9 or die with suicide timeouts. Those that come up/in take a very long time to boot up. Seems to no

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-16 Thread Anton Dmitriev
During start it consumes ~90% CPU; strace shows that the OSD process is doing something with LevelDB. Compact is disabled: r...@storage07.main01.ceph.apps.prod.int.grcc:~$ cat /etc/ceph/ceph.conf | grep compact #leveldb_compact_on_mount = true But with debug_leveldb=20 I see that compaction is runn

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-16 Thread Anton Dmitriev
Thanks for reply. I restarted OSD with debug_ms = 1/1 and debug_osd = 20/20. Look at this: 2017-07-17 08:57:52.077481 7f4db319c840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-167) detect_feature: extsize is disabled by conf 2017-07-17 09:04:04.345065 7f4db319c840 0 filestore(/var/lib/ceph/o

Re: [ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-16 Thread Irek Fasikhov
Hi, Anton. You need to run the OSD with debug_ms = 1/1 and debug_osd = 20/20 for detailed information. 2017-07-17 8:26 GMT+03:00 Anton Dmitriev : > Hi, all! > > After upgrading from 10.2.7 to 10.2.9 I see that restarting osds by > 'restart ceph-osd id=N' or 'restart ceph-osd-all' takes about 10 m
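The suggested debug levels can be set in ceph.conf before restarting the OSD; a sketch:

```ini
# Sketch: debug levels suggested above, applied to all OSDs on the host.
# Expect very verbose logs at debug_osd 20/20 -- revert after gathering data.
[osd]
debug_ms = 1/1
debug_osd = 20/20
```

For an OSD that is already running, the same levels can be applied at runtime with `ceph tell osd.N injectargs '--debug-ms 1/1 --debug-osd 20/20'`, though that does not help here since the interesting activity happens during startup, before the daemon answers commands.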

[ceph-users] Long OSD restart after upgrade to 10.2.9

2017-07-16 Thread Anton Dmitriev
Hi, all! After upgrading from 10.2.7 to 10.2.9 I see that restarting OSDs by 'restart ceph-osd id=N' or 'restart ceph-osd-all' takes about 10 minutes to get an OSD from DOWN to UP. The same situation on all 208 OSDs on 7 servers. OSD start is also very long after rebooting the servers. Before up