[ceph-users] Ceph metadata

2019-01-29 Thread F B
Hi,

I'm looking for some details about the limits of the metadata used by Ceph.
I found some restrictions coming from XFS:

-  Max total keys/values size: 64 kB

-  Max key size: 255 bytes

Does Ceph have limits for this metadata?
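For anyone who wants to experiment with per-object metadata directly, the rados CLI exposes the xattr operations; a minimal sketch, where the pool name "mypool" and the object name "myobject" are placeholders:

rados -p mypool put myobject /etc/hosts            # create a small test object
rados -p mypool setxattr myobject mykey myvalue    # attach an xattr to it
rados -p mypool listxattr myobject                 # list its xattr keys
rados -p mypool getxattr myobject mykey            # read the value back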

Thanks in advance !

Fabien BELLEGO


Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
Upgraded from which version? Have you tried downgrading ceph-mds to the old version?


On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
 wrote:
>
> hi folks we need some help with our cephfs, all mds keep crashing
>
> starting mds.mds02 at -
> terminate called after throwing an instance of
> 'ceph::buffer::bad_alloc'
>  what():  buffer::bad_alloc
> *** Caught signal (Aborted) **
> in thread 7f542d825700 thread_name:md_log_replay
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> 1: /usr/bin/ceph-mds() [0x7cc8a0]
> 2: (()+0x11390) [0x7f543cf29390]
> 3: (gsignal()+0x38) [0x7f543c676428]
> 4: (abort()+0x16a) [0x7f543c67802a]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> 7: (()+0x734e91) [0x7f543dadae91]
> 8: (()+0x7410a4) [0x7f543dae70a4]
> 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> int, int)+0x258) [0x7f543d63b348]
> 10: (ceph::buffer::list::iterator_impl::copy_shallow(unsigned
> int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> 11: (compact_map_base std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::map std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::less std::char_traits, mempool::po
> ol_allocator<(mempool::pool_index_t)18, char> > >,
> mempool::pool_allocator<(mempool::pool_index_t)18,
> std::pair,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> [0x66b202]
> 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) 
> [0x7aa633]
> 13: /usr/bin/ceph-mds() [0x7aeae6]
> 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) 
> [0x7b4fa6]
> 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> 16: (MDLog::_replay_thread()+0x864) [0x760024]
> 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> 18: (()+0x76ba) [0x7f543cf1f6ba]
> 19: (clone()+0x6d) [0x7f543c74841d]
> 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> in thread 7f542d825700 thread_name:md_log_replay
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> 1: /usr/bin/ceph-mds() [0x7cc8a0]
> 2: (()+0x11390) [0x7f543cf29390]
> 3: (gsignal()+0x38) [0x7f543c676428]
> 4: (abort()+0x16a) [0x7f543c67802a]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> 7: (()+0x734e91) [0x7f543dadae91]
> 8: (()+0x7410a4) [0x7f543dae70a4]
> 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> int, int)+0x258) [0x7f543d63b348]
> 10: (ceph::buffer::list::iterator_impl::copy_shallow(unsigned
> int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> 11: (compact_map_base std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::map std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::less std::char_traits, mempool::po
> ol_allocator<(mempool::pool_index_t)18, char> > >,
> mempool::pool_allocator<(mempool::pool_index_t)18,
> std::pair,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> [0x66b202]
> 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) 
> [0x7aa633]
> 13: /usr/bin/ceph-mds() [0x7aeae6]
> 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) 
> [0x7b4fa6]
> 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> 16: (MDLog::_replay_thread()+0x864) [0x760024]
> 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> 18: (()+0x76ba) [0x7f543cf1f6ba]
> 19: (clone()+0x6d) [0x7f543c74841d]
> NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
> 0> 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal
> (Aborted) **
> in thread 7f542d825700 thread_name:md_log_replay
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
> 1: /usr/bin/ceph-mds() [0x7cc8a0]
> 2: (()+0x11390) [0x7f543cf29390]
> 3: (gsignal()+0x38) [0x7f543c676428]
> 4: (abort()+0x16a) [0x7f543c67802a]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> 7: (()+0x734e91) [0x7f543dadae91]
> 8: (()+0x7410a4) [0x7f543dae70a4]
> 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> int, int)+0x258) [0x7f543d63b348]
> 10: (ceph::buffer::list::iterator_impl::copy_shallow(unsigned
> int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> 11: (compact_map_base std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::map std::char_traits,
> mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> ceph::buffer::ptr, std::less std::char_traits, mempool::po
> ol_allocator<(me

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
On Fri, Jan 25, 2019 at 9:49 PM Jonathan Woytek  wrote:
>
> Hi friendly ceph folks. A little while after I got the message asking for 
> some stats, we had a network issue that caused us to take all of our 
> processing offline for a few hours. Since we brought everything back up, I 
> have been unable to duplicate the issues I was seeing. Instead, performance 
> of the file writer has been steady around 1.5k files/minute. Dropping the 
> cache causes performance to suffer. We can only get back to the 1.5k/minute 
> average range by restarting all of the mds daemons (well, specifically, it 
> looks like we can restart the first two or three, but resetting the other 
> three or four doesn't seem to make a difference).
>
> Now, I'm seeing pretty consistent sets of slow requests logged on the first 
> two mds daemons, stating that the slow request is a 
> "rejoin:client.[clientid]". When I parse the clientid's and look at the 
> client lists on the daemons, the clients correspond to the six swarm hosts 
> running the file writers. I'm attaching a small archive here of the 
> performance metrics Zheng asked me to produce a couple of weeks ago. I'm not 
> sure if they are valid for this particular problem.
>
> jonathan
>

Looks like you have 5 active mds. I suspect your issue is related to the
load balancer. Please try disabling the mds load balancer (add
"mds_bal_max = 0" to the mds section of ceph.conf) and use 'export_pin'
to manually pin directories to mds ranks
(https://ceph.com/community/new-luminous-cephfs-subtree-pinning/)
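For reference, the pinning itself is done by setting the ceph.dir.pin virtual xattr on a directory from a client mount; a minimal sketch, assuming the filesystem is mounted at /mnt/cephfs (path and rank are placeholders):

setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/busydir     # pin this subtree to mds rank 1
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/busydir    # unpin; inherit the parent's pin again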


>
> On Wed, Jan 9, 2019 at 9:10 PM Yan, Zheng  wrote:
>>
>> [...]
>> Could you please run following command (for each active mds) when
>> operations are fast and when operations are slow
>>
>> - for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops >
>> mds.xxx.$i; sleep 1; done
>>
>> Then send the results to us
>>
>> Regards
>> Yan, Zheng
>
> --
> Jonathan Woytek
> http://www.dryrose.com
> KB3HOZ
> PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC


Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Ansgar Jazdzewski
Hi,

we upgraded from 12.2.8 to 13.2.4 (ubuntu 16.04)

- after the upgrade (~2 hours later) the replay MDS kept
crashing, so we tried to restart all MDS; then the filesystem was in
'failed' state and no MDS was in "active" state
- we then tried to downgrade the MDS to 13.2.1 but had no luck.
ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77)
mimic (stable), process ceph-mds, pid 6843
...
log_channel(cluster) log [ERR] : corrupt sessionmap values:
buffer::malformed_input: void
session_info_t::decode(ceph::buffer::list::iterator&) no longer
understand old encoding version 6 < 7

so we "fixed" the filsystem by following the manual
(http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/)

systemctl stop ceph-mds@`hostname -s`.service
cephfs-journal-tool journal export backup.bin
cephfs-journal-tool --help
cephfs-journal-tool --rank=cephfs:all journal export backup.bin
cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=cephfs:all journal reset
cephfs-table-tool all reset session
systemctl restart ceph-mds@mds05.service
systemctl stop ceph-mds@`hostname -s`.service
ceph fs set cephfs down true
ceph fs set cephfs down false
cephfs-table-tool all reset session
cephfs-table-tool all reset inode
cephfs-table-tool all reset snap
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log
systemctl stop ceph-mds@`hostname -s`.service
ceph fs reset cephfs --yes-i-really-mean-it
systemctl restart ceph-mds@mds05.service
less /var/log/ceph/ceph-mds.mds05.log

after the reset we were able to use the cephfs but we still have some errors

...
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e,
inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag
has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)
log [ERR] : unmatched fragstat size on single dirfrag 0x10002253db6,
inode has f(v0 m2019-01-28 14:46:47.983292 59=0+59), dirfrag has f(v0
m2019-01-28 14:46:47.983292 58=0+58)
log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6,
inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag
has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
...

any help is welcome,
Ansgar

On Tue, 29 Jan 2019 at 12:32, Yan, Zheng wrote:
>
> upgraded from which version?  have you try downgrade ceph-mds to old version?
>
>
> On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
>  wrote:
> >
> > hi folks we need some help with our cephfs, all mds keep crashing
> >
> > starting mds.mds02 at -
> > terminate called after throwing an instance of
> > 'ceph::buffer::bad_alloc'
> >  what():  buffer::bad_alloc
> > *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
> > (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: (()+0x734e91) [0x7f543dadae91]
> > 8: (()+0x7410a4) [0x7f543dae70a4]
> > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > int, int)+0x258) [0x7f543d63b348]
> > 10: (ceph::buffer::list::iterator_impl::copy_shallow(unsigned
> > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > 11: (compact_map_base > std::char_traits,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > ceph::buffer::ptr, std::map > std::char_traits,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > ceph::buffer::ptr, std::less > std::char_traits, mempool::po
> > ol_allocator<(mempool::pool_index_t)18, char> > >,
> > mempool::pool_allocator<(mempool::pool_index_t)18,
> > std::pair,
> > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > [0x66b202]
> > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) 
> > [0x7aa633]
> > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) 
> > [0x7b4fa6]
> > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > 19: (clone()+0x6d) [0x7f543c74841d]
> > 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > in thread 7f542d825700 thread_name:md_log_replay
> >
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
> > (stable)
> > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > 2: (()+0x11390) [0x7f543cf29390]
> > 3: (gsignal()+0x38) [0x7f543c676428]
> > 4: (abort()+0x16a) [0x7f543c67802a]
> > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > 7: 

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Jonathan Woytek
On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng  wrote:

> Looks like you have 5 active mds. I suspect your issue is related to
> load balancer.  Please try disabling mds load balancer (add
> "mds_bal_max = 0" to mds section of ceph.conf). and use 'export_pin'
> to manually pin directories to mds
> (https://ceph.com/community/new-luminous-cephfs-subtree-pinning/)
>

This seems unclear from the documentation I've reviewed, but what happens
to directories that are not manually pinned if the load balancer is
disabled? The busiest directories are static (with changing contents, of
course) and will be pretty easy to distribute amongst the mds with pinning,
but then there are a handful of other directories that exist, some of which
are created or destroyed at different times.

jonathan
-- 
Jonathan Woytek
http://www.dryrose.com
KB3HOZ
PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC


Re: [ceph-users] cephfs constantly strays ( num_strays)

2019-01-29 Thread Yan, Zheng
Nothing to be worried about.

On Sun, Jan 27, 2019 at 10:13 PM Marc Roos  wrote:
>
>
> I constantly have strays. What are strays? Why do I have them? Is this
> bad?
>
>
>
> [@~]# ceph daemon mds.c perf dump| grep num_stray
> "num_strays": 25823,
> "num_strays_delayed": 0,
> "num_strays_enqueuing": 0,
> [@~]#


Re: [ceph-users] ceph-fs crashed after upgrade to 13.2.4

2019-01-29 Thread Yan, Zheng
On Tue, Jan 29, 2019 at 8:30 PM Ansgar Jazdzewski
 wrote:
>
> Hi,
>
> we upgraded from 12.2.8 to 13.2.4 (ubuntu 16.04)
>
> - after the upgrade (~2 hours after the upgrade) the replay-mds keep
> crashing so we tryed to restart all MDS than the filesystem was in
> 'failed' state and no MDS is in "activ"-state
> - we than tryed to downgrade the MDS to 13.2.1 but had no luck.
> ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77)
> mimic (stable), process ceph-mds, pid 6843
> ...
> log_channel(cluster) log [ERR] : corrupt sessionmap values:
> buffer::malformed_input: void
> session_info_t::decode(ceph::buffer::list::iterator&) no longer
> understand old encoding version 6 < 7
>
> so we "fixed" the filsystem by following the manual
> ()
>
> systemctl stop ceph-mds@`hostname -s`.service
> cephfs-journal-tool journal export backup.bin
> cephfs-journal-tool --help
> cephfs-journal-tool --rank=cephfs:all journal export backup.bin
> cephfs-journal-tool --rank=cephfs:all event recover_dentries summary
> cephfs-journal-tool --rank=cephfs:all journal reset
> cephfs-table-tool all reset session
> systemctl restart ceph-mds@mds05.service
> systemctl stop ceph-mds@`hostname -s`.service
> ceph fs set cephfs down true
> ceph fs set cephfs down false
> cephfs-table-tool all reset session
> cephfs-table-tool all reset inode
> cephfs-table-tool all reset snap

You have reset the inode table; this will cause further damage. You should
stop using your fs (unmount your clients) and immediately run 'ceph tell
mds.xx scrub start / recursive repair'


> systemctl restart ceph-mds@mds05.service
> less /var/log/ceph/ceph-mds.mds05.log
> systemctl stop ceph-mds@`hostname -s`.service
> ceph fs reset cephfs --yes-i-really-mean-it
> systemctl restart ceph-mds@mds05.service
> less /var/log/ceph/ceph-mds.mds05.log
>
> after the reset we was able to use the cephfs but we still have some errors
>
> ...
> log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253f4e,
> inode has n(v9 rc2019-01-28 14:42:47.371612 b1158 71=8+63), dirfrag
> has n(v9 rc2019-01-28 14:42:47.371612 b1004 65=7+58)
> log [ERR] : unmatched fragstat size on single dirfrag 0x10002253db6,
> inode has f(v0 m2019-01-28 14:46:47.983292 59=0+59), dirfrag has f(v0
> m2019-01-28 14:46:47.983292 58=0+58)
> log [ERR] : unmatched rstat rbytes on single dirfrag 0x10002253db6,
> inode has n(v11 rc2019-01-28 14:46:47.983292 b1478 71=11+60), dirfrag
> has n(v11 rc2019-01-28 14:46:47.983292 b1347 68=10+58)
> ...
>
> any help is welcome,
> Ansgar
>
> On Tue, 29 Jan 2019 at 12:32, Yan, Zheng wrote:
> >
> > upgraded from which version?  have you try downgrade ceph-mds to old 
> > version?
> >
> >
> > On Mon, Jan 28, 2019 at 9:20 PM Ansgar Jazdzewski
> >  wrote:
> > >
> > > hi folks we need some help with our cephfs, all mds keep crashing
> > >
> > > starting mds.mds02 at -
> > > terminate called after throwing an instance of
> > > 'ceph::buffer::bad_alloc'
> > >  what():  buffer::bad_alloc
> > > *** Caught signal (Aborted) **
> > > in thread 7f542d825700 thread_name:md_log_replay
> > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
> > > (stable)
> > > 1: /usr/bin/ceph-mds() [0x7cc8a0]
> > > 2: (()+0x11390) [0x7f543cf29390]
> > > 3: (gsignal()+0x38) [0x7f543c676428]
> > > 4: (abort()+0x16a) [0x7f543c67802a]
> > > 5: (__gnu_cxx::__verbose_terminate_handler()+0x135) [0x7f543dae6e65]
> > > 6: (__cxxabiv1::__terminate(void (*)())+0x6) [0x7f543dadae46]
> > > 7: (()+0x734e91) [0x7f543dadae91]
> > > 8: (()+0x7410a4) [0x7f543dae70a4]
> > > 9: (ceph::buffer::create_aligned_in_mempool(unsigned int, unsigned
> > > int, int)+0x258) [0x7f543d63b348]
> > > 10: (ceph::buffer::list::iterator_impl::copy_shallow(unsigned
> > > int, ceph::buffer::ptr&)+0xa2) [0x7f543d640ee2]
> > > 11: (compact_map_base > > std::char_traits,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::map > > std::char_traits,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> >,
> > > ceph::buffer::ptr, std::less > > std::char_traits, mempool::po
> > > ol_allocator<(mempool::pool_index_t)18, char> > >,
> > > mempool::pool_allocator<(mempool::pool_index_t)18,
> > > std::pair,
> > > mempool::pool_allocator<(mempool::pool_index_t)18, char> > const,
> > > ceph::buffer::ptr> > > >::decode(ceph::buffer::list::iterator&)+0x122)
> > > [0x66b202]
> > > 12: (EMetaBlob::fullbit::decode(ceph::buffer::list::iterator&)+0xe3) 
> > > [0x7aa633]
> > > 13: /usr/bin/ceph-mds() [0x7aeae6]
> > > 14: (EMetaBlob::replay(MDSRank*, LogSegment*, MDSlaveUpdate*)+0x3d36) 
> > > [0x7b4fa6]
> > > 15: (EImportStart::replay(MDSRank*)+0x5b) [0x7bbb1b]
> > > 16: (MDLog::_replay_thread()+0x864) [0x760024]
> > > 17: (MDLog::ReplayThread::entry()+0xd) [0x4f487d]
> > > 18: (()+0x76ba) [0x7f543cf1f6ba]
> > > 19: (clone()+0x6d) [0x7f543c74841d]
> > > 2019-01-28 13:10:02.202 7f542d825700 -1 *** Caught signal (Aborted) **
> > > in thread 7f542d825700 thread_name:md_log_re

Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Yan, Zheng
On Tue, Jan 29, 2019 at 9:05 PM Jonathan Woytek  wrote:
>
> On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng  wrote:
>>
>> Looks like you have 5 active mds. I suspect your issue is related to
>> load balancer.  Please try disabling mds load balancer (add
>> "mds_bal_max = 0" to mds section of ceph.conf). and use 'export_pin'
>> to manually pin directories to mds
>> (https://ceph.com/community/new-luminous-cephfs-subtree-pinning/)
>
>
> This seems unclear from the documentation I've reviewed, but what happens to 
> directories that are not manually pinned if the load balancer is disabled? 
> The busiest directories are static (with changing contents, of course) and 
> will be pretty easy to distribute amongst the mds with pinning, but then 
> there are a handful of other directories that exist, some of which are 
> created or destroyed at different times.
>

If the balancer is disabled, a directory stays on its current mds. A newly
created directory goes to the same mds as its parent directory.


> jonathan
> --
> Jonathan Woytek
> http://www.dryrose.com
> KB3HOZ
> PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC


[ceph-users] Luminous defaults and OpenStack

2019-01-29 Thread Smith, Eric
Hey folks - I'm using Luminous (12.2.10) and I was wondering if there's
anything out of the box I need to change performance-wise to get the most out
of OpenStack on Ceph. I'm running Rocky (deployed with Kolla) and running Ceph
deployed via ceph-deploy.
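For context, the client-side librbd cache options in the [client] section of ceph.conf are usually the first thing people look at for this kind of workload; a sketch only, with illustrative values rather than recommendations:

[client]
rbd cache = true                            # librbd writeback cache
rbd cache writethrough until flush = true   # behave as writethrough until the guest flushes
rbd cache size = 67108864                   # 64 MiB per client, illustrative value only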

Any tips / tricks / gotchas are greatly appreciated!
Eric


Re: [ceph-users] tuning ceph mds cache settings

2019-01-29 Thread Jonathan Woytek
Thanks.. I'll give this a shot and we'll see what happens!

jonathan

On Tue, Jan 29, 2019 at 8:47 AM Yan, Zheng  wrote:

> On Tue, Jan 29, 2019 at 9:05 PM Jonathan Woytek 
> wrote:
> >
> > On Tue, Jan 29, 2019 at 7:12 AM Yan, Zheng  wrote:
> >>
> >> Looks like you have 5 active mds. I suspect your issue is related to
> >> load balancer.  Please try disabling mds load balancer (add
> >> "mds_bal_max = 0" to mds section of ceph.conf). and use 'export_pin'
> >> to manually pin directories to mds
> >> (https://ceph.com/community/new-luminous-cephfs-subtree-pinning/)
> >
> >
> > This seems unclear from the documentation I've reviewed, but what
> happens to directories that are not manually pinned if the load balancer is
> disabled? The busiest directories are static (with changing contents, of
> course) and will be pretty easy to distribute amongst the mds with pinning,
> but then there are a handful of other directories that exist, some of which
> are created or destroyed at different times.
> >
>
> if balancer is disabled, directory stays in its current mds. newly
> create directory is in the same mds as its parent directory
>
>
> > jonathan
> > --
> > Jonathan Woytek
> > http://www.dryrose.com
> > KB3HOZ
> > PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC
>


-- 
Jonathan Woytek
http://www.dryrose.com
KB3HOZ
PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC


[ceph-users] Multisite Ceph setup sync issue

2019-01-29 Thread Krishna Verma
Hi Ceph Users,

I need your help to fix a sync issue in a multisite setup.

I have 2 clusters in different datacenters that we want to use for bidirectional
data replication. Following the documentation
http://docs.ceph.com/docs/master/radosgw/multisite/ I have set up the gateway on
each site, but when I check the sync status it fails as below:

Admin node at master :
[cephuser@vlno-ceph01 cluster]$ radosgw-admin data sync status
ERROR: source zone not specified
[cephuser@vlno-ceph01 cluster]$ radosgw-admin realm list
{
"default_info": "1102c891-d81c-480e-9487-c9f874287d13",
"realms": [
"georep",
"geodata"
]
}

[cephuser@vlno-ceph01 cluster]$ radosgw-admin zonegroup list
read_default_id : 0
{
"default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"zonegroups": [
"noida"
]
}

[cephuser@vlno-ceph01 cluster]$ radosgw-admin zone list
{
"default_info": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"zones": [
"noida1"
]
}

[cephuser@vlno-ceph01 cluster]$

[cephuser@vlno-ceph01 cluster]$ cat ceph.conf
[global]
fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
mon_initial_members = vlno-ceph01
mon_host = 172.23.16.67
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# Your network address
public network = 172.23.16.0/24
osd pool default size = 2
rgw_override_bucket_index_max_shards = 100
debug ms = 1
debug rgw = 20
[cephuser@vlno-ceph01 cluster]$

On Master Gateway :

[cephuser@zabbix-server ~]$ cat /etc/ceph/ceph.conf
[global]
fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
mon_initial_members = vlno-ceph01
mon_host = 172.23.16.67
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# Your network address
public network = 172.23.16.0/24
osd pool default size = 2
rgw_override_bucket_index_max_shards = 100
debug ms = 1
debug rgw = 20
[client.rgw.zabbix-server]
host = zabbix-server
rgw frontends = "civetweb port=7480"
rgw_zone=noida1
[cephuser@zabbix-server ~]$


On Secondary site admin node.

[cephuser@vlsj-kverma1 cluster]$ radosgw-admin realm list
{
"default_info": "1102c891-d81c-480e-9487-c9f874287d13",
"realms": [
"georep"
]
}

[cephuser@vlsj-kverma1 cluster]$ radosgw-admin zonegroup list
read_default_id : 0
{
"default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"zonegroups": [
"noida",
"default"
]
}

[cephuser@vlsj-kverma1 cluster]$ radosgw-admin zone list
{
"default_info": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
"zones": [
"san-jose"
]
}

[cephuser@vlsj-kverma1 cluster]$


[cephuser@vlsj-kverma1 cluster]$ cat ceph.conf
[global]
fsid = c626be3a-4536-48b9-8db8-470437052313
mon_initial_members = vlsj-kverma1
mon_host = 172.18.84.131
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# Your network address
public network = 172.18.84.0/24
osd pool default size = 2
rgw_override_bucket_index_max_shards = 100
debug ms = 1
debug rgw = 20


[cephuser@vlsj-kverma1 cluster]$

[cephuser@vlsj-kverma1 cluster]$ radosgw-admin data sync status
2019-01-28 10:33:12.163298 7f11c24c79c0  1 Cannot find zone 
id=45c690a8-f39c-4b1d-9faf-e0e991ceaaac (name=san-jose), switching to local 
zonegroup configuration
ERROR: source zone not specified
[cephuser@vlsj-kverma1 cluster]$

On Secondary site Gateway host:

[cephuser@zabbix-client ceph]$ cat /etc/ceph/ceph.conf
[global]
fsid = c626be3a-4536-48b9-8db8-470437052313
mon_initial_members = vlsj-kverma1
mon_host = 172.18.84.131
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
# Your network address
public network = 172.18.84.0/24
osd pool default size = 2
rgw_override_bucket_index_max_shards = 100
debug ms = 1
debug rgw = 20
[client.rgw.zabbix-client]
host = zabbix-client
rgw frontends = "civetweb port=7480"
rgw_zone=san-jose

[cephuser@zabbix-client ceph]$



Appreciate any help in the setup.

/Krishna



[ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Hello,

I have a brand new cluster with 2 pools, but the cluster keeps its pgs in an
inactive state.
I have 3 OSDs and 1 Mon… all seems ok except I cannot get the pgs into the
active+clean state!

I might be missing something obvious but I really don't know what… Could someone
help me?
I tried to seek answers among the list mail threads, but no luck; the other
situations seem different.

Thank you a lot for your help

Vincent

# ceph -v
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

# ceph -s
cluster:
id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
health: HEALTH_WARN
Reduced data availability: 200 pgs inactive

services:
mon: 1 daemons, quorum ip-10-8-66-123.eu-west-2.compute.internal
mgr: ip-10-8-66-123.eu-west-2.compute.internal(active)
osd: 3 osds: 3 up, 3 in

data:
pools: 2 pools, 200 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
pgs: 100.000% pgs unknown
200 unknown

# ceph osd tree -f json-pretty

{
"nodes": [
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"children": [
-3,
-5,
-7
]
},
{
"id": -7,
"name": "ip-10-8-10-108",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
2
]
},
{
"id": 2,
"device_class": "hdd",
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -5,
"name": "ip-10-8-22-148",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
1
]
},
{
"id": 1,
"device_class": "hdd",
"name": "osd.1",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -3,
"name": "ip-10-8-5-246",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
0
]
},
{
"id": 0,
"device_class": "hdd",
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
   "depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
}
],
"stray": []
}

# cat /etc/ceph/ceph.conf
[global]
fsid = ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
mon initial members = ip-10-8-66-123
mon host = 10.8.66.123
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
pid file = /var/run/$cluster/$type.pid


#Choose reasonable numbers for number of replicas and placement groups.
osd pool default size = 3 # Write an object 3 times
osd pool default min size = 2 # Allow writing 2 copy in a degraded state
osd pool default pg num = 100
osd pool default pgp num = 100

#Choose a reasonable crush leaf type
#0 for a 1-node cluster.
#1 for a multi node cluster in a single rack
#2 for a multi node, multi chassis cluster with multiple hosts in a chassis
#3 for a multi node cluster with hosts across racks, etc.
osd crush chooseleaf type = 2

[mon]
debug mon = 20

# ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
pg 1.46 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.47 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.48 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.49 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4a is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4b is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4c is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4d is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4e is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.4f is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.50 is stuck inactive for 10848.068201, current state unknown, last 
acting []
pg 1.51 is stuck inactive for 10848

Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread Jean-Charles Lopez
Hi,

I suspect your generated CRUSH rule is incorrect because of
osd_crush_chooseleaf_type=2, and by default chassis buckets are not created.

Changing the type of bucket to host (osd_crush_chooseleaf_type=1, which is the
default when using old ceph-deploy or ceph-ansible) for your deployment should
fix the problem.

Could you show the output of ceph osd crush rule dump to verify how the rule
was built?

JC

> On Jan 29, 2019, at 10:08, PHARABOT Vincent  wrote:
> 
> Hello,
>  
> I have a bright new cluster with 2 pools, but cluster keeps pgs in inactive 
> state.
> I have 3 OSDs and 1 Mon… all seems ok except I could not have pgs in 
> clean+active state !
>  
> I might miss something obvious but I really don’t know what…. Someone could 
> help me ?
> I tried to seek answers among the list mail threads, but no luck, other 
> situation seems different
>  
> Thank you a lot for your help
>  
> Vincent
>  
> # ceph -v
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>  
> # ceph -s
> cluster:
> id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
> health: HEALTH_WARN
> Reduced data availability: 200 pgs inactive
>  
> services:
> mon: 1 daemons, quorum ip-10-8-66-123.eu 
> -west-2.compute.internal
> mgr: ip-10-8-66-123.eu 
> -west-2.compute.internal(active)
> osd: 3 osds: 3 up, 3 in
>  
> data:
> pools: 2 pools, 200 pgs
> objects: 0 objects, 0 B
> usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
> pgs: 100.000% pgs unknown
> 200 unknown
>  
> # ceph osd tree -f json-pretty
>  
> {
> "nodes": [
> {
> "id": -1,
> "name": "default",
> "type": "root",
> "type_id": 10,
> "children": [
> -3,
> -5,
> -7
> ]
> },
> {
> "id": -7,
> "name": "ip-10-8-10-108",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 2
> ]
> },
> {
> "id": 2,
> "device_class": "hdd",
> "name": "osd.2",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
> "depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> },
> {
> "id": -5,
> "name": "ip-10-8-22-148",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 1
> ]
> },
> {
> "id": 1,
> "device_class": "hdd",
> "name": "osd.1",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
> "depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> },
> {
> "id": -3,
> "name": "ip-10-8-5-246",
> "type": "host",
> "type_id": 1,
> "pool_weights": {},
> "children": [
> 0
> ]
> },
> {
> "id": 0,
> "device_class": "hdd",
> "name": "osd.0",
> "type": "osd",
> "type_id": 0,
> "crush_weight": 0.976593,
>"depth": 2,
> "pool_weights": {},
> "exists": 1,
> "status": "up",
> "reweight": 1.00,
> "primary_affinity": 1.00
> }
> ],
> "stray": []
> }
>  
> # cat /etc/ceph/ceph.conf
> [global]
> fsid = ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
> mon initial members = ip-10-8-66-123
> mon host = 10.8.66.123
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> pid file = /var/run/$cluster/$type.pid
>  
>  
> #Choose reasonable numbers for number of replicas and placement groups.
> osd pool default size = 3 # Write an object 3 times
> osd pool default min size = 2 # Allow writing 2 copy in a degraded state
> osd pool default pg num = 100
> osd pool default pgp num = 100
>  
> #Choose a reasonable crush leaf type
> #0 for a 1-node cluster.
> #1 for a multi node cluster in a single rack
> #2 for a multi node, multi chassis cluster with multiple hosts in a chassis
> #3 for a multi node cluster with hosts across racks, etc.
> osd crush chooseleaf type = 2
>  
> [mon]
> debug mon = 20
>  
> # ceph health detail
> HEALTH_WARN Reduced data availability: 200 pgs inactive
> PG_AVAILABILITY Reduced data availability: 200 pgs inactive
> pg 1.46 is stuck inactive for 10848.068201, current state unknown, last 
> acting []
> pg 1.47 is stuck inactive for 10848.068

Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Thanks for the quick reply

Here is the result

# ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
   "op": "emit"
}
]
}
]

From: Jean-Charles Lopez [mailto:jelo...@redhat.com]
Sent: Tuesday, January 29, 2019 19:30
To: PHARABOT Vincent 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

Hi,

I suspect your generated CRUSH rule is incorret because of 
osd_crush_cooseleaf_type=2 and by default chassis bucket are not created.

Changing the type of bucket to host (osd_crush_cooseleaf_type=1 which is the 
default when using old ceph-deploy or ceph-ansible) for your deployment should 
fix the problem.

Could you show the output of ceph osd crush rule dump to verify how the rule 
was built

JC

On Jan 29, 2019, at 10:08, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:

Hello,

I have a bright new cluster with 2 pools, but cluster keeps pgs in inactive 
state.
I have 3 OSDs and 1 Mon… all seems ok except I could not have pgs in 
clean+active state !

I might miss something obvious but I really don’t know what…. Someone could 
help me ?
I tried to seek answers among the list mail threads, but no luck, other 
situation seems different

Thank you a lot for your help

Vincent

# ceph -v
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

# ceph -s
cluster:
id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
health: HEALTH_WARN
Reduced data availability: 200 pgs inactive

services:
mon: 1 daemons, quorum 
ip-10-8-66-123.eu-west-2.compute.internal
mgr: 
ip-10-8-66-123.eu-west-2.compute.internal(active)
osd: 3 osds: 3 up, 3 in

data:
pools: 2 pools, 200 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
pgs: 100.000% pgs unknown
200 unknown

# ceph osd tree -f json-pretty

{
"nodes": [
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"children": [
-3,
-5,
-7
]
},
{
"id": -7,
"name": "ip-10-8-10-108",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
2
]
},
{
"id": 2,
"device_class": "hdd",
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -5,
"name": "ip-10-8-22-148",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
1
]
},
{
"id": 1,
"device_class": "hdd",
"name": "osd.1",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -3,
"name": "ip-10-8-5-246",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
0
]
},
{
"id": 0,
"device_class": "hdd",
"name": "osd.0",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
   "depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
}
],
"stray": []
}

# cat /etc/ceph/ceph.conf
[global]
fsid = ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
mon initial members = ip-10-8-66-123
mon host = 10.8.66.123
auth_cluster_required = cephx
auth_service_requir

Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread PHARABOT Vincent
Sorry JC, here is the correct osd crush rule dump (type=chassis instead of host)

# ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "chassis"
},
{
"op": "emit"
}
]
}
]

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
PHARABOT Vincent
Sent: Tuesday, January 29, 2019 19:33
To: Jean-Charles Lopez 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

Thanks for the quick reply

Here is the result

# ceph osd crush rule dump
[
{
"rule_id": 0,
"rule_name": "replicated_rule",
"ruleset": 0,
"type": 1,
"min_size": 1,
"max_size": 10,
"steps": [
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_firstn",
"num": 0,
"type": "host"
},
{
   "op": "emit"
}
]
}
]

From: Jean-Charles Lopez [mailto:jelo...@redhat.com]
Sent: Tuesday, January 29, 2019 19:30
To: PHARABOT Vincent <vincent.phara...@3ds.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

Hi,

I suspect your generated CRUSH rule is incorret because of 
osd_crush_cooseleaf_type=2 and by default chassis bucket are not created.

Changing the type of bucket to host (osd_crush_cooseleaf_type=1 which is the 
default when using old ceph-deploy or ceph-ansible) for your deployment should 
fix the problem.

Could you show the output of ceph osd crush rule dump to verify how the rule 
was built

JC

On Jan 29, 2019, at 10:08, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:

Hello,

I have a bright new cluster with 2 pools, but cluster keeps pgs in inactive 
state.
I have 3 OSDs and 1 Mon… all seems ok except I could not have pgs in 
clean+active state !

I might miss something obvious but I really don’t know what…. Someone could 
help me ?
I tried to seek answers among the list mail threads, but no luck, other 
situation seems different

Thank you a lot for your help

Vincent

# ceph -v
ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)

# ceph -s
cluster:
id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
health: HEALTH_WARN
Reduced data availability: 200 pgs inactive

services:
mon: 1 daemons, quorum 
ip-10-8-66-123.eu-west-2.compute.internal
mgr: 
ip-10-8-66-123.eu-west-2.compute.internal(active)
osd: 3 osds: 3 up, 3 in

data:
pools: 2 pools, 200 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
pgs: 100.000% pgs unknown
200 unknown

# ceph osd tree -f json-pretty

{
"nodes": [
{
"id": -1,
"name": "default",
"type": "root",
"type_id": 10,
"children": [
-3,
-5,
-7
]
},
{
"id": -7,
"name": "ip-10-8-10-108",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
2
]
},
{
"id": 2,
"device_class": "hdd",
"name": "osd.2",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -5,
"name": "ip-10-8-22-148",
"type": "host",
"type_id": 1,
"pool_weights": {},
"children": [
1
]
},
{
"id": 1,
"device_class": "hdd",
"name": "osd.1",
"type": "osd",
"type_id": 0,
"crush_weight": 0.976593,
"depth": 2,
"pool_weights": {},
"exists": 1,
"status": "up",
"reweight": 1.00,
"primary_affinity": 1.00
},
{
"id": -3,
"name": "ip-10-8-5-246",
"type": "host",
"type_id": 1,
 

Re: [ceph-users] Bright new cluster get all pgs stuck in inactive

2019-01-29 Thread Paul Emmerich
Your CRUSH rule specifies selecting 3 different chassis, but your CRUSH
map defines no chassis.
Add buckets of type chassis or change the rule to select hosts.
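For reference, the second option does not require editing the CRUSH map by hand; a minimal sketch, where the rule name is arbitrary and <pool> is a placeholder for each of the two pools:

ceph osd crush rule create-replicated replicated_hosts default host
ceph osd pool set <pool> crush_rule replicated_hosts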

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Tue, Jan 29, 2019 at 7:40 PM PHARABOT Vincent
 wrote:
>
> Sorry JC, here is the correct osd crush rule dump (type=chassis instead of 
> host)
>
>
>
> # ceph osd crush rule dump
>
> [
>
> {
>
> "rule_id": 0,
>
> "rule_name": "replicated_rule",
>
> "ruleset": 0,
>
> "type": 1,
>
> "min_size": 1,
>
> "max_size": 10,
>
> "steps": [
>
> {
>
> "op": "take",
>
> "item": -1,
>
> "item_name": "default"
>
> },
>
> {
>
> "op": "chooseleaf_firstn",
>
> "num": 0,
>
> "type": "chassis"
>
> },
>
> {
>
> "op": "emit"
>
> }
>
> ]
>
> }
>
> ]
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of
> PHARABOT Vincent
> Sent: Tuesday, January 29, 2019 19:33
> To: Jean-Charles Lopez 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive
>
>
>
> Thanks for the quick reply
>
>
>
> Here is the result
>
>
>
> # ceph osd crush rule dump
>
> [
>
> {
>
> "rule_id": 0,
>
> "rule_name": "replicated_rule",
>
> "ruleset": 0,
>
> "type": 1,
>
> "min_size": 1,
>
> "max_size": 10,
>
> "steps": [
>
> {
>
> "op": "take",
>
> "item": -1,
>
> "item_name": "default"
>
> },
>
> {
>
> "op": "chooseleaf_firstn",
>
> "num": 0,
>
> "type": "host"
>
> },
>
> {
>
>"op": "emit"
>
> }
>
> ]
>
> }
>
> ]
>
>
>
> From: Jean-Charles Lopez [mailto:jelo...@redhat.com]
> Sent: Tuesday, January 29, 2019 19:30
> To: PHARABOT Vincent 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Bright new cluster get all pgs stuck in inactive
>
>
>
> Hi,
>
>
>
> I suspect your generated CRUSH rule is incorret because of 
> osd_crush_cooseleaf_type=2 and by default chassis bucket are not created.
>
>
>
> Changing the type of bucket to host (osd_crush_cooseleaf_type=1 which is the 
> default when using old ceph-deploy or ceph-ansible) for your deployment 
> should fix the problem.
>
>
>
> Could you show the output of ceph osd crush rule dump to verify how the rule 
> was built
>
>
>
> JC
>
>
>
> On Jan 29, 2019, at 10:08, PHARABOT Vincent  wrote:
>
>
>
> Hello,
>
>
>
> I have a bright new cluster with 2 pools, but cluster keeps pgs in inactive 
> state.
>
> I have 3 OSDs and 1 Mon… all seems ok except I could not have pgs in 
> clean+active state !
>
>
>
> I might miss something obvious but I really don’t know what…. Someone could 
> help me ?
>
> I tried to seek answers among the list mail threads, but no luck, other 
> situation seems different
>
>
>
> Thank you a lot for your help
>
>
>
> Vincent
>
>
>
> # ceph -v
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
>
>
>
> # ceph -s
>
> cluster:
>
> id: ff4c91fb-3c29-4d9f-a26f-467d6b6a712e
>
> health: HEALTH_WARN
>
> Reduced data availability: 200 pgs inactive
>
>
>
> services:
>
> mon: 1 daemons, quorum ip-10-8-66-123.eu-west-2.compute.internal
>
> mgr: ip-10-8-66-123.eu-west-2.compute.internal(active)
>
> osd: 3 osds: 3 up, 3 in
>
>
>
> data:
>
> pools: 2 pools, 200 pgs
>
> objects: 0 objects, 0 B
>
> usage: 3.0 GiB used, 2.9 TiB / 2.9 TiB avail
>
> pgs: 100.000% pgs unknown
>
> 200 unknown
>
>
>
> # ceph osd tree -f json-pretty
>
>
>
> {
>
> "nodes": [
>
> {
>
> "id": -1,
>
> "name": "default",
>
> "type": "root",
>
> "type_id": 10,
>
> "children": [
>
> -3,
>
> -5,
>
> -7
>
> ]
>
> },
>
> {
>
> "id": -7,
>
> "name": "ip-10-8-10-108",
>
> "type": "host",
>
> "type_id": 1,
>
> "pool_weights": {},
>
> "children": [
>
> 2
>
> ]
>
> },
>
> {
>
> "id": 2,
>
> "device_class": "hdd",
>
> "name": "osd.2",
>
> "type": "osd",
>
> "type_id": 0,
>
> "crush_weight": 0.976593,
>
> "depth": 2,
>
> "pool_weights": {},
>
> "exists": 1,
>
> "status": "up",
>
> "reweight": 1.00,
>
> "primary_affinity": 1.00
>
> },
>
> {
>
> "id": -5,
>
> "name": "ip-10-8-22-148",
>
> "type": "host",
>
> "type_id": 1,
>
> "pool_weights": {},
>
> "children": [
>
> 1
>
> ]
>
> },
>
> {
>
> "id": 1,
>
>

[ceph-users] OSDs stuck in preboot with log msgs about "osdmap fullness state needs update"

2019-01-29 Thread Subhachandra Chandra
Hello,

I have a bunch of OSDs stuck in the preboot stage with the following
log messages while recovering from an outage. The following flags are set
on the cluster:

flags nodown,noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub

How do we get these OSDs back to the active state? Or will turning off
nodown or norecover bring them back up?


2019-01-29 19:26:38.866134 7fc7e6682700 -1 osd.116 244652 osdmap fullness
state needs update

2019-01-29 19:26:40.370466 7fc7e6682700 -1 osd.116 244653 osdmap fullness
state needs update

2019-01-29 19:26:41.746553 7fc7e6682700 -1 osd.116 244654 osdmap fullness
state needs update


2019-01-29 19:26:38.934123 7fd91c6bb700 -1 osd.357 244652 osdmap fullness
state needs update

2019-01-29 19:26:40.473567 7fd91c6bb700 -1 osd.357 244653 osdmap fullness
state needs update

2019-01-29 19:26:41.776754 7fd91c6bb700 -1 osd.357 244654 osdmap fullness
state needs update
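For reference, cluster flags like these are cleared with ceph osd unset; whether and when to clear them here depends on the recovery plan, so this is only a sketch:

ceph osd unset nodown
ceph osd unset norecover
ceph -s    # watch whether the OSDs move out of preboot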



Thanks

Chandra



Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-29 Thread Casey Bodley
On Tue, Jan 29, 2019 at 12:24 PM Krishna Verma  wrote:
>
> Hi Ceph Users,
>
>
>
> I need your to fix sync issue in multisite setup.
>
>
>
> I have 2 cluster in different datacenter that we want to use for 
> bidirectional data replication. By followed the documentation 
> http://docs.ceph.com/docs/master/radosgw/multisite/ I have setup the gateway 
> on each site but when I am checking the sync status its getting failed as 
> below:
>
>
>
> Admin node at master :
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin data sync status
>
> ERROR: source zone not specified
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin realm list
>
> {
>
> "default_info": "1102c891-d81c-480e-9487-c9f874287d13",
>
> "realms": [
>
> "georep",
>
> "geodata"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin zonegroup list
>
> read_default_id : 0
>
> {
>
> "default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
>
> "zonegroups": [
>
> "noida"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin zone list
>
> {
>
> "default_info": "71931e0e-1be6-449f-af34-edb4166c4e4a",
>
> "zones": [
>
> "noida1"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$
>
>
>
> [cephuser@vlno-ceph01 cluster]$ cat ceph.conf
>
> [global]
>
> fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
>
> mon_initial_members = vlno-ceph01
>
> mon_host = 172.23.16.67
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.23.16.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [cephuser@vlno-ceph01 cluster]$
>
>
>
> On Master Gateway :
>
>
>
> [cephuser@zabbix-server ~]$ cat /etc/ceph/ceph.conf
>
> [global]
>
> fsid = d52e50a4-ed2e-44cc-aa08-9309bc539a55
>
> mon_initial_members = vlno-ceph01
>
> mon_host = 172.23.16.67
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.23.16.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [client.rgw.zabbix-server]
>
> host = zabbix-server
>
> rgw frontends = "civetweb port=7480"
>
> rgw_zone=noida1
>
> [cephuser@zabbix-server ~]$
>
>
>
>
>
> On Secondary site admin node.
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin realm list
>
> {
>
> "default_info": "1102c891-d81c-480e-9487-c9f874287d13",
>
> "realms": [
>
> "georep"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin zonegroup list
>
> read_default_id : 0
>
> {
>
> "default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
>
> "zonegroups": [
>
> "noida",
>
> "default"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin zone list
>
> {
>
> "default_info": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
>
> "zones": [
>
> "san-jose"
>
> ]
>
> }
>
>
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ cat ceph.conf
>
> [global]
>
> fsid = c626be3a-4536-48b9-8db8-470437052313
>
> mon_initial_members = vlsj-kverma1
>
> mon_host = 172.18.84.131
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.18.84.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
>
>
>
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
> [cephuser@vlsj-kverma1 cluster]$ radosgw-admin data sync status
>
> 2019-01-28 10:33:12.163298 7f11c24c79c0  1 Cannot find zone 
> id=45c690a8-f39c-4b1d-9faf-e0e991ceaaac (name=san-jose), switching to local 
> zonegroup configuration
>
> ERROR: source zone not specified
>
> [cephuser@vlsj-kverma1 cluster]$
>
>
>
> On Secondary site Gateway host:
>
>
>
> [cephuser@zabbix-client ceph]$ cat /etc/ceph/ceph.conf
>
> [global]
>
> fsid = c626be3a-4536-48b9-8db8-470437052313
>
> mon_initial_members = vlsj-kverma1
>
> mon_host = 172.18.84.131
>
> auth_cluster_required = cephx
>
> auth_service_required = cephx
>
> auth_client_required = cephx
>
> # Your network address
>
> public network = 172.18.84.0/24
>
> osd pool default size = 2
>
> rgw_override_bucket_index_max_shards = 100
>
> debug ms = 1
>
> debug rgw = 20
>
> [client.rgw.zabbix-client]
>
> host = zabbix-client
>
> rgw frontends = "civetweb port=7480"
>
> rgw_zone=san-jose
>
>
>
> [cephuser@zabbix-client ceph]$
>
>
>
>
>
>
>
> Appreciate any help in the setup.
>
>
>
> /Krishna
>
>
>

The 'radosgw-admin data sync status' command requires a --source-zone
argument, which is generally the zone name on the opposite cluster.
But you're probably just looking for the 'rado
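For reference, a sketch of the two commands being contrasted, using the zone name from the configs above (adjust per site):

radosgw-admin sync status                              # overall per-zone sync summary
radosgw-admin data sync status --source-zone=noida1    # data sync from one named source zone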

[ceph-users] Best practice for increasing number of pg and pgp

2019-01-29 Thread Albert Yue
Dear Ceph Users,

As the number of OSDs increases in our cluster, we have reached a point where
pg/osd is lower than the recommended value and we want to increase it from 4096
to 8192.

Somebody recommends that this adjustment should be done in multiple stages,
e.g. increasing by 1024 pgs each time. Is this good practice, or should we
increase it to 8192 in one step? Thanks!

Best regards,
Albert


Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-29 Thread Linh Vu
We use the ceph-gentle-split script from https://github.com/cernceph/ceph-scripts
to slowly increase by 16 pgs at a time until we hit the target.
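For reference, the manual equivalent of such a gentle split is raising pg_num/pgp_num in small increments and letting the cluster settle in between; a minimal sketch, with the pool name and step size as placeholders:

ceph osd pool set <pool> pg_num 4112     # one small step (+16)
ceph osd pool set <pool> pgp_num 4112
ceph -s                                  # wait for peering/backfill to finish before the next step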


From: ceph-users  on behalf of Albert Yue 

Sent: Wednesday, 30 January 2019 1:39:40 PM
To: ceph-users
Subject: [ceph-users] Best practice for increasing number of pg and pgp

Dear Ceph Users,

As the number of OSDs increase in our cluster, we reach a point where pg/osd is 
lower than recommend value and we want to increase it from 4096 to 8192.

Somebody recommends that this adjustment should be done in multiple stages, 
e.g. increase 1024 pg each time. Is this a good practice? or should we increase 
it to 8192 in one time. Thanks!

Best regards,
Albert


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-29 Thread Alexandre DERUMIER
Hi,

here are some new results,
from a different osd / different cluster

before the osd restart, latency was between 2-5ms
after the osd restart, it is around 1-1.5ms

http://odisoweb1.odiso.net/cephperf2/bad.txt  (2-5ms)
http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
http://odisoweb1.odiso.net/cephperf2/diff.txt


From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm
wrong.

(I'm using tcmalloc 2.5-2.2)


- Original Message -
From: "Sage Weil" 
To: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 

Sent: Friday, 25 January 2019 10:49:02
Subject: Re: ceph osd commit latency increase over time, until restart

Can you capture a perf top or perf record to see where the CPU time is
going on one of the OSDs with a high latency?

Thanks! 
sage 


On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 

> 
> Hi, 
> 
> I have a strange behaviour of my osd, on multiple clusters, 
> 
> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd 
> export-diff/snapshotdelete each day for backup 
> 
> When the osd are refreshly started, the commit latency is between 0,5-1ms. 
> 
> But overtime, this latency increase slowly (maybe around 1ms by day), until 
> reaching crazy 
> values like 20-200ms. 
> 
> Some example graphs: 
> 
> http://odisoweb1.odiso.net/osdlatency1.png 
> http://odisoweb1.odiso.net/osdlatency2.png 
> 
> All osds have this behaviour, in all clusters. 
> 
> The latency of physical disks is ok. (Clusters are far to be full loaded) 
> 
> And if I restart the osd, the latency come back to 0,5-1ms. 
> 
> That's remember me old tcmalloc bug, but maybe could it be a bluestore memory 
> bug ? 
> 
> Any Hints for counters/logs to check ? 
> 
> 
> Regards, 
> 
> Alexandre 
> 
> 



Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-29 Thread Stefan Priebe - Profihost AG
Hi,

On 30.01.19 at 08:33, Alexandre DERUMIER wrote:
> Hi,
> 
> here some new results,
> different osd/ different cluster
> 
> before osd restart latency was between 2-5ms
> after osd restart is around 1-1.5ms
> 
> http://odisoweb1.odiso.net/cephperf2/bad.txt  (2-5ms)
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
> http://odisoweb1.odiso.net/cephperf2/diff.txt
> 
> From what I see in diff, the biggest difference is in tcmalloc, but maybe I'm 
> wrong.
> (I'm using tcmalloc 2.5-2.2)

Currently I'm in the process of switching back from jemalloc to tcmalloc
as suggested. This report makes me a little nervous about my change.

Also, I'm currently only monitoring latency for filestore osds. Which
exact values out of the daemon do you use for bluestore?

I would like to check if I see the same behaviour.
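For reference, one place such values can be read from is the OSD admin socket; counter names vary a bit between releases, so treat this as a sketch:

ceph osd perf                    # cluster-wide per-OSD commit/apply latency
ceph daemon osd.0 perf dump      # per-daemon counters; see the "osd" op latencies and
                                 # the "bluestore" section (e.g. commit_lat)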

Greets,
Stefan

> 
> - Original Message -
> From: "Sage Weil" 
> To: "aderumier" 
> Cc: "ceph-users" , "ceph-devel" 
> 
> Sent: Friday, 25 January 2019 10:49:02
> Subject: Re: ceph osd commit latency increase over time, until restart
> 
> Can you capture a perf top or perf record to see where teh CPU time is 
> going on one of the OSDs wth a high latency? 
> 
> Thanks! 
> sage 
> 
> 
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
> 
>>
>> Hi, 
>>
>> I have a strange behaviour of my osd, on multiple clusters, 
>>
>> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
>> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd 
>> export-diff/snapshotdelete each day for backup 
>>
>> When the osd are refreshly started, the commit latency is between 0,5-1ms. 
>>
>> But overtime, this latency increase slowly (maybe around 1ms by day), until 
>> reaching crazy 
>> values like 20-200ms. 
>>
>> Some example graphs: 
>>
>> http://odisoweb1.odiso.net/osdlatency1.png 
>> http://odisoweb1.odiso.net/osdlatency2.png 
>>
>> All osds have this behaviour, in all clusters. 
>>
>> The latency of physical disks is ok. (Clusters are far to be full loaded) 
>>
>> And if I restart the osd, the latency come back to 0,5-1ms. 
>>
>> That's remember me old tcmalloc bug, but maybe could it be a bluestore 
>> memory bug ? 
>>
>> Any Hints for counters/logs to check ? 
>>
>>
>> Regards, 
>>
>> Alexandre 
>>
>>
> 


[ceph-users] Question regarding client-network

2019-01-29 Thread Buchberger, Carsten
Hi,

it might be a dumb question - our ceph cluster runs with dedicated client and
cluster networks.

I understand it like this: the client network is the network interface that
client connections come in on (from the mon & osd perspective), regardless
of the source IP address.
So as long as there is IP connectivity between the client and the
client-network IP addresses of our ceph cluster, everything is fine?
Or is the client network on the ceph side some kind of ACL that denies access if
the client does not originate from the defined network? The latter would
be bad ;-)
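For reference, this is what the two settings look like in ceph.conf; the subnets below are placeholders. The public ("client") network is where mons and osds bind their client-facing addresses, while the cluster network carries only OSD-to-OSD replication and heartbeat traffic:

[global]
public network  = 192.0.2.0/24        # client-facing addresses of mons/osds
cluster network = 198.51.100.0/24     # OSD replication/heartbeat traffic only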

Best regards
Carsten Buchberger

