I think these should give you a bit of insight into running large-scale clusters: https://www.youtube.com/watch?v=NdGHE-yq1gU and https://www.youtube.com/watch?v=WpMzAFH6Mc4 . Watch the second video in particular; I think it relates more closely to your problem.
On Mon, Feb 25, 2019, 11:33 M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> We have taken care of all the HW recommendations, but one thing was missed: the
> ceph mons are VMs with a good configuration (4 cores, 64G RAM + 500G disk)...
> Might this ceph-mon configuration cause issues?
>
> On Sat, Feb 23, 2019 at 6:31 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
> >
> > ? Did we start recommending that production mons run on a VM? I'd be
> > very hesitant to do that, though probably some folks do.
> >
> > I can say for sure that in the past (Firefly) I experienced outages
> > related to mons running on HDDs. That was a cluster of 450 HDD OSDs with
> > colo journals and hundreds of RBD clients. Something obscure about running
> > out of "global IDs" and not being able to create new ones fast enough. We
> > had to work around it with a combo of lease settings on the mons and clients,
> > though with Hammer and later I would not expect that exact situation to
> > arise. Still, it left me paranoid about mon DBs and HDDs.
> >
> > -- aad
> >
> > > But the ceph recommendation is to use a VM (a HW node isn't even
> > > required). Will try changing the mon disk to SSD and a HW node.
> > >
> > > On Fri, Feb 22, 2019 at 5:25 PM Darius Kasparavičius <daz...@gmail.com> wrote:
> > >>
> > >> If you're using HDDs for the monitor servers, check their load. It
> > >> might be the issue there.
> > >>
> > >> On Fri, Feb 22, 2019 at 1:50 PM M Ranga Swami Reddy
> > >> <swamire...@gmail.com> wrote:
> > >>>
> > >>> The ceph-mon disk is a 500G HDD (no journals/SSDs). Yes, the mon uses
> > >>> a folder on a FS on a disk.
> > >>>
> > >>> On Fri, Feb 22, 2019 at 5:13 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>
> > >>>> Mon disks don't have journals, they're just a folder on a
> > >>>> filesystem on a disk.
> > >>>>
> > >>>> On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>
> > >>>>> The ceph mons look fine during the recovery. Using HDDs with SSD
> > >>>>> journals, with the recommended CPU and RAM numbers.
> > >>>>>
> > >>>>> On Fri, Feb 22, 2019 at 4:40 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>>>
> > >>>>>> What about the system stats on your mons during recovery? If they
> > >>>>>> are having a hard time keeping up with requests during a recovery,
> > >>>>>> I could see that impacting client IO. What disks are they running
> > >>>>>> on? CPU? Etc.
> > >>>>>>
> > >>>>>> On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> The debug settings are at their defaults, like 1/5 and 0/5 for
> > >>>>>>> almost all of them. Shall I try 0 for all debug settings?
> > >>>>>>>
> > >>>>>>> On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius <daz...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello,
> > >>>>>>>>
> > >>>>>>>> Check your CPU usage when you are doing those kinds of operations. We
> > >>>>>>>> had a similar issue where our CPU monitoring was reporting fine (< 40%
> > >>>>>>>> usage), but the load on the nodes was high, mid 60-80. If possible,
> > >>>>>>>> try disabling HT and see the actual CPU usage.
> > >>>>>>>> If you are hitting CPU limits you can try disabling CRC on messages:
> > >>>>>>>> ms_nocrc
> > >>>>>>>> ms_crc_data
> > >>>>>>>> ms_crc_header
> > >>>>>>>>
> > >>>>>>>> And set all your debug messages to 0.
> > >>>>>>>> If you haven't already, you can also lower your recovery settings a little:
> > >>>>>>>> osd recovery max active
> > >>>>>>>> osd max backfills
> > >>>>>>>>
> > >>>>>>>> You can also lower your filestore op threads:
> > >>>>>>>> filestore op threads
> > >>>>>>>>
> > >>>>>>>> If you can, also switch from filestore to bluestore. This will also
> > >>>>>>>> lower your CPU usage. I'm not sure that it is bluestore itself that
> > >>>>>>>> does it, but I'm seeing lower CPU usage when moving to bluestore +
> > >>>>>>>> rocksdb compared to filestore + leveldb.
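
For what it's worth, a rough sketch of how the knobs Darius lists above could look in ceph.conf -- the values below are only placeholders to illustrate the settings, not tuned recommendations, and ms_nocrc may not exist on newer releases, so check what your version actually supports first:

    [osd]
        # quiet the loggers while chasing CPU pressure
        debug_osd = 0/0
        debug_ms = 0/0
        # message checksums off (only if you are CPU-bound and accept the trade-off)
        ms_crc_data = false
        ms_crc_header = false
        # keep recovery/backfill from starving client IO
        osd_max_backfills = 1
        osd_recovery_max_active = 1
        # filestore only
        filestore_op_threads = 2

The recovery/backfill values can also be changed at runtime without restarting the OSDs, e.g.:

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
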
> > >>>>>>>> On Wed, Feb 20, 2019 at 4:27 PM M Ranga Swami Reddy
> > >>>>>>>> <swamire...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> That's expected from Ceph by design. But in our case, we are using
> > >>>>>>>>> all the recommendations like the rack failure domain, a replication
> > >>>>>>>>> n/w, etc., and still face client IO performance issues when one OSD
> > >>>>>>>>> is down.
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Feb 19, 2019 at 10:56 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> With a RACK failure domain, you should be able to have an entire
> > >>>>>>>>>> rack powered down without noticing any major impact on the clients.
> > >>>>>>>>>> I regularly take down OSDs and nodes for maintenance and upgrades
> > >>>>>>>>>> without seeing any problems with client IO.
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Feb 12, 2019 at 5:01 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hello - I have a couple of questions on ceph cluster stability,
> > >>>>>>>>>>> even though we follow all the recommendations below:
> > >>>>>>>>>>> - Having separate replication n/w and data n/w
> > >>>>>>>>>>> - RACK is the failure domain
> > >>>>>>>>>>> - Using SSDs for journals (1:4 ratio)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Q1 - If one OSD goes down, cluster IO drops drastically and customer apps are impacted.
> > >>>>>>>>>>> Q2 - What is the stability ratio, i.e. with the above setup, does the
> > >>>>>>>>>>> ceph cluster stay in a workable condition if one OSD or one node goes down, etc.?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks
> > >>>>>>>>>>> Swami
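
On the original question of one OSD taking client IO down: it may also be worth double-checking that the pools really do use the rack failure domain David describes, and throttling recovery while the OSD is out. A minimal sketch (pool and rule names are placeholders, and the exact commands can vary by release):

    # which CRUSH rule does the pool use, and is its failure domain really "rack"?
    ceph osd pool get <pool-name> crush_rule
    ceph osd crush rule dump <rule-name>    # look for "type": "rack" in the chooseleaf step

    # ease recovery pressure while the cluster heals, then revert afterwards
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_sleep 0.1'
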
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com