I think these should give you a bit of insight into running large-scale clusters: https://www.youtube.com/watch?v=NdGHE-yq1gU and https://www.youtube.com/watch?v=WpMzAFH6Mc4 . Watch the second video in particular; I think it relates more closely to your problem.
On Mon, Feb 25, 2019, 11:33 M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> We have taken care of all the HW recommendations, but one thing was missed: the
> ceph mons are VMs with a good configuration (4 cores, 64G RAM + 500G disk)...
> Might this ceph-mon configuration cause issues?
>
> On Sat, Feb 23, 2019 at 6:31 AM Anthony D'Atri <a...@dreamsnake.net> wrote:
> >
> > ? Did we start recommending that production mons run on a VM? I'd be
> > very hesitant to do that, though probably some folks do.
> >
> > I can say for sure that in the past (Firefly) I experienced outages
> > related to mons running on HDDs. That was a cluster of 450 HDD OSDs with
> > colo journals and hundreds of RBD clients. Something obscure about running
> > out of "global IDs" and not being able to create new ones fast enough. We
> > had to work around it with a combo of lease settings on the mons and clients,
> > though with Hammer and later I would not expect that exact situation to
> > arise. Still, it left me paranoid about mon DBs and HDDs.
> >
> > -- aad
> >
> > > But the ceph recommendation is to use a VM (a HW node isn't even
> > > required). Will try changing the mon disk to SSD and a HW node.
> > >
> > > On Fri, Feb 22, 2019 at 5:25 PM Darius Kasparavičius <daz...@gmail.com> wrote:
> > >>
> > >> If you're using HDDs for the monitor servers, check their load. It
> > >> might be the issue there.
> > >>
> > >> On Fri, Feb 22, 2019 at 1:50 PM M Ranga Swami Reddy
> > >> <swamire...@gmail.com> wrote:
> > >>>
> > >>> The ceph-mon disk is a 500G HDD (no journals/SSDs). Yes, the mon uses
> > >>> a folder on a FS on a disk.
> > >>>
> > >>> On Fri, Feb 22, 2019 at 5:13 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>
> > >>>> Mon disks don't have journals, they're just a folder on a
> > >>>> filesystem on a disk.
> > >>>>
> > >>>> On Fri, Feb 22, 2019, 6:40 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>
> > >>>>> The ceph mons look fine during the recovery. Using HDDs with SSD
> > >>>>> journals, with the recommended CPU and RAM numbers.
> > >>>>>
> > >>>>> On Fri, Feb 22, 2019 at 4:40 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>>>
> > >>>>>> What about the system stats on your mons during recovery? If they
> > >>>>>> are having a hard time keeping up with requests during a recovery,
> > >>>>>> I could see that impacting client IO. What disks are they running
> > >>>>>> on? CPU? Etc.
> > >>>>>>
> > >>>>>> On Fri, Feb 22, 2019, 6:01 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> The debug settings are at their defaults, like 1/5 and 0/5 for
> > >>>>>>> almost all of them. Shall I try 0 for all debug settings?
> > >>>>>>>
> > >>>>>>> On Wed, Feb 20, 2019 at 9:17 PM Darius Kasparavičius <daz...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello,
> > >>>>>>>>
> > >>>>>>>> Check your CPU usage when you are doing those kinds of operations. We
> > >>>>>>>> had a similar issue where our CPU monitoring was reporting fine (< 40%
> > >>>>>>>> usage), but the load on the nodes was high, mid 60-80. If possible,
> > >>>>>>>> try disabling HT and see the actual CPU usage.
> > >>>>>>>> If you are hitting CPU limits you can try disabling CRC on messages:
> > >>>>>>>> ms_nocrc
> > >>>>>>>> ms_crc_data
> > >>>>>>>> ms_crc_header
> > >>>>>>>>
> > >>>>>>>> And set all your debug messages to 0.
> > >>>>>>>> If you haven't already, you can also lower your recovery settings a little:
> > >>>>>>>> osd recovery max active
> > >>>>>>>> osd max backfills
> > >>>>>>>>
> > >>>>>>>> You can also lower your filestore op threads:
> > >>>>>>>> filestore op threads
> > >>>>>>>>
> > >>>>>>>> If you can, also switch from filestore to bluestore. This will also
> > >>>>>>>> lower your CPU usage. I'm not sure that it is bluestore itself that
> > >>>>>>>> does it, but I'm seeing lower CPU usage when moving to bluestore +
> > >>>>>>>> rocksdb compared to filestore + leveldb.
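
For what it's worth, a rough sketch of how the knobs Darius lists above could look in ceph.conf -- the values below are only placeholders to illustrate the settings, not tuned recommendations, and ms_nocrc may not exist on newer releases, so check what your version actually supports first:

    [osd]
        # quiet the loggers while chasing CPU pressure
        debug_osd = 0/0
        debug_ms = 0/0
        # message checksums off (only if you are CPU-bound and accept the trade-off)
        ms_crc_data = false
        ms_crc_header = false
        # keep recovery/backfill from starving client IO
        osd_max_backfills = 1
        osd_recovery_max_active = 1
        # filestore only
        filestore_op_threads = 2

The recovery/backfill values can also be changed at runtime without restarting the OSDs, e.g.:

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
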
> > >>>>>>>> On Wed, Feb 20, 2019 at 4:27 PM M Ranga Swami Reddy
> > >>>>>>>> <swamire...@gmail.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> That's expected from Ceph by design. But in our case, we are using
> > >>>>>>>>> all the recommendations like the rack failure domain, a replication
> > >>>>>>>>> n/w, etc., and still face client IO performance issues when one OSD
> > >>>>>>>>> is down.
> > >>>>>>>>>
> > >>>>>>>>> On Tue, Feb 19, 2019 at 10:56 PM David Turner <drakonst...@gmail.com> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> With a RACK failure domain, you should be able to have an entire
> > >>>>>>>>>> rack powered down without noticing any major impact on the clients.
> > >>>>>>>>>> I regularly take down OSDs and nodes for maintenance and upgrades
> > >>>>>>>>>> without seeing any problems with client IO.
> > >>>>>>>>>>
> > >>>>>>>>>> On Tue, Feb 12, 2019 at 5:01 AM M Ranga Swami Reddy <swamire...@gmail.com> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hello - I have a couple of questions on ceph cluster stability,
> > >>>>>>>>>>> even though we follow all the recommendations below:
> > >>>>>>>>>>> - Having separate replication n/w and data n/w
> > >>>>>>>>>>> - RACK is the failure domain
> > >>>>>>>>>>> - Using SSDs for journals (1:4 ratio)
> > >>>>>>>>>>>
> > >>>>>>>>>>> Q1 - If one OSD goes down, cluster IO drops drastically and customer apps are impacted.
> > >>>>>>>>>>> Q2 - What is the stability ratio, i.e. with the above setup, does the
> > >>>>>>>>>>> ceph cluster stay in a workable condition if one OSD or one node goes down, etc.?
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks
> > >>>>>>>>>>> Swami
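
On the original question of one OSD taking client IO down: it may also be worth double-checking that the pools really do use the rack failure domain David describes, and throttling recovery while the OSD is out. A minimal sketch (pool and rule names are placeholders, and the exact commands can vary by release):

    # which CRUSH rule does the pool use, and is its failure domain really "rack"?
    ceph osd pool get <pool-name> crush_rule
    ceph osd crush rule dump <rule-name>    # look for "type": "rack" in the chooseleaf step

    # ease recovery pressure while the cluster heals, then revert afterwards
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_sleep 0.1'
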
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com