I ran an experiment with 1GB memory per OSD using Bluestore. 12.2.2 made a
big difference.

In addition, have a look at your maximum object size. It looks like you will
see a jump in memory usage whenever a particular OSD happens to be the
primary for many objects being written in parallel. In our case, reducing the
number of clients reduced memory requirements; reducing the maximum object
size should also lower the memory requirements of the OSD daemon.
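Concretely, the knobs involved are along these lines (a rough sketch; the
values are only illustrative and the option names/defaults should be checked
against the docs for your release):

    [osd]
        # cap on the size of a single RADOS object the OSD will accept
        # (Luminous defaults to 128MiB); smaller objects mean less data
        # buffered per in-flight write on a busy primary
        osd max object size = 33554432
        # limit how much in-flight client message data an OSD will buffer
        osd client message size cap = 67108864
        # keep the BlueStore cache small on memory-constrained nodes
        bluestore cache size = 104857600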

Subhachandra



On Sun, Dec 10, 2017 at 1:01 PM, <ceph-users-requ...@lists.ceph.com> wrote:

> Send ceph-users mailing list submissions to
>         ceph-users@lists.ceph.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> or, via email, send a message with subject or body 'help' to
>         ceph-users-requ...@lists.ceph.com
>
> You can reach the person managing the list at
>         ceph-users-ow...@lists.ceph.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of ceph-users digest..."
>
>
> Today's Topics:
>
>    1. Re: RBD+LVM -> iSCSI -> VMWare (Donny Davis)
>    2. Re: RBD+LVM -> iSCSI -> VMWare (Brady Deetz)
>    3. Re: RBD+LVM -> iSCSI -> VMWare (Donny Davis)
>    4. Re: RBD+LVM -> iSCSI -> VMWare (Brady Deetz)
>    5. The way to minimize osd memory usage? (shadow_lin)
>    6. Re: The way to minimize osd memory usage? (Konstantin Shalygin)
>    7. Re: The way to minimize osd memory usage? (shadow_lin)
>    8. Random checksum errors (bluestore on Luminous) (Martin Preuss)
>    9. Re: The way to minimize osd memory usage? (David Turner)
>   10. what's the maximum number of OSDs per OSD server? (Igor Mendelev)
>   11. Re: what's the maximum number of OSDs per OSD server? (Nick Fisk)
>   12. Re: what's the maximum number of OSDs per OSD server?
>       (Igor Mendelev)
>   13. Re: RBD+LVM -> iSCSI -> VMWare (Heðin Ejdesgaard Møller)
>   14. Re: Random checksum errors (bluestore on Luminous) (Martin Preuss)
>   15. Re: what's the maximum number of OSDs per OSD server? (Nick Fisk)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 10 Dec 2017 00:26:39 +0000
> From: Donny Davis <do...@fortnebula.com>
> To: Brady Deetz <bde...@gmail.com>
> Cc: Aaron Glenn <agl...@laureateinstitute.org>, ceph-users
>         <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare
> Message-ID:
>         <CAMHmko_35Y0pRqFp89MLJCi+6Uv9BMtF=Z71pkq8YDhDR0E3Mw@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Just curious but why not just use a hypervisor with rbd support? Are there
> VMware specific features you are reliant on?
>
> On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com> wrote:
>
> > I'm testing using RBD as VMWare datastores. I'm currently testing with
> > krbd+LVM on a tgt target hosted on a hypervisor.
> >
> > My Ceph cluster is HDD backed.
> >
> > In order to help with write latency, I added an SSD drive to my
> hypervisor
> > and made it a writeback cache for the rbd via LVM. So far I've managed to
> > smooth out my 4k write latency and have some pleasing results.
> >
> > Architecturally, my current plan is to deploy an iSCSI gateway on each
> > hypervisor hosting that hypervisor's own datastore.
> >
> > Does anybody have any experience with this kind of configuration,
> > especially with regard to LVM writeback caching combined with RBD?
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ------------------------------
>
> Message: 2
> Date: Sat, 9 Dec 2017 18:56:53 -0600
> From: Brady Deetz <bde...@gmail.com>
> To: Donny Davis <do...@fortnebula.com>
> Cc: Aaron Glenn <agl...@laureateinstitute.org>, ceph-users
>         <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare
> Message-ID:
>         <CADU_9qV6VVVbzxdbEBCofvON-Or9sajS-E0j_22Wf=RdRycBwQ@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> We have over 150 VMs running in vmware. We also have 2PB of Ceph for
> filesystem. With our vmware storage aging and not providing the IOPs we
> need, we are considering and hoping to use ceph. Ultimately, yes we will
> move to KVM, but in the short term, we probably need to stay on VMware.
>
> On Dec 9, 2017 6:26 PM, "Donny Davis" <do...@fortnebula.com> wrote:
>
> > Just curious but why not just use a hypervisor with rbd support? Are
> there
> > VMware specific features you are reliant on?
> >
> > On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com> wrote:
> >
> >> I'm testing using RBD as VMWare datastores. I'm currently testing with
> >> krbd+LVM on a tgt target hosted on a hypervisor.
> >>
> >> My Ceph cluster is HDD backed.
> >>
> >> In order to help with write latency, I added an SSD drive to my
> >> hypervisor and made it a writeback cache for the rbd via LVM. So far
> I've
> >> managed to smooth out my 4k write latency and have some pleasing
> results.
> >>
> >> Architecturally, my current plan is to deploy an iSCSI gateway on each
> >> hypervisor hosting that hypervisor's own datastore.
> >>
> >> Does anybody have any experience with this kind of configuration,
> >> especially with regard to LVM writeback caching combined with RBD?
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >
>
> ------------------------------
>
> Message: 3
> Date: Sun, 10 Dec 2017 01:09:39 +0000
> From: Donny Davis <do...@fortnebula.com>
> To: Brady Deetz <bde...@gmail.com>
> Cc: Aaron Glenn <agl...@laureateinstitute.org>, ceph-users
>         <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare
> Message-ID:
>         <CAMHmko9bvQEcsPU3_crLeGkiiwtz5sY-WgGHTe3T2UjBqg4xPA@mail.gmail.
> com>
> Content-Type: text/plain; charset="utf-8"
>
> What I am getting at is that instead of sinking a bunch of time into this
> band-aid, why not sink that time into a hypervisor migration? It seems well
> timed if you ask me.
>
> There are even tools to make that migration easier
>
> http://libguestfs.org/virt-v2v.1.html
>
> You should ultimately move your hypervisor instead of building a one-off
> case for Ceph. Ceph works really well if you stay inside the box. So does
> KVM. They work like gangbusters together.
>
> I know that doesn't really answer your OP, but this is what I would do.
>
> ~D
>
> On Sat, Dec 9, 2017 at 7:56 PM Brady Deetz <bde...@gmail.com> wrote:
>
> > We have over 150 VMs running in vmware. We also have 2PB of Ceph for
> > filesystem. With our vmware storage aging and not providing the IOPs we
> > need, we are considering and hoping to use ceph. Ultimately, yes we will
> > move to KVM, but in the short term, we probably need to stay on VMware.
> > On Dec 9, 2017 6:26 PM, "Donny Davis" <do...@fortnebula.com> wrote:
> >
> >> Just curious but why not just use a hypervisor with rbd support? Are
> >> there VMware specific features you are reliant on?
> >>
> >> On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com> wrote:
> >>
> >>> I'm testing using RBD as VMWare datastores. I'm currently testing with
> >>> krbd+LVM on a tgt target hosted on a hypervisor.
> >>>
> >>> My Ceph cluster is HDD backed.
> >>>
> >>> In order to help with write latency, I added an SSD drive to my
> >>> hypervisor and made it a writeback cache for the rbd via LVM. So far
> I've
> >>> managed to smooth out my 4k write latency and have some pleasing
> results.
> >>>
> >>> Architecturally, my current plan is to deploy an iSCSI gateway on each
> >>> hypervisor hosting that hypervisor's own datastore.
> >>>
> >>> Does anybody have any experience with this kind of configuration,
> >>> especially with regard to LVM writeback caching combined with RBD?
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
>
> ------------------------------
>
> Message: 4
> Date: Sat, 9 Dec 2017 19:17:01 -0600
> From: Brady Deetz <bde...@gmail.com>
> To: Donny Davis <do...@fortnebula.com>
> Cc: Aaron Glenn <agl...@laureateinstitute.org>, ceph-users
>         <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare
> Message-ID:
>         <CADU_9qXgqBODJc4pFGUoZuCeQfLk6d3nbhoKa4xxPKKuB6O2VA@mail.gmail.
> com>
> Content-Type: text/plain; charset="utf-8"
>
> That's not a bad position. I have concerns with what I'm proposing, so a
> hypervisor migration may actually bring less risk than a storage
> abomination.
>
> On Dec 9, 2017 7:09 PM, "Donny Davis" <do...@fortnebula.com> wrote:
>
> > What I am getting at is that instead of sinking a bunch of time into this
> > bandaid, why not sink that time into a hypervisor migration. Seems well
> > timed if you ask me.
> >
> > There are even tools to make that migration easier
> >
> > http://libguestfs.org/virt-v2v.1.html
> >
> > You should ultimately move your hypervisor instead of building a one off
> > case for ceph. Ceph works really well if you stay inside the box. So does
> > KVM. They work like Gang Buster's together.
> >
> > I know that doesn't really answer your OP, but this is what I would do.
> >
> > ~D
> >
> > On Sat, Dec 9, 2017 at 7:56 PM Brady Deetz <bde...@gmail.com> wrote:
> >
> >> We have over 150 VMs running in vmware. We also have 2PB of Ceph for
> >> filesystem. With our vmware storage aging and not providing the IOPs we
> >> need, we are considering and hoping to use ceph. Ultimately, yes we will
> >> move to KVM, but in the short term, we probably need to stay on VMware.
> >> On Dec 9, 2017 6:26 PM, "Donny Davis" <do...@fortnebula.com> wrote:
> >>
> >>> Just curious but why not just use a hypervisor with rbd support? Are
> >>> there VMware specific features you are reliant on?
> >>>
> >>> On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com> wrote:
> >>>
> >>>> I'm testing using RBD as VMWare datastores. I'm currently testing with
> >>>> krbd+LVM on a tgt target hosted on a hypervisor.
> >>>>
> >>>> My Ceph cluster is HDD backed.
> >>>>
> >>>> In order to help with write latency, I added an SSD drive to my
> >>>> hypervisor and made it a writeback cache for the rbd via LVM. So far
> I've
> >>>> managed to smooth out my 4k write latency and have some pleasing
> results.
> >>>>
> >>>> Architecturally, my current plan is to deploy an iSCSI gateway on each
> >>>> hypervisor hosting that hypervisor's own datastore.
> >>>>
> >>>> Does anybody have any experience with this kind of configuration,
> >>>> especially with regard to LVM writeback caching combined with RBD?
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users@lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>
>
> ------------------------------
>
> Message: 5
> Date: Sun, 10 Dec 2017 11:35:33 +0800
> From: "shadow_lin"<shadow_...@163.com>
> To: "ceph-users"<ceph-users@lists.ceph.com>
> Subject: [ceph-users] The way to minimize osd memory usage?
> Message-ID: <229639cd.27d.1603e7dff17.coremail.shadow_...@163.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi All,
> I am testing Ceph Luminous (12.2.1-249-g42172a4
> (42172a443183ffe6b36e85770e53fe678db293bf)) on ARM servers.
> Each ARM server has a two-core 1.4GHz CPU and 2GB of RAM, and I am running 2
> OSDs per server on 2x8TB (or 2x10TB) HDDs.
> I am now constantly hitting OOM problems. I have tried upgrading Ceph (to fix
> the OSD memory leak) and lowering the BlueStore cache settings; the OOM
> problems got better but still occur regularly.
>
> I am hoping someone can give me some advice on the following questions.
>
> Is it impossible to run Ceph on this hardware configuration, or is there
> some tuning that could solve the problem (even at the cost of some
> performance)?
>
> Is it a good idea to use RAID0 to combine the 2 HDDs into one device, so that
> I only need to run one OSD and save some memory?
>
> How is the memory usage of an OSD related to the size of its HDD?
>
>
>
>
> PS: my ceph.conf BlueStore cache settings:
> [osd]
>         bluestore_cache_size = 104857600
>         bluestore_cache_kv_max = 67108864
>         osd client message size cap = 67108864
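>
> (That is a 100MiB BlueStore cache, with the key/value portion of the cache
> and the buffered client message data each capped at 64MiB.)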
>
>
>
> 2017-12-10
>
>
>
> lin.yunfan
>
> ------------------------------
>
> Message: 6
> Date: Sun, 10 Dec 2017 11:29:23 +0700
> From: Konstantin Shalygin <k0...@k0ste.ru>
> To: ceph-users@lists.ceph.com
> Cc: shadow_lin <shadow_...@163.com>
> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> Message-ID: <1836996d-95cb-4834-d202-c61502089...@k0ste.ru>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> > I am testing running ceph luminous(12.2.1-249-g42172a4 (
> 42172a443183ffe6b36e85770e53fe678db293bf) on ARM server.
> Try the new 12.2.2 - this release should fix the memory issues with BlueStore.
>
>
>
> ------------------------------
>
> Message: 7
> Date: Sun, 10 Dec 2017 12:33:36 +0800
> From: "shadow_lin"<shadow_...@163.com>
> To: "Konstantin Shalygin"<k0...@k0ste.ru>,
>         "ceph-users"<ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> Message-ID: <51e6e209.4ac350.1603eb32924.coremail.shadow_...@163.com>
> Content-Type: text/plain; charset="utf-8"
>
> The 12.2.1 build we are running (12.2.1-249-g42172a4
> (42172a443183ffe6b36e85770e53fe678db293bf)) already includes the memory fix,
> and we are working on upgrading to the 12.2.2 release to see whether there is
> any further improvement.
>
> 2017-12-10
>
>
> lin.yunfan
>
>
>
> From: Konstantin Shalygin <k0...@k0ste.ru>
> Sent: 2017-12-10 12:29
> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> To: "ceph-users"<ceph-users@lists.ceph.com>
> Cc: "shadow_lin"<shadow_...@163.com>
>
> > I am testing running ceph luminous(12.2.1-249-g42172a4 (
> 42172a443183ffe6b36e85770e53fe678db293bf) on ARM server.
> Try new 12.2.2 - this release should fix memory issues with Bluestore.
>
> ------------------------------
>
> Message: 8
> Date: Sun, 10 Dec 2017 14:34:03 +0100
> From: Martin Preuss <mar...@aquamaniac.de>
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] Random checksum errors (bluestore on Luminous)
> Message-ID: <4e50b57f-5881-e806-bb10-0d1e16e05...@aquamaniac.de>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I'm new to Ceph. I started a Ceph cluster from scratch on Debian 9,
> consisting of 3 hosts, each host has 3-4 OSDs (using 4TB hdds, currently
> totalling 10 hdds).
>
> Right from the start I always received random scrub errors telling me
> that some checksums didn't match the expected value, fixable with "ceph
> pg repair".
>
> I looked at the ceph-osd logfiles on each of the hosts and compared with
> the corresponding syslogs. I never found any hardware error, so there
> was no problem reading or writing a sector hardware-wise. Also there was
> never any other suspicious syslog entry around the time of checksum
> error reporting.
>
> When I looked at the checksum error entries I found that the reported
> bad checksum always was "0x6706be76".
>
> Could someone please tell me where to look further for the source of the
> problem?
>
> I appended an excerpt of the osd logs.
>
>
> Kind regards
> Martin
>
>
> --
> "Things are only impossible until they're not"
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: ceph-osd.log
> Type: text/x-log
> Size: 4645 bytes
> Desc: not available
> URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> attachments/20171210/460992fe/attachment-0001.bin>
>
> ------------------------------
>
> Message: 9
> Date: Sun, 10 Dec 2017 15:05:16 +0000
> From: David Turner <drakonst...@gmail.com>
> To: shadow_lin <shadow_...@163.com>
> Cc: Konstantin Shalygin <k0...@k0ste.ru>, ceph-users
>         <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] The way to minimize osd memory usage?
> Message-ID:
>         <CAN-GepK8nyqRzKTTo4AVmnTqLYuXLCcWdL_XC1LaGBPgQozQ_g@mail.gmail.
> com>
> Content-Type: text/plain; charset="utf-8"
>
> The docs recommend 1GB of RAM per TB of OSD. I saw people asking whether this
> was still accurate for BlueStore, and the answer was that it is even more
> true for BlueStore than for Filestore. There might be a way to get this
> working at the cost of performance. I would look at Linux kernel memory
> settings as much as at Ceph and BlueStore settings. Cache pressure is one
> that comes to mind where a more aggressive setting might help.
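>
> To give a concrete (untested, purely illustrative) example of the kind of
> knobs I mean, something along these lines in /etc/sysctl.conf:
>
>     # reclaim dentry/inode caches more aggressively than the default of 100
>     vm.vfs_cache_pressure = 200
>     # keep a larger free-memory reserve so reclaim starts before the OOM killer
>     vm.min_free_kbytes = 262144
>
> applied with "sysctl -p"; the right values depend on the box.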
>
> On Sat, Dec 9, 2017, 11:33 PM shadow_lin <shadow_...@163.com> wrote:
>
> > The 12.2.1(12.2.1-249-g42172a4 (42172a443183ffe6b36e85770e53fe
> 678db293bf)
> > we are running is with the memory issues fix.And we are working on to
> > upgrade to 12.2.2 release to see if there is any furthermore improvement.
> >
> > 2017-12-10
> > ------------------------------
> > lin.yunfan
> > ------------------------------
> >
> > *From:* Konstantin Shalygin <k0...@k0ste.ru>
> > *Sent:* 2017-12-10 12:29
> > *Subject:* Re: [ceph-users] The way to minimize osd memory usage?
> > *To:* "ceph-users"<ceph-users@lists.ceph.com>
> > *Cc:* "shadow_lin"<shadow_...@163.com>
> >
> >
> >
> > > I am testing running ceph luminous(12.2.1-249-g42172a4 (
> 42172a443183ffe6b36e85770e53fe678db293bf) on ARM server.
> > Try new 12.2.2 - this release should fix memory issues with Bluestore.
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
> ------------------------------
>
> Message: 10
> Date: Sun, 10 Dec 2017 10:38:53 -0500
> From: Igor Mendelev <igm...@gmail.com>
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] what's the maximum number of OSDs per OSD
>         server?
> Message-ID:
>         <CAKtyfj_0NKQmPNO2C6CuU47xZhM_Xagm2WF4yLUdUhfSw2G7Qg@mail.
> gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Given that servers with 64 CPU cores (128 threads @ 2.7GHz), up to 2TB of
> RAM, and 12TB HDDs are easily available and somewhat reasonably priced, I
> wonder what the maximum number of OSDs per OSD server is (if using 10TB or
> 12TB HDDs), and how much RAM such a server really requires if its total
> storage capacity is on the order of 1,000+ TB - is it still 1GB of RAM per TB
> of HDD, or could it be less during normal operations (extended with NVMe SSD
> swap space for extra headroom during recovery)?
>
> Are there any known scalability limits in Ceph Luminous (12.2.2 with
> BlueStore) and/or Linux that would keep such a high-capacity OSD server from
> scaling well (using sequential IO speed per HDD as a metric)?
>
> Thanks.
>
> ------------------------------
>
> Message: 11
> Date: Sun, 10 Dec 2017 16:17:40 -0000
> From: Nick Fisk <n...@fisk.me.uk>
> To: 'Igor Mendelev' <igm...@gmail.com>, ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] what's the maximum number of OSDs per OSD
>         server?
> Message-ID: <001d01d371d2$66f06de0$34d149a0$@fisk.me.uk>
> Content-Type: text/plain; charset="utf-8"
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Igor Mendelev
> Sent: 10 December 2017 15:39
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] what's the maximum number of OSDs per OSD server?
>
>
>
> Given that servers with 64 CPU cores (128 threads @ 2.7GHz) and up to 2TB
> RAM - as well as 12TB HDDs - are easily available and somewhat reasonably
> priced I wonder what's the maximum number of OSDs per OSD server (if using
> 10TB or 12TB HDDs) and how much RAM does it really require if total storage
> capacity for such OSD server is on the order of 1,000+ TB - is it still 1GB
> RAM per TB of HDD or it could be less (during normal operations - and
> extended with NVMe SSDs swap space for extra space during recovery)?
>
>
>
> Are there any known scalability limits in Ceph Luminous (12.2.2 with
> BlueStore) and/or Linux that'll make such high capacity OSD server not
> scale well (using sequential IO speed per HDD as a metric)?
>
>
>
> Thanks.
>
>
>
> How many total OSDs will you have? If you are planning on having
> thousands, then dense nodes might make sense. Otherwise you are leaving
> yourself open to having a small number of very large nodes, which will likely
> shoot you in the foot further down the line. Also don't forget that, unless
> this is purely for archiving, you will likely need to scale up the networking
> per node; 2x10G won't cut it when you have 10-20+ disks per node.
>
>
>
> With BlueStore, you are probably looking at around 2-3GB of RAM per OSD,
> so say 4GB to be on the safe side.
>
> 7.2k HDDs will likely only use a small proportion of a CPU core due to
> their limited IO potential. I would imagine that even with 90-bay JBODs,
> you will run into physical limitations before you hit CPU ones.
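>
> As a rough worked example (my arithmetic, using the 4GB-per-OSD safety
> margin above): a 60-bay node of 12TB drives would want on the order of
> 60 x 4GB = 240GB of RAM, plus headroom for the OS and page cache.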
>
>
>
> Without knowing your exact requirements, I would suggest that a larger
> number of smaller nodes might be a better idea. If you choose your
> hardware right, you can often get the cost down to comparable levels by not
> going with top-of-the-range kit, i.e. Xeon E3s or Ds vs dual-socket E5s.
>
>
> ------------------------------
>
> Message: 12
> Date: Sun, 10 Dec 2017 12:37:05 -0500
> From: Igor Mendelev <igm...@gmail.com>
> To: n...@fisk.me.uk, ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] what's the maximum number of OSDs per OSD
>         server?
> Message-ID:
>         <CAKtyfj-zCAPpPANb-5S6gXet+XYX33HhOC_65FP6HrTWBKFfDw@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> The expected number of nodes for the initial setup is 10-15, with 1,500-2,000
> OSDs.
>
> Networking is planned to be 2x 100GbE or 2x dual 50GbE in x16 slots (per OSD
> node).
>
> JBODs are to be connected with 3-4 x8 SAS3 HBAs (4 4x SAS3 ports each).
>
> The choice of hardware takes (non-trivial) per-server software licensing
> costs into account, so small (12-24 HDD) nodes are certainly not optimal
> regardless of CPU cost (which is estimated to be below 10% of the total cost
> in the setup I'm currently considering).
>
> EC (4+2 or 8+3, etc. - TBD), not 3x replication, is planned to be used for
> most of the storage space.
>
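> (Rough capacity arithmetic for those numbers, as a sanity check: 4+2 EC
> stores 1.5x the user data and 8+3 stores 1.375x, so 1,500 x 12TB = 18PB raw
> works out to roughly 12PB usable at 4+2, or about 13PB at 8+3, before
> leaving free space for recovery.)
>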
> The main applications are expected to be archiving and sequential access to
> large (multi-GB) files/objects.
>
> Nick, which physical limitations are you referring to?
>
> Thanks.
>
> On Sun, Dec 10, 2017 at 11:17 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>
> > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> > Of *Igor Mendelev
> > *Sent:* 10 December 2017 15:39
> > *To:* ceph-users@lists.ceph.com
> > *Subject:* [ceph-users] what's the maximum number of OSDs per OSD server?
> >
> >
> >
> > Given that servers with 64 CPU cores (128 threads @ 2.7GHz) and up to 2TB
> > RAM - as well as 12TB HDDs - are easily available and somewhat reasonably
> > priced I wonder what's the maximum number of OSDs per OSD server (if
> using
> > 10TB or 12TB HDDs) and how much RAM does it really require if total
> storage
> > capacity for such OSD server is on the order of 1,000+ TB - is it still
> 1GB
> > RAM per TB of HDD or it could be less (during normal operations - and
> > extended with NVMe SSDs swap space for extra space during recovery)?
> >
> >
> >
> > Are there any known scalability limits in Ceph Luminous (12.2.2 with
> > BlueStore) and/or Linux that'll make such high capacity OSD server not
> > scale well (using sequential IO speed per HDD as a metric)?
> >
> >
> >
> > Thanks.
> >
> >
> >
> > How many total OSDs will you have? If you are planning on having
> > thousands, then dense nodes might make sense. Otherwise you are leaving
> > yourself open to having a small number of very large nodes, which will
> > likely shoot you in the foot further down the line. Also don't forget that,
> > unless this is purely for archiving, you will likely need to scale up the
> > networking per node; 2x10G won't cut it when you have 10-20+ disks per node.
> >
> >
> >
> > With BlueStore, you are probably looking at around 2-3GB of RAM per OSD,
> > so say 4GB to be on the safe side.
> >
> > 7.2k HDDs will likely only use a small proportion of a CPU core due to
> > their limited IO potential. I would imagine that even with 90-bay JBODs,
> > you will run into physical limitations before you hit CPU ones.
> >
> >
> >
> > Without knowing your exact requirements, I would suggest that a larger
> > number of smaller nodes might be a better idea. If you choose your
> > hardware right, you can often get the cost down to comparable levels by
> > not going with top-of-the-range kit, i.e. Xeon E3s or Ds vs dual-socket E5s.
> >
>
> ------------------------------
>
> Message: 13
> Date: Sun, 10 Dec 2017 17:38:30 +0000
> From: Heðin Ejdesgaard Møller <h...@synack.fo>
> To: Brady Deetz <bde...@gmail.com>, Donny Davis <do...@fortnebula.com>
> Cc: Aaron Glenn <agl...@laureateinstitute.org>, ceph-users
>         <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] RBD+LVM -> iSCSI -> VMWare
> Message-ID: <1512927510.642.70.ca...@synack.fo>
> Content-Type: text/plain; charset="UTF-8"
>
>
> Another option is to utilize the iSCSI gateway provided in 12.2:
> http://docs.ceph.com/docs/master/rbd/iscsi-overview/
>
> Benefits:
> You can EOL your old SAN without having to simultaneously migrate to
> another hypervisor.
> Any infrastructure that ties in to vSphere is unaffected (Ceph is just
> another set of datastores).
> If you have the appropriate VMware licenses etc., the move to Ceph
> can be done without any downtime.
>
> The drawback, from my tests using ceph-12.2-latest and ESXi 6.5, is that you
> get around a 30% performance penalty and higher latency compared to a direct
> rbd mount.
>
>
> On ley, 2017-12-09 at 19:17 -0600, Brady Deetz wrote:
> > That's not a bad position. I have concerns with what I'm proposing, so a
> hypervisor migration may actually bring less
> risk than a storage abomination.
> >
> > On Dec 9, 2017 7:09 PM, "Donny Davis" <do...@fortnebula.com> wrote:
> > > What I am getting at is that instead of sinking a bunch of time into
> this bandaid, why not sink that time into a
> > > hypervisor migration. Seems well timed if you ask me.
> > >
> > > There are even tools to make that migration easier
> > >
> > > http://libguestfs.org/virt-v2v.1.html
> > >
> > > You should ultimately move your hypervisor instead of building a one
> off case for ceph. Ceph works really well if
> > > you stay inside the box. So does KVM. They work like Gang Buster's
> together.
> > >
> > > I know that doesn't really answer your OP, but this is what I would do.
> > >
> > > ~D
> > >
> > > On Sat, Dec 9, 2017 at 7:56 PM Brady Deetz <bde...@gmail.com> wrote:
> > > > We have over 150 VMs running in vmware. We also have 2PB of Ceph for
> filesystem. With our vmware storage aging and
> > > > not providing the IOPs we need, we are considering and hoping to use
> ceph. Ultimately, yes we will move to KVM,
> > > > but in the short term, we probably need to stay on VMware.
> > > > On Dec 9, 2017 6:26 PM, "Donny Davis" <do...@fortnebula.com> wrote:
> > > > > Just curious but why not just use a hypervisor with rbd support?
> Are there VMware specific features you are
> > > > > reliant on?
> > > > >
> > > > > On Fri, Dec 8, 2017 at 4:08 PM Brady Deetz <bde...@gmail.com>
> wrote:
> > > > > > I'm testing using RBD as VMWare datastores. I'm currently
> testing with krbd+LVM on a tgt target hosted on a
> > > > > > hypervisor.
> > > > > >
> > > > > > My Ceph cluster is HDD backed.
> > > > > >
> > > > > > In order to help with write latency, I added an SSD drive to my
> hypervisor and made it a writeback cache for
> > > > > > the rbd via LVM. So far I've managed to smooth out my 4k write
> latency and have some pleasing results.
> > > > > >
> > > > > > Architecturally, my current plan is to deploy an iSCSI gateway
> on each hypervisor hosting that hypervisor's
> > > > > > own datastore.
> > > > > >
> > > > > > Does anybody have any experience with this kind of
> configuration, especially with regard to LVM writeback
> > > > > > caching combined with RBD?
> > > > > > _______________________________________________
> > > > > > ceph-users mailing list
> > > > > > ceph-users@lists.ceph.com
> > > > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > > > > >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ------------------------------
>
> Message: 14
> Date: Sun, 10 Dec 2017 19:45:31 +0100
> From: Martin Preuss <mar...@aquamaniac.de>
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Random checksum errors (bluestore on
>         Luminous)
> Message-ID: <f93ce725-a404-152e-700d-b847823b4...@aquamaniac.de>
> Content-Type: text/plain; charset="utf-8"
>
> Hi (again),
>
> meanwhile I tried
>
> "ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0"
>
> but that resulted in a segfault (please see attached console log).
>
>
> Regards
> Martin
>
>
> Am 10.12.2017 um 14:34 schrieb Martin Preuss:
> > Hi,
> >
> > I'm new to Ceph. I started a Ceph cluster from scratch on Debian 9,
> > consisting of 3 hosts, each host has 3-4 OSDs (using 4TB hdds, currently
> > totalling 10 hdds).
> >
> > Right from the start I always received random scrub errors telling me
> > that some checksums didn't match the expected value, fixable with "ceph
> > pg repair".
> >
> > I looked at the ceph-osd logfiles on each of the hosts and compared with
> > the corresponding syslogs. I never found any hardware error, so there
> > was no problem reading or writing a sector hardware-wise. Also there was
> > never any other suspicious syslog entry around the time of checksum
> > error reporting.
> >
> > When I looked at the checksum error entries I found that the reported
> > bad checksum always was "0x6706be76".
> >
> > Could someone please tell me where to look further for the source of the
> > problem?
> >
> > I appended an excerpt of the osd logs.
> >
> >
> > Kind regards
> > Martin
> >
> >
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> "Things are only impossible until they're not"
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: fsck.log
> Type: text/x-log
> Size: 4314 bytes
> Desc: not available
> URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> attachments/20171210/1a19349d/attachment-0001.bin>
>
> ------------------------------
>
> Message: 15
> Date: Sun, 10 Dec 2017 20:32:45 -0000
> From: Nick Fisk <n...@fisk.me.uk>
> To: 'Igor Mendelev' <igm...@gmail.com>, ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] what's the maximum number of OSDs per OSD
>         server?
> Message-ID: <002201d371f6$09a38040$1cea80c0$@fisk.me.uk>
> Content-Type: text/plain; charset="utf-8"
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Igor Mendelev
> Sent: 10 December 2017 17:37
> To: n...@fisk.me.uk; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] what's the maximum number of OSDs per OSD server?
>
>
>
> Expected number of nodes for initial setup is 10-15 and of OSDs -
> 1,500-2,000.
>
>
>
> Networking is planned to be 2 100GbE or 2 dual 50GbE in x16 slots (per OSD
> node).
>
>
>
> JBODs are to be connected with 3-4 x8 SAS3 HBAs (4 4x SAS3 ports each)
>
>
>
> Choice of hardware is done considering (non-trivial) per-server sw
> licensing costs -
>
> so small (12-24 HDD) nodes are certainly not optimal regardless of CPUs
> cost (which
>
> is estimated to be below 10% of the total cost in the setup I'm currently
> considering).
>
>
>
> EC (4+2 or 8+3 etc - TBD) - not 3x replication - is planned to be used for
> most of the storage space.
>
>
>
> Main applications are expected to be archiving and sequential access to
> large (multiGB) files/objects.
>
>
>
> Nick, which physical limitations you're referring to ?
>
>
>
> Thanks.
>
>
>
>
>
> Hi Igor,
>
>
>
> I guess I meant physical annoyances rather than limitations. Being able to
> pull out a 1U or 2U node is always much less of a chore vs dealing with
> several U of SAS-interconnected JBODs.
>
>
>
> If you have a licensing reason for larger nodes, then there is a very
> valid argument for them. Is this license cost related in some way
> to Ceph (I thought Red Hat licensing was capacity based), or is it some sort
> of collocated software? Just make sure you size the nodes so that if
> one has to be taken offline for any reason, you are happy with the
> resulting state of the cluster, including the peering when suddenly taking
> ~200 OSDs offline/online.
>
>
>
> Nick
>
>
>
>
>
> On Sun, Dec 10, 2017 at 11:17 AM, Nick Fisk <n...@fisk.me.uk> wrote:
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Igor Mendelev
> Sent: 10 December 2017 15:39
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] what's the maximum number of OSDs per OSD server?
>
>
>
> Given that servers with 64 CPU cores (128 threads @ 2.7GHz) and up to 2TB
> RAM - as well as 12TB HDDs - are easily available and somewhat reasonably
> priced I wonder what's the maximum number of OSDs per OSD server (if using
> 10TB or 12TB HDDs) and how much RAM does it really require if total storage
> capacity for such OSD server is on the order of 1,000+ TB - is it still 1GB
> RAM per TB of HDD or it could be less (during normal operations - and
> extended with NVMe SSDs swap space for extra space during recovery)?
>
>
>
> Are there any known scalability limits in Ceph Luminous (12.2.2 with
> BlueStore) and/or Linux that'll make such high capacity OSD server not
> scale well (using sequential IO speed per HDD as a metric)?
>
>
>
> Thanks.
>
>
>
> How many total OSDs will you have? If you are planning on having
> thousands, then dense nodes might make sense. Otherwise you are leaving
> yourself open to having a small number of very large nodes, which will likely
> shoot you in the foot further down the line. Also don't forget that, unless
> this is purely for archiving, you will likely need to scale up the networking
> per node; 2x10G won't cut it when you have 10-20+ disks per node.
>
>
>
> With BlueStore, you are probably looking at around 2-3GB of RAM per OSD,
> so say 4GB to be on the safe side.
>
> 7.2k HDDs will likely only use a small proportion of a CPU core due to
> their limited IO potential. I would imagine that even with 90-bay JBODs,
> you will run into physical limitations before you hit CPU ones.
>
>
>
> Without knowing your exact requirements, I would suggest that a larger
> number of smaller nodes might be a better idea. If you choose your
> hardware right, you can often get the cost down to comparable levels by not
> going with top-of-the-range kit, i.e. Xeon E3s or Ds vs dual-socket E5s.
>
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ------------------------------
>
> End of ceph-users Digest, Vol 59, Issue 9
> *****************************************
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
