Hi, Peng,
  Here are a few more observations that might help you further.

   1. If you are using multiple pools, separate the pools in terms of the
   CRUSH mapping and, if possible, on the hardware they are hosted on.
   2. It is not mandatory to separate all the pools, but pools whose load
   increases in proportion to the client load are good candidates.
   3. A basic example is the separation of the index and data pools.
   4. If multisite is in play, you may also want a separate mapping for the
   log pool.
   5. When a host goes down, the number of PGs that go into peering is
   directly proportional to the number of OSDs on that host and, in turn, to
   the pools hosted on them.
   6. Separating the pools also gives better control over per-pool recovery.
   7. A single client operation can be viewed as a DAG of requests across
   multiple pools; any blocked operation on a specific pool can slow down or
   block the entire request.
   8. When you talk about service downtime, be clear about which metric you
   are looking at. For object storage it is the HTTP response code; for
   other clients such as CephFS or RBD there will be some SLA that you are
   trying to maintain as business as usual.
   9. A faster way to speed up peering is to set the "norebalance" and
   "nobackfill" flags and let all the PGs move to an "active+*" state. After
   that, unset the flags and let recovery proceed (see the example commands
   after this list).
   10. AFAIK, as long as the PGs are in an "active+*" state, I/O should
   continue to be served.
   11. In worse cases, if your PGs are taking too long to move to the active
   state and causing a service outage, you can try setting min_size to 1, or
   some other reduced number, so that fewer peer exchanges are needed at
   that instant (see the example after this list). Again, this is a function
   of the tunables you are using and the configuration of the CRUSH ruleset.
   12. The advice above about reducing min_size is only applicable to
   replicated pools.
   13. There are certain recovery tunables as well, viz.
   osd_recovery_max_active, osd_recovery_max_chunk, osd_max_push_objects,
   osd_max_backfills and osd_recovery_max_single_start (see the example
   after this list).
   14. The tunables mentioned above control the recovery throttling.
   15. It has been observed that during peering the OSD memory and CPU usage
   go up. You might want to double-check whether you are saturating any of
   the compute or network resources, causing longer recovery times.
   16. Do check your kernel tunables as well, and make sure they are tuned
   to optimal settings.
   17. The points mentioned above are general practices that I have learnt;
   some of them may be applicable and some may not, depending on your
   overall infrastructure and deployment.
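
For point 9, a minimal sketch of the flag handling, assuming an admin node
with the client.admin keyring (adapt to your deployment):

$ ceph osd set norebalance
$ ceph osd set nobackfill
$ ceph -s          # wait until the affected PGs are back in active+* states
$ ceph osd unset nobackfill
$ ceph osd unset norebalance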
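
For point 11, a hedged example against a hypothetical replicated pool named
"mypool" running with size 3 / min_size 2 (substitute your own pool name,
and remember to restore min_size once recovery has caught up):

$ ceph osd pool get mypool min_size    # note the current value first
$ ceph osd pool set mypool min_size 1  # allow IO with fewer surviving copies
$ ceph osd pool set mypool min_size 2  # restore once the PGs are active+clean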
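
For points 13 and 14, an illustration of adjusting the recovery throttles at
runtime; the values here are placeholders, not recommendations, and on
Nautilus and newer you can persist them with "ceph config set" instead of
injectargs:

$ ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
$ ceph config set osd osd_max_backfills 1
$ ceph config set osd osd_recovery_max_active 1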

Hope this helps

Thanks
Romit Misra



On Sat, Nov 30, 2019 at 2:31 AM <ceph-users-requ...@lists.ceph.com> wrote:

> Send ceph-users mailing list submissions to
>         ceph-users@lists.ceph.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> or, via email, send a message with subject or body 'help' to
>         ceph-users-requ...@lists.ceph.com
>
> You can reach the person managing the list at
>         ceph-users-ow...@lists.ceph.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of ceph-users digest..."
>
>
> Today's Topics:
>
>    1. HA and data recovery of CEPH (Peng Bo)
>    2. Re: HA and data recovery of CEPH (Nathan Fish)
>    3. Re: HA and data recovery of CEPH (Peng Bo)
>    4. Re: HA and data recovery of CEPH (jes...@krogh.cc)
>    5. Re: HA and data recovery of CEPH (h...@portsip.cn)
>    6. Re: HA and data recovery of CEPH (Wido den Hollander)
>    7.  Can I add existing rgw users to a tenant (Wei Zhao)
>    8. Re: scrub errors on rgw data pool (M Ranga Swami Reddy)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 29 Nov 2019 11:50:20 +0800
> From: Peng Bo <pen...@portsip.com>
> To: ceph-users@lists.ceph.com
> Subject: [ceph-users] HA and data recovery of CEPH
> Message-ID:
>         <
> cabjnkz9gaqkepvntdb-ttsx_ebllpzooksfd1gcw0dh7f3p...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi all,
>
> We are working on use CEPH to build our HA system, the purpose is the
> system should always provide service even a node of CEPH is down or OSD is
> lost.
>
> Currently, as we practiced once a node/OSD is down, the CEPH cluster needs
> to take about 40 seconds to sync data, our system can't provide service
> during that.
>
> My questions:
>
>    - Does there have any way that we can reduce the data sync time?
>    - How can we let the CEPH keeps available once a node/OSD is down?
>
>
> BR
>
> --
> The modern Unified Communications provider
>
> https://www.portsip.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20191129/66c88d83/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 2
> Date: Thu, 28 Nov 2019 23:57:24 -0500
> From: Nathan Fish <lordci...@gmail.com>
> To: Peng Bo <pen...@portsip.com>
> Cc: Ceph Users <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] HA and data recovery of CEPH
> Message-ID:
>         <CAKJgeVa8OtV-5x6Mquk7XLPJ+hdW6=
> jjdrsxnftnckxpnf-...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> If correctly configured, your cluster should have zero downtime from a
> single OSD or node failure. What is your crush map? Are you using
> replica or EC? If your 'min_size' is not smaller than 'size', then you
> will lose availability.
>
> On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <pen...@portsip.com> wrote:
> >
> > Hi all,
> >
> > We are working on use CEPH to build our HA system, the purpose is the
> system should always provide service even a node of CEPH is down or OSD is
> lost.
> >
> > Currently, as we practiced once a node/OSD is down, the CEPH cluster
> needs to take about 40 seconds to sync data, our system can't provide
> service during that.
> >
> > My questions:
> >
> > Does there have any way that we can reduce the data sync time?
> > How can we let the CEPH keeps available once a node/OSD is down?
> >
> >
> > BR
> >
> > --
> > The modern Unified Communications provider
> >
> > https://www.portsip.com
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 29 Nov 2019 13:21:44 +0800
> From: Peng Bo <pen...@portsip.com>
> To: Nathan Fish <lordci...@gmail.com>
> Cc: Ceph Users <ceph-users@lists.ceph.com>, h...@portsip.cn
> Subject: Re: [ceph-users] HA and data recovery of CEPH
> Message-ID:
>         <
> cabjnkz_q-ttfkxflnp2i_qkxksdycov+uyvqg31mwgewvem...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Nathan,
>
> Thanks for the help.
> My colleague will provide more details.
>
> BR
>
> On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish <lordci...@gmail.com> wrote:
>
> > If correctly configured, your cluster should have zero downtime from a
> > single OSD or node failure. What is your crush map? Are you using
> > replica or EC? If your 'min_size' is not smaller than 'size', then you
> > will lose availability.
> >
> > On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <pen...@portsip.com> wrote:
> > >
> > > Hi all,
> > >
> > > We are working on use CEPH to build our HA system, the purpose is the
> > system should always provide service even a node of CEPH is down or OSD
> is
> > lost.
> > >
> > > Currently, as we practiced once a node/OSD is down, the CEPH cluster
> > needs to take about 40 seconds to sync data, our system can't provide
> > service during that.
> > >
> > > My questions:
> > >
> > > Does there have any way that we can reduce the data sync time?
> > > How can we let the CEPH keeps available once a node/OSD is down?
> > >
> > >
> > > BR
> > >
> > > --
> > > The modern Unified Communications provider
> > >
> > > https://www.portsip.com
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> --
> The modern Unified Communications provider
>
> https://www.portsip.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20191129/d714eeb8/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 4
> Date: Fri, 29 Nov 2019 08:28:31 +0300
> From: jes...@krogh.cc
> To: Peng Bo <pen...@portsip.com>
> Cc: Ceph Users <ceph-users@lists.ceph.com>, h...@portsip.cn, Nathan
>         Fish <lordci...@gmail.com>
> Subject: Re: [ceph-users] HA and data recovery of CEPH
> Message-ID: <1575005311.764819...@f42.my.com>
> Content-Type: text/plain; charset="utf-8"
>
>
> Hi Nathan
>
> Is that true?
>
> The time it takes to reallocate the primary pg delivers "downtime" by
> design. Right? Seen from a writing clients perspective?
>
> Jesper
>
>
>
> Sent from myMail for iOS
>
>
> Friday, 29 November 2019, 06.24 +0100 from pen...@portsip.com  <
> pen...@portsip.com>:
> >Hi Nathan,
> >
> >Thanks for the help.
> >My colleague will provide more details.
> >
> >BR
> >On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish < lordci...@gmail.com >
> wrote:
> >>If correctly configured, your cluster should have zero downtime from a
> >>single OSD or node failure. What is your crush map? Are you using
> >>replica or EC? If your 'min_size' is not smaller than 'size', then you
> >>will lose availability.
> >>
> >>On Thu, Nov 28, 2019 at 10:50 PM Peng Bo < pen...@portsip.com > wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We are working on use CEPH to build our HA system, the purpose is the
> system should always provide service even a node of CEPH is down or OSD is
> lost.
> >>>
> >>> Currently, as we practiced once a node/OSD is down, the CEPH cluster
> needs to take about 40 seconds to sync data, our system can't provide
> service during that.
> >>>
> >>> My questions:
> >>>
> >>> Does there have any way that we can reduce the data sync time?
> >>> How can we let the CEPH keeps available once a node/OSD is down?
> >>>
> >>>
> >>> BR
> >>>
> >>> --
> >>> The modern Unified Communications provider
> >>>
> >>>  https://www.portsip.com
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>>  ceph-users@lists.ceph.com
> >>>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >--
> >The modern Unified Communications provider
> >
> >https://www.portsip.com
> >_______________________________________________
> >ceph-users mailing list
> >ceph-users@lists.ceph.com
> >http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20191129/aa5f2242/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 5
> Date: Fri, 29 Nov 2019 14:23:01 +0800
> From: "h...@portsip.cn" <h...@portsip.cn>
> To: jesper <jes...@krogh.cc>,   "Peng Bo" <pen...@portsip.com>
> Cc: "Ceph Users" <ceph-users@lists.ceph.com>,  "Nathan Fish"
>         <lordci...@gmail.com>
> Subject: Re: [ceph-users] HA and data recovery of CEPH
> Message-ID: <2019112914230082973...@portsip.cn>+B11E2F5EFD0724AA
> Content-Type: text/plain; charset="utf-8"
>
> Hi Nathan
>
> We build a ceph cluster with 3 nodes.
> node-3: osd-2, mon-b,
> node-4: osd-0, mon-a, mds-myfs-a, mgr
> node-5: osd-1, mon-c, mds-myfs-b
>
> ceph cluster created by rook.
> Test phenomenon
> After one node unusual down(like direct poweroff), try to mount cephfs
> volume will spend more than 40 seconds.
> Normally Ceph Cluster Status:
> $ ceph status
>   cluster:
>     id:     776b5432-be9c-455f-bb2e-05cbf20d6f6a
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum a,b,c (age 20h)
>     mgr: a(active, since 21h)
>     mds: myfs:1 {0=myfs-a=up:active} 1 up:standby
>     osd: 3 osds: 3 up (since 20h), 3 in (since 21h)
>
>   data:
>     pools:   2 pools, 136 pgs
>     objects: 2.59k objects, 330 MiB
>     usage:   25 GiB used, 125 GiB / 150 GiB avail
>     pgs:     136 active+clean
>
>   io:
>     client:   1.5 KiB/s wr, 0 op/s rd, 0 op/s wr
>
> Normally CephFS Status:
> $ ceph fs status
> myfs - 3 clients
> ====
> +------+--------+--------+---------------+-------+-------+
> | Rank | State  |  MDS   |    Activity   |  dns  |  inos |
> +------+--------+--------+---------------+-------+-------+
> |  0   | active | myfs-a | Reqs:    0 /s | 2250  | 2059  |
> +------+--------+--------+---------------+-------+-------+
> +---------------+----------+-------+-------+
> |      Pool     |   type   |  used | avail |
> +---------------+----------+-------+-------+
> | myfs-metadata | metadata |  208M | 39.1G |
> |   myfs-data0  |   data   |  121M | 39.1G |
> +---------------+----------+-------+-------+
> +-------------+
> | Standby MDS |
> +-------------+
> |    myfs-b   |
> +-------------+
> MDS version: ceph version 14.2.4
> (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus (stable)
>
> Are you using replica or EC?
>             => Not used EC
>
> 'min_size' is not smaller than 'size'?
> $ ceph osd dump | grep pool
> pool 1 'myfs-metadata' replicated size 3 min_size 2 crush_rule 1
> object_hash rjenkins pg_num 8 pgp_num 8 autoscale_mode warn last_change 16
> flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
> recovery_priority 5 application cephfs
> pool 2 'myfs-data0' replicated size 3 min_size 2 crush_rule 2 object_hash
> rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 141 lfor
> 0/0/53 flags hashpspool stripe_width 0 application cephfs
>
> What is your crush map?
> $ ceph osd crush dump
> {
>     "devices": [
>         {
>             "id": 0,
>             "name": "osd.0",
>             "class": "hdd"
>         },
>         {
>             "id": 1,
>             "name": "osd.1",
>             "class": "hdd"
>         },
>         {
>             "id": 2,
>             "name": "osd.2",
>             "class": "hdd"
>         }
>     ],
>     "types": [
>         {
>             "type_id": 0,
>             "name": "osd"
>         },
>         {
>             "type_id": 1,
>             "name": "host"
>         },
>         {
>             "type_id": 2,
>             "name": "chassis"
>         },
>         {
>             "type_id": 3,
>             "name": "rack"
>         },
>         {
>             "type_id": 4,
>             "name": "row"
>         },
>         {
>             "type_id": 5,
>             "name": "pdu"
>         },
>         {
>             "type_id": 6,
>             "name": "pod"
>         },
>         {
>             "type_id": 7,
>             "name": "room"
>         },
>         {
>             "type_id": 8,
>             "name": "datacenter"
>         },
>         {
>             "type_id": 9,
>             "name": "zone"
>         },
>         {
>             "type_id": 10,
>             "name": "region"
>         },
>         {
>             "type_id": 11,
>             "name": "root"
>         }
>     ],
>     "buckets": [
>         {
>             "id": -1,
>             "name": "default",
>             "type_id": 11,
>             "type_name": "root",
>             "weight": 9594,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": -3,
>                     "weight": 3198,
>                     "pos": 0
>                 },
>                 {
>                     "id": -5,
>                     "weight": 3198,
>                     "pos": 1
>                 },
>                 {
>                     "id": -7,
>                     "weight": 3198,
>                     "pos": 2
>                 }
>             ]
>         },
>         {
>             "id": -2,
>             "name": "default~hdd",
>             "type_id": 11,
>             "type_name": "root",
>             "weight": 9594,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": -4,
>                     "weight": 3198,
>                     "pos": 0
>                 },
>                 {
>                     "id": -6,
>                     "weight": 3198,
>                     "pos": 1
>                 },
>                 {
>                     "id": -8,
>                     "weight": 3198,
>                     "pos": 2
>                 }
>             ]
>         },
>         {
>             "id": -3,
>             "name": "node-4",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 0,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         },
>         {
>             "id": -4,
>             "name": "node-4~hdd",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 0,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         },
>         {
>             "id": -5,
>             "name": "node-5",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 1,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         },
>         {
>             "id": -6,
>             "name": "node-5~hdd",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 1,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         },
>         {
>             "id": -7,
>             "name": "node-3",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 2,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         },
>         {
>             "id": -8,
>             "name": "node-3~hdd",
>             "type_id": 1,
>             "type_name": "host",
>             "weight": 3198,
>             "alg": "straw2",
>             "hash": "rjenkins1",
>             "items": [
>                 {
>                     "id": 2,
>                     "weight": 3198,
>                     "pos": 0
>                 }
>             ]
>         }
>     ],
>     "rules": [
>         {
>             "rule_id": 0,
>             "rule_name": "replicated_rule",
>             "ruleset": 0,
>             "type": 1,
>             "min_size": 1,
>             "max_size": 10,
>             "steps": [
>                 {
>                     "op": "take",
>                     "item": -1,
>                     "item_name": "default"
>                 },
>                 {
>                     "op": "chooseleaf_firstn",
>                     "num": 0,
>                     "type": "host"
>                 },
>                 {
>                     "op": "emit"
>                 }
>             ]
>         },
>         {
>             "rule_id": 1,
>             "rule_name": "myfs-metadata",
>             "ruleset": 1,
>             "type": 1,
>             "min_size": 1,
>             "max_size": 10,
>             "steps": [
>                 {
>                     "op": "take",
>                     "item": -1,
>                     "item_name": "default"
>                 },
>                 {
>                     "op": "chooseleaf_firstn",
>                     "num": 0,
>                     "type": "host"
>                 },
>                 {
>                     "op": "emit"
>                 }
>             ]
>         },
>         {
>             "rule_id": 2,
>             "rule_name": "myfs-data0",
>             "ruleset": 2,
>             "type": 1,
>             "min_size": 1,
>             "max_size": 10,
>             "steps": [
>                 {
>                     "op": "take",
>                     "item": -1,
>                     "item_name": "default"
>                 },
>                 {
>                     "op": "chooseleaf_firstn",
>                     "num": 0,
>                     "type": "host"
>                 },
>                 {
>                     "op": "emit"
>                 }
>             ]
>         }
>     ],
>     "tunables": {
>         "choose_local_tries": 0,
>         "choose_local_fallback_tries": 0,
>         "choose_total_tries": 50,
>         "chooseleaf_descend_once": 1,
>         "chooseleaf_vary_r": 1,
>         "chooseleaf_stable": 1,
>         "straw_calc_version": 1,
>         "allowed_bucket_algs": 54,
>         "profile": "jewel",
>         "optimal_tunables": 1,
>         "legacy_tunables": 0,
>         "minimum_required_version": "jewel",
>         "require_feature_tunables": 1,
>         "require_feature_tunables2": 1,
>         "has_v2_rules": 0,
>         "require_feature_tunables3": 1,
>         "has_v3_rules": 0,
>         "has_v4_buckets": 1,
>         "require_feature_tunables5": 1,
>         "has_v5_rules": 0
>     },
>     "choose_args": {}
> }
>
> Question
> How can i mount CephFS volumn as soon as possible, after one node unusual
> down.Any ceph cluster(filesystem) configuration suggestion? Using EC?
>
> Best Regards
>
>
>
>
>
> h...@portsip.cn
>
> From: jesper
> Date: 2019-11-29 13:28
> To: Peng Bo
> CC: Ceph Users; hfx; Nathan Fish
> Subject: Re[2]: [ceph-users] HA and data recovery of CEPH
> Hi Nathan
>
> Is that true?
>
> The time it takes to reallocate the primary pg delivers "downtime" by
> design. Right? Seen from a writing clients perspective
>
> Jesper
>
>
>
> Sent from myMail for iOS
>
>
> Friday, 29 November 2019, 06.24 +0100 from pen...@portsip.com <
> pen...@portsip.com>:
> Hi Nathan,
>
> Thanks for the help.
> My colleague will provide more details.
>
> BR
>
> On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish <lordci...@gmail.com> wrote:
> If correctly configured, your cluster should have zero downtime from a
> single OSD or node failure. What is your crush map? Are you using
> replica or EC? If your 'min_size' is not smaller than 'size', then you
> will lose availability.
>
> On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <pen...@portsip.com> wrote:
> >
> > Hi all,
> >
> > We are working on use CEPH to build our HA system, the purpose is the
> system should always provide service even a node of CEPH is down or OSD is
> lost.
> >
> > Currently, as we practiced once a node/OSD is down, the CEPH cluster
> needs to take about 40 seconds to sync data, our system can't provide
> service during that.
> >
> > My questions:
> >
> > Does there have any way that we can reduce the data sync time?
> > How can we let the CEPH keeps available once a node/OSD is down?
> >
> >
> > BR
> >
> > --
> > The modern Unified Communications provider
> >
> > https://www.portsip.com
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> The modern Unified Communications provider
>
> https://www.portsip.com
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20191129/55eda1e6/attachment-0001.html
> >
>
> ------------------------------
>
> Message: 6
> Date: Fri, 29 Nov 2019 08:29:48 +0100
> From: Wido den Hollander <w...@42on.com>
> To: jes...@krogh.cc, Peng Bo <pen...@portsip.com>
> Cc: Ceph Users <ceph-users@lists.ceph.com>, h...@portsip.cn
> Subject: Re: [ceph-users] HA and data recovery of CEPH
> Message-ID: <7724520c-8659-bb86-050c-a1f90f7be...@42on.com>
> Content-Type: text/plain; charset=utf-8
>
>
>
> On 11/29/19 6:28 AM, jes...@krogh.cc wrote:
> > Hi Nathan
> >
> > Is that true?
> >
> > The time it takes to reallocate the primary pg delivers "downtime" by
> > design. Right? Seen from a writing clients perspective?
> >
>
> That is true. When an OSD goes down it will take a few seconds for it's
> Placement Groups to re-peer with the other OSDs. During that period
> writes to those PGs will stall for a couple of seconds.
>
> I wouldn't say it's 40s, but it can take ~10s.
>
> This is however by design. Consistency of data has a higher priority
> than availability inside Ceph.
>
> 'Nothing in this world is for free'. Keep that in mind.
>
> Wido
>
> > Jesper
> >
> >
> >
> > Sent from myMail for iOS
> >
> >
> > Friday, 29 November 2019, 06.24 +0100 from pen...@portsip.com
> > <pen...@portsip.com>:
> >
> >     Hi Nathan,
> >
> >     Thanks for the help.
> >     My colleague will provide more details.
> >
> >     BR
> >
> >     On Fri, Nov 29, 2019 at 12:57 PM Nathan Fish <lordci...@gmail.com
> >     <mailto:lordci...@gmail.com>> wrote:
> >
> >         If correctly configured, your cluster should have zero downtime
> >         from a
> >         single OSD or node failure. What is your crush map? Are you using
> >         replica or EC? If your 'min_size' is not smaller than 'size',
> >         then you
> >         will lose availability.
> >
> >         On Thu, Nov 28, 2019 at 10:50 PM Peng Bo <pen...@portsip.com
> >         <mailto:pen...@portsip.com>> wrote:
> >         >
> >         > Hi all,
> >         >
> >         > We are working on use CEPH to build our HA system, the purpose
> >         is the system should always provide service even a node of CEPH
> >         is down or OSD is lost.
> >         >
> >         > Currently, as we practiced once a node/OSD is down, the CEPH
> >         cluster needs to take about 40 seconds to sync data, our system
> >         can't provide service during that.
> >         >
> >         > My questions:
> >         >
> >         > Does there have any way that we can reduce the data sync time?
> >         > How can we let the CEPH keeps available once a node/OSD is
> down?
> >         >
> >         >
> >         > BR
> >         >
> >         > --
> >         > The modern Unified Communications provider
> >         >
> >         > https://www.portsip.com
> >         > _______________________________________________
> >         > ceph-users mailing list
> >         > ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> >         > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >     --
> >     The modern Unified Communications provider
> >
> >     https://www.portsip.com
> >     _______________________________________________
> >     ceph-users mailing list
> >     ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
> >     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 29 Nov 2019 16:26:27 +0800
> From: Wei Zhao <zhao6...@gmail.com>
> To: Ceph Users <ceph-users@lists.ceph.com>
> Subject: [ceph-users]  Can I add existing rgw users to a tenant
> Message-ID:
>         <
> cagoemcnfhzcqc15c9fqe-sm3khn9wxv+lq7ptts2hl7xgqs...@mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hello:
>     We want to use rgw tenant as a  group. But Can I add  existing rgw
> users to a new tenant ?
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 29 Nov 2019 15:45:40 +0530
> From: M Ranga Swami Reddy <swamire...@gmail.com>
> To: ceph-users <ceph-users@lists.ceph.com>, ceph-devel
>         <ceph-de...@vger.kernel.org>
> Subject: Re: [ceph-users] scrub errors on rgw data pool
> Message-ID:
>         <
> cana9uk75oulzlajbwqh+1sd5r5d4lmprej-gzyuuagdbezu...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Primary OSD crashes with below assert:
> 12.2.11/src/osd/ReplicatedBackend.cc:1445 assert(peer_missing.count(
> fromshard))
> ==
> here I have 2 OSDs with bluestore backend and 1 osd with filestore backend.
>
> On Mon, Nov 25, 2019 at 3:34 PM M Ranga Swami Reddy <swamire...@gmail.com>
> wrote:
>
> > Hello - We are using the ceph 12.2.11 version (upgraded from Jewel
> 10.2.12
> > to 12.2.11). In this cluster, we are having mix of filestore and
> bluestore
> > OSD backends.
> > Recently we are seeing the scrub errors on rgw buckets.data pool every
> > day, after scrub operation performed by Ceph. If we run the PG repair,
> the
> > errors will go way.
> >
> > Anyone seen the above issue?
> > Is the mix of filestore backend has bug/issue with 12.2.11 version (ie
> > Luminous).
> > Is the mix of filestore and bluestore OSDs cause this type of issue?
> >
> > Thanks
> > Swami
> >
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20191129/ffa82f99/attachment-0001.html
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> ------------------------------
>
> End of ceph-users Digest, Vol 82, Issue 26
> ******************************************
>


_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
