CRUSH is a pseudorandom, probabilistic algorithm. That can lead to problems
with extreme input.

In this case, you've given it a bucket with only three children, one of
which holds ~3.3% of the total weight. So only on about 3% of "draws",
as CRUSH tries to pick a child bucket to descend into, will it choose
that small one first.
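
To put very rough numbers on that (treating each attempt as an
independent ~3.3% shot, which is only an approximation of what CRUSH
actually does): with a retry budget of, say, 19 attempts, the chance of
never landing on the small child is about (1 - 0.033)^19 ≈ 0.53, and
even with 50 attempts it is still roughly (1 - 0.033)^50 ≈ 0.19. Those
try counts are purely illustrative.
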
And then you've forced it to select...each of the hosts in that data
center, for all inputs? How can that even work in terms of actual data
storage, if some of them are an order of magnitude larger than the others?

Anyway, leaving that bit aside since it looks like you're mapping each
host to multiple DCs, you're giving CRUSH a very difficult problem to
solve. You can probably "fix" it by turning up the choose-retries value
(or whatever it's called) high enough that mapping a PG eventually does
grab the small host. But I wouldn't be very confident in a solution
like this; it seems very fragile and subject to input error.
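
If you do go that route anyway, I believe the relevant steps are
set_choose_tries / set_chooseleaf_tries at the top of the rule.
Something like the sketch below (the rule name, the root bucket and the
numbers are purely illustrative, not your actual map):

rule hybrid_one_dc {
        id 1
        type replicated
        min_size 3
        max_size 3
        # raise the retry budget well above the stock tunables so the
        # descent can eventually land on the low-weight child
        step set_choose_tries 200
        step set_chooseleaf_tries 20
        step take virtual-root
        # one (virtual) datacenter, then 3 distinct hosts under it
        step choose firstn 1 type datacenter
        step chooseleaf firstn 0 type host
        step emit
}
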
-Greg

On Mon, Jan 29, 2018 at 6:45 AM Peter Linder <peter.lin...@fiberdirekt.se>
wrote:

> We kind of turned the crushmap inside out a little bit.
>
> Instead of the traditional "for 1 PG, select OSDs from 3 separate data
> centers", we did "force selection from only one datacenter (out of 3)
> and leave only enough options to make sure precisely 1 SSD and 2 HDDs
> are selected".
>
> We then organized these "virtual datacenters" in the hierarchy so that
> one of them in fact contains 3 options that lead to 3 physically
> separate servers in different locations.
>
> Every physical datacenter has both SSDs and HDDs. The idea is that if
> one datacenter is lost, 2/3 of the SSDs still remain (and can be
> mapped to by marking the missing ones "out"), so performance is
> maintained.
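>
> As a simplified illustration (bucket names, ids and weights below are
> made up, and the host buckets would be defined elsewhere in the map),
> one such "virtual datacenter" looks roughly like this:
>
>     datacenter vdc-a {
>             id -20
>             alg straw2
>             hash 0  # rjenkins1
>             # one small SSD host plus two large HDD hosts, each in a
>             # different physical location
>             item ssd-host-1 weight 1.000
>             item hdd-host-2 weight 14.000
>             item hdd-host-3 weight 15.000
>     }
>
> In rule terms this is roughly "choose firstn 1 type datacenter"
> followed by "chooseleaf firstn 0 type host", so all three replicas end
> up inside one virtual datacenter: one on the SSD host and two on the
> HDD hosts.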
>
> On 2018-01-29 13:35, Niklas wrote:
> > Yes.
> > It is a hybrid solution where a placement group is always located on
> > one NVMe drive and two HDD drives. The advantage is great read
> > performance and cost savings; the disadvantage is low write
> > performance. Still, the write performance is good thanks to RocksDB
> > on Intel Optane disks in the HDD servers.
> >
> > The real-world setup looks more like what I described in a previous
> > question (2018-01-23) here on the ceph-users list, "Ruleset for
> > optimized Ceph hybrid storage". Nobody answered, so I am guessing it
> > is not possible to create the rule I want. Now I am trying to solve
> > it with virtual datacenters in the crush map, which works, but it is
> > maybe not the most optimal solution.
> >
> >
> > On 2018-01-29 13:21, Wido den Hollander wrote:
> >>
> >>
> >> On 01/29/2018 01:14 PM, Niklas wrote:
> >>> ...
> >>>
> >>
> >> Is it your intention to put all copies of an object in only one DC?
> >>
> >> What is your exact idea behind this rule? What's the purpose?
> >>
> >> Wido
> >>