That's correct. It doesn't matter how many copies of the data you have in
each datacenter. The mons control the maps, and as long as you have 1 mon
per DC an isolated datacenter can never form a quorum on its own. You
should test this to see how the recovery goes, but there shouldn't be a
problem.
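
For reference, a minimal sketch of the mon layout that implies, with
hypothetical hostnames and addresses (one mon in each DC):

    [global]
        mon initial members = mon-dc1, mon-dc2, mon-dc3
        mon host = 10.0.1.10, 10.0.2.10, 10.0.3.10

With three mons split like that, an isolated DC only ever holds one of
them and can never form a majority, so its OSDs stop serving I/O instead
of diverging.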

On Sat, Oct 7, 2017, 6:10 PM Дробышевский, Владимир <v...@itgorod.ru> wrote:

> 2017-10-08 2:02 GMT+05:00 Peter Linder <peter.lin...@fiberdirekt.se>:
>
>>
>> Then, I believe, the next best configuration would be to set size for
>> this pool to 4. It would choose an NVMe as the primary OSD, and then
>> choose an HDD from each DC for the secondary copies. This guarantees
>> that a copy of the data goes into each DC and that you have 2 copies in
>> other DCs away from the primary NVMe copy. It wastes a copy of all of the
>> data in the pool, but that's on the much cheaper HDD storage and can
>> probably be considered an acceptable loss for the sake of having the primary
>> OSD on NVMe drives.
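>>
>> If you go that route, the pool side of it is just something like this
>> (the pool name is only an example):
>>
>>     ceph osd pool set hybrid size 4
>>     ceph osd pool get hybrid crush_rule
>>
>> The CRUSH rule the pool points at still has to be the one that places
>> the NVMe copy first and one HDD copy per DC.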
>>
>> I have considered this, and it should of course work under normal
>> conditions, so to speak. But what if one datacenter is isolated while
>> running? We would be left with 2 running copies on each side for all PGs,
>> with no way of knowing what gets written where. In the end, data would be
>> destroyed due to the split brain. Even being able to enforce quorum where
>> the SSD is would mean a single point of failure.
>>
> If you have one mon per DC, all operations in the isolated DC will be
> frozen (it cannot reach monitor quorum), so I believe you would not lose
> data.
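>
> A quick way to confirm that during such a test is to check the monitor
> quorum from each side, e.g.:
>
>     ceph mon stat
>     ceph quorum_status --format json-pretty
>
> The isolated DC's mon should drop out of the quorum list, and clients
> there will block rather than write diverging data.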
>
>
>>
>>
>>
>> On Sat, Oct 7, 2017 at 3:36 PM Peter Linder <peter.lin...@fiberdirekt.se>
>> wrote:
>>
>>> On 10/7/2017 8:08 PM, David Turner wrote:
>>>
>>> Just to make sure you understand: reads happen on the primary OSD for the
>>> PG, not the nearest OSD, meaning that reads will go between the
>>> datacenters. Also, each write will not ack until all 3 replicas are
>>> written, adding that latency to both writes and reads.
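>>>
>>> If you want to verify where reads land, you can ask the cluster which
>>> OSD is primary for a given object, for example (pool and object names
>>> are just examples):
>>>
>>>     ceph osd map rbd myimage.rbd
>>>
>>> The first OSD in the acting set it prints is the primary that serves
>>> the reads.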
>>>
>>>
>>> Yes, I understand this. It is actually fine: the datacenters were selected
>>> so that they are about 10-20 km apart, which yields around a 0.1-0.2 ms
>>> round-trip time (the speed of light being what it is). Network latency
>>> shouldn't be a problem; it's all a dedicated 40G TRILL network for the
>>> moment.
>>>
>>> I just want to be able to select 1 SSD and 2 HDDs, all spread out. I can
>>> do that, but one of the HDDs ends up in the same datacenter as the SSD,
>>> probably because I'm using the "take" command twice (which resets the
>>> bucket selection?).
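>>>
>>> For reference, the rule I have in mind looks roughly like this (a sketch
>>> only, assuming Luminous device classes "nvme" and "hdd" under a single
>>> "default" root; the real map uses its own hierarchy):
>>>
>>>     rule hybrid_nvme_hdd {
>>>         id 5
>>>         type replicated
>>>         min_size 3
>>>         max_size 4
>>>         # first pass: one NVMe leaf; emitted first, so it becomes the primary
>>>         step take default class nvme
>>>         step chooseleaf firstn 1 type datacenter
>>>         step emit
>>>         # second pass: remaining copies on HDDs, one per datacenter.
>>>         # each "take" starts a fresh selection, so this pass has no memory
>>>         # of which datacenter the NVMe came from, and one HDD can land in
>>>         # the same DC as the primary.
>>>         step take default class hdd
>>>         step chooseleaf firstn -1 type datacenter
>>>         step emit
>>>     }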
>>>
>>>
>>>
>>> On Sat, Oct 7, 2017, 1:48 PM Peter Linder <peter.lin...@fiberdirekt.se>
>>> wrote:
>>>
>>>> On 10/7/2017 7:36 PM, Дробышевский, Владимир wrote:
>>>>
>>>> Hello!
>>>>
>>>> 2017-10-07 19:12 GMT+05:00 Peter Linder <peter.lin...@fiberdirekt.se>:
>>>>
>>>>> The idea is to select an nvme osd, and
>>>>> then select the rest from hdd osds in different datacenters (see crush
>>>>> map below for hierarchy).
>>>>>
>>>> It's a bit of an aside, but why do you want to mix SSDs and HDDs in the
>>>> same pool? Do you have a read-intensive workload and plan to use
>>>> primary-affinity to get all reads from nvme?
>>>>
>>>>
>>>> Yes, this is pretty much the idea: getting NVMe read performance while
>>>> still maintaining triple redundancy at a reasonable cost.
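>>>>
>>>> If it ends up being primary-affinity rather than rule ordering that
>>>> keeps reads on the NVMe side, the knob is per OSD, something like this
>>>> (osd ids are just examples, and depending on the release you may first
>>>> have to allow primary affinity cluster-wide):
>>>>
>>>>     ceph osd primary-affinity osd.12 1.0   # NVMe OSD, preferred as primary
>>>>     ceph osd primary-affinity osd.34 0     # HDD OSD, avoid making it primary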
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Regards,
> Vladimir
>
