That's correct. It doesn't matter how many copies of the data you have in each datacenter. The mons control the maps, and you should be good as long as you have one mon per DC. You should test this to see how the recovery goes, but there shouldn't be a problem.
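The one-mon-per-DC advice rests on majority quorum: an isolated DC holds only 1 of the 3 mons, so it cannot form a majority and its operations freeze, while the other two DCs keep serving. A minimal sketch of the majority rule (an illustrative helper, not actual Ceph code):

```python
# Majority quorum among monitors: a partition can keep serving only if it
# can see more than half of all mons. With one mon per DC (3 total), an
# isolated DC sees 1 of 3 mons and freezes; the other side sees 2 of 3.
def has_quorum(visible_mons: int, total_mons: int) -> bool:
    return visible_mons > total_mons // 2

# One DC cut off from the other two:
isolated_side = has_quorum(1, 3)    # False -> operations freeze, no split brain
surviving_side = has_quorum(2, 3)   # True  -> cluster keeps serving
```

This is why a network split cannot produce two writable sides: at most one partition can hold a majority of the mons.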
On Sat, Oct 7, 2017, 6:10 PM Дробышевский, Владимир <v...@itgorod.ru> wrote:

> 2017-10-08 2:02 GMT+05:00 Peter Linder <peter.lin...@fiberdirekt.se>:
>
>> Then, I believe, the next best configuration would be to set size for this pool to 4. It would choose an NVMe as the primary OSD, and then choose an HDD from each DC for the secondary copies. This guarantees that a copy of the data goes into each DC and you will have 2 copies in other DCs away from the primary NVMe copy. It wastes a copy of all of the data in the pool, but that's on the much cheaper HDD storage and can probably be considered an acceptable loss for the sake of having the primary OSD on NVMe drives.
>>
>> I have considered this, and it should of course work while it works, so to speak, but what if one datacenter is isolated while running? We would be left with 2 running copies on each side for all PGs, with no way of knowing what gets written where. In the end, data would be destroyed due to the split brain. Even being able to enforce quorum where the SSD is would mean a single point of failure.
>
> In case you have one mon per DC, all operations in the isolated DC will be frozen, so I believe you would not lose data.
>
>> On Sat, Oct 7, 2017 at 3:36 PM Peter Linder <peter.lin...@fiberdirekt.se> wrote:
>>
>>> On 10/7/2017 8:08 PM, David Turner wrote:
>>>
>>> Just to make sure you understand: the reads will happen on the primary OSD for the PG, not the nearest OSD, meaning that reads will go between the datacenters. Also, each write will not ack until all 3 writes happen, adding latency to both writes and reads.
>>>
>>> Yes, I understand this. It is actually fine; the datacenters have been selected so that they are about 10-20 km apart. This yields around a 0.1-0.2 ms round-trip time due to the speed of light being too low. Nevertheless, network latency shouldn't be a problem, and it's all a 40G (dedicated) TRILL network for the moment.
>>>
>>> I just want to be able to select 1 SSD and 2 HDDs, all spread out. I can do that, but one of the HDDs ends up in the same datacenter, probably because I'm using the "take" command 2 times (resets the selected buckets?).
>>>
>>> On Sat, Oct 7, 2017, 1:48 PM Peter Linder <peter.lin...@fiberdirekt.se> wrote:
>>>
>>>> On 10/7/2017 7:36 PM, Дробышевский, Владимир wrote:
>>>>
>>>> Hello!
>>>>
>>>> 2017-10-07 19:12 GMT+05:00 Peter Linder <peter.lin...@fiberdirekt.se>:
>>>>
>>>>> The idea is to select an nvme osd, and then select the rest from hdd osds in different datacenters (see crush map below for hierarchy).
>>>>
>>>> It's a little bit aside of the question, but why do you want to mix SSDs and HDDs in the same pool? Do you have a read-intensive workload and plan to use primary-affinity to get all reads from nvme?
>>>>
>>>> Yes, this is pretty much the idea: getting the performance of NVMe reads while still maintaining triple redundancy at a reasonable cost.
>>>>
>>>> --
>>>> Regards,
>>>> Vladimir
>
> --
> Regards,
> Vladimir
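The two-"take" rule Peter describes might look like this in a decompiled CRUSH map (a sketch using Luminous device-class syntax; the root `default`, the classes `nvme` and `hdd`, and the `datacenter` bucket type are assumptions based on the thread, since the actual crush map is not shown here):

```
rule nvme_primary_hdd {
        id 1
        type replicated
        min_size 3
        max_size 3
        # Pass 1: pick the primary from NVMe OSDs, one datacenter.
        step take default class nvme
        step chooseleaf firstn 1 type datacenter
        step emit
        # Pass 2: pick the remaining copies from HDD OSDs.
        step take default class hdd
        step chooseleaf firstn -1 type datacenter
        step emit
}
```

Each `step take` starts a fresh, independent selection pass, so the HDD pass has no knowledge of which datacenter the NVMe primary came from. That independence is consistent with the behavior Peter observes: one of the HDD copies can land in the same datacenter as the NVMe OSD.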
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com