Hi Vlad,

No need for a specific CRUSH map configuration. I'd suggest you use the primary-affinity setting on the OSDs so that only the OSDs close to your read point are selected as primary. Just set the primary affinity of all the OSDs in building 2 to 0; only the OSDs in building 1 will then be used as primary OSDs.
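For example, a minimal sketch (osd.2 and osd.3 are placeholders; substitute the IDs of your building-2 OSDs):

  # demote the building-2 OSDs so they are never chosen as primary
  ceph osd primary-affinity osd.2 0
  ceph osd primary-affinity osd.3 0

On pre-Luminous releases you may also need "mon osd allow primary affinity = true" in ceph.conf before the setting is honored.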
See https://ceph.com/geen-categorie/ceph-primary-affinity/ for more information.

BR
JC

> On Nov 13, 2018, at 12:19, Vlad Kopylov <vladk...@gmail.com> wrote:
>
> Or is it possible to mount one OSD directly for read file access?
>
> v
>
> On Sun, Nov 11, 2018 at 1:47 PM Vlad Kopylov <vladk...@gmail.com> wrote:
> Maybe it is possible if done via a gateway NFS export?
> Do the gateway settings allow read OSD selection?
>
> v
>
> On Sun, Nov 11, 2018 at 1:01 AM Martin Verges <martin.ver...@croit.io> wrote:
> Hello Vlad,
>
> If you want to read from the same data, then it is not possible (as far as I know).
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> On Sat, Nov 10, 2018 at 03:47, Vlad Kopylov <vladk...@gmail.com> wrote:
> Maybe I missed something, but CephFS explicitly selects the pools that hold
> file data and metadata, as I did below. So if I create new pools, the data
> in them will be different. If I apply the rule dc1_primary to the cfs_data
> pool and a client from dc3 connects to fs t01, it will start using dc1 hosts.
>
> ceph osd pool create cfs_data 100
> ceph osd pool create cfs_meta 100
> ceph fs new t01 cfs_data cfs_meta
> sudo mount -t ceph ceph1:6789:/ /mnt/t01 -o name=admin,secretfile=/home/mciadmin/admin.secret
>
> rule dc1_primary {
>         id 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take dc1
>         step chooseleaf firstn 1 type host
>         step emit
>         step take dc2
>         step chooseleaf firstn -2 type host
>         step emit
>         step take dc3
>         step chooseleaf firstn -2 type host
>         step emit
> }
>
> On Fri, Nov 9, 2018 at 9:32 PM Vlad Kopylov <vladk...@gmail.com> wrote:
> Just to confirm - it will still populate three copies, one per datacenter?
> I thought this map was to select where to write to; I guess it does the
> write replication on the back end.
>
> I thought pools were completely separate and clients would not see each
> other's data?
>
> Thank you Martin!
>
> On Fri, Nov 9, 2018 at 2:10 PM Martin Verges <martin.ver...@croit.io> wrote:
> Hello Vlad,
>
> you can generate something like this:
>
> rule dc1_primary_dc2_secondary {
>         id 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take dc1
>         step chooseleaf firstn 1 type host
>         step emit
>         step take dc2
>         step chooseleaf firstn 1 type host
>         step emit
>         step take dc3
>         step chooseleaf firstn -2 type host
>         step emit
> }
>
> rule dc2_primary_dc1_secondary {
>         id 2
>         type replicated
>         min_size 1
>         max_size 10
>         step take dc2
>         step chooseleaf firstn 1 type host
>         step emit
>         step take dc1
>         step chooseleaf firstn 1 type host
>         step emit
>         step take dc3
>         step chooseleaf firstn -2 type host
>         step emit
> }
>
> After you added such crush rules, you can configure the pools:
>
> ~ $ ceph osd pool set <pool_for_dc1> crush_ruleset 1
> ~ $ ceph osd pool set <pool_for_dc2> crush_ruleset 2
>
> Now you place your workload from dc1 in the dc1 pool, and workload from
> dc2 in the dc2 pool.
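Two details worth calling out here. The first OSD a rule emits becomes the PG's primary, which is why the dc2 rule has to "step take dc2" before dc1. Also, on Luminous or newer the pool parameter is named crush_rule (crush_ruleset was dropped), so the commands become:

  ceph osd pool set <pool_for_dc1> crush_rule dc1_primary_dc2_secondary
  ceph osd pool set <pool_for_dc2> crush_rule dc2_primary_dc1_secondary

To get the rules into the cluster in the first place, the usual round trip is a decompile/edit/recompile cycle, sketched below with placeholder file names:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # paste the rules into crushmap.txt, then recompile and inject:
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new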
> You could also use HDDs with SSD journals (if your workload isn't that
> write-intensive) and save some money in dc3, as your clients would always
> read from an SSD and write to the hybrid devices.
>
> Btw. all this could be done with a few simple clicks through our web
> frontend. Even if you want to export it via CephFS / NFS / ... it is
> possible to set it on a per-folder level. Feel free to take a look at
> https://www.youtube.com/watch?v=V33f7ipw9d4 to see how easy it could be.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> 2018-11-09 17:35 GMT+01:00 Vlad Kopylov <vladk...@gmail.com>:
> > Please disregard the pg status; one of the test VMs was down for some
> > time and it is healing.
> > The only question is how to make it read from the proper datacenter.
> >
> > If you have an example.
> >
> > Thanks
> >
> > On Fri, Nov 9, 2018 at 11:28 AM Vlad Kopylov <vladk...@gmail.com> wrote:
> >>
> >> Martin, thank you for the tip.
> >> Googling Ceph crush rule examples doesn't give much on rules, just
> >> static placement of buckets.
> >> This all seems to be about placing data, not about giving a client in
> >> a specific datacenter the proper read OSD.
> >>
> >> Maybe something is wrong with the placement groups?
> >>
> >> I added datacenters dc1, dc2 and dc3. The current replicated_rule is:
> >>
> >> rule replicated_rule {
> >>         id 0
> >>         type replicated
> >>         min_size 1
> >>         max_size 10
> >>         step take default
> >>         step chooseleaf firstn 0 type host
> >>         step emit
> >> }
> >>
> >> # buckets
> >> host ceph1 {
> >>         id -3             # do not change unnecessarily
> >>         id -2 class ssd   # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item osd.0 weight 1.000
> >> }
> >> datacenter dc1 {
> >>         id -9             # do not change unnecessarily
> >>         id -4 class ssd   # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph1 weight 1.000
> >> }
> >> host ceph2 {
> >>         id -5             # do not change unnecessarily
> >>         id -6 class ssd   # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item osd.1 weight 1.000
> >> }
> >> datacenter dc2 {
> >>         id -10            # do not change unnecessarily
> >>         id -8 class ssd   # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph2 weight 1.000
> >> }
> >> host ceph3 {
> >>         id -7             # do not change unnecessarily
> >>         id -12 class ssd  # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item osd.2 weight 1.000
> >> }
> >> datacenter dc3 {
> >>         id -11            # do not change unnecessarily
> >>         id -13 class ssd  # do not change unnecessarily
> >>         # weight 1.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item ceph3 weight 1.000
> >> }
> >> root default {
> >>         id -1             # do not change unnecessarily
> >>         id -14 class ssd  # do not change unnecessarily
> >>         # weight 3.000
> >>         alg straw2
> >>         hash 0  # rjenkins1
> >>         item dc1 weight 1.000
> >>         item dc2 weight 1.000
> >>         item dc3 weight 1.000
> >> }
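Before injecting a map like the one above, crushtool can show which OSDs each rule would select (a sketch; crushmap.new is a placeholder for the compiled map):

  # map ten sample inputs through rule 0 with three replicas;
  # the first OSD in each printed mapping is the one that would act as primary
  crushtool -i crushmap.new --test --rule 0 --num-rep 3 --min-x 0 --max-x 9 --show-mappings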
> >> #ceph pg dump
> >> dumped all
> >> version 29433
> >> stamp 2018-11-09 11:23:44.510872
> >> last_osdmap_epoch 0
> >> last_pg_scan 0
> >> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> >> 1.5f 0 0 0 0 0 0 0 0 active+clean 2018-11-09 04:35:32.320607 0'0 544:1317 [0,2,1] 0 [0,2,1] 0 0'0 2018-11-09 04:35:32.320561 0'0 2018-11-04 11:55:54.756115 0
> >> 2.5c 143 0 143 0 0 19490267 461 461 active+undersized+degraded 2018-11-08 19:02:03.873218 508'461 544:2100 [2,1] 2 [2,1] 2 290'380 2018-11-07 18:58:43.043719 64'120 2018-11-05 14:21:49.256324 0
> >> .....
> >> sum 15239 0 2053 2659 0 2157615019 58286 58286
> >> OSD_STAT USED    AVAIL  TOTAL  HB_PEERS PG_SUM PRIMARY_PG_SUM
> >> 2        3.7 GiB 28 GiB 32 GiB [0,1]    200    73
> >> 1        3.7 GiB 28 GiB 32 GiB [0,2]    200    58
> >> 0        3.7 GiB 28 GiB 32 GiB [1,2]    173    69
> >> sum      11 GiB  85 GiB 96 GiB
> >>
> >> #ceph pg map 2.5c
> >> osdmap e545 pg 2.5c (2.5c) -> up [2,1] acting [2,1]
> >>
> >> #ceph pg map 1.5f
> >> osdmap e547 pg 1.5f (1.5f) -> up [0,2,1] acting [0,2,1]
> >>
> >> On Fri, Nov 9, 2018 at 2:21 AM Martin Verges <martin.ver...@croit.io> wrote:
> >>>
> >>> Hello Vlad,
> >>>
> >>> Ceph clients connect to the primary OSD of each PG. If you create one
> >>> crush rule for building1 and one for building2, each taking an OSD
> >>> from its own building as the first one, reads from each pool will
> >>> always stay within the same building (if the cluster is healthy), and
> >>> only write requests get replicated to the other building.
> >>>
> >>> --
> >>> Martin Verges
> >>> Managing director
> >>>
> >>> Mobile: +49 174 9335695
> >>> E-Mail: martin.ver...@croit.io
> >>> Chat: https://t.me/MartinVerges
> >>>
> >>> croit GmbH, Freseniusstr. 31h, 81247 Munich
> >>> CEO: Martin Verges - VAT-ID: DE310638492
> >>> Com. register: Amtsgericht Munich HRB 231263
> >>>
> >>> Web: https://croit.io
> >>> YouTube: https://goo.gl/PGE1Bx
> >>>
> >>> 2018-11-09 4:54 GMT+01:00 Vlad Kopylov <vladk...@gmail.com>:
> >>> > I am trying to test replicated Ceph with servers in different
> >>> > buildings, and I have a read problem.
> >>> > Reads from one building go to an OSD in the other building and vice
> >>> > versa, making reads slower than writes! Reads become as slow as the
> >>> > slowest node.
> >>> >
> >>> > Is there a way to
> >>> > - disable parallel reads (so a client reads only from the OSD node
> >>> >   where its mon is);
> >>> > - or give each client a read restriction per OSD;
> >>> > - or strictly specify the read OSD at mount time;
> >>> > - or cap the node read delay (for example, if a node's latency is
> >>> >   larger than 2 ms, do not use that node for reads while other
> >>> >   replicas are available);
> >>> > - or place clients on the CRUSH map, so that an OSD in the same
> >>> >   datacenter as the client is preferred and data is pulled from it?
> >>> >
> >>> > Mounting with the latest Mimic kernel client.
> >>> >
> >>> > Thank you!
> >>> >
> >>> > Vlad
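Once the primary affinities are set, a quick check that the change took effect (the OSD_STAT columns are the same ones shown in the pg dump quoted above):

  ceph osd tree        # the PRI-AFF column should now show 0 for the building-2 OSDs
  ceph pg dump osds    # their PRIMARY_PG_SUM should drop to 0 shortly after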
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com