[ceph-users] question about crushmap
Dear list,

we are new to Ceph and we are planning to install a Ceph cluster spanning two datacenters. The situation is:

DC1: 2 racks
DC2: 1 rack

We want to have one replica per rack and, more generally, two replicas in the first DC and one in the other. So now we are stuck on the crushmap: how do we force the cluster to put two replicas in the first DC? Is that related to the buckets' weights?

These are the bucket types we are using:
1) DC
2) Rack
3) Server

Do you think that is enough? Any kind of help is really appreciated.

Regards

Simone Spinelli

--
Simone Spinelli
Università di Pisa
Direzione ICT - Servizi di Rete
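P.S. To make the question more concrete, this is the kind of CRUSH rule we imagine might do it (only a sketch: the bucket names DC1 and DC2 are placeholders for our two datacenter buckets, and we are not sure the syntax or the firstn counts are right):

rule replicate_two_dc {
    ruleset 1
    type replicated
    min_size 3
    max_size 3
    # two replicas from the first datacenter, one per rack
    step take DC1
    step chooseleaf firstn 2 type rack
    step emit
    # the remaining replica from the second datacenter
    step take DC2
    step chooseleaf firstn 1 type rack
    step emit
}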
[ceph-users] HEALTH_WARN and PGs out of buckets
Dear list,

our Ceph cluster (ceph version 0.87) is stuck in a warning state with some OSDs out of their original bucket:

     health HEALTH_WARN 1097 pgs degraded; 15 pgs peering; 1 pgs recovering;
            1097 pgs stuck degraded; 16 pgs stuck inactive; 26148 pgs stuck unclean;
            1096 pgs stuck undersized; 1096 pgs undersized; 4 requests are blocked > 32 sec;
            recovery 101465/6016350 objects degraded (1.686%);
            1691712/6016350 objects misplaced (28.119%)
     monmap e2: 3 mons at {mon1-r2-ser=172.19.14.130:6789/0,mon1-r3-ser=172.19.14.150:6789/0,mon1-rc3-fib=172.19.14.170:6789/0}, election epoch 82, quorum 0,1,2 mon1-r2-ser,mon1-r3-ser,mon1-rc3-fib
     osdmap e15358: 144 osds: 143 up, 143 in
      pgmap v12209990: 38816 pgs, 16 pools, 8472 GB data, 1958 kobjects
            25821 GB used, 234 TB / 259 TB avail
            101465/6016350 objects degraded (1.686%); 1691712/6016350 objects misplaced (28.119%)
                 620 active
               12668 active+clean
                  15 peering
                 395 active+undersized+degraded+remapped
                   1 active+recovering+degraded
               24416 active+remapped
                   1 undersized+degraded
                 700 active+undersized+degraded
  client io 0 B/s rd, 40557 B/s wr, 13 op/s

Yesterday it was just in a warning state, with some PGs stuck unclean and some requests blocked. When I restarted one of the OSDs involved, a recovery process started, some OSDs went down and then up again, and some others were put out of their original bucket:

# id    weight  type name                       up/down reweight
-1      262.1   root default
-15     80.08       datacenter fibonacci
-16     80.08           rack rack-c03-fib
-35     83.72       datacenter ingegneria
-31     0               rack rack-01-ing
-32     0               rack rack-02-ing
-33     0               rack rack-03-ing
-34     0               rack rack-04-ing
-18     83.72           rack rack-03-ser
-13     20.02               host-high-end cnode1-r3-ser
124     1.82                    osd.124 up      1
126     1.82                    osd.126 up      1
128     1.82                    osd.128 up      1
133     1.82                    osd.133 up      1
135     1.82                    osd.135 up      1
145     1.82                    osd.145 up      1
146     1.82                    osd.146 up      1
147     1.82                    osd.147 up      1
148     1.82                    osd.148 up      1
5       1.82                    osd.5   up      1
150     1.82                    osd.150 up      1
153     1.82                    osd.153 up      1
80      1.82                    osd.80  up      1
24      1.82                    osd.24  up      1
131     1.82                    osd.131 up      1

Now, if I put an OSD back into its own bucket by hand it works, but I still have some concerns: why has the recovery process stopped? The cluster is almost empty, so there is space to recover the data even without 6 OSDs.

Has anyone experienced this already? Any advice on what to look for?

Any help is appreciated.

Regards

Simone

--
Simone Spinelli
Università di Pisa
Settore Rete, Telecomunicazioni e Fonia - Serra
Direzione Edilizia e Telecomunicazioni
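P.S. For completeness, this is roughly how I am putting an OSD back into its bucket by hand (a sketch: osd.124 and the weight/location values are just an example taken from the tree above, and I am not sure this is the recommended way):

ceph osd crush set osd.124 1.82 root=default datacenter=ingegneria rack=rack-03-ser host-high-end=cnode1-r3-ser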
[ceph-users] radosGW balancer best practices
Dear all,

we are going to add a RADOS gateway to our Ceph cluster (144 OSDs on 12 servers + 3 monitors, connected via a 10-gigabit network) and we have a couple of questions.

The first question is about the load balancer: do you have any advice based on real-world experience?

The second question is about the number of gateway instances: is it better to have many small 1-gigabit-connected servers, or fewer fat 10-gigabit-connected servers, considering that the total bandwidth available is 10 gigabit anyway? Do you use real or virtual servers? Any advice in terms of performance and reliability?

Many thanks!

Simone

--
Simone Spinelli
Università di Pisa
Direzione ICT - Servizi di Rete
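P.S. To give an idea of what we have in mind, this is the kind of HAProxy setup we are considering in front of the gateway instances (just a sketch: the backend names, addresses and ports are placeholders, and we have not tested it):

frontend rgw_frontend
    bind *:80
    mode http
    default_backend rgw_backend

backend rgw_backend
    mode http
    balance roundrobin
    option httpchk GET /
    server rgw1 192.0.2.11:80 check
    server rgw2 192.0.2.12:80 check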
[ceph-users] ceph as a primary storage for owncloud
Dear all,

we would like to use Ceph as a primary (object) storage for ownCloud. Has anyone already done this? I mean: is that actually possible, or am I wrong? As I understand it, I have to use radosGW in its Swift "flavor", but what about the S3 flavor? I cannot find anything "official", hence my question.

Do you have any advice, or can you point me to some kind of documentation or how-to? I know that maybe this is not the right place for these questions, but I have also asked ownCloud's community... in the meantime... every answer is appreciated!

Thanks

Simone

--
Simone Spinelli
Università di Pisa
Direzione ICT - Servizi di Rete
PGP KEY http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xDBDA383DEA2F1F96
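P.S. Independently of ownCloud, this is the small Python/boto test I was planning to use to check that the S3 flavor of our radosGW works at all (a sketch: the hostname, credentials and bucket name are placeholders):

import boto
import boto.s3.connection

# connect to the radosGW S3 API (plain HTTP, path-style bucket names)
conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY_PLACEHOLDER',
    aws_secret_access_key='SECRET_KEY_PLACEHOLDER',
    host='radosgw.example.com',
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

# create a test bucket and list what this user can see
bucket = conn.create_bucket('owncloud-test')
for b in conn.get_all_buckets():
    print(b.name)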
[ceph-users] osd reweight vs osd crush reweight
Hi all,

we are running a 144-OSD Ceph cluster and a couple of OSDs are more than 80% full. This is the general situation:

     osdmap e29344: 144 osds: 144 up, 144 in
      pgmap v48302229: 42064 pgs, 18 pools, 60132 GB data, 15483 kobjects
            173 TB used, 90238 GB / 261 TB avail

We are currently mitigating the problem using "ceph osd reweight", but the more we read about this problem, the more our doubts about using "ceph osd crush reweight" instead increase. At the moment we do not have plans to buy new hardware.

Our main question is: what happens if a re-weighted OSD restarts and gets its original weight back, does the data move back? What is the correct way to handle this kind of situation?

Many thanks

Simone

--
Simone Spinelli
Università di Pisa
Settore Rete, Telecomunicazioni e Fonia - Serra
Direzione Edilizia e Telecomunicazioni
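P.S. For reference, these are the two commands we are comparing (just an illustration: osd.42 and the weight values are placeholders, not our real numbers):

# override weight between 0 and 1, kept in the OSD map
ceph osd reweight 42 0.85

# change the weight stored in the CRUSH map itself
ceph osd crush reweight osd.42 1.5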