Re: [ceph-users] vm fs corrupt after pgs stuck
On 01/02/2014 01:40 PM, James Harper wrote:
> I just had to restore an MS Exchange database after a ceph hiccup (no
> actual data lost - Exchange is very good like that with its no-loss
> restore!). The order of events went something like:
>
> - loss of connection on the osd to the cluster network (public network was okay)
> - pgs reported stuck
> - stopped the osd on the bad server
> - resolved the network problem
> - restarted the osd on the bad server
> - noticed that the vm running Exchange had hung
> - rebooted and the vm did a chkdsk automatically
> - Exchange refused to mount the main mailbox store
>
> I'm not using rbd caching or anything, so for NTFS to lose files like
> that means something fairly nasty happened. My best guess is that the
> loss of connectivity and function while ceph was figuring out what was
> going on meant that Windows I/O was frozen and started timing out, but I
> still can't see how that could result in corruption.

NTFS may have gotten confused if some I/Os completed fine but others
timed out. It looks like NTFS journals metadata, but not data, so it
could lose data not yet written out after this kind of failure, assuming
it stops doing I/O once some timeouts are hit - similar to a sudden power
loss. If the application was not doing the Windows equivalent of O_SYNC
it could still lose writes. I'm not too familiar with Windows, but
perhaps there's a way to configure disk timeout behaviour or NTFS
writeback.

> Any suggestions on how I could avoid this situation in the future would
> be greatly appreciated!
>
> Forgot to mention: this has also happened once previously when the OOM
> killer targeted ceph-osd.

If that caused I/O timeouts, it would make sense. If you can't adjust the
guest timeouts, you might want to decrease the ceph timeouts for noticing
and marking out osds with network or other issues.

Josh
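For reference, a rough sketch of the knobs being referred to here - the
values below are only illustrative, not recommendations:

  # ceph.conf: how quickly peers report an osd down, and how long a
  # down osd waits before being marked out and recovery starts
  [osd]
      osd heartbeat grace = 20
  [mon]
      mon osd down out interval = 300

On the guest side, Windows exposes its disk timeout as the TimeoutValue
setting (in seconds) under HKLM\SYSTEM\CurrentControlSet\Services\Disk;
raising it lets NTFS ride out a short I/O stall instead of failing
writes.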
Re: [ceph-users] snapshot atomicity
On 01/02/2014 10:51 PM, James Harper wrote:
> I've not used ceph snapshots before. The documentation says that the rbd
> device should not be in use before creating a snapshot. Does this mean
> that creating a snapshot is not an atomic operation? I'm happy with a
> crash consistent filesystem if that's all the warning is about.

It's atomic; the warning is just that it's crash consistent, not
application-level consistent.

> If it is atomic, can you create multiple snapshots as an atomic
> operation? The use case for this would be a database spread across
> multiple volumes, eg database on one rbd, logfiles on another.

No, but now that you mention it this would be technically pretty simple
to implement. If multiple rbds referred to the same place to get their
snapshot context, they could all be snapshotted atomically.

Josh
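A minimal sketch of the workflow being discussed (pool, image and
mountpoint names here are made up):

  # crash-consistent snapshot - atomic, no coordination with the guest
  rbd snap create rbd/dbvol@before-upgrade

  # optionally quiesce the filesystem inside the guest first for a
  # cleaner (though still not application-level) snapshot
  fsfreeze -f /var/lib/mysql
  rbd snap create rbd/dbvol@before-upgrade
  fsfreeze -u /var/lib/mysql

Until atomic multi-image snapshots exist, freezing the filesystems on
all the rbds before snapshotting each one is also the only way to get a
consistent point across several volumes.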
[ceph-users] ceph osd perf question
Hi guys,

Could someone explain what the new perf stats show and whether the
numbers are reasonable on my cluster? I am concerned about the high
fs_commit_latency, which seems to be above 150ms for all osds. I've tried
to find documentation on what this command actually shows, but couldn't
find anything. I am using 3TB SAS drives with 4 osd journals on each ssd.

Are the numbers below reasonable for a fairly idle ceph cluster (osd
utilisation below 10% on average)?

# ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms)
    0                   192                    4
    1                   265                    4
    2                   116                    1
    3                   125                    2
    4                   166                    1
    5                   209                    3
    6                   184                    6
    7                   142                    2
    8                   209                    1
    9                   166                    1
   10                   216                    1
   11                   308                    3
   12                   150                    2
   13                   125                    1
   14                   175                    2
   15                   142                    2
   16                   150                    4

When the cluster gets a bit busy (osd utilisation below 50% on average)
I see:

# ceph osd perf
osdid fs_commit_latency(ms) fs_apply_latency(ms)
    0                   551                   11
    1                   284                   25
    2                   517                   41
    3                   492                   14
    4                   625                   13
    5                   309                   26
    6                   650                    9
    7                   517                   21
    8                   634                   25
    9                   784                   32
   10                   392                    7
   11                   501                    8
   12                   602                   12
   13                   467                   14
   14                   476                   36
   15                   451                   11
   16                   383                   21

Thanks

Andrei
Re: [ceph-users] [Rados] How long will it take to fix a broken replica
On 01/02/2014 04:00 PM, Kuo Hugo wrote:
> Hi all,
>
> I did a test to ensure RADOS's recovery.
>
> 1. echo a string into an object file in a placement group's directory on an OSD.
> 2. After the osd scrub, ceph health shows "1 pgs inconsistent". Will it
>    be fixed later?

You manually have to instruct the OSD to repair the PG.

iirc it's: ceph pg repair <pgid>

> Thanks

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
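A short sketch of the repair cycle described above (the pg id is just a
placeholder):

  # find which pg the scrub flagged
  ceph health detail | grep inconsistent

  # tell the primary osd for that pg to repair it, then re-check
  ceph pg repair 2.1f
  ceph -s

Scrubbing only detects the mismatch; the repair has to be triggered by
hand as above.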
Re: [ceph-users] Is ceph feasible for storing large no. of small files with care of handling OSD failure(disk i/o error....) so it can complete pending replication independent from no. of files
On 01/02/2014 05:42 PM, upendrayadav.u wrote:
> Hi,
>
> 1. Is ceph feasible for storing a large number of small files in a ceph
> cluster, with care for the osd failure and recovery process?
>
> 2. If we have a 4TB OSD (almost 85% full) storing only small files
> (500 KB to 1024 KB), and it fails (due to a disk i/o error), how much
> time will it take to complete all pending replication? What are the
> factors that will affect this replication process? Is the total time to
> complete pending replication independent of the number of files to
> replicate - i.e. does failure recovery depend only on the size of the
> OSD and not on the number of files?

Please forget the concept of files; we talk about objects inside Ceph /
RADOS :)

It's hard to predict how long it will take, but it depends on the number
of PGs and the amount of objects inside the PGs. The more objects you
have, the longer recovery will take.

Btw, I wouldn't fill an OSD up to 85%, that's a bit too high. I'd stay
below 80%.

> 3. We have 64 disks (in a JBOD configuration) in one machine. Is it
> necessary to run one OSD per disk, or is it possible to combine 8 disks
> into one OSD?

Run one OSD per disk, that gives you the best fault tolerance. You can
run one OSD with something like RAID on multiple drives, but that
reduces your fault tolerance.

Wido

> Thanks a lot for giving your precious time for me... hope this time
> will get a response.
>
> :( Last 2 mails have no reply... :(
>
> Regards,
> Upendra Yadav
> DFS

--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
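For context on the "stay below 80%" advice: ceph itself starts warning
and then blocking writes at configurable thresholds. A rough ceph.conf
sketch with the commonly cited defaults (treat the numbers as
illustrative):

  [global]
      mon osd nearfull ratio = 0.85   # HEALTH_WARN once an osd passes this
      mon osd full ratio = 0.95       # writes blocked once an osd passes this

Staying well under the nearfull ratio also leaves headroom for the data
that gets re-replicated onto the surviving osds when a 4TB osd fails.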
[ceph-users] Rados Gateway problem
Hello,

I have a problem on the Rados GW. When I do a

  wget http://p1.13h.com/swift/v1/test/test.mp3

on this object, there is no problem getting it, but if I put it in a
browser or VLC, it stops playing after 32 seconds or less.

Could anyone help me?

Regards,
Julien
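One difference between the two cases is that wget downloads the whole
object in a single GET, while browsers and VLC stream it with HTTP Range
requests. A quick way to check whether ranged GETs are the part that
misbehaves (URL taken from the report above):

  # fetch an arbitrary byte range and watch the response headers
  curl -v -H "Range: bytes=0-1023" \
      http://p1.13h.com/swift/v1/test/test.mp3 -o /dev/null

If the ranged request stalls or returns an unexpected status, the problem
is in how the gateway/fastcgi stack handles partial content rather than
in the object itself.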
Re: [ceph-users] Is ceph feasible for storing large no. of small files with care of handling OSD failure(disk i/o error....) so it can complete pending replication independent from no. of files
Thanks a lot... for your detailed and very clear answer :)

Regards,
Upendra Yadav
DFS

On Fri, 03 Jan 2014 15:52:09 +0530 Wido den Hollander wrote:
> [...]
[ceph-users] [ANN] ceph-deploy 1.3.4 released!
Hi All,

There is a new release of ceph-deploy, the easy deployment tool for Ceph.

This is mostly a bug-fix release, although one minor feature was added:
the ability to install/remove packages from remote hosts with a new
sub-command: `pkg`

As we continue to add features (or improve old ones) we are also making
sure proper documentation goes hand in hand with those changes too. For
`pkg` this is now documented in the ceph-deploy docs page:

http://ceph.com/ceph-deploy/docs/pkg.html

The complete changelog, including 1.3.4 changes, can be found here:

http://ceph.com/ceph-deploy/docs/changelog.html#id1

Make sure you update!

Thanks,
Alfredo
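The docs page above has the authoritative syntax, but usage is roughly
along these lines (host and package names are only examples):

  # install a package on one or more remote hosts
  ceph-deploy pkg --install htop node1 node2

  # remove it again
  ceph-deploy pkg --remove htop node1 node2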
Re: [ceph-users] how to use the function ceph_open_layout
You'll need to register the new pool with the MDS:

  ceph mds add_data_pool <pool>

On Thu, Jan 2, 2014 at 9:48 PM, 鹏 wrote:
> Hi all;
>
> Today I want to use the function ceph_open_layout() in libcephfs.h.
>
> I created a new pool successfully:
> # rados mkpool data1
>
> and then I wrote the code like this:
>
> int fd = ceph_open_layout(cmount, c_path, O_RDONLY|O_CREAT, 0666,
>                           (1<<22), 1, (1<<22), "data1");
>
> and the fd is -22!
>
> When I use the "data" pool it succeeds:
>
> int fd = ceph_open_layout(cmount, c_path, O_RDONLY|O_CREAT, 0666,
>                           (1<<22), 1, (1<<22), "data");
>
> Does ceph_open_layout support read/write to a new pool?
>
> Thanks for the help!
Re: [ceph-users] how to use the function ceph_open_layout
On Fri, 3 Jan 2014, 鹏 wrote:
> Hi all;
>
> Today I want to use the function ceph_open_layout() in libcephfs.h.
>
> I created a new pool successfully:
> # rados mkpool data1

You also need to do

  ceph mds add_data_pool data1

sage

> and then I wrote the code like this:
>
> int fd = ceph_open_layout(cmount, c_path, O_RDONLY|O_CREAT, 0666,
>                           (1<<22), 1, (1<<22), "data1");
>
> and the fd is -22!
>
> When I use the "data" pool it succeeds:
>
> int fd = ceph_open_layout(cmount, c_path, O_RDONLY|O_CREAT, 0666,
>                           (1<<22), 1, (1<<22), "data");
>
> Does ceph_open_layout support read/write to a new pool?
>
> Thanks for the help!
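Putting the two replies together, the full sequence for a new data pool
(using the pool name from the question) is roughly:

  rados mkpool data1
  ceph mds add_data_pool data1

after which the ceph_open_layout() call naming "data1" should return a
valid fd instead of -22 (-EINVAL), since the MDS will then accept the
layout's data pool.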
Re: [ceph-users] crush chooseleaf vs. choose
Run 'ceph osd crush tunables optimal' or adjust an offline map file via
the crushtool command line (more annoying) and retest; I suspect that is
the problem.

http://ceph.com/docs/master/rados/operations/crush-map/#tunables

sage

On Fri, 3 Jan 2014, Dietmar Maurer wrote:
> > In both cases, you only get 2 replicas on the remaining 2 hosts.
>
> OK, I was able to reproduce this with crushtool.
>
> > The difference is if you have 4 hosts with 2 osds. In the choose case,
> > you have some fraction of the data that chose the down host in the
> > first step (most of the attempts, actually!) and then couldn't find a
> > usable osd, leaving you with only 2 replicas. With chooseleaf that
> > doesn't happen.
>
> This is also reproducible.
>
> > The other difference is if you have one of the two OSDs on the host
> > marked out. In the choose case, the remaining OSD will get allocated 2x
> > the data; in the chooseleaf case, usage will remain proportional with
> > the rest of the cluster and the data from the out OSD will be
> > distributed across other OSDs (at least when there are > 3 hosts!).
>
> I see, but data distribution seems not optimal in that case.
>
> For example using this crush map:
>
> # types
> type 0 osd
> type 1 host
> type 2 rack
> type 3 row
> type 4 room
> type 5 datacenter
> type 6 root
>
> # buckets
> host prox-ceph-1 {
>     id -2       # do not change unnecessarily
>     # weight 7.260
>     alg straw
>     hash 0      # rjenkins1
>     item osd.0 weight 3.630
>     item osd.1 weight 3.630
> }
> host prox-ceph-2 {
>     id -3       # do not change unnecessarily
>     # weight 7.260
>     alg straw
>     hash 0      # rjenkins1
>     item osd.2 weight 3.630
>     item osd.3 weight 3.630
> }
> host prox-ceph-3 {
>     id -4       # do not change unnecessarily
>     # weight 3.630
>     alg straw
>     hash 0      # rjenkins1
>     item osd.4 weight 3.630
> }
>
> host prox-ceph-4 {
>     id -5       # do not change unnecessarily
>     # weight 3.630
>     alg straw
>     hash 0      # rjenkins1
>     item osd.5 weight 3.630
> }
>
> root default {
>     id -1       # do not change unnecessarily
>     # weight 21.780
>     alg straw
>     hash 0      # rjenkins1
>     item prox-ceph-1 weight 7.260   # 2 OSDs
>     item prox-ceph-2 weight 7.260   # 2 OSDs
>     item prox-ceph-3 weight 3.630   # 1 OSD
>     item prox-ceph-4 weight 3.630   # 1 OSD
> }
>
> # rules
> rule data {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
> # end crush map
>
> crushtool shows the following utilization:
>
> # crushtool --test -i my.map --rule 0 --num-rep 3 --show-utilization
>   device 0: 423
>   device 1: 452
>   device 2: 429
>   device 3: 452
>   device 4: 661
>   device 5: 655
>
> Any explanation for that? Maybe related to the small number of devices?
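A rough sketch of the two approaches mentioned above (map file names are
just placeholders):

  # online: switch the cluster to the optimal tunables profile and retest
  ceph osd crush tunables optimal

  # offline: decompile the map, edit/add the "tunable ..." lines at the
  # top of the text dump, recompile, and rerun the utilization test
  crushtool -d my.map -o my.txt
  # ... edit the tunable lines in my.txt ...
  crushtool -c my.txt -o my-new.map
  crushtool --test -i my-new.map --rule 0 --num-rep 3 --show-utilization

Note that changing tunables on a live cluster changes data placement, so
expect some rebalancing afterwards.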
Re: [ceph-users] snapshot atomicity
On 1/3/14, 3:21 AM, "Josh Durgin" wrote:

>On 01/02/2014 10:51 PM, James Harper wrote:
>> I've not used ceph snapshots before. The documentation says that the
>> rbd device should not be in use before creating a snapshot. Does this
>> mean that creating a snapshot is not an atomic operation? I'm happy with
>> a crash consistent filesystem if that's all the warning is about.
>
>It's atomic, the warning is just that it's crash consistent, not
>application-level consistent.
>
>> If it is atomic, can you create multiple snapshots as an atomic
>> operation? The use case for this would be a database spread across
>> multiple volumes, eg database on one rbd, logfiles on another.
>
>No, but now that you mention it this would be technically pretty
>simple to implement. If multiple rbds referred to the same place to get
>their snapshot context, they could all be snapshotted atomically.
>
>Josh

I had been trying to imagine a use for pool-level snapshotting after I'd
read about that feature. Thanks for settling that!

JL
Re: [ceph-users] [Rados] How long will it take to fix a broken replica
That's useful information. Thanks.

2014/1/3 Wido den Hollander:
> You manually have to instruct the OSD to repair the PG.
>
> iirc it's: ceph pg repair <pgid>
> [...]
[ceph-users] Strange things on Swift RADOS Gateway
Hi all,

I have a problem with the gateway and swift. When I try to get a file by
wget, curl, or the swift command, I have no problem getting it! But when
I try to do it directly in my browser it stops between 6 and 40 seconds.

Ceph.conf:

[client.radosgw.gateway]
host = p1
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log

Rados GW log at level 20:

2014-01-03 20:17:58.575271 7fa20f35d780 20 enqueued request req=0x20e72e0
2014-01-03 20:17:58.575283 7fa20f35d780 20 RGWWQ:
2014-01-03 20:17:58.575285 7fa20f35d780 20 req: 0x20e72e0
2014-01-03 20:17:58.575291 7fa20f35d780 10 allocated request req=0x20dd580
2014-01-03 20:17:58.575331 7fa1c37fe700 20 dequeued request req=0x20e72e0
2014-01-03 20:17:58.575340 7fa1c37fe700 20 RGWWQ: empty
2014-01-03 20:17:58.575346 7fa1c37fe700 1 ====== starting new request req=0x20e72e0 =====
2014-01-03 20:17:58.575439 7fa1c37fe700 2 req 4:0.93::GET /swift/v1/test/big_buck_bunny.mp4::initializing
2014-01-03 20:17:58.575485 7fa1c37fe700 10 ver=v1 first=test req=big_buck_bunny.mp4
2014-01-03 20:17:58.575494 7fa1c37fe700 10 s->object=big_buck_bunny.mp4 s->bucket=test
2014-01-03 20:17:58.575501 7fa1c37fe700 20 FCGI_ROLE=RESPONDER
2014-01-03 20:17:58.575503 7fa1c37fe700 20 SCRIPT_URL=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575505 7fa1c37fe700 20 SCRIPT_URI=http://p1.13h.com/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575507 7fa1c37fe700 20 RGW_LOG_LEVEL=20
2014-01-03 20:17:58.575509 7fa1c37fe700 20 RGW_PRINT_CONTINUE=yes
2014-01-03 20:17:58.575511 7fa1c37fe700 20 RGW_SHOULD_LOG=yes
2014-01-03 20:17:58.575513 7fa1c37fe700 20 HTTP_HOST=p1.13h.com
2014-01-03 20:17:58.575514 7fa1c37fe700 20 HTTP_CONNECTION=keep-alive
2014-01-03 20:17:58.575516 7fa1c37fe700 20 HTTP_CACHE_CONTROL=max-age=0
2014-01-03 20:17:58.575518 7fa1c37fe700 20 HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
2014-01-03 20:17:58.575521 7fa1c37fe700 20 HTTP_USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36
2014-01-03 20:17:58.575523 7fa1c37fe700 20 HTTP_ACCEPT_ENCODING=gzip,deflate,sdch
2014-01-03 20:17:58.575525 7fa1c37fe700 20 HTTP_ACCEPT_LANGUAGE=fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
2014-01-03 20:17:58.575527 7fa1c37fe700 20 HTTP_COOKIE=_ga=GA1.2.942382960.1369422161
2014-01-03 20:17:58.575529 7fa1c37fe700 20 HTTP_RANGE=bytes=34256-34256
2014-01-03 20:17:58.575531 7fa1c37fe700 20 HTTP_IF_RANGE=f13004eed4251c602bbe15737e8a1ecb
2014-01-03 20:17:58.575532 7fa1c37fe700 20 PATH=/usr/local/bin:/usr/bin:/bin
2014-01-03 20:17:58.575534 7fa1c37fe700 20 SERVER_SIGNATURE=
2014-01-03 20:17:58.575536 7fa1c37fe700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu)
2014-01-03 20:17:58.575538 7fa1c37fe700 20 SERVER_NAME=p1.13h.com
2014-01-03 20:17:58.575540 7fa1c37fe700 20 SERVER_ADDR=62.210.177.137
2014-01-03 20:17:58.575542 7fa1c37fe700 20 SERVER_PORT=80
2014-01-03 20:17:58.575544 7fa1c37fe700 20 REMOTE_ADDR=213.245.29.151
2014-01-03 20:17:58.575545 7fa1c37fe700 20 DOCUMENT_ROOT=/var/www
2014-01-03 20:17:58.575547 7fa1c37fe700 20 SERVER_ADMIN=ad...@13h.com
2014-01-03 20:17:58.575549 7fa1c37fe700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
2014-01-03 20:17:58.575551 7fa1c37fe700 20 REMOTE_PORT=51892
2014-01-03 20:17:58.575553 7fa1c37fe700 20 GATEWAY_INTERFACE=CGI/1.1
2014-01-03 20:17:58.57 7fa1c37fe700 20 SERVER_PROTOCOL=HTTP/1.1
2014-01-03 20:17:58.575556 7fa1c37fe700 20 REQUEST_METHOD=GET
2014-01-03 20:17:58.575558 7fa1c37fe700 20 QUERY_STRING=page=swift&params=/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575560 7fa1c37fe700 20 REQUEST_URI=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575562 7fa1c37fe700 20 SCRIPT_NAME=/swift/v1/test/big_buck_bunny.mp4
2014-01-03 20:17:58.575564 7fa1c37fe700 2 req 4:0.000219:swift:GET /swift/v1/test/big_buck_bunny.mp4::getting op
2014-01-03 20:17:58.575571 7fa1c37fe700 2 req 4:0.000226:swift:GET /swift/v1/test/big_buck_bunny.mp4:get_obj:authorizing
2014-01-03 20:17:58.575578 7fa1c37fe700 2 req 4:0.000233:swift:GET /swift/v1/test/big_buck_bunny.mp4:get_obj:reading permissions
2014-01-03 20:17:58.575602 7fa1c37fe700 20 get_obj_state: rctx=0x7fa1840027c0 obj=.rgw:test state=0x7fa184012918 s->prefetch_data=0
2014-01-03 20:17:58.575615 7fa1c37fe700 10 moving .rgw+test to cache LRU end
2014-01-03 20:17:58.575619 7fa1c37fe700 10 cache get: name=.rgw+test : hit
2014-01-03 20:17:58.575630 7fa1c37fe700 20 get_obj_state: s->obj_tag was set empty
2014-01-03 20:17:58.575634 7fa1c37fe700 20 Read xattr: user.rgw.idtag
2014-01-03 20:17:58.575637 7fa1c37fe700 20 Read xattr: user.rgw.manifest
2014-01-03 20:17:58.575644 7fa1c37fe700 10 moving .rgw+test to cache LRU end
2014-01-03 20:17:58.575648 7fa1c37fe700 10 cache get: name=.rgw+test : hit
2014-01-03 20:17:58.575673 7fa1c37fe700 20 rgw_get_bucket_info: bucket instance: test(@{i=.rgw.buckets.index}.rgw.buckets[default.6016.1])
2014-01-
Re: [ceph-users] Monitor configuration issue
I figured out why this was happening. When I went through the quick start
guide, I created a directory on the admin node that was
/home/ceph/storage, and this is where ceph.conf, ceph.log, keyrings, etc.
ended up.

What I realized, though, is that when I was running the ceph commands on
the admin node, it reads the configuration file from /etc/ceph/ceph.conf.
I failed to update the config on the admin node itself when I ran
ceph-deploy config push.
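For anyone hitting the same thing, the fix is roughly (the host name
below is a placeholder):

  # from the directory holding the generated ceph.conf, push it to
  # /etc/ceph/ on the admin node as well, overwriting the stale copy
  ceph-deploy --overwrite-conf config push admin-node

After that, plain `ceph` commands run on the admin node read the same
cluster configuration from /etc/ceph/ceph.conf as every other node.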
[ceph-users] Procedure for planned reboots?
Hi,

I was wondering if there are any procedures for rebooting a node?
Presumably when a node is rebooted, Ceph will lose contact with the OSDs
and begin moving data around. I’ve not actually had to reboot a node of
my cluster yet, but may need to do so in the near future. Does Ceph
handle reboots gracefully or will it immediately begin moving data
around?

I ask because due to the hardware we’re running, we have a fair few OSDs
per node (between 32-40, I know this isn’t ideal). We recently had a node
die on us briefly and it took about 2 hours to get back to HEALTH_OK once
the node was back online after being down for around 15 minutes. This is
with about 16TB of data (of 588TB total), so I’m worried about how a
reboot (or another node failure) will affect us when we have more data on
there.

Thanks

Dane
Re: [ceph-users] Procedure for planned reboots?
On Jan 3, 2014, at 4:43 PM, Dane Elwell wrote:

> I was wondering if there are any procedures for rebooting a node?
> Presumably when a node is rebooted, Ceph will lose contact with the OSDs
> and begin moving data around. I’ve not actually had to reboot a node of
> my cluster yet, but may need to do so in the near future. Does Ceph
> handle reboots gracefully or will it immediately begin moving data
> around?

There's a delay. By default I think it is 5 minutes. You can also run
"ceph osd set noout" beforehand to prevent OSDs from being marked 'out'
no matter how long they may have been 'down'. After your maintenance,
don't forget to run "ceph osd unset noout" to put things back to normal.

> I ask because due to the hardware we’re running, we have a fair few OSDs
> per node (between 32-40, I know this isn’t ideal). We recently had a
> node die on us briefly and it took about 2 hours to get back to
> HEALTH_OK once the node was back online after being down for around 15
> minutes. This is with about 16TB of data (of 588TB total), so I’m
> worried about how a reboot (or another node failure) will affect us when
> we have more data on there.

I normally set the "noout" flag as above, then reboot a single node and
wait for all the OSDs to come back online and for peering, etc. to
finish. I like to run "ceph osd tree" and "ceph pg stat" while waiting to
see how things are going. Only once the cluster is happy and stable after
the first reboot will I start a second.

This all presumes that your crush map has multiple replicas and stores
them on different hosts, of course.

JN
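Putting that procedure together as a rough sketch (one node at a time):

  # before the reboot: stop down osds from being marked out
  ceph osd set noout

  # reboot the node, then watch the osds rejoin and the pgs peer
  ceph osd tree
  ceph pg stat
  ceph -s

  # once everything is active+clean again, restore normal behaviour
  ceph osd unset noout

With noout set, the cluster only marks the node's osds down during the
reboot instead of re-replicating their data, so recovery afterwards is
limited to catching up on the writes made while the node was away.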