[ceph-users] Adding OSDs without ceph-deploy
I use a scripted installation of ceph without ceph-deploy, which works fine on 0.63. On 0.80 it fails to add the OSDs. In this scenario the local OSDs are all listed in ceph.conf. It runs:

  mkcephfs --init-local-daemons osd -d blah

which creates the OSDs (as in they are there on the file system):

  # ls /var/lib/ceph/osd/ceph-*
  /var/lib/ceph/osd/ceph-0:
  ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami

  /var/lib/ceph/osd/ceph-1:
  ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami

  /var/lib/ceph/osd/ceph-2:
  ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami

  /var/lib/ceph/osd/ceph-3:
  ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami

However I get:

  # service ceph start
  === mon.a ===
  Starting Ceph mon.a on extility-qa2-test...already running
  === osd.0 ===
  Error ENOENT: osd.0 does not exist.  create it before updating the crush map
  failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 --keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 0.09 host=extility-qa2-test root=default'

and ceph status returns no osds:

  root@extility-qa2-test:~# ceph status
      cluster 68efa90e-20a3-4efe-9382-38c8839aa6b0
       health HEALTH_ERR 768 pgs stuck inactive; 768 pgs stuck unclean; no osds
       monmap e1: 1 mons at {a=10.157.208.1:6789/0}, election epoch 2, quorum 0 a
       osdmap e1: 0 osds: 0 up, 0 in
        pgmap v2: 768 pgs, 3 pools, 0 bytes data, 0 objects
              0 kB used, 0 kB / 0 kB avail
                   768 creating

I'm fully aware there is a newer way to do this, but I'd like this route to work too if possible. Is there some new magic I need to do to get ceph to recognise the osds? (again without ceph-deploy)

--
Alex Bligh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
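[Editorial note: the likely cause of the ENOENT above is that the OSDs exist on disk but were never registered in the cluster's osdmap and auth database, which the init script in 0.80 requires before it will run crush create-or-move. The following is an untested sketch pieced together from the newer manual-deployment docs, not a verified recipe:]

```shell
# Register osd.0..osd.3 (matching the on-disk directories listed above)
# in the cluster map and key database, then retry the init script.
for i in 0 1 2 3; do
    ceph osd create $(cat /var/lib/ceph/osd/ceph-$i/fsid)
    ceph auth add osd.$i osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-$i/keyring
done
service ceph start
```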
Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?
On 11 Jul 2013, at 19:25, Gilles Mocellin wrote:

> Hello,
>
> Yes, you missed that qemu can use directly RADOS volume.
> Look here:
> http://ceph.com/docs/master/rbd/qemu-rbd/
>
> Create:
> qemu-img create -f rbd rbd:data/squeeze 10G
>
> Use:
> qemu -m 1024 -drive format=raw,file=rbd:data/squeeze

I don't think he did. As I read it he wants his VMs to all access the same filing system, and doesn't want to use cephfs. OCFS2 on RBD I suppose is a reasonable choice for that.

--
Alex Bligh
Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?
On 12 Jul 2013, at 13:21, Tom Verdaat wrote:

> In the mean time I've done some more research and figured out that:
> • There is a bunch of other cluster file systems but GFS2 and OCFS2 are the only open source ones I could find, and I believe the only ones that are integrated in the Linux kernel.
> • OCFS2 seems to have a lot more public information than GFS2. It has more documentation and a living - though not very active - mailing list.
> • OCFS2 seems to be in active use by its sponsor Oracle, while I can't find much on GFS2 from its sponsor RedHat.
> • OCFS2 documentation indicates a node soft limit of 256 versus 16 for GFS2, and there are actual deployments of stable 45 TB+ production clusters.
> • Performance tests from 2010 indicate OCFS2 clearly beating GFS2, though of course newer versions have been released since.
> • GFS2 has more fencing options than OCFS2.

FWIW: For VM images (i.e. large files accessed by only one client at once) OCFS2 seems to perform better than GFS2. I seem to remember some performance issues with small files, and large directories with a lot of contention (multiple readers and writers of files or file metadata). You may need to forward port some of the more modern tools to your distro.

--
Alex Bligh
Re: [ceph-users] Location of MONs
On 23 Jul 2013, at 17:16, Gregory Farnum wrote:

>> And without wanting to sound daft having missed a salient configuration
>> detail, but there's no way to release when it's written the primary?
>
> Definitely not. Ceph's consistency guarantees and recovery mechanisms
> are all built on top of all the replicas having a consistent copy and
> that breaks if you do primary-only acks. Maybe in the future something
> like this will happen, but it's all very blue-sky right now.

Saying that, another possibility is a persistent writeback cache on the client (either in qemu or in librbd), the former being something I'm toying with. Being persistent it can complete flush/fua type operations before they are actually written to ceph. It wasn't intended for this use case but it might be interesting.

--
Alex Bligh
Re: [ceph-users] v0.61.6 Cuttlefish update released
On 24 Jul 2013, at 05:47, Sage Weil wrote:

> There was a problem with the monitor daemons in v0.61.5 that would prevent
> them from restarting after some period of time. This release fixes the
> bug and works around the issue to allow affected monitors to restart.
> All v0.61.5 users are strongly recommended to upgrade.

Was this bug also in 0.61.4?

--
Alex Bligh
[ceph-users] [list admin] - membership disabled due to bounces
This is the third of these I've got in a month. I get them on no other mailing lists and I'm on quite a lot of lists. I have an outsourced spam thing which is generally well behaved but will reject spam. Any ideas why this might be?

Alex

Begin forwarded message:

> From: ceph-users-requ...@lists.ceph.com
> Date: 11 August 2013 08:25:30 GMT+01:00
> To: a...@alex.org.uk
> Subject: confirm [REDACTED]
>
> Your membership in the mailing list ceph-users has been disabled due
> to excessive bounces The last bounce received from you was dated
> 11-Aug-2013. You will not get any more messages from this list until
> you re-enable your membership. You will receive 3 more reminders like
> this before your membership in the list is deleted.
>
> To re-enable your membership, you can simply respond to this message
> (leaving the Subject: line intact), or visit the confirmation page at
>
> http://lists.ceph.com/confirm.cgi/ceph-users-ceph.com/[REDACTED]
>
> You can also visit your membership page at
>
> http://lists.ceph.com/options.cgi/ceph-users-ceph.com/[REDACTED]
>
> On your membership page, you can change various delivery options such
> as your email address and whether you get digests or not. As a
> reminder, your membership password is
>
> [REDACTED]
>
> If you have any questions or problems, you can contact the list owner
> at
>
> ceph-users-ow...@lists.ceph.com

--
Alex Bligh
Re: [ceph-users] [list admin] - membership disabled due to bounces
James,

I'd prefer the list admins either tuned the sensitivity of the bounce filter or rejected the spam on ingress. I see nearly no spam so it seems my spam filter is effective :-)

Alex

On 11 Aug 2013, at 11:26, James Harper wrote:

> This list actually does get a bit of spam, unlike most lists I'm subscribed
> to. I'm surprised more reputation filters haven't blocked it. Rejecting spam
> is the only right way to do it (junk mail folders are dumb), but obviously
> the ceph-users list is taking the bounces as indicating a problem with your
> account.
>
> Probably the only thing to do is to white list the address and put up with
> the spam.
>
> James
>
>> -Original Message-
>> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alex Bligh
>> Sent: Sunday, 11 August 2013 6:43 PM
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] [list admin] - membership disabled due to bounces
>>
>> This is the third of these I've got in a month. I get them on no
>> other mailing lists and I'm on quite a lot of lists. I have an
>> outsourced spam thing which is generally well behaved but will
>> reject spam. Any ideas why this might be?
>>
>> Alex
>>
>> Begin forwarded message:
>>
>>> From: ceph-users-requ...@lists.ceph.com
>>> Date: 11 August 2013 08:25:30 GMT+01:00
>>> To: a...@alex.org.uk
>>> Subject: confirm [REDACTED]
>>>
>>> Your membership in the mailing list ceph-users has been disabled due
>>> to excessive bounces The last bounce received from you was dated
>>> 11-Aug-2013. You will not get any more messages from this list until
>>> you re-enable your membership. You will receive 3 more reminders like
>>> this before your membership in the list is deleted.
>>>
>>> To re-enable your membership, you can simply respond to this message
>>> (leaving the Subject: line intact), or visit the confirmation page at
>>>
>>> http://lists.ceph.com/confirm.cgi/ceph-users-ceph.com/[REDACTED]
>>>
>>> You can also visit your membership page at
>>>
>>> http://lists.ceph.com/options.cgi/ceph-users-ceph.com/[REDACTED]
>>>
>>> On your membership page, you can change various delivery options such
>>> as your email address and whether you get digests or not. As a
>>> reminder, your membership password is
>>>
>>> [REDACTED]
>>>
>>> If you have any questions or problems, you can contact the list owner
>>> at
>>>
>>> ceph-users-ow...@lists.ceph.com

--
Alex Bligh
Re: [ceph-users] newbie question: rebooting the whole cluster, powerfailure
On 6 Sep 2013, at 13:46, Jens Kristian Søgaard wrote:

> You created 7 mons in ceph. This is like having a parliament with 7 members.
>
> Whenever you want to do something, you need to convince a majority of
> parliament to vote yes. A majority would then be 4 members voting yes.
>
> If two members of parliament decide to stay at home instead of turning up to
> vote - you still need 4 members to get a majority.
>
> It is _not_ the case that everyone would suddenly agree and acknowledge that
> only 5 parliament members have turned up to vote, so that only 3 yes votes
> would be enough to form a majority.

Perhaps not a great analogy. At least in the case of the UK parliament, if 2 members of a 7-member parliament stay at home and don't vote, you would only need 3 members to pass a resolution. In the UK (and I believe in most other parliaments) the number of 'yes' votes need only exceed the number of 'no' votes; the total number of members does not matter.

In ceph, the number of monitors active and voting yes must exceed (i.e. be strictly greater than) half the number of monitors configured. There is no magic about the count being odd or even, save that an n-MON cluster, where n is odd, tolerates the same number of monitor failures as an (n+1)-MON cluster (as n+1 is even): in both cases the loss of at least k=(n+1)/2 devices will take the cluster out (i.e. at most (n-1)/2 failures can be tolerated). This makes deploying even numbers of MON devices wasteful (it does not increase the failure tolerance) and arguably increases the chance of failure (as now we only need k devices of n+1 to fail, as opposed to k devices of n).

--
Alex Bligh
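[Editorial note: the arithmetic in the last paragraph can be checked with a few lines of code. This is a sketch; the function names are ours, not anything in ceph:]

```python
def mon_quorum(n):
    """Monitors needed for quorum: strictly more than half of n configured."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many of n configured monitors can fail before quorum is lost."""
    return n - mon_quorum(n)

# n=5 and n=6 both tolerate 2 failures, so the sixth monitor buys nothing
# and merely adds another device that can fail.
for n in range(3, 8):
    print(n, mon_quorum(n), tolerated_failures(n))
```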
Re: [ceph-users] Ceph performance with 8K blocks.
On 17 Sep 2013, at 21:47, Jason Villalta wrote:

> dd if=ddbenchfile of=/dev/null bs=8K
> 8192000000 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

As a general point, this benchmark may not do what you think it does, depending on the version of dd, as writes to /dev/null can be heavily optimised. Try piping through a second dd instead, so the data is actually copied (note dd reads stdin and writes stdout by default, so no if=/of= is needed on the pipe ends):

  dd if=ddbenchfile bs=8K | dd of=/dev/null bs=8K

--
Alex Bligh
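[Editorial note: a self-contained version of the piped benchmark, for anyone wanting to try it. The file name and the 8 MiB size are illustrative; the real benchmark file would be much larger:]

```shell
# Create an 8 MiB test file, then read it back through a pipe so the
# consuming dd cannot optimise away the copy to /dev/null.
dd if=/dev/zero of=ddbenchfile bs=8K count=1024 2>/dev/null
dd if=ddbenchfile bs=8K | dd of=/dev/null bs=8K
rm -f ddbenchfile
```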
[ceph-users] Abort on moving OSD
I have set up a configuration with 3 x MON + 2 x OSD, each on a different host, as a test bench setup. I've written nothing to the cluster (yet). I'm running ceph 0.61.2 (cuttlefish).

I want to discover what happens if I move an OSD from one host to another, simulating the effect of moving a working harddrive from a dead host to a live host, which I believe should work. So I stopped osd.0 on one host, and copied (using scp) /var/lib/ceph/osd/ceph-0 from one host to another. My understanding is that starting osd.0 on the destination host with 'service ceph start osd.0' should rewrite the crush map and everything should be fine.

In fact what happened was:

  root@ceph6:~# service ceph start osd.0
  === osd.0 ===
  create-or-move updating item id 0 name 'osd.0' weight 0.05 at location {host=ceph6,root=default} to crush map
  Starting Ceph osd.0 on ceph6...
  starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
  ...
  root@ceph6:~# ceph health
  HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; 1/2 in osds are down

osd.0 was not running on the new host, due to the abort as set out below (from the log file). Should this work?
--
Alex Bligh

2013-05-18 17:03:00.345129 7fa408dbb780  0 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 3398
2013-05-18 17:03:00.676611 7fa408dbb780 -1 filestore(/var/lib/ceph/osd/ceph-0) limited size xattrs -- filestore_xattr_use_omap enabled
2013-05-18 17:03:00.891267 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported and appears to work
2013-05-18 17:03:00.891314 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-18 17:03:00.891533 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
2013-05-18 17:03:01.373741 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-18 17:03:01.374175 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount found snaps <>
2013-05-18 17:03:02.023315 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-18 17:03:02.024992 7fa408dbb780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-05-18 17:03:02.025372 7fa408dbb780  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-18 17:03:02.025580 7fa408dbb780  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 21: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-18 17:03:02.027454 7fa408dbb780  1 journal close /var/lib/ceph/osd/ceph-0/journal
2013-05-18 17:03:02.302070 7fa408dbb780 -1 filestore(/var/lib/ceph/osd/ceph-0) limited size xattrs -- filestore_xattr_use_omap enabled
2013-05-18 17:03:02.361438 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is supported and appears to work
2013-05-18 17:03:02.361508 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-18 17:03:02.361755 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount did NOT detect btrfs
2013-05-18 17:03:02.424915 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-18 17:03:02.425107 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount found snaps <>
2013-05-18 17:03:02.519006 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-18 17:03:02.520446 7fa408dbb780 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2013-05-18 17:03:02.520507 7fa408dbb780  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 29: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-18 17:03:02.520625 7fa408dbb780  1 journal _open /var/lib/ceph/osd/ceph-0/journal fd 29: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-05-18 17:03:02.522371 7fa408dbb780  0 osd.0 24 crush map has features 33816576, adjusting msgr requires for clients
2013-05-18 17:03:02.522419 7fa408dbb780  0 osd.0 24 crush map has features 33816576, adjusting msgr requires for osds
2013-05-18 17:03:02.533617 7fa408dbb780 -1 *** Caught signal (Aborted) **
 in thread 7fa408dbb780

 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
 1: /usr/bin/ceph-osd() [0x79087a]
 2: (()+0xfcb0) [0x7fa408254cb0]
 3: (gsignal()+0x35) [0x7fa406a0d425]
 4: (abort()+0x17b) [0x7fa406a10b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa40735f69d]
 6: (()+0xb5846) [0x7fa40735d846]
 7: (()+0xb5873) [0x7fa40735d873]
 8: (()+0xb596e) [0x7fa40735d96e]
 9: (ceph::buff
Re: [ceph-users] Abort on moving OSD
On 18 May 2013, at 18:20, Alex Bligh wrote:

> I want to discover what happens if I move an OSD from one host to another,
> simulating the effect of moving a working harddrive from a dead host to a
> live host, which I believe should work. So I stopped osd.0 on one host, and
> copied (using scp) /var/lib/ceph/osd/ceph-0 from one host to another. My
> understanding is that starting osd.0 on the destination host with 'service
> ceph start osd.0' should rewrite the crush map and everything should be fine.

Apologies, this was my idiocy. scp does not copy xattrs. rsync -aHAX does, and indeed works fine. I suppose it would have been nice if it died a little more gracefully, but I think I got what I deserved.

--
Alex Bligh
[ceph-users] Setting OSD weight
How do I set the weight for OSDs? I have 4 OSDs I want to create with very low weight (<1) so they are never used if any other OSDs are added subsequently (and would like to avoid placement groups). These OSDs have been created with default settings using the manual OSD add procedure as per ceph docs. But (unless I am being stupid, which is quite possible), setting the weight (either to 0.0001 or to 2) appears to have no effect per a ceph osd dump.

--
Alex Bligh

root@kvm:~# ceph osd dump
epoch 12
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 14:59:54.580601
flags
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
max_osd 4
osd.0 up in weight 1 up_from 2 up_thru 10 down_at 0 last_clean_interval [0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up in weight 1 up_from 4 up_thru 11 down_at 0 last_clean_interval [0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up in weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval [0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up in weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a

root@kvm:~# ceph osd crush set 0 2 root=default
set item id 0 name 'osd.0' weight 2 at location {root=default} to crush map

root@kvm:~# ceph osd dump
epoch 14
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 15:13:21.009317
flags
pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 320 pgp_num 320 last_change 1 owner 0
max_osd 4
osd.0 up in weight 1 up_from 2 up_thru 13 down_at 0 last_clean_interval [0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up in weight 1 up_from 4 up_thru 13 down_at 0 last_clean_interval [0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up in weight 1 up_from 3 up_thru 13 down_at 0 last_clean_interval [0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up in weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval [0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a
Re: [ceph-users] Setting OSD weight
On 20 May 2013, at 17:19, Sage Weil wrote:

> Look at 'ceph osd tree'. The weight value in 'ceph osd dump' output is
> the in/out correction, not the crush weight.

Doh. Thanks.

Is there a difference between:

  ceph osd crush set 0 2 root=default

and

  ceph osd crush reweight osd.0 2

?

--
Alex Bligh
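[Editorial note: as far as we understand it (worth verifying against the ceph CLI docs), the difference is whether the item's position in the crush hierarchy is touched:]

```shell
# Sets the item's weight AND its location, moving it if necessary:
ceph osd crush set 0 2 root=default

# Adjusts only the weight, leaving the item where it already is
# in the crush hierarchy:
ceph osd crush reweight osd.0 2
```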
[ceph-users] Determining when an 'out' OSD is actually unused
If I want to remove an osd, I use 'ceph osd out' before taking it down, i.e. stopping the OSD process, and removing the disk.

How do I (preferably programmatically) tell when it is safe to stop the OSD process? The documentation says 'ceph -w', which is not especially helpful, (a) if I want to do it programmatically, or (b) if there are other problems in the cluster so ceph was not reporting HEALTH_OK to start with.

Is there a better way?

--
Alex Bligh
Re: [ceph-users] Determining when an 'out' OSD is actually unused
Dan,

On 21 May 2013, at 00:52, Dan Mick wrote:

> On 05/20/2013 01:33 PM, Alex Bligh wrote:
>> If I want to remove an osd, I use 'ceph osd out' before taking it down, i.e.
>> stopping the OSD process, and removing the disk.
>>
>> How do I (preferably programmatically) tell when it is safe to stop the OSD
>> process? The documentation says 'ceph -w', which is not especially helpful,
>> (a) if I want to do it programmatically, or (b) if there are other problems
>> in the cluster so ceph was not reporting HEALTH_OK to start with.
>>
>> Is there a better way?
>
> We've had some discussions about this recently, but there's no great way of
> doing this right now.

OK. So would the following conservative rule work for now?

* Don't mark the OSD out until and unless you have ceph HEALTH_OK
* Then mark it out
* Then you are safe to remove only when it returns to ceph HEALTH_OK

The instructions at present say watch ceph -w, but don't say exactly what to watch for.

> We should probably have a query option that returns "number of PGs on this
> OSD" or some such.

That would be very useful.

--
Alex Bligh
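[Editorial note: a crude programmatic version of that conservative rule might look like the following sketch. It inherits the same caveat, namely that the cluster must be able to reach HEALTH_OK at all:]

```shell
# Wait for a clean cluster, mark the OSD out, wait for rebalancing to
# finish, and only then stop the daemon and remove the disk.
until ceph health | grep -q HEALTH_OK; do sleep 10; done
ceph osd out 0
until ceph health | grep -q HEALTH_OK; do sleep 10; done
service ceph stop osd.0
```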
Re: [ceph-users] Determining when an 'out' OSD is actually unused
On 21 May 2013, at 07:17, Dan Mick wrote:

> Yes, with the proviso that you really mean "kill the osd" when clean.
> Marking out is step 1.

Thanks

--
Alex Bligh
Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages
Wolfgang,

On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:

> for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5,
> it didn't work nicely with libvirt) which includes important fixes to RBD for
> ubuntu 12.04 AMD64. If you want to save some time, I can share the packages
> with you. drop me a line if you're interested.

Information as to what the important fixes are would be appreciated!

--
Alex Bligh
Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages
On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:

> for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5,
> it didn't work nicely with libvirt) which includes important fixes to RBD for
> ubuntu 12.04 AMD64. If you want to save some time, I can share the packages
> with you. drop me a line if you're interested.

The issue Wolfgang is referring to is here:

  http://tracker.ceph.com/issues/3737

And the actual patch to QEMU is here:

  http://patchwork.ozlabs.org/patch/232489/

I'd be interested in whether the raring version (1.4.0+dfsg-1expubuntu4) contains this (unchecked as yet).

--
Alex Bligh
[ceph-users] Recommended versions of Qemu/KVM to run Ceph Cuttlefish
I'm planning on running Ceph Cuttlefish with Qemu/KVM using Qemu's inbuilt RBD support (not kernel RBD). I may go beyond Cuttlefish. What versions of Qemu are recommended for this?

Qemu 1.0 is what ships with Ubuntu Precise LTS, which is the base OS in use, so this would be the best option in many ways. Qemu 1.0 is built with rbd support, and dynamically links correctly to librbd from the Cuttlefish distribution. However, I note things like:

  http://tracker.ceph.com/issues/3737
  http://patchwork.ozlabs.org/patch/232489/

which is the implementation of using asynchronous flushing in Qemu. That's only in 1.4.3 and 1.5 if I use the upstream version.

Are there many such things? If not, part of me is tempted to backport a few such issues to Qemu 1.0 rather than risk upgrading to Qemu 1.5. I know there are a few issues with qemu convert, but I can solve those another way. We're using format 2 images, if that's relevant.

--
Alex Bligh
Re: [ceph-users] why so many ceph-create-keys processes?
On 19 Jun 2013, at 10:42, James Harper wrote:

> Why are there so many ceph-create-keys processes? Under Debian, every time I
> start the mons another ceph-create-keys process starts up.

I've seen these hang around for no particular good reason (on Ubuntu). It seems to happen when there is some difficulty starting mon services. Once everything is up and running, it doesn't happen (at least for me). I never worked out quite what it was, but I think it was something like the init script starts them, but doesn't kill them under every circumstance where starting a mon fails.

--
Alex Bligh
[ceph-users] Backport of modern qemu rbd driver to qemu 1.0 + Precise packaging
I've backported the modern (qemu 1.5) rbd driver to qemu 1.0 (for anyone interested). This is designed for people who are conservative in hypervisor version, but like more bleeding edge storage. The main thing this adds is asynchronous flush to rbd, plus automatic control of rbd caching behaviour. I have NOT backported the extended configuration bits.

I've used runtime testing through weak binding to detect the version of librbd in use, so it's possible to compile against a standard Precise librbd, then run with a more modern one and take advantage of async flush. It works the other way around too. Note the original implementation of this posted to the qemu list (but not taken) did not quite work.

The backport in qemu repository format can be found at https://github.com/flexiant/qemu/commits/v1.0-rbd-add-async-flush (note the branch is v1.0-rbd-add-async-flush). I've also backported this to the Ubuntu Precise packaging of qemu-kvm (again note the branch is v1.0-rbd-add-async-flush) at https://github.com/flexiant/qemu-kvm-1.0-noroms/tree/v1.0-rbd-add-async-flush

THESE PATCHES ARE VERY LIGHTLY TESTED. USE AT YOUR OWN RISK.

--
Alex Bligh
Re: [ceph-users] monitor removal and re-add
On 25 Jun 2013, at 00:39, Mandell Degerness wrote:

> The issue, Sage, is that we have to deal with the cluster being
> re-expanded. If we start with 5 monitors and scale back to 3, running
> the "ceph mon remove N" command after stopping each monitor and don't
> restart the existing monitors, we cannot re-add those same monitors
> that were previously removed. They will suicide at startup.

Can you not restart the remaining monitors individually at the end of the process once the monmaps and the ceph.confs have been updated so they only think there are 3 monitors? Once you have got to a stable 3 mon config, you can go back up to 5.

--
Alex Bligh
ceph-users@lists.ceph.com
On 28 Jun 2013, at 08:41, 华仔 wrote:

> write speed: wget http://remote server ip/2GB.file, we get a write speed at
> an average speed of 6MB/s. (far behind expected)
> (we must get something wrong there, we would appreciate a lot if any help
> comes from you. we think the problems comes from the kvm emulator, but we are
> not sure, can you give us some advice to improve our vm's disk performance in
> the aspect of writing speed?)

Are you using cache=writeback on your kvm command line? What about librbd caching? What versions of kvm & ceph?

--
Alex Bligh
Re: [ceph-users] Problem with data distribution
On 1 Jul 2013, at 17:02, Gregory Farnum wrote:

> It looks like probably your PG counts are too low for the number of
> OSDs you have.
> http://ceph.com/docs/master/rados/operations/placement-groups/

The docs you referred Pierre to say:

"Important: Increasing the number of placement groups in a pool after you create the pool is still an experimental feature in Bobtail (v 0.56). We recommend defining a reasonable number of placement groups and maintaining that number until Ceph's placement group splitting and merging functionality matures."

but they do not tell you how to increase that number (whether it's experimental or not) after a pool has been created.

Also, they say the default number of PGs is 8, but "When you create a pool, set the number of placement groups to a reasonable value (e.g., 100)." If so, perhaps a different default should be used!

--
Alex Bligh
Re: [ceph-users] Problem with data distribution
On 1 Jul 2013, at 17:37, Gregory Farnum wrote:

> Oh, that's out of date! PG splitting is supported in Cuttlefish:
> "ceph osd pool set pg_num "
> http://ceph.com/docs/master/rados/operations/control/#osd-subsystem

Ah, so:

  pg_num: The placement group number.

means:

  pg_num: The number of placement groups.

Perhaps worth demystifying for those hard of understanding such as myself. I'm still not quite sure how that relates to pgp_num.

--
Alex Bligh
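[Editorial note: as far as we understand it (worth verifying against the placement-groups docs), pg_num is the total number of placement groups in the pool, while pgp_num is the number of PGs that CRUSH actually considers when placing data. A split is therefore done in two steps; pool name and counts below are illustrative:]

```shell
# Step 1: create the new PGs (objects split locally, no data movement yet):
ceph osd pool set rbd pg_num 256

# Step 2: let placement use the new PGs, which triggers rebalancing:
ceph osd pool set rbd pgp_num 256
```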