[ceph-users] Adding OSDs without ceph-deploy

2014-07-30 Thread Alex Bligh
I use a scripted installation of ceph without ceph-deploy, which works fine on 
0.63. On 0.80 it fails to add the OSDs. In this scenario the local OSDs are all 
listed in ceph.conf.

It runs:
  mkcephfs --init-local-daemons osd -d blah

which creates the OSDs (as in they are there on the file system):

# ls /var/lib/ceph/osd/ceph-*
/var/lib/ceph/osd/ceph-0:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  
superblock  whoami

/var/lib/ceph/osd/ceph-1:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  
superblock  whoami

/var/lib/ceph/osd/ceph-2:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  
superblock  whoami

/var/lib/ceph/osd/ceph-3:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  
superblock  whoami


However I get:

# service ceph start
=== mon.a ===
Starting Ceph mon.a on extility-qa2-test...already running
=== osd.0 ===
Error ENOENT: osd.0 does not exist.  create it before updating the crush map
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.0 
--keyring=/var/lib/ceph/osd/ceph-0/keyring osd crush create-or-move -- 0 0.09 
host=extility-qa2-test root=default'

and ceph status returns no osds:

root@extility-qa2-test:~# ceph status
cluster 68efa90e-20a3-4efe-9382-38c8839aa6b0
 health HEALTH_ERR 768 pgs stuck inactive; 768 pgs stuck unclean; no osds
 monmap e1: 1 mons at {a=10.157.208.1:6789/0}, election epoch 2, quorum 0 a
 osdmap e1: 0 osds: 0 up, 0 in
  pgmap v2: 768 pgs, 3 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
 768 creating

I'm fully aware there is a newer way to do this, but I'd like this route to 
work too if possible.

Is there some new magic I need to do to get ceph to recognise the osds? (again 
without ceph-deploy)
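
My working assumption is that on 0.80 each OSD now has to be registered in the
osdmap (and have its key added) before the init script will touch it; roughly
the following, per OSD. This is only a sketch: the id, weight and hostname are
the ones from above, and the auth caps are from memory.

  # sketch only: register osd.0 before 'service ceph start'
  ceph osd create $(cat /var/lib/ceph/osd/ceph-0/fsid)    # add the osd id/uuid to the osdmap
  ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
      -i /var/lib/ceph/osd/ceph-0/keyring                 # register its key with the mons
  ceph osd crush add osd.0 0.09 host=extility-qa2-test root=default  # place it in the crush map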

-- 
Alex Bligh




Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?

2013-07-11 Thread Alex Bligh

On 11 Jul 2013, at 19:25, Gilles Mocellin wrote:

> Hello,
> 
> Yes, you missed that qemu can use directly RADOS volume.
> Look here :
> http://ceph.com/docs/master/rbd/qemu-rbd/
> 
> Create :
> qemu-img create -f rbd rbd:data/squeeze 10G
> 
> Use :
> 
> qemu -m 1024 -drive format=raw,file=rbd:data/squeeze

I don't think he did. As I read it he wants his VMs to all access the same 
filing system, and doesn't want to use cephfs.

OCFS2 on RBD I suppose is a reasonable choice for that.

-- 
Alex Bligh




Re: [ceph-users] OCFS2 or GFS2 for cluster filesystem?

2013-07-12 Thread Alex Bligh

On 12 Jul 2013, at 13:21, Tom Verdaat wrote:

> In the mean time I've done some more research and figured out that:
>   • There is a bunch of other cluster file systems but GFS2 and OCFS2 are 
> the only open source ones I could find, and I believe the only ones that are 
> integrated in the Linux kernel.
>   • OCFS2 seems to have a lot more public information than GFS2. It has 
> more documentation and a living - though not very active - mailing list.
>   • OCFS2 seems to be in active use by its sponsor Oracle, while I can't 
> find much on GFS2 from its sponsor RedHat.
>   • OCFS2 documentation indicates a node soft limit of 256 versus 16 for 
> GFS2, and there are actual deployments of stable 45 TB+ production clusters.
>   • Performance tests from 2010 indicate OCFS2 clearly beating GFS2, 
> though of course newer versions have been released since.
>   • GFS2 has more fencing options than OCFS2.

FWIW: For VM images (i.e. large files accessed by only one client at once) 
OCFS2 seems to perform better than GFS2. I seem to remember some performance 
issues with small files, and large directories with a lot of contention 
(multiple readers and writers of files or file metadata). You may need to 
forward port some of the more modern tools to your distro.

-- 
Alex Bligh




Re: [ceph-users] Location of MONs

2013-07-23 Thread Alex Bligh

On 23 Jul 2013, at 17:16, Gregory Farnum wrote:

>> And without wanting to sound daft having missed a salient configuration
>> detail, but there's no way to release when it's written to the primary?
> 
> Definitely not. Ceph's consistency guarantees and recovery mechanisms
> are all built on top of all the replicas having a consistent copy and
> that breaks if you do primary-only acks. Maybe in the future something
> like this will happen, but it's all very blue-sky right now.

That said, another possibility is a persistent writeback cache on
the client (either in qemu or in librbd), the former being something
I'm toying with. Being persistent, it can complete flush/FUA-type
operations before they are actually written to ceph.
It wasn't intended for this use case, but it might be interesting.

-- 
Alex Bligh




Re: [ceph-users] v0.61.6 Cuttlefish update released

2013-07-24 Thread Alex Bligh

On 24 Jul 2013, at 05:47, Sage Weil wrote:

> There was a problem with the monitor daemons in v0.61.5 that would prevent 
> them from restarting after some period of time.  This release fixes the 
> bug and works around the issue to allow affected monitors to restart.  
> All v0.61.5 users are strongly recommended to upgrade.

Was this bug also in 0.61.4?

-- 
Alex Bligh




[ceph-users] [list admin] - membership disabled due to bounces

2013-08-11 Thread Alex Bligh
This is the third of these I've got in a month. I get them on no
other mailing lists, and I'm on quite a lot of lists. I have an
outsourced spam-filtering service which is generally well behaved but
will reject spam. Any ideas why this might be?

Alex

Begin forwarded message:

> From: ceph-users-requ...@lists.ceph.com
> Date: 11 August 2013 08:25:30 GMT+01:00
> To: a...@alex.org.uk
> Subject: confirm [REDACTED]
> 
> Your membership in the mailing list ceph-users has been disabled due
> to excessive bounces The last bounce received from you was dated
> 11-Aug-2013.  You will not get any more messages from this list until
> you re-enable your membership.  You will receive 3 more reminders like
> this before your membership in the list is deleted.
> 
> To re-enable your membership, you can simply respond to this message
> (leaving the Subject: line intact), or visit the confirmation page at
> 
>http://lists.ceph.com/confirm.cgi/ceph-users-ceph.com/[REDACTED]
> 
> 
> You can also visit your membership page at
> 
>http://lists.ceph.com/options.cgi/ceph-users-ceph.com/[REDACTED]
> 
> 
> On your membership page, you can change various delivery options such
> as your email address and whether you get digests or not.  As a
> reminder, your membership password is
> 
>[REDACTED]
> 
> If you have any questions or problems, you can contact the list owner
> at
> 
>ceph-users-ow...@lists.ceph.com
> 
> 

-- 
Alex Bligh




Re: [ceph-users] [list admin] - membership disabled due to bounces

2013-08-11 Thread Alex Bligh
James,

I'd prefer the list admins either tuned the sensitivity of the bounce filter or 
rejected the spam on ingress. I see nearly no spam so it seems my spam filter 
is effective :-)

Alex

On 11 Aug 2013, at 11:26, James Harper wrote:

> This list actually does get a bit of spam, unlike most lists I'm subscribed 
> to. I'm surprised more reputation filters haven't blocked it. Rejecting spam 
> is the only right way to do it (junk mail folders are dumb), but obviously 
> the ceph-users list is taking the bounces as indicating a problem with your 
> account.
> 
> Probably the only thing to do is to white list the address and put up with 
> the spam.
> 
> James
> 
>> -Original Message-
>> From: ceph-users-boun...@lists.ceph.com [mailto:ceph-users-
>> boun...@lists.ceph.com] On Behalf Of Alex Bligh
>> Sent: Sunday, 11 August 2013 6:43 PM
>> To: ceph-users@lists.ceph.com
>> Subject: [ceph-users] [list admin] - membership disabled due to bounces
>> 
>> This is the third of these I've got in a month. I get them on no
>> other mailing lists and I'm on quite a lot of lists. I have an
>> outsourced spam thing which is generally well behaved but will
>> reject spam. Any ideas why this might be?
>> 
>> Alex
>> 
>> Begin forwarded message:
>> 
>>> From: ceph-users-requ...@lists.ceph.com
>>> Date: 11 August 2013 08:25:30 GMT+01:00
>>> To: a...@alex.org.uk
>>> Subject: confirm [REDACTED]
>>> 
>>> Your membership in the mailing list ceph-users has been disabled due
>>> to excessive bounces The last bounce received from you was dated
>>> 11-Aug-2013.  You will not get any more messages from this list until
>>> you re-enable your membership.  You will receive 3 more reminders like
>>> this before your membership in the list is deleted.
>>> 
>>> To re-enable your membership, you can simply respond to this message
>>> (leaving the Subject: line intact), or visit the confirmation page at
>>> 
>>>   http://lists.ceph.com/confirm.cgi/ceph-users-ceph.com/[REDACTED]
>>> 
>>> 
>>> You can also visit your membership page at
>>> 
>>>   http://lists.ceph.com/options.cgi/ceph-users-ceph.com/[REDACTED]
>>> 
>>> 
>>> On your membership page, you can change various delivery options such
>>> as your email address and whether you get digests or not.  As a
>>> reminder, your membership password is
>>> 
>>>   [REDACTED]
>>> 
>>> If you have any questions or problems, you can contact the list owner
>>> at
>>> 
>>>   ceph-users-ow...@lists.ceph.com
>>> 
>>> 
>> 
>> --
>> Alex Bligh
>> 
>> 
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 

-- 
Alex Bligh




Re: [ceph-users] newbie question: rebooting the whole cluster, powerfailure

2013-09-06 Thread Alex Bligh

On 6 Sep 2013, at 13:46, Jens Kristian Søgaard wrote:

> You created 7 mons in ceph. This is like having a parliament with 7 members.
> 
> Whenever you want to do something, you need to convince a majority of 
> parliament to vote yes. A majority would then be 4 members voting yes.
> 
> If two members of parliament decide to stay at home instead of turning up to 
> vote - you still need 4 members to get a majority.
> 
> It is _not_ the case that everyone would suddenly agree and acknowledge that 
> only 5 parliament members have turned up to vote, so that only 3 yes votes 
> would be enough to form a majority.

Perhaps not a great analogy. At least in the case of the UK parliament, if 2 
members of a 7-member parliament stay at home and don't vote, you would only 
need 3 members voting yes to pass a resolution. In the UK (and, I believe, in 
most other parliaments) you need the number of 'yes' votes to exceed the number 
of 'no' votes; the total number of members does not matter.

In ceph, you need the number of monitors active and voting yes to exceed (i.e. 
be strictly greater than) half the number of monitors configured.

There is no magic about anything being odd or even, save that an n-MON cluster, 
where n is odd, tolerates exactly the same number of failures as an (n+1)-MON 
cluster (n+1 being even): in both cases the cluster loses quorum once at least 
k=(n+1)/2 monitors fail (i.e. it can survive at most (n-1)/2 failures). This 
makes deploying even numbers of MON devices wasteful (it does not increase the 
number of failures tolerated) and arguably increases the chance of failure (as 
now only k devices out of n+1 need to fail, as opposed to k devices out of n).
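
For concreteness, the arithmetic (quorum = floor(n/2)+1) can be tabulated with
a quick bit of shell:

  # quorum size and failure tolerance for n configured mons
  for n in 3 4 5 6 7; do
    echo "$n mons: quorum $(( n/2 + 1 )), tolerates $(( (n-1)/2 )) failure(s)"
  done

which shows 3 and 4 mons both tolerating a single failure, and 5 and 6 both
tolerating two.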

-- 
Alex Bligh




Re: [ceph-users] Ceph performance with 8K blocks.

2013-09-18 Thread Alex Bligh

On 17 Sep 2013, at 21:47, Jason Villalta wrote:

> dd if=ddbenchfile of=/dev/null bs=8K
> 819200 bytes (8.2 GB) copied, 19.7318 s, 415 MB/s

As a general point, this benchmark may not do what you think it does, depending 
on the version of dd, as writes to /dev/null can be heavily optimised.

Try:
  dd if=ddbenchfile bs=8K | dd of=/dev/null bs=8K

-- 
Alex Bligh




[ceph-users] Abort on moving OSD

2013-05-18 Thread Alex Bligh
I have set up a configuration with 3 x MON + 2 x OSD, each on a different host, 
as a test bench setup. I've written nothing to the cluster (yet).

I'm running ceph 0.61.2 (cuttlefish).

I want to discover what happens if I move an OSD from one host to another, 
simulating the effect of moving a working harddrive from a dead host to a live 
host, which I believe should work. So I stopped osd.0 on one host, and copied 
(using scp) /var/lib/ceph/osd/ceph-0 from one host to another. My understanding 
is that starting osd.0 on the destination host with 'service ceph start osd.0' 
should rewrite the crush map and everything should be fine.

In fact what happened was:

root@ceph6:~# service ceph start osd.0
=== osd.0 === 
create-or-move updating item id 0 name 'osd.0' weight 0.05 at location 
{host=ceph6,root=default} to crush map
Starting Ceph osd.0 on ceph6...
starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 
/var/lib/ceph/osd/ceph-0/journal
...
root@ceph6:~# ceph health
HEALTH_WARN 192 pgs degraded; 192 pgs stuck unclean; 1/2 in osds are down

osd.0 was not running on the new host, due to the abort as set out below (from 
the log file). Should this work?

-- 
Alex Bligh


2013-05-18 17:03:00.345129 7fa408dbb780  0 ceph version 0.61.2 
(fea782543a844bb277ae94d3391788b76c5bee60), process ceph-osd, pid 3398
2013-05-18 17:03:00.676611 7fa408dbb780 -1 filestore(/var/lib/ceph/osd/ceph-0) 
limited size xattrs -- filestore_xattr_use_omap enabled
2013-05-18 17:03:00.891267 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount FIEMAP ioctl is supported and appears to work
2013-05-18 17:03:00.891314 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-18 17:03:00.891533 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount did NOT detect btrfs
2013-05-18 17:03:01.373741 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-18 17:03:01.374175 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount found snaps <>
2013-05-18 17:03:02.023315 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-18 17:03:02.024992 7fa408dbb780 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2013-05-18 17:03:02.025372 7fa408dbb780  1 journal _open 
/var/lib/ceph/osd/ceph-0/journal fd 21: 1048576000 bytes, block size 4096 
bytes, directio = 1, aio = 0
2013-05-18 17:03:02.025580 7fa408dbb780  1 journal _open 
/var/lib/ceph/osd/ceph-0/journal fd 21: 1048576000 bytes, block size 4096 
bytes, directio = 1, aio = 0
2013-05-18 17:03:02.027454 7fa408dbb780  1 journal close 
/var/lib/ceph/osd/ceph-0/journal
2013-05-18 17:03:02.302070 7fa408dbb780 -1 filestore(/var/lib/ceph/osd/ceph-0) 
limited size xattrs -- filestore_xattr_use_omap enabled
2013-05-18 17:03:02.361438 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount FIEMAP ioctl is supported and appears to work
2013-05-18 17:03:02.361508 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-05-18 17:03:02.361755 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount did NOT detect btrfs
2013-05-18 17:03:02.424915 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount syncfs(2) syscall fully supported (by glibc and kernel)
2013-05-18 17:03:02.425107 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount found snaps <>
2013-05-18 17:03:02.519006 7fa408dbb780  0 filestore(/var/lib/ceph/osd/ceph-0) 
mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-05-18 17:03:02.520446 7fa408dbb780 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2013-05-18 17:03:02.520507 7fa408dbb780  1 journal _open 
/var/lib/ceph/osd/ceph-0/journal fd 29: 1048576000 bytes, block size 4096 
bytes, directio = 1, aio = 0
2013-05-18 17:03:02.520625 7fa408dbb780  1 journal _open 
/var/lib/ceph/osd/ceph-0/journal fd 29: 1048576000 bytes, block size 4096 
bytes, directio = 1, aio = 0
2013-05-18 17:03:02.522371 7fa408dbb780  0 osd.0 24 crush map has features 
33816576, adjusting msgr requires for clients
2013-05-18 17:03:02.522419 7fa408dbb780  0 osd.0 24 crush map has features 
33816576, adjusting msgr requires for osds
2013-05-18 17:03:02.533617 7fa408dbb780 -1 *** Caught signal (Aborted) **
 in thread 7fa408dbb780

 ceph version 0.61.2 (fea782543a844bb277ae94d3391788b76c5bee60)
 1: /usr/bin/ceph-osd() [0x79087a]
 2: (()+0xfcb0) [0x7fa408254cb0]
 3: (gsignal()+0x35) [0x7fa406a0d425]
 4: (abort()+0x17b) [0x7fa406a10b8b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7fa40735f69d]
 6: (()+0xb5846) [0x7fa40735d846]
 7: (()+0xb5873) [0x7fa40735d873]
 8: (()+0xb596e) [0x7fa40735d96e]
 9: (ceph::buff

Re: [ceph-users] Abort on moving OSD

2013-05-18 Thread Alex Bligh

On 18 May 2013, at 18:20, Alex Bligh wrote:

> I want to discover what happens if I move an OSD from one host to another, 
> simulating the effect of moving a working harddrive from a dead host to a 
> live host, which I believe should work. So I stopped osd.0 on one host, and 
> copied (using scp) /var/lib/ceph/osd/ceph-0 from one host to another. My 
> understanding is that starting osd.0 on the destination host with 'service 
> ceph start osd.0' should rewrite the crush map and everything should be fine.

Apologies, this was my idiocy. scp does not copy xattrs. rsync -aHAX does, and 
indeed works fine.

I suppose it would have been nice if it died a little more gracefully, but I 
think I got what I deserved.
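
For anyone else trying this, the copy that does work for me looks like the
following (paths and hostname are illustrative, and the OSD must be stopped
first):

  service ceph stop osd.0                       # on the old host
  rsync -aHAX /var/lib/ceph/osd/ceph-0/ newhost:/var/lib/ceph/osd/ceph-0/
  ssh newhost service ceph start osd.0          # xattrs intact, so this now works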

-- 
Alex Bligh




[ceph-users] Setting OSD weight

2013-05-20 Thread Alex Bligh
How do I set the weight for OSDs? I have 4 OSDs I want to create
with very low weight (<1) so that they are never used if any other OSDs
are added subsequently (i.e. so placement groups end up avoiding them).

These OSDs have been created with default settings using the manual
OSD add procedure as per the ceph docs. But (unless I am being stupid,
which is quite possible), setting the weight (either to 0.0001 or
to 2) appears to have no effect per a ceph osd dump.

-- 
Alex Bligh



root@kvm:~# ceph osd dump
 
epoch 12
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 14:59:54.580601
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0

max_osd 4
osd.0 up   in  weight 1 up_from 2 up_thru 10 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up   in  weight 1 up_from 4 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up   in  weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a


root@kvm:~# ceph osd crush set 0 2 root=default
set item id 0 name 'osd.0' weight 2 at location {root=default} to crush map
root@kvm:~# ceph osd dump
 
epoch 14
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 15:13:21.009317
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0

max_osd 4
osd.0 up   in  weight 1 up_from 2 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up   in  weight 1 up_from 4 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up   in  weight 1 up_from 3 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a

Re: [ceph-users] Setting OSD weight

2013-05-20 Thread Alex Bligh

On 20 May 2013, at 17:19, Sage Weil wrote:

> Look at 'ceph osd tree'.  The weight value in 'ceph osd dump' output is 
> the in/out correction, not the crush weight.

Doh. Thanks.

Is there a difference between:
  ceph osd crush set 0 2 root=default
and
  ceph osd crush reweight osd.0 2
?
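
(For context, the two invocations being compared, plus the command Sage points
at; the id and weight are the ones from my earlier mail:)

  ceph osd crush set 0 2 root=default   # what I ran: sets weight and location
  ceph osd crush reweight osd.0 2       # presumably just changes the weight?
  ceph osd tree                         # where the crush weight actually shows up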

-- 
Alex Bligh




[ceph-users] Determining when an 'out' OSD is actually unused

2013-05-20 Thread Alex Bligh
If I want to remove an OSD, I use 'ceph osd out' before taking it down, i.e. 
stopping the OSD process and removing the disk.

How do I (preferably programmatically) tell when it is safe to stop the OSD 
process? The documentation says 'ceph -w', which is not especially helpful (a) 
if I want to do it programmatically, or (b) if there were other problems in the 
cluster, so that ceph was not reporting HEALTH_OK to start with.

Is there a better way?

-- 
Alex Bligh




Re: [ceph-users] Determining when an 'out' OSD is actually unused

2013-05-20 Thread Alex Bligh
Dan,

On 21 May 2013, at 00:52, Dan Mick wrote:

> On 05/20/2013 01:33 PM, Alex Bligh wrote:
>> If I want to remove an osd, I use 'ceph out' before taking it down, i.e. 
>> stopping the OSD process, and removing the disk.
>> 
>> How do I (preferably programatically) tell when it is safe to stop the OSD 
>> process? The documentation says 'ceph -w', which is not especially helpful, 
>> (a) if I want to do it programatically, or (b) if there are other problems 
>> in the cluster so ceph was not reporting HEALTH_OK to start with.
>> 
>> Is there a better way?
>> 
> 
> We've had some discussions about this recently, but there's no great way of 
> doing this right now.

OK. So would the following conservative rule work for now?
* Don't mark the OSD out until and unless you have ceph HEALTH_OK
* Then mark it out
* Then you are safe to remove only when it returns to ceph HEALTH_OK

The instructions at present say watch ceph -w, but don't say exactly what to 
watch for.

> We should probably have a query option that returns "number of PGs on this 
> OSD" or some such.

That would be very useful.
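
In the meantime, the conservative rule above in script form (illustrative, with
osd id 2 as the example):

  ceph health | grep -q HEALTH_OK || exit 1   # only start from a healthy cluster
  ceph osd out 2
  until ceph health | grep -q HEALTH_OK; do sleep 10; done
  # back to HEALTH_OK with the osd out: safe to stop the process and pull the disk
  service ceph stop osd.2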

-- 
Alex Bligh




Re: [ceph-users] Determining when an 'out' OSD is actually unused

2013-05-21 Thread Alex Bligh

On 21 May 2013, at 07:17, Dan Mick wrote:

> Yes, with the proviso that you really mean "kill the osd" when clean.  
> Marking out is step 1.

Thanks

-- 
Alex Bligh




Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages

2013-05-28 Thread Alex Bligh
Wolfgang,

On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:

> for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5, 
> it didn't work nicely with libvirt) which includes important fixes to RBD for 
> ubuntu 12.04 AMD64. If you want to save some time, I can share the packages 
> with you. drop me a line if you're interested. 

Information as to what the important fixes are would be appreciated!

-- 
Alex Bligh




Re: [ceph-users] qemu-1.4.2 rbd-fixed ubuntu packages

2013-05-29 Thread Alex Bligh

On 28 May 2013, at 06:50, Wolfgang Hennerbichler wrote:

> for anybody who's interested, I've packaged the latest qemu-1.4.2 (not 1.5, 
> it didn't work nicely with libvirt) which includes important fixes to RBD for 
> ubuntu 12.04 AMD64. If you want to save some time, I can share the packages 
> with you. drop me a line if you're interested. 


The issue Wolfgang is referring to is here:
 http://tracker.ceph.com/issues/3737

And the actual patch to QEMU is here:
 http://patchwork.ozlabs.org/patch/232489/

I'd be interested in whether the raring version (1.4.0+dfsg-1expubuntu4) 
contains this (unchecked as yet).

-- 
Alex Bligh




[ceph-users] Recommended versions of Qemu/KVM to run Ceph Cuttlefish

2013-06-18 Thread Alex Bligh
I'm planning on running Ceph Cuttlefish with Qemu/KVM using Qemu's
inbuilt RBD support (not kernel RBD). I may go beyond Cuttlefish.

What versions of Qemu are recommended for this? Qemu 1.0 is what
ships with Ubuntu Precise LTS, which is the base OS in use, so this
would be the best option in many ways. Qemu 1.0 is built with
rbd support, and dynamically links correctly to librbd from
the Cuttlefish distribution.

However, I note things like:
  http://tracker.ceph.com/issues/3737
  http://patchwork.ozlabs.org/patch/232489/

which implement asynchronous flushing in Qemu's rbd driver.
That's only in 1.4.3 and 1.5 if I use the upstream version.

Are there many such things? If not, part of me is tempted to
backport a few such issues to Qemu 1.0 rather than risk upgrading
to Qemu 1.5.

I know there are a few issues with qemu-img convert, but I can solve
those another way.

We're using format 2 images, if that's relevant.

-- 
Alex Bligh




Re: [ceph-users] why so many ceph-create-keys processes?

2013-06-19 Thread Alex Bligh

On 19 Jun 2013, at 10:42, James Harper wrote:

> Why are there so many ceph-create-keys processes? Under Debian, every time I 
> start the mons another ceph-create-keys process starts up.

I've seen these hang around for no particular good reason (on Ubuntu). It seems 
to happen when there is some difficulty starting mon services. Once everything 
is up and running, it doesn't happen (at least for me). I never worked out 
quite what it was, but I think the init script starts them and doesn't kill 
them in every circumstance where starting a mon fails.
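
FWIW this is how I spot (and, once the mons are settled, clear) any stragglers;
purely illustrative:

  ps aux | grep [c]eph-create-keys   # list any stray instances
  pkill -f ceph-create-keys          # safe once the mons are up and the keys exist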

-- 
Alex Bligh




[ceph-users] Backport of modern qemu rbd driver to qemu 1.0 + Precise packaging

2013-06-21 Thread Alex Bligh
I've backported the modern (qemu 1.5) rbd driver to qemu 1.0 (for anyone
interested). This is designed for people who are conservative in hypervisor
version, but like more bleeding edge storage.

The main thing this adds is asynchronous flush to rbd, plus automatic
control of rbd caching behaviour. I have NOT backported the extended
configuration bits.

I've used runtime testing through weak binding to detect the version
of librbd in use, so it's possible to compile against a standard
Precise librbd, then run with a more modern one and take advantage
of async flush. It works the other way around too. Note that the
original implementation of this posted to the qemu list (but not
taken) did not quite work.

The backport in qemu repository format can be found at
 https://github.com/flexiant/qemu/commits/v1.0-rbd-add-async-flush
(note the branch is v1.0-rbd-add-async-flush).

I've also backported this to the Ubuntu Precise packaging of qemu-kvm
(again, note the branch is v1.0-rbd-add-async-flush) at
 https://github.com/flexiant/qemu-kvm-1.0-noroms/tree/v1.0-rbd-add-async-flush

THESE PATCHES ARE VERY LIGHTLY TESTED. USE AT YOUR OWN RISK.
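
For anyone who wants to try the plain qemu branch, it builds with the usual
dance (a sketch only; configure options trimmed to the essentials):

  git clone https://github.com/flexiant/qemu.git && cd qemu
  git checkout v1.0-rbd-add-async-flush
  ./configure --target-list=x86_64-softmmu --enable-rbd
  make -j4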

-- 
Alex Bligh




Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Alex Bligh

On 25 Jun 2013, at 00:39, Mandell Degerness wrote:

> The issue, Sage, is that we have to deal with the cluster being
> re-expanded.  If we start with 5 monitors and scale back to 3, running
> the "ceph mon remove N" command after stopping each monitor and don't
> restart the existing monitors, we cannot re-add those same monitors
> that were previously removed.  They will suicide at startup.

Can you not restart the remaining monitors individually at the
end of the process once the monmaps and the ceph.confs have been
updated so they only think there are 3 monitors?

Once you have got to a stable 3 mon config, you can go back up
to 5.
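
i.e. something along these lines (names purely illustrative):

  # shrink from 5 mons to 3
  service ceph stop mon.d && ceph mon remove d
  service ceph stop mon.e && ceph mon remove e
  # update ceph.conf on all nodes so only mon.a/b/c are listed, then
  # restart the survivors one at a time:
  for m in a b c; do service ceph restart mon.$m; done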

-- 
Alex Bligh




ceph-users@lists.ceph.com

2013-06-28 Thread Alex Bligh

On 28 Jun 2013, at 08:41, 华仔 wrote:

> write speed: wget http://remote server ip/2GB.file; we get an average write 
> speed of 6MB/s (far below what we expected).
> (We must have got something wrong there, and we would appreciate any help. 
> We think the problem comes from the KVM emulator, but we are not sure. Can 
> you give us some advice to improve our VM's disk write performance?)

Are you using cache=writeback on your kvm command line? What about librbd 
caching? What versions of kvm & ceph?
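
For comparison, the sort of thing I mean (pool/image names are illustrative):

  qemu-system-x86_64 -m 1024 \
    -drive format=raw,file=rbd:rbd/vm-disk,cache=writeback,if=virtio

and, for librbd caching, something like this in the client's ceph.conf:

  [client]
      rbd cache = true
      rbd cache size = 33554432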

-- 
Alex Bligh




Re: [ceph-users] Problem with data distribution

2013-07-01 Thread Alex Bligh

On 1 Jul 2013, at 17:02, Gregory Farnum wrote:

> It looks like probably your PG counts are too low for the number of
> OSDs you have.
> http://ceph.com/docs/master/rados/operations/placement-groups/

The docs you referred Pierre to say:

"Important Increasing the number of placement groups in a pool after you create 
the pool is still an experimental feature in Bobtail (v 0.56). We recommend 
defining a reasonable number of placement groups and maintaining that number 
until Ceph’s placement group splitting and merging functionality matures."

but they do not tell you how to increase that number (whether it's experimental 
or not) after a pool has been created.

Also, they say the default number of PGs is 8, but "When you create a pool, set 
the number of placement groups to a reasonable value (e.g., 100)." If so, 
perhaps a different default should be used!

-- 
Alex Bligh




Re: [ceph-users] Problem with data distribution

2013-07-01 Thread Alex Bligh

On 1 Jul 2013, at 17:37, Gregory Farnum wrote:

> Oh, that's out of date! PG splitting is supported in Cuttlefish:
> "ceph osd pool set  pg_num "
> http://ceph.com/docs/master/rados/operations/control/#osd-subsystem

Ah, so:
  pg_num: The placement group number.
means
  pg_num: The number of placement groups.

Perhaps worth demystifying for those hard of understanding such as
myself.

I'm still not quite sure how that relates to pgp_num.
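
(For the record, my current understanding is that pg_num creates the placement
groups and pgp_num is the number actually used when placing data, so both
presumably need raising; pool name and count below are just examples:)

  ceph osd pool set data pg_num 512
  ceph osd pool set data pgp_num 512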

-- 
Alex Bligh



