Re: [ceph-users] Testing linux-3.10 and rbd format v2

2013-06-24 Thread Damien Churchill
On 21 June 2013 16:32, Sage Weil  wrote:
> On Fri, 21 Jun 2013, Damien Churchill wrote:
>> Hi,
>>
>> I've built a copy of linux 3.10-rc6 (and added the patch from
>> ceph-client/for-linus) however when I try and map a rbd image created
>> with:
>>
>> # rbd create test-format-2 --size 10240 --format 2
>>
>> and then run a map on the machine running the new kernel:
>>
>> # rbd map test-format-2
>> rbd: add failed: (22) Invalid argument
>>
>> I receive the following message in dmesg:
>>
>> # dmesg | grep rbd
>> rbd: image test-format-2: unsupported stripe unit (got 4194304 want 1)
>>
>> Reading the docs it seems a striping unit of 1 is definitely not
>> wanted at all. Have I misbuilt or misconfigured the kernel at all? Or
>> created the image incorrectly?
>
> Hmm, I think 764684ef34af685cd8d46830a73826443f9129df needs to go
> upstream.  Can you try building a kernel with that commit and see if it
> resolves the problem?
>

I've built a new kernel with that commit applied to it and now I can
map format 2 images to the kernel. Thanks for the help!
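
(For anyone else who hits this, roughly what I did in my kernel tree; the
remote name is just what I happened to use:)

git remote add ceph-client git://github.com/ceph/ceph-client.git
git fetch ceph-client
git cherry-pick 764684ef34af685cd8d46830a73826443f9129df
make -j$(nproc) deb-pkg    # or however you normally build and install your kernels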

> Thanks!
> sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Resizing filesystem on RBD without unmount/mount cycle

2013-06-24 Thread Edward Huyer
Maybe I'm missing something obvious here.  Or maybe this is the way it has to 
be.  I haven't found an answer via Google.

I'm experimenting with ceph 0.61.4 and RBD under Ubuntu 13.0x.  I create a 
RADOS block device (test), map it, format it as ext4 or xfs, and mount it.  No 
problem.  I grow the underlying RBD.  lsblk on both /dev/rbd/rbd/test and 
/dev/rbd1 shows the new size, but the filesystem resize commands don't see the 
new size until I unmount and then mount the block device again.  "-o remount" 
isn't good enough, nor is partprobe.

Is there a way to club the filesystem tools into recognizing that the RBD has 
changed sizes without unmounting the filesystem?
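
For reference, the sequence looks roughly like this (device names and sizes are
just from my test setup):

rbd create test --size 10240
rbd map test
mkfs.xfs /dev/rbd/rbd/test && mount /dev/rbd/rbd/test /mnt/test
rbd resize test --size 20480    # grow the underlying RBD
lsblk /dev/rbd1                 # shows the new size
xfs_growfs /mnt/test            # still sees the old size until unmount/mount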

-
Edward Huyer
School of Interactive Games and Media
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erh...@rit.edu

Obligatory Legalese:
The information transmitted, including attachments, is intended only for the 
person(s) or entity to which it is addressed and may contain confidential 
and/or privileged material. Any review, retransmission, dissemination or other 
use of, or taking of any action in reliance upon this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and destroy any copies of this information.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] several radosgw sharing pools

2013-06-24 Thread John Nielsen
How do you manage cache coherency with Varnish?

On Jun 21, 2013, at 6:09 AM, Artem Silenkov  wrote:

> This picture shows the way we do it 
> http://habrastorage.org/storage2/1ed/532/627/1ed5326273399df81f3a73179848a404.png
> 
> Regards, Artem Silenkov, 2GIS TM.
> ---
> 2GIS LLC
> 
> http://2gis.ru
> a.silenkov at 2gis.ru
> 
> gtalk:
> artem.silenkov at gmail.com
> 
> cell:+79231534853
> 
> 
> 
> 
> 2013/6/21 Alvaro Izquierdo Jimeno 
> Thanks Artem
> 
>  
> 
> From: Artem Silenkov [mailto:artem.silen...@gmail.com] 
> Sent: Friday, 21 June 2013 14:01
> To: Alvaro Izquierdo Jimeno
> CC: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] several radosgw sharing pools
> 
>  
> 
> Good day! 
> 
>  
> 
> We do the load balancing this way:
> 
>  
> 
> varnish frontend-->radosgw1
> 
> |
> 
> ->radosgw2
> 
>  
> 
> Every radosgw host uses its own config, so it's not necessary to add both nodes to 
> every ceph.conf. It looks like:
> 
>  
> 
> Host1
> 
> [client.radosgw.gateway]
> 
>  host = myhost1
> 
> ...
> 
>  
> 
> Host2
> 
> [client.radosgw.gateway]
> 
>  host = myhost2
> 
> ...
> 
>  
> 
>  
> 
> Pools, users, etc. are internal parameters, so every radosgw installation shares 
> them without any problem, and shares them concurrently, so you can do atomic writes 
> and other good things. You could also use monit to monitor service health and 
> even try to repair it automatically. 
> 
>  
> 
> Regards, Artem Silenkov, 2GIS TM.
> 
> ---
> 
> 2GIS LLC
> 
> http://2gis.ru
> 
> a.silen...@2gis.ru
> 
> gtalk:artem.silen...@gmail.com
> 
> cell:+79231534853
> 
>  
> 
> 2013/6/21 Alvaro Izquierdo Jimeno 
> 
> Hi,
> 
>  
> 
> I have a ceph cluster with a radosgw running. The radosgw part in ceph.conf 
> is:
> 
> [client.radosgw.gateway]
> 
>   host = myhost1
> 
>   ……..
> 
>  
> 
> But if the radosgw process dies for some reason, we lose this behavior. So:
> 
>  
> 
> -Can I set up another radosgw on another host, sharing pools, users, etc. in ceph?
> 
> i.e.:
> 
> [client.radosgw.gateway2]
> 
>  host = myhost2
> 
>   ……..
> 
> -If the answer to the previous question is ‘yes’, is there any load balancer in the 
> radosgw configuration options?
> 
>  
> 
>  
> 
> Thank you so much in advance, and best regards,
> 
> Álvaro.
> 
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
>  
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread John Nielsen
On Jun 21, 2013, at 5:00 PM, Mandell Degerness  wrote:

> There is a scenario where we would want to remove a monitor and, at a
> later date, re-add the monitor (using the same IP address).  Is there
> a supported way to do this?  I tried deleting the monitor directory
> and rebuilding from scratch following the add monitor procedures from
> the web, but the monitor still suicide's when started.


I assume you're already referencing this:
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/

I have done what you describe before. There were a couple hiccups, let's see if 
I remember the specifics:

Remove:
Follow the first two steps under "removing a monitor (manual)" at the link above:
service ceph stop mon.N
ceph mon remove N
Comment out the monitor entry in ceph.conf on ALL mon, osd and client hosts.
Restart services as required to make everyone happy with the smaller set of 
monitors

Re-add:
Wipe the old monitor's directory and re-create it
Follow the steps for "adding a monitor (manual)" at the link above. Instead of 
adding a new entry you can just un-comment your old ones in ceph.conf. You can 
also start the monitor with "service ceph start mon N" on the appropriate host 
instead of running it yourself (step 8). Note that you DO need to run ceph-mon as 
specified in step 5. I was initially confused about the '--mkfs' flag there--it 
doesn't refer to the OS's filesystem, you should use a directory or mountpoint 
that is already prepared/mounted.
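
In command form, the re-add half looks roughly like this (the mon ID, IP and
paths are placeholders; check the docs for your version):

ceph auth get mon. -o /tmp/keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i <id> --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
ceph mon add <id> <ip>[:<port>]
service ceph start mon.<id>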

HTH. If you run into trouble post exactly the steps you followed and additional 
details about your setup.

JN

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Drive replacement procedure

2013-06-24 Thread Brian Candler
I'm just finding my way around the Ceph documentation. What I'm hoping 
to build are servers with 24 data disks and one O/S disk. From what I've 
read, the recommended configuration is to run 24 separate OSDs (or 23 if 
I have a separate journal disk/SSD), and not have any sort of in-server 
RAID.


Obviously, disks are going to fail - and the documentation acknowledges 
this.


What I'm looking for is a documented procedure for replacing a failed 
disk, but so far I have not been able to find one. Can you point me at 
the right place please?


I'm looking for something step-by-step and as idiot-proof as possible :-)

Thanks,

Brian.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resizing filesystem on RBD without unmount/mount cycle

2013-06-24 Thread John Nielsen
On Jun 24, 2013, at 9:13 AM, Edward Huyer  wrote:

> I’m experimenting with ceph 0.61.4 and RBD under Ubuntu 13.0x.  I create a 
> RADOS block device (test), map it, format it as ext4 or xfs, and mount it.  
> No problem.  I grow the underlying RBD.  lsblk on both /dev/rbd/rbd/test and 
> /dev/rbd1 shows the new size, but the filesystem resize commands don’t see 
> the new size until I unmount and then mount the block device again.  “-o 
> remount” isn’t good enough, nor is partprobe.
>  
> Is there a way to club the filesystem tools into recognizing that the RBD has 
> changed sizes without unmounting the filesystem?

I know this is possible with e.g. virtual machines (c.f. "virsh blockresize"), 
so I agree it _ought_ to work. I don't know if the RBD kernel module has or 
needs any special support for online resizing.

It may work the same as "partprobe", but have you tried "blockdev --rereadpt"?

JN

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues going from 1 to 3 mons

2013-06-24 Thread Jeppesen, Nelson
What do you mean 'bring up the second monitor with enough information'?

Here are the basic steps I took. It fails on step 4. If I skip step 4, I get a 
number out of range error.


1.  ceph auth get mon. -o /tmp/auth

2.  ceph mon getmap -o /tmp/map

3.  sudo ceph-mon -i 1 --mkfs --monmap /tmp/map --keyring /tmp/auth

4.  ceph mon add 1 <ip>[:<port>]

5.  ceph-mon -i 1 --public-addr {ip:port}

Thank you.


From: Gregory Farnum [mailto:g...@inktank.com]
Sent: Sunday, June 23, 2013 12:59 PM
To: Jeppesen, Nelson
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Issues going from 1 to 3 mons

On Sunday, June 23, 2013, Jeppesen, Nelson wrote:
Hello,


I have a cluster that only has one monitor running, but I lose quorum when I try 
to add a second monitor. I'm trying to raise the cluster from 1 to 3 monitors. I 
think it breaks when I run 'ceph mon add <name> <ip>[:<port>]' because it 
loses quorum before the new monitor is online.

Thanks.

Yep, that's exactly what's happening. If you bring up the second monitor with 
enough information to contact the first, they should sync up and form a quorum. 
Have you done that?
-Greg


--
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread John Nielsen
On Jun 24, 2013, at 11:22 AM, Brian Candler  wrote:

> I'm just finding my way around the Ceph documentation. What I'm hoping to 
> build are servers with 24 data disks and one O/S disk. From what I've read, 
> the recommended configuration is to run 24 separate OSDs (or 23 if I have a 
> separate journal disk/SSD), and not have any sort of in-server RAID.
> 
> Obviously, disks are going to fail - and the documentation acknowledges this.
> 
> What I'm looking for is a documented procedure for replacing a failed disk, 
> but so far I have not been able to find one. Can you point me at the right 
> place please?
> 
> I'm looking for something step-by-step and as idiot-proof as possible :-)


The official documentation is maybe not 100% idiot-proof, but it is 
step-by-step:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/

If you lose a disk you want to remove the OSD associated with it. This will 
trigger a data migration so you are back to full redundancy as soon as it 
finishes. Whenever you get a replacement disk, you will add an OSD for it (the 
same as if you were adding an entirely new disk). This will also trigger a data 
migration so the new disk will be utilized immediately.

If you have a spare or replacement disk immediately after a disk goes bad, you 
could maybe save some data migration by doing the removal and re-adding within 
a short period of time, but otherwise "drive replacement" looks exactly like 
retiring an OSD and adding a new one that happens to use the same drive slot.
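
As a rough command sketch (N is the failed OSD's id; see the link above for the
authoritative steps):

ceph osd out N
service ceph stop osd.N       # or: stop ceph-osd id=N on upstart systems
ceph osd crush remove osd.N
ceph auth del osd.N
ceph osd rm N

Re-adding the replacement disk then follows the normal "adding an OSD" steps.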

JN

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Issues going from 1 to 3 mons

2013-06-24 Thread Gregory Farnum
On Mon, Jun 24, 2013 at 10:36 AM, Jeppesen, Nelson
 wrote:
> What do you mean ‘bring up the second monitor with enough information’?
>
>
>
> Here are the basic steps I took. It fails on step 4. If I skip step 4, I get
> a number out of range error.
>
>
>
> 1.  ceph auth get mon. -o /tmp/auth
>
> 2.  ceph mon getmap -o /tmp/map
>
> 3.  sudo ceph-mon -i 1 --mkfs --monmap /tmp/map --keyring /tmp/auth
>
> 4.  ceph mon add 1 <ip>[:<port>]

What's the failure here? Does it not return, or does it stop working
after that? I'd expect that following it with

> 5.  ceph-mon -i 1 --public-addr {ip:port}

should work...

Oh, I think I see — mon 1 is starting up and not seeing itself in the
monmap so it then shuts down. You'll need to convince it to turn on
and contact mon.0; I don't remember exactly how to do that (Joao?) but
I think you should be able to find what you need at
http://ceph.com/docs/master/dev/mon-bootstrap
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Brian Candler

On 24/06/2013 18:41, John Nielsen wrote:

The official documentation is maybe not %100 idiot-proof, but it is 
step-by-step:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/

If you lose a disk you want to remove the OSD associated with it. This will 
trigger a data migration so you are back to full redundancy as soon as it 
finishes. Whenever you get a replacement disk, you will add an OSD for it (the 
same as if you were adding an entirely new disk). This will also trigger a data 
migration so the new disk will be utilized immediately.

If you have a spare or replacement disk immediately after a disk goes bad, you could 
maybe save some data migration by doing the removal and re-adding within a short period 
of time, but otherwise "drive replacement" looks exactly like retiring an OSD 
and adding a new one that happens to use the same drive slot.

That's good, thank you. So I think it's something like this:

* Remove OSD
* Unmount filesystem (forcibly if necessary)
* Replace drive
* mkfs filesystem
* mount it on /var/lib/ceph/osd/ceph-{osd-number}
* Start OSD

Would you typically reuse the same OSD number?

One other thing I'm not clear about. At
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#adding-an-osd-manual
it says to mkdir the mountpoint, mkfs and mount the filesystem.

But at
http://ceph.com/docs/master/start/quick-ceph-deploy/#add-osds-on-standalone-disks
it says to use "ceph-deploy osd prepare" and "ceph-deploy osd activate", 
or the one-step version

"ceph-deploy osd create"

Is ceph-deploy doing the same things? Could I make a shorter 
disk-replacement procedure which uses ceph-deploy?
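
For example, something like the following, which I haven't tested (hostname and 
device are placeholders):

ceph-deploy disk zap node1:sdb
ceph-deploy osd create node1:sdb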


Thanks,

Brian.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Resizing filesystem on RBD without unmount/mount cycle

2013-06-24 Thread Edward Huyer
> -Original Message-
> From: John Nielsen [mailto:li...@jnielsen.net]
> Sent: Monday, June 24, 2013 1:24 PM
> To: Edward Huyer
> Cc: ceph-us...@ceph.com
> Subject: Re: [ceph-users] Resizing filesystem on RBD without
> unmount/mount cycle
> 
> On Jun 24, 2013, at 9:13 AM, Edward Huyer  wrote:
> 
> > I'm experimenting with ceph 0.61.4 and RBD under Ubuntu 13.0x.  I create
> a RADOS block device (test), map it, format it as ext4 or xfs, and mount it.  
> No
> problem.  I grow the underlying RBD.  lsblk on both /dev/rbd/rbd/test and
> /dev/rbd1 shows the new size, but the filesystem resize commands don't
> see the new size until I unmount and then mount the block device again.  "-o
> remount" isn't good enough, nor is partprobe.
> >
> > Is there a way to club the filesystem tools into recognizing that the RBD 
> > has
> changed sizes without unmounting the filesystem?
> 
> I know this is possible with e.g. virtual machines (c.f. "virsh 
> blockresize"), so I
> agree it _ought_ to work. I don't know if the RBD kernel module has or needs
> any special support for online resizing.
> 
> It may work the same as "partprobe", but have you tried "blockdev --
> rereadpt"?

No dice with blockdev.

I just now ran across this older thread in ceph-devel that seems to imply what 
I want isn't possible:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/8013

It looks like the kernel can't update the size of a block device while the 
block device is in use.  Boo.

Thanks anyway.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Dave Spano
If you remove a failed OSD from the cluster and the crushmap, the 
cluster will automatically re-assign that number to the new OSD when you run 
ceph osd create with no arguments. 

Here's my procedure for manually adding OSDs. This part of the documentation I 
wrote for myself, or in the event of a bus error. If anyone wants to call 
Shenanigans on my procedure, I'm always open to constructive criticism. 

Create an xfs filesystem on the whole disk. We don't need a partition because 
we're not creating multiple partitions on the OSDs. 
mkfs.xfs options: -f -i size=2048 -d su=64k for RAID0 tests 
Ex. mkfs.xfs -f -i size=2048 -d su=64k,sw=1 /dev/sdc 

Mount points use this convention /srv/ceph/osd[id], I.E. /srv/ceph/osd1 
Ex. cd /srv/ceph; mkdir osd1; 

Create a disk label, so you don't mount the wrong disk, and possibly use it 
with the wrong OSD daemon. 
Ex. xfs_admin -L osd1 /dev/sdb 
This assigns the label osd1 to /dev/sdb. 

Next we add the mount to /etc/fstab 
LABEL=osd1 /srv/ceph/osd1 xfs inode64,noatime 0 0 

Create the osd: 
ceph osd create [optional id] 
Will return next available number if nothing is specified, otherwise it will 
use the id specified. In almost all cases, it's best to let the cluster assign 
the id. 

Add the osd to the /etc/ceph/ceph.conf files. Use the by label convention to 
avoid mounting the wrong hard drive! 
Ex. 
[osd.1] 
host = ha1 
devs = /dev/disk/by-label/osd1 

Initialize the OSD's directory: 
ceph-osd -i {osd-num} --mkfs --mkkey 

Register the OSD authentication key: 
ceph auth add osd.1 osd 'allow *' mon 'allow rwx' -i 
/var/lib/ceph/osd/ceph-{osd-num}/keyring 

Add the OSD to the CRUSH map so it can receive data: 
ceph osd crush set osd.1 1.0 root=default host=ha1 

Check to make sure it's added. 
ceph osd tree 

Start up the new osd, and let it sync with the cluster. 
service ceph start osd.1 


Dave Spano 
Optogenics 


- Original Message -

From: "Brian Candler"  
To: "John Nielsen"  
Cc: ceph-users@lists.ceph.com 
Sent: Monday, June 24, 2013 2:04:32 PM 
Subject: Re: [ceph-users] Drive replacement procedure 

On 24/06/2013 18:41, John Nielsen wrote: 
> The official documentation is maybe not %100 idiot-proof, but it is 
> step-by-step: 
> 
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/ 
> 
> If you lose a disk you want to remove the OSD associated with it. This will 
> trigger a data migration so you are back to full redundancy as soon as it 
> finishes. Whenever you get a replacement disk, you will add an OSD for it 
> (the same as if you were adding an entirely new disk). This will also trigger 
> a data migration so the new disk will be utilized immediately. 
> 
> If you have a spare or replacement disk immediately after a disk goes bad, 
> you could maybe save some data migration by doing the removal and re-adding 
> within a short period of time, but otherwise "drive replacement" looks 
> exactly like retiring an OSD and adding a new one that happens to use the 
> same drive slot. 
That's good, thank you. So I think it's something like this: 

* Remove OSD 
* Unmount filesystem (forcibly if necessary) 
* Replace drive 
* mkfs filesystem 
* mount it on /var/lib/ceph/osd/ceph-{osd-number} 
* Start OSD 

Would you typically reuse the same OSD number? 

One other thing I'm not clear about. At 
http://ceph.com/docs/master/rados/operations/add-or-rm-osds/#adding-an-osd-manual
 
it says to mkdir the mountpoint, mkfs and mount the filesystem. 

But at 
http://ceph.com/docs/master/start/quick-ceph-deploy/#add-osds-on-standalone-disks
 
it says to use "ceph-deploy osd prepare" and "ceph-deploy osd activate", 
or the one-step version 
"ceph-deploy osd create" 

Is ceph-deploy doing the same things? Could I make a shorter 
disk-replacement procedure which uses ceph-deploy? 

Thanks, 

Brian. 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Brian Candler

On 24/06/2013 20:27, Dave Spano wrote:
If you remove the OSD after it fails from the cluster and the 
crushmap, the cluster will automatically re-assign that number to the 
new OSD when you run ceph osd create with no arguments.


OK - although obviously if you're going to make a disk with a label like 
"osd1" then it seems you need to know in advance what OSD number to use.

Here's my procedure for manually adding OSDs.
That's very useful, thank you. I'm not sure I've got a one true document 
to give to operations saying "here's how you replace a failed disk" yet 
though, but maybe I can assemble one from this info, and then test it.


Regards,

Brian.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG Splitting

2013-06-24 Thread Travis Rhoden
Hello folks,

Is PG splitting considered stable now?  I feel like I used to see it
discussed all the time (and how it wasn't quite there), but haven't
heard anything about it in a while.  I remember seeing related bits in
release notes and such, but never an announcement that "you can now
increase the number of PGs in a pool".

I was thinking about this because I just deployed (successfully) a
small test cluster using ceph-deploy (first time I've gotten it to
work -- pretty smooth this time).  Since ceph-deploy has no idea how
many OSDs in total you are about to activate/create, I suppose it has no
idea how to take a good guess at the number of PGs to set for the
"data" pool and kin.  So instead I just got 64 PGs per pool, which is
too low.

Can I just increase it with "ceph osd set..." now?

If not, would the best approach be to override the default in
ceph.conf in between "ceph-deploy new" and "ceph-deploy mon create" ?
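
e.g. dropping something like this into the generated ceph.conf before the mons
are created (I'm assuming these are still the right option names):

[global]
    osd pool default pg num = 512
    osd pool default pgp num = 512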

Thanks,

 - Travis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
I'm testing the change (actually re-starting the monitors after the
monitor removal), but this brings up the issue with why we didn't want
to do this in the first place:  When reducing the number of monitors
from 5 to 3, we are guaranteed to have a service outage for the time
it takes to restart at least one of the monitors (and, possibly, for
two of the restarts, now that I think on it).  In theory, the
stop/start cycle is very short and should complete in a reasonable
time.  What I'm concerned about, however, is the case that something
is wrong with our re-written config file.  In that case, the outage is
immediate and will last until the problem is corrected on the first
server to have the monitor restarted.

On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen  wrote:
> On Jun 21, 2013, at 5:00 PM, Mandell Degerness  
> wrote:
>
>> There is a scenario where we would want to remove a monitor and, at a
>> later date, re-add the monitor (using the same IP address).  Is there
>> a supported way to do this?  I tried deleting the monitor directory
>> and rebuilding from scratch following the add monitor procedures from
>> the web, but the monitor still suicide's when started.
>
>
> I assume you're already referencing this:
> http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
>
> I have done what you describe before. There were a couple hiccups, let's see 
> if I remember the specifics:
>
> Remove:
> Follow the first two steps under "removing a monitor (manual) at the link 
> above:
> service ceph stop mon.N
> ceph mon remove N
> Comment out the monitor entry in ceph.conf on ALL mon, osd and client hosts.
> Restart services as required to make everyone happy with the smaller set of 
> monitors
>
> Re-add:
> Wipe the old monitor's directory and re-create it
> Follow the steps for "adding a monitor (manual) at the link above. Instead of 
> adding a new entry you can just un-comment your old ones in ceph.conf. You 
> can also start the monitor with "service ceph start mon N" on the appropriate 
> host instead of running yourself (step 8). Note that you DO need to run 
> ceph-mon as specified in step 5. I was initially confused about the '--mkfs' 
> flag there--it doesn't refer to the OS's filesystem, you should use a 
> directory or mountpoint that is already prepared/mounted.
>
> HTH. If you run into trouble post exactly the steps you followed and 
> additional details about your setup.
>
> JN
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Sage Weil
On Mon, 24 Jun 2013, Mandell Degerness wrote:
> I'm testing the change (actually re-starting the monitors after the
> monitor removal), but this brings up the issue with why we didn't want
> to do this in the first place:  When reducing the number of monitors
> from 5 to 3, we are guaranteed to have a service outage for the time
> it takes to restart at least one of the monitors (and, possibly, for
> two of the restarts, now that I think on it).  In theory, the
> stop/start cycle is very short and should complete in a reasonable
> time.  What I'm concerned about, however, is the case that something
> is wrong with our re-written config file.  In that case, the outage is
> immediate and will last until the problem is corrected on the first
> server to have the monitor restarted.

I'm jumping into this thread late, but: why would you follow the second 
removal procedure for broken clusters?  To go from 5->3 mons, you should 
just stop 2 of the mons and do 'ceph mon rm <id>' 'ceph mon rm <id>'.

sage

> 
> On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen  wrote:
> > On Jun 21, 2013, at 5:00 PM, Mandell Degerness  
> > wrote:
> >
> >> There is a scenario where we would want to remove a monitor and, at a
> >> later date, re-add the monitor (using the same IP address).  Is there
> >> a supported way to do this?  I tried deleting the monitor directory
> >> and rebuilding from scratch following the add monitor procedures from
> >> the web, but the monitor still suicide's when started.
> >
> >
> > I assume you're already referencing this:
> > http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
> >
> > I have done what you describe before. There were a couple hiccups, let's 
> > see if I remember the specifics:
> >
> > Remove:
> > Follow the first two steps under "removing a monitor (manual) at the link 
> > above:
> > service ceph stop mon.N
> > ceph mon remove N
> > Comment out the monitor entry in ceph.conf on ALL mon, osd and client hosts.
> > Restart services as required to make everyone happy with the smaller set of 
> > monitors
> >
> > Re-add:
> > Wipe the old monitor's directory and re-create it
> > Follow the steps for "adding a monitor (manual) at the link above. Instead 
> > of adding a new entry you can just un-comment your old ones in ceph.conf. 
> > You can also start the monitor with "service ceph start mon N" on the 
> > appropriate host instead of running yourself (step 8). Note that you DO 
> > need to run ceph-mon as specified in step 5. I was initially confused about 
> > the '--mkfs' flag there--it doesn't refer to the OS's filesystem, you 
> > should use a directory or mountpoint that is already prepared/mounted.
> >
> > HTH. If you run into trouble post exactly the steps you followed and 
> > additional details about your setup.
> >
> > JN
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
Hmm.  This is a bit ugly from our perspective, but not fatal to your
design (just our implementation).  At the time we run the rm, the
cluster is smaller and so the restart of each monitor is not fatal to
the cluster.  The problem is on our side in terms of guaranteeing
order of behaviors.

On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil  wrote:
> On Mon, 24 Jun 2013, Mandell Degerness wrote:
>> I'm testing the change (actually re-starting the monitors after the
>> monitor removal), but this brings up the issue with why we didn't want
>> to do this in the first place:  When reducing the number of monitors
>> from 5 to 3, we are guaranteed to have a service outage for the time
>> it takes to restart at least one of the monitors (and, possibly, for
>> two of the restarts, now that I think on it).  In theory, the
>> stop/start cycle is very short and should complete in a reasonable
>> time.  What I'm concerned about, however, is the case that something
>> is wrong with our re-written config file.  In that case, the outage is
>> immediate and will last until the problem is corrected on the first
>> server to have the monitor restarted.
>
> I'm jumping into this thread late, but: why would you follow the second
> removal procedure for broken clusters?  To go from 5->3 mons, you should
> just stop 2 of the mons and do 'ceph mon rm ' 'ceph mon rm
> '.
>
> sage
>
>>
>> On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen  wrote:
>> > On Jun 21, 2013, at 5:00 PM, Mandell Degerness  
>> > wrote:
>> >
>> >> There is a scenario where we would want to remove a monitor and, at a
>> >> later date, re-add the monitor (using the same IP address).  Is there
>> >> a supported way to do this?  I tried deleting the monitor directory
>> >> and rebuilding from scratch following the add monitor procedures from
>> >> the web, but the monitor still suicide's when started.
>> >
>> >
>> > I assume you're already referencing this:
>> > http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
>> >
>> > I have done what you describe before. There were a couple hiccups, let's 
>> > see if I remember the specifics:
>> >
>> > Remove:
>> > Follow the first two steps under "removing a monitor (manual) at the link 
>> > above:
>> > service ceph stop mon.N
>> > ceph mon remove N
>> > Comment out the monitor entry in ceph.conf on ALL mon, osd and client 
>> > hosts.
>> > Restart services as required to make everyone happy with the smaller set 
>> > of monitors
>> >
>> > Re-add:
>> > Wipe the old monitor's directory and re-create it
>> > Follow the steps for "adding a monitor (manual) at the link above. Instead 
>> > of adding a new entry you can just un-comment your old ones in ceph.conf. 
>> > You can also start the monitor with "service ceph start mon N" on the 
>> > appropriate host instead of running yourself (step 8). Note that you DO 
>> > need to run ceph-mon as specified in step 5. I was initially confused 
>> > about the '--mkfs' flag there--it doesn't refer to the OS's filesystem, 
>> > you should use a directory or mountpoint that is already prepared/mounted.
>> >
>> > HTH. If you run into trouble post exactly the steps you followed and 
>> > additional details about your setup.
>> >
>> > JN
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

2013-06-24 Thread Craig Lewis
I also have problems keeping my time in sync on VMWare virtual 
machines.  My problems occur most when the VM Host is oversubscribed, 
or when I'm doing stress tests.  I ended up disabling ntpd in the 
guests and enabling Host Time Sync using the VMWare Guest Tools.  All of 
my VMWare Hosts run ntpd, using the same ntpd servers.


That's my development cluster.  For production, I'm using ntpd on real 
servers.




*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 6/18/13 05:41 , Da Chun wrote:


Thanks! Craig.
umount works.

About the time skew, I saw the log said the time difference should be 
less than 50ms. I setup one of my nodes as the time server, and the 
others sync the time with it. I don't know why the system time still 
changes frequently especially after reboot. Maybe it's because all my 
nodes are VMware virtual machines. The softclock is not accurate enough.


-- Original --
*From: * "Craig Lewis";
*Date: * Tue, Jun 18, 2013 05:34 AM
*To: * "ceph-users";
*Subject: * Re: [ceph-users] How to remove /var/lib/ceph/osd/ceph-2?

If you followed the standard setup, each OSD is its own disk + 
filesystem. /var/lib/ceph/osd/ceph-2 is in use, as the mount point for 
the OSD.2 filesystem.  Double check by examining the output of the 
`mount` command.


I get the same error when I try to rename a directory that's used as a 
mount point.


Try `umount /var/lib/ceph/osd/ceph-2` instead of the mv and rm. The 
fuser command is telling you that the kernel has a filesystem mounted 
in that directory.  Nothing else appears to be using it, so the umount 
should complete successfully.



Also, you should fix that time skew on mon.ceph-node5.  The mailing 
list archives have several posts with good answers.



On 6/15/2013 2:14 AM, Da Chun wrote:

Hi all,
On Ubuntu 13.04 with ceph 0.61.3.
I want to remove osd.2 from my cluster. The following steps were 
performed:

root@ceph-node6:~# ceph osd out osd.2
marked out osd.2.
root@ceph-node6:~# ceph -w
   health HEALTH_WARN clock skew detected on mon.ceph-node5
   monmap e1: 3 mons at 
{ceph-node4=172.18.46.34:6789/0,ceph-node5=172.18.46.35:6789/0,ceph-node6=172.18.46.36:6789/0}, 
election epoch 124, quorum 0,1,2 ceph-node4,ceph-node5,ceph-node6

   osdmap e414: 6 osds: 5 up, 5 in
pgmap v10540: 456 pgs: 456 active+clean; 12171 MB data, 24325 MB 
used, 50360 MB / 74685 MB avail

   mdsmap e102: 1/1/1 up {0=ceph-node4=up:active}

2013-06-15 16:55:22.096059 mon.0 [INF] pgmap v10540: 456 pgs: 456 
active+clean; 12171 MB data, 24325 MB used, 50360 MB / 74685 MB avail

^C
root@ceph-node6:~# stop ceph-osd id=2
ceph-osd stop/waiting
root@ceph-node6:~# ceph osd crush remove osd.2
removed item id 2 name 'osd.2' from crush map
root@ceph-node6:~# ceph auth del osd.2
updated
root@ceph-node6:~# ceph osd rm 2
removed osd.2
root@ceph-node6:~# mv /var/lib/ceph/osd/ceph-2 
/var/lib/ceph/osd/ceph-2.bak
mv: cannot move '/var/lib/ceph/osd/ceph-2' to 
'/var/lib/ceph/osd/ceph-2.bak': Device or resource busy


Everything was working OK until the last step to remove the osd.2 
directory /var/lib/ceph/osd/ceph-2.

root@ceph-node6:~# fuser -v /var/lib/ceph/osd/ceph-2
 USER  PID ACCESS COMMAND
/var/lib/ceph/osd/ceph-2:
 root kernel mount /var/lib/ceph/osd/ceph-2 
// What does this mean?

root@ceph-node6:~# lsof +D /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

I restarted the system, and found that the osd.2 daemon was still 
running:

root@ceph-node6:~# ps aux | grep osd
root  1264  1.4 12.3 550940 125732 ?   Ssl  16:41   0:20 
/usr/bin/ceph-osd --cluster=ceph -i 2 -f
root  2876  0.0  0.0   4440 628 ?Ss   16:44   0:00 
/bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" 
-f /bin/sh
root  2877  4.9 18.2 613780 185676 ?   Sl   16:44   1:04 
/usr/bin/ceph-osd --cluster=ceph -i 5 -f


I have to take this workaround:
root@ceph-node6:~# rm -rf /var/lib/ceph/osd/ceph-2
rm: cannot remove '/var/lib/ceph/osd/ceph-2': Device or resource busy
root@ceph-node6:~# ls /var/lib/ceph/osd/ceph-2
root@ceph-node6:~# shutdown -r now

root@ceph-node6:~# ps aux | grep osd
root  1416  0.0  0.0   4440 628 ?Ss   17:10   0:00 
/bin/sh -e -c /usr/bin/ceph-osd --cluster="${cluster:-ceph}" -i "$id" 
-f /bin/sh
root  1417  8.9  5.8 468052 59868 ?Sl   17:10   0:02 
/usr/bin/ceph-osd --cluster=ceph -i 5 -f

root@ceph-node6:~# rm -r /var/lib/ceph/osd/ceph-2
root@ceph-node6:~#

Any idea? HELP!



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Ceph and open source cloud software: Path of least resistance

2013-06-24 Thread Constantinos Venetsanopoulos

Hi Jens,

On 6/17/13 05:02 AM, Jens Kristian Søgaard wrote:

Hi Stratos,


you might want to take a look at Synnefo. [1]


I did take a look at it earlier, but decided not to test it.

Mainly I was deterred because I found the documentation a bit lacking. 
I opened up the section on File Storage and found that there were only 
chapter titles, but no actual content. Perhaps I was too quick to 
dismiss it.




Thanks for your interest in our work with Synnefo.

It seems you are referring to the empty sections of the Administrator's
Guide. If yes, then what you're saying is true: The project is in very
active development, so we are mostly focusing on the Installation Guide
right now, which we always try to keep updated with the latest commits:

http://www.synnefo.org/docs/synnefo/latest/quick-install-admin-guide.html

Perhaps you were a bit too quick to dismiss it.
If you start playing around with Ganeti for VM management, I think
you'll love its simplicity and reliability. Then, Synnefo is a nice way
of providing cloud interfaces on top of Ganeti VMs, and also adding the
cloud storage part.

A bit more practical problem for me was that my test equipment 
consists of a single server (besides the Ceph cluster). As far as I 
understood the docs, there was a bug that makes it impossible to run 
Synnefo on a single server (to be fixed in the next version)?




This has been completely overhauled in Synnefo 0.14, which will be out by
next week, allowing any combination of components to coexist on a single
node, with arbitrary setting of URL prefixes for each. If you're feeling
adventurous, please find 0.14~rc4 packages for Squeeze at apt.dev.grnet.gr,
we've also uploaded the latest version of the docs at 
http://docs.synnefo.org.


Regarding my goals, I read through the installation guide and it 
recommends setting up an NFS server on one of the servers to serve 
images to the rest. This is what I wanted to avoid. Is that optional 
and/or could be replaced with Ceph?




We have integrated the storage service ("Pithos") with the compute
service, as the Image repository. Pithos has pluggable storage drivers,
through which it stores files as collections of content-addressable blocks.
One driver uses NFS, storing objects as distinct files on a shared directory;
another uses RADOS, storing objects as RADOS objects. Our production used
to run on NFS, and we're now transitioning to using RADOS exclusively.
Currently, we use both drivers simultaneously: incoming file chunks are stored
both in RADOS and in the NFS share. Eventually, we'll just unplug the NFS
driver when we're ready to go RADOS-only.

In your case, you can start with Pithos being RADOS-only, although the
Installation Guide continues to refer to NFS for simplicity.


At the moment Ganeti only supports the in-kernel RBD driver, although
support for the qemu-rbd driver should be implemented soon. Using the


Hmm, I wanted to avoid using the in-kernel RBD driver, as I figured it 
lead to various problems. Is it not a problem in practice?




Our demo installation at http://www.synnefo.org ["Try it out"] uses the
in-kernel RBD driver for the "rbd" storage option. We haven't encountered
any significant problems. Furthermore, AFAIK, Ganeti will also support
choosing between the in-kernel or qemu-rbd userspace driver when
spawning a VM in one of its next versions, so Synnefo will then also support
that, out-of-the-box.

I was thinking it would be wisest to stay with the distribution 
kernel, but I guess you swap it out for a later version?




For our custom storage layer (Archipelago, see below) we require a newer
kernel than the one that comes with Squeeze, so we run 3.2 from
squeeze-backports, everything has been going smoothly so far.

The rbds for all my existing VMs would probably have to be converted 
back from format 2 to format 1, right?




If you plan to use the in-kernel rbd driver, it seems yes:
http://ceph.com/docs/next/man/8/rbd/#parameters

I can't comment on this because we only run rbd as an option in the demo
environment, with the in-kernel driver. For our production, we're running
a custom storage layer (Archipelago) which does thin provisioning of volumes
from Pithos files and accesses the underlying Pithos objects directly, 
no matter

which driver (RADOS or NFS) you use.

Thanks again for your interest,
Constantinos


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW snapshots

2013-06-24 Thread Craig Lewis
I've looked into this a bit, and the best I've come up with is to 
snapshot all of the RGW pools.  I asked a similar question before: 
http://comments.gmane.org/gmane.comp.file-systems.ceph.user/855
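
Something along these lines, looping over whatever .rgw/.users pools your 
gateways actually use (the pool list below is just an example):

for pool in .rgw .rgw.control .rgw.gc .rgw.buckets .users .users.uid; do
    ceph osd pool mksnap $pool snap-$(date +%Y%m%d)
done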


I am planning to have a 2nd cluster for disaster recovery, with some 
in-house geo-replication.


I haven't actually tried this yet.  I just setup my development cluster, 
and this is on my list of things to test.  The basic idea:


 * Disable geo-replication
 * Snapshot the Disaster Recovery cluster manually
 * Rollback all of the RGW pools to the snapshot I want to restore from
 * Manually restore objects from the Disaster Recovery cluster to the
   Production Cluster, probably using s3cmd
 * Return all of the RGW pools to the most recent snapshot
 * Re-enable geo-replication


I have several layers of safety above this, so this process is meant to 
be a last resort after several layers of human+code errors.  In theory, 
it shouldn't ever happen, but we all know how that goes.



I would like to discuss how RadosGW snapshots might work, but there 
doesn't seem to be much interest at this time.  The ability to use 
RadosGW snapshots is somewhat niche.





*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com 

*Central Desktop. Work together in ways you never thought possible.*



On 6/20/13 07:59 , Mike Bryant wrote:

Hi,
is there any way to create snapshots of individual buckets, that can
be restored from piecemeal?
i.e. if someone deletes objects by mistake?

Cheers
Mike


--
Mike Bryant | Systems Administrator | Ocado Technology
mike.bry...@ocado.com | 01707 382148 | www.ocadotechnology.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Sage Weil
On Mon, 24 Jun 2013, Mandell Degerness wrote:
> Hmm.  This is a bit ugly from our perspective, but not fatal to your
> design (just our implementation).  At the time we run the rm, the
> cluster is smaller and so the restart of each monitor is not fatal to
> the cluster.  The problem is on our side in terms of guaranteeing
> order of behaviors.

Sorry, I'm still confused about where the monitor gets restarted.  It 
doesn't matter if the removed monitor is stopped or failed/gone; 'ceph mon 
rm ...' will remove it from the monmap and quorum.  It sounds like you're 
suggesting that the surviving monitors need to be restarted, but they do 
not, as long as enough of them are alive to form a quorum and pass the 
decree that the mon cluster is smaller.  So 5 -> 2 would be problematic, 
but 5 -> 3 (assuming there are 3 currently up) will work without 
restarts...

sage


> 
> On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil  wrote:
> > On Mon, 24 Jun 2013, Mandell Degerness wrote:
> >> I'm testing the change (actually re-starting the monitors after the
> >> monitor removal), but this brings up the issue with why we didn't want
> >> to do this in the first place:  When reducing the number of monitors
> >> from 5 to 3, we are guaranteed to have a service outage for the time
> >> it takes to restart at least one of the monitors (and, possibly, for
> >> two of the restarts, now that I think on it).  In theory, the
> >> stop/start cycle is very short and should complete in a reasonable
> >> time.  What I'm concerned about, however, is the case that something
> >> is wrong with our re-written config file.  In that case, the outage is
> >> immediate and will last until the problem is corrected on the first
> >> server to have the monitor restarted.
> >
> > I'm jumping into this thread late, but: why would you follow the second
> > removal procedure for broken clusters?  To go from 5->3 mons, you should
> > just stop 2 of the mons and do 'ceph mon rm ' 'ceph mon rm
> > '.
> >
> > sage
> >
> >>
> >> On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen  wrote:
> >> > On Jun 21, 2013, at 5:00 PM, Mandell Degerness  
> >> > wrote:
> >> >
> >> >> There is a scenario where we would want to remove a monitor and, at a
> >> >> later date, re-add the monitor (using the same IP address).  Is there
> >> >> a supported way to do this?  I tried deleting the monitor directory
> >> >> and rebuilding from scratch following the add monitor procedures from
> >> >> the web, but the monitor still suicide's when started.
> >> >
> >> >
> >> > I assume you're already referencing this:
> >> > http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
> >> >
> >> > I have done what you describe before. There were a couple hiccups, let's 
> >> > see if I remember the specifics:
> >> >
> >> > Remove:
> >> > Follow the first two steps under "removing a monitor (manual) at the 
> >> > link above:
> >> > service ceph stop mon.N
> >> > ceph mon remove N
> >> > Comment out the monitor entry in ceph.conf on ALL mon, osd and client 
> >> > hosts.
> >> > Restart services as required to make everyone happy with the smaller set 
> >> > of monitors
> >> >
> >> > Re-add:
> >> > Wipe the old monitor's directory and re-create it
> >> > Follow the steps for "adding a monitor (manual) at the link above. 
> >> > Instead of adding a new entry you can just un-comment your old ones in 
> >> > ceph.conf. You can also start the monitor with "service ceph start mon 
> >> > N" on the appropriate host instead of running yourself (step 8). Note 
> >> > that you DO need to run ceph-mon as specified in step 5. I was initially 
> >> > confused about the '--mkfs' flag there--it doesn't refer to the OS's 
> >> > filesystem, you should use a directory or mountpoint that is already 
> >> > prepared/mounted.
> >> >
> >> > HTH. If you run into trouble post exactly the steps you followed and 
> >> > additional details about your setup.
> >> >
> >> > JN
> >> >
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> >>
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Nigel Williams

On 25/06/2013 5:59 AM, Brian Candler wrote:

On 24/06/2013 20:27, Dave Spano wrote:

Here's my procedure for manually adding OSDs.


The other thing I discovered is not to wait between steps; some changes result in a new 
crushmap, which then triggers replication. You want to speed through the steps so the 
cluster does not waste time moving objects around to meet the replica requirements until 
you have finished crushmap changes.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
The issue, Sage, is that we have to deal with the cluster being
re-expanded.  If we start with 5 monitors and scale back to 3 by running
the "ceph mon remove N" command after stopping each monitor, and don't
restart the existing monitors, we cannot re-add those same monitors
that were previously removed.  They will suicide at startup.

On Mon, Jun 24, 2013 at 4:22 PM, Sage Weil  wrote:
> On Mon, 24 Jun 2013, Mandell Degerness wrote:
>> Hmm.  This is a bit ugly from our perspective, but not fatal to your
>> design (just our implementation).  At the time we run the rm, the
>> cluster is smaller and so the restart of each monitor is not fatal to
>> the cluster.  The problem is on our side in terms of guaranteeing
>> order of behaviors.
>
> Sorry, I'm still confused about where the monitor gets restarted.  It
> doesn't matter if the removed monitor is stopped or failed/gone; 'ceph mon
> rm ...' will remove it from the monmap and quorum.  It sounds like you're
> suggesting that the surviving monitors need to be restarted, but they do
> not, as long as enough of them are alive to form a quorum and pass the
> decree that the mon cluster is smaller.  So 5 -> 2 would be problematic,
> but 5 -> 3 (assuming there are 3 currently up) will work without
> restarts...
>
> sage
>
>
>>
>> On Mon, Jun 24, 2013 at 1:54 PM, Sage Weil  wrote:
>> > On Mon, 24 Jun 2013, Mandell Degerness wrote:
>> >> I'm testing the change (actually re-starting the monitors after the
>> >> monitor removal), but this brings up the issue with why we didn't want
>> >> to do this in the first place:  When reducing the number of monitors
>> >> from 5 to 3, we are guaranteed to have a service outage for the time
>> >> it takes to restart at least one of the monitors (and, possibly, for
>> >> two of the restarts, now that I think on it).  In theory, the
>> >> stop/start cycle is very short and should complete in a reasonable
>> >> time.  What I'm concerned about, however, is the case that something
>> >> is wrong with our re-written config file.  In that case, the outage is
>> >> immediate and will last until the problem is corrected on the first
>> >> server to have the monitor restarted.
>> >
>> > I'm jumping into this thread late, but: why would you follow the second
>> > removal procedure for broken clusters?  To go from 5->3 mons, you should
>> > just stop 2 of the mons and do 'ceph mon rm ' 'ceph mon rm
>> > '.
>> >
>> > sage
>> >
>> >>
>> >> On Mon, Jun 24, 2013 at 10:07 AM, John Nielsen  wrote:
>> >> > On Jun 21, 2013, at 5:00 PM, Mandell Degerness 
>> >> >  wrote:
>> >> >
>> >> >> There is a scenario where we would want to remove a monitor and, at a
>> >> >> later date, re-add the monitor (using the same IP address).  Is there
>> >> >> a supported way to do this?  I tried deleting the monitor directory
>> >> >> and rebuilding from scratch following the add monitor procedures from
>> >> >> the web, but the monitor still suicide's when started.
>> >> >
>> >> >
>> >> > I assume you're already referencing this:
>> >> > http://ceph.com/docs/master/rados/operations/add-or-rm-mons/
>> >> >
>> >> > I have done what you describe before. There were a couple hiccups, 
>> >> > let's see if I remember the specifics:
>> >> >
>> >> > Remove:
>> >> > Follow the first two steps under "removing a monitor (manual) at the 
>> >> > link above:
>> >> > service ceph stop mon.N
>> >> > ceph mon remove N
>> >> > Comment out the monitor entry in ceph.conf on ALL mon, osd and client 
>> >> > hosts.
>> >> > Restart services as required to make everyone happy with the smaller 
>> >> > set of monitors
>> >> >
>> >> > Re-add:
>> >> > Wipe the old monitor's directory and re-create it
>> >> > Follow the steps for "adding a monitor (manual) at the link above. 
>> >> > Instead of adding a new entry you can just un-comment your old ones in 
>> >> > ceph.conf. You can also start the monitor with "service ceph start mon 
>> >> > N" on the appropriate host instead of running yourself (step 8). Note 
>> >> > that you DO need to run ceph-mon as specified in step 5. I was 
>> >> > initially confused about the '--mkfs' flag there--it doesn't refer to 
>> >> > the OS's filesystem, you should use a directory or mountpoint that is 
>> >> > already prepared/mounted.
>> >> >
>> >> > HTH. If you run into trouble post exactly the steps you followed and 
>> >> > additional details about your setup.
>> >> >
>> >> > JN
>> >> >
>> >> ___
>> >> ceph-users mailing list
>> >> ceph-users@lists.ceph.com
>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>
>> >>
>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Drive replacement procedure

2013-06-24 Thread Michael Lowe
That's where 'ceph osd set noout' comes in handy.
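
i.e. roughly:

ceph osd set noout
# ... replace the drive / restart the OSD at your leisure ...
ceph osd unset noout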



On Jun 24, 2013, at 7:28 PM, Nigel Williams  wrote:

> On 25/06/2013 5:59 AM, Brian Candler wrote:
>> On 24/06/2013 20:27, Dave Spano wrote:
>>> Here's my procedure for manually adding OSDs.
> 
> The other thing I discovered is not to wait between steps; some changes 
> result in a new crushmap, that then triggers replication. You want to speed 
> through the steps so the cluster does not waste time moving objects around to 
> meet the replica requirements until you have finished crushmap changes.
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Packages for fedora 19

2013-06-24 Thread Darryl Bond

Any plans to build a set of packages for Fedora 19 yet?
F19 has qemu 1.4.2 packaged and we would like to try it with ceph
cuttlefish.

Attempting to install the F18 ceph 0.61.4 bumps into a dependency on
libboost_system-mt.so.1.50.0()(64bit).
The version of libboost on F19 is 1.53 :(

I will have a go at building the rpms from the src rpm in the meantime.
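
Roughly something like the following, assuming that's what the F18 src rpm is 
called (untested):

yum-builddep ceph-0.61.4-0.fc18.src.rpm
rpmbuild --rebuild ceph-0.61.4-0.fc18.src.rpm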

Regards
Darryl

The contents of this electronic message and any attachments are intended only 
for the addressee and may contain legally privileged, personal, sensitive or 
confidential information. If you are not the intended addressee, and have 
received this email, any transmission, distribution, downloading, printing or 
photocopying of the contents of this message or attachments is strictly 
prohibited. Any legal privilege or confidentiality attached to this message and 
attachments is not waived, lost or destroyed by reason of delivery to any 
person other than intended addressee. If you have received this message and are 
not the intended addressee you should notify the sender by return email and 
destroy all copies of the message and any attachments. Unless expressly 
attributed, the views expressed in this email do not necessarily represent the 
views of the company.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Packages for fedora 19

2013-06-24 Thread Sage Weil
On Tue, 25 Jun 2013, Darryl Bond wrote:
> Any plans to build a set of packages for Fedora 19 yet?
> F19 has qemu 1.4.2 packaged and we would like to try it with ceph
> cuttlefish.
> 
> Attempting to install the F18 ceph .6.1.4 bumps into a dependency on
> libboost_system-mt.so.1.50.0()(64bit).
> The version of libboost on F19 is 1.53 :(
> 
> I will have a go at building the rpms from the src rpm in the mean time.

I opened up a couple tickets for this in our build tracker.  It should be 
ready by the time 0.66 is released.

sage

> 
> Regards
> Darryl
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] One mon failed to start

2013-06-24 Thread Da Chun
Hi All,
One of my three mons failed to start. Below is the error in the mon log. I 
tried to attach the complete log, but the attachment size limit prevented it.
I can't tell what's happening to it.

--- begin dump of recent events ---
 0> 2013-06-25 11:18:47.177334 7f46a868b7c0 -1 *** Caught signal (Aborted) 
**
 in thread 7f46a868b7c0


 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: /usr/bin/ceph-mon() [0x599180]
 2: (()+0xfbd0) [0x7f46a826cbd0]
 3: (gsignal()+0x37) [0x7f46a6874037]
 4: (abort()+0x148) [0x7f46a6877698]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f46a7180e8d]
 6: (()+0x5ef76) [0x7f46a717ef76]
 7: (()+0x5efa3) [0x7f46a717efa3]
 8: (()+0x5f1de) [0x7f46a717f1de]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x43d) [0x64cf0d]
 10: (PGMonitor::update_from_paxos()+0xe02) [0x54dad2]
 11: (Monitor::init_paxos()+0x97) [0x4a3a47]
 12: (Monitor::preinit()+0x5f2) [0x4c3382]
 13: (main()+0x172d) [0x496f3d]
 14: (__libc_start_main()+0xf5) [0x7f46a685eea5]
 15: /usr/bin/ceph-mon() [0x499459]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One mon failed to start

2013-06-24 Thread Sage Weil
On Tue, 25 Jun 2013, Da Chun wrote:
> Hi All,
> One of my three mons failed to start. Below is the error in the mon log. I
> tried to attach the complete log, but it's limited.
> I can't tell what's happening to it.
> 
> --- begin dump of recent events ---
>      0> 2013-06-25 11:18:47.177334 7f46a868b7c0 -1 *** Caught signal
> (Aborted) **
>  in thread 7f46a868b7c0

If you look a bit higher up in the log there should be a couple of lines 
that include 'assert' and tell us what line of code and what check was 
untrue.
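
For example, something like this should pull them out (assuming the default 
log location):

# grep -B 20 'FAILED assert' /var/log/ceph/ceph-mon.*.log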

Thanks!
sage


> 
>  ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
>  1: /usr/bin/ceph-mon() [0x599180]
>  2: (()+0xfbd0) [0x7f46a826cbd0]
>  3: (gsignal()+0x37) [0x7f46a6874037]
>  4: (abort()+0x148) [0x7f46a6877698]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f46a7180e8d]
>  6: (()+0x5ef76) [0x7f46a717ef76]
>  7: (()+0x5efa3) [0x7f46a717efa3]
>  8: (()+0x5f1de) [0x7f46a717f1de]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x43d) [0x64cf0d]
>  10: (PGMonitor::update_from_paxos()+0xe02) [0x54dad2]
>  11: (Monitor::init_paxos()+0x97) [0x4a3a47]
>  12: (Monitor::preinit()+0x5f2) [0x4c3382]
>  13: (main()+0x172d) [0x496f3d]
>  14: (__libc_start_main()+0xf5) [0x7f46a685eea5]
>  15: /usr/bin/ceph-mon() [0x499459]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to
> interpret this.
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One mon failed to start

2013-06-24 Thread Da Chun
Here they are:


2013-06-25 11:18:47.040064 7f46a868b7c0  0 ceph version 0.61.4 
(1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 14099
2013-06-25 11:18:47.169526 7f46a868b7c0  1 mon.ceph-node0@-1(probing) e1 
preinit fsid 5436253a-8ecc-4509-a3ef-4bfd68387189
2013-06-25 11:18:47.173179 7f46a868b7c0 -1 mon/PGMonitor.cc: In function 
'virtual void PGMonitor::update_from_paxos()' thread 7f46a868b7c0 time 
2013-06-25 11:18:47.172341
mon/PGMonitor.cc: 173: FAILED assert(err == 0)


 ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
 1: (PGMonitor::update_from_paxos()+0xe02) [0x54dad2]
 2: (Monitor::init_paxos()+0x97) [0x4a3a47]
 3: (Monitor::preinit()+0x5f2) [0x4c3382]
 4: (main()+0x172d) [0x496f3d]
 5: (__libc_start_main()+0xf5) [0x7f46a685eea5]
 6: /usr/bin/ceph-mon() [0x499459]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.


--- begin dump of recent events ---
   -24> 2013-06-25 11:18:47.038879 7f46a868b7c0  5 asok(0x131a0e0) 
register_command perfcounters_dump hook 0x1314010
   -23> 2013-06-25 11:18:47.038907 7f46a868b7c0  5 asok(0x131a0e0) 
register_command 1 hook 0x1314010
   -22> 2013-06-25 11:18:47.038909 7f46a868b7c0  5 asok(0x131a0e0) 
register_command perf dump hook 0x1314010
   -21> 2013-06-25 11:18:47.038914 7f46a868b7c0  5 asok(0x131a0e0) 
register_command perfcounters_schema hook 0x1314010
   -20> 2013-06-25 11:18:47.038916 7f46a868b7c0  5 asok(0x131a0e0) 
register_command 2 hook 0x1314010
   -19> 2013-06-25 11:18:47.038917 7f46a868b7c0  5 asok(0x131a0e0) 
register_command perf schema hook 0x1314010
   -18> 2013-06-25 11:18:47.038920 7f46a868b7c0  5 asok(0x131a0e0) 
register_command config show hook 0x1314010
   -17> 2013-06-25 11:18:47.038922 7f46a868b7c0  5 asok(0x131a0e0) 
register_command config set hook 0x1314010
   -16> 2013-06-25 11:18:47.038924 7f46a868b7c0  5 asok(0x131a0e0) 
register_command log flush hook 0x1314010
   -15> 2013-06-25 11:18:47.038926 7f46a868b7c0  5 asok(0x131a0e0) 
register_command log dump hook 0x1314010
   -14> 2013-06-25 11:18:47.038927 7f46a868b7c0  5 asok(0x131a0e0) 
register_command log reopen hook 0x1314010
   -13> 2013-06-25 11:18:47.040064 7f46a868b7c0  0 ceph version 0.61.4 
(1669132fcfc27d0c0b5e5bb93ade59d147e23404), process ceph-mon, pid 14099
   -12> 2013-06-25 11:18:47.041934 7f46a868b7c0  5 asok(0x131a0e0) init 
/var/run/ceph/ceph-mon.ceph-node0.asok
   -11> 2013-06-25 11:18:47.041952 7f46a868b7c0  5 asok(0x131a0e0) 
bind_and_listen /var/run/ceph/ceph-mon.ceph-node0.asok
   -10> 2013-06-25 11:18:47.041982 7f46a868b7c0  5 asok(0x131a0e0) 
register_command 0 hook 0x13120b8
-9> 2013-06-25 11:18:47.041985 7f46a868b7c0  5 asok(0x131a0e0) 
register_command version hook 0x13120b8
-8> 2013-06-25 11:18:47.041988 7f46a868b7c0  5 asok(0x131a0e0) 
register_command git_version hook 0x13120b8
-7> 2013-06-25 11:18:47.041991 7f46a868b7c0  5 asok(0x131a0e0) 
register_command help hook 0x13140c0
-6> 2013-06-25 11:18:47.042866 7f46a447e700  5 asok(0x131a0e0) entry start
-5> 2013-06-25 11:18:47.169414 7f46a868b7c0  1 -- 172.18.11.30:6789/0 
learned my addr 172.18.11.30:6789/0
-4> 2013-06-25 11:18:47.169428 7f46a868b7c0  1 accepter.accepter.bind 
my_inst.addr is 172.18.11.30:6789/0 need_addr=0
-3> 2013-06-25 11:18:47.169455 7f46a868b7c0  5 adding auth protocol: none
-2> 2013-06-25 11:18:47.169471 7f46a868b7c0  5 adding auth protocol: none
-1> 2013-06-25 11:18:47.169526 7f46a868b7c0  1 mon.ceph-node0@-1(probing) 
e1 preinit fsid 5436253a-8ecc-4509-a3ef-4bfd68387189
 0> 2013-06-25 11:18:47.173179 7f46a868b7c0 -1 mon/PGMonitor.cc: In 
function 'virtual void PGMonitor::update_from_paxos()' thread 7f46a868b7c0 time 
2013-06-25 11:18:47.172341
mon/PGMonitor.cc: 173: FAILED assert(err == 0)



-- Original --
From:  "Sage Weil";
Date:  Tue, Jun 25, 2013 12:04 PM
To:  "Da Chun"; 
Cc:  "ceph-users"; 
Subject:  Re: [ceph-users] One mon failed to start



On Tue, 25 Jun 2013, Da Chun wrote:
> Hi All,
> One of my three mons failed to start. Below is the error in the mon log. I
> tried to attach the complete log, but it's limited.
> I can't tell what's happening to it.
> 
> --- begin dump of recent events ---
>  0> 2013-06-25 11:18:47.177334 7f46a868b7c0 -1 *** Caught signal
> (Aborted) **
>  in thread 7f46a868b7c0

If you look a bit higher up in the log there should be a couple of lines 
that include 'assert' and tell us what line of code and what check was 
untrue.

Thanks!
sage


> 
>  ceph version 0.61.4 (1669132fcfc27d0c0b5e5bb93ade59d147e23404)
>  1: /usr/bin/ceph-mon() [0x599180]
>  2: (()+0xfbd0) [0x7f46a826cbd0]
>  3: (gsignal()+0x37) [0x7f46a6874037]
>  4: (abort()+0x148) [0x7f46a6877698]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f46a7180e8d]
>  6: (()+0x5ef76) [0x7f46a717ef76]
>  7: (()+0x5efa3) [0x7f46a717efa3]
>  8: (()+0x5f1de) [0x7f46a717f1de]
>  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const

Re: [ceph-users] several radosgw sharing pools

2013-06-24 Thread Artem Silenkov
Good day!

Basically we don't have to. Write operations are comparatively rare and are
made from a single point. As for read operations, we use a low TTL of 1
second, so Varnish is not really a cache here but a load balancer and a way
to soak up the raw request rate.
If you need coherency you can write directly to radosgw and set a low TTL on
the cache. radosgw seems to handle data consistency across the cluster just
fine.
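
For reference, the Varnish side is only a few lines of VCL. A trimmed-down
sketch (Varnish 3 syntax; the host names, port and probe URL are placeholders
for our real values):

probe rgw_health {
    .url = "/";
    .interval = 5s;
}

backend rgw1 { .host = "myhost1"; .port = "80"; .probe = rgw_health; }
backend rgw2 { .host = "myhost2"; .port = "80"; .probe = rgw_health; }

director radosgw round-robin {
    { .backend = rgw1; }
    { .backend = rgw2; }
}

sub vcl_recv {
    set req.backend = radosgw;
}

sub vcl_fetch {
    # keep objects for 1 second only, so Varnish mostly balances and
    # absorbs request spikes rather than serving stale data
    set beresp.ttl = 1s;
}

If one radosgw dies its probe fails and Varnish simply stops sending requests
to it, which covers the failover part of the question.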


Regards, Artem Silenkov, 2GIS TM.

---
2GIS LLC
http://2gis.ru
a.silenkov at 2gis.ru 
gtalk:artem.silenkov at gmail.com

cell:+79231534853




2013/6/24 John Nielsen 

> How do you manage cache coherency with Varnish?
>
> On Jun 21, 2013, at 6:09 AM, Artem Silenkov 
> wrote:
>
> > This picture shows the way we do it
> http://habrastorage.org/storage2/1ed/532/627/1ed5326273399df81f3a73179848a404.png
> >
> > Regards, Artem Silenkov, 2GIS TM.
> > ---
> > 2GIS LLC
> >
> > http://2gis.ru
> > a.silenkov at 2gis.ru
> >
> > gtalk:
> > artem.silenkov at gmail.com
> >
> > cell:+79231534853
> >
> >
> >
> >
> > 2013/6/21 Alvaro Izquierdo Jimeno 
> > Thanks Artem
> >
> >
> >
> > De: Artem Silenkov [mailto:artem.silen...@gmail.com]
> > Enviado el: viernes, 21 de junio de 2013 14:01
> > Para: Alvaro Izquierdo Jimeno
> > CC: ceph-users@lists.ceph.com
> > Asunto: Re: [ceph-users] several radosgw sharing pools
> >
> >
> >
> > Good day!
> >
> >
> >
> > We use balancing such way
> >
> >
> >
> > varnish frontend-->radosgw1
> >
> > |
> >
> > ->radosgw2
> >
> >
> >
> > Every radosgw host use his own config so not necessary to add both nodes
> in every ceph.conf. It looks like
> >
> >
> >
> > Host1
> >
> > [client.radosgw.gateway]
> >
> >  host = myhost1
> >
> > ...
> >
> >
> >
> > Host2
> >
> > [client.radosgw.gateway]
> >
> >  host = myhost2
> >
> > ...
> >
> >
> >
> >
> >
> > Pools, users, etc are internal params so every radosgw installation
> share this without any problem. And shares concurrently so you can do
> atomic writes and other good things. You could also use monit to monitor
> service health and even try to repair it automatically.
> >
> >
> >
> > Regards, Artem Silenkov, 2GIS TM.
> >
> > ---
> >
> > 2GIS LLC
> >
> > http://2gis.ru
> >
> > a.silen...@2gis.ru
> >
> > gtalk:artem.silen...@gmail.com
> >
> > cell:+79231534853
> >
> >
> >
> > 2013/6/21 Alvaro Izquierdo Jimeno 
> >
> > Hi,
> >
> >
> >
> > I have a ceph cluster with a radosgw running. The radosgw part in
> ceph.conf is:
> >
> > [client.radosgw.gateway]
> >
> >   host = myhost1
> >
> >   ……..
> >
> >
> >
> > But if the process radosgw dies for some reason, we lose this
> behavior…So:
> >
> >
> >
> > -Can I setup another radosgw in other host sharing pools, users…. in
> ceph?
> >
> > i.e.:
> >
> > [client.radosgw.gateway2]
> >
> >  host = myhost2
> >
> >   ……..
> >
> > -If previous question is ‘yes’, Is there any load balancer in the
> radosgw configure options?
> >
> >
> >
> >
> >
> > Thank you so much in advanced and best regards,
> >
> > Álvaro.
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Alex Bligh

On 25 Jun 2013, at 00:39, Mandell Degerness wrote:

> The issue, Sage, is that we have to deal with the cluster being
> re-expanded.  If we start with 5 monitors and scale back to 3, running
> the "ceph mon remove N" command after stopping each monitor and don't
> restart the existing monitors, we cannot re-add those same monitors
> that were previously removed.  They will suicide at startup.

Can you not restart the remaining monitors individually at the
end of the process once the monmaps and the ceph.confs have been
updated so they only think there are 3 monitors?

Once you have got to a stable 3 mon config, you can go back up
to 5.
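
For completeness, a rough sketch of the two directions, once the remaining
three monitors have been restarted and agree on the smaller monmap (mon id
"d", the paths and the service commands are only placeholders for your own
setup):

Retiring a monitor:

# service ceph stop mon.d
# ceph mon remove d

Re-adding it later, rebuilding its store from the current map rather than
reusing the old one:

# rm -rf /var/lib/ceph/mon/ceph-d
# ceph mon getmap -o /tmp/monmap
# ceph auth get mon. -o /tmp/mon.keyring
# ceph-mon -i d --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
# ceph mon add d {ip-of-mon-d}:6789
# service ceph start mon.d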

-- 
Alex Bligh




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] monitor removal and re-add

2013-06-24 Thread Mandell Degerness
Precisely.  This is what we need to do.  It is just a case of
adjusting our process to make that possible.  As I stated a couple of
e-mails ago, the design of Ceph allows it; it is just a bit of a
challenge to fit it into our existing processes.  It's on me now to
fix the process.

On Mon, Jun 24, 2013 at 11:26 PM, Alex Bligh  wrote:
>
> On 25 Jun 2013, at 00:39, Mandell Degerness wrote:
>
>> The issue, Sage, is that we have to deal with the cluster being
>> re-expanded.  If we start with 5 monitors and scale back to 3, running
>> the "ceph mon remove N" command after stopping each monitor and don't
>> restart the existing monitors, we cannot re-add those same monitors
>> that were previously removed.  They will suicide at startup.
>
> Can you not restart the remaining monitors individually at the
> end of the process once the monmaps and the ceph.confs have been
> updated so they only think there are 3 monitors?
>
> Once you have got to a stable 3 mon config, you can go back up
> to 5.
>
> --
> Alex Bligh
>
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com