Re: [ceph-users] machine hangs & soft lockups with 10.2.2 / kernel 4.4.0

2017-01-24 Thread Peter Maloney
linux-stable/Documentation/oops-tracing.txt:
>  8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
> 15: 'L' if a soft lockup has previously occurred on the system.

Your first entry already has D and L... can you try to get the first one
before D or L were flagged?

Without that, your log only shows what stopped working as a result of
the problem, but not necessarily the problem itself.
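
For what it's worth, on a box that is still responding you can read the
current taint mask straight from /proc; the flag positions are the ones in
the list quoted above ('D' is bit 7, 'L' is bit 14), so for example:

cat /proc/sys/kernel/tainted
# 0 means untainted; e.g. 16512 = 128 ('D', bit 7) + 16384 ('L', bit 14)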

To capture the full log of a dying/dead system, you need to set up
another way of logging, other than the local disk (a dead kernel will
not write to its persistent storage for fear of destroying its
integrity). So you need something like a network logger, or a serial
console logger.

For network logging, there is a way via the kernel cmdline, which is poorly
documented; I have never managed to get it to work and do not recommend
it... you only need that method when the machine won't boot, and even then
a serial console ought to work. The other network options include things
like configuring syslog to forward the log over the network. It is probably
also possible to simply run a long-running "sudo cat /dev/kmsg | nc ..."
command to keep reading the kernel log and send it over the network.
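
A rough sketch of that last idea (the collector address 192.0.2.10 and port
5555 are placeholders, and netcat option syntax differs between
implementations):

# on a separate, healthy box: collect whatever arrives on TCP port 5555
nc -lk 5555 >> kmsg-from-cephnode.log

# on the ceph node being watched: dump the kernel log and keep following it
sudo cat /dev/kmsg | nc 192.0.2.10 5555

# alternatively, let rsyslog forward everything to a remote host
# (one @ = UDP, two @@ = TCP), e.g. in /etc/rsyslog.d/remote.conf:
#   *.* @@192.0.2.10:514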

Peter

On 01/23/17 17:37, Matthew Vernon wrote:
> Hi,
>
> We have a 9-node ceph cluster, running 10.2.2 and kernel 4.4.0 (Ubuntu
> Xenial). We're seeing both machines freezing (nothing in logs on the
> machine, which is entirely unresponsive to anything except the power
> button) and suffering soft lockups.
>
> Has anyone seen similar? Googling hasn't found anything obvious, and
> while ceph repairs itself when a machine is lost, this is obviously
> quite concerning.
>
> I don't have any useful logs from the machines that freeze, but I do
> have logs from the machine that suffered soft lockups - you can see the
> relevant bits of kern.log here:
>
> https://drive.google.com/drive/folders/0B4TV1iNptBAdblJMX1R4ZWI5eGc?usp=sharing
>
> [available compressed and uncompressed]
>
> The cluster was installed with ceph-ansible, and the specs of each node
> are roughly:
>
> Cores: 16 (2 x 8-core Intel E5-2690)
> Memory: 512 GB (16 x 32 GB)
> Storage: 2x 120GB SAMSUNG SSD (system disk)
>  2x 2TB NVMe cards (ceph journal)
>  60x 6TB Toshiba 7200 rpm disks (ceph storage)
> Network: 1 Gbit/s Intel I350 (control interface)
>  2x 100Gbit/s Mellanox cards (bonded together)
>
> We're in pre-production testing, but any suggestions on how we might get
> to the bottom of this would be appreciated!
>
> There's no obvious pattern to these problems, and we've had 2 freezes
> and 1 soft lockup in the last ~1.5 weeks.
>
> Thanks,
>
> Matthew
>
>


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de




Re: [ceph-users] [RBD][mirror]Can't remove mirrored image.

2017-01-24 Thread Peter Maloney
From my observations (since there is no documentation about it), "syncing"
means it's continually copying new changes to the mirror... it's "ok".
Stop the mirror daemon to see what it looks like when it's not ok... it
says something like "unknown", or some word like "stale".

To remove the image, you have to stop the syncing permanently (not just
stop the mirror daemon, but mark the image as no longer to be mirrored)...
I think disabling journaling on the source rbd image is enough for that
(which won't work if the source machine is dead).
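
Something like this, untested, using the cluster/image names from the
original post (the first command only applies if per-image mirroring is
enabled):

# stop mirroring this particular image
rbd --cluster server-31 mirror image disable int32bit-test/mirror-test
# and/or drop the journaling feature so it is no longer journalled for mirroring
rbd --cluster server-31 feature disable int32bit-test/mirror-test journaling
# after that a plain removal should no longer complain about watchers
rbd --cluster server-31 rm int32bit-test/mirror-test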

On 01/24/17 05:47, int32bit wrote:
> Hi,All,
>
> I'm a newcomer to Ceph. I deployed two Ceph clusters, and one of them
> is used as a mirror cluster. When I created an image, I found that the
> primary image was stuck in 'up+stopped' status and the non-primary
> image's status was 'up+syncing'. I'm really not sure if this is an OK
> status, and I really couldn't find any references about the sync status
> in the docs. When I tried to remove the image from the primary node, I got
> the following error:
>
> # rbd --cluster server-31 rm int32bit-test/mirror-test
> 2017-01-24 12:40:41.494963 7fd8dff91d80 -1 librbd: image has watchers
> - not removing
> Removing image: 0% complete...failed.
> rbd: error: image still has watchers
> This means the image is still open or the client using it crashed. Try
> again after closing/unmapping it or waiting 30s for the crashed client
> to timeout.
>
> I would like to know if my mirror status is OK and how to remove the mirrored image.
>
>
> My ceph version is 10.2.3, and the default rbd features is set to 125.
>
> Thanks for any help!
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 


Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.malo...@brockmann-consult.de
Internet: http://www.brockmann-consult.de




Re: [ceph-users] [RBD][mirror]Can't remove mirrored image.

2017-01-24 Thread Jason Dillaman
On Mon, Jan 23, 2017 at 11:47 PM, int32bit  wrote:
> I'm a newcomer to Ceph. I deployed two Ceph clusters, and one of them is
> used as a mirror cluster. When I created an image, I found that the primary
> image was stuck in 'up+stopped' status and the non-primary image's status was
> 'up+syncing'. I'm really not sure if this is an OK status and I really
> couldn't find any references about the sync status in the docs.

It is expected behavior for the primary to be listed as
"up+stopped", since it isn't syncing with the remote, non-primary image.
The "rbd mirror pool status" command should list your health as OK --
when something is wrong it will list the health as WARNING or ERROR.
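
For example, using the cluster/pool/image names from your mail:

# pool-wide health plus per-image states (up+stopped, up+replaying, ...)
rbd --cluster server-31 mirror pool status --verbose int32bit-test
# or just the one image
rbd --cluster server-31 mirror image status int32bit-test/mirror-test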

> When I tried to
> remove the image from the primary node, I got the following error:
>
> # rbd --cluster server-31 rm int32bit-test/mirror-test
> 2017-01-24 12:40:41.494963 7fd8dff91d80 -1 librbd: image has watchers - not
> removing
> Removing image: 0% complete...failed.
> rbd: error: image still has watchers
> This means the image is still open or the client using it crashed. Try again
> after closing/unmapping it or waiting 30s for the crashed client to timeout.
>
> I would like to know if my mirror status is OK and how to remove the mirrored image.
>

Is the image that you are trying to remove still bootstrapping from
the primary cluster to the non-primary cluster? This is a known
limitation in v10.2.3 and was resolved in v10.2.4 [1].

> My ceph version is 10.2.3, and the default rbd features is set to 125.
>
> Thanks for any help!


[1] http://tracker.ceph.com/issues/17559

-- 
Jason


[ceph-users] Health_Warn recovery stuck / crushmap problem?

2017-01-24 Thread Jonas Stunkat
All OSDs and Monitors are up from what I can see.
I read through the troubleshooting steps mentioned in the Ceph documentation for
PGs and came to the conclusion that nothing there would help me, so I didn't
try anything - except restarting / rebooting OSDs and Monitors.

How do I recover from this? It looks to me like the data itself should be safe
for now, but why is it not recovering?
I guess the problem may be the crushmap.
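
For reference, the usual round-trip for dumping, editing and re-injecting
the live crushmap looks roughly like this (file names are arbitrary, and the
--test rule/replica numbers are only examples):

ceph osd getcrushmap -o crushmap.bin       # dump the map the cluster actually uses
crushtool -d crushmap.bin -o crushmap.txt  # decompile it to editable text
# ... edit crushmap.txt (host buckets, weights, rules) ...
crushtool -c crushmap.txt -o crushmap.new  # recompile
crushtool -i crushmap.new --test --rule 1 --num-rep 2 --show-statistics
ceph osd setcrushmap -i crushmap.new       # inject the fixed map into the cluster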

Here are some outputs:

#ceph health detail

HEALTH_WARN 475 pgs degraded; 640 pgs stale; 475 pgs stuck degraded; 640 pgs 
stuck stale; 640 pgs stuck unclean; 475 pgs stuck undersized; 475 pgs 
undersized; recovery 104812/279550 objects degraded (37.493%); recovery 
69926/279550 objects misplaced (25.014%)
pg 3.ec is stuck unclean for 3326815.935321, current state 
stale+active+remapped, last acting [7,6]
pg 3.ed is stuck unclean for 3288818.682456, current state 
stale+active+remapped, last acting [6,7]
pg 3.ee is stuck unclean for 409973.052061, current state 
stale+active+undersized+degraded, last acting [7]
pg 3.ef is stuck unclean for 3357894.554762, current state 
stale+active+undersized+degraded, last acting [7]
pg 3.e8 is stuck unclean for 384815.518837, current state 
stale+active+undersized+degraded, last acting [6]
pg 3.e9 is stuck unclean for 3274554.591000, current state 
stale+active+remapped, last acting [6,7]
..



This is the crushmap I created, intended to use, and thought I had been using
for the past 2 months:
- pvestorage1-ssd and pvestorage1-platter are the same host; it seems like
this is not possible, but I never noticed
- likewise with pvestorage2

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pvestorage1-ssd {
 id -2 # do not change unnecessarily
 # weight 1.740
 alg straw
 hash 0 # rjenkins1
 item osd.0 weight 0.870
 item osd.1 weight 0.870
}
host pvestorage2-ssd {
 id -3 # do not change unnecessarily
 # weight 1.740
 alg straw
 hash 0 # rjenkins1
 item osd.2 weight 0.870
 item osd.3 weight 0.870
}
host pvestorage1-platter {
 id -4 # do not change unnecessarily
 # weight 4
 alg straw
 hash 0 # rjenkins1
 item osd.4 weight 2.000
 item osd.5 weight 2.000
}
host pvestorage2-platter {
 id -5 # do not change unnecessarily
 # weight 4
 alg straw
 hash 0 # rjenkins1
 item osd.6 weight 2.000
 item osd.7 weight 2.000
}

root ssd {
 id -1 # do not change unnecessarily
 # weight 3.480
 alg straw
 hash 0 # rjenkins1
 item pvestorage1-ssd weight 1.740
 item pvestorage2-ssd weight 1.740
}

root platter {
 id -6 # do not change unnecessarily
 # weight 8
 alg straw
 hash 0 # rjenkins1
 item pvestorage1-platter weight 4.000
 item pvestorage2-platter weight 4.000
}

# rules
rule ssd {
 ruleset 0
 type replicated
 min_size 1
 max_size 10
 step take ssd
 step chooseleaf firstn 0 type host
 step emit
}

rule platter {
 ruleset 1
 type replicated
 min_size 1
 max_size 10
 step take platter
 step chooseleaf firstn 0 type host
 step emit
}
# end crush map


This is what ceph made of this crushmap, and the one that is actually used
right now - I never looked -_- :

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pvestorage1-ssd {
 id -2 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
}
host pvestorage2-ssd {
 id -3 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
}
root ssd {
 id -1 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
 item pvestorage1-ssd weight 0.000
 item pvestorage2-ssd weight 0.000
}
host pvestorage1-platter {
 id -4 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
}
host pvestorage2-platter {
 id -5 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
}
root platter {
 id -6 # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0 # rjenkins1
 item pvestorage1-platter weight 0.000
 item pvestorage2-platter weight 0.000
}
host pvestorage1 {
 id -7 # do not change unnecessarily
 # weight 5.740
 alg straw
 hash 0 # rjenkins

Re: [ceph-users] Suddenly having slow writes

2017-01-24 Thread Mark Nelson



On 01/24/2017 09:38 AM, Florent B wrote:

Hi everyone,

I run a Ceph Jewel cluster over 3 nodes with 3 Samsung 256GB SSDs each
(9 OSDs total).

I use it for RBD disks for my VMs.

It ran nicely for a few weeks, then suddenly the whole cluster became
extremely slow: Ceph is reporting blocked requests, and recovery is endless
("ceph -s" is not showing recovery speed, so it's slow...).

If I copy RBD images to another cluster, reading the RBD is OK and fast. But
if I try to remove an image after the copy, the slow requests are back.

Here is my Ceph status without any VM running:

root@host105:~# ceph -s
cluster 853be806-f101-45ec-9926-73df7e159838
 health HEALTH_WARN
noout,require_jewel_osds flag(s) set
 monmap e9: 3 mons at
{3=10.111.0.105:6789/0,4=10.111.0.106:6789/0,5=10.111.0.107:6789/0}
election epoch 3770, quorum 0,1,2 3,4,5
  fsmap e1103917: 1/1/1 up {0=host107=up:active}, 2 up:standby
 osdmap e1113536: 9 osds: 9 up, 9 in
flags noout,require_jewel_osds
  pgmap v36244619: 168 pgs, 6 pools, 467 GB data, 120 kobjects
743 GB used, 1192 GB / 1935 GB avail
 168 active+clean


If I start a single VM, it's OK, no slow request. But if I heavily write
inside it, slow requests are back.

If I look in ceph.log, I can see that the slow requests are due to
"currently waiting for subops from", and never from the same OSD.


2017-01-24 16:29:51.576115 osd.7 10.111.0.105:6807/2675 74 : cluster
[WRN] slow request 120.096960 seconds old, received at 2017-01-24
16:27:51.479116: osd_op(client.116307465.0:6786 1.ac8f524e
rbd_data.30e6a6430d938c1.00ef [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113684) currently waiting for
subops from 9,13
2017-01-24 16:29:52.300764 osd.10 10.111.0.106:6802/18968 57 : cluster
[WRN] 1 slow requests, 1 included below; oldest blocked for > 120.311863
secs
2017-01-24 16:29:52.300766 osd.10 10.111.0.106:6802/18968 58 : cluster
[WRN] slow request 120.311863 seconds old, received at 2017-01-24
16:27:51.988838: osd_op(client.116307465.0:6787 1.e9020b2b
rbd_data.30e6a6430d938c1.00f0 [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113684) currently waiting for
subops from 8,12
2017-01-24 16:29:54.088301 osd.6 10.111.0.105:6803/2493 124 : cluster
[WRN] 3 slow requests, 2 included below; oldest blocked for > 120.298857
secs
2017-01-24 16:29:54.088304 osd.6 10.111.0.105:6803/2493 125 : cluster
[WRN] slow request 120.298857 seconds old, received at 2017-01-24
16:27:53.789391: osd_op(client.116307465.0:6793 1.454a3d14
rbd_data.30e6a6430d938c1.00f5 [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113686) currently waiting for
subops from 10,12
2017-01-24 16:29:54.088305 osd.6 10.111.0.105:6803/2493 126 : cluster
[WRN] slow request 120.196257 seconds old, received at 2017-01-24
16:27:53.891992: osd_op(client.116307465.0:6794 1.f585dbe2
rbd_data.30e6a6430d938c1.00f6 [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113686) currently waiting for
subops from 10,12
2017-01-24 16:29:57.088860 osd.6 10.111.0.105:6803/2493 127 : cluster
[WRN] 3 slow requests, 1 included below; oldest blocked for > 123.299409
secs
2017-01-24 16:29:57.088863 osd.6 10.111.0.105:6803/2493 128 : cluster
[WRN] slow request 120.554780 seconds old, received at 2017-01-24
16:27:56.534020: osd_op(client.116307465.0:6798 1.b5e4e0cf
rbd_data.30e6a6430d938c1.00f8 [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113688) currently waiting for
subops from 9,12
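
One quick way to see whether the blocked subops keep pointing at particular
OSDs is to tally the warnings (rough sketch, assuming the default cluster
log path on a mon):

# count how often each OSD is named as the target of a blocked subop
grep -o 'waiting for subops from [0-9,]*' /var/log/ceph/ceph.log \
| awk '{print $NF}' | tr ',' '\n' | sort | uniq -c | sort -rn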


Sometimes I can dump operations from an OSD, for example:

root@host106:~# ceph daemon osd.10 dump_ops_in_flight
{
"ops": [
{
"description": "osd_op(client.116307465.0:6787 1.e9020b2b
rbd_data.30e6a6430d938c1.00f0 [set-alloc-hint object_size
4194304 write_size 4194304,writefull 0~4194304] snapc 0=[]
ack+ondisk+write+known_if_redirected e1113684)",
"initiated_at": "2017-01-24 16:27:51.988838",
"age": 126.079344,
"duration": 126.079362,
"type_data": [
"waiting for sub ops",
{
"client": "client.116307465",
"tid": 6787
},
[
{
"time": "2017-01-24 16:27:51.988838",
"event": "initiated"
},
{
"time": "2017-01-24 16:27:51.990699",
"event": "queued_for_pg"
},
{
"time": "2017-01-24 16:27:51.990761",
  

[ceph-users] Replacing an mds server

2017-01-24 Thread Jorge Garcia
I have been using a ceph-mds server that has low memory. I want to 
replace it with a new system that has a lot more memory. How does one go 
about replacing the ceph-mds server? I looked at the documentation, 
figuring I could remove the current metadata server and add the new one, 
but the remove metadata server section just says "Coming soon...". The 
same page also has a warning about running multiple metadata servers. So 
am I stuck?


Thanks!

Jorge


Re: [ceph-users] Replacing an mds server

2017-01-24 Thread Alex Evonosky
just my own experience on this---

I have two MDS servers running (since I run cephFS).  I have the config
dictating both MDS servers in the ceph.conf file.

When I issue a "ceph -s"  I see the following:

 1/1/1 up {0=alpha=up:active}, 1 up:standby


I have shut one MDS server down (current active) and the alternate server
becomes active.


This is just an FYI, but others may have a better answer...
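
A minimal sketch of what the ceph.conf entries for two MDS daemons can look
like (host names are placeholders; the standby settings are optional):

[mds.alpha]
host = alpha

[mds.beta]
host = beta
# optional: follow the active MDS's journal so failover is faster
mds standby replay = true
mds standby for rank = 0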




On Tue, Jan 24, 2017 at 2:56 PM, Jorge Garcia  wrote:

> I have been using a ceph-mds server that has low memory. I want to replace
> it with a new system that has a lot more memory. How does one go about
> replacing the ceph-mds server? I looked at the documentation, figuring I
> could remove the current metadata server and add the new one, but the
> remove metadata server section just says "Coming soon...". The same page
> also has a warning about running multiple metadata servers. So am I stuck?
>
> Thanks!
>
> Jorge
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Replacing an mds server

2017-01-24 Thread Goncalo Borges
Hi Jorge
Indeed, my advice is to configure your high-memory MDS as a standby MDS. Once
you restart the service on the low-memory MDS, the standby one should take over
without downtime and the first one becomes the standby.
Cheers
Goncalo

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Alex Evonosky 
[alex.evono...@gmail.com]
Sent: 25 January 2017 07:00
To: Jorge Garcia
Cc: Users, Ceph
Subject: Re: [ceph-users] Replacing an mds server

just my own experience on this---

I have two MDS servers running (since I run cephFS).  I have the config 
dictating both MDS servers in the ceph.conf file.

When I issue a "ceph -s"  I see the following:

 1/1/1 up {0=alpha=up:active}, 1 up:standby


I have shut one MDS server down (current active) and the alternate server 
becomes active.


This is just an FYI, but others may have a better answer...




On Tue, Jan 24, 2017 at 2:56 PM, Jorge Garcia  wrote:
I have been using a ceph-mds server that has low memory. I want to replace it 
with a new system that has a lot more memory. How does one go about replacing 
the ceph-mds server? I looked at the documentation, figuring I could remove the 
current metadata server and add the new one, but the remove metadata server 
section just says "Coming soon...". The same page also has a warning about 
running multiple metadata servers. So am I stuck?

Thanks!

Jorge



[ceph-users] Objects Stuck Degraded

2017-01-24 Thread Richard Bade
Hi Everyone,
I've got a strange one. After doing a reweight of some OSDs the other
night, our cluster is showing 1 pg stuck unclean.

2017-01-25 09:48:41 : 1 pgs stuck unclean | recovery 140/71532872
objects degraded (0.000%) | recovery 2553/71532872 objects misplaced
(0.004%)

When I query the pg, it shows that one of the OSDs is not up.

"state": "active+remapped",
"snap_trimq": "[]",
"epoch": 231928,
"up": [
155
],
"acting": [
155,
105
],
"actingbackfill": [
"105",
"155"
],

I've tried restarting the OSDs, ceph pg repair, ceph pg 4.559
list_missing, and ceph pg 4.559 mark_unfound_lost revert.
Nothing works.
I've just tried setting osd.105 out, waiting for backfill to evacuate
the OSD, and stopping the osd process to see if it would recreate the 2nd
copy of the data, but no luck.
It would seem that the primary copy of the data on osd.155 is fine, but
the 2nd copy on osd.105 isn't there.

Any ideas how I can force rebuilding the 2nd copy? Or any other ideas
to resolve this?

We're running Hammer
ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)

Regards,
Richard


Re: [ceph-users] Objects Stuck Degraded

2017-01-24 Thread Mehmet
Perhaps a deep scrub will surface a scrub error, which you can then try to fix
with ceph pg repair?

Btw. it seems that you use 2 replicas, which is not recommended except for dev
environments.
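
E.g., with the pg id from your mail (untested, just the usual sequence):

ceph pg deep-scrub 4.559   # force a deep scrub of the stuck pg
ceph -w                    # watch the cluster log for scrub / inconsistency messages
ceph pg repair 4.559       # if inconsistencies are reported, try the repair again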

On 24 January 2017 at 22:58:14 CET, Richard Bade wrote:
>Hi Everyone,
>I've got a strange one. After doing a reweight of some osd's the other
>night our cluster is showing 1pg stuck unclean.
>
>2017-01-25 09:48:41 : 1 pgs stuck unclean | recovery 140/71532872
>objects degraded (0.000%) | recovery 2553/71532872 objects misplaced
>(0.004%)
>
>When I query the pg it shows one of the osd's is not up.
>
>"state": "active+remapped",
>"snap_trimq": "[]",
>"epoch": 231928,
>"up": [
>155
>],
>"acting": [
>155,
>105
>],
>"actingbackfill": [
>"105",
>"155"
>],
>
>I've tried restarting the osd's, ceph pg repair, ceph pg 4.559
>list_missing, ceph pg 4.559 mark_unfound_lost revert.
>Nothing works.
>I've just tried setting osd.105 out, waiting for backfill to evacuate
>the osd and stopping the osd process to see if it'll recreate the 2nd
>set of data but no luck.
>It would seem that the primary copy of the data on osd.155 is fine but
>the 2nd copy on osd.105 isn't there.
>
>Any ideas how I can force rebuilding the 2nd copy? Or any other ideas
>to resolve this?
>
>We're running Hammer
>ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)
>
>Regards,
>Richard
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph mon unable to reach quorum

2017-01-24 Thread lee_yiu_ch...@yahoo.com



On 18/1/2017 11:17, lee_yiu_ch...@yahoo.com wrote:

Dear all,

I have a ceph installation (dev site) with two nodes, each running a mon daemon
and an osd daemon.
(Yes, I know running a cluster of two mons is bad, but I have no choice since I
only have two nodes.)

Now the two nodes have been migrated to another datacenter, but after they were
booted up the mon daemons are
unable to reach quorum. How can I proceed? (If there is no way to recover, I
can accept the loss, but
I wish to know how to avoid this happening again.)

Here is the mon_status output of the two nodes:

---

root@openstack003:/var/log/ceph# ceph daemon mon.openstack003 mon_status
{
"name": "openstack003",
"rank": 0,
"state": "electing",
"election_epoch": 45,
"quorum": [],
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [
754974721,
"mon.1 10.41.41.4:6789\/0",
"2017-01-18 02:53:49.425917",
5654786,
","
],
"monmap": {
"epoch": 10,
"fsid": "71861477-db77-4fab-a8f8-10d3b16e1722",
"modified": "2016-10-19 06:41:29.202924",
"created": "2016-10-19 06:26:24.911408",
"mons": [
{
"rank": 0,
"name": "openstack003",
"addr": "10.41.41.3:6789\/0"
},
{
"rank": 1,
"name": "openstack004",
"addr": "10.41.41.4:6789\/0"
}
]
}
}


root@openstack004:/var/log/ceph# ceph daemon mon.openstack004 mon_status
{
"name": "openstack004",
"rank": 1,
"state": "probing",
"election_epoch": 0,
"quorum": [],
"outside_quorum": [
"openstack004"
],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 10,
"fsid": "71861477-db77-4fab-a8f8-10d3b16e1722",
"modified": "2016-10-19 06:41:29.202924",
"created": "2016-10-19 06:26:24.911408",
"mons": [
{
"rank": 0,
"name": "openstack003",
"addr": "10.41.41.3:6789\/0"
},
{
"rank": 1,
"name": "openstack004",
"addr": "10.41.41.4:6789\/0"
}
]
}
}




Here are the logs of the two mons:

2017-01-18 03:15:04.675296 7fc892173700  5 
mon.openstack003@0(electing).elector(45) start -- can i
be leader?
2017-01-18 03:15:04.675355 7fc892173700  1 
mon.openstack003@0(electing).elector(45) init, last seen
epoch 45
2017-01-18 03:15:04.675932 7fc892173700  1 -- 10.41.41.3:6789/0 --> mon.1 
10.41.41.4:6789/0 --
election(71861477-db77-4fab-a8f8-10d3b16e1722 propose 45) v5 -- ?+0 
0x55b488984700
2017-01-18 03:15:05.515390 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 
10.41.41.4:6789/0 675 
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6  
69+0+0 (72044430 0
0) 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:05.515458 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:05.515463 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name 
openstack004) v6 features
576460752032874495
2017-01-18 03:15:05.515500 7fc891972700  1 -- 10.41.41.3:6789/0 --> 
10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( 
fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf340 con 0x55b487666900
2017-01-18 03:15:07.515552 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 
10.41.41.4:6789/0 676 
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6  
69+0+0 (72044430 0
0) 0x55b4889bf8c0 con 0x55b487666900
2017-01-18 03:15:07.515620 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:07.515625 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe_probe mon.1
10.41.41.4:6789/0mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name 
openstack004) v6 features
576460752032874495
2017-01-18 03:15:07.515652 7fc891972700  1 -- 10.41.41.3:6789/0 --> 
10.41.41.4:6789/0 --
mon_probe(reply 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack003 paxos( 
fc 5654529 lc 5655078
)) v6 -- ?+0 0x55b4889bf600 con 0x55b487666900
2017-01-18 03:15:09.515709 7fc891972700  1 -- 10.41.41.3:6789/0 <== mon.1 
10.41.41.4:6789/0 677 
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6  
69+0+0 (72044430 0
0) 0x55b4889bfb80 con 0x55b487666900
2017-01-18 03:15:09.515777 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe
mon_probe(probe 71861477-db77-4fab-a8f8-10d3b16e1722 name openstack004) v6
2017-01-18 03:15:09.515782 7fc891972700 10 mon.openstack003@0(electing) e10 
handle_probe_probe mon

Re: [ceph-users] Replacing an mds server

2017-01-24 Thread Wido den Hollander

> On 24 January 2017 at 22:08, Goncalo Borges wrote:
> 
> 
> Hi Jorge
> Indeed my advice is to configure your high memory mds as a standby mds. Once 
> you restart the service in the low memory mds, the standby one should take 
> over without downtime and the first one becomes the standby one.

Yes, you can clear the old one afterwards.

Don't forget to remove the cephx key with 'ceph auth del' of that MDS 
afterwards.
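
Roughly, with "oldhost" as a placeholder for the low-memory MDS id:

# with the new, big-memory MDS already running as a standby:
systemctl stop ceph-mds@oldhost   # stop the old active MDS (or use the sysvinit script)
ceph mds stat                     # confirm the standby has taken over as active
ceph auth del mds.oldhost         # then drop the old daemon's cephx key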

Wido

> Cheers
> Goncalo
> 
> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Alex 
> Evonosky [alex.evono...@gmail.com]
> Sent: 25 January 2017 07:00
> To: Jorge Garcia
> Cc: Users, Ceph
> Subject: Re: [ceph-users] Replacing an mds server
> 
> just my own experience on this---
> 
> I have two MDS servers running (since I run cephFS).  I have the config 
> dictating both MDS servers in the ceph.conf file.
> 
> When I issue a "ceph -s"  I see the following:
> 
>  1/1/1 up {0=alpha=up:active}, 1 up:standby
> 
> 
> I have shut one MDS server down (current active) and the alternate server 
> becomes active.
> 
> 
> This is just an FYI, but others may have a better answer...
> 
> 
> 
> 
> On Tue, Jan 24, 2017 at 2:56 PM, Jorge Garcia  wrote:
> I have been using a ceph-mds server that has low memory. I want to replace it 
> with a new system that has a lot more memory. How does one go about replacing 
> the ceph-mds server? I looked at the documentation, figuring I could remove 
> the current metadata server and add the new one, but the remove metadata 
> server section just says "Coming soon...". The same page also has a warning 
> about running multiple metadata servers. So am I stuck?
> 
> Thanks!
> 
> Jorge
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com