[ceph-users] ceph admin node

2015-10-06 Thread gjprabu
Hi Team, if I lose the admin node, what is the recovery procedure for keeping the same keys? Regards, Prabu
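
A minimal sketch of such a recovery, assuming at least one monitor node survives: the authentication keys live in the cluster itself, so the admin node only holds copies that can be re-exported. The hostname is illustrative.

    # On a surviving monitor node, re-export the admin key:
    ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring
    # Copy the config and regenerated keyring to the rebuilt admin node:
    scp /etc/ceph/ceph.conf /etc/ceph/ceph.client.admin.keyring new-admin:/etc/ceph/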

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Max A. Krasilnikov
Hello! On Mon, Oct 05, 2015 at 09:35:26PM -0600, robert wrote: > With some off-list help, we have adjusted > osd_client_message_cap=1. This seems to have helped a bit and we > have seen some OSDs have a value up to 4,000 for client messages
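
The exact cap value is truncated above; a sketch of how such a cap is typically adjusted on a running cluster (the value here is illustrative only):

    # Inject the setting into all running OSDs:
    ceph tell osd.* injectargs '--osd_client_message_cap 10000'
    # Or persist it in ceph.conf under [osd]:
    #   osd client message cap = 10000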

[ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
Short: how can I reliably avoid (if possible) fs freezes when 1 of 3 MDSes rejoins? ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd) I am moving 2 VM clients from ocfs2 (which started to deadlock VMs on snapshot) to cephfs (at least I can back it up). Maybe I just didn't see it before, may

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread John Spray
On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich wrote: > Short: how can I reliably avoid (if possible) fs freezes when 1 of 3 MDSes rejoins? > > ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd) > > I am moving 2 VM clients from ocfs2 (which started to deadlock VMs on snapshot) to > cephf

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
John Spray writes: On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich wrote: Short: how can I reliably avoid (if possible) fs freezes when 1 of 3 MDSes rejoins? ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd) I am moving 2 VM clients from ocfs2 (which started to deadlock VMs on snapsho

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
PS: This is a standard 3-node cluster (MON+MDS+OSDs, the initial 3x setup) plus 1 OSD node added later. Nothing special. OSDs are balanced to near-equal size per host. Dzianis Kahanovich writes: John Spray writes: On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich wrote: Short: how can I reliably avoid (if possible) fs

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread John Spray
On Tue, Oct 6, 2015 at 12:07 PM, Dzianis Kahanovich wrote: > John Spray writes: >> >> On Tue, Oct 6, 2015 at 11:43 AM, Dzianis Kahanovich >> wrote: >>> >>> Short: how can I reliably avoid (if possible) fs freezes when 1 of 3 MDSes rejoins? >>> >>> ceph version 0.94.3-242-g79385a8 >>> (79385a85beea9bccd82c99b6

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
John Spray writes: Short: how can I reliably avoid (if possible) fs freezes when 1 of 3 MDSes rejoins? ceph version 0.94.3-242-g79385a8 (79385a85beea9bccd82c99b6bda653f0224c4fcd) I am moving 2 VM clients from ocfs2 (which started to deadlock VMs on snapshot) to cephfs (at least I can back it up). Maybe I just don't

[ceph-users] Placement rule not resolved

2015-10-06 Thread ghislain.chevalier
Hi, Context: Firefly 0.80.9, 8 storage nodes, 176 OSDs (14*8 SAS and 8*8 SSD), 3 monitors. I created an alternate crushmap in order to fulfill a tiering requirement, i.e. selecting ssd or sas. I created specific buckets "host-ssd" and "host-sas" and regrouped them in "tier-ssd" and "tier-sas" under a "tier-
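
A hypothetical crushmap fragment for a hierarchy like the one described (ids, weights, and member names are illustrative; a custom bucket type such as 'tier' must be declared in the crushmap's types section):

    # the types section would include, e.g.:  type 5 tier
    host host-ssd-1 {
            id -11
            alg straw
            hash 0
            item osd.0 weight 1.000
    }
    tier tier-ssd {
            id -21
            alg straw
            hash 0
            item host-ssd-1 weight 1.000
    }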

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread John Spray
On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich wrote: > Even now, having removed "mds standby replay = true": > e7151: 1/1/1 up {0=b=up:active}, 2 up:standby > The cluster gets stuck on a KILL of the active mds.b. How do I correctly stop an mds to get > behaviour like the MONs' leader->down/peon->leader? It's not clear to
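
For reference, a sketch of the pieces involved (hammer-era syntax; the rank number is illustrative). The setting being removed lives in ceph.conf, and a failover can be forced rather than KILLing the active daemon:

    # ceph.conf, on each MDS host:
    #   [mds]
    #   mds standby replay = true
    # Mark the active rank failed so a standby takes over:
    ceph mds fail 0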

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Mon, 5 Oct 2015, Robert LeBlanc wrote: > With some off-list help, we have adjusted > osd_client_message_cap=1. This seems to have helped a bit and we > have seen some OSDs have a value up to 4,000 for client messages. But > it does not s

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
John Spray writes: On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich wrote: Even now, having removed "mds standby replay = true": e7151: 1/1/1 up {0=b=up:active}, 2 up:standby The cluster gets stuck on a KILL of the active mds.b. How do I correctly stop an mds to get behaviour like the MONs' leader->down/peon->leader? It'

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
Sorry, I skipped some... John Spray writes: On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich wrote: Even now, having removed "mds standby replay = true": e7151: 1/1/1 up {0=b=up:active}, 2 up:standby The cluster gets stuck on a KILL of the active mds.b. How do I correctly stop an mds to get behaviour like the MONs' leader->d

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread John Spray
On Tue, Oct 6, 2015 at 2:21 PM, Dzianis Kahanovich wrote: > John Spray writes: >> >> On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich >> wrote: >>> >>> Even now, having removed "mds standby replay = true": >>> e7151: 1/1/1 up {0=b=up:active}, 2 up:standby >>> The cluster gets stuck on a KILL of the active mds.b. How to

Re: [ceph-users] Can't mount cephfs to host outside of cluster

2015-10-06 Thread Gregory Farnum
On Mon, Oct 5, 2015 at 11:21 AM, Egor Kartashov wrote: > Hello! > > I have a cluster of 3 machines with ceph 0.80.10 (the package shipped with Ubuntu > Trusty). Ceph successfully mounts on all of them. On an external machine I'm > receiving the error "can't read superblock" and dmesg shows records like: > > [1
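
A sketch of a kernel-client mount from an outside host, for comparison (addresses and paths illustrative; the monitors must be reachable from the client):

    sudo mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret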

Re: [ceph-users] Correct method to deploy on jessie

2015-10-06 Thread Gregory Farnum
On Mon, Oct 5, 2015 at 10:36 PM, Dmitry Ogorodnikov wrote: > Good day, > > I think I will use wheezy for now for tests. The bad thing is that wheezy's full > support ends in 5 months, so wheezy is not OK for a persistent production > cluster. > > I can't find out what the ceph team offers to debian users; move to ot

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
Thanks for your time, Sage. It sounds like a few people may be helped if you can find something. I did a recursive chown as in the instructions (although I didn't know about the doc at the time). I did an osd debug at 20/20 but didn't see anything. I'll also do ms and make the logs available. I'll

Re: [ceph-users] memory stats

2015-10-06 Thread Gregory Farnum
On Mon, Oct 5, 2015 at 10:40 PM, Serg M wrote: > What is the difference between the memory statistics of "ceph tell {daemon}.{id} heap > stats" Assuming you're using tcmalloc (by default you are), this will get information straight from the memory allocator about what the actual daemon memory usage is. > ,
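
For reference, the command forms under discussion (daemon and id illustrative):

    ceph tell osd.0 heap stats      # allocator-level usage straight from tcmalloc
    ceph tell osd.0 heap release    # return freed memory to the OS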

Re: [ceph-users] Correct method to deploy on jessie

2015-10-06 Thread Chad William Seys
> Most users in the apt family have deployed on Ubuntu > though, and that's what our tests run on, fyi. That is good to know - I wouldn't be surprised if the same packages could be used in Ubuntu and Debian. Especially if the release dates of the Ubuntu and Debian versions were similar. Thanks

Re: [ceph-users] Can't mount cephfs to host outside of cluster

2015-10-06 Thread Egor Kartashov
All four machines are located in different datacenters and networks. All those networks are routable to each other. The public network section of ceph.conf contains all those networks. -- Best regards, Egor Kartashov http://staff/kartvep 06.10.2015, 17:23, "Gregory Farnum": > On Mon, Oct 5, 2015

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Tue, 6 Oct 2015, Robert LeBlanc wrote: > Thanks for your time, Sage. It sounds like a few people may be helped if you > can find something. > > I did a recursive chown as in the instructions (although I didn't know about > the doc at the time). I did an osd debug at 20/20 but didn't see anything

Re: [ceph-users] Correct method to deploy on jessie

2015-10-06 Thread Alfredo Deza
On Tue, Oct 6, 2015 at 10:29 AM, Gregory Farnum wrote: > On Mon, Oct 5, 2015 at 10:36 PM, Dmitry Ogorodnikov > wrote: >> Good day, >> >> I think I will use wheezy for now for tests. Bad thing is wheezy full >> support ends in 5 months, so wheezy is not ok for persistent production >> cluster. >>

Re: [ceph-users] avoid 3-mds fs laggy on 1 rejoin?

2015-10-06 Thread Dzianis Kahanovich
John Spray writes: On Tue, Oct 6, 2015 at 2:21 PM, Dzianis Kahanovich wrote: John Spray writes: On Tue, Oct 6, 2015 at 1:22 PM, Dzianis Kahanovich wrote: Even now, having removed "mds standby replay = true": e7151: 1/1/1 up {0=b=up:active}, 2 up:standby The cluster gets stuck on a KILL of the active mds.b. How to co

[ceph-users] Poor Read Performance with Ubuntu 14.04 LTS 3.19.0-30 Kernel

2015-10-06 Thread MailingLists - EWS
I have encountered a rather interesting issue with Ubuntu 14.04 LTS running the 3.19.0-30 kernel (Vivid) and Ceph Hammer (0.94.3). With everything else in our testing cluster identical, and no change other than the kernel (apt-get install linux-image-generic-lts-vivid and then a reboot), we ar

Re: [ceph-users] Poor Read Performance with Ubuntu 14.04 LTS 3.19.0-30 Kernel

2015-10-06 Thread Mark Nelson
On 10/06/2015 10:14 AM, MailingLists - EWS wrote: I have encountered a rather interesting issue with Ubuntu 14.04 LTS running the 3.19.0-30 kernel (Vivid) and Ceph Hammer (0.94.3). With everything else in our testing cluster identical, and no change other than the kernel (apt-get install linux-

Re: [ceph-users] Poor Read Performance with Ubuntu 14.04 LTS 3.19.0-30 Kernel

2015-10-06 Thread Quentin Hartman
Could you share some of your testing methodology? I'd like to repeat your tests. I have a cluster that is currently running mostly 3.13 kernels, but the latest patch of that version breaks the onboard 1Gb NIC in the servers I'm using. I recently had to redeploy several of these servers due to SSD
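
Not the methodology asked about above (it is truncated), but a minimal before/after comparison one could run against a throwaway pool when swapping kernels:

    rados bench -p testpool 60 write --no-cleanup   # lay down objects first
    rados bench -p testpool 60 seq                  # sequential reads
    rados bench -p testpool 60 rand                 # random reads
    rados -p testpool cleanup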

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
I downgraded to the hammer gitbuilder branch, but it looks like I've passed the point of no return: 2015-10-06 09:44:52.210873 7fd3dd8b78c0 -1 ERROR: on disk data includes unsupported features: compat={},rocompat={},incompat={7=support shec erasure code} 2015-10-06 09:44:52.210922 7fd3dd8b78c0 -1

Re: [ceph-users] Placement rule not resolved

2015-10-06 Thread Robert LeBlanc
I've only done a 'step take <bucket>' where <bucket> is a root entry. I haven't tried with it being under the root. I would suspect it would work, but you can try to put your tiers in a root section and test it there. - Robert LeBlanc PGP Fingerprin
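
A sketch of the suggestion, assuming each tier is placed under its own root entry (names, ids, and weights illustrative):

    root ssd {
            id -1
            alg straw
            hash 0
            item tier-ssd weight 8.000
    }
    rule ssd_rule {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take ssd                        # 'step take' on a root entry
            step chooseleaf firstn 0 type host
            step emit
    }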

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Tue, 6 Oct 2015, Robert LeBlanc wrote: > I downgraded to the hammer gitbuilder branch, but it looks like I've > passed the point of no return: > > 2015-10-06 09:44:52.210873 7fd3dd8b78c0 -1 ERROR: on disk data > includes unsupported features: > compat={},rocompat={},incompat={7=support shec era

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Ken Dreyer
On Tue, Oct 6, 2015 at 8:38 AM, Sage Weil wrote: > Oh.. I bet you didn't upgrade the osds to 0.94.4 (or the latest hammer build) > first. They won't be allowed to boot until that happens... all upgrades > must stop at 0.94.4 first. This sounds pretty crucial. Is there a Redmine ticket (or tickets)? - Ken

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Tue, 6 Oct 2015, Ken Dreyer wrote: > On Tue, Oct 6, 2015 at 8:38 AM, Sage Weil wrote: > > Oh.. I bet you didn't upgrade the osds to 0.94.4 (or latest hammer build) > > first. They won't be allowed to boot until that happens... all upgrades > > must stop at 0.94.4 first. > > This sounds pretty
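
A quick sanity check before continuing such an upgrade, to confirm every OSD has actually reached the intermediate release (sketch):

    ceph tell osd.* version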

Re: [ceph-users] Poor Read Performance with Ubuntu 14.04 LTS 3.19.0-30 Kernel

2015-10-06 Thread MailingLists - EWS
> Hi, > > Very interesting! Did you upgrade the kernel on both the OSDs and clients or > just some of them? I remember there were some kernel performance > regressions a little while back. You might try running perf during your tests > and look for differences. Also, iperf might be worth tryin

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
This was from the monitor (I can't bring it up with Hammer now; the complete cluster is down, but this is only my lab, so no urgency). I got it up and running this way: 1. Upgraded the mon node to Infernalis and started the mon. 2. Downgraded the OSDs to to-be

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
OK, an interesting point. Running ceph version 9.0.3-2036-g4f54a0d (4f54a0dd7c4a5c8bdc788c8b7f58048b2a28b9be) looks a lot better. I got messages when the OSD was marked out: 2015-10-06 11:52:46.961040 osd.13 192.168.55.12:6800/20870 81 : cluster [WR

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Tue, 6 Oct 2015, Robert LeBlanc wrote: > OK, an interesting point. Running ceph version 9.0.3-2036-g4f54a0d > (4f54a0dd7c4a5c8bdc788c8b7f58048b2a28b9be) looks a lot better. I got > messages when the OSD was marked out: > > 2015-10-06 11:52:

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
I can't think of anything. In my dev cluster the only thing that has changed is the Ceph version (no reboot). What I like is that even though the disks are 100% utilized, it is performing as I expect now. Client I/O is slightly degraded during the recove

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Sage Weil
On Tue, 6 Oct 2015, Robert LeBlanc wrote: > I can't think of anything. In my dev cluster the only thing that has > changed is the Ceph version (no reboot). What I like is that even though > the disks are 100% utilized, it is performing as I expect

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
I'll capture another set of logs. Is there any other debugging you want turned up? I've seen the same thing where I see the message dispatched to the secondary OSD, but the message just doesn't show up for 30+ seconds in the secondary OSD logs.

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
On my second test (a much longer one), it took nearly an hour, but a few messages have popped up over a 20 window. Still far less than I have been seeing. - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62

Re: [ceph-users] Poor Read Performance with Ubuntu 14.04 LTS 3.19.0-30 Kernel

2015-10-06 Thread Nick Fisk
I'm wondering if you are hitting the "bug" with the readahead changes? I know the changes to limit readahead to 2MB were introduced in 3.15, but I don't know if they were backported into 3.13 or not. I have a feeling this may also limit the maximum request size to 2MB as well. If you look in iostat do y
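
A sketch of the checks being suggested (device name illustrative):

    cat /sys/block/rbd0/queue/read_ahead_kb   # kernel readahead for the device
    iostat -x 1 /dev/rbd0                     # avgrq-sz is in 512-byte sectors;
                                              # 4096 sectors = 2MB requests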

Re: [ceph-users] Potential OSD deadlock?

2015-10-06 Thread Robert LeBlanc
I upped the debug on about everything and ran the test for about 40 minutes. I took osd.19 on ceph1 down and then brought it back in. There was at least one op on osd.19 that was blocked for over 1,000 seconds. Hopefully this will have something that
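
A sketch of turning debugging up on one OSD at runtime (levels illustrative; this produces very large logs):

    ceph tell osd.19 injectargs '--debug_osd 20 --debug_ms 1'
    # and back to the defaults afterwards:
    ceph tell osd.19 injectargs '--debug_osd 0/5 --debug_ms 0/5'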

[ceph-users] Cache tier experiences (for ample sized caches ^o^)

2015-10-06 Thread Christian Balzer
Hello, a bit of back story first; it may prove educational for others and future generations. As some may recall, I have a firefly production cluster with a storage node design that was optimized for the use case at the time, with an estimated capacity to support 140 VMs (all running the s

[ceph-users] proxmox 4.0 release : lxc with krbd support and qemu librbd improvements

2015-10-06 Thread Alexandre DERUMIER
Hi, proxmox 4.0 has been released: http://forum.proxmox.com/threads/23780-Proxmox-VE-4-0-released! Some ceph improvements: - lxc containers with krbd support (multiple disks + snapshots) - qemu with jemalloc support (improves librbd performance) - qemu iothread option per disk (improves scaling

[ceph-users] pgs stuck inactive and unclean, too few PGs per OSD

2015-10-06 Thread wikison
Hi, I have a cluster of one monitor and eight OSDs. These OSDs are running on four hosts (each host has two OSDs). When I set up everything and started Ceph, I got this: esta@monitorOne:~$ sudo ceph -s [sudo] password for esta: cluster 0b9b05db-98fe-49e6-b12b-1cce0645c015 health HEALTH_W

Re: [ceph-users] pgs stuck inactive and unclean, too few PGs per OSD

2015-10-06 Thread Christian Balzer
Hello, On Wed, 7 Oct 2015 12:57:58 +0800 (CST) wikison wrote: This is a very old bug/misfeature that creeps up every week or so here; google is your friend. > Hi, > I have a cluster of one monitor and eight OSDs. These OSDs are running > on four hosts (each host has two OSDs). When I set up ev
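
The usual fix, sketched (pool name and counts illustrative; a common rule of thumb is roughly 100 PGs per OSD divided by the replica count, rounded to a power of two):

    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256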

Re: [ceph-users] Cache tier experiences (for ample sized caches ^o^)

2015-10-06 Thread Loic Dachary
Hi Christian, Interesting use case :-) How many OSDs/hosts do you have? And how are they connected together? Cheers On 07/10/2015 04:58, Christian Balzer wrote: > > Hello, > > a bit of back story first; it may prove educational for others and future > generations. > > As some may recall, I