Re: [ceph-users] Potential OSD deadlock?

2015-10-16 Thread Max A. Krasilnikov
Hello! On Fri, Oct 09, 2015 at 01:45:42PM +0200, jan wrote: > Have you tried running iperf between the nodes? Capturing a pcap of the > (failing) Ceph comms from both sides could help narrow it down. > Is there any SDN layer involved that could add overhead/padding to the frames? > What about s
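
Jan's suggestions amount to standard network triage; a minimal sketch of that kind of check, assuming two OSD hosts and the default Ceph ports (hostnames and the interface name are illustrative, not from the thread):

    # on one node, start an iperf server; on the other, measure throughput to it
    iperf -s                       # on osd-node-1
    iperf -c osd-node-1            # on osd-node-2
    # probe for MTU/fragmentation problems with non-fragmenting pings near the frame size
    ping -M do -s 1472 osd-node-1  # for MTU 1500; use -s 8972 on a 9000-byte MTU
    # capture the Ceph traffic on both sides so the two pcaps can be compared
    tcpdump -i eth0 -w ceph-side-a.pcap 'port 6789 or portrange 6800-7300'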

[ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Hi Everyone, I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today we've had one monitor crash twice and another one once. We have 3 monitors total and have been running Firefly 0.80.10 for quite some time without any monitor issues. When the monitor crashes it leaves a core file a

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
Hi, Is there a backtrace in /var/log/ceph/ceph-mon.*.log ? Cheers, Dan On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade wrote: > Hi Everyone, > I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today we've > had one monitor crash twice and another one once. We have 3 monitors total >

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Thanks for your quick response Dan, but no. All the ceph-mon.*.log files are empty. I did track this down in syslog though, in case it helps: ceph-mon: 2015-10-16 21:25:00.117115 7f4c9f458700 -1 *** Caught signal (Segmentation fault) **#012 in thread 7f4c9f458700#012#012 ceph version 0.94.3 (95cefe
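
If the mon log itself is empty, the rest of the dump can sometimes be recovered from syslog (the #012 sequences are newlines escaped by syslog) or from the core file the monitor leaves behind; a rough sketch, with paths assumed rather than taken from the thread:

    # expand the escaped newlines to read the embedded stack trace
    grep 'Caught signal' /var/log/syslog | sed 's/#012/\n/g'
    # or pull a backtrace out of the core file (core location varies by distro)
    gdb /usr/bin/ceph-mon /path/to/core -ex bt -ex quit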

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Dan van der Ster
Hmm, that's strange. I didn't see anything in the tracker that looks related. Hopefully an expert can chime in... Cheers, Dan On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade wrote: > Thanks for your quick response Dan, but no. All the ceph-mon.*.log files are > empty. > I did track this down in sy

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Sage Weil
This doesn't look familiar. Are you able to enable a higher log level so that if it happens again we'll have more info? debug mon = 20 debug ms = 1 Thanks! sage On Fri, 16 Oct 2015, Dan van der Ster wrote: > Hmm, that's strange. I didn't see anything in the tracker that looks > related. Hopef
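
For reference, those two settings can also be made persistent in ceph.conf on the monitor hosts, along these lines (Richard's injectargs commands further down apply the same thing at runtime without a restart):

    [mon]
        debug mon = 20
        debug ms = 1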

[ceph-users] Error after upgrading to Infernalis

2015-10-16 Thread German Anders
Hi all, I'm trying to upgrade a ceph cluster (prev hammer release 0.94.3) to the latest release of *infernalis* (9.1.0-61-gf2b9f89). So far so good while upgrading the mon servers; all worked fine. But then, when trying to upgrade the OSD servers, I got an error while trying to start the osd services ag

Re: [ceph-users] Error after upgrading to Infernalis

2015-10-16 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 You need to make sure that you go through 0.94.4 (the not-yet-released version) before the OSDs will boot in the latest Infernalis. You can get the packages from gitbuilder.ceph.com in the Hammer branch. Install the packages (downgrade), and start up
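
A small sketch of how one might confirm what each daemon is actually running before and after the interim 0.94.4 step (the monitor name is assumed to match the short hostname, which is common but not guaranteed):

    ceph --version                            # version of the locally installed packages
    ceph tell osd.* version                   # version each running OSD reports
    ceph daemon mon.$(hostname -s) version    # version of the local monitor, via its admin socket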

Re: [ceph-users] Cache Tiering Question

2015-10-16 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 OK, I've set this up and now all I/O is locked up. I've reduced target_max_bytes because one OSD was reporting 97% usage, there was some I/O for a few seconds as things flushed, but client I/O is still blocked. Anyone have some thoughts? ceph osd cr
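
For context, the cache-tier sizing knobs being discussed are ordinary pool properties; a sketch assuming a cache pool named 'cache' (the name and the values are illustrative, not Robert's actual settings):

    # cap the cache pool at ~200 GB and start flushing/evicting well before it fills
    ceph osd pool set cache target_max_bytes 214748364800
    ceph osd pool set cache cache_target_dirty_ratio 0.4   # begin flushing dirty objects at 40%
    ceph osd pool set cache cache_target_full_ratio 0.8    # begin evicting clean objects at 80%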

Re: [ceph-users] Cache Tiering Question

2015-10-16 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 I started another fio test to one of the same RBDs (leaving the hung ones still hung) and it is working OK, but the hung ones are still just hung. - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

Re: [ceph-users] Cache Tiering Question

2015-10-16 Thread Sage Weil
On Fri, 16 Oct 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > I started another fio test to one of the same RBDs (leaving the hung > ones still hung) and it is working OK, but the hung ones are still > just hung. There is a full-disk failsafe that is still so
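
Some ways to see how close the cache OSDs are to the relevant thresholds; a sketch only, and the failsafe option named below is an assumption about which limit Sage means (its default happens to be 0.97, matching the 97% Robert saw):

    ceph health detail                                      # lists any near-full / full OSDs
    ceph df                                                 # cluster-wide and per-pool usage
    ceph osd df                                             # per-OSD usage (Hammer and later)
    ceph daemon osd.0 config get osd_failsafe_full_ratio    # OSD-side failsafe threshold (osd id illustrative)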

Re: [ceph-users] Cache Tiering Question

2015-10-16 Thread Robert LeBlanc
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Is the only option to restart the librbd client in this case? Anything I can do to help resolve it? Thanks, - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Oct 16, 2015 at 10:17 AM, Sage

Re: [ceph-users] Cache Tiering Question

2015-10-16 Thread Sage Weil
On Fri, 16 Oct 2015, Robert LeBlanc wrote: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA256 > > Is the only option to restart the librbd client in this case? Anything > I can do to help resolve it? If you know which OSD the request is outstanding against (ceph daemon objecter_requests) you c
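
A sketch of what that looks like from the client side, assuming the fio/librbd client has an admin socket configured and using an illustrative socket path and OSD id:

    # list the client's in-flight requests and which OSD each is waiting on
    ceph daemon /var/run/ceph/ceph-client.admin.12345.asok objecter_requests
    # if an op is stuck against a particular OSD, restarting that OSD is one way to get it resent
    sudo restart ceph-osd id=3        # Upstart (Ubuntu of that era); sysvinit: service ceph restart osd.3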

Re: [ceph-users] ceph-mon crash after update to Hammer 0.94.3 from Firefly 0.80.10

2015-10-16 Thread Richard Bade
Ok, debugging increased: 'ceph tell mon.[abc] injectargs --debug-mon 20' and 'ceph tell mon.[abc] injectargs --debug-ms 1'. Regards, Richard On 17 October 2015 at 01:38, Sage Weil wrote: > This doesn't look familiar. Are you able to enable a higher log level so > that if it happens again we'll have more

[ceph-users] CephFS and page cache

2015-10-16 Thread Burkhard Linke
Hi, I've noticed that CephFS (both ceph-fuse and the kernel client in version 4.2.3) removes files from the page cache as soon as they are no longer in use by any process. Is this intended behaviour? We use CephFS as a replacement for NFS in our HPC cluster. It should serve large files which are read
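
One way to observe the behaviour Burkhard describes is to check a file's page-cache residency after the reading process has exited; a sketch assuming the vmtouch utility is installed and an illustrative path on the CephFS mount:

    # read a large file once, then see how much of it is still resident in the page cache
    cat /mnt/cephfs/bigfile > /dev/null
    vmtouch -v /mnt/cephfs/bigfile    # on a local fs this typically stays near 100%; the report here is that CephFS drops it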

Re: [ceph-users] v9.1.0 Infernalis release candidate released

2015-10-16 Thread Alfredo Deza
The Trusty packages have just been pushed out and should be ready to use right away. We didn't realize this until today, sorry! -Alfredo On Wed, Oct 14, 2015 at 9:24 PM, Sage Weil wrote: > On Thu, 15 Oct 2015, Francois Lafont wrote: > >> Sorry, another remark. >> >> On 13/10/2015 23:01, Sage Weil wrote: >

[ceph-users] qemu-img error connecting

2015-10-16 Thread wikison
After I set up the Ceph cluster, I tried to create a block device image from QEMU, but I got this: $ qemu-img create -f raw rbd:rbd/test 20G Formatting 'rbd:rbd/test', fmt=raw size=21474836480 qemu-img: rbd:rbd/test: error connecting There is a pool named rbd, and the output of ceph -s is:
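
A couple of ways to narrow down the "error connecting", assuming cephx is enabled and the client keyring is in the usual place (the user name and paths are illustrative):

    # confirm the rbd CLI can reach the cluster with the same credentials
    rbd -p rbd ls
    # be explicit about the user and conf file in the rbd: URI, since qemu-img
    # does not necessarily pick up the same defaults as the CLI
    qemu-img create -f raw rbd:rbd/test:id=admin:conf=/etc/ceph/ceph.conf 20G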