[ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-25 Thread Jan Schermer
Hi, I have a full-SSD cluster on my hands, currently running Dumpling, with plans to upgrade soon, and OpenStack with RBD on top of that. While I am overall quite happy with the performance (scales well across clients), there is one area where it really fails badly - big database workloads. ...
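
One client-side knob relevant to this thread is the librbd writeback cache: it can coalesce small writes between flushes, though it cannot hide the per-flush (barrier/fsync) latency that hurts databases. A minimal ceph.conf sketch for the client side, with illustrative values only:

    [client]
    # enable the librbd writeback cache (applies to librbd/qemu clients, not krbd)
    rbd cache = true
    # stay in writethrough mode until the guest issues its first flush,
    # so guests that never flush are not silently exposed to data loss
    rbd cache writethrough until flush = true
    # cache sizing - example numbers, not recommendations
    rbd cache size = 67108864          # 64 MB
    rbd cache max dirty = 50331648     # 48 MB

Whether this helps at all depends on how often the database flushes; for strictly synchronous workloads the per-write journal latency still dominates.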

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-25 Thread Yan, Zheng
The kernel client bug should be fixed by https://github.com/ceph/ceph-client/commit/72f22efb658e6f9e126b2b0fcb065f66ffd02239

[ceph-users] Replacing OSD disks with SSD journal - journal disk space use

2015-05-25 Thread Eneko Lacunza
Hi all, We have a firefly ceph cluster (using Proxmox VE, but I don't think this is relevant), and found an OSD disk was having quite a high number of errors as reported by SMART, and also quite high wait times as reported by munin, so we decided to replace it. What I have done is down/out the ...
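
The usual replacement sequence on a firefly-era cluster looks roughly like the sketch below; osd.12 and the device names are placeholders, and the service command varies by init system:

    # mark the failing OSD out and let the cluster rebalance away from it
    ceph osd out 12
    # watch recovery; proceed once the cluster is back to HEALTH_OK
    ceph -w
    # stop the daemon (upstart syntax shown; use 'service ceph stop osd.12' on sysvinit)
    stop ceph-osd id=12
    # remove it from the CRUSH map, delete its auth key, and remove the OSD entry
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # re-create the OSD on the new disk, reusing a journal partition on the SSD
    ceph-disk prepare /dev/sdX /dev/sdY5    # hypothetical data and journal devices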

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-25 Thread Nick Fisk
Hi Jan, I share your frustration with slow sync writes. I'm exporting RBDs via iSCSI to ESX, which seems to do most operations as 64k sync IOs. You can do a fio run and impress yourself with the numbers that you can get out of the cluster, but this doesn't translate into what you can achieve ...
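
To make that gap concrete, the two fio invocations below contrast a deep-queue benchmark with the queue-depth-1, flush-after-every-write pattern that sync-heavy clients generate; /dev/rbd0 is a placeholder device and both runs overwrite whatever is on it:

    # "impressive numbers" run: lots of parallel async IO
    fio --name=deep --filename=/dev/rbd0 --rw=randwrite --bs=4k \
        --ioengine=libaio --direct=1 --iodepth=64 --runtime=60 --time_based

    # what a sync-heavy client actually sees: one outstanding 64k write, fsync after each
    fio --name=single-sync --filename=/dev/rbd0 --rw=write --bs=64k \
        --ioengine=sync --fsync=1 --iodepth=1 --runtime=60 --time_based

The first result is bounded by aggregate cluster throughput; the second is bounded by the round-trip latency of a single journaled write, which is why it comes out so much lower.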

Re: [ceph-users] Synchronous writes - tuning and some thoughts about them?

2015-05-25 Thread Jan Schermer
Hi Nick, flashcache doesn't support barriers, so I haven't even considered it. I used it a few years ago to speed up some workloads out of curiosity and it worked well, but I can't use it to cache this kind of workload. EnhanceIO passed my initial testing (although the documentation is very sketchy ...

[ceph-users] ceph-users mailing list

2015-05-25 Thread heyun

[ceph-users] radosgw load/performance/crashing

2015-05-25 Thread Daniel Hoffman
Hi All. We are trying to cope with radosgw crashing every 5-15 minutes. This seems to be getting worse and worse, but we are unable to determine the cause; there is nothing in the logs, as it appears to be a radosgw hang. The port is open and accepts a connection, but there is no response to a HEAD/GET etc. ...
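
Two things that usually narrow a hang like this down (the host name and admin socket path below are placeholders): probe the gateway from the outside, and raise rgw logging on the running daemon before the next hang so the log shows where the worker threads are stuck:

    # a hung radosgw typically accepts the TCP connection but never sends headers back
    curl -v -I --max-time 10 http://rgw.example.com/

    # raise logging on the running gateway via its admin socket
    ceph daemon /var/run/ceph/ceph-client.rgw.gateway.asok config set debug_rgw 20
    ceph daemon /var/run/ceph/ceph-client.rgw.gateway.asok config set debug_ms 1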

[ceph-users] Multi-Object delete and RadosGW

2015-05-25 Thread Daniel Hoffman
Has anyone come across a problem with multi-object deletes? We have a number of systems that we think are sending big piles of POST/XML multi-object deletes. Has anyone had any experience with this locking up civetweb or apache/fastcgi threads? Are there any tunable settings we could use ...
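
On the tunables question, the knobs most often adjusted for bursts of multi-object deletes (POST /?delete) are the rgw thread pool and the front-end thread count; the section name and the values below are illustrative only, not recommendations:

    [client.radosgw.gateway]
    # default is 100; large delete batches can tie up many threads at once
    rgw thread pool size = 512
    # if using the embedded civetweb front end, raise its thread count to match
    rgw frontends = civetweb port=7480 num_threads=512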

[ceph-users] Performance and CPU load on HP servers running ceph (DL380 G6, should apply to others too)

2015-05-25 Thread Tuomas Juntunen
Hi, I wanted to share my findings from running ceph on HP servers. We had a lot of problems with CPU load, which sometimes reached even 800. We were trying to figure out why this happens even while not doing anything special. Our OSD nodes are DL380 G6s with dual quad-core CPUs and 32g ...
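
One commonly reported culprit for inflated load on DL380 G6-class hardware is the BIOS power profile and CPU frequency scaling; the checks below are offered purely as an illustration, not as the findings from this thread:

    # see what governor and frequencies the CPUs are actually running at
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cpupower frequency-info
    # pin the governor to performance for a test (requires the cpupower tool)
    cpupower frequency-set -g performance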

[ceph-users] Blocked requests/ops?

2015-05-25 Thread Xavier Serrano
Hello, We have observed that our cluster is often moving back and forth from HEALTH_OK to HEALTH_WARN states due to "blocked requests". We have also observed "blocked ops". For instance:

    # ceph status
        cluster 905a1185-b4f0-4664-b881-f0ad2d8be964
         health HEALTH_WARN 1 requests ...
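
The usual next step with these warnings is to find out which OSDs the blocked requests sit on and what they are waiting for; osd.3 below is just a placeholder:

    # lists the blocked requests and which OSDs they are on
    ceph health detail
    # on the node hosting that OSD, dump the in-flight and recent slow ops
    ceph daemon osd.3 dump_ops_in_flight
    ceph daemon osd.3 dump_historic_ops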

Re: [ceph-users] Blocked requests/ops?

2015-05-25 Thread Christian Balzer
Hello, Firstly, find my "Unexplainable slow request" thread in the ML archives and read all of it.

On Tue, 26 May 2015 07:05:36 +0200 Xavier Serrano wrote:
> Hello,
>
> We have observed that our cluster is often moving back and forth
> from HEALTH_OK to HEALTH_WARN states due to "blocked requests ...