Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Jeff Epstein

Re: [ceph-users] OSDs are down, don't know why

2016-01-18 Thread Jeff Epstein

[ceph-users] OSDs are down, don't know why

2016-01-15 Thread Jeff Epstein
Hello, I'm setting up a small test instance of ceph and I'm running into a situation where the OSDs are shown as down, but I don't know why. Connectivity seems to be working: the OSD hosts are able to communicate with the MON hosts, and running "ceph status" and "ceph osd in" from an OSD host …
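For a report like this, a minimal sketch of the usual first checks, run from a MON host and from the affected OSD host; the ports and daemon name are defaults, not details taken from the thread:

    # From a MON host: what does the cluster map say?
    $ ceph osd tree
    $ ceph osd stat

    # From the OSD host: is the daemon actually running and listening?
    $ ps aux | grep '[c]eph-osd'
    $ netstat -tlnp | grep ceph-osd    # OSDs bind ports in the 6800-7300 range by default

    # Can the OSD host reach the monitors at all?
    $ ceph -s --connect-timeout 10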

Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein
On 09/25/2015 02:28 PM, Jan Schermer wrote: What about /sys/block/krbdX/holders? Nothing in there? There is no /sys/block/krbd450, but there is /sys/block/rbd450. In our case, /sys/block/rbd450/holders is empty. Jeff
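For reference, the sysfs checks discussed above, using the rbd450 device from the thread:

    $ ls /sys/block/rbd450/holders/    # empty here, so no dm/md device is stacked on top
    $ cat /sys/block/rbd450/dev        # prints major:minor, handy for cross-referencing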

Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein
On 09/25/2015 12:53 PM, Jan Schermer wrote: What are you looking for in lsof? Did you try looking for the major/minor number of the rbd device? Things that could hold the device are devicemapper, lvm, swraid and possibly many more; not sure if all that shows in lsof output... I searched for the …
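A sketch of that search, assuming the same rbd450 device; the mountinfo grep is an extra step not mentioned in the thread, useful when a mount in another namespace is the hidden holder:

    $ stat -c '%t:%T' /dev/rbd450                     # hex major:minor of the device
    $ lsof /dev/rbd450                                # userspace open handles, if any
    $ grep -l rbd450 /proc/*/mountinfo 2>/dev/null    # mounts held in other namespaces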

Re: [ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein
On 09/25/2015 12:38 PM, Ilya Dryomov wrote: On Fri, Sep 25, 2015 at 7:17 PM, Jeff Epstein wrote: We occasionally have a situation where we are unable to unmap an rbd. This occurs intermittently, with no obvious cause. For the most part, rbds can be unmapped fine, but sometimes we get this …

[ceph-users] occasional failure to unmap rbd

2015-09-25 Thread Jeff Epstein
We occasionally have a situation where we are unable to unmap an rbd. This occurs intermittently, with no obvious cause. For the most part, rbds can be unmapped fine, but sometimes we get this: # rbd unmap /dev/rbd450 rbd: sysfs write failed rbd: unmap failed: (16) Device or resource busy …
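Before retrying the unmap, the usual suspects can be checked in order. This is a generic checklist rather than the resolution from the thread, and the device path is the one quoted above:

    $ grep rbd450 /proc/mounts        # still mounted somewhere?
    $ fuser -vm /dev/rbd450           # processes using a mount on it
    $ dmsetup deps 2>/dev/null        # any devicemapper target stacked on the rbd?
    $ rbd unmap /dev/rbd450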

[ceph-users] maximum number of mapped rbds?

2015-09-03 Thread Jeff Epstein
Hello, In response to an rbd map command, we are getting a "Device or resource busy". $ rbd -p platform map ceph:pzejrbegg54hi-stage-4ac9303161243dc71c75--php rbd: sysfs write failed rbd: map failed: (16) Device or resource busy We currently have over 200 rbds mapped on a single host. Can …
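One plausible culprit at that scale, sketched below: without the rbd module's single_major option (available since kernel 3.14), each mapping consumes its own device major, which caps mappings at a few hundred per host. Whether that applies here depends on the kernel version, which the message does not state:

    $ rbd showmapped | tail -n +2 | wc -l            # how many rbds are mapped right now
    $ cat /sys/module/rbd/parameters/single_major    # Y lifts the one-major-per-device cap
    # if N, reload with: modprobe rbd single_major=Y   (disruptive; unmap everything first)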

Re: [ceph-users] long blocking with writes on rbds

2015-05-06 Thread Jeff Epstein
… is now normal. Odd that no one here suggested this fix, and all the messing about with various topologies, placement groups, and so on, was for naught. Jeff On 04/09/2015 11:25 PM, Jeff Epstein wrote: As a follow-up to this issue, I'd like to point out some other things I've noticed …

Re: [ceph-users] long blocking with writes on rbds

2015-04-24 Thread Jeff Epstein
Do you see any unusual message in a « ceph -w » output? Pastebin and we’ll see if we can spot something. As for the too few PGs, once we’ve found the root cause of why it’s slow, you’ll be able to adjust and increase the number of PGs per pool. Cheers JC On 9 Apr 2015, at 20:25, Jeff Epstein wrote: …
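The PG adjustment JC alludes to would look roughly like this once the root cause is settled; the pool name and target count are placeholders, not values from the thread:

    $ ceph -w                                   # leave running while reproducing the stall
    $ ceph osd pool set <pool> pg_num 256
    $ ceph osd pool set <pool> pgp_num 256      # after the pg_num increase settles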

Re: [ceph-users] long blocking with writes on rbds

2015-04-23 Thread Jeff Epstein
… 192.168.128.4:6800 socket closed (con state OPEN) Jeff On 04/23/2015 12:26 AM, Jeff Epstein wrote: Do you have some idea how I can diagnose this problem? I'll look at ceph -s output while you get these stuck processes to see if there's any unusual activity (scrub/deep scrub/recovery/backfill …

Re: [ceph-users] long blocking with writes on rbds

2015-04-22 Thread Jeff Epstein
Do you have some idea how I can diagnose this problem? I'll look at ceph -s output while you get these stuck processes to see if there's any unusual activity (scrub/deep scrub/recovery/backfills/...). Is it correlated in any way with rbd removal (i.e., write blocking doesn't appear unless you remove …
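A simple way to run that experiment, assuming shell access to a client and a MON host; the watch interval and the pool/image names are illustrative:

    # terminal 1: cluster activity while the writes are blocked
    $ watch -n 5 'ceph -s'
    # terminal 2: timestamp an rbd removal to correlate with any recovery/backfill burst
    $ date; rbd rm <pool>/<image>; date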

Re: [ceph-users] long blocking with writes on rbds

2015-04-22 Thread Jeff Epstein
On 04/10/2015 10:10 AM, Lionel Bouton wrote: On 04/10/15 15:41, Jeff Epstein wrote: [...] This seems highly unlikely. We get very good performance without ceph. Requisitioning and manipulating block devices through LVM happens instantaneously. We expect that ceph will be a bit slower by …

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-22 Thread Jeff Epstein
Hi Christian, This sounds like the same problem we are having. We get long wait times on ceph nodes, with certain commands (in our case, mainly mkfs) blocking for long periods of time, stuck in a wait (and not read or write) state. We get the same warning messages in syslog as well. Jeff …
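The wait state described here can be confirmed from the affected host; a minimal sketch, where the hung-task grep corresponds to the syslog warnings mentioned and <pid> is a placeholder:

    $ ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'    # uninterruptible-sleep (D) processes
    $ dmesg | grep 'blocked for more than'             # kernel hung-task warnings
    $ cat /proc/<pid>/stack                            # where the process is sleeping (root)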

Re: [ceph-users] long blocking with writes on rbds

2015-04-09 Thread Jeff Epstein
On 04/09/2015 03:14 AM, Christian Balzer wrote: Your 6 OSDs are on a single VM from what I gather? Aside from being a very small number for something that you seem to be using in some sort of production environment (Ceph gets faster the more OSDs you add), where is the redundancy, HA in that?

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Our workload involves creating and destroying a lot of pools. Each pool has 100 pgs, so it adds up. Could this be causing the problem? What would you suggest instead? ...this is most likely the cause. Deleting a pool causes the data and pgs associated with it to be deleted asynchronously, which …
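To see how much PG churn that create/destroy cycle generates, the totals can be read from the OSD map; the commands are standard, the interpretation is an assumption on my part:

    $ ceph osd dump | grep pg_num    # pg_num per pool; every deletion tears these down again
    $ ceph -s                        # the pgmap line shows the cluster-wide PG total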

Re: [ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, thanks for answering. Here are the answers to your questions; hopefully they will be helpful. On 04/08/2015 12:36 PM, Lionel Bouton wrote: I probably won't be able to help much, but people knowing more will need at least: - your Ceph version, - the kernel version of the host on which you are …
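Gathering the items Lionel asks for amounts to a few commands on the relevant hosts; a sketch:

    $ ceph -v          # Ceph version
    $ uname -r         # kernel of the host mapping the rbds
    $ ceph osd tree    # cluster layout, for the follow-up questions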

[ceph-users] long blocking with writes on rbds

2015-04-08 Thread Jeff Epstein
Hi, I'm having sporadic very poor performance running ceph. Right now mkfs, even with nodiscard, takes 30 minutes or more. These kinds of delays happen often but irregularly. There seems to be no common denominator. Clearly, however, they make it impossible to deploy ceph in production. I report …
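For anyone reproducing this, timing the mkfs with the discard pass explicitly disabled isolates the rbd write path from TRIM; the device path is illustrative:

    $ time mkfs.ext4 -E nodiscard /dev/rbd450
    # compare against a local device of similar size to bound the rbd overhead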