Re: [ceph-users] PG's incomplete after OSD failure

2014-11-12 Thread Chad Seys
Would love to hear if you discover a way to zap incomplete PGs! Perhaps this is common enough to warrant opening a tracker issue? Chad.

Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

2014-11-11 Thread Chad Seys
Find out which OSD it is: ceph health detail. Squeeze blocks off the affected OSD: ceph osd reweight OSDNUM 0.8. Repeat with any OSD which becomes toofull. Your cluster is only about 50% used, so I think this will be enough. Then when it finishes, allow data back on the OSD: ceph osd reweight OSDN
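As a concrete sketch of that sequence (OSD id 12 and the 0.8 weight are just the example values from above; restore the weight only after backfill finishes):

    # find the toofull OSD(s)
    ceph health detail | grep -i full

    # temporarily squeeze data off the affected OSD
    ceph osd reweight 12 0.8

    # once the cluster is healthy again, allow data back
    ceph osd reweight 12 1.0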

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-11 Thread Chad Seys
Thanks Craig, I'll jiggle the OSDs around to see if that helps. Otherwise, I'm almost certain removing the pool will work. :/ Have a good one, Chad. > I had the same experience with force_create_pg too. > > I ran it, and the PGs sat there in creating state. I left the cluster > overnight, and

[ceph-users] long term support version?

2014-11-11 Thread Chad Seys
Hi all, Did I notice correctly that firefly is going to be supported "long term" whereas Giant is not going to be supported as long? http://ceph.com/releases/v0-80-firefly-released/ This release will form the basis for our long-term supported release Firefly, v0.80.x. http://ceph.com/uncategor

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-10 Thread Chad Seys
Hi Craig, > If all of your PGs now have an empty down_osds_we_would_probe, I'd run > through this discussion again. Yep, looks to be true. So I ran: # ceph pg force_create_pg 2.5 and it has been creating for about 3 hours now. :/ # ceph health detail | grep creating pg 2.5 is stuck inactive

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-10 Thread Chad Seys
Hi Craig and list, > > > If you create a real osd.20, you might want to leave it OUT until you > > > get things healthy again. I created a real osd.20 (and it turns out I needed an osd.21 also). ceph pg x.xx query no longer lists down osds for probing: "down_osds_we_would_probe": [], But I ca
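A quick way to check that field on a problem PG (PG id 2.5 is just the example from elsewhere in this thread):

    # list PGs that are still stuck, to know which ones to query
    ceph pg dump_stuck inactive

    # inspect the peering state of one of them
    ceph pg 2.5 query | grep -A3 down_osds_we_would_probe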

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Craig, > You'll have trouble until osd.20 exists again. > > Ceph really does not want to lose data. Even if you tell it the osd is > gone, ceph won't believe you. Once ceph can probe any osd that claims to > be 20, it might let you proceed with your recovery. Then you'll probably > need to

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad Seys
Hi Sam, > > Amusingly, that's what I'm working on this week. > > > > http://tracker.ceph.com/issues/7862 Well, thanks for any bugfixes in advance! :) > Also, are you certain that osd 20 is not up? > -Sam Yep. # ceph osd metadata 20 Error ENOENT: osd.20 does not exist So part of ceph thinks
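Other standard ways to confirm whether an osd.20 is known to the cluster at all (it should be absent from both if it was fully removed):

    ceph osd tree | grep 'osd\.20'
    ceph osd dump | grep '^osd\.20 '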

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-05 Thread Chad Seys
Hi Sam, > 'ceph pg query'. Thanks. Looks like ceph is looking for an osd.20 which no longer exists: "probing_osds": [ "1", "7", "15", "16"], "down_osds_we_would_probe": [ 20], So perhaps during

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-05 Thread Chad Seys
Hi Sam, > Incomplete usually means the pgs do not have any complete copies. Did > you previously have more osds? No. But could OSDs quitting after hitting assert(0 == "we got a bad state machine event"), or interacting with kernel 3.14 clients, have caused the incomplete copies? How can

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-04 Thread Chad Seys
On Monday, November 03, 2014 17:34:06 you wrote: > If you have osds that are close to full, you may be hitting 9626. I > pushed a branch based on v0.80.7 with the fix, wip-v0.80.7-9626. > -Sam Thanks Sam, I may have been hitting that as well. I certainly hit too_full conditions often. I am abl

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
> > No, it is a change, I just want to make sure I understand the > scenario. So you're reducing CRUSH weights on full OSDs, and then > *other* OSDs are crashing on these bad state machine events? That is right. The other OSDs shut down sometime later. (Not immediately.) I really haven't tested

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
On Monday, November 03, 2014 13:50:05 you wrote: > On Mon, Nov 3, 2014 at 11:41 AM, Chad Seys wrote: > > On Monday, November 03, 2014 13:22:47 you wrote: > >> Okay, assuming this is semi-predictable, can you start up one of the > >> OSDs that is going to fail wi

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
On Monday, November 03, 2014 13:22:47 you wrote: > Okay, assuming this is semi-predictable, can you start up one of the > OSDs that is going to fail with "debug osd = 20", "debug filestore = > 20", and "debug ms = 1" in the config file and then put the OSD log > somewhere accessible after it's cras
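For reference, the requested logging could go into ceph.conf roughly like this (osd.12 is a stand-in for whichever OSD is expected to crash):

    [osd.12]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

Restart that OSD, let it crash, and grab the log from the default location, /var/log/ceph/ceph-osd.12.log.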

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
> There's a "ceph osd metadata" command, but I don't recall if it's in > Firefly or only Giant. :) It's in firefly. Thanks, very handy. All the OSDs are running 0.80.7 at the moment. What next? Thanks again, Chad.
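Two ways to confirm what every OSD is actually running (both exist in firefly; the output format varies a bit by release):

    # per-OSD metadata, including ceph_version
    ceph osd metadata 0

    # ask every OSD daemon for its version directly
    ceph tell osd.* version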

Re: [ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
P.S. The OSDs interacted with some 3.14 krbd clients before I realized that kernel version was too old for the firefly CRUSH map. Chad.

[ceph-users] emperor -> firefly 0.80.7 upgrade problem

2014-11-03 Thread Chad Seys
Hi All, I upgraded from emperor to firefly. Initial upgrade went smoothly and all placement groups were active+clean. Next I executed 'ceph osd crush tunables optimal' to upgrade CRUSH mapping. Now I keep having OSDs go down or have requests blocked for long periods of time. I start
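Some commands that help watch what a tunables change is doing (standard CLI of that era; crush.bin and crush.txt are just scratch file names):

    # confirm which tunables the map now carries
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    grep tunable crush.txt

    # watch recovery and look for blocked requests
    ceph -s
    ceph health detail | grep -i blocked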

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-21 Thread Chad Seys
Hi Craig, > It's part of the way the CRUSH hashing works. Any change to the CRUSH map > causes the algorithm to change slightly. Dan@cern could not replicate my observations, so I plan to follow his procedure (fake create an OSD, wait for rebalance, remove fake OSD) in the near future to see i

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-16 Thread Chad Seys
Hi Dan, I'd like to decommission a node to reproduce the problem and post enough information for you (at least) to understand what is going on. Unfortunately I'm a ceph newbie, so I'm not sure what info would be of interest before/during the drain. Probably the crushmap would be of interest

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi Dan, I'm using Emperor (0.72). Though I would think CRUSH maps have not changed that much between versions? > That sounds bizarre to me, and I can't reproduce it. I added an osd (which > was previously not in the crush map) to a fake host=test: > >ceph osd crush create-or-move osd.52 1.0 r

Re: [ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi Mariusz, > Usually removing an OSD without removing the host happens when you > remove/replace dead drives. > > Hosts are in map so > > * CRUSH wont put 2 copies on same node > * you can balance around network interface speed That does not answer the original question IMO: "Why does the CRUSH map d

[ceph-users] CRUSH depends on host + OSD?

2014-10-15 Thread Chad Seys
Hi all, When I remove all OSDs on a given host, then wait for all objects (PGs?) to be active+clean, then remove the host (ceph osd crush remove hostname), that causes the objects to shuffle around the cluster again. Why does the CRUSH map depend on hosts that no longer have OSDs on the
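The sequence in question, spelled out (OSD ids 30/31 and host name node3 are hypothetical; each CRUSH map change can trigger its own round of data movement):

    # remove each OSD on the host from the CRUSH map, then wait for active+clean
    ceph osd crush remove osd.30
    ceph osd crush remove osd.31

    # removing the now-empty host bucket is itself a CRUSH map change,
    # which is why objects shuffle again
    ceph osd crush remove node3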

[ceph-users] script for commissioning a node with multiple osds, added to cluster as a whole

2014-08-29 Thread Chad Seys
Hi All, Does anyone have a script or sequence of commands to prepare all drives on a single computer for use by ceph, and then start up all OSDs on the computer at one time? I feel this would be faster and less network traffic than adding one drive at a time, which is what the current script
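One rough way to script this (a sketch only: the device names are hypothetical, and it assumes the ceph-disk tooling of that era; nobackfill keeps data from moving until the whole batch is up):

    #!/bin/bash
    set -e
    DEVS="/dev/sdb /dev/sdc /dev/sdd"

    # pause data movement while the new OSDs are created
    ceph osd set nobackfill

    # prepare every drive first...
    for dev in $DEVS; do
        ceph-disk prepare "$dev"
    done

    # ...then activate them together so they join as one batch
    for dev in $DEVS; do
        ceph-disk activate "${dev}1"
    done

    # resume backfill once all OSDs are up and in
    ceph osd unset nobackfill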

[ceph-users] decreasing pg_num?

2014-07-22 Thread Chad Seys
Hi All, Is it possible to decrease pg_num? I was able to decrease pgp_num, but when I try to decrease pg_num I get an error: # ceph osd pool set tibs pg_num 1024 specified pg_num 1024 <= current 2048 Thanks! C. ___ ceph-users mailing list ceph-users

[ceph-users] limitations of erasure coded pools

2014-06-26 Thread Chad Seys
Thanks for the link Blairo! I can think of a use case already (a combo replicated pool / erasure pool for a virtual tape library)! Chad.

[ceph-users] limitations of erasure coded pools

2014-06-24 Thread Chad Seys
Hi All, Could someone point me to a document (possibly a FAQ :) ) describing the limitations of erasure coded pools? Hopefully it would contain the when and how to use them as well. E.g. I read about people using replicated pools as a front end to erasure coded pools, but I don't know why

Re: [ceph-users] qemu/librbd versus qemu/kernel module rbd

2014-06-20 Thread Chad Seys
Hi John, Thanks for the reply! Yes, I agree Ceph is exciting! Keep up the good work! > Using librbd, as you've pointed out, doesn't run afoul of potential Linux > kernel deadlocks; however, you normally wouldn't encounter this type of > situation in a production cluster anyway as you'd likel

[ceph-users] qemu/librbd versus qemu/kernel module rbd

2014-06-20 Thread Chad Seys
Hi All, What are the pros and cons of running a virtual machine (with qemu-kvm) whose image is accessed via librbd or by mounting /dev/rbdX? I've heard that the librbd method has the advantage of not being vulnerable to deadlocks due to memory allocation problems. Would one also benefit

Re: [ceph-users] /etc/ceph/rbdmap

2014-06-19 Thread Chad Seys
> This is for mapping kernel rbd devices on system startup, and belongs with > ceph-common (which hasn't yet been but soon will be split out from ceph) Great! Yeah, I was hoping to map /dev/rbd without installing all the ceph daemons! > along with the 'rbd' cli utility. It isn't directly relat

[ceph-users] /etc/ceph/rbdmap

2014-06-19 Thread Chad Seys
Hi all, Also, should /etc/ceph/rbdmap be in librbd1 rather than ceph? Thanks, Chad.

[ceph-users] /etc/init.d/rbdmap

2014-06-19 Thread Chad Seys
Hi all, Shouldn't /etc/init.d/rbdmap be in the librbd package rather than in "ceph"? Thanks, Chad.

Re: [ceph-users] osd_recovery_max_single_start

2014-04-24 Thread Chad Seys
a > given PG will start up to 5 recovery operations at a time, out of a total of 15 > operations active at a time. This allows recovery to spread operations > across more or fewer PGs at any given time. > > David Zafman > Senior Developer > http://www.inktank.com > > On Apr 24,
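Read as ceph.conf settings, David's description maps to something like the following (the option names are the real ones; the numbers are just the values he quotes):

    [osd]
        # total recovery operations an OSD will run at once
        osd recovery max active = 15
        # how many of those a single PG may start in one go
        osd recovery max single start = 5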

[ceph-users] osd_recovery_max_single_start

2014-04-24 Thread Chad Seys
Hi All, What does osd_recovery_max_single_start do? I could not find a description of it. Thanks! Chad.

Re: [ceph-users] newb question: how to apply and check config

2014-04-23 Thread Chad Seys
Thanks for the tip Brian! Chad.

[ceph-users] newb question: how to apply and check config

2014-04-23 Thread Chad Seys
Hello all, I want to set the following value for ceph: osd recovery max active = 1 Where do I place this setting? And how do I ensure that it is active? Do I place it only in /etc/ceph/ceph.conf on the monitor in a section like so: [osd] osd recovery max active = 1 Or do I have to place i
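Beyond ceph.conf, two standard ways to apply and verify the value at runtime (the admin-socket path below is the default; substitute the real OSD id):

    # push the value into every running OSD (not persistent across restarts)
    ceph tell osd.* injectargs '--osd-recovery-max-active 1'

    # verify on a given OSD via its admin socket, run on that OSD's host
    ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config show | grep osd_recovery_max_active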

Re: [ceph-users] create multiple OSDs without changing CRUSH until one last step

2014-04-11 Thread Chad Seys
Hi Greg, > How many monitors do you have? 1 . :) > It's also possible that re-used numbers won't get caught in this, > depending on the process you went through to clean them up, but I > don't remember the details of the code here. Yeah, too bad. I'm following the standard removal procedure in
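For reference, the standard removal procedure being followed looks roughly like this (OSD id 20 is just an example):

    ceph osd out 20
    # stop the daemon on its host, e.g. with the distro's init script
    ceph osd crush remove osd.20
    ceph auth del osd.20
    ceph osd rm 20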

Re: [ceph-users] fuse or kernel to mount rbd?

2014-04-07 Thread Chad Seys
Hi Sage et al, Thanks for the info! How stable are the cutting-edge kernels like 3.13? Is 3.8 (e.g. from Ubuntu Raring) a better choice? Thanks again!

[ceph-users] fuse or kernel to mount rbd?

2014-04-04 Thread Chad Seys
Hi, I'm running Debian Wheezy which has kernel version 3.2.54-2. Should I be using rbd-fuse 0.72.2 or the kernel client to mount rbd devices? I.e., this is an old kernel relative to Emperor, but maybe bugs are backported to the kernel? Thanks! Chad.

Re: [ceph-users] out then rm / just rm an OSD?

2014-04-03 Thread Chad Seys
On Thursday, April 03, 2014 07:57:58 Dan Van Der Ster wrote: > Hi, > By my observation, I don't think that marking it out before crush rm would > be any safer. > > Normally what I do (when decommissioning an OSD or whole server) is stop > the OSD process, then crush rm / osd rm / auth del the OSD

Re: [ceph-users] degraded objects after adding OSD?

2014-03-28 Thread Chad Seys
> The backfilling process can be stopped/paused at some point due to config > settings or other reasons, so ceph reflects the current state of PGs that are > in fact degraded because a replica is missing on the fresh OSD. Those PGs > actually being backfilled display the 'degraded+backfilling' state. Also makes se

Re: [ceph-users] degraded objects after adding OSD?

2014-03-28 Thread Chad Seys
y exist and might be appropriate. Thanks again, Chad. On Friday, March 28, 2014 04:49:02 you wrote: > On 28.03.14, 0:38, Chad Seys wrote: > > Hi all, > > > >Beginning with a cluster with only "active+clean" PGS, adding an OSD > >causes > > > > o

[ceph-users] degraded objects after adding OSD?

2014-03-27 Thread Chad Seys
Hi all, Beginning with a cluster with only "active+clean" PGs, adding an OSD causes objects to be "degraded". Does this mean that ceph deletes replicas before copying them to the new OSD? Or does degraded also mean that there are no replicas on the target OSD, even though there are alrea