Hi Wyatt,

This is almost certainly a configuration issue. If I recall correctly, there is also a min_size setting on each pool (the minimum number of copies that must be available before the pool will serve I/O) that defaults to two, which you may need to reduce to one as well. I don't have the documentation in front of me, so that's just off the top of my head...
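From memory (so double-check against your own "ceph osd dump" output before running anything), and assuming the default pool names data, metadata and rbd, inspecting and changing the per-pool settings looks roughly like this:

    # show each pool and its current replication settings
    ceph osd dump | grep '^pool'

    # drop both the replica count and the minimum required copies to 1
    for pool in data metadata rbd; do
        ceph osd pool set $pool size 1
        ceph osd pool set $pool min_size 1
    done

Give it a few seconds afterwards and re-run "ceph health" to see whether the degraded count drops. (See also the P.S. further down about the [osd] line in your ceph.conf.)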
Dino

On Wed, May 1, 2013 at 3:19 PM, Wyatt Gorman <wyattgor...@wyattgorman.com> wrote:

> Okay! Dino, thanks for your response. I reduced my metadata pool size and
> data pool size to 1, which eliminated the "recovery 21/42 degraded
> (50.000%)" at the end of my HEALTH_WARN error. So now, when I run "ceph
> health" I get the following:
>
> HEALTH_WARN 384 pgs degraded; 384 pgs stale; 384 pgs stuck unclean
>
> So this seems to be from one single root cause. Any ideas? Again, is this
> a corrupted drive issue that I can clean up, or is this still a ceph
> configuration error?
>
>
> On Wed, May 1, 2013 at 12:52 PM, Dino Yancey <dino2...@gmail.com> wrote:
>
>> Hi Wyatt,
>>
>> You need to reduce the replication level on your existing pools to 1, or
>> bring up another OSD. The default configuration specifies a replication
>> level of 2, and the default CRUSH rules want to place the two replicas on
>> two distinct OSDs. With only one OSD, CRUSH can't determine placement for
>> the second replica, so Ceph reports a degraded state.
>>
>> Dino
>>
>>
>> On Wed, May 1, 2013 at 11:45 AM, Wyatt Gorman <wyattgor...@wyattgorman.com> wrote:
>>
>>> Well, those points solved the issue of the redefined host and the
>>> unidentified protocol. The
>>>
>>> "HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42
>>> degraded (50.000%)"
>>>
>>> error is still an issue, though. Is this something simple like some hard
>>> drive corruption that I can clean up with an fsck, or is this a ceph issue?
>>>
>>>
>>> On Wed, May 1, 2013 at 12:31 PM, Mike Dawson <mike.daw...@scholarstack.com> wrote:
>>>
>>>> Wyatt,
>>>>
>>>> A few notes:
>>>>
>>>> - Yes, the second "host = ceph" under mon.a is redundant and should be
>>>> deleted.
>>>>
>>>> - "auth client required = cephx [osd]" should be simply
>>>> "auth client required = cephx".
>>>>
>>>> - Looks like you only have one OSD. You need at least as many (and
>>>> hopefully more) OSDs as the highest replication level among your pools.
>>>>
>>>> Mike
>>>>
>>>>
>>>> On 5/1/2013 12:23 PM, Wyatt Gorman wrote:
>>>>
>>>>> Here is my ceph.conf. I just figured out that the second "host =" isn't
>>>>> necessary, though it is like that in the 5-minute quick start guide...
>>>>> (Perhaps I'll submit the couple of fixes that I've had to implement so
>>>>> far.) That fixes the "redefined host" issue, but none of the others.
>>>>>
>>>>> [global]
>>>>> # For version 0.55 and beyond, you must explicitly enable or
>>>>> # disable authentication with "auth" entries in [global].
>>>>> auth cluster required = cephx
>>>>> auth service required = cephx
>>>>> auth client required = cephx [osd]
>>>>> osd journal size = 1000
>>>>>
>>>>> # The following assumes ext4 filesystem.
>>>>> filestore xattr use omap = true
>>>>>
>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>>>>> # settings for mkcephfs so that it will create and mount the file
>>>>> # system on a particular OSD for you. Remove the comment `#`
>>>>> # character for the following settings and replace the values in
>>>>> # braces with appropriate values, or leave the following settings
>>>>> # commented out to accept the default values. You must specify
>>>>> # the --mkfs option with mkcephfs in order for the deployment
>>>>> # script to utilize the following settings, and you must define
>>>>> # the 'devs' option for each osd instance; see below.
>>>>> #osd mkfs type = {fs-type}
>>>>> #osd mkfs options {fs-type} = {mkfs options}   # default for xfs is "-f"
>>>>> #osd mount options {fs-type} = {mount options} # default mount option is "rw,noatime"
>>>>>
>>>>> # For example, for ext4, the mount option might look like this:
>>>>> #osd mkfs options ext4 = user_xattr,rw,noatime
>>>>>
>>>>> # Execute $ hostname to retrieve the name of your host, and
>>>>> # replace {hostname} with the name of your host. For the
>>>>> # monitor, replace {ip-address} with the IP address of your
>>>>> # host.
>>>>> [mon.a]
>>>>> host = ceph
>>>>> mon addr = 10.81.2.100:6789
>>>>>
>>>>> [osd.0]
>>>>> host = ceph
>>>>> # For Bobtail (v 0.56) and subsequent versions, you may add
>>>>> # settings for mkcephfs so that it will create and mount the
>>>>> # file system on a particular OSD for you. Remove the comment
>>>>> # `#` character for the following setting for each OSD and
>>>>> # specify a path to the device if you use mkcephfs with the
>>>>> # --mkfs option.
>>>>> #devs = {path-to-device}
>>>>>
>>>>> [osd.1]
>>>>> host = ceph
>>>>> #devs = {path-to-device}
>>>>>
>>>>> [mds.a]
>>>>> host = ceph
>>>>>
>>>>>
>>>>> On Wed, May 1, 2013 at 12:14 PM, Mike Dawson
>>>>> <mike.daw...@scholarstack.com> wrote:
>>>>>
>>>>>     Wyatt,
>>>>>
>>>>>     Please post your ceph.conf.
>>>>>
>>>>>     - mike
>>>>>
>>>>>
>>>>>     On 5/1/2013 12:06 PM, Wyatt Gorman wrote:
>>>>>
>>>>>         Hi everyone,
>>>>>
>>>>>         I'm setting up a test ceph cluster and am having trouble
>>>>>         getting it running (great for testing, huh?). I went through
>>>>>         the installation on Debian squeeze, and had to modify the
>>>>>         mkcephfs script a bit because it calls monmaptool with too
>>>>>         many parameters in the $args variable (mine had
>>>>>         "--add a [ip address]:[port] [osd1]" and I had to get rid of
>>>>>         the [osd1] part for the monmaptool command to accept it).
>>>>>         Anyway, I got it installed, started the service, waited a
>>>>>         little while for it to build the fs, and ran "ceph health",
>>>>>         and got (and am still getting after a day and a reboot) the
>>>>>         following error (note: I have also been getting the first
>>>>>         line in various calls; unsure why it is complaining, I
>>>>>         followed the instructions...):
>>>>>
>>>>>         warning: line 34: 'host' in section 'mon.a' redefined
>>>>>         2013-05-01 12:04:39.801102 b733b710 -1 WARNING: unknown auth protocol defined: [osd]
>>>>>         HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%)
>>>>>
>>>>>         Can anybody tell me the root of this issue, and how I can fix
>>>>>         it? Thank you!
>>>>>
>>>>>         - Wyatt Gorman
>>>>>
>>>
>>
>> --
>> ______________________________
>> Dino Yancey
>> 2GNT.com Admin
>>
>

--
______________________________
Dino Yancey
2GNT.com Admin
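P.S. On Mike's second note about the config: my guess is that the stray "[osd]" on the auth line was meant to be a section header of its own, as in the quick-start layout, so the corrected part of the ceph.conf would read roughly like this (untested, just how I'd expect it to look):

    [global]
        auth cluster required = cephx
        auth service required = cephx
        auth client required = cephx

    [osd]
        osd journal size = 1000

Either way, removing "[osd]" from that line is what cleared the "unknown auth protocol defined: [osd]" warning, since the parser was treating it as part of the auth setting's value.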
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com