[ceph-users] kernel RBD where is /dev/rbd?

2016-07-23 Thread Nathanial Byrnes

Hi All,
I'm working with a debian 8.5 new install and I'm trying to, 
without installing any additional software, mount an RBD image on my 
cluster using the kernel module. When I run:


/bin/echo 10.88.28.23 name=admin,secret= xcp-vol-pool1 
proxy-img1 > /sys/bus/rbd/add


I see the following in dmesg:

[  924.396284] libceph: client492174 fsid 
f7f42ded-2fd0-46c7-abc1-6e9923032fd4

[  924.398495] libceph: mon0 10.88.28.83:6789 session established

But, I do not get any device files in /dev to mkfs or later mount.
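(For reference, this is roughly what I was expecting to run next, assuming
the image showed up as /dev/rbd0; /mnt/rbd is just an example mount point:

    mkfs.ext4 /dev/rbd0
    mkdir -p /mnt/rbd && mount /dev/rbd0 /mnt/rbd

but there is no /dev/rbd0 to point these at.)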

I have another system, xenserver7, where RBD mounts are working, but the
means to get there are abstracted by xapi and RBDSR. The kernel output is
the same, and there are /dev/rbd? files, but no udev rules I could find
mentioning rbd.


Is this a debian issue?

Thanks,
Nate


Re: [ceph-users] kernel RBD where is /dev/rbd?

2016-07-23 Thread Ruben Kerkhof
On Sat, Jul 23, 2016 at 3:58 PM, Nathanial Byrnes  wrote:
> Hi All,
> I'm working with a debian 8.5 new install and I'm trying to, without
> installing any additional software, mount an RBD image on my cluster using
> the kernel module. When I run:
>
> /bin/echo 10.88.28.23 name=admin,secret= xcp-vol-pool1 proxy-img1 >
> /sys/bus/rbd/add
>
> I see the following in dmesg:
>
> [  924.396284] libceph: client492174 fsid
> f7f42ded-2fd0-46c7-abc1-6e9923032fd4
> [  924.398495] libceph: mon0 10.88.28.83:6789 session established
>
> But, I do not get any device files in /dev to mkfs or later mount.
>
> I have another system, xenserver7, where RBD mounts are working, but the
> means to get there are abstracted by xapi and RBDSR. The kernel output is
> the same, and there are /dev/rbd? files, but no udev rules I could find
> mentioning rbd

Just a guess, but those rules you're looking for are probably not in
upstream udev but installed by Ceph.
If you're not willing to install Ceph, you'll need to create them yourself.
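
That said, if I remember correctly the /dev/rbdN node itself is created by
the kernel (devtmpfs) once the map succeeds; the packaged rules mainly add
the friendly /dev/rbd/<pool>/<image> symlinks. A quick sanity check of what
the kernel registered (assuming device id 0, no extra packages needed):

    ls /sys/bus/rbd/devices/
    cat /sys/bus/rbd/devices/0/pool /sys/bus/rbd/devices/0/name
    ls -l /dev/rbd*

If nothing shows up under /sys/bus/rbd/devices/, the add never completed.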

Kind regards,

Ruben


Re: [ceph-users] kernel RBD where is /dev/rbd?

2016-07-23 Thread Nathanial Byrnes
I found it. I'm not sure how the block device was created, but I had
the wrong image format. I thought that image format 2 was supported in 3.11+,
and Debian 8.5 is on 3.16 ... but I attached to an image-format 1 image and
/dev/rbd0 magically appeared...


Best Regards,
Nate


On 07/23/2016 10:15 AM, Ruben Kerkhof wrote:

On Sat, Jul 23, 2016 at 3:58 PM, Nathanial Byrnes  wrote:

Hi All,
 I'm working with a debian 8.5 new install and I'm trying to, without
installing any additional software, mount an RBD image on my cluster using
the kernel module. When I run:

/bin/echo 10.88.28.23 name=admin,secret= xcp-vol-pool1 proxy-img1 >
/sys/bus/rbd/add

I see the following in dmesg:

[  924.396284] libceph: client492174 fsid
f7f42ded-2fd0-46c7-abc1-6e9923032fd4
[  924.398495] libceph: mon0 10.88.28.83:6789 session established

But, I do not get any device files in /dev to mkfs or later mount.

I have another system, xenserver7, where RBD mounts are working, but the
means to get there are abstracted by xapi and RBDSR. The kernel output is
the same, and there are /dev/rbd? files, but no udev rules I could find
mentioning rbd

Just a guess, but those rules you're looking for are probably not in
upstream udev but installed by Ceph.
If you're not willing to install Ceph, you'll need to create them yourself.

Kind regards,

Ruben



Re: [ceph-users] kernel RBD where is /dev/rbd?

2016-07-23 Thread Ilya Dryomov
On Sat, Jul 23, 2016 at 4:39 PM, Nathanial Byrnes  wrote:
> I found it. I'm not sure how the block device was created, but I had the
> wrong image format. I thought that image format 2 was supported in 3.11+, and
> Debian 8.5 is on 3.16 ... but I attached to an image-format 1 image and /dev/rbd0
> magically appeared...

3.16 supports format 2, but doesn't support the more recent features,
which can be disabled with rbd feature disable.  If you had used the
CLI, you would've got an error message.
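
Roughly like this (a sketch; it assumes a recent enough rbd CLI from
ceph-common, and that the image has the usual extra features enabled; adjust
the feature list to whatever rbd info actually reports):

    rbd info xcp-vol-pool1/proxy-img1          # shows format and enabled features
    rbd feature disable xcp-vol-pool1/proxy-img1 fast-diff object-map exclusive-lock deep-flatten
    rbd map xcp-vol-pool1/proxy-img1           # prints /dev/rbdN on success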

If you are going to write to sysfs directly, which I would advise
against, at least check the exit code.
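
Something like this (a sketch; <key> stands in for the secret that was
stripped from your post):

    if ! /bin/echo "10.88.28.23 name=admin,secret=<key> xcp-vol-pool1 proxy-img1" > /sys/bus/rbd/add; then
        echo "rbd map via sysfs failed" >&2
        dmesg | tail        # the kernel usually logs why, e.g. unsupported features
    fi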

Thanks,

Ilya


Re: [ceph-users] Recovery stuck after adjusting to recent tunables

2016-07-23 Thread Goncalo Borges
Hi Kostis
This is a wild guess, but one thing I note is that your pool 179 has a very low
pg number (100).

Maybe the algorithm behind the new tunables needs a higher pg number to actually
proceed with the recovery?

You could try increasing the pgs to 128 (it is always better to use powers of
2) and see if the recovery completes.
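
Something along these lines (untested; pool name taken from your pool dump,
and pgp_num should follow pg_num):

    ceph osd pool set scbench pg_num 128
    ceph osd pool set scbench pgp_num 128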

Cheers
G.

From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Kostis 
Fardelas [dante1...@gmail.com]
Sent: 23 July 2016 16:32
To: Brad Hubbard
Cc: ceph-users
Subject: Re: [ceph-users] Recovery stuck after adjusting to recent tunables

Hi Brad,

pool 0 'data' replicated size 2 min_size 1 crush_ruleset 3 object_hash
rjenkins pg_num 2048 pgp_num 2048 last_change 119047
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 1 crush_ruleset 3
object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119048
stripe_width 0
pool 2 'rbd' replicated size 2 min_size 1 crush_ruleset 3 object_hash
rjenkins pg_num 2048 pgp_num 2048 last_change 119049 stripe_width 0
pool 3 'blocks' replicated size 2 min_size 1 crush_ruleset 4
object_hash rjenkins pg_num 2048 pgp_num 2048 last_change 119050
stripe_width 0
pool 4 'maps' replicated size 2 min_size 1 crush_ruleset 3 object_hash
rjenkins pg_num 2048 pgp_num 2048 last_change 119051 stripe_width 0
pool 179 'scbench' replicated size 3 min_size 1 crush_ruleset 0
object_hash rjenkins pg_num 100 pgp_num 100 last_change 154034 flags
hashpspool stripe_width 0

This is the status of 179.38 when the cluster is healthy:
http://pastebin.ca/3663600

and this is when recovery is stuck:
http://pastebin.ca/3663601


It seems that the PG is replicated with size 3, but the cluster cannot
create the third replica for some objects whose third OSD (osd.14) is
down. That was not the case with the argonaut tunables, as far as I remember.
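
(A rough offline check with crushtool, weighting osd.14 to zero to mimic it
being down, would be something like the following; the filenames are just
placeholders:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --rule 0 --num-rep 3 \
        --weight 14 0 --show-bad-mappings

If that prints bad mappings, CRUSH itself cannot find 3 OSDs with these
tunables.)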

Regards


On 23 July 2016 at 06:16, Brad Hubbard  wrote:
> On Sat, Jul 23, 2016 at 12:17 AM, Kostis Fardelas  wrote:
>> Hello,
>> being in latest Hammer, I think I hit a bug with more recent than
>> legacy tunables.
>>
>> Being in legacy tunables for a while, I decided to experiment with
>> "better" tunables. So first I went from argonaut profile to bobtail
>> and then to firefly. However, I decided to make the changes on
>> chooseleaf_vary_r incrementally (because the remapping from 0 to 5 was
>> huge), from 5 down to the best value (1). So when I reached
>> chooseleaf_vary_r = 2, I decided to run a simple test before going to
>> chooseleaf_vary_r = 1: stop an OSD (osd.14) and let the cluster
>> recover. But the recovery never completes and a PG remains stuck,
>> reported as undersized+degraded. No OSD is near full and all pools
>> have min_size=1.
>>
>> ceph osd crush show-tunables -f json-pretty
>>
>> {
>> "choose_local_tries": 0,
>> "choose_local_fallback_tries": 0,
>> "choose_total_tries": 50,
>> "chooseleaf_descend_once": 1,
>> "chooseleaf_vary_r": 2,
>> "straw_calc_version": 1,
>> "allowed_bucket_algs": 22,
>> "profile": "unknown",
>> "optimal_tunables": 0,
>> "legacy_tunables": 0,
>> "require_feature_tunables": 1,
>> "require_feature_tunables2": 1,
>> "require_feature_tunables3": 1,
>> "has_v2_rules": 0,
>> "has_v3_rules": 0,
>> "has_v4_buckets": 0
>> }
>>
>> The really strange thing is that the OSDs of the stuck PG belong to
>> other nodes than the one I decided to stop (osd.14).
>>
>> # ceph pg dump_stuck
>> ok
>> pg_stat state up up_primary acting acting_primary
>> 179.38 active+undersized+degraded [2,8] 2 [2,8] 2
>
> Can you share a query of this pg?
>
> What size (not min size) is this pool (assuming it's 2)?
>
>>
>>
>> ID WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 11.19995 root default
>> -3 11.19995 rack unknownrack
>> -2  0.3 host staging-rd0-03
>> 14  0.2 osd.14   up  1.0  1.0
>> 15  0.2 osd.15   up  1.0  1.0
>> -8  5.19998 host staging-rd0-01
>>  6  0.5 osd.6up  1.0  1.0
>>  7  0.5 osd.7up  1.0  1.0
>>  8  1.0 osd.8up  1.0  1.0
>>  9  1.0 osd.9up  1.0  1.0
>> 10  1.0 osd.10   up  1.0  1.0
>> 11  1.0 osd.11   up  1.0  1.0
>> -7  5.19998 host staging-rd0-00
>>  0  0.5 osd.0up  1.0  1.0
>>  1  0.5 osd.1up  1.0  1.0
>>  2  1.0 osd.2up  1.0  1.0
>>  3  1.0 osd.3up  1.0  1.0
>>  4  1.0 osd.4up  1.0  1.0
>>  5  1.0 

Re: [ceph-users] pgs stuck unclean after reweight

2016-07-23 Thread Goncalo Borges
Hi Christian
Thanks for the tips.
We do have monitoring in place, but we are currently at a peak and the occupancy
increased tremendously in a couple of days' time.

I solved the problem of the stuck pgs by reweighting (decreasing the weights of)
the new osds which were preventing the backfilling. Once those 4 pgs recovered,
I applied your suggestion of increasing the weight of the less-used osds. The
cluster is much more balanced now and we will add more osds soon. It is still a
mystery to me why, in my initial procedure which triggered the problem, heavily
used osds were chosen for the remapping.

Thanks for the help
Goncalo



From: Christian Balzer [ch...@gol.com]
Sent: 20 July 2016 19:36
To: ceph-us...@ceph.com
Cc: Goncalo Borges
Subject: Re: [ceph-users] pgs stuck unclean after reweight

Hello,

On Wed, 20 Jul 2016 13:42:20 +1000 Goncalo Borges wrote:

> Hi All...
>
> Today we had a warning regarding 8 near full osds. Looking at the osd
> occupancy, 3 of them were above 90%.

One would hope that this would have been picked up earlier, as in before
it even reaches near-full.
Either by monitoring (nagios, etc) disk usage checks and/or graphing the
usage and taking a look at it at least daily.

Since you seem to have at least 60 OSDs, going from below 85% to 90% must
have involved a large amount of data arriving in a short time.
> In order to solve the situation,
> I've decided to reweigh those first using
>
>  ceph osd crush reweight osd.1 2.67719
>
>  ceph osd crush reweight osd.26 2.67719
>
>  ceph osd crush reweight osd.53 2.67719
>
What I'd do is to find the least utilized OSDs and give them higher
weights, so data will (hopefully) move there instead of potentially
pushing another OSD to near-full as with the approach above.
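
For example (just a sketch; pick the ids and new weights from your own
output):

    ceph osd df                                   # spot the least-utilized OSDs
    ceph osd crush reweight osd.<id> <higher-weight>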

You might consider doing that aside from what I'm writing below.

> Please note that I've started with a very conservative step since the
> original weight for all osds was 2.72710.
>
> After some rebalancing (which has now stopped) I've seen that the
> cluster is currently in the following state
>
> # ceph health detail
> HEALTH_WARN 4 pgs backfill_toofull; 4 pgs stuck unclean; recovery
> 20/39433323 objects degraded (0.000%); recovery 77898/39433323
> objects misplaced (0.198%); 8 near full osd(s); crush map has legacy
> tunables (require bobtail, min is firefly)
>
So there are all your woes in one fell swoop.

Unless you changed the defaults, your mon_osd_nearfull_ratio and
osd_backfill_full_ratio are the same at 0.85.
So any data movement towards those 8 near full OSDs will not go anywhere.

Thus aside from the tip above, consider upping your
osd_backfill_full_ratio for those OSDs to something like .92 for the time
being until things are good again.
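
Roughly like this (a sketch; note that injectargs changes do not survive an
OSD restart):

    for id in 1 14 24 26 37 53 56 62; do
        ceph tell osd.$id injectargs '--osd-backfill-full-ratio 0.92'
    done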

Going forward, you will want to:
a) add more OSDs
b) re-weight things so that your OSDs are within a few % of each other,
rather than the often-encountered 20%+ variance.

Christian

> pg 6.e2 is stuck unclean for 9578.920997, current state
> active+remapped+backfill_toofull, last acting [49,38,11]
> pg 6.4 is stuck unclean for 9562.054680, current state
> active+remapped+backfill_toofull, last acting [53,6,26]
> pg 5.24 is stuck unclean for 10292.469037, current state
> active+remapped+backfill_toofull, last acting [32,13,51]
> pg 5.306 is stuck unclean for 10292.448364, current state
> active+remapped+backfill_toofull, last acting [44,7,59]
> pg 5.306 is active+remapped+backfill_toofull, acting [44,7,59]
> pg 5.24 is active+remapped+backfill_toofull, acting [32,13,51]
> pg 6.4 is active+remapped+backfill_toofull, acting [53,6,26]
> pg 6.e2 is active+remapped+backfill_toofull, acting [49,38,11]
> recovery 20/39433323 objects degraded (0.000%)
> recovery 77898/39433323 objects misplaced (0.198%)
> osd.1 is near full at 88%
> osd.14 is near full at 87%
> osd.24 is near full at 86%
> osd.26 is near full at 87%
> osd.37 is near full at 87%
> osd.53 is near full at 88%
> osd.56 is near full at 85%
> osd.62 is near full at 87%
>
> crush map has legacy tunables (require bobtail, min is firefly);
> see http://ceph.com/docs/master/rados/operations/crush-map/#tunables
>
> Not sure if it is worthwhile to mention, but after upgrading to Jewel,
> our cluster shows the warnings regarding tunables. We still have not
> migrated to the optimal tunables because the cluster will be very
> actively used during the next 3 weeks (due to one of the main
> conferences in our area) and we prefer to do that migration after this
> peak period.
>
>
> I am unsure what happened during the rebalancing, but the mapping of these 4
> stuck pgs seems strange, namely the up and acting osds are different.
>
> # ceph pg dump_stuck unclean
> ok
> pg_stat state up up_primary acting acting_primary
> 6.e2 active+remapped+backfill_toofull [8,53,38] 8 [49,38,11] 49
> 6.4 active+remapped+backfill_toofull [53,24,6] 53 [53