See the release notes for the jewel releases which include
instructions for upgrading from hammer.
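For reference, the usual order is: set noout, upgrade and restart the mons one
at a time, then the OSDs node by node, and only then set the jewel feature
flags. A rough sketch (package/service commands vary by distro and release, so
treat them as illustrative):
ceph osd set noout
# upgrade the ceph packages, then restart the ceph-mon daemons one at a time,
# followed by the ceph-osd daemons one node at a time, waiting for all PGs to
# go active+clean in between
ceph osd set sortbitwise            # only once every mon and OSD runs jewel
ceph osd set require_jewel_osds
ceph osd unset noout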
On Wed, May 31, 2017 at 1:53 PM, Laszlo Budai wrote:
> Hi Brad,
>
> Thank you for the answer.
> We are aware of the fact that hammer is close to retirement, and we are
> planning for the upgrade. BT
Hello Greg!
Thank you for the answer.
Our pools have their size set to 3:
tv-dl360-1:~$ ceph osd pool ls detail
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
pool 1 'images' replicated size 3 mi
Hi Brad,
Thank you for the answer.
We are aware of the fact that hammer is close to retirement, and we are planning
for the upgrade. BTW: can you recommend some documentation to read before the
hammer -> jewel upgrade? I know
http://docs.ceph.com/docs/jewel/install/upgrading-ceph/ and that goo
It should also be noted that hammer is pretty close to retirement and
is a poor choice for new clusters.
On Wed, May 31, 2017 at 6:17 AM, Gregory Farnum wrote:
> On Mon, May 29, 2017 at 4:58 AM, Laszlo Budai wrote:
>>
>> Hello all,
>>
>> We have a ceph cluster with 72 OSDs distributed on 6 hosts
OIC, thanks for providing the tree output. From what you wrote originally it
seemed plausible that you were mixing up the columns, which is not an uncommon
thing to do.
If all of your OSDs are the same size, and have a CRUSH weight of 1, then
you have just the usual OSD fullness distribu
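For reference, the actual per-OSD fullness spread is easy to check with:
ceph osd df tree    # CRUSH weight, reweight, %USE and variance per OSD, grouped by the CRUSH tree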
On Mon, May 29, 2017 at 4:58 AM, Laszlo Budai wrote:
>
> Hello all,
>
> We have a ceph cluster with 72 OSDs distributed on 6 hosts, in 3 chassis. In
> our crush map the we are distributing the PGs on chassis (complete crush map
> below):
>
> # rules
> rule replicated_ruleset {
> ruleset 0
Hi Anthony,
When the OSDs were added, it appears they were added with a crush weight of 1,
so I believe we need to change the weighting, as we are getting a lot of very
full OSDs.
-21 20.0 host somehost
216 1.0 osd.216 up 1.0 1.0
217 1.0
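For reference, assuming those really are 4 TB drives, moving to size-based
weights would look roughly like this (osd ids taken from the tree above, 3.64
being the 4 TB figure mentioned later in the thread):
ceph osd crush reweight osd.216 3.64
ceph osd crush reweight osd.217 3.64
# expect a lot of data movement; doing this in small steps, or setting
# 'ceph osd set norebalance' while adjusting several OSDs, keeps it manageable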
> It appears the current best practice is to weight each OSD according to its
> size (3.64 for 4TB drive, 7.45 for 8TB drive, etc).
OSDs are created with those sorts of CRUSH weights by default, yes. Which is
convenient, but it's important to know that those weights are arbitrary, and what
re
I agree with you that the crush map is changing all the time, because of the
changes in the cluster. Our problem is that it did not change as expected in
this host failure situation.
Kind regards,
Laszlo
On 30.05.2017 21:28, David Turner wrote:
Adding osds and nodes to a cluster changes the
Adding osds and nodes to a cluster changes the crush map, an osd being
marked out changes the crush map, an osd being removed from the cluster
changes the crush map... The crush map changes all the time even if you
aren't modifying it directly.
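For reference, you can watch those changes yourself by dumping the map around
an event, e.g.:
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt    # decompile into a readable text map
# diff dumps taken before and after an OSD goes down/out, together with
# 'ceph osd dump | grep ^osd' for the up/in/reweight state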
On Tue, May 30, 2017 at 2:08 PM Laszlo Budai
wrote:
Ben,
Thanks for taking a look at it and trying it out! Hmm, it looks like the
location of the bucket owner in the JSON changed at some point. Later in the
week I'll take a look at adding something to take either location into account.
Thanks,
Berant
On Tue, May 30, 2017 at 3:54 AM, Ben Morrice wrote:
> H
we have not touched the crush map. We have only observed that the cluster is
not responding as expected to a failure, and we wonder why. As I've mentioned
in the previous post, we were able to reproduce the situation on a different
ceph cluster, so I've filed a bug report.
So far this is wh
When you lose a host, the entire CRUSH map is affected. Any change to the
crush map can affect any PG, OSD, host, or failure domain in the entire
cluster. If you modified osd.10's weight in the crush map by increasing it
by 0.5, you would likely see PGs in the entire cluster moving around, not
ju
Hello David,
Thank you for your message.
Indeed we were expecting to see the PGs from the lost host redistributed to the
surviving host from the same chassis (failure domain), but the reality is
different :(
I can see a lot of PGs being stuck active+undersized+degraded and
active+remapped. An
Thanks,
This makes sense, but just wanted to sanity check my assumption against reality.
In my specific case, 24 of the OSDs are HDDs and 30 are SSDs, in different roots/pools,
and so deep scrubs on the other 23 spinning disks could in theory eat iops on a
disk currently backfilling to the other OSD.
E
> On 30 May 2017 at 17:37, Reed Dier wrote:
>
>
> Lost an OSD and having to rebuild it.
>
> 8TB drive, so it has to backfill a ton of data.
> Been taking a while, so I looked at ceph -s and noticed that deep scrubs were
> running even though I'm running the newest Jewel (10.2.7) and the OSDs have the
"Is it only preventing scrubs on the OSD's that are actively
recovering/backfilling?"
That's exactly what it's doing. Notice that none of your PGs listed as
scrubbing have undersized, degraded, backfill, backfilling, etc in the PG
status. They are all "active+clean+scrubbing+deep". I don't see
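A quick way to confirm that, and to pause scrubbing entirely while the
backfill runs (the flags are cluster-wide):
ceph pg dump pgs_brief 2>/dev/null | grep scrub    # states of the PGs currently scrubbing
ceph osd set noscrub
ceph osd set nodeep-scrub
# unset both once the backfill has finished
ceph osd unset noscrub
ceph osd unset nodeep-scrub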
Lost an OSD and having to rebuild it.
8TB drive, so it has to backfill a ton of data.
Been taking a while, so I looked at ceph -s and noticed that deep scrubs were
running even though I'm running the newest Jewel (10.2.7) and the OSDs have
osd_scrub_during_recovery set to false.
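For what it's worth, the value an OSD is actually running with can be checked
on its node via the admin socket, and changed at runtime without a restart
(osd.3 is just an example id):
ceph daemon osd.3 config get osd_scrub_during_recovery
ceph tell osd.* injectargs '--osd_scrub_during_recovery=false'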
> $ cat /etc/ceph/ce
I just responded to this on the thread "Strange remap on host failure". I
think that response covers your question.
On Mon, May 29, 2017, 4:10 PM Laszlo Budai wrote:
> Hello,
>
> can someone give me some directions on how the ceph recovery works?
> Let's suppose we have a ceph cluster with sever
If you lose 1 of the hosts in a chassis, or a single drive, the pgs from
that drive/host will be distributed to other drives in that chassis
(because you only have 3 failure domains). That is to say that if you lose
tv-c1-al01 then all of the pgs and data that were on that will be
distributed to tv
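For a 3-replica pool spread over the 3 chassis, the rule in question typically
looks something like this (a generic sketch, not the exact map from the
original post):
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type chassis
        step emit
}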
Hello Mark,
Yes, this issue happens once the test/write has started, after about 60 secs,
which corresponds to the config value "threadpool_default_timeout = 60". Do you
require a coredump of the down OSD to analyse the tp_osd_tp state? Please
specify which process dump you would need for the analysis.
For example:
#gcore
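Presumably something along these lines, which captures the thread state without
killing the daemon (the pid and osd id are examples):
gcore -o /tmp/osd.12 <pid-of-the-hung-ceph-osd>            # full core dump, can be large
gdb -p <pid-of-the-hung-ceph-osd> -batch \
    -ex 'thread apply all bt' > /tmp/osd.12-threads.txt    # just the thread backtraces
ceph daemon osd.12 dump_ops_in_flight                      # what the OSD thinks it is doing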
On 05/30/2017 05:07 AM, nokia ceph wrote:
Hello Mark,
I am able to reproduce this problem every time.
Ok, next question, does it happen 60s after starting the 200MB/s load,
or does it take a while? Sounds like it's pretty random across OSDs?
I'm thinking we want to figure out what state the
Hi,
is there a release date for the next Jewel release (10.2.8)? I've been waiting
for it for a few weeks because it includes some fixes related to snapshot
deletion and the snap trim sleep.
Thanks
Manuel
--
Manuel Lausch
Systemadministrator
Cloud Services
1&1 Mail & Media Development & Techno
Further to this, we managed to repair the inconsistent PG by comparing the
object digests and removing the one that didn't match (3 of 4 replicas had
the same digest, 1 didn't) and then issuing a pg repair and scrub.
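For anyone finding this in the archives later, the inspection/repair commands
for this kind of case are roughly (the pg id is just an example):
rados list-inconsistent-obj 2.1f --format=json-pretty    # per-shard digests and errors
ceph pg repair 2.1f
ceph pg deep-scrub 2.1f                                  # confirm the PG comes back clean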
This has removed the inconsistent flag on the PG; however, we are still
seeing the
Hello, I am trying to migrate data from a Ceph cluster to AWS. May I know
whether this is possible? I am new to Ceph, but I do understand that RGW is
required; I am just not sure what type of setup is required on AWS. Do I need to
create a Ceph cluster on AWS and make it federated with the internal Ceph
cluster? Is there
Hello Mark,
I am able to reproduce this problem every time.
Env: 5 nodes, v12.0.3, EC 4+1 bluestore, RHEL 7.3 - 3.10.0-514.el7.x86_64
Tested with debug bluestore = 20...
From ceph watch
===
2017-05-30 08:57:33.510794 mon.0 [INF] pgmap v15649: 8192 pgs: 8192
active+clean; 774 GB data
Hello,
I've got a sync issue with my multisite setup. There are 2 zones in 1
zone group in 1 realm. The data sync in the non-master zone has been stuck
on "incremental sync is behind by 1 shard"; this wasn't noticed until
the radosgw instances in the master zone started dying from out-of-memory
issues, all
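For context, the shard lag is usually visible from the secondary zone with
something like the following (exact flags can differ between releases):
radosgw-admin sync status                                  # run against the non-master zone
radosgw-admin data sync status --source-zone=<master-zone-name>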
Hi,
We are designing a ceph+rgw setup for a constant, uniform high load.
We prefer higher throughput over lower latency, so it seems that we do not
need the asynchronous features, especially garbage collection.
Currently we are observing an issue where, after some amount of time, rgw's gc
becomes very, very slow (removing
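For context, the backlog can be inspected and drained by hand along these
lines, and the knobs involved are the rgw_gc_* options:
radosgw-admin gc list --include-all | head    # objects still waiting for garbage collection
radosgw-admin gc process                      # run a GC pass in the foreground
# relevant tunables: rgw_gc_max_objs, rgw_gc_obj_min_wait,
# rgw_gc_processor_period, rgw_gc_processor_max_time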
> The switches you're using, can they stack?
> If so you could spread the LACP across the two switches.
And:
> Just use balance-alb, this will do a trick with no stack switches
Thanks for the answers, I'll do some tests! ;-)
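For comparison, a minimal balance-alb bond on a Debian-style ifupdown setup
would look roughly like this (interface names and addressing are examples):
auto bond0
iface bond0 inet static
        address 192.168.0.10
        netmask 255.255.255.0
        bond-slaves eth0 eth1
        bond-mode balance-alb
        bond-miimon 100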
--
dott. Marco Gaiarin GNUPG Key ID
Hello Berant,
This is very nice! I've had a play with this against our installation of
Ceph which is Kraken. We had to change the bucket_owner variable to be
inside the for loop [1] and we are currently not getting any bytes
sent/received statistics - though this is not an issue with your code