Hi,
We see this error on Hammer 0.94.6.
Bug report updated with logs.
Thanks,
On 11/15/2016 07:30 PM, Samuel Just wrote:
> http://tracker.ceph.com/issues/17916
>
> I just pushed a branch wip-17916-jewel based on v10.2.3 with some
> additional debugging. Once it builds, would you be able to st
Hello,
One of my Ceph nodes with 20 OSDs went down... After a couple of hours,
ceph health was in the OK state.
Then I tried to remove those OSDs, which were in the down state, from the
Ceph cluster
using "ceph osd remove osd.",
and then the Ceph cluster started rebalancing... which is strange, because
those OSDs are down f
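(For reference, the usual sequence to fully remove a down OSD looks roughly like this; osd.20 is only a placeholder id:)
ceph osd out osd.20             # mark it out so its data is remapped (a no-op if it is already out)
ceph osd crush remove osd.20    # drop it from the CRUSH map - note this also changes the host's CRUSH weight
ceph auth del osd.20            # delete its authentication key
ceph osd rm osd.20              # finally remove the OSD entry itself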
(Copying list back in)
On Thu, Dec 1, 2016 at 10:22 AM, John Spray wrote:
> On Wed, Nov 30, 2016 at 3:48 PM, Jens Offenbach wrote:
>> Thanks a lot... "ceph daemon mds. session ls" was a good starting point.
>>
>> What is happening:
>> I am in an OpenStack environment and start a VM. Afterwards,
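(For reference, the full form of that command, with <name> standing in for the elided MDS id:)
ceph daemon mds.<name> session ls    # run on the host where that MDS daemon is running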
Hello!
Tonight I had an OSD crash. See the dump below. Also, this OSD is still mounted.
What's the cause? A bug? What to do next?
Thank You!
Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP
Dec 1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police
cl
Are you using Ubuntu 16.04 (guessing from your kernel version)? There was a
NUMA bug in the early kernels; try updating to the latest in
the 4.4 series.
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
VELARTIS Philipp Dürhammer
Sent: 01 December 2016 12:04
To: 'ceph-us...
Hello!
Tonight I had an OSD crash. See the dump below. Also, this OSD is still mounted.
What's the cause? A bug? What to do next? I can't do an lsof or ps ax because it
hangs.
Thank You!
Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP
Dec 1 00:31:30 ceph2 kernel: [17314
I am using Proxmox so I guess it's Debian. I will update the kernel; there are
newer versions. But generally, if an OSD crashes like this, can it be hardware
related?
How do I unmount the disk? I can't even run ps ax or lsof - it hangs because my
OSD is still mounted and blocks everything... I canno
Hi list,
I use the script from [1] to control the deep-scrubs myself in a
cron job. It seems to work fine; I get the "finished batch" message in
/var/log/messages, but in every run I get an email from the cron daemon
with at least one line saying:
2016-11-30 21:40:59.271854 7f3d5700 0 mon
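(For context, a minimal version of this kind of manual control, assuming the script in [1] works roughly along these lines:)
ceph osd set nodeep-scrub       # stop the OSDs from scheduling deep-scrubs on their own
ceph pg deep-scrub 1.2f         # kick individual PGs from the cron job; 1.2f is a placeholder PG id
ceph osd unset nodeep-scrub     # re-enable automatic scheduling afterwards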
I assume you also did ceph osd crush remove osd.. When you removed the OSD
that was down/out and already rebalanced off of, you changed the weight of the
host it was on, which triggers additional backfilling to rebalance the CRUSH map.
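One way to avoid paying for that data movement twice (a sketch, with osd.20 as a placeholder id) is to drop the CRUSH weight to zero first, let backfill finish, and only then remove the entry:
ceph osd crush reweight osd.20 0    # the host weight drops now, so backfill happens once
ceph osd crush remove osd.20        # afterwards this causes no further weight change and no extra backfill
ceph auth del osd.20
ceph osd rm osd.20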
Hi,
I managed to remove the warning by reweighting the crashed OSD:
ceph osd crush reweight osd.33 0.8
After the recovery, the cluster is not showing the warning any more.
Xabier
On 29/11/16 11:18, Xabier Elkano wrote:
> Hi all,
>
> my cluster is in WARN state because apparently there are some
Hi Sage, Sam,
We're impacted by this bug (case 01725311). Our cluster is running RHCS
2.0 and is no longer able to scrub or deep-scrub.
[1] http://tracker.ceph.com/issues/17859
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
[3] https://github.com/ceph/ceph/pull/11898
I'm worri
You’ll need to upgrade your kernel. It’s a terrible div by zero bug that occurs
while trying to calculate load. You can still use "top -b -n1" instead of ps,
but ultimately the kernel update fixed it for us. You can’t kill procs that are
in uninterruptible wait.
Here’s the Ubuntu version:
http
Hello,
> We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0
> and is no longer able to scrub or deep-scrub.
>
> [1] http://tracker.ceph.com/issues/17859
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
> [3] https://github.com/ceph/ceph/pull/11898
>
> I'm
Jumbo frames on the cluster network have been used by quite a few operators
without any problems. Admittedly, I’ve not run it that way in a year now, but
we plan on switching back to jumbo for the cluster.
I do agree that jumbo on the public could result in poor behavior from clients,
if you’re
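(As a sketch, enabling jumbo frames on a cluster-facing NIC is just the MTU change plus a path check; eth1 and the peer address below are placeholders:)
ip link set dev eth1 mtu 9000       # must match the switch ports and every other cluster-network host
ping -M do -s 8972 192.168.10.2     # 8972 bytes + 28 bytes of IP/ICMP headers = 9000, verifies the path end to end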
Hi list,
I am testing the Ceph cluster with impractical PG numbers to do some
experiments.
But when I use ceph -w to watch my cluster status, I see the PG numbers doubled.
From my ceph -w:
root@mon1:~# ceph -w
cluster 1c33bf75-e080-4a70-9fd8-860ff216f595
health HEALTH_WARN
too many PGs per OSD (514
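(The warning itself is roughly the sum over pools of pg_num times replica size, divided by the number of OSDs, compared against mon_pg_warn_max_per_osd, which defaults to 300. A quick way to check the inputs; the numbers in the comment are only illustrative:)
# e.g. 4 pools x 1024 PGs x size 3 / 24 OSDs ≈ 512 PGs per OSD
ceph osd dump | grep pool     # shows per-pool pg_num and replica size
ceph osd stat                 # shows the total number of OSDs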
Ok, you convinced me to increase size to 3 and min_size to 2. During my
time running ceph I only had issues like single disk or host failures -
nothing exotic, but I think it is better to be safe than sorry.
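(For reference, the change itself is just two pool settings; "rbd" below is a placeholder pool name:)
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2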
Kind regards,
Piotr Dzionek
W dniu 30.11.2016 o 12:16, Nick Fisk pisze:
On Thu, Dec 1, 2016 at 7:24 AM, Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:
>
> Hi Sage, Sam,
>
> We're impacted by this bug (case 01725311). Our cluster is running RHCS
> 2.0 and is no longer able to scrub or deep-scrub.
>
> [1] http://tracker.ceph.com/issues/17859
> [2] https://
Hi Yoann,
Thank you for your input. I was just told by RH support that it's going to make it
to RHCS 2.0 (10.2.3). Thank you guys for the fix!
We thought about increasing the number of PGs just after changing the
merge/split threshold values but this would have led to a _lot_ of data
movements (
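(Presumably those are the filestore thresholds; in ceph.conf they look like the lines below, with the values given purely as an example rather than a recommendation:)
[osd]
filestore merge threshold = 40
filestore split multiple = 8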
Hi,
I was using Hammer on some clients and Jewel on others, even though it
is NOT recommended.
I'd recommend you triple-check your rbd_default_features in
case you are mixing versions. This option isn't documented well and it
is easy to miss. I understand the reasons for this chang
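(A sketch of what that pinning can look like on the Jewel side, in ceph.conf under [client] or [global]; 1 means layering only, which Hammer-era and kernel clients understand:)
[client]
rbd default features = 1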
Apologies if this has been asked dozens of times before, but most answers are
from pre-Jewel days, and I want to double check that the methodology still holds.
I currently have 16 OSDs across 8 machines with on-disk journals, created using
ceph-deploy.
These machines have NVMe storage (Intel P3600
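The sequence I have in mind is roughly the following (a sketch; osd.0, the systemd unit and the partition path are placeholders for the real ids and devices):
ceph osd set noout                      # keep the cluster from rebalancing while the OSD is briefly down
systemctl stop ceph-osd@0
ceph-osd -i 0 --flush-journal           # drain whatever is still in the on-disk journal
ln -sf /dev/disk/by-partuuid/<uuid> /var/lib/ceph/osd/ceph-0/journal    # repoint the journal symlink at the NVMe partition
ceph-osd -i 0 --mkjournal               # create the new journal on the NVMe partition
systemctl start ceph-osd@0
ceph osd unset noout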
On Thu, 1 Dec 2016 18:06:38 -0600 Reed Dier wrote:
> Apologies if this has been asked dozens of times before, but most answers are
> from pre-Jewel days, and I want to double check that the methodology still
> holds.
>
It does.
> Currently I have 16 OSDs across 8 machines with on-disk journals, c
Good day.
I have set up a Ceph cluster and created several pools on 4 TB HDDs.
My problem is that the HDDs are filling unevenly.
root@ceph-node1:~# df -H
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       236G  2.7G  221G   2% /
none            4.1k     0  4.1k   0% /sys/fs/cgroup
You can reweight the OSDs either automatically based on utilization (ceph
osd reweight-by-utilization) or by hand.
See:
https://ceph.com/planet/ceph-osd-reweight/
http://docs.ceph.com/docs/master/rados/operations/control/#osd-subsystem
It's probably not ideal to have OSDs of such different size
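(For example, with osd.3, the 0.9 value and the 110 threshold as placeholders:)
ceph osd df                             # per-OSD utilization and current reweight values
ceph osd reweight-by-utilization 110    # adjust OSDs that are more than 10% over the average utilization
ceph osd reweight osd.3 0.9             # or override a single OSD by hand (value between 0 and 1)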
Hi David - Yep, I did the "ceph osd crush remove osd.", which started
the recovery.
My worry is: why is Ceph doing recovery if the OSD is already down
and no longer in the cluster? That means Ceph has already copied the down
OSD's objects to other OSDs. Here is the ceph osd tree output:
===