Hello,
I've found my Ceph v0.80.3 cluster in a state with 5 of 34 OSDs down
overnight, after months of running without change. From the Linux logs I
found out that the OSD processes were killed because they consumed all
available memory.
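For reference, that kind of OOM kill can be confirmed from the kernel log
with something along these lines (a sketch only - the exact log file differs
per distro):

  dmesg | grep -i "out of memory"
  grep -i "killed process" /var/log/syslog   # or /var/log/messages on RHEL/CentOS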
Those 5 failed OSDs were from different hosts of my 4-node
> - ceph osd unset noscrub
> - ceph osd unset nodeep-scrub
>
>
> ## For help identifying why memory usage was so high, please provide:
> * ceph osd dump | grep pool
> * ceph osd crush rule dump
>
> Let us know if this helps... I know it looks extreme, but it's worked for
> me in
leset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 1519 flags hashpspool
stripe_width 0
pool 12 'backups' replicated size 2 min_size 1 crush_ruleset 0 object_hash
rjenkins pg_num 1024 pgp_num 1024 last_change 862 flags hashpspool
stripe_width 0
pool 14 'volumes-cache
e the problematic OSD?
I'll welcome any ideas. Currently I'm keeping osd.10 in an automatic
restart loop with a 60-second pause before starting it again.
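For the record, the loop itself is trivial - roughly the following (a sketch
only; assuming the sysvinit ceph script, adjust for upstart/systemd as
needed):

  while true; do
      /etc/init.d/ceph start osd.10   # start the OSD again if it has died
      sleep 60                        # wait 60 seconds before the next attempt
  done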
Thanks and greetings,
Lukas
On Wed, Oct 29, 2014 at 8:04 PM, Lukáš Kubín wrote:
> I should have figured that out myself since I
ering and how it could relate to the
> load being seen.
>
> Hope this helps...
>
> Michael J. Kidd
> Sr. Storage Consultant
> Inktank Professional Services
> - by Red Hat
>
> On Thu, Oct 30, 2014 at 4:00 AM, Lukáš Kubín
> wrote:
>
>> Hi,
>> I've notice
ltant
> Inktank Professional Services
> - by Red Hat
>
> On Thu, Oct 30, 2014 at 11:00 AM, Lukáš Kubín
> wrote:
>
>> Thanks Michael, still no luck.
>>
>> Leaving the problematic osd.10 down has no effect. Within minutes more
>> OSDs fail with the same issue aft
e was somehow related to the caching tier. Does
anybody have an idea how to prevent this? Has anybody experienced a similar
issue with a writeback cache tier?
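In case it's relevant for anyone hitting the same thing: one setting worth
double-checking on a writeback cache pool is whether the flush/evict limits
were actually set, since the tiering agent only acts relative to
target_max_bytes/target_max_objects. Just an illustrative sketch - the pool
name matches our cache pool, the values are placeholders:

  ceph osd pool set volumes-cache target_max_bytes 500000000000  # cap the cache pool, size to your cache OSDs
  ceph osd pool set volumes-cache cache_target_dirty_ratio 0.4   # start flushing dirty objects at 40%
  ceph osd pool set volumes-cache cache_target_full_ratio 0.8    # start evicting at 80%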
Big thanks to Michael J. Kidd for all his support!
Best greetings,
Lukas
On Thu, Oct 30, 2014 at 8:18 PM, Lukáš Kubín wrote:
> Nevermind, yo
Hi,
I'm most probably hitting bug http://tracker.ceph.com/issues/13755 - when
libvirt mounted RBD disks suspend I/O during snapshot creation until hard
reboot.
My Ceph cluster (monitors and OSDs) is running v0.94.3, while clients
(OpenStack/KVM computes) run v0.94.5. Can I still update the client
AM, Lukáš Kubín
> wrote:
> > Hi,
> > I'm most probably hitting bug http://tracker.ceph.com/issues/13755 -
> when
> > libvirt mounted RBD disks suspend I/O during snapshot creation until hard
> > reboot.
> >
> > My Ceph cluster (monitors and OSDs) is
Hi,
I'm running a very small setup of 2 nodes with 6 OSDs each. There are 2
pools, each of size=2. Today one of our OSDs got full and another 2 are
near full; the cluster turned into ERR state.
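For reference, per-OSD utilization can be checked along these lines (a
minimal sketch; ceph osd df needs Hammer or newer):

  ceph health detail   # lists the full and near-full OSDs
  ceph osd df          # per-OSD utilization and weight
  ceph df              # per-pool usage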
I have noticed uneven space distribution among OSD drives, between 70 and
100 percent. I have realized there's a low a
size in TB.
>
> Beware that reweighting will (afaik) only shuffle the data to other local
> drives, so you should reweight both of the full drives at the same time and
> only by a little bit at a time (0.95 is a good starting point).
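> For illustration only, that would be something like the two commands below,
> run together (the OSD ids here are placeholders - use the ids of your two
> full drives):
>
>   ceph osd reweight 3 0.95   # first full OSD
>   ceph osd reweight 7 0.95   # second full OSD, reweighted in the same step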
>
> Jan
>
>
>
> On 17 Feb 2016, at 21:43, L
ly really full.
> OSDs don't usually go down when "full" (95%) .. or do they? I don't think
> so... so the reason they stopped is likely a completely full filesystem.
> You have to move something out of the way, restart those OSDs with lower
> reweight and hopefully
1.0
9 0.53999 osd.9 up 1.0 1.0
10 0.53999 osd.10 up 1.0 1.0
11 0.26999 osd.11 up 1.0 1.0
On Wed, Feb 17, 2016 at 9:43 PM Lukáš Kubín wrote:
> Hi,
> I'm running a very small setup of 2 node
t have to risk data loss.
>
> It usually doesn't take much before you can restart the OSDs and let ceph
> take care of the rest.
>
> Bryan
>
> From: ceph-users on behalf of Lukáš
> Kubín
> Date: Thursday, February 18, 2016 at 2:39 PM
> To: "ceph-users
Hello,
I am considering enabling optimal crush tunables in our Jewel cluster (4
nodes, 52 OSDs, used as an OpenStack Cinder+Nova backend = RBD images). I've
got two questions:
1. Do I understand correctly that having the optimal tunables on can be
considered best practice and should be applied in most sce
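For reference, the change itself would just be the single command below (a
sketch; I do understand it will trigger a major rebalance):

  ceph osd crush tunables optimal   # expect substantial data movement afterwards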
Hello,
yesterday I added a 4th OSD node (an increase from 39 to 52 OSDs) to our
Jewel cluster. Backfilling of the remapped PGs is still running, and it seems
it will run for another day until complete.
I know the pg_num of the largest pool is undersized and I should increase it
from 512 to 2048.
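For reference, the change would be along these lines (the pool name is just
an example; pgp_num has to follow pg_num, and it may make sense to step
through intermediate values rather than jump straight to 2048):

  ceph osd pool set volumes pg_num 2048    # 'volumes' stands in for the actual pool
  ceph osd pool set volumes pgp_num 2048   # pgp_num must be raised to match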
The question is -