[ceph-users] Degraded data redundancy (low space): 1 pg backfill_toofull

2018-07-28 Thread Sebastian Igerl
Hi, I added 4 more OSDs on my 4-node test cluster and now I'm in HEALTH_ERR state. Right now it's still recovering, but still, should this happen? None of my OSDs are full. Maybe I need more PGs? But since my %USE is < 40%, it should still be able to recover without HEALTH_ERR? data: pools:
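
A minimal way to confirm whether any OSD is actually near the ratios that trigger backfill_toofull (assuming a standard Ceph CLI; the defaults are roughly nearfull 0.85, backfillfull 0.90, full 0.95):

    # Which health checks are active and which PG is backfill_toofull
    ceph health detail
    # Per-OSD utilisation: compare %USE against the ratios below
    ceph osd df tree
    # The configured ratios themselves
    ceph osd dump | grep ratio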

Re: [ceph-users] Degraded data redundancy (low space): 1 pg backfill_toofull

2018-07-28 Thread Sinan Polat
Ceph has tried to (re)balance your data; backfill_toofull means there was no available space to move data to, yet you have plenty of space. Why do you have so few PGs? I would increase the number of PGs, but before doing so let's see what others say. Sinan > On 28 Jul 2018, at 11:50, Sebastian
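
If raising the PG count turns out to be the right move, a sketch of the usual sequence (the pool name 'rbd' and the target of 256 are placeholders; size the target for your own OSD count rather than copying the number):

    # Current pg_num/pgp_num per pool
    ceph osd pool ls detail
    # Raise pg_num first, then pgp_num, ideally in modest steps
    ceph osd pool set rbd pg_num 256
    ceph osd pool set rbd pgp_num 256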

Re: [ceph-users] Degraded data redundancy (low space): 1 pg backfill_toofull

2018-07-28 Thread Sebastian Igerl
I set up my test cluster many years ago with only 3 OSDs and never increased the PGs :-) I plan on doing so after it's healthy again... it's long overdue... maybe 512 :-) And yes, that's what I thought too... it should have more than enough space to move data... hmm... I wouldn't be surprised if I

Re: [ceph-users] Degraded data redundancy (low space): 1 pg backfill_toofull

2018-07-28 Thread Sebastian Igerl
Well... it repaired itself... hmm... still strange :-) [INF] Health check cleared: PG_DEGRADED_FULL (was: Degraded data redundancy (low space): 1 pg backfill_toofull) On Sat, Jul 28, 2018 at 12:03 PM Sinan Polat wrote: > Ceph has tried to (re)balance your data, backfill_toofull means no > a

[ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Dear users and developers. I've updated our dev cluster from v13.2.0 to v13.2.1 yesterday and since then everything is badly broken. I've restarted all Ceph components via "systemctl" and also rebooted the servers SDS21 and SDS24, but nothing changed. This cluster started as Kraken, was updated to
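
For reference, on a systemd-managed deployment the restart described above usually looks something like this (host-local commands; the targets are the ones shipped with the packages):

    # Restart every Ceph daemon on this host, or per daemon class
    systemctl restart ceph.target
    systemctl restart ceph-mon.target ceph-mgr.target ceph-osd.target
    # Then check what the cluster reports as running
    ceph -s
    ceph versions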

Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread Sage Weil
Can you include more of your OSD log file? On July 28, 2018 9:46:16 AM CDT, ceph.nov...@habmalnefrage.de wrote: >Dear users and developers. >  >I've updated our dev cluster from v13.2.0 to v13.2.1 yesterday and >since then everything is badly broken. >I've restarted all Ceph components via "system
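
A sketch of where to pull an OSD log from on a packaged install (osd.3 is a placeholder id; raise verbosity only while reproducing the problem):

    # Default file location
    less /var/log/ceph/ceph-osd.3.log
    # Or the journal of the matching systemd unit
    journalctl -u ceph-osd@3 --since "1 hour ago"
    # Temporarily increase OSD debug logging
    ceph tell osd.3 injectargs '--debug_osd 20/20'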

Re: [ceph-users] Slack-IRC integration

2018-07-28 Thread Alex Gorbachev
-- Forwarded message -- From: Matt.Brown Can you please add me to the ceph-storage slack channel? Thanks! Me too, please -- Alex Gorbachev Storcium - Matt Brown | Lead Engineer | Infrastructure Services – Cloud & Compute | Target | 7000 Target Pkwy N., NCE-0706 | Brooklyn P

Re: [ceph-users] Setting up Ceph on EC2 i3 instances

2018-07-28 Thread Sean Redmond
Hi, you may need to consider the latency between the AZs; it may make it difficult to get very high IOPS - I suspect that is the reason EBS is replicated within a single AZ. Do you have any data that shows the latency between the AZs? Thanks On Sat, 28 Jul 2018, 05:52 Mansoor Ahmed, wrote: > H
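
A quick, rough way to measure inter-AZ latency between two cluster nodes (hostnames are placeholders; iperf3 has to be installed on both ends):

    # Round-trip time from a node in one AZ to a node in another
    ping -c 20 osd-node-az-b
    # Throughput under load
    iperf3 -s                  # on the remote node
    iperf3 -c osd-node-az-b    # from the local node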

[ceph-users] Upgrade Ceph 13.2.0 -> 13.2.1 and Windows iSCSI clients breakup

2018-07-28 Thread Wladimir Mutel
Dear all, I want to share some experience of upgrading my experimental 1-host Ceph cluster from v13.2.0 to v13.2.1. First, I fetched the new packages and installed them using 'apt dist-upgrade', which went smoothly as usual. Then I noticed from 'lsof' that the Ceph daemons were not restarted
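
One way to spot the situation described here, i.e. daemons still running the old binaries after 'apt dist-upgrade' (assumes a systemd/Debian-style install):

    # Versions the daemons are actually running vs. the installed packages
    ceph versions
    dpkg -l | grep ceph
    # Running daemons still holding the replaced (deleted) binaries open
    lsof | grep ceph-osd | grep deleted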

Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Hi Sage. Sure. Any specific OSD log(s)? Or just any? Sent: Saturday, 28 July 2018 at 16:49 From: "Sage Weil" To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org Subject: Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released") Can you

[ceph-users] Help needed to recover from cache tier OSD crash

2018-07-28 Thread Dmitry
Hello all, would someone please help with recovering from a recent failure of all cache tier pool OSDs? My Ceph cluster has a usual replica-2 pool with a writeback cache tier of two 500 GB SSD OSDs over it (also replica 2). Both cache OSDs were created with the standard ceph-deploy tool, and have 2
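
Before attempting any recovery it helps to capture the current tier relationship and PG state (pool names and ids are whatever your cluster uses):

    # Pool layout; look for 'tier of' and 'cache_mode' on the cache pool
    ceph osd pool ls detail
    # PGs that are down/incomplete after the cache OSD failures
    ceph health detail
    ceph pg dump_stuck inactive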

Re: [ceph-users] Slack-IRC integration

2018-07-28 Thread Dan van der Ster
It's here: https://ceph-storage.slack.com/ but for some reason the list of accepted email domains is limited. I have no idea who is maintaining this. Anyway, the Slack is just mirroring #ceph and #ceph-devel on IRC, so it's better to connect there directly. Cheers, Dan On Sat, Jul 28, 2018, 6:59 PM

Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread ceph . novice
Have you guys changed something with the systemctl startup of the OSDs? I've stopped and disabled all the OSDs on all my hosts via "systemctl stop|disable ceph-osd.target" and rebooted all the nodes. Everything looks just the same. Then I started all the OSD daemons one after the other via the CLI
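
For comparison, starting a single OSD through its per-daemon unit (rather than the target) typically looks like this; the id 12 is a placeholder:

    # Re-enable and start one OSD
    systemctl enable ceph-osd@12
    systemctl start ceph-osd@12
    # Confirm it rejoins the cluster
    ceph osd tree | grep osd.12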

Re: [ceph-users] HELP! --> CLUSTER DOWN (was "v13.2.1 Mimic released")

2018-07-28 Thread Vasu Kulkarni
On Sat, Jul 28, 2018 at 6:02 PM, wrote: > Have you guys changed something with the systemctl startup of the OSDs? I think there is some kind of systemd issue hidden in Mimic, https://tracker.ceph.com/issues/25004 > > I've stopped and disabled all the OSDs on all my hosts via "systemctl > stop|d