OK, now I understand. Thanks for all these helpful answers!
On Sat, Apr 7, 2018, 15:26 David Turner wrote:
I'm seconding what Greg is saying. There is no reason to set nobackfill and
norecover just for restarting OSDs. That will only cause the problems
you're seeing without giving you any benefit. There are reasons to use
norecover and nobackfill, but unless you're manually editing the crush map,
having
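For a planned restart, noout on its own is usually enough. A rough sketch,
with a placeholder OSD id (adjust for your environment):

    ceph osd set noout
    systemctl restart ceph-osd@12     # restart the OSDs you need to touch
    ceph -s                           # wait until they are back up and in
    ceph osd unset noout

That way nothing gets marked out while the daemons are briefly down, and no
backfill or recovery work piles up behind the flags.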
On Thu, Mar 29, 2018 at 3:17 PM Damian Dabrowski wrote:
Greg, thanks for your reply!
I think your idea makes sense. I've done some tests, and it's quite hard for
me to understand, so I'll try to explain my situation in a few steps
below.
I think that Ceph shows progress in recovery, but it can only handle
objects which haven't really changed. It won't try to repair
On Thu, Mar 29, 2018 at 7:27 AM Damian Dabrowski wrote:
Hello,
A few days ago I had a very strange situation.
I had to turn off a few OSDs for a while, so I set the flags noout,
nobackfill, and norecover, and then turned off the selected OSDs.
All was OK, but when I started these OSDs again, all VMs went down due
to the recovery process (even when the recovery priority was ver
I was able to export the PGs using ceph-objectstore-tool and import
them to the new OSDs.
I moved some other OSDs from bare metal on a node into a virtual
machine on the same node and was surprised at how easy it was. Install Ceph
in the VM (using ceph-deploy), stop the OSD and dismount
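Roughly, the export/import looks like this (PG id, OSD ids, and file path are
placeholders; the OSDs have to be stopped while ceph-objectstore-tool runs,
and filestore OSDs may also need --journal-path):

    # on the source host, with ceph-osd@12 stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op export --pgid 1.28 --file /tmp/pg1.28.export
    # on the destination host, with the target OSD stopped
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-40 \
        --op import --file /tmp/pg1.28.export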
On Fri, Jul 21, 2017 at 10:23 PM Daniel K wrote:
Luminous 12.1.0 (RC)
I replaced two OSD drives (the old ones were still good, just too small), using:
ceph osd out osd.12
ceph osd crush remove osd.12
ceph auth del osd.12
systemctl stop ceph-osd@osd.12
ceph osd rm osd.12
I later found that I also should have unmounted it from /var/lib/ceph/osd-12
(r
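For what it's worth, a sketch of the full removal sequence including the
stop/unmount step (osd.12 and the default mount point are placeholders):

    ceph osd out 12
    systemctl stop ceph-osd@12
    umount /var/lib/ceph/osd/ceph-12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12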
I just responded to this on the thread "Strange remap on host failure". I
think that response covers your question.
On Mon, May 29, 2017, 4:10 PM Laszlo Budai wrote:
Hello,
Can someone give me some directions on how Ceph recovery works?
Let's suppose we have a Ceph cluster with several nodes grouped in 3 racks (2
nodes/rack). The crush map is configured to distribute PGs on OSDs from
different racks.
What happens if a node fails? Where can I read a des
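For reference, a rule that spreads replicas across racks looks roughly like
this in pre-Luminous crush map syntax (names and numbers are placeholders):

    rule replicated_racks {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type rack
        step emit
    }

With such a rule each PG keeps one copy per rack, so losing a node only
degrades the PGs that had a copy on that node; the other racks still hold
the remaining replicas.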
We are using Ceph 0.80.9 and we recently recovered from a power outage which
caused some data loss. We had the replica count set to 1. Since then we have
installed another node with the idea that we would change the replica count to 3.
We tried to change one of the pools to replica 3, but it always gets stuck.
It's be
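For context, the size change itself is one setting per pool (pool name is a
placeholder; min_size is typically raised alongside), and stuck PGs can be
listed while it backfills:

    ceph osd pool set <poolname> size 3
    ceph osd pool set <poolname> min_size 2
    ceph pg dump_stuck unclean      # see which PGs are not going active+clean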
You mean that you never see recovery without crush map removal? That
is strange. I see quick recovery in our two small clusters and even in
our production cluster when a daemon is killed.
It's only when an OSD crashes that I don't see recovery in production.
Let me talk to the ceph-devel community to find wheth
Hi Gaurav,
It could be an issue. But, I never see crush map removal without recovery.
Best regards,
On Wed, May 18, 2016 at 1:41 PM, Gaurav Bafna wrote:
Is it a known issue, and is it expected?
When an OSD is marked out, the reweight becomes 0 and the PGs should
get remapped, right?
I do see recovery after removing it from the crush map.
Thanks
Gaurav
On Wed, May 18, 2016 at 12:08 PM, Lazuardi Nasution wrote:
Hi Gaurav,
Not only marked out; you need to remove it from the crush map to make sure the
cluster does auto recovery. It seems that the marked-out OSD still appears in the
crush map calculation, so it must be removed manually. You will see that
there is a recovery process after you remove the OSD from the crush map.
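In other words, something along these lines, with osd.12 as a placeholder id:

    ceph osd crush remove osd.12
    ceph -w      # recovery/backfill should start once the OSD is out of the crush map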
Hi Lazuardi,
No, there are no unfound or incomplete PGs.
Replacing the OSDs surely makes the cluster healthy, but the problem
should not have occurred in the first place. The cluster should have
automatically healed after the OSDs were marked out of the cluster;
otherwise this will be a manual process
Gaurav,
Are there any unfound or incomplete PGs? If not, you can remove the OSDs (while
monitoring ceph -w and ceph -s output) and then replace each with a good one,
one OSD at a time. I have done that successfully.
Best regards,
Hi Wido,
The 75% happened on 4 nodes of 24 OSDs each, with a pool size of two and a
minimum size of one. Is there any relation between this configuration and the 75%?
Best regards,
I also faced the same issue with our production cluster:
cluster fac04d85-db48-4564-b821-deebda046261
health HEALTH_WARN
658 pgs degraded
658 pgs stuck degraded
688 pgs stuck unclean
658 pgs stuck undersized
658 pgs undersized
Hi Wido,
Yes, you are right. After removing the down OSDs, reformatting, and bringing
them up again, at least until 75% of the total OSDs, my Ceph cluster is healthy
again. It seems there is a high probability of data safety if the total active
PGs is the same as the total PGs and the total degraded PGs is the same as the total un
Hi Wido,
The status is the same after 24 hours of running. It seems that the status will not
go to fully active+clean until all the down OSDs are back again. The only way to
get the down OSDs back is reformatting, or replacing the HDDs if they have a
hardware issue. Do you think that is a safe way to do it?
Best regards,
Hi,
After the disaster and restarting for automatic recovery, I found the following
ceph status. Some OSDs cannot be restarted due to file system corruption
(it seems that XFS is fragile).
[root@management-b ~]# ceph status
cluster 3810e9eb-9ece-4804-8c56-b986e7bb5627
health HEALTH_WARN
I expected it to return to osd.36. Oh, if you set "noout" during this
process then the pg won't move around when you down osd.36. I expected
osd.36 to go down and back up quickly.
Also, the pg 10.4f is the same situation, so try the same thing on osd.6.
David
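Roughly (the restart command depends on your init system; this assumes systemd):

    ceph osd set noout
    systemctl restart ceph-osd@36
    ceph -w                  # wait for osd.36 to come back up
    ceph osd unset noout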
On 3/8/16 1:05 PM, Ben Hines
After making that setting, the pg appeared to start peering, but then it
actually changed the primary OSD to osd.100 and went incomplete again.
Perhaps it did that because another OSD had more data? I presume I need to
set that value on each OSD that the pg hops to.
-Ben
On Tue, Mar 8, 2016 at
Ben,
I haven't looked at everything in your message, but pg 12.7a1 has lost
data because of writes that went only to osd.73. The way to recover from
this is to force recovery to ignore that fact and go with whatever data
you have on the remaining OSDs.
I assume that having min_size 1, having multip
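Before forcing anything, it is worth checking what the surviving OSDs report
for that PG; a query is the usual starting point:

    ceph pg 12.7a1 query     # see the recovery_state section for why peering is blocked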
Howdy,
I was hoping someone could help me recover a couple of PGs which are causing
problems in my cluster. If we aren't able to resolve this soon, we may have
to just destroy them and lose some data. Recovery has so far been
unsuccessful. Data loss would probably cause some here to reconsider Ceph
a
Well, yes, “pretty much” the same thing :).
I think some people would like to distinguish recovery from replication and
maybe perform some QoS around these two.
We have to replicate while recovering, so one can impact the other.
In the end, I just think it’s a doc issue; still waiting for a dev to ans
My understanding is that Monitors monitor the public address of the
OSDs and other OSDs monitor the cluster address of the OSDs.
Replication, recovery and backfill traffic all use the same network
when you specify 'cluster network = ' in your ceph.conf.
It is useful to remember that replication, re
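In ceph.conf that is just the two network options; a sketch with placeholder
subnets:

    [global]
    public network = 192.168.0.0/24
    cluster network = 10.10.0.0/24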
Hi list,
While reading this
http://ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-networks,
I came across the following sentence:
"You can also establish a separate cluster network to handle OSD heartbeat,
object replication and recovery traffic”
I didn’t know it was possib
From: Kurt Bauer <kurt.ba...@univie.ac.at>
Date: Tuesday, November 5, 2013 2:52 PM
To: Kevin Weiler <kevin.wei...@imc-chicago.com>
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] ce
Kevin Weiler wrote:
> Thanks Kyle,
>
> What's the unit for osd recovery max chunk?
Have a look at
http://ceph.com/docs/master/rados/configuration/osd-config-ref/ where
all the possible OSD config options are described; in particular, have a
look at the backfilling and recovery sections.
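For the follow-up question about current values: they can be read from the
OSD admin socket (the socket path below is the default; adjust the OSD id),
and osd recovery max chunk is in bytes (8388608 = 8 MB):

    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep recovery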
Thanks Kyle,
What's the unit for osd recovery max chunk?
Also, how do I find out what my current values are for these osd options?
--
Kevin Weiler
IT
IMC Financial Markets | 233 S. Wacker Drive, Suite 4300 | Chicago, IL
60606 | http://imc-chicago.com/
Phone: +1 312-204-7439 | Fax: +1 312-24
Hello Rzk,
Would you like to share your experience with this problem and your way of solving
it? This sounds interesting.
Regards,
Karan Singh
- Original Message -
From: "Rzk"
To: ceph-users@lists.ceph.com
Sent: Wednesday, 30 October, 2013 4:17:32 AM
Subject: Re: [
Thanks guys,
After testing it on a dev server, I have implemented the new config in the prod
system.
Next I will upgrade the hard drives. :)
Thanks again, all.
On Tue, Oct 29, 2013 at 11:32 PM, Kyle Bader wrote:
Recovering from a degraded state by copying existing replicas to other OSDs
is going to cause reads on existing replicas and writes to the new
locations. If you have slow media then this is going to be felt more
acutely. Tuning the backfill options I posted is one way to lessen the
impact, another
Hi,
maybe you want to have a look at the following thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-October/005368.html
Could be that you suffer from the same problems.
best regards,
Kurt
Rzk wrote:
Hi all,
I have the same problem, just curious:
could it be caused by poor HDD performance?
The read/write speed doesn't match the network speed?
Currently I'm using desktop HDDs in my cluster.
Regards,
Rzk
On Tue, Oct 29, 2013 at 6:22 AM, Kyle Bader wrote:
You can change some OSD tunables to lower the priority of backfills:
osd recovery max chunk: 8388608
osd recovery op priority: 2
In general a lower op priority means it will take longer for your
placement groups to go from degraded to active+clean, the idea is to
balance recover
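These can go in the [osd] section of ceph.conf:

    [osd]
    osd recovery max chunk = 8388608
    osd recovery op priority = 2

or be injected at runtime so no restart is needed (a sketch; values as above):

    ceph tell osd.* injectargs '--osd-recovery-max-chunk 8388608 --osd-recovery-op-priority 2'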
Hi all,
We have a Ceph cluster that is being used as a backing store for several VMs
(Windows and Linux). We notice that when we reboot a node, the cluster enters a
degraded state (which is expected), but when it begins to recover, it starts
backfilling and it kills the performance of our VMs. The