Quoting Jelle de Jong (jelledej...@powercraft.nl):
>
> It took three days to recover and during this time clients were not
> responsive.
>
> How can I migrate to bluestore without inactive pgs or slow requests? I have
> several more filestore clusters and I would like to know how to migrate
> witho
Jelle,
Try putting just the WAL on the Optane NVMe. I'm guessing your DB is too big
to fit within 5GB. We used a 5GB journal on our nodes as well, but when we
switched to BlueStore (using ceph-volume lvm batch) it created 37GiB logical
volumes for our DBs (a 200GB SSD split across 5 OSDs, or a 400GB SSD across 10).
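For anyone wanting to try that, a minimal sketch of creating a bluestore OSD
with only the WAL on the NVMe (device, VG and LV names here are made up for
illustration, not taken from the original setup):

  $ lvcreate -L 5G -n wal-sda ceph-nvme        # small WAL LV on the Optane VG
  $ ceph-volume lvm create --bluestore \
        --data /dev/sda \
        --block.wal ceph-nvme/wal-sda          # WAL only; omit --block.db so the DB stays on the data device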
> Sent: 09 September 2019 23:25
> To: Byrne, Thomas (STFC,RAL,SC)
> Cc: ceph-users
> Subject: Re: [ceph-users] Help understanding EC object reads
>
> On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
> wrote:
> >
> > Hi all,
> >
> > I’m investiga
On Thu, Aug 29, 2019 at 4:57 AM Thomas Byrne - UKRI STFC
wrote:
>
> Hi all,
>
> I’m investigating an issue with the (non-Ceph) caching layers of our large EC
> cluster. It seems to be turning users' requests for whole objects into lots of
> small byte-range requests reaching the OSDs, but I’m not
Arun,
This is what I already suggested in my first reply.
Kind regards,
Caspar
On Sat, 5 Jan 2019 at 06:52, Arun POONIA <
arun.poo...@nuagenetworks.net> wrote:
> Hi Kevin,
>
> You are right. Increasing number of PGs per OSD resolved the issue. I will
> probably add this config in /etc/ceph/ceph
Hi Kevin,
You are right. Increasing the allowed number of PGs per OSD resolved the issue. I
will probably add this config to the /etc/ceph/ceph.conf file on the ceph mons and
OSDs so it applies on host boot.
Thanks
Arun
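For reference, a sketch of what that ceph.conf stanza could look like on the
mons and OSD hosts; the values are only illustrative and have to exceed the
PGs-per-OSD count the cluster actually reports:

  [global]
  mon max pg per osd = 3000
  osd max pg per osd hard ratio = 10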
On Fri, Jan 4, 2019 at 3:46 PM Kevin Olbrich wrote:
> Hi Arun,
>
> actually deleting was no goo
Hi Arun,
actually deleting was not a good idea; that's why I wrote that the OSDs
should be "out".
You have down PGs; that is because the data is on OSDs that are
unavailable but known by the cluster.
This can be checked by using "ceph pg 0.5 query" (change PG name).
Because your PG count is so much ove
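To illustrate the suggested check (the PG id is just an example, take one from
ceph health detail):

  $ ceph pg 0.5 query | less
  # look at "recovery_state", "blocked_by" and "down_osds_we_would_probe" in the
  # output to see which (possibly already destroyed) OSDs the PG is waiting for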
Hi Kevin,
I tried deleting the newly added server from the Ceph cluster and it looks like
Ceph is not recovering. I agree about unfound data, but it doesn't report
unfound data; it reports inactive/down PGs and I can't bring them up.
[root@fre101 ~]# ceph health detail
2019-01-04 15:17:05.711641 7f27b0f3
I don't think this will help you. Unfound means, the cluster is unable
to find the data anywhere (it's lost).
It would be sufficient to shut down the new host - the OSDs will then be out.
You can also force-heal the cluster, something like "do your best possible":
ceph pg 2.5 mark_unfound_lost re
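The command referred to here is mark_unfound_lost; it takes either revert
(roll objects back to a previous version where possible) or delete (give them
up entirely), with the PG id only an example:

  $ ceph pg 2.5 mark_unfound_lost revert
  $ ceph pg 2.5 mark_unfound_lost delete    # only if revert is not possible for that PG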
Hi Kevin,
Can I remove the newly added server from the cluster and see if that heals it?
The hard-disk IOPS on the new server are very low compared to the existing
cluster servers.
Indeed this is a critical cluster, but I don't have the expertise to make it
flawless.
Thanks
Arun
On Fri, Jan 4, 20
If you really created and destroyed OSDs before the cluster healed
itself, this data will be permanently lost (not found / inactive).
Also, your PG count is so oversized that the peering calculation
will most likely break, because this was never tested.
If this is a critical cluster, I would sta
Can anyone comment on this issue please, I can't seem to bring my cluster
healthy.
On Fri, Jan 4, 2019 at 6:26 AM Arun POONIA
wrote:
> Hi Caspar,
>
> The number of IOPS is also quite low. It used to be around 1K+ on one of the
> pools (VMs); now it's close to 10-30.
>
> Thanks
> Arun
>
> On Fri, Ja
Hi Caspar,
The number of IOPS is also quite low. It used to be around 1K+ on one of the
pools (VMs); now it's close to 10-30.
Thanks
Arun
On Fri, Jan 4, 2019 at 5:41 AM Arun POONIA
wrote:
> Hi Caspar,
>
> Yes and No, numbers are going up and down. If I run ceph -s command I can
> see it decreases
Hi Caspar,
Yes and no, the numbers are going up and down. If I run the ceph -s command I
can see it decrease one time and then increase again later. There are so
many blocked/slow requests; almost all the OSDs have slow requests. Around
12% of PGs are inactive and I'm not sure how to activate them again.
[ro
Are the numbers still decreasing?
This one for instance:
"3883 PGs pending on creation"
Caspar
On Fri, 4 Jan 2019 at 14:23, Arun POONIA <
arun.poo...@nuagenetworks.net> wrote:
> Hi Caspar,
>
> Yes, cluster was working fine with number of PGs per OSD warning up until
> now. I am not sure how t
Hi Caspar,
Yes, the cluster was working fine (with the PGs-per-OSD warning) up until
now. I am not sure how to recover from stale down/inactive PGs. If you
happen to know about this, can you let me know?
Current State:
[root@fre101 ~]# ceph -s
2019-01-04 05:22:05.942349 7f314f613700 -1 asok(0x7f3
Hi Arun,
How did you end up with a 'working' cluster with so many pgs per OSD?
"too many PGs per OSD (2968 > max 200)"
To (temporarily) allow this kind of pgs per osd you could try this:
Change these values in the global section in your ceph.conf:
mon max pg per osd = 200
osd max pg per osd ha
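The pasted values are cut off above; note that to actually clear a
"2968 > max 200" warning the limit has to exceed the observed count. A sketch
of applying it at runtime (values illustrative):

  $ ceph tell mon.* injectargs '--mon_max_pg_per_osd=3000'
  $ ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio=10'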
Hi Chris,
Indeed, that's what happened. I didn't set the noout flag either, and I
zapped the disk on the new server every time. In my cluster status, fre201 is
the only new server.
Current Status after enabling 3 OSDs on fre201 host.
[root@fre201 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWE
If you added OSDs and then deleted them repeatedly without waiting for
replication to finish as the cluster attempted to re-balance across them,
it's highly likely that you are permanently missing PGs (especially if the
disks were zapped each time).
If those 3 down OSDs can be revived there is
Thanks, Sage! That did the trick.
Wido, seems like an interesting approach but I wasn't brave enough to
attempt it!
Eric, I suppose this does the same thing that the crushtool reclassify
feature does?
Thank you both for your suggestions.
For posterity:
- I grabbed some 14.0.1 packages, extrac
Hi David,
CERN has provided a Python script to swap the relevant bucket IDs
(default <-> hdd); you can find it here:
https://github.com/cernceph/ceph-scripts/blob/master/tools/device-class-id-swap.py
The principle is the following:
- extract the CRUSH map
- run the script on it => it create
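The surrounding round trip for editing the CRUSH map, with the script
invocation itself only a guess since its exact usage isn't shown here:

  $ ceph osd getcrushmap -o crushmap.bin
  $ crushtool -d crushmap.bin -o crushmap.txt
  $ python device-class-id-swap.py crushmap.txt > crushmap-new.txt   # invocation is an assumption
  $ crushtool -c crushmap-new.txt -o crushmap-new.bin
  $ ceph osd setcrushmap -i crushmap-new.bin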
On Sun, 30 Dec 2018, David C wrote:
> Hi All
>
> I'm trying to set the existing pools in a Luminous cluster to use the hdd
> device-class but without moving data around. If I just create a new rule
> using the hdd class and set my pools to use that new rule it will cause a
> huge amount of data mo
Sun, 2 Dec 2018, 20:38 Paul Emmerich paul.emmer...@croit.io:
> 10 copies for a replicated setup seems... excessive.
>
I'm trying to create a golang package for a simple key-value store that uses
the ceph crushmap to distribute data.
Each namespace has a ceph crushmap rule attached to it.
>
10 copies for a replicated setup seems... excessive.
The rules are quite simple, for example rule 1 could be:
take default
choose firstn 5 type datacenter # picks 5 datacenters
chooseleaf firstn 2 type host # 2 different hosts in each datacenter
emit
rule 2 is the same but type region and first
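Written out in full crushmap syntax, rule 1 would look something like the
following (rule name and id are illustrative; older releases use "ruleset"
instead of "id"):

  rule replicated_5dc {
          id 1
          type replicated
          min_size 1
          max_size 10
          step take default
          step choose firstn 5 type datacenter
          step chooseleaf firstn 2 type host
          step emit
  }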
That turned out to be exactly the issue (And boy was it fun clearing pgs
out on 71 OSDs). I think it's caused by a combination of two factors.
1. This cluster has way too many placement groups per OSD (just north of
800). It was fine when we first created all the pools, but upgrades (most
recently t
Yeah, don't run these commands blind. They are changing the local metadata
of the PG in ways that may make it inconsistent with the overall cluster
and result in lost data.
Brett, it seems this issue has come up several times in the field but we
haven't been able to reproduce it locally or get eno
Can you file a tracker issue for this
(http://tracker.ceph.com/projects/ceph/issues/new)? Email, once it gets
lengthy, is not great for tracking an issue. Ideally, full details of the
environment (OS/Ceph versions, before/after, workload info, tool used
for upgrade) are important if one has to recreate it. There a
Hi,
Sorry to hear that. I’ve been battling with mine for 2 weeks :/
I’ve corrected my OSDs with the following commands. My OSD logs
(/var/log/ceph/ceph-OSDx.log) have a line including log(ERR) with the PG number
next to it, just before the crash dump.
ceph-objectstore-tool --data-path /var/lib/ceph/os
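The exact ceph-objectstore-tool invocation is cut off above. For orientation,
the general shape is below, shown with the read-only info op and an
illustrative OSD/PG; as noted elsewhere in this thread, the ops that rewrite
PG metadata should not be run blind:

  $ systemctl stop ceph-osd@12
  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 1.2f --op info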
Hi Paul,
Yes, all monitors have been restarted.
Chad.
Did you restart the mons or inject the option?
Paul
2018-09-12 17:40 GMT+02:00 Chad William Seys :
> Hi all,
> I'm having trouble turning off the warning "1 pools have many more objects
> per pg than average".
>
> I've tried a lot of variations on the below, my current ceph.conf:
>
> #...
> [mo
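As far as I know the option behind that warning is mon_pg_warn_max_object_skew,
and on Luminous and later it is read by ceph-mgr rather than the mons, so
restarting only the mons may explain why the change doesn't take. A sketch:

  [global]
  mon pg warn max object skew = 0   # 0 (or a very large value) disables the check
  # then restart (or inject the option into) the active mgr as well as the mons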
Thanks for the info.
On Thu, Sep 6, 2018 at 7:03 PM Darius Kasparavičius
wrote:
> Hello,
>
> I'm currently running a similar setup. It's running a bluestore OSD
> with 1 NVME device for db/wal devices. That NVME device is not large
> enough to support 160GB db partition per osd, so I'm stuck with
Hello,
I'm currently running a similar setup. It's running bluestore OSDs
with one NVMe device for db/wal. That NVMe device is not large
enough to support a 160GB db partition per OSD, so I'm stuck with 50GB
each. Currently I haven't had any issues with slowdowns or crashes.
The cluster is rela
To: Muhammad Junaid
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] help needed
The official ceph documentation recommendations for a db partition for a 4TB
bluestore osd would be 160GB each.
Samsung Evo Pro is not an Enterprise-class SSD. A quick search of the ML will
show which SSDs people are using.
The official ceph documentation recommendations for a db partition for a
4TB bluestore osd would be 160GB each.
Samsung Evo Pro is not an Enterprise-class SSD. A quick search of the ML
will show which SSDs people are using.
As was already suggested, the better option is an HBA as opposed to a ra
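The 160GB figure follows from the sizing guideline in the bluestore docs of
roughly 4% of the data device for block.db:

  4 TB x 0.04 = 160 GB of block.db per OSD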
Thanks. Can you please clarify: if we use any other enterprise-class SSD
for the journal, should we enable the write-back caching available on the RAID
controller for the journal device, or connect it as write-through? Regards.
On Thu, Sep 6, 2018 at 4:50 PM Marc Roos wrote:
>
>
>
> Do not use Samsung 850 PRO fo
Do not use Samsung 850 PRO for journal
Just use LSI logic HBA (eg. SAS2308)
-Original Message-
From: Muhammad Junaid [mailto:junaid.fsd...@gmail.com]
Sent: donderdag 6 september 2018 13:18
To: ceph-users@lists.ceph.com
Subject: [ceph-users] help needed
Hi there
Hope, every one wil
Agreed on not zapping the disks until your cluster is healthy again. Marking
them out and seeing how healthy you can get in the meantime is a good idea.
On Sun, Sep 2, 2018, 1:18 PM Ronny Aasen wrote:
> On 02.09.2018 17:12, Lee wrote:
> > Should I just out the OSD's first or completely zap them and
On 02.09.2018 17:12, Lee wrote:
Should I just out the OSD's first or completely zap them and recreate?
Or delete and let the cluster repair itself?
On the second node when it started back up I had problems with the
Journals for ID 5 and 7 they were also recreated all the rest are
still the or
Ok, rather than going gung-ho at this..
1. I have set out, 31,24,21,18,15,14,13,6 and 7,5 (10 is a new OSD)
Which gives me
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 23.65970 root default
-5 8.18990 host data33-a4
13 0.90999 osd.13 up0
Should I just out the OSD's first or completely zap them and recreate? Or
delete and let the cluster repair itself?
On the second node when it started back up I had problems with the Journals
for ID 5 and 7 they were also recreated all the rest are still the
originals.
I know that some PG's are o
The problem is with never getting a successful run of `ceph-osd
--flush-journal` on the old SSD journal drive. All of the OSDs that used
the dead journal need to be removed from the cluster, wiped, and added back
in. The data on them is not 100% consistent because the old journal died.
Any word tha
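For context, the flush that never succeeded would normally be run with the OSD
stopped first (the id is illustrative; on a hammer-era cluster the stop command
would be the sysvinit/upstart equivalent):

  $ systemctl stop ceph-osd@0        # or: sudo service ceph stop osd.0
  $ sudo ceph-osd -i 0 --flush-journal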
I followed:
$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' --partition-guid=1:$journal_uuid --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk
Then
$ sudo ceph-osd --mkjournal -i 20
$ sudo serv
>
>
> Hi David,
>
> Yes heath detail outputs all the errors etc and recovery / backfill is
> going on, just taking time 25% misplaced and 1.5 degraded.
>
> I can list out the pools and see sizes etc..
>
> My main problem is I have no client IO from a read perspective, I cannot
> start vms I'm opens
Hi David,
Yes, health detail outputs all the errors etc. and recovery/backfill is
going on, just taking time: 25% misplaced and 1.5% degraded.
I can list out the pools and see sizes etc..
My main problem is that I have no client IO from a read perspective; I cannot
start VMs in OpenStack, and ceph -w st
When the first node went offline with a dead SSD journal, all of the data
on the OSDs was useless. Unless you could flush the journals, you can't
guarantee that a write the cluster thinks happened actually made it to the
disk. The proper procedure here is to remove those OSDs and add them again
as
Does "ceph health detail" work?
Have you manually confirmed the OSDs on the nodes are working?
What was the replica size of the pools?
Are you seeing any progress with the recovery?
On Sun, Sep 2, 2018 at 9:42 AM Lee wrote:
> Running 0.94.5 as part of a Openstack enviroment, our ceph setup is
Now here's the thing:
Some weeks ago Proxmox upgraded from kernel 4.13 to 4.15. Since then I'm
getting slow requests that
cause blocked IO inside the VMs that are running on the cluster (but not
necessarily on the host
with the OSD causing the slow request).
If I boot back into 4.13 then Ceph
Hi All,
there might be a problem on Scientific Linux 7.5 too:
after upgrading directly from 12.2.5 to 13.2.1
[root@cephr01 ~]# ceph-detect-init
Traceback (most recent call last):
File "/usr/bin/ceph-detect-init", line 9, in
load_entry_point('ceph-detect-init==1.0.1', 'console_scripts',
I'll just go and test it :)
On 30/07/18 10:54, Nathan Cutler wrote:
for all others on this list, it might also be helpful to know which
setups are likely affected.
Does this only occur for Filestore disks, i.e. if ceph-volume has
taken over taking care of these?
Does it happen on every RHEL 7.
for all others on this list, it might also be helpful to know which setups are
likely affected.
Does this only occur for Filestore disks, i.e. if ceph-volume has taken over
taking care of these?
Does it happen on every RHEL 7.5 system?
It affects all OSDs managed by ceph-disk on all RHEL syste
e=release)
> ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel
> 7.5
>
>
> Sent: Sunday, 29 July 2018 at 20:33
> From: "Nathan Cutler"
> To: ceph.nov...@habmalnefrage.de, "Vasu Kulkarni"
> Cc: ceph-users , "Ceph D
atform: Platform is not supported.: rhel 7.5
Sent: Sunday, 29 July 2018 at 20:33
From: "Nathan Cutler"
To: ceph.nov...@habmalnefrage.de, "Vasu Kulkarni"
Cc: ceph-users , "Ceph D
Subject: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2
303
Nathan
On 07/29/2018 11:16 AM, ceph.nov...@habmalnefrage.de wrote:
>
Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni"
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , "Ceph
Development"
Subject: Re: [ceph-users] HELP
age Weil" , ceph-users ,
"Ceph Development"
Betreff: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM, wrote:
> Have you guys changed something with the systemctl startup of the OSDs?
I think there is some ki
/ 8.8 TiB avail
> pgs: 1390 active+clean
>
> io:
> client: 11 KiB/s rd, 10 op/s rd, 0 op/s wr
>
> Any hints?
>
> --
>
>
> Sent: Saturday, 28 July 2018 at 23:35
> From: ceph
rage.de
An: "Sage Weil"
Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org
Betreff: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
Hi Sage.
Sure. Any specific OSD(s) log(s)? Or just any?
Sent: Saturday, 28 July 2018 at 16:49
From: "Sage
Hi Sage.
Sure. Any specific OSD(s) log(s)? Or just any?
Sent: Saturday, 28 July 2018 at 16:49
From: "Sage Weil"
To: ceph.nov...@habmalnefrage.de, ceph-users@lists.ceph.com,
ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic r
Can you include more of your osd log file?
On July 28, 2018 9:46:16 AM CDT, ceph.nov...@habmalnefrage.de wrote:
>Dear users and developers.
>
>I've updated our dev-cluster from v13.2.0 to v13.2.1 yesterday and
>since then everything is badly broken.
>I've restarted all Ceph components via "system
So my colleague Sean Crosby and I were looking through the logs (with debug mds
= 10) and found some references just before the crash to inode number. We
converted it from HEX to decimal and got something like 109953*5*627776 (last
few digits not necessarily correct). We set one digit up i.e to
On Mon, May 21, 2018 at 11:19 AM Andras Pataki <
apat...@flatironinstitute.org> wrote:
> Hi Greg,
>
> Thanks for the detailed explanation - the examples make a lot of sense.
>
> One followup question regarding a two level crush rule like:
>
>
> step take default
> step choose 3 type=rack
> step ch
Hi Greg,
Thanks for the detailed explanation - the examples make a lot of sense.
One followup question regarding a two level crush rule like:
step take default
step choose 3 type=rack
step chooseleaf 3 type=host
step emit
If the erasure code has 9 chunks, this lines up exactly without any
pro
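For reference, that layout written as a full crushmap rule for a k+m=9 profile
might look like this (names and id are illustrative; erasure rules normally
use indep rather than firstn):

  rule ec_rack_host {
          id 2
          type erasure
          min_size 3
          max_size 9
          step set_chooseleaf_tries 5
          step take default
          step choose indep 3 type rack
          step chooseleaf indep 3 type host
          step emit
  }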
On Thu, May 17, 2018 at 9:05 AM Andras Pataki
wrote:
> I've been trying to wrap my head around crush rules, and I need some
> help/advice. I'm thinking of using erasure coding instead of
> replication, and trying to understand the possibilities for planning for
> failure cases.
>
> For a simplif
That seems to have worked. Thanks much!
And yes, I realize my setup is less than ideal, but I'm planning on
migrating from another storage system, and this is the hardware I have
to work with. I'll definitely keep your recommendations in mind when I
start to grow the cluster.
On 04/23/2018 1
Hi,
this doesn't sound like a good idea: two hosts is usually a poor
configuration for Ceph.
Also, fewer disks on more servers is typically better than lots of disks in
few servers.
But to answer your question: you could use a crush rule like this:
min_size 4
max_size 4
step take default
step ch
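The rule is cut off above; one plausible completion for four copies spread over
exactly two hosts (two OSDs per host) would be the following, which is an
assumption rather than necessarily what was originally posted:

  rule replicated_2host_4copies {
          id 3
          type replicated
          min_size 4
          max_size 4
          step take default
          step choose firstn 2 type host
          step chooseleaf firstn 2 type osd
          step emit
  }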
The WAL is a required part of the OSD. If you remove that, then the OSD
is missing a crucial part of itself and it will be unable to start until
the WAL is back online. If the SSD were to fail, then all osds using it
would need to be removed and recreated on the cluster.
On Tue, Feb 20, 2018,
ter" from Ceph Days Germany earlier this month for
> other things to watch out for:
>
>
>
> https://ceph.com/cephdays/germany/
>
>
>
> Bryan
>
>
>
> *From: *ceph-users on behalf of Bryan
> Banister
> *Date: *Tuesday, February 20, 2018 at 2:53 PM
er this month for other things
to watch out for:
https://ceph.com/cephdays/germany/
Bryan
From: ceph-users on behalf of Bryan
Banister
Date: Tuesday, February 20, 2018 at 2:53 PM
To: David Turner
Cc: Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
HI David [
Hi,
We were recently testing luminous with bluestore. We have a 6-node cluster
with 12 HDDs and 1 SSD each; we used ceph-volume with LVM to create all the OSDs
and attached an SSD WAL (LVM). We created individual 10GB x 12 LVs on a single
SSD for the WALs, so all the OSD WALs are on the single SSD. P
nister mailto:bbanis...@jumptrading.com>>
Cc: Bryan Stillwell mailto:bstillw...@godaddy.com>>;
Janne Johansson mailto:icepic...@gmail.com>>; Ceph Users
mailto:ceph-users@lists.ceph.com>>
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External
arking OSDs with stuck requests down to see if that
> will re-assert them?
>
>
>
> Thanks!!
>
> -Bryan
>
>
>
> *From:* David Turner [mailto:drakonst...@gmail.com]
> *Sent:* Friday, February 16, 2018 2:51 PM
>
>
> *To:* Bryan Banister
> *Cc:* Bryan Stillwe
Cc: Bryan Stillwell ; Janne Johansson
; Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
The questions I definitely know the answer to first, and then we'll continue
from there. If an OSD is blocking peerin
"scrubber.seed": 0,
>
> "scrubber.waiting_on": 0,
>
> "scrubber.waiting_on_whom": []
>
> }
>
> },
>
> {
>
> "name": "Started",
>
>
uot;: "Started",
"enter_time": "2018-02-13 14:33:17.491148"
}
],
Sorry for all the hand holding, but how do I determine if I need to set an OSD
as ‘down’ to fix the issues, and how does it go about re-asserting itself?
I again tried lo
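For what it's worth, marking an OSD down is a single command (id illustrative),
and a healthy daemon will simply notice and re-register itself with the
monitors:

  $ ceph osd down 12
  # if the osd process is alive and can reach the mons it re-asserts itself and
  # goes back up; if it stays down, the daemon or its network is genuinely stuck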
00
>
>
>
> At this point we do not know to proceed with recovery efforts. I tried
> looking at the ceph docs and mail list archives but wasn’t able to
> determine the right path forward here.
>
>
>
> Any help is appreciated,
>
> -Bryan
>
>
>
>
>
@godaddy.com]
Sent: Tuesday, February 13, 2018 2:27 PM
To: Bryan Banister ; Janne Johansson
Cc: Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
It may work fine, but I would suggest limiting the number of ope
It may work fine, but I would suggest limiting the number of operations going
on at the same time.
Bryan
From: Bryan Banister
Date: Tuesday, February 13, 2018 at 1:16 PM
To: Bryan Stillwell , Janne Johansson
Cc: Ceph Users
Subject: RE: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
y.com]
Sent: Tuesday, February 13, 2018 12:43 PM
To: Bryan Banister ; Janne Johansson
Cc: Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
-
Bryan,
Based off the information you've provided
print $1 "\t" $7 }' |sort -n
-k2
You'll see that within a pool the PG sizes are fairly close to the same size,
but in your cluster the PGs are fairly large (~200GB would be my guess).
Bryan
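The front of that pipeline is cut off; it is presumably something along the
lines of the command below (in Luminous ceph pg dump prints BYTES in column 7,
but the column order can differ between releases, so check the header first):

  $ ceph pg dump 2>/dev/null | awk '/^[0-9]+\./ { print $1 "\t" $7 }' | sort -n -k2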
From: ceph-users on behalf of Bryan
Banister
Date: Monday, February 12, 2018 at 2:19
[mailto:icepic...@gmail.com]
Sent: Wednesday, January 31, 2018 9:34 AM
To: Bryan Banister
Cc: Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
2018-01-31 15:58 GMT+01:00 Bryan Banister
mailto:bbanis...@jumptrading.com
Thanks, I’m downloading it right now
--
Efficiency is Intelligent Laziness
From: "ceph.nov...@habmalnefrage.de"
Date: Friday, February 2, 2018 at 12:37 PM
To: "ceph.nov...@habmalnefrage.de"
Cc: Frank Li , "ceph-users@lists.ceph.com"
Subject: Aw: Re: [ceph-use
there pick your "DISTRO", click on the "ID", click "Repo URL"...
Sent: Friday, 02 February 2018 at 21:34
From: ceph.nov...@habmalnefrage.de
To: "Frank Li"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Help ! how to recov
https://shaman.ceph.com/repos/ceph/wip-22847-luminous/f04a4a36f01fdd5d9276fa5cfa1940f5cc11fb81/
Sent: Friday, 02 February 2018 at 21:27
From: "Frank Li"
To: "Sage Weil"
Cc: "ceph-users@lists.ceph.com"
Subject: Re: [ceph-users] Help ! how to recover
Sure, please let me know where to get and run the binaries. Thanks for the fast
response !
--
Efficiency is Intelligent Laziness
On 2/2/18, 10:31 AM, "Sage Weil" wrote:
On Fri, 2 Feb 2018, Frank Li wrote:
> Yes, I was dealing with an issue where OSD are not peerings, and I was
trying
On Fri, 2 Feb 2018, Frank Li wrote:
> Yes, I was dealing with an issue where OSD are not peerings, and I was trying
> to see if force-create-pg can help recover the peering.
> Data lose is an accepted possibility.
>
> I hope this is what you are looking for ?
>
> -3> 2018-01-31 22:47:22.94
Yes, I was dealing with an issue where OSDs are not peering, and I was trying
to see if force-create-pg can help recover the peering.
Data loss is an accepted possibility.
I hope this is what you are looking for ?
-3> 2018-01-31 22:47:22.942394 7fc641d0b700 5 mon.dl1-kaf101@0(electing)
e
On Fri, 2 Feb 2018, Frank Li wrote:
> Hi, I ran the ceph osd force-create-pg command in luminous 12.2.2 to recover
> a failed pg, and it
> instantly caused all of the monitors to crash. Is there any way to revert back
> to an earlier state of the cluster?
> Right now, the monitors refuse to come
2018-01-31 15:58 GMT+01:00 Bryan Banister :
>
>
>
> Given that this will move data around (I think), should we increase the
> pg_num and pgp_num first and then see how it looks?
>
>
>
I guess adding pgs and pgps will move stuff around too, but if the PGCALC
formula says you should have more then
and pgp_num first and then see how it looks?
Thanks,
-Bryan
From: Janne Johansson [mailto:icepic...@gmail.com]
Sent: Wednesday, January 31, 2018 7:53 AM
To: Bryan Banister
Cc: Ceph Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
__
2018-01-30 17:24 GMT+01:00 Bryan Banister :
> Hi all,
>
>
>
> We are still very new to running a Ceph cluster and have run a RGW cluster
> for a while now (6-ish mo), it mainly holds large DB backups (Write once,
> read once, delete after N days). The system is now warning us about an OSD
> that
Users
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminous 12.2.2
Sorry I hadn’t RTFML archive before posting this… Looking at the following
thread for guidance:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008626.html
Sorry I hadn't RTFML archive before posting this... Looking at the following
thread for guidance:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008626.html
Not the exact same situation (e.g. didn't add larger OSD disks later on) but
seems like the same recommendations from thi
Sorry, obviously should have been Luminous 12.2.2,
-B
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Bryan
Banister
Sent: Tuesday, January 30, 2018 10:24 AM
To: Ceph Users
Subject: [ceph-users] Help rebalancing OSD usage, Luminus 1.2.2
Note: External Email
Hello!
I can only answer some of your questions:
- The backfill process obeys a "nearfull_ratio" limit (I think it defaults
to 85%); above it the system will stop repairing itself, so it won't go up
to 100%.
- The normal write ops obey a full_ratio too, I think default 95%; above
that no write io w
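Those ratios can be checked, and on Luminous and later adjusted at runtime,
roughly like this (the values shown are the usual defaults):

  $ ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'
  $ ceph osd set-nearfull-ratio 0.85
  $ ceph osd set-full-ratio 0.95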
okay another day another nightmare ;-)
So far we discussed pools as bundles of:
- pool 1) 15 HDD-OSDs (consisting of a total of 25 HDDs actual, 5
single HDDs and five raid0 pairs as mentioned before)
- pool 2) 6 SSD-OSDs
unfortunately (well) on the "physical" pool 1 there are two "logical"
pools (
Quoting tim taler (robur...@gmail.com):
> And I'm still puzzled about the implication of the cluster size on the
> amount of OSD failures.
> With size=2 min_size=1 one host could die and (if by chance there is
> NO read error on any bit on the living host) I could (theoretically)
> recover, is that
Yep, you are correct, thanks!
On 12/04/2017 07:31 PM, David Turner wrote:
"The journals can only be moved back by a complete rebuild of that osd
as to my knowledge."
I'm assuming that since this is a cluster that he's inherited and that
it's configured like this that it's probably not runnin
On 04.12.2017 19:18, tim taler wrote:
In size=2, losing any 2 disks on different hosts would probably cause data to
be unavailable / lost, as the PG copies are randomly distributed across the
OSDs. Chances are that you can find a PG whose acting group is the two
failed OSDs (you lost all your re
"The journals can only be moved back by a complete rebuild of that osd as to
my knowledge."
I'm assuming that since this is a cluster that he's inherited and that it's
configured like this that it's probably not running luminous or bluestore
OSDs. Again more information needed about your cluster a
> In size=2, losing any 2 disks on different hosts would probably cause data to
> be unavailable / lost, as the PG copies are randomly distributed across the
> OSDs. Chances are that you can find a PG whose acting group is the two
> failed OSDs (you lost all your replicas)
okay I see, getting cle