On Friday, 21 September 2018 at 19:45 +0200, Paul Emmerich wrote:
> The cache tiering has nothing to do with the PG of the underlying
> pool
> being incomplete.
> You are just seeing these requests as stuck because it's the only
> thing trying to write to the underlying pool.
I agree, it was j
The cache tiering has nothing to do with the PG of the underlying pool
being incomplete.
You are just seeing these requests as stuck because it's the only
thing trying to write to the underlying pool.
What you need to fix is the PG showing incomplete. I assume you
already tried reducing the min_size.
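(A minimal sketch of checking that, assuming the pool name bkp-sb-raid6 from the health output later in this thread:)
# current min_size of the EC base pool
ceph osd pool get bkp-sb-raid6 min_size
# for a 4+2 profile, min_size cannot usefully go below k=4
ceph osd pool set bkp-sb-raid6 min_size 4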
So I've totally disabled cache-tiering and the overlay. Now OSD 68 & 69
are fine, no longer blocked.
But OSD 32 is still blocked, and PG 37.9c is still marked incomplete,
with:
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-09-21
According to the query output you pasted, shards 1 and 2 are broken.
But on the other hand, an EC profile of 4+2 should make it possible to
recover from 2 shards lost simultaneously...
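(Sketch only, assuming the pool name from this thread: a way to double-check k/m and to see which shards the PG is actually missing:)
# which EC profile the base pool uses, and its k/m values
ceph osd pool get bkp-sb-raid6 erasure_code_profile
ceph osd erasure-code-profile get <profile-name>   # <profile-name> taken from the previous command
# peering details of the incomplete PG (see the recovery_state section)
ceph pg 37.9c query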
On Fri, 21 Sep 2018 at 16:29, Olivier Bonvalet wrote:
> Well, on disk, I can find those parts:
>
> - cs0 on OSD 29 and 30
Well, on disk, I can find those parts:
- cs0 on OSD 29 and 30
- cs1 on OSD 18 and 19
- cs2 on OSD 13
- cs3 on OSD 66
- cs4 on OSD 0
- cs5 on OSD 75
And I can read those files too.
And all those OSDs are UP and IN.
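(If it comes to manual recovery, a rough sketch of how those shards could be listed/exported with ceph-objectstore-tool; this assumes the default /var/lib/ceph/osd paths and requires the OSD to be stopped first:)
systemctl stop ceph-osd@29
# list the objects of shard 0 of PG 37.9c as stored on OSD 29
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 --op list --pgid 37.9cs0
# export the shard as a backup before attempting any repair
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-29 --op export --pgid 37.9cs0 --file /root/pg37.9cs0.export
systemctl start ceph-osd@29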
On Friday, 21 September 2018 at 13:10, Eugen Block wrote:
> > > I tr
Yep:
pool 38 'cache-bkp-foo' replicated size 3 min_size 2 crush_rule 26
object_hash rjenkins pg_num 128 pgp_num 128 last_change 585369 lfor
68255/68255 flags hashpspool,incomplete_clones tier_of 37 cache_mode
readproxy target_bytes 209715200 hit_set
bloom{false_positive_probability: 0.05, target_
I also switched the cache tier to "readproxy", to avoid using this
cache. But it's still blocked.
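(For the record, a sketch of that mode switch plus a way to see what is still stuck on the blocked primary; osd.32 is the blocked OSD mentioned above, and the admin socket command has to be run on the host carrying it:)
ceph osd tier cache-mode cache-bkp-foo readproxy
# on the node hosting osd.32: which requests are currently blocked?
ceph daemon osd.32 dump_blocked_ops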
You could change the cache mode to "none" to disable it. Could you
paste the output of:
ceph osd pool ls detail | grep cache-bkp-foo
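(A sketch of what fully detaching the tier would look like, using the pool names from this thread; the order of operations is my assumption, so check the cache-tiering docs before running it:)
rados -p cache-bkp-foo cache-flush-evict-all
ceph osd tier cache-mode cache-bkp-foo none --yes-i-really-mean-it
ceph osd tier remove-overlay bkp-sb-raid6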
Quoting Olivier Bonvalet:
In fact, one object (only one) seems to be blocked on the cache tier (writeback).
I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".
This is the object that's stuck in the cache tier (according to your
output in https://pastebin.com/zrwu5X0w). Can you verify if that block
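(A sketch of how that stuck object could be inspected in the cache pool, object name exactly as pasted above:)
# is the object still present in the cache pool, and what are its size/mtime?
rados -p cache-bkp-foo stat rbd_data.f66c92ae8944a.000f2596
# or simply check whether it shows up in a listing of the cache pool
rados -p cache-bkp-foo ls | grep f66c92ae8944a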
Could you please paste the output of "ceph pg 37.9c query"?
On Fri, 21 Sep 2018 at 14:39, Olivier Bonvalet wrote:
> In fact, one object (only one) seems to be blocked on the cache tier
> (writeback).
>
> I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
> evict-all", but it blocks on
In fact, one object (only one) seems to be blocked on the cache tier
(writeback).
I tried to flush the cache with "rados -p cache-bkp-foo cache-flush-
evict-all", but it blocks on the object
"rbd_data.f66c92ae8944a.000f2596".
So I reduced the cache tier (a lot) to 200MB, and re-ran "rados -p cache-bkp-foo cache-flush-evict-all".
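(The 200MB figure matches the target_bytes 209715200 shown earlier in the pool detail; a sketch of that step, assuming the standard cache-tiering parameters:)
# shrink the cache tier target so that (almost) everything gets flushed/evicted
ceph osd pool set cache-bkp-foo target_max_bytes 209715200
rados -p cache-bkp-foo cache-flush-evict-all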
Ok, so it's a replica 3 pool, and OSD 68 & 69 are on the same host.
On Friday, 21 September 2018 at 11:09, Eugen Block wrote:
> > The cache tier on this pool has 26GB of data (for 5.7TB of data on
> > the EC pool).
> > We tried to flush the cache tier and restart OSD 68 & 69, without
> > any success.
>
The cache tier on this pool has 26GB of data (for 5.7TB of data on the
EC pool).
We tried to flush the cache tier and restart OSD 68 & 69, without any
success.
I meant the replication size of the pool:
ceph osd pool ls detail | grep
In the experimental state of our cluster we had a cache tier (f
Hi,
The cache tier on this pool has 26GB of data (for 5.7TB of data on the
EC pool).
We tried to flush the cache tier and restart OSD 68 & 69, without any
success.
But I don't see any related data on the cache-tier OSDs (filestore) with:
find /var/lib/ceph/osd/ -maxdepth 3 -name '*37.9c*'
I don'
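(Side note, an assumption on my part: PG 37.9c belongs to the EC base pool (pool 37), while the cache-tier OSDs carry PGs of the cache pool (pool 38), so on a filestore cache OSD one would rather expect directories like these:)
# PG directories of the cache pool (id 38) on a filestore OSD
find /var/lib/ceph/osd/ -maxdepth 3 -name '38.*_head'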
Hi Olivier,
what size does the cache tier have? You could set the cache-mode to
forward and flush it; maybe restarting those OSDs (68, 69) helps, too.
Or there could be an issue with the cache tier; what do those logs say?
Regards,
Eugen
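(What that suggestion would look like as commands, assuming the pool name cache-bkp-foo used later in the thread; "forward" mode needs the extra confirmation flag:)
ceph osd tier cache-mode cache-bkp-foo forward --yes-i-really-mean-it
rados -p cache-bkp-foo cache-flush-evict-all
# restart the two cache-tier OSDs
systemctl restart ceph-osd@68
systemctl restart ceph-osd@69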
Quoting Olivier Bonvalet:
Hello,
on a Luminous cluster, I have an incomplete PG and I can't find how to
fix it.
Hello,
on a Luminous cluster, I have an incomplete PG and I can't find how to
fix it.
It's an EC pool (4+2):
pg 37.9c is incomplete, acting [32,50,59,1,0,75] (reducing pool
bkp-sb-raid6 min_size from 4 may help; search ceph.com/docs for
'incomplete')
Of course, we can't reduce min_size from 4.
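(For completeness, a sketch of the commands behind this diagnosis:)
# the incomplete PG as reported by the cluster
ceph health detail | grep incomplete
# full peering/recovery state of that PG
ceph pg 37.9c query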
Hi,
Thank you so much!
This fixed my issue completely, minus one image that was apparently
being uploaded while the rack lost power.
Is there anything I can do to prevent this from happening in the
future, or a way to detect this issue?
I've looked online for an explanation of exactly what this setting does.
Try restarting the primary osd for that pg with
osd_find_best_info_ignore_history_les set to true (don't leave it set
long term).
-Sam
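(A rough sketch of that procedure, my wording rather than Sam's; <id> stands for the primary OSD of the stuck PG, and the option name is spelled exactly as above:)
# 1) in ceph.conf on the node carrying the primary OSD, temporarily add:
#      [osd.<id>]
#      osd_find_best_info_ignore_history_les = true
# 2) restart that OSD so it re-peers with the option in effect
systemctl restart ceph-osd@<id>
# 3) once the PG is active+clean again, remove the option from ceph.conf
#    and restart the OSD once more
systemctl restart ceph-osd@<id>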
On Tue, May 17, 2016 at 7:50 AM, Hein-Pieter van Braam wrote:
> Hello,
>
> Today we had a power failure in a rack housing our OSD servers. We had
> 7 of our 30 total OSD nodes down.
Hello,
Today we had a power failure in a rack housing our OSD servers. We had
7 of our 30 total OSD nodes down. For the affected PG, 2 out of the 3
OSDs went down.
After everything was back and mostly healthy I found one placement
group marked as incomplete. I can't figure out why.
I'm running ceph