[ceph-users] ceph is stuck after increasing pg_nums

2022-11-04 Thread Adrian Nicolae

Hi,

We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We 
started increasing the pg_num for the data bucket pool more than a 
month ago, usually adding 64 PGs in each step, and didn't have any 
issue. The cluster was healthy before increasing the PGs.


Today I've added 128 PGs and the cluster is stuck with some unknown PGs 
and some others in peering state. I've restarted a few OSDs with slow_ops 
and even a few hosts, but it didn't change anything. We don't have any 
networking issues. Do you have any suggestions? Our service is 
completely down...


  cluster:
    id: 322ef292-d129-11eb-96b2-a1b38fd61d55
    health: HEALTH_WARN
    Slow OSD heartbeats on back (longest 1517.814ms)
    Slow OSD heartbeats on front (longest 1551.680ms)
    Reduced data availability: 42 pgs inactive, 33 pgs peering
    1 pool(s) have non-power-of-two pg_num
    2888 slow ops, oldest one blocked for 6028 sec, daemons 
[osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... 
have slow ops.


  services:
    mon: 5 daemons, quorum osd-new-01,osd04,osd05,osd09,osd22 (age 11m)
    mgr: osd-new-01.babahi(active, since 11m), standbys: osd02.wqcizg
    osd: 311 osds: 311 up (since 3m), 311 in (since 3m); 29 remapped pgs
    rgw: 26 daemons active (26 hosts, 1 zones)

  data:
    pools:   8 pools, 2649 pgs
    objects: 590.57M objects, 1.5 PiB
    usage:   2.2 PiB used, 1.2 PiB / 3.4 PiB avail
    pgs: 0.340% pgs unknown
         1.246% pgs not active
         4056622/3539747751 objects misplaced (0.115%)
         2529 active+clean
         33   peering
         31   active+clean+laggy
         26   active+remapped+backfilling
         18   active+clean+scrubbing+deep
         9    unknown
         3    active+remapped+backfill_wait

  io:
    client:   38 KiB/s rd, 0 B/s wr, 37 op/s rd, 25 op/s wr
    recovery: 426 MiB/s, 158 objects/s
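
A rough sketch of the usual first steps for narrowing down stuck peering in a
situation like this; the PG and OSD IDs below are only examples and would be
taken from the `ceph health detail` output above:

```
# list the PGs that are not active
ceph health detail
ceph pg dump_stuck inactive

# query one stuck PG; "recovery_state" / "peering_blocked_by" names what it waits on
ceph pg 6.eb query

# on the host of a suspect OSD, inspect its blocked requests
ceph daemon osd.240 dump_blocked_ops
```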


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph is stuck after increasing pg_nums

2022-11-04 Thread Adrian Nicolae

 ceph health detail
HEALTH_WARN Reduced data availability: 42 pgs inactive, 33 pgs peering; 
1 pool(s) have non-power-of-two pg_num; 2371 slow ops, oldest one 
blocked for 6218 sec, daemons 
[osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... 
have slow ops.
[WRN] PG_AVAILABILITY: Reduced data availability: 42 pgs inactive, 33 
pgs peering
    pg 6.eb is stuck peering for 54m, current state peering, last 
acting [79,279,68,179,264,240]
    pg 6.10f is stuck peering for 36m, current state peering, last 
acting [288,161,37,63,178,240]
    pg 6.115 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.139 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.17e is stuck peering for 103m, current state peering, last 
acting [126,190,252,282,113,240]
    pg 6.1a5 is stuck peering for 103m, current state peering, last 
acting [41,158,240,177,66,228]
    pg 6.1ae is stuck peering for 103m, current state peering, last 
acting [186,240,162,221,289,219]
    pg 6.1eb is stuck peering for 36m, current state peering, last 
acting [220,240,184,226,205,254]
    pg 6.21b is stuck peering for 58m, current state peering, last 
acting [179,301,168,292,240,121]
    pg 6.26d is stuck peering for 36m, current state peering, last 
acting [68,305,240,47,137,184]
    pg 6.348 is stuck peering for 77m, current state peering, last 
acting [138,307,221,125,240,285]
    pg 6.369 is stuck peering for 54m, current state peering, last 
acting [35,66,240,254,58,179]
    pg 6.39f is stuck peering for 28m, current state peering, last 
acting [264,46,240,154,101,194]
    pg 6.3ca is stuck peering for 58m, current state peering, last 
acting [202,213,174,296,240,45]
    pg 6.3cb is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.3e1 is stuck peering for 77m, current state peering, last 
acting [115,168,240,85,56,26]
    pg 6.3f3 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.473 is stuck peering for 36m, current state peering, last 
acting [265,53,77,240,182,92]
    pg 6.576 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.5a6 is stuck peering for 103m, current state peering, last 
acting [257,37,240,54,263,68]
    pg 6.5eb is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.63f is stuck peering for 85m, current state peering, last 
acting [252,53,240,131,25,278]
    pg 6.655 is stuck peering for 103m, current state peering, last 
acting [103,267,222,308,240,277]
    pg 6.6d5 is stuck peering for 36m, current state peering, last 
acting [197,171,276,177,210,240]
    pg 6.6f2 is stuck peering for 85m, current state peering, last 
acting [174,122,81,129,304,240]
    pg 6.721 is stuck peering for 51m, current state peering, last 
acting [181,76,294,249,299,240]
    pg 6.757 is stuck peering for 23m, current state peering, last 
acting [288,194,213,240,37,22]
    pg 6.785 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.793 is stuck peering for 77m, current state peering, last 
acting [155,301,240,294,214,265]
    pg 6.798 is stuck peering for 51m, current state peering, last 
acting [186,278,196,211,260,240]
    pg 6.79b is stuck peering for 54m, current state peering, last 
acting [186,25,108,240,300,39]
    pg 6.7b7 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.7c5 is stuck peering for 103m, current state peering, last 
acting [130,179,266,240,162,294]
    pg 6.7df is stuck peering for 36m, current state peering, last 
acting [188,240,182,282,265,199]
    pg 6.83c is stuck peering for 77m, current state peering, last 
acting [155,81,228,65,207,240]
    pg 6.85f is stuck peering for 103m, current state peering, last 
acting [129,263,307,28,240,63]
    pg 6.917 is stuck peering for 54m, current state peering, last 
acting [84,179,240,295,92,269]
    pg 6.939 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.97b is stuck peering for 103m, current state peering, last 
acting [34,96,293,129,147,240]
    pg 6.97e is stuck peering for 103m, current state peering, last 
acting [126,190,252,282,113,240]
    pg 6.9a5 is stuck peering for 103m, current state peering, last 
acting [41,158,240,186,66,228]
    pg 6.9ae is stuck peering for 103m, current state peering, last 
acting [186,240,162,221,289,219]

[WRN] POOL_PG_NUM_NOT_POWER_OF_TWO: 1 pool(s) have non-power-of-two pg_num
    pool 'us-east-1.rgw.buckets.data' pg_num 2480 is not a power of two
[WRN] SLOW_OPS: 2371 slow ops, oldest one blocked for 6218 sec, daemons 
[osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... 
have slow ops.


On 11/4/2022 10:45 AM, Adrian Nicolae wrote:

Hi,

We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We 
started increasing the pg_num for the data bucket pool more than a 
month ago, usually adding 64 PGs in each step, and didn't have any 
issue. The cluster was healthy befo

[ceph-users] What is the reason of the rgw_user_quota_bucket_sync_interval and rgw_bucket_quota_ttl values?

2022-11-04 Thread Szabo, Istvan (Agoda)
Hi,

One of my users told me that they can upload bigger files to the bucket than 
the limit. My question is mainly to the developers: what is the reason for 
setting rgw_bucket_quota_ttl=600 and rgw_user_quota_bucket_sync_interval=180? 
I don’t want to set them to 0 before I know the reason 😃
With these settings, if the user has pretty high bandwidth, they can upload 
terabytes of files before the 10-minute limit is reached.

I set the following values on a specific bucket:

"bucket_quota": {
"enabled": true,
"check_on_raw": false,
"max_size": 524288000,
"max_size_kb": 512000,
"max_objects": 1126400

But they can upload 600MB files also.

I came across this article:
https://bugzilla.redhat.com/show_bug.cgi?id=1417775

It seems that if these values are set to 0:

"name": "rgw_bucket_quota_ttl",
"type": "int",
"level": "advanced",
"desc": "Bucket quota stats cache TTL",
"long_desc": "Length of time for bucket stats to be cached within RGW 
instance.",
"default": 600,

and

"name": "rgw_user_quota_bucket_sync_interval",
"type": "int",
"level": "advanced",
"desc": "User quota bucket sync interval",
"long_desc": "Time period for accumulating modified buckets before syncing 
these stats.",
"default": 180,

then uploads will be terminated at the bucket limit.
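
For anyone who does decide to trade the overhead for accuracy, a hedged sketch
of what that change might look like (untested here; the service name is only an
example and the RGW daemons may need a restart to pick it up):

```
# force RGW to re-read bucket stats on every quota check instead of caching them
ceph config set client.rgw rgw_bucket_quota_ttl 0
ceph config set client.rgw rgw_user_quota_bucket_sync_interval 0

# restart the RGW service so all daemons pick up the change
ceph orch restart rgw.default
```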

Thank you


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove remaining bucket index shard objects

2022-11-04 Thread 伊藤 祐司
Hi,

Mysteriously, the large omap objects alert recurred recently. The values for 
omap_used_mbytes and omap_used_keys are slightly different from the previous 
investigation, but very close. Our team is going to keep this cluster for 
investigation and create another cluster to work on. Therefore, my replies may be slow.
Previous values:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/TNQM2W4EDG3J33W7CML2JLCDNFDA6Q3W/
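
For reference, a rough sketch of how the index shard objects behind such
large-omap warnings can be inspected; the pool and object names below are
placeholders, not taken from this cluster:

```
# bucket index shards live as .dir.<bucket-marker>.<shard> objects in the index pool
kubectl exec -n ceph-poc deploy/rook-ceph-tools -- \
  rados -p <zone>.rgw.buckets.index ls | grep '^\.dir\.' | head

# count the omap keys on one suspect shard object
kubectl exec -n ceph-poc deploy/rook-ceph-tools -- \
  rados -p <zone>.rgw.buckets.index listomapkeys '.dir.<bucket-marker>.<shard>' | wc -l
```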

```
$ kubectl exec -n ceph-poc deploy/rook-ceph-tools -- ceph -s
  cluster:
id: 49bd471e-84e6-412e-8ed0-75d7bc176657
health: HEALTH_WARN
25 large omap objects

  services:
mon: 3 daemons, quorum b,d,f (age 36h)
mgr: b(active, since 38h), standbys: a
osd: 96 osds: 96 up (since 31h), 96 in (since 31h)
rgw: 6 daemons active (6 hosts, 2 zones)

  data:
pools:   16 pools, 4432 pgs
objects: 10.74k objects, 34 GiB
usage:   158 GiB used, 787 TiB / 787 TiB avail
pgs: 4432 active+clean

  io:
client:   2.2 KiB/s rd, 169 B/s wr, 2 op/s rd, 0 op/s wr
```

```
$ (header="id used_mbytes used_objects omap_used_mbytes omap_used_keys"
>   echo "${header}"
>   echo "${header}" | tr '[[:alpha:]_' '-'
>   kubectl exec -n ceph-poc deploy/rook-ceph-tools -- ceph pg ls-by-pool "${OSD_POOL}" --format=json | jq -r '.pg_stats |
>   sort_by(.stat_sum.num_bytes) | .[] | (.pgid, .stat_sum.num_bytes/1024/1024,
>   .stat_sum.num_objects, .stat_sum.num_omap_bytes/1024/1024,
>   .stat_sum.num_omap_keys)' | paste - - - - -) | column -t
id    used_mbytes  used_objects  omap_used_mbytes    omap_used_keys
--    -----------  ------------  ----------------    --------------
6.0   0            0             0                   0
6.1   0            0             0                   0
6.2   0            0             86.14682674407959   298586
6.3   0            0             93.08089542388916   323902
6.4   0            1             0                   0
6.5   0            1             0                   0
6.6   0            0             0                   0
6.7   0            0             0                   0
6.8   0            0             0                   0
6.9   0            0             439.5090618133545   1524746
6.a   0            0             0                   0
6.b   0            0             3.4069366455078125  12416
6.c   0            0             0                   0
6.d   0            0             0                   0
6.e   0            0             0                   0
6.f   0            1             0                   0
6.10  0            1             0                   0
6.11  0            0             0                   0
6.12  0            0             7.727175712585449   28160
6.13  0            0             114.01904964447021  394996
6.14  0            0             0                   0
6.15  0            0             0                   0
6.16  0            0             0                   0
6.17  0            0             7.6217451095581055  27776
6.18  0            0             0                   0
6.19  0            1             0                   0
6.1a  0            1             0                   0
6.1b  0            0             0                   0
6.1c  0            0             88.36568355560303   306677
6.1d  0            0             0                   0
6.1e  0            1             0                   0
6.1f  0            0             0                   0
6.20  0            1             0                   0
6.21  0            0             0                   0
6.22  0            0             5.883256912231445   21440
6.23  0            0             0                   0
6.24  0            0             7.938144683837891   28928
6.25  0            0             0                   0
6.26  0            0             4.267669677734375   15552
6.27  0            1             0                   0
6.28  0            0             0                   0
6.29  0            0             2.1601409912109375  7872
6.2a  0            1             0                   0
6.2b  0            0             0                   0
6.2c  0            0             5.479369163513184   19968
6.2d  0            0             0                   0
6.2e  0            0             0                   0
6.2f  0            0             0                   0
6.30  0            0             0                   0
6.31  0            1             0                   0
6.32  0            1             0                   0
6.33  0            0             5.812973976135254   21184
6.34  0            0             0                   0
6.35  0            0             0                   0
6.36  0            0             5.865510940551758   21376
6.37  0            0             0                   0
6.38  0            0             93.97305393218994   327089
6.39  0            0             15.493829727172852  71787
6

[ceph-users] Re: [PHISHING VERDACHT] ceph is stuck after increasing pg_nums

2022-11-04 Thread Burkhard Linke

Hi,

On 11/4/22 09:45, Adrian Nicolae wrote:

Hi,

We have a Pacific cluster (16.2.4) with 30 servers and 30 osds. We 
started increasing the pg_num for the data bucket pool more than a 
month ago, usually adding 64 PGs in each step, and didn't have any 
issue. The cluster was healthy before increasing the PGs.


Today I've added 128 PGs and the cluster is stuck with some unknown 
PGs and some others in peering state. I've restarted a few OSDs with 
slow_ops and even a few hosts, but it didn't change anything. We don't 
have any networking issues. Do you have any suggestions? Our service 
is completely down...



*snipsnap*


Do some of the OSDs exceed the PGs-per-OSD limit? If this is the case, 
the affected OSDs will not allow peering, and I/O to those OSDs will be 
stuck.


You can check the number of PGs per OSD in the 'ceph osd df tree' output. 
To solve this problem you can increase the limit, e.g. by setting 
'mon_max_pg_per_osd' via 'ceph config'. The default limit is 200 AFAIK.
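
A short sketch of how that can be checked and, if necessary, relaxed; the
value 400 is only an example and should be re-tuned afterwards:

```
# the PGS column shows how many PGs each OSD currently holds
ceph osd df tree

# current limit (the default is 200-250 depending on the release)
ceph config get mon mon_max_pg_per_osd

# raise it so peering can proceed
ceph config set global mon_max_pg_per_osd 400
```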



Regards,

Burkhard


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG Ratio for EC overwrites Pool

2022-11-04 Thread mailing-lists
Thank you very much. I've increased it to 2*#OSD, rounded up to the next 
power of 2.


Best

Ken


On 03.11.22 15:30, Anthony D'Atri wrote:

PG count isn’t just about storage size, it also affects performance, 
parallelism, and recovery.

You want pgp_num for the RBD metadata pool to be at the VERY least the number of 
OSDs it lives on, rounded up to the next power of 2.  I’d probably go for at 
least (2x#OSD) rounded up.  If you have too few, your metadata operations will 
contend with each other.
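
A hedged sketch of the usual pool pair for RBD on EC, with the replicated
metadata pool sized along those lines; the pool names, the EC profile name,
and the pg_num values are examples only:

```
# small replicated pool for RBD metadata/omap, sized for parallelism rather than capacity
ceph osd pool create rbd-meta 128 128 replicated

# EC data pool with overwrites enabled so RBD can write to it
ceph osd pool create rbd-data 1024 1024 erasure my-ec-profile
ceph osd pool set rbd-data allow_ec_overwrites true

ceph osd pool application enable rbd-meta rbd
ceph osd pool application enable rbd-data rbd

# images are created in the replicated pool but place their data on the EC pool
rbd create rbd-meta/test-image --size 100G --data-pool rbd-data
```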


On Nov 3, 2022, at 10:24, mailing-lists  wrote:

Dear Ceph'ers,

I am wondering on how to choose the number of PGs for a RBD-EC-Pool.

To be able to use RBD images on an EC pool, you need a regular replicated RBD pool 
as well as an EC pool with EC overwrites enabled. But how many PGs would you need 
for the replicated RBD pool? It doesn't seem to eat a lot of storage, so if I'm not 
mistaken it could actually be quite a low number of PGs, but is this recommended? 
Is there a best practice for this?


Best

Ken

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph is stuck after increasing pg_nums

2022-11-04 Thread Adrian Nicolae
The problem was a single OSD daemon (not reported in health detail) 
which slowed down the entire peering process; after restarting it, the 
cluster got back to normal.
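
For the archive, a rough sketch of how such a culprit OSD can be tracked down;
the PG and OSD IDs below are only examples:

```
# the recovery_state of a stuck PG usually names the OSD that peering is waiting on
ceph pg 6.17e query | jq '.recovery_state'

# check that OSD for accumulating blocked requests (run on its host)
ceph daemon osd.240 dump_blocked_ops

# restart just that daemon on a cephadm-managed cluster
ceph orch daemon restart osd.240
```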



On 11/4/2022 10:49 AM, Adrian Nicolae wrote:

 ceph health detail
HEALTH_WARN Reduced data availability: 42 pgs inactive, 33 pgs 
peering; 1 pool(s) have non-power-of-two pg_num; 2371 slow ops, oldest 
one blocked for 6218 sec, daemons 
[osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... 
have slow ops.
[WRN] PG_AVAILABILITY: Reduced data availability: 42 pgs inactive, 33 
pgs peering
    pg 6.eb is stuck peering for 54m, current state peering, last 
acting [79,279,68,179,264,240]
    pg 6.10f is stuck peering for 36m, current state peering, last 
acting [288,161,37,63,178,240]
    pg 6.115 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.139 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.17e is stuck peering for 103m, current state peering, last 
acting [126,190,252,282,113,240]
    pg 6.1a5 is stuck peering for 103m, current state peering, last 
acting [41,158,240,177,66,228]
    pg 6.1ae is stuck peering for 103m, current state peering, last 
acting [186,240,162,221,289,219]
    pg 6.1eb is stuck peering for 36m, current state peering, last 
acting [220,240,184,226,205,254]
    pg 6.21b is stuck peering for 58m, current state peering, last 
acting [179,301,168,292,240,121]
    pg 6.26d is stuck peering for 36m, current state peering, last 
acting [68,305,240,47,137,184]
    pg 6.348 is stuck peering for 77m, current state peering, last 
acting [138,307,221,125,240,285]
    pg 6.369 is stuck peering for 54m, current state peering, last 
acting [35,66,240,254,58,179]
    pg 6.39f is stuck peering for 28m, current state peering, last 
acting [264,46,240,154,101,194]
    pg 6.3ca is stuck peering for 58m, current state peering, last 
acting [202,213,174,296,240,45]
    pg 6.3cb is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.3e1 is stuck peering for 77m, current state peering, last 
acting [115,168,240,85,56,26]
    pg 6.3f3 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.473 is stuck peering for 36m, current state peering, last 
acting [265,53,77,240,182,92]
    pg 6.576 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.5a6 is stuck peering for 103m, current state peering, last 
acting [257,37,240,54,263,68]
    pg 6.5eb is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.63f is stuck peering for 85m, current state peering, last 
acting [252,53,240,131,25,278]
    pg 6.655 is stuck peering for 103m, current state peering, last 
acting [103,267,222,308,240,277]
    pg 6.6d5 is stuck peering for 36m, current state peering, last 
acting [197,171,276,177,210,240]
    pg 6.6f2 is stuck peering for 85m, current state peering, last 
acting [174,122,81,129,304,240]
    pg 6.721 is stuck peering for 51m, current state peering, last 
acting [181,76,294,249,299,240]
    pg 6.757 is stuck peering for 23m, current state peering, last 
acting [288,194,213,240,37,22]
    pg 6.785 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.793 is stuck peering for 77m, current state peering, last 
acting [155,301,240,294,214,265]
    pg 6.798 is stuck peering for 51m, current state peering, last 
acting [186,278,196,211,260,240]
    pg 6.79b is stuck peering for 54m, current state peering, last 
acting [186,25,108,240,300,39]
    pg 6.7b7 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.7c5 is stuck peering for 103m, current state peering, last 
acting [130,179,266,240,162,294]
    pg 6.7df is stuck peering for 36m, current state peering, last 
acting [188,240,182,282,265,199]
    pg 6.83c is stuck peering for 77m, current state peering, last 
acting [155,81,228,65,207,240]
    pg 6.85f is stuck peering for 103m, current state peering, last 
acting [129,263,307,28,240,63]
    pg 6.917 is stuck peering for 54m, current state peering, last 
acting [84,179,240,295,92,269]
    pg 6.939 is stuck inactive for 14m, current state unknown, last 
acting []
    pg 6.97b is stuck peering for 103m, current state peering, last 
acting [34,96,293,129,147,240]
    pg 6.97e is stuck peering for 103m, current state peering, last 
acting [126,190,252,282,113,240]
    pg 6.9a5 is stuck peering for 103m, current state peering, last 
acting [41,158,240,186,66,228]
    pg 6.9ae is stuck peering for 103m, current state peering, last 
acting [186,240,162,221,289,219]
[WRN] POOL_PG_NUM_NOT_POWER_OF_TWO: 1 pool(s) have non-power-of-two pg_num
    pool 'us-east-1.rgw.buckets.data' pg_num 2480 is not a power of two
[WRN] SLOW_OPS: 2371 slow ops, oldest one blocked for 6218 sec, 
daemons 
[osd.103,osd.115,osd.126,osd.129,osd.130,osd.138,osd.155,osd.174,osd.179,osd.181]... 
have slow ops.


On 11/4/2022 10:45 AM, Adrian Nicolae wrote:

Hi,

We have a Pacifi

[ceph-users] Re: What is the reason of the rgw_user_quota_bucket_sync_interval and rgw_bucket_quota_ttl values?

2022-11-04 Thread Janne Johansson
On Fri, 4 Nov 2022 at 10:48, Szabo, Istvan (Agoda) wrote:
> Hi,
> One of my user told me that they can upload bigger files to the bucket than 
> the limit. My question is to the developers mainly what’s the reason to set 
> the rgw_bucket_quota_ttl=600 and rgw_user_quota_bucket_sync_interval=180? I 
> don’t want to set to 0 before I know the reason 😃
> With this settings if the user has pretty high bandwidth they can upload 
> terabytes of files before the 10minutes limit reached.

The reason is probably that reading/updating it and syncing stats
between rgws is a costly operation in time, so if you do it for every
4k object someone uploads, the overhead will be very noticeable. For
us mortals whose systems do not allow for many TBs in <600s, the
default timeout is mostly fine.
Even if you check when the file is about to be created, it could grow
quite large while uploading, so you would still pass the limit after
it has been finalized and closed. It's never 100% foolproof but more
like a 'limit within reason'.


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about quorum

2022-11-04 Thread Murilo Morais
Hi Tyler, thanks for clarifying, it makes total sense now.

Hypothetically, if there are failures and most of the mons stop, how can I
re-initialize the cluster in its current state, or what can be done in that
kind of case?

On Thu, 3 Nov 2022 at 17:00, Tyler Brekke wrote:

> Hi Murilo,
>
> Since we need a majority to maintain quorum, when you lost 2 mons you
> only had 50% available and lost quorum. This is why all recommendations
> specify having an odd number of mons, as you do not get any added
> availability with 4 instead of 3. If you had 5 mons, you could lose two
> without losing availability.
>
>
> On Thu, Nov 3, 2022, 2:55 PM Murilo Morais  wrote:
>
>> Good afternoon everyone!
>>
>> I have a lab with 4 mons. I was testing the behavior when a certain
>> number of hosts go offline, and as soon as the second one went offline
>> everything stopped. It would be interesting to have a fifth node to
>> ensure that if two fall everything keeps working, but why did everything
>> stop with only 2 nodes down, when with 3 nodes in the cluster and one
>> down everything would still be working? Is there no way to get this
>> behavior with 4 nodes?
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph filesystem stuck in read only

2022-11-04 Thread Galzin Rémi



Hi,
I'm looking for some help/ideas/advice in order to solve the problem that 
occurs on my metadata server after the server reboot.
"Ceph status" warns about my MDS being "read only" but the filesystem and 
the data seem healthy.
It is still possible to access the content of my cephfs volumes since it's 
read only, but I don't know how to make my filesystem writable again.

Logs keep showing the same error when I restart the MDS server:

2022-11-04T11:50:14.506+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map 
state change up:reconnect --> up:rejoin

2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_start
2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 
rejoin_joint_start

2022-11-04T11:50:14.702+0100 7fbbf83c2700  1 mds.0.6872 rejoin_done
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.node3-5 Updating MDS 
map to version 6881 from mon.3
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map i 
am now mds.0.6872
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map 
state change up:rejoin --> up:active
2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 recovery_done -- 
successful recovery!

2022-11-04T11:50:15.550+0100 7fbbf83c2700  1 mds.0.6872 active_start
2022-11-04T11:50:15.558+0100 7fbbf83c2700  1 mds.0.6872 cluster 
recovered.
2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: 
rank=0 was never sent ping request.
2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging: 
rank=1 was never sent ping request.
2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 
mds.0.cache.dir(0x106cf14) commit error -22 v 1933183
2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log 
[ERR] : failed to commit dir 0x106cf14 object, errno -22
2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write 
error (22) Invalid argument, force readonly...
2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file 
system read-only
2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log 
[WRN] : force file system read-only


More info:

  cluster:
id: f36b996f-221d-4bcb-834b-19fc20bcad6b
health: HEALTH_WARN
1 MDSs are read only
1 MDSs behind on trimming

  services:
mon: 5 daemons, quorum node2-4,node2-5,node3-4,node3-5,node1-1 (age 
22h)
mgr: node2-4(active, since 28h), standbys: node2-5, node3-4, 
node3-5, node1-1

mds: 3/3 daemons up, 3 standby
osd: 112 osds: 112 up (since 22h), 112 in (since 2w)

  data:
volumes: 2/2 healthy
pools:   12 pools, 529 pgs
objects: 8.54M objects, 1.9 TiB
usage:   7.8 TiB used, 38 TiB / 46 TiB avail
pgs: 491 active+clean
 29  active+clean+snaptrim
 9   active+clean+snaptrim_wait

All MDSs, MONs and OSDs are in version 16.2.9.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Question about quorum

2022-11-04 Thread Janne Johansson
On Fri, 4 Nov 2022 at 13:37, Murilo Morais wrote:
> Hi Tyler, thanks for clarifying, it makes total sense now.
> Hypothetically, if there are any failures and most stop, how can I
> re-initialize the cluster in its current state or what can be done in this
> kind of case?
>

Just add one more mon so you have 5?
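
If quorum is already gone for good (say, 3 of 5 mons unrecoverable), the
documented escape hatch is to rewrite the monmap on a surviving mon; a rough
sketch with placeholder mon names:

```
# stop the surviving monitor before touching its store
systemctl stop ceph-mon@mon1

# extract its current monmap and drop the dead monitors from it
ceph-mon -i mon1 --extract-monmap /tmp/monmap
monmaptool /tmp/monmap --rm mon3
monmaptool /tmp/monmap --rm mon4

# inject the trimmed map and start the monitor again
ceph-mon -i mon1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@mon1
```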

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade/migrate host operating system for ceph nodes (CentOS/Rocky)

2022-11-04 Thread Jimmy Spets
I have upgraded the majority of the nodes in a cluster that I manage
from CentOS 8.6 to AlmaLinux 9.

We have done the upgrade by emptying one node at a time and then
reinstalling and bringing it back into the cluster.

With AlmaLinux 9 I install the default "Server without GUI" packages
and run with default SE Linux and firewall settings with good results.

Before starting the upgrade we first upgraded to Ceph 17.2.3.

On a new empty node I first add cephadm and then run "cephadm add-repo
--release quincy", then install "ceph-common", "cephadm" and the
dependencies with dnf.
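
Roughly, the per-node cycle looks like this with cephadm; the host name and
address are placeholders and the details will differ per setup:

```
# drain the node and remove it from the orchestrator before the OS reinstall
ceph orch host drain osd-node-07
ceph orch host rm osd-node-07

# ... reinstall the OS, then on the fresh node:
cephadm add-repo --release quincy
dnf install -y cephadm ceph-common

# from a node with an admin keyring, bring it back into the cluster
ceph orch host add osd-node-07 10.0.0.17
```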

I have been doing the upgrade very slowly on purpose over several
weeks and it has not been an issue for the users, the upgrade is not
finished yet, 3 storage nodes are still in progress and my rados
gateways will be last.

/Jimmy

On Thu, Nov 3, 2022 at 4:02 PM Prof. Dr. Christian Dietrich
 wrote:
>
>
> Hi all,
>
> we're running a ceph cluster with v15.2.17 and cephadm on various CentOS
> hosts. Since CentOS 8.x is EOL, we'd like to upgrade/migrate/reinstall
> the OS, possibly migrating to Rocky or CentOS stream:
>
> host | CentOS   | Podman
> -|--|---
> osd* | 7.9.2009 | 1.6.4   x5
> osd* | 8.4.2105 | 3.0.1   x2
> mon0 | 8.4.2105 | 3.2.3
> mon1 | 8.4.2105 | 3.0.1
> mon2 | 8.4.2105 | 3.0.1
> mds* | 7.9.2009 | 1.6.4   x2
>
> We have a few specific questions:
> 1) Does anyone have experience using Rocky Linux 8 or 9 or CentOS stream
> with ceph? Rocky is not mentioned specifically in the cephadm docs [2].
>
> 2) Is the Podman compatibility list [1] still up to date? CentOS Stream
> 8 as of 2022-10-19 appears to have Podman version 4.x, IIRC. 4.x does
> not appear in the compatibility table. Anyone using Podman 4.x
> successfully (with which ceph version)?
>
> Thanks in advance,
>
> Chris
>
>
> [1]:
> https://docs.ceph.com/en/quincy/cephadm/compatibility/#compatibility-with-podman-versions
>
> [2]:
> https://docs.ceph.com/en/quincy/cephadm/install/#cephadm-install-distros
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs are not utilized evenly

2022-11-04 Thread Joseph Mundackal
Hi Denis,

can you share the following data points?

ceph osd df tree (to see how the osd's are distributed)
ceph osd crush rule dump (to see what your ec rule looks like)
ceph osd pool ls detail (to see the pools and pools to crush rule mapping
and pg nums)
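
Something like this gathers all of that in one go (file names are arbitrary):

```
ceph osd df tree         > osd-df-tree.txt
ceph osd crush rule dump > crush-rules.json
ceph osd pool ls detail  > pools.txt
ceph balancer status     > balancer-status.txt
```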

Also, regarding:
"optimize_result": "Unable to find further optimization, or pool(s)
pg_num is decreasing, or distribution is already perfect"
is the autoscaler currently adjusting your pg counts?

-Joseph

On Wed, Nov 2, 2022 at 5:01 PM Denis Polom  wrote:

> Hi Joseph,
>
> thank you for the answer. But if I'm reading the 'ceph osd df' output I
> posted correctly, I see there are about 195 PGs per OSD.
>
> There are 608 OSDs in the pool, which is the only data pool. From what I
> have calculated, the PG calc says that the PG number is fine.
>
>
> On 11/1/22 14:03, Joseph Mundackal wrote:
>
> If the GB per pg is high, the balancer module won't be able to help.
>
> Your pg count per osd also looks low (30's), so increasing pgs per pool
> would help with both problems.
>
> You can use the pg calculator to determine which pools need what
>
> On Tue, Nov 1, 2022, 08:46 Denis Polom  wrote:
>
>> Hi
>>
>> I observed on my Ceph cluster, running latest Pacific, that same-size OSDs
>> are utilized differently even though the balancer is running and reports its
>> status as perfectly balanced.
>>
>> {
>>  "active": true,
>>  "last_optimize_duration": "0:00:00.622467",
>>  "last_optimize_started": "Tue Nov  1 12:49:36 2022",
>>  "mode": "upmap",
>>  "optimize_result": "Unable to find further optimization, or pool(s)
>> pg_num is decreasing, or distribution is already perfect",
>>  "plans": []
>> }
>>
>> balancer settings for upmap are:
>>
>>   mgr  advanced  mgr/balancer/mode                     upmap
>>   mgr  advanced  mgr/balancer/upmap_max_deviation      1
>>   mgr  advanced  mgr/balancer/upmap_max_optimizations  20
>>
>> It's obvious from the `ceph osd df` command that utilization is not the same
>> (the difference is about 1 TB). The following is just a partial output:
>>
>> ID   CLASS  WEIGHT    REWEIGHT  SIZE    RAW USE  DATA    OMAP     META    AVAIL    %USE   VAR   PGS  STATUS
>> 0    hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  3.0 MiB  37 GiB  3.6 TiB  78.09  1.05  196  up
>> 124  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      32 GiB  4.7 TiB  71.20  0.96  195  up
>> 157  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.3 MiB  35 GiB  3.7 TiB  77.67  1.05  195  up
>> 1    hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.0 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
>> 243  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.16  0.96  195  up
>> 244  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.19  0.96  195  up
>> 245  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      32 GiB  4.7 TiB  71.55  0.96  196  up
>> 246  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.17  0.96  195  up
>> 249  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      30 GiB  4.7 TiB  71.18  0.96  195  up
>> 500  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      30 GiB  4.7 TiB  71.19  0.96  195  up
>> 501  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.57  0.96  196  up
>> 502  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.18  0.96  195  up
>> 532  hdd    18.00020  1.0       16 TiB  12 TiB   12 TiB  0 B      31 GiB  4.7 TiB  71.16  0.96  195  up
>> 549  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  576 KiB  36 GiB  3.7 TiB  77.70  1.05  195  up
>> 550  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  3.8 MiB  36 GiB  3.7 TiB  77.67  1.05  195  up
>> 551  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.4 MiB  35 GiB  3.7 TiB  77.68  1.05  195  up
>> 552  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.5 MiB  35 GiB  3.7 TiB  77.69  1.05  195  up
>> 553  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  5.1 MiB  37 GiB  3.6 TiB  77.71  1.05  195  up
>> 554  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  967 KiB  36 GiB  3.6 TiB  77.71  1.05  195  up
>> 555  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  1.3 MiB  36 GiB  3.6 TiB  78.08  1.05  196  up
>> 556  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  4.7 MiB  36 GiB  3.6 TiB  78.10  1.05  196  up
>> 557  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  2.4 MiB  36 GiB  3.7 TiB  77.69  1.05  195  up
>> 558  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  4.5 MiB  36 GiB  3.6 TiB  77.72  1.05  195  up
>> 559  hdd    18.00020  1.0       16 TiB  13 TiB   13 TiB  1.5 MiB  35 GiB  3.6 TiB  78.09  1.05  196

[ceph-users] Re: ceph filesystem stuck in read only

2022-11-04 Thread Ramana Krisna Venkatesh Raja
On Fri, Nov 4, 2022 at 9:36 AM Galzin Rémi  wrote:
>
>
> Hi,
> i'm looking for some help/ideas/advices in order to solve the problem
> that occurs on my metadata
> server after the server reboot.

You rebooted a MDS's host and your file system became read-only? Was
the Ceph cluster healthy before reboot? Any issues with the MDSs,
OSDs? Did this happen after an upgrade?

> "Ceph status" warns about my MDS being "read only" but the fileystem and
> the data seem healthy.
> It is still possible to access the content of my cephfs volumes since
> it's read only but i don't know how
> to make my filesystem writable again.
>
> Logs keeps showing the same error when i restart the MDS server :
>
> 2022-11-04T11:50:14.506+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map
> state change up:reconnect --> up:rejoin
> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872 rejoin_start
> 2022-11-04T11:50:14.510+0100 7fbbf83c2700  1 mds.0.6872
> rejoin_joint_start
> 2022-11-04T11:50:14.702+0100 7fbbf83c2700  1 mds.0.6872 rejoin_done
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.node3-5 Updating MDS
> map to version 6881 from mon.3
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map i
> am now mds.0.6872
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 handle_mds_map
> state change up:rejoin --> up:active
> 2022-11-04T11:50:15.546+0100 7fbbf83c2700  1 mds.0.6872 recovery_done --
> successful recovery!
> 2022-11-04T11:50:15.550+0100 7fbbf83c2700  1 mds.0.6872 active_start
> 2022-11-04T11:50:15.558+0100 7fbbf83c2700  1 mds.0.6872 cluster
> recovered.
> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging:
> rank=0 was never sent ping request.
> 2022-11-04T11:50:18.190+0100 7fbbf5bbd700 -1 mds.pinger is_rank_lagging:
> rank=1 was never sent ping request.
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1
> mds.0.cache.dir(0x106cf14) commit error -22 v 1933183
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 log_channel(cluster) log
> [ERR] : failed to commit dir 0x106cf14 object, errno -22
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700 -1 mds.0.6872 unhandled write
> error (22) Invalid argument, force readonly...
> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  1 mds.0.cache force file
> system read-only

The MDS is unable to write a metadata object to the OSD.  Set
debug_mds=20 and debug_objecter=20 for the MDS, and capture the MDS
logs when this happens for more details.
e.g.,
$ ceph config set mds.<name> debug_mds 20

Also, check the OSD logs when you're hitting this issue.

You can then reset the MDS log level.  You can share the relevant MDS
and OSD logs using,
https://docs.ceph.com/en/pacific/man/8/ceph-post-file/

> 2022-11-04T11:50:18.554+0100 7fbbf23b6700  0 log_channel(cluster) log
> [WRN] : force file system read-only
>
> More info:
>
>cluster:
>  id: f36b996f-221d-4bcb-834b-19fc20bcad6b
>  health: HEALTH_WARN
>  1 MDSs are read only
>  1 MDSs behind on trimming
>
>services:
>  mon: 5 daemons, quorum node2-4,node2-5,node3-4,node3-5,node1-1 (age
> 22h)
>  mgr: node2-4(active, since 28h), standbys: node2-5, node3-4,
> node3-5, node1-1
>  mds: 3/3 daemons up, 3 standby
>  osd: 112 osds: 112 up (since 22h), 112 in (since 2w)
>
>data:
>  volumes: 2/2 healthy
>  pools:   12 pools, 529 pgs
>  objects: 8.54M objects, 1.9 TiB
>  usage:   7.8 TiB used, 38 TiB / 46 TiB avail
>  pgs: 491 active+clean
>   29  active+clean+snaptrim
>   9   active+clean+snaptrim_wait
>
> All MDSs, MONs and OSDs are in version 16.2.9.
>

What are the outputs of `ceph fs status` and `ceph fs dump`?

-Ramana


> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io