[ceph-users] Re: mclock scheduler kills clients IOs

2024-09-19 Thread Andrej Filipcic
… Hope this helps, Daniel
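For context, the mclock behaviour discussed in this thread is controlled by ordinary OSD config options; a minimal sketch of inspecting and switching the profile (the values shown are the built-in ones, not a recommendation from the thread):

    # check which mclock profile the OSDs are currently using
    ceph config get osd osd_mclock_profile
    # switch to the built-in profile that favours client I/O over recovery
    ceph config set osd osd_mclock_profile high_client_ops
    # or fall back to the wpq scheduler entirely (needs an OSD restart)
    ceph config set osd osd_op_queue wpq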

[ceph-users] Re: rbd on EC pool with fast and extremely slow writes/reads

2023-03-09 Thread Andrej Filipcic
From: Andrej Filipcic Sent: Monday, March 6, 2023 8:51 AM To: ceph-users Subject: [ceph-users] rbd on EC pool with fast and extremely slow writes/reads Hi, I have a problem on one of our ceph clusters that I do not understand: ceph 17.2.5 on 17 servers, 400 HD
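Not from the thread itself, but when chasing this kind of mixed fast/slow behaviour it is common to look at per-OSD op history and latency first; a sketch with a placeholder OSD id:

    # recent ops with per-stage timings on a suspect OSD (run on its host)
    ceph daemon osd.123 dump_historic_ops
    # quick cluster-wide view of commit/apply latencies
    ceph osd perf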

[ceph-users] rbd on EC pool with fast and extremely slow writes/reads

2023-03-06 Thread Andrej Filipcic
…Best, Andrej

[ceph-users] Re: mons excessive writes to local disk and SSD wearout

2023-02-27 Thread Andrej Filipcic
4, 2023 at 7:36 AM Andrej Filipcic wrote: Hi, on our large ceph cluster with 60 servers, 1600 OSDs, we have observed that small system nvmes are wearing out rapidly. Our monitoring shows mon writes on average about 10MB/s to store.db. For small system nvmes of 250GB and DWPD of ~1, this turn
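The numbers quoted above already tell the story; a rough back-of-the-envelope check (write amplification ignored):

    # ~10 MB/s of mon store.db writes, sustained:
    #   10 MB/s * 86400 s/day = 864000 MB ~= 864 GB written per day
    # a 250 GB NVMe rated at ~1 DWPD tolerates roughly 250 GB/day,
    # so the mon workload is about 3.5x the rated endurance
    echo $(( 10 * 86400 )) MB/day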

[ceph-users] mons excessive writes to local disk and SSD wearout

2023-02-24 Thread Andrej Filipcic
…, Andrej

[ceph-users] Re: mds damage cannot repair

2023-02-13 Thread Andrej Filipcic
On 2/10/23 08:50, Andrej Filipcic wrote: FYI, the damage went away after a couple of days, not quite sure how. Best, Andrej Hi, there is mds damage on our cluster, version 17.2.5: [ { "damage_type": "backtrace", "id": 2287166658, "…

[ceph-users] mds damage cannot repair

2023-02-09 Thread Andrej Filipcic
ntry #0x1/hpc/home/euliz/.Xauthority [568,head] auth REMOTE(reg) (dversion lock) pv=0 v=4425667830 ino=(nil) state=1073741824 | ptrwaiter=1 0x5560eb33a780] Any clue how to fix this, or remove the file from the namespace? It is not important... Thanks, Andrej
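For reference, backtrace damage entries like the one in this thread can be listed, re-scrubbed and then cleared by id; a sketch assuming the filesystem is named cephfs and rank 0 serves the path (not necessarily what resolved it here):

    # list current damage entries and their ids
    ceph tell mds.cephfs:0 damage ls
    # re-scrub and repair the affected subtree
    ceph tell mds.cephfs:0 scrub start /hpc/home/euliz recursive,repair
    # once the backtrace is rebuilt, drop the damage entry
    ceph tell mds.cephfs:0 damage rm 2287166658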

[ceph-users] ceph kernel client RIP when quota exceeded

2022-08-16 Thread Andrej Filipcic
2022-08-15T20:11:02+02:00 cn0539 kernel: … (R08–R12 register dump from the kernel client Oops elided)
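For context (the crash itself aside), cephfs quotas are plain extended attributes on a directory, so the limit being exceeded can be inspected or raised from any client; a minimal sketch with a placeholder path:

    # show the byte quota set on a directory
    getfattr -n ceph.quota.max_bytes /mnt/cephfs/some/dir
    # raise it to ~200 GB (a value of 0 removes the quota)
    setfattr -n ceph.quota.max_bytes -v 200000000000 /mnt/cephfs/some/dir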

[ceph-users] Re: slow pacific osd startup

2022-02-14 Thread Andrej Filipcic
On 14/02/2022 16:07, Igor Fedotov wrote: Hi Andrej, On 2/12/2022 9:56 AM, Andrej Filipcic wrote: On 11/02/2022 15:22, Igor Fedotov wrote: Hi Andrej, you might want to set debug_bluestore and debug_bluefs to 10 and check what's happening during the startup... Alternatively you migh
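The debug levels mentioned above can be raised centrally before restarting the OSD and reverted afterwards; a sketch with a placeholder OSD id:

    # raise bluestore/bluefs logging for one OSD, then restart it
    ceph config set osd.42 debug_bluestore 10
    ceph config set osd.42 debug_bluefs 10
    # after collecting the startup log, drop the overrides again
    ceph config rm osd.42 debug_bluestore
    ceph config rm osd.42 debug_bluefs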

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
compactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 30, 348, 1853, 0, 0]} 2022-02-12T07:50:18.827+0100 7ff0ad32b700  4 rocksdb: EVENT_LOG_v1 {"time

[ceph-users] Re: slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
ar in size? Is there some fsck enabled during OSD startup? Quoting Andrej Filipcic: Hi, with 16.2.7, some OSDs are very slow to start, e.g. it takes ~30 min for an HDD (12TB, 5TB used) to become active. After initialization, there is 20-40 min of extreme reading at ~150MB/s from the OSD, just
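Regarding the fsck question: BlueStore has optional consistency checks that run at mount time, and their current settings are easy to verify; a sketch (option names only, no claim that they were the cause here):

    # check whether any on-mount fsck variant is enabled
    ceph config get osd bluestore_fsck_on_mount
    ceph config get osd bluestore_fsck_on_mount_deep
    ceph config get osd bluestore_fsck_quick_fix_on_mount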

[ceph-users] slow pacific osd startup

2022-02-11 Thread Andrej Filipcic
:46:29Z SUBDEBUG Upgrade: ceph-base-2:16.2.5-0.el8.x86_64 2022-02-09T09:38:42+0100 SUBDEBUG Upgrade: ceph-base-2:16.2.7-0.el8.x86_64 Best regards, Andrej

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
t the EC recovery time is quite long. I use 16+3 erasure, so even with 5 or 6  failed OSDs, the data loss probability is pretty low. Best regards, Andrej Thanks, Igor On 12/20/2021 3:25 PM, Andrej Filipcic wrote: On 12/20/21 13:14, Igor Fedotov wrote: On 12/20/2021 2:58 PM, Andrej Fil

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
On 12/20/21 13:14, Igor Fedotov wrote: On 12/20/2021 2:58 PM, Andrej Filipcic wrote: On 12/20/21 12:47, Igor Fedotov wrote: Thanks for the info. Just in case - is write caching disabled for the disk in question? What's the output for "hdparm -W " ? no, it is enabled.
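For reference, the write-cache check and toggle being discussed look like this (device name is a placeholder; disabling the volatile cache is a trade-off, not a general recommendation):

    # show whether the drive's volatile write cache is enabled
    hdparm -W /dev/sdx
    # disable it (on many drives this does not survive a power cycle)
    hdparm -W 0 /dev/sdx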

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
Andrej Thanks, Igor On 12/20/2021 1:13 PM, Andrej Filipcic wrote: On 12/20/21 10:47, Igor Fedotov wrote: On 12/20/2021 12:26 PM, Andrej Filipcic wrote: On 12/20/21 10:09, Igor Fedotov wrote: Hi Andrej, 3) Please set debug-bluefs to 20, retry the OSD start and share the log. http://www-f9

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
On 12/20/21 10:47, Igor Fedotov wrote: On 12/20/2021 12:26 PM, Andrej Filipcic wrote: On 12/20/21 10:09, Igor Fedotov wrote: Hi Andrej, 3) Please set debug-bluefs to 20, retry the OSD start and share the log. http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz http://www-f9

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
RRENT 000 909f d59e f778 4f50 acb0 b1ea 59a2 9e90 010 Thanks, Andrej Thanks, Igor On 12/20/2021 11:17 AM, Andrej Filipcic wrote: Hi, When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to start. The first 16.2.7 startup crashes here: 2021-12-19T09:52:34.128
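In case it helps others hitting the same corruption: the rocksdb files live inside BlueFS, but they can be copied out of a stopped OSD for inspection with ceph-bluestore-tool; a sketch using the OSD id from this thread (output path is a placeholder):

    # export the embedded BlueFS/rocksdb files of a stopped OSD
    ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-611 \
        --out-dir /tmp/osd611 bluefs-export
    # CURRENT should name a MANIFEST file; random bytes here mean corruption
    od -c /tmp/osd611/db/CURRENT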

[ceph-users] Re: 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
Hi, attachment stripped. Here is the log: http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz Andrej On 12/20/21 09:17, Andrej Filipcic wrote: Hi, When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to start. The first 16.2.7 startup crashes here: 2021-12-19T09

[ceph-users] 16.2.7 pacific rocksdb Corruption: CURRENT

2021-12-20 Thread Andrej Filipcic
… (resending with shortened log) Best regards, Andrej

[ceph-users] Re: mount.ceph ipv4 fails on dual-stack ceph

2021-12-07 Thread Andrej Filipcic
On 07/12/2021 10:56, Stefan Kooman wrote: On 12/7/21 09:52, Andrej Filipcic wrote: Hi, I am trying to mount cephfs over IPv4, where ceph is in dual-stack mode, but it fails with: [1692264.203560] libceph: wrong peer, want (1)153.5.68.28:6789/0, got (1)[2001:1470:ff94:d:153:5:68:28]:6789/0

[ceph-users] mount.ceph ipv4 fails on dual-stack ceph

2021-12-07 Thread Andrej Filipcic
:68:4]:6789/0,v2:153.5.68.4:3300/0,v1:153.5.68.4:6789/0] mon.px01 2: [v2:[2001:1470:ff94:d:153:5:68:28]:3300/0,v1:[2001:1470:ff94:d:153:5:68:28]:6789/0,v2:153.5.68.28:3300/0,v1:153.5.68.28:6789/0] mon.px04
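Not a resolution from the thread, but for reference the kernel client can be pointed explicitly at the v2 ports of the IPv4 monitor addresses shown above; a sketch assuming a kernel and mount.ceph new enough to support ms_mode (roughly 5.11+), with a placeholder client name:

    # force msgr2 over the IPv4 monitor addresses
    mount -t ceph 153.5.68.4:3300,153.5.68.28:3300:/ /mnt/cephfs \
        -o name=myclient,ms_mode=prefer-crc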

[ceph-users] Re: Unpurgeable rbd image from trash

2021-12-06 Thread Andrej Filipcic
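For anyone landing here from the subject line, the normal trash workflow looks like this (pool and image id are placeholders; evidently it did not remove the stuck image discussed in this thread):

    # list trashed images, including ones pending deletion
    rbd trash ls --all mypool
    # purge everything that is eligible for removal
    rbd trash purge mypool
    # or remove a single entry by its image id
    rbd trash rm -p mypool 1234abcd5678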

[ceph-users] Re: cephfs kernel 5.10.78 client crashes

2021-11-29 Thread Andrej Filipcic
tried it with elrepo 5.15.5 but the machine also hung with no tuning. Will report how it goes. Thanks, Andrej On 29/11/2021 19:52, Jeff Layton wrote: On Fri, 2021-11-26 at 09:11 +0100, Andrej Filipcic wrote: Hi, we are doing some extensive stress testing of cephfs client throughput. Ceph is

[ceph-users] cephfs kernel 5.10.78 client crashes

2021-11-26 Thread Andrej Filipcic
2021-11-25 22:12:40 [ 3322.704783] kernel: … (RBP/R08–R15 register dump from the client crash elided)

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic
debug this? Given that this has been encountered in previous 16.2.* versions, it doesn't sound like a regression in 16.2.6 to me, rather an issue in pacific. In any case, we'll prioritize fixing it. Thanks, Neha On Mon, Sep 20, 2021 at 8:03 AM Andrej Filipcic wrote: On 20/09/2021 16:0

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic
. Been HEALTH OK for a week now after it finished refilling the drive. On 9/19/21 10:47 AM, Andrej Filipcic wrote: 2021-09-19T15:47:13.610+0200 7f8bc1f0e700 2 rocksdb: [db_impl/db_impl_compaction_flush.cc:2344] Waiting after background compaction error: Corruption: block checksum mismatch

[ceph-users] Re: rocksdb corruption with 16.2.6

2021-09-20 Thread Andrej Filipcic
attached it, but did not work, here it is: https://www-f9.ijs.si/~andrej/ceph/ceph-osd.1049.log-20210920.gz Cheers, Andrej On 9/20/21 9:41 AM, Dan van der Ster wrote: On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic wrote: I have attached a part of the osd log. Hi Andrej. Did you mean to

[ceph-users] rocksdb corruption with 16.2.6

2021-09-19 Thread Andrej Filipcic
old 262144
mds   advanced   mds_recall_global_max_decay_threshold   131072
mds   advanced   mds_recall_max_caps                      3
mds   advanced   mds_recall_max_decay_rate                1.50
mds   advanced   mds_recall_max_decay_threshold           131072
mds   advanced   mds_recall_warning
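For reference, overrides like the ones listed above live in the cluster configuration database and are applied per daemon type; a sketch using one of the values shown (not a tuning recommendation):

    # apply an MDS recall override cluster-wide
    ceph config set mds mds_recall_max_decay_threshold 131072
    # confirm what the mds section currently carries
    ceph config dump | grep mds_recall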

[ceph-users] speeding up EC recovery

2021-06-25 Thread Andrej Filipcic
, which is not very far from HDD throughput. Regards, Andrej
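The preview above is cut off, but the knobs usually involved when trying to speed up EC recovery/backfill on HDDs look like this; values are purely illustrative and not taken from the thread:

    # allow more concurrent backfills and recovery ops per OSD
    ceph config set osd osd_max_backfills 3
    ceph config set osd osd_recovery_max_active_hdd 8
    # reduce the artificial pause between recovery ops on HDDs
    ceph config set osd osd_recovery_sleep_hdd 0.05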

[ceph-users] Re: high number of kernel clients per osd slow down

2021-03-19 Thread Andrej Filipcic
On 19/03/2021 19:41, Stefan Kooman wrote: On 3/19/21 7:20 PM, Andrej Filipcic wrote: Hi, I am testing 15.2.10 on a large cluster (RH8). A cephfs pool (size=1) with 122 nvme OSDs works fine while the number of clients is relatively low. Writing from 400 kernel clients (ior benchmark), 8 streams

[ceph-users] high number of kernel clients per osd slow down

2021-03-19 Thread Andrej Filipcic
did not help. Restarting OSDs recovers the situation for a few minutes. Writing to an HDD pool with 1500 HDDs does not have any issues at all under the same conditions. Any hints or settings to improve this? Cheers, Andrej

[ceph-users] Re: OSD crashes create_aligned_in_mempool in 15.2.9 and 14.2.16

2021-03-09 Thread Andrej Filipcic
Just confirming: the crashes are gone with gperftools-libs-2.7-8.el8.x86_64.rpm. Cheers, Andrej On 09/03/2021 16:52, Andrej Filipcic wrote: Hi, I was checking that bug yesterday, yes, and it smells the same. I will give the epel one a try. Thanks, Andrej On 09/03/2021 16:44, Dan van der

[ceph-users] Re: OSD crashes create_aligned_in_mempool in 15.2.9 and 14.2.16

2021-03-09 Thread Andrej Filipcic
/issues/49618 If so, there is a fixed (downgraded) version in epel-testing now. Cheers, Dan On Tue, Mar 9, 2021 at 4:36 PM Andrej Filipcic wrote: Hi, under heavy load our cluster is experiencing frequent OSD crashes. Is this a known bug or should I report it? Any workarounds? It looks to be
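For anyone hitting the same tcmalloc crash, checking and swapping the package pointed at by that tracker issue is straightforward; a sketch for EL8 (the exact dnf invocation depends on how the fixed build is versioned in epel-testing):

    # see which gperftools/tcmalloc build the daemons are linked against
    rpm -q gperftools-libs
    # pull the fixed (downgraded) build referenced in the thread
    dnf --enablerepo=epel-testing downgrade gperftools-libs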

[ceph-users] OSD crashes create_aligned_in_mempool in 15.2.9 and 14.2.16

2021-03-09 Thread Andrej Filipcic
e05700 / safe_timer   7fc129e07700 / ms_dispatch   7fc12ca33700 / bstore_mempool   7fc133446700 / safe_timer   7fc1374bf700 / msgr-worker-2   7fc137cc0700 / msgr-worker-1   7fc1384c1700 / msgr-worker-0   max_recent 1   max_new 1000

[ceph-users] Re: cephfs tag not working

2020-10-01 Thread Andrej Filipcic
works but they can't read/write? Regards, Eugen Quoting Andrej Filipcic: Hi, on octopus 15.2.4 I have an issue with cephfs tag auth. The following works fine: client.f9desktop key: caps: [mds] allow rw caps: [mon] allow r caps: [osd] allow rw
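For comparison, the tag-based form of these caps (the variant that was not behaving in this thread) is normally written as below, assuming the filesystem is named cephfs:

    # grant rw via the cephfs pool tag instead of naming pools explicitly
    ceph auth caps client.f9desktop \
        mds 'allow rw' \
        mon 'allow r' \
        osd 'allow rw tag cephfs data=cephfs'
    # newer releases can generate equivalent caps for a fresh client
    ceph fs authorize cephfs client.f9desktop / rw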

[ceph-users] cephfs tag not working

2020-10-01 Thread Andrej Filipcic
the only way to refresh it is to remount the filesystem; a working tag would solve it. Best regards, Andrej

[ceph-users] Re: ceph mds slow requests

2020-06-10 Thread Andrej Filipcic

[ceph-users] Re: cephfs - modifying the ceph.file.layout of existing files

2020-05-30 Thread Andrej Filipcic
cache is practically not used. I am testing it on a 5.6.13 kernel with the copyfrom mount option and on octopus 15.2.2 with bluefs_preextend_wal_files=false. Cheers, Andrej On 2020-05-28 14:07, Andrej Filipcic wrote: Thanks a lot, I will give it a try, I plan to use that in a very controlled e

[ceph-users] Re: cephfs - modifying the ceph.file.layout of existing files

2020-05-28 Thread Andrej Filipcic
Thanks a lot, I will give it a try; I plan to use it in a very controlled environment anyway. Best regards, Andrej On 2020-05-28 12:21, Luis Henriques wrote: Andrej Filipcic writes: Hi, I have two directories, cache_fast and cache_slow, and I would like to move the least used files

[ceph-users] cephfs - modifying the ceph.file.layout of existing files

2020-05-28 Thread Andrej Filipcic
ayout. The only option I see at this point is to "cp" the file to a new dir and remove it from the old one, but this would involve client-side operations and can be very slow. Is there a better way that would work on the ceph server side? Best regards, Andrej
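For context, the layout is an xattr that can only be changed while a file still has no data, which is why the thread ends up at copy-based migration; a sketch of the directory-level setup assumed here (pool names are placeholders):

    # new files created under each directory land in the corresponding pool
    setfattr -n ceph.dir.layout.pool -v cephfs_fast /mnt/cephfs/cache_fast
    setfattr -n ceph.dir.layout.pool -v cephfs_slow /mnt/cephfs/cache_slow
    # existing files keep their original layout; inspect one with
    getfattr -n ceph.file.layout /mnt/cephfs/cache_fast/somefile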

[ceph-users] changed caps not propagated to kernel cephfs mounts

2020-05-06 Thread Andrej Filipcic
lla 4.19.60. Is there any way to force propagation of new auth capabilities without remounting the fs? Thanks, Andrej