Hope this helps,
Daniel
From: Andrej Filipcic
Sent: Monday, March 6, 2023 8:51 AM
To: ceph-users
Subject: [ceph-users] rbd on EC pool with fast and extremely slow writes/reads
Hi,
I have a problem on one of our Ceph clusters that I do not understand.
Ceph 17.2.5 on 17 servers, 400 HDDs
Best,
Andrej
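(For anyone hitting something similar: a generic first pass to separate a few
slow OSDs from a generally slow pool; the pool name below is a placeholder,
not from this thread.)
  ceph osd perf                           # per-OSD commit/apply latency; outliers stand out
  ceph osd pool stats                     # per-pool client and recovery I/O
  POOL=rbd_ec_data                        # placeholder pool name
  rados bench -p "$POOL" 30 write -t 16   # baseline write latency/throughput on the pool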
--
prof. dr. Andrej Filipcic, E-mail: andrej.filip...@ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674  Fax: +386-
4, 2023 at 7:36 AM Andrej Filipcic wrote:
Hi,
on our large Ceph cluster with 60 servers and 1600 OSDs, we have observed
that the small system NVMes are wearing out rapidly. Our monitoring shows
that the mon writes on average about 10 MB/s to store.db. For small system
NVMes of 250 GB and a DWPD of ~1, this turns into more than the rated write
endurance.
Best,
Andrej
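(Back-of-the-envelope check, assuming the 10 MB/s is sustained around the clock:)
  # 10 MB/s * 86400 s/day ~= 864 GB/day written by the mon alone;
  # a 250 GB drive rated at ~1 DWPD is good for ~250 GB/day,
  # i.e. roughly 3.5 drive writes per day:
  echo "scale=2; 10 * 86400 / 1000 / 250" | bc    # -> 3.45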
On 2/10/23 08:50, Andrej Filipcic wrote:
FYI, the damage went away after a couple of days, not quite sure how.
Best,
Andrej
Hi,
there is MDS damage on our cluster, version 17.2.5:
[
    {
        "damage_type": "backtrace",
        "id": 2287166658,
[dentry #0x1/hpc/home/euliz/.Xauthority [568,head] auth REMOTE(reg)
(dversion lock) pv=0 v=4425667830 ino=(nil) state=1073741824
| ptrwaiter=1 0x5560eb33a780]
Any clue how to fix this, or how to remove the file from the namespace? It is
not important...
Thanks,
Andrej
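(A sketch of the commands usually tried for backtrace damage; the file system
name "cephfs" and a single active MDS are assumptions, not from the thread:)
  ceph tell mds.cephfs:0 damage ls                                      # confirm the entry (id 2287166658 above)
  ceph tell mds.cephfs:0 scrub start /hpc/home/euliz recursive,repair   # try to repair the backtrace
  ceph tell mds.cephfs:0 damage ls                                      # re-check; a stale entry can then be dropped:
  ceph tell mds.cephfs:0 damage rm 2287166658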
R08: 0008 R09: 00224b5341545f52
2022-08-15T20:11:02+02:00 cn0539 kernel: R10: 0025 R11: 0246 R12: 55d6b3dc7f50
On 14/02/2022 16:07, Igor Fedotov wrote:
Hi Andrej,
On 2/12/2022 9:56 AM, Andrej Filipcic wrote:
On 11/02/2022 15:22, Igor Fedotov wrote:
Hi Andrej,
you might want to set debug_bluestore and debug_bluefs to 10 and
check what's happening during the startup...
Alternatively you migh
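(One way to apply the suggestion above centrally and revert it afterwards; the
OSD id is just an example and the systemd unit assumes a non-containerized
deployment:)
  ceph config set osd.611 debug_bluestore 10
  ceph config set osd.611 debug_bluefs 10
  systemctl restart ceph-osd@611
  # ... inspect /var/log/ceph/ceph-osd.611.log, then revert:
  ceph config rm osd.611 debug_bluestore
  ceph config rm osd.611 debug_bluefs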
compactions": 1, "output_compression": "NoCompression",
"num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
"lsm_state": [0, 1, 30, 348, 1853, 0, 0]}
2022-02-12T07:50:18.827+0100 7ff0ad32b700 4 rocksdb: EVENT_LOG_v1
{"time
ar in size? Is there some fsck enabled during OSD startup?
Quoting Andrej Filipcic:
Hi,
with 16.2.7, some OSDs are very slow to start; e.g., it takes ~30 min
for an HDD (12 TB, 5 TB used) to become active. After initialization,
there is 20-40 min of extreme reading at ~150 MB/s from the OSD, just
:46:29Z SUBDEBUG Upgrade: ceph-base-2:16.2.5-0.el8.x86_64
2022-02-09T09:38:42+0100 SUBDEBUG Upgrade: ceph-base-2:16.2.7-0.el8.x86_64
Best regards,
Andrej
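(If the fsck question above refers to the BlueStore quick-fix fsck that can run
on the first start after an upgrade, it can be checked and, if needed, disabled
like this; whether it applies here is not established in the thread:)
  ceph config get osd bluestore_fsck_quick_fix_on_mount
  ceph config set osd bluestore_fsck_quick_fix_on_mount false   # skip the quick-fix on mount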
the EC recovery time is quite long. I use 16+3 erasure coding, so even
with 5 or 6 failed OSDs, the data loss probability is pretty low.
Best regards,
Andrej
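(For reference, a 16+3 profile of the kind mentioned above could be defined
like this; the profile/pool names and the failure domain are placeholders:)
  ceph osd erasure-code-profile set ec-16-3 k=16 m=3 crush-failure-domain=host
  ceph osd erasure-code-profile get ec-16-3
  ceph osd pool create ec163_data 2048 2048 erasure ec-16-3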
Thanks,
Igor
On 12/20/2021 3:25 PM, Andrej Filipcic wrote:
On 12/20/21 13:14, Igor Fedotov wrote:
On 12/20/2021 2:58 PM, Andrej Filipcic wrote:
On 12/20/21 12:47, Igor Fedotov wrote:
Thanks for the info.
Just in case - is write caching disabled for the disk in question?
What's the output for "hdparm -W " ?
no, it is enabled.
Andrej
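(For reference, checking and disabling the volatile write cache; the device
name is a placeholder:)
  hdparm -W /dev/sdX        # show the current write-cache setting
  hdparm -W 0 /dev/sdX      # disable the volatile write cache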
Thanks,
Igor
On 12/20/21 10:47, Igor Fedotov wrote:
On 12/20/2021 12:26 PM, Andrej Filipcic wrote:
On 12/20/21 10:09, Igor Fedotov wrote:
Hi Andrej,
3) Please set debug-bluefs to 20, retry the OSD start and share the
log.
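(One way to capture such a log: stop the unit and run the OSD in the foreground
with the higher debug level; a sketch assuming a non-containerized deployment
and the osd id from the log URL in the thread:)
  systemctl stop ceph-osd@611
  # -d keeps the daemon in the foreground and sends the log to stderr
  ceph-osd -d -i 611 --debug-bluefs 20 2>&1 | gzip -c > /tmp/ceph-osd.611.bluefs20.log.gz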
http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz
http://www-f9
RRENT
000 909f d59e f778 4f50 acb0 b1ea 59a2 9e90
010
Thanks,
Andrej
Thanks,
Igor
On 12/20/2021 11:17 AM, Andrej Filipcic wrote:
Hi,
When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to
start. The first 16.2.7 startup crashes here:
2021-12-19T09:52:34.128
Hi,
attachment stripped. Here is the log:
http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz
Andrej
On 12/20/21 09:17, Andrej Filipcic wrote:
Hi,
When upgrading to 16.2.7 from 16.2.6, 8 out of ~1600 OSDs failed to
start. The first 16.2.7 startup crashes here:
2021-12-19T09
(resending with shortened log)
Best regards,
Andrej
On 07/12/2021 10:56, Stefan Kooman wrote:
On 12/7/21 09:52, Andrej Filipcic wrote:
Hi,
I am trying to mount CephFS over IPv4, where Ceph is in dual-stack
mode, but it fails with:
[1692264.203560] libceph: wrong peer, want (1)153.5.68.28:6789/0, got
(1)[2001:1470:ff94:d:153:5:68:28]:6789/0
:68:4]:6789/0,v2:153.5.68.4:3300/0,v1:153.5.68.4:6789/0] mon.px01
2: [v2:[2001:1470:ff94:d:153:5:68:28]:3300/0,v1:[2001:1470:ff94:d:153:5:68:28]:6789/0,v2:153.5.68.28:3300/0,v1:153.5.68.28:6789/0] mon.px04
--
Enrico Bocchi
CERN European Laboratory for Particle Physics
IT - Storage Group - General Storage Services
Mailbox: G20500 - Office: 31-2-010
1211 Genève 23
Switzerland
I tried it with the elrepo 5.15.5 kernel, but the machine also hung with no tuning.
Will report how it goes.
Thanks,
Andrej
On 29/11/2021 19:52, Jeff Layton wrote:
On Fri, 2021-11-26 at 09:11 +0100, Andrej Filipcic wrote:
Hi,
we are doing some extensive stress testing of cephfs client throughput.
Ceph is
07f8900018220
2021-11-25 22:12:40 [ 3322.704783] RBP: 7f8a05d9d100 R08:
R09: 000459723280
2021-11-25 22:12:40 [ 3322.711917] R10: 7f8e687103a5 R11: 0246 R12: 7f8900018220
2021-11-25 22:12:40 [ 3322.719045] R13: 7f8c540c7b48 R14: 7f8a05d9d118 R15: 00007f8c54
debug this? Given that this has been
encountered in previous 16.2.* versions, it doesn't sound like a
regression in 16.2.6 to me, rather an issue in pacific. In any case,
we'll prioritize fixing it.
Thanks,
Neha
On Mon, Sep 20, 2021 at 8:03 AM Andrej Filipcic wrote:
On 20/09/2021 16:0
It has been HEALTH_OK for
a week now after it finished refilling the drive.
On 9/19/21 10:47 AM, Andrej Filipcic wrote:
2021-09-19T15:47:13.610+0200 7f8bc1f0e700 2 rocksdb:
[db_impl/db_impl_compaction_flush.cc:2344] Waiting after background
compaction error: Corruption: block checksum mismatch
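(The usual recovery path when an OSD's RocksDB is corrupted is to redeploy the
OSD and let it backfill, which matches the "refilling the drive" above; a
sketch with the osd id from the log below and a placeholder device, assuming a
non-containerized ceph-volume deployment:)
  ceph osd out 1049
  systemctl stop ceph-osd@1049
  ceph osd destroy 1049 --yes-i-really-mean-it
  ceph-volume lvm zap --destroy --osd-id 1049
  ceph-volume lvm create --osd-id 1049 --data /dev/sdX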
I attached it, but that did not work; here it is:
https://www-f9.ijs.si/~andrej/ceph/ceph-osd.1049.log-20210920.gz
Cheers,
Andrej
On 9/20/21 9:41 AM, Dan van der Ster wrote:
On Sun, Sep 19, 2021 at 4:48 PM Andrej Filipcic wrote:
I have attached a part of the osd log.
Hi Andrej. Did you mean to
old 262144
mds advanced mds_recall_global_max_decay_threshold 131072
mds advanced mds_recall_max_caps 3
mds advanced mds_recall_max_decay_rate 1.50
mds advanced mds_recall_max_decay_threshold 131072
mds advanced mds_recall_warning
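(These settings are applied with "ceph config set"; only the values fully
visible above are repeated here:)
  ceph config set mds mds_recall_global_max_decay_threshold 131072
  ceph config set mds mds_recall_max_decay_rate 1.50
  ceph config set mds mds_recall_max_decay_threshold 131072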
, which is not very far from HDD throughput.
Regards,
Andrej
On 19/03/2021 19:41, Stefan Kooman wrote:
On 3/19/21 7:20 PM, Andrej Filipcic wrote:
Hi,
I am testing 15.2.10 on a large cluster (RHEL 8). A CephFS pool (size=1)
with 122 NVMe OSDs works fine as long as the number of clients is
relatively low.
Writing from 400 kernel clients (ior benchmark), 8 streams
did not help.
Restarting the OSDs recovers the situation for a few minutes.
Writing to the HDD pool with 1500 HDDs does not have any issues at all under
the same conditions.
Any hints, settings to improve this?
Cheers,
Andrej
Just confirming: the crashes are gone with gperftools-libs-2.7-8.el8.x86_64.rpm.
Cheers,
Andrej
On 09/03/2021 16:52, Andrej Filipcic wrote:
Hi,
I was checking that bug yesterday, yes, and it smells the same.
I will give the EPEL one a try.
Thanks
Andrej
On 09/03/2021 16:44, Dan van der
/issues/49618
If so, there is a fixed (downgraded) version in epel-testing now.
Cheers, Dan
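(For the record, pulling the fixed package from epel-testing and restarting the
OSDs would look roughly like this; the exact package NVR is taken from the
confirmation above:)
  dnf --enablerepo=epel-testing downgrade gperftools-libs-2.7-8.el8
  systemctl restart ceph-osd.target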
On Tue, Mar 9, 2021 at 4:36 PM Andrej Filipcic wrote:
Hi,
under heavy load our cluster is experiencing frequent OSD crashes. Is
this a known bug or should I report it? Any workarounds? It looks to be
e05700 / safe_timer
7fc129e07700 / ms_dispatch
7fc12ca33700 / bstore_mempool
7fc133446700 / safe_timer
7fc1374bf700 / msgr-worker-2
7fc137cc0700 / msgr-worker-1
7fc1384c1700 / msgr-worker-0
max_recent 1
max_new 1000
works but they can't read/write?
Regards,
Eugen
Quoting Andrej Filipcic:
Hi,
on octopus 15.2.4 I have an issue with cephfs tag auth. The
following works fine:
client.f9desktop
key:
caps: [mds] allow rw
caps: [mon] allow r
caps: [osd] allow rw
the only way to
refresh it is to remount the filesystem. A working tag would solve it.
Best regards,
Andrej
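(For context, the tag-based caps being discussed are the ones "ceph fs
authorize" generates; the file system name "cephfs" is an assumption:)
  ceph fs authorize cephfs client.f9desktop / rw
  ceph auth get client.f9desktop   # shows the osd cap in the tag form: allow rw tag cephfs data=cephfs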
cache is practically not used.
I am testing it on a 5.6.13 kernel with the copyfrom mount option, and on
Octopus 15.2.2 with bluefs_preextend_wal_files=false.
Cheers,
Andrej
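(The two knobs mentioned above, spelled out; the mount source and client name
are placeholders:)
  # kernel client with server-side object copies enabled for copy_file_range():
  mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,copyfrom
  # and on the OSD side:
  ceph config set osd bluefs_preextend_wal_files false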
On 2020-05-28 14:07, Andrej Filipcic wrote:
Thanks a lot, I will give it a try; I plan to use it in a very
controlled environment anyway.
Best regards,
Andrej
On 2020-05-28 12:21, Luis Henriques wrote:
Andrej Filipcic writes:
Hi,
I have two directories, cache_fast and cache_slow, and I would like to move the
least used files
layout.
The only option I see at this point is to "cp" the file to a new dir and
remove it from the old one, but this would involve client-side
operations and can be very slow.
Is there any better way that would work on the Ceph server side?
Best regards,
Andrej
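(For context, the pool that a directory's new files land in is controlled by
the directory layout, while already-written files keep their old layout, which
is why a plain rename does not move the data; pool names and paths below are
placeholders:)
  setfattr -n ceph.dir.layout.pool -v cephfs_fast /mnt/cephfs/cache_fast
  setfattr -n ceph.dir.layout.pool -v cephfs_slow /mnt/cephfs/cache_slow
  getfattr -n ceph.file.layout.pool /mnt/cephfs/cache_slow/somefile   # where the file's data actually lives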
lla
4.19.60.
Is there any way to force new auth capabilities to propagate without
remounting the fs?
Thanks,
Andrej