pect something around processing config values.
I've just set the same config setting on a test cluster and restarted an
OSD without a problem. So I'm not sure what is going on there.
Gr. Stefan
--
| BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351
| GPG: 0xD14839C6
Am I doing something wrong?
I wonder if they would still crash if the OSDs dropped their caches
beforehand. There is support for this in master, but it doesn't look
like it's backported to nautilus: https://tracker.ceph.com/issues/24176
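For reference, the command added by that work (so master only, per the above)
should look something like this, with osd.0 just as an example:
  ceph tell osd.0 cache drop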
Gr. Stefan
Hello Igor,
thanks for all your feedback and all your help.
The first thing I'll try is to upgrade a bunch of systems from
kernel 4.19.66 to 4.19.97 and see what happens.
I'll report back in 7-10 days to verify whether this helps.
Greets,
Stefan
On 20.01.20 at 13:12, Igor Fedotov wrote:
480)
Put( Prefix = O key =
0x7f8001cc45c881217262'd_data.4303206b8b4567.9632!='0xfffe'o'
Value size = 510)
on the right side I always see 0xfffeffff on all
failed OSDs.
Greets,
Stefan
On 19.01.20 at 14:07, Stefan Priebe - Profihost AG wrote:
Yes, except that this happens on 8 different clusters with different hardware
but the same Ceph version and the same kernel version.
Greets,
Stefan
> On 19.01.2020 at 11:53, Igor Fedotov wrote:
>
> So the intermediate summary is:
>
> Any OSD in the cluster can experience interim RocksDB c
ioned PR denotes high memory pressure as potential trigger for these
> read errors. So if such pressure happens the hypothesis becomes more valid.
We already do this heavily and have around 10GB of memory per OSD. Also
none of those machines show any IO pressure at all.
All hosts show a constant ra
if naming support is already removed from the code, but in
any case don't try to name it anything else.
Gr. Stefan
onth between those failures
- most probably logs are already deleted.
> Also please note that patch you mentioned doesn't fix previous issues
> (i.e. duplicate allocations), it prevents from new ones only.
>
> But fsck should show them if any...
None showed.
Stefan
> Thanks
c_thread()' thread 7f3350a14700 time 2020-01-16
01:10:13.404113
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
ceph version 12.2.12-11-gd3eae83543
(d3eae83543bffc0fc6c43823feb637fa851b6213) luminous (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, in
(BlueStore::KVSyncThread::entry()+0xd) [0x55e6df8a208d]
4: (()+0x7494) [0x7f8c50190494]
5: (clone()+0x3f) [0x7f8c4f217acf]
All BlueStore OSDs are randomly crashing sometimes (once a week).
Greets,
Stefan
OSDs and could reduce latency
from 2.5ms to 0.7ms now.
:p
Cheers
Stefan
-----Original Message-----
From: Виталий Филиппов
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander ; Stefan Bauer
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low io with
choosing the wrong new
> value, or we misunderstood what the old value really was and have been
> plotting it wrong all this time.
I think the last one: not plotting what you think you did. We are using
the telegraf plugin from the manager and using "mds.request" from
"ceph_da
Thank you all,
performance is indeed better now. Can now go back to sleep ;)
KR
Stefan
-----Original Message-----
From: Виталий Филиппов
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander ; Stefan Bauer
CC: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] low io
Hi Vitaliy,
thank you for your time. Do you mean
cephx sign messages = false
with "disable signatures"?
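Or do you mean the whole set of signature-related options? Just guessing, e.g.
(untested, for benchmarking only):
  cephx require signatures = false
  cephx cluster require signatures = false
  cephx service require signatures = false
  cephx sign messages = false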
KR
Stefan
-----Original Message-----
From: Виталий Филиппов
Sent: Tuesday, 14 January 2020 10:28
To: Wido den Hollander ; Stefan Bauer
CC: ceph-users@list
Hi Stefan,
thank you for your time.
"temporary write through" does not seem to be a legit parameter.
However write through is already set:
root@proxmox61:~# echo "temporary write through" >
/sys/block/sdb/device/scsi_disk/*/cache_type
root@proxmox61:~# ca
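For completeness, checking what each disk currently reports (the sdX list is
just an example):
  for dev in sdb sdc sdd; do
    echo -n "$dev: "; cat /sys/block/$dev/device/scsi_disk/*/cache_type
  done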
Hello,
Does anybody have real-life experience with an external block DB?
Greets,
Stefan
On 13.01.20 at 08:09, Stefan Priebe - Profihost AG wrote:
> Hello,
>
> I'm planning to split the block DB to a separate flash device which I
> also would like to use as an OSD for erasure co
metric is needed to perform calculations to obtain
"avgtime" (sum/avgcount).
Gr. Stefan
Hi Stefan,
On 13.01.20 at 17:09, Stefan Bauer wrote:
> Hi,
>
>
> we're playing around with ceph but are not quite happy with the IOs.
>
>
> 3 node ceph / proxmox cluster with each:
>
>
> LSI HBA 3008 controller
>
> 4 x MZILT960HAHQ/007 Samsung
on average 13000 IOPS / read
We're expecting more. :( Any ideas, or is that all we can expect?
Money is not a problem for this test bed; any ideas on how to gain more IOPS
are greatly appreciated.
Thank you.
Stefan
nds a minimum size of 140GB per 14TB HDD.
Is there any recommendation on how many OSDs a single flash device can
serve? The Optane ones can do 2000MB/s write + 500,000 IOPS.
Greets,
Stefan
4K native
Dual 25Gb network
Does it fit? Does anybody have experience with these drives? Can we use EC or do we
need to use normal replication?
Greets,
Stefan
> On 10.01.2020 at 07:10, Mainor Daly wrote:
>
>
> Hi Stefan,
>
> before I give some suggestions, can you first describe your usecase for which
> you wanna use that setup? Also which aspects are important for you.
It’s just the backup target of another ceph Clus
DB or not?
Since we started using Ceph we have mostly stuck with SSDs - so no
knowledge about HDDs in place.
Greets,
Stefan
On 09.01.20 at 16:49, Stefan Priebe - Profihost AG wrote:
>
>> On 09.01.2020 at 16:10, Wido den Hollander wrote:
>>
>>
>>
>>> O
recordsize: https://blog.programster.org/zfs-record-size,
https://blogs.oracle.com/roch/tuning-zfs-recordsize
Gr. Stefan
> On 09.01.2020 at 16:10, Wido den Hollander wrote:
>
>
>
>> On 1/9/20 2:27 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Wido,
>>> On 09.01.20 at 14:18, Wido den Hollander wrote:
>>>
>>>
>>> On 1/9/20 2:07 PM, Daniel Aberger -
Quoting Kyriazis, George (george.kyria...@intel.com):
>
>
> > On Jan 9, 2020, at 8:00 AM, Stefan Kooman wrote:
> >
> > Quoting Kyriazis, George (george.kyria...@intel.com):
> >
> >> The source pool has mainly big files, but there are quite a few
> >
Quoting Kyriazis, George (george.kyria...@intel.com):
> The source pool has mainly big files, but there are quite a few
> smaller (<4KB) files that I’m afraid will create waste if I create the
> destination zpool with ashift > 12 (>4K blocks). I am not sure,
> though, if ZFS will actually write b
about this and most
probably some overhead we currently have in those numbers. Those values
come from our old classic raid storage boxes. Those use btrfs + zlib
compression + subvolumes for those backups and we've collected those
numbers from all of them.
The new system should just replicate snapshot
ts/blob/master/tools/upmap/upmap-remapped.py
This way you can pause the process or get in "HEALTH_OK" state when
you want to.
Gr. Stefan
istoric_slow_ops" on the storage node
hosting this OSD and you will get JSON output with the reason
(flag_point) of the slow op and the series of events.
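For example (osd.23 and the jq filter are just an illustration):
  ceph daemon osd.23 dump_historic_slow_ops | jq '.. | .flag_point? // empty'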
Gr. Stefan
> (well, except for that scrub bug, but my work-around for that is in all
> release versions).
What scrub bug are you talking about?
Gr. Stefan
dos bench is
> stable again.
>
> apt-get install irqbalance nftables
^^ Are these some of these changes? Do you need those packages in order
to unload / blacklist them?
I don't get what your fixes are, or what the problem was. Firewall
issues?
What Ceph version did you upgr
ything in containers. It makes (performance) debugging *a
lot* easier as you can actually isolate things. Something which is way
more difficult to achieve in servers where you have a complex workload
going on ...
I guess (no proof of that) that performance will be more consistent as
well.
Gr. Stefan
use case? Low latency generally matters most.
Gr. Stefan
es.
Are you planning on dedicated monitor nodes (I would definitely do
that)?
Gr. Stefan
PN VXLAN network is not trivial ... I advise getting
networking expertise in your team.
Gr. Stefan
Quoting Ml Ml (mliebher...@googlemail.com):
> Hello Stefan,
>
> The status was "HEALTH_OK" before i ran those commands.
\o/
> root@ceph01:~# ceph osd crush rule dump
> [
> {
> "rule_id": 0,
> "rule_name": "repli
r "OSD" and
not host. What does a "ceph osd crush rule dump" show?
Gr. Stefan
ur
not concerned about lifetime this is just fine. We use quite a lot of
them and even after ~ 2 years the most used SSD is at 4.4% write capacity.
Gr. Stefan
of mds within
> five seconds as follow,
You should run this iostat -x 1 on the OSD nodes ... MDS is not doing
any IO in and of itself as far as Ceph is concerned.
Gr. Stefan
GP EVPN (over VXLAN)? In
that case you would have the ceph nodes in the overlay ... You can put a
LB / Proxy up front (Varnish, ha-proxy, nginx, relayd, etc.)... (outside
of Ceph network) and connect over HTTP to the RGW nodes ... which can
reach the Ceph network (or are even part of it) on the b
now how to migrate
> without inactive PGs and slow requests?
Several users reported that setting the following parameters:
osd op queue = wpq
osd op queue cut off = high
helped in cases like this.
Your mileage may vary ...
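A sketch of how to apply that with the central config store (Mimic or newer;
these options need an OSD restart to take effect):
  ceph config set osd osd_op_queue wpq
  ceph config set osd osd_op_queue_cut_off high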
Gr. Stefan
want to know what it is used for:
https://tracker.ceph.com/issues/35947
TL;DR: it's not what you think it is.
Gr. Stefan
s to
connect to 3300 ... you might get a timeout as well. Not sure if
messenger falls back to v1.
What happens when you change ceph.conf (first without restarting the
mon) and try a "ceph -s" again with a ceph client on the monitor node?
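The msgr2-style line in ceph.conf would look something like this (IP taken
from the thread, adjust to your monitors):
  mon_host = [v2:192.168.0.104:3300,v1:192.168.0.104:6789]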
Gr. Stefan
rkaround for
now if you want to override the config store: just put that in your
config file and reboot the daemon(s).
Gr. Stefan
$hostname quorum_status
If there is no monitor in quorum ... then that's your problem. See [1]
for more info on debugging the monitor.
Gr. Stefan
[1]:
https://docs.ceph.com/docs/nautilus/rados/troubleshooting/troubleshooting-mon/
might be able to use a Prometheus dashboard
and convert that to an InfluxDB-compatible dashboard in Grafana. I think I
would do that if I were to do it all over again. And / or use Prometheus
with InfluxDB as the backend for long(er) term storage. With the new
InfluxDB query language "flux" [5],
14s
>
> ...
>
>
> I changed the IP back to 192.168.0.104 yesterday, but all the same.
Just checking here: do you run a firewall? Is port 3300 open (besides
6789)?
What do you see in the logs on the MDS and the OSDs? There are timers
configured in the MON / OSD in case they cannot rea
af exporters).
>
> While changing that is rather trivial, it could make sense to get
> users' feedback and come up with a list of missing perf-counters to be
> exposed.
I made https://tracker.ceph.com/issues/4188 a while ago: missing metrics
in all but prometheus module.
Gr
Quoting Stefan Kooman (ste...@bit.nl):
> 13.2.6 with this patch is running production now. We will continue the
> cleanup process that *might* have triggered this tomorrow morning.
For what it's worth ... that process completed successfully ... Time will
tell if it's really fix
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> Please check if https://github.com/ceph/ceph/pull/32020 works
Thanks!
13.2.6 with this patch is running production now. We will continue the
cleanup process that *might* have triggered this tomorrow morning.
Gr. Stefan
Quoting Stefan Kooman (ste...@bit.nl):
> and it crashed again (and again) ... until we stopped the mds and
> deleted the mds0_openfiles.0 from the metadata pool.
>
> Here is the (debug) output:
>
> A specific workload that *might* have triggered this: recursively deletin
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> > please apply following patch, thanks.
> >
> > diff --git a/src/mds/OpenFileTable.cc b/src/mds/OpenFileTable.cc
> > index c0f72d581d..2ca737470d 100644
> > --- a/src/mds/OpenFileTable.cc
> > +++ b/src/mds/Op
emons running with
different ceph versions. What does "ceph versions" give you?
Gr. Stefan
omap_num_items.resize(omap_num_objs);
> omap_updates.resize(omap_num_objs);
> omap_updates.back().clear = true;
It took a while but an MDS server with this debug patch is now live (and
up:active).
FYI,
Gr. Stefan
"avgtime": 0.004992133
because the communication partner is slow in writing/committing?
Don't want to follow the red herring :/
We have the following times on our 11 OSDs. Attached image.
-----Original Message-----
From: Paul Emmerich
Sent: Thursday, 7 Novemb
110 110
10 94 94
11 24 24
Stefan
From: Paul Emmerich
You can have a look at subop_latency in "ceph daemon osd.XX perf
dump", it tells you how long an OSD took to reply to another OSD.
That's usually
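To turn that counter into an average (sum and avgcount are cumulative, so this
is the average over the OSD's uptime; osd.12 is just an example):
  ceph daemon osd.12 perf dump | jq '.osd.subop_latency | .sum / .avgcount'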
+scrubbing+deep
io:
client: 4.99MiB/s rd, 1.36MiB/s wr, 678op/s rd, 105op/s wr
Thank you.
Stefan
you already have the patch (on github) somewhere?
Thanks,
Stefan
e openfiles list (object) becomes corrupted? As in:
have a bugfix in place?
Thanks!
Gr. Stefan
tate (although it
has crashed at least 10 times now).
Is the following what you want me to do, and safe to do in this
situation?
1) Stop running (active) MDS
2) delete object 'mdsX_openfiles.0' from cephfs metadata pool
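For step 2, a rough sketch (pool name is an example - use your actual CephFS
metadata pool; rank 0 in this case):
  rados -p cephfs_metadata rm mds0_openfiles.0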
Thanks,
Stefan
Dear list,
Quoting Stefan Kooman (ste...@bit.nl):
> I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
> on any other system.
>
> Any hints / help to prevent this from happening?
We have had this happening another two times now. In both cases the MDS
recov
the same issue on a Mimic 13.2.6 system:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036702.html
I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
on any other system.
Any hints / help to prevent this from happening?
Thanks,
Stefan
sue or not.
>
> This time it reminds me of the issue shared in this mailing list a while ago by
> Stefan Priebe. The post caption is "Bluestore OSDs keep crashing in
> BlueStore.cc: 8808: FAILED assert(r == 0)"
>
> So first of all I'd suggest to distinguish these issues
start the MDS to make the
"mds_cache_memory_limit" effective, is that correct?
Gr. Stefan
[1]: https://ceph.com/community/nautilus-cephfs/
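P.s. For setting the limit itself, something like (16 GiB, value just as an
illustration):
  ceph config set mds mds_cache_memory_limit 17179869184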
d error message is gone. Either way it
makes sense to enable the crash module anyway.
Thanks,
Stefan
Quoting Stefan Kooman (ste...@bit.nl):
> Hi List,
>
> We are planning to move a filesystem workload (currently nfs) to CephFS.
> It's around 29 TB. The unusual thing here is the amount of directories
> in use to host the files. In order to combat a "too many files in on
;, line 214, in gather_crashinfo
errno, crashids, err = self.remote('crash', 'do_ls', '', '')
File "/usr/lib/ceph/mgr/mgr_module.py", line 845, in remote
args, kwargs)
ImportError: Module not found
Running 13.2.6 on Ub
Hi Igor,
On 12.09.19 at 19:34, Igor Fedotov wrote:
> Hi Stefan,
>
> thanks for the update.
>
> Relevant PR from Paul mentions kernels (4.9+):
> https://github.com/ceph/ceph/pull/23273
>
> Not sure how correct this is. That's all I have..
>
> Try asking Sage
Hello Igor,
I can now confirm that this is indeed a kernel bug. The issue no
longer happens on upgraded nodes.
Do you know more about it? I really would like to know in which version
it was fixed, to avoid having to reboot all Ceph nodes.
Greets,
Stefan
On 27.08.19 at 16:20, Igor Fedotov wrote:
o investigate this issue further are highly
appreciated.
Gr. Stefan
lugin does not (yet?) provide mds metrics though.
Ideally we would *only* use the ceph mgr telegraf module to collect *all
the things*.
Not sure what's the difference in python code between the modules that could
explain this.
Gr. Stefan
andard deviation. If that is quite high it makes sense to
use the balancer to equalize and obtain higher utilization. Either PG
optimized or capacity optimized (or a mix of both, the default balancer
settings).
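A quick way to check it (the last lines of the output contain the MIN/MAX VAR
and STDDEV summary):
  ceph osd df | tail -2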
Gr. Stefan
2) The balancer moves the data more efficiently. 3) the
balancer will avoid putting PGs on OSDs that are already full ... you
might avoid "too full" PG situations.
Gr. Stefan
Quoting Kenneth Waegeman (kenneth.waege...@ugent.be):
> The cluster is healthy at this moment, and we have certainly enough space
> (see also osd df below)
It's not well balanced though ... do you use ceph balancer (with
balancer in upmap mode)?
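Enabling it would look something like this (upmap requires luminous+ clients):
  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status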
Gr. Stefan
this?
Gr. Stefan
'avgtime' in seconds, with "avgtime": 0.000328972 representing
0.328972 ms?
As far as I can see, the data collected by the telegraf manager plugin
only includes "sum". So how would I calculate the average reply latency for
MDS requests?
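My understanding (please correct me if I'm wrong): avgtime = sum / avgcount,
both cumulative, so the average over a sampling interval would be
(sum_t2 - sum_t1) / (avgcount_t2 - avgcount_t1). As a one-off check on the MDS
(daemon name and counter path are just examples):
  ceph daemon mds.$(hostname -s) perf dump | \
    jq '.mds_server.req_create_latency | .sum / .avgcount'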
Thanks,
Gr. Stefan
occasional invalid
> data reads under high memory pressure/swapping:
> https://tracker.ceph.com/issues/22464
We have a current 4.19.X kernel and no memory limit. Mem avail is pretty
constant at 32GB.
Greets,
Stefan
>
> IMO memory usage worth checking as well...
>
>
> Igor
see inline
On 27.08.19 at 15:43, Igor Fedotov wrote:
> see inline
>
> On 8/27/2019 4:41 PM, Stefan Priebe - Profihost AG wrote:
>> Hi Igor,
>>
>> On 27.08.19 at 14:11, Igor Fedotov wrote:
>>> Hi Stefan,
>>>
>>> this looks like a dupli
Hi Igor,
On 27.08.19 at 14:11, Igor Fedotov wrote:
> Hi Stefan,
>
> this looks like a duplicate for
>
> https://tracker.ceph.com/issues/37282
>
> Actually the root cause selection might be quite wide.
>
> From HW issues to broken logic in RocksDB/BlueStore/B
fb1ab2f6494]
5: (clone()+0x3f) [0x7fb1aa37dacf]
I already opened up a tracker:
https://tracker.ceph.com/issues/41367
Can anybody help? Is this known?
Greets,
Stefan
metadata on only HDDs, it's going to be slow.
Only SSD for OSD data pool and NVMe for metadata pool, so that should be
fine. Besides the initial loading of that many files / directories this
workload shouldn't be any problem.
Thanks for your feedback.
Gr. Stefan
you are
using cephfs kernel client it might report as not compatible (jewel) but
recent linux distributions work well (Ubuntu 18.04 / CentOS 7).
Gr. Stefan
.
We are wondering if this kind of directory structure is suitable for
CephFS. Might the MDS have difficulties keeping up with that many inodes
/ dentries, or doesn't it care at all?
The amount of metadata overhead might be horrible, but we will test that
out.
Thanks,
Stefan
ver
needs "dangerous" updates.
This is my view on the matter, please let me know what you think of
this.
Gr. Stefan
P.s. Just to make things clear: this thread is in _no way_ intended to pick on
anybody.
[1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel
the same active and standby
as before the upgrades, both up to date with as little downtime as
possible.
That said ... I've accidentally updated a standby MDS to a newer version
than the Active one ... and this didn't cause any issues (12.2.8 ->
12.2.11) ... but I would not recommen
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Hi Stefan,
>
> Sorry I couldn't get back to you sooner.
NP.
> Looks like you hit the infinite loop bug in OpTracker. It was fixed in
> 12.2.11: https://tracker.ceph.com/issues/37977
>
> The problem was introduced in
knowing about it.
Gr. Stefan
ded before I can do that?
It's safe to use in production. We have test clusters running it, and
recently put it in production as well. As Igor noted this might not help
in your situation, but it might prevent you from running into decreased
performance (increased latency) over time.
as been identified to be caused by the
"stupid allocator" memory allocator.
Gr. Stefan
Quoting Stefan Kooman (ste...@bit.nl):
> Hi Patrick,
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Stefan Kooman (ste...@bit.nl):
> > > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > > Thanks for the detailed notes. It looks like the MDS is s
presentation by Wido/Piotr that might be
useful:
https://static.sched.com/hosted_files/cephalocon2019/d6/ceph%20on%20nvme%20barcelona%202019.pdf
Gr. Stefan
4096
bluestore min alloc size hdd = 4096
You will have to rebuild _all_ of your OSDs though.
Here is another thread about this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/thread.html#24801
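For reference, the setting goes into ceph.conf before the OSDs are (re)created,
e.g. (it only affects newly created OSDs):
  [osd]
  bluestore min alloc size hdd = 4096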
Gr. Stefan
also be able to balance way better.
Math: ((100 PGs/OSD * 192 OSDs) - 750) / 3 = 6150 for 3-replica
pools. You might have a lot of contention going on on your OSDs; they
are probably underperforming.
Gr. Stefan
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I
> pulled the ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l"; .
>
ds of HEALTH_WARN "clock skew
> detected".
>
> I guess the workaround now is to ignore the warning, and wait
> for two minutes before rebooting another mon.
You can tune the "mon_timecheck_skew_interval" which by default is set
to 30 seconds.
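For example (value in seconds; 300 is just an illustration, untested):
  ceph config set mon mon_timecheck_skew_interval 300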
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time in a
> much simpler situation; please see below. However, let me start with your
> questions first:
>
> What bug? -- In a single-active MDS set-
rring to based on info below. It does
seem to work as designed.
Gr. Stefan
r,
which (also) might result in slow ops after $period of OSD uptime.
Gr. Stefan