Quoting Martin Mlynář (nexus+c...@smoula.net):
> Do you think this could help? OSD does not even start, I'm getting a little
> lost how flushing caches could help.
I might have misunderstood. I thought the OSDs crashed when you set the
config setting.
> According to trace I suspect something aro
Quoting Martin Mlynář (nexus+c...@smoula.net):
>
> When I remove this option:
> # ceph config rm osd osd_memory_target
>
> OSD starts without any trouble. I've seen the same behaviour when I wrote
> this parameter into /etc/ceph/ceph.conf
>
> Is this a known bug? Am I doing something wrong?
I wond
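For reference, a minimal sketch of how this setting is typically handled via the
centralized config store on Nautilus (the 4 GiB value below is purely illustrative,
not taken from this thread):

  ceph config set osd osd_memory_target 4294967296   # set for all OSDs
  ceph config get osd.0 osd_memory_target            # check what one daemon resolves
  ceph config rm osd osd_memory_target               # remove the override again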
Quoting Ignazio Cassano (ignaziocass...@gmail.com):
> Hello, I just deployed nautilus with ceph-deploy.
> I did not find any option to give a cluster name to my ceph so its name is
> "ceph".
> Please, how can I change my cluster name without reinstalling?
>
> Please, how can I set the cluster nam
Quoting Robert LeBlanc (rob...@leblancnet.us):
>
> req_create
> req_getattr
> req_readdir
> req_lookupino
> req_open
> req_unlink
>
> We were graphing these as ops, but using the new avgcount, we are getting
> very different values, so I'm wondering if we are choosing the wrong new
> value, or we
Quoting Robert LeBlanc (rob...@leblancnet.us):
> The link that you referenced above is no longer available, do you have a
> new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all
> changed, so I'm trying to map the old values to the new values. Might just
> have to look in the code.
Quoting Kyriazis, George (george.kyria...@intel.com):
>
> Hmm, I meant you can use large block size for the large files and small
> block size for the small files.
>
> Sure, but how can I do that? As far as I know, block size is a property of the
> pool, not of a single file.
recordsize: https://blog
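(The linked post is cut off above.) The gist, as a sketch assuming separate ZFS
datasets for the small and large files (dataset names here are made up), is that
recordsize is a per-dataset property rather than a per-pool one:

  zfs set recordsize=1M  tank/bigfiles    # large, mostly-sequential files
  zfs set recordsize=16K tank/smallfiles  # small files, less space waste
  zfs get recordsize tank/smallfiles      # verify the property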
Quoting Kyriazis, George (george.kyria...@intel.com):
>
>
> > On Jan 9, 2020, at 8:00 AM, Stefan Kooman wrote:
> >
> > Quoting Kyriazis, George (george.kyria...@intel.com):
> >
> >> The source pool has mainly big files, but there are quite a few
> >
Quoting Kyriazis, George (george.kyria...@intel.com):
> The source pool has mainly big files, but there are quite a few
> smaller (<4KB) files that I’m afraid will create waste if I create the
> destination zpool with ashift > 12 (>4K blocks). I am not sure,
> though, if ZFS will actually write b
Quoting Sean Matheny (s.math...@auckland.ac.nz):
> I tested this out by setting norebalance and norecover, moving the host
> buckets under the rack buckets (all of them), and then unsetting. Ceph starts
> melting down with escalating slow requests, even with backfill and recovery
> parameters se
Quoting Sinan Polat (si...@turka.nl):
> Hi,
>
>
> I couldn't find any documentation or information regarding the log format in
> Ceph. For example, I have 2 log lines (see below). For each 'word' I would
> like
> to know what it is/means.
>
> As far as I know, I can break the log lines into:
>
Quoting Paul Emmerich (paul.emmer...@croit.io):
> We've also seen some problems with FileStore on newer kernels; 4.9 is the
> last kernel that worked reliably with FileStore in my experience.
>
> But I haven't seen problems with BlueStore related to the kernel version
> (well, except for that scru
Quoting Jelle de Jong (jelledej...@powercraft.nl):
> question 2: what systemd target can I use to run a service after all
> ceph-osds are loaded? I tried ceph.target and ceph-osd.target; both do not work
> reliably.
ceph-osd.target works for us (every time). Have you enabled all the
individual OSD ser
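As a sketch (unit and script names are hypothetical), a service that must only run
once the local OSDs are up can be ordered against ceph-osd.target like this:

  # /etc/systemd/system/after-osds.service
  [Unit]
  Description=Run a task after the local Ceph OSDs have started
  After=ceph-osd.target
  Wants=ceph-osd.target

  [Service]
  Type=oneshot
  ExecStart=/usr/local/bin/after-osds-task.sh

  [Install]
  WantedBy=multi-user.target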
Quoting Radhakrishnan2 S (radhakrishnan...@tcs.com):
> Where hypervisor would be your Ceph nodes. I.e. you can connect your
> Ceph nodes on L2 or make them part of the L3 setup (more modern way of
> doing it). You can use "ECMP" to add more network capacity when you need
> it. Setting up a BGP EVPN
Quoting Ignazio Cassano (ignaziocass...@gmail.com):
> Hello All,
> I installed ceph luminous with openstack, and using fio in a virtual machine
> I got slow random writes:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test
> --filename=random_read_write.fio --bs=4k --io
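The fio command line is truncated above; a typical 4k random-write invocation along
these lines (a sketch, the remaining parameters are assumptions) would be:

  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=test --filename=random_read_write.fio --bs=4k \
      --iodepth=64 --size=4G --readwrite=randwrite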
Quoting Radhakrishnan2 S (radhakrishnan...@tcs.com):
> In addition, about putting all kinds of drives in one box: that was done
> for two reasons,
>
> 1. Avoid CPU choking
This depends only on what kind of hardware you select and how you
configure it. You can (if need be) restrict #C
Hi,
>
> Radha: I'm sure we are using BGP EVPN over VXLAN, but all deployments
> are through the infrastructure management network. We are a CSP and
> overlay means tenant network, if ceph nodes are in overlay, then
> multiple tenants will need to be able to communicate to the ceph
> nodes. If LB i
Quoting Ml Ml (mliebher...@googlemail.com):
> Hello Stefan,
>
> The status was "HEALTH_OK" before I ran those commands.
\o/
> root@ceph01:~# ceph osd crush rule dump
> [
> {
> "rule_id": 0,
> "rule_name": "replicated_ruleset",
> "ruleset": 0,
> "type": 1,
>
Quoting Ml Ml (mliebher...@googlemail.com):
> Hello List,
> I have size = 3 and min_size = 2 with 3 nodes.
That's good.
>
>
> I replaced two osds on node ceph01 and ran into "HEALTH_ERR".
> My problem: it waits for the backfilling process?
> Why did I run into HEALTH_ERR? I thought all data wil
Quoting Sinan Polat (si...@turka.nl):
> Thanks for all the replies. In summary: consumer-grade SSD is a no-go.
>
> What is an alternative to SM863a? Since it is quite hard to get these due to
> them being non-stock.
PM863a ... lower endurance ... but still "enterprise" ... but as you're
not concerned about
Quoting renjianxinlover (renjianxinlo...@163.com):
> Hi, Nathan, thanks for your quick reply!
> the command 'ceph status' outputs a warning including about ten clients failing to
> respond to cache pressure;
> in addition, in mds node, 'iostat -x 1' shows drive io usage of mds within
> five seconds as f
Quoting Radhakrishnan2 S (radhakrishnan...@tcs.com):
> Hello Everyone,
>
> We have a pre-prod Ceph cluster and working towards a production cluster
> deployment. I have the following queries and request all your expert tips,
>
>
> 1. Network architecture - We are looking for a private and pub
Quoting Jelle de Jong (jelledej...@powercraft.nl):
>
> It took three days to recover and during this time clients were not
> responsive.
>
> How can I migrate to bluestore without inactive pgs or slow requests? I have
> several more filestore clusters and I would like to know how to migrate
> witho
Quoting Miroslav Kalina (miroslav.kal...@livesport.eu):
> Monitor down is also easy as pie, because it's just "num_mon -
> mon_quorum". But there is also metric mon_outside_quorum which I have
> always zero and don't really know how it works.
See this issue if you want to know where it is used fo
Quoting Cc君 (o...@qq.com):
> Hi, the daemon is running when using the admin socket
> [root@ceph-node1 ceph]# ceph --admin-daemon
> /var/run/ceph/ceph-mon.ceph-node1.asok mon_status
> {
> "name": "ceph-node1",
> "rank": 0,
> "state": "leader",
> "election_epoch": 63,
> "quorum": [
>
Quoting David Herselman (d...@syrex.co):
> Hi,
>
> We assimilated our Ceph configuration to store attributes within Ceph
> itself and subsequently have a minimal configuration file. Whilst this
> works perfectly we are unable to remove configuration entries
> populated by the assimilate-conf comma
Quoting Cc君 (o...@qq.com):
> 4.[root@ceph-node1 ceph]# ceph -s
> just blocked ...
> error 111 after a few hours
Is the daemon running? You can check for the process to be alive in
/var/run/ceph/ceph-mon.$hostname.asok
If so ... then query the monitor for its status:
ceph daemon mon.$hostname quo
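A sketch of the full admin-socket query hinted at above (assuming the default
socket path):

  ceph daemon mon.$(hostname -s) mon_status
  # or, pointing at the socket directly:
  ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status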
Quoting Miroslav Kalina (miroslav.kal...@livesport.eu):
> Hello guys,
>
> is there anyone using Telegraf / InfluxDB metrics exporter with Grafana
> dashboards? I am asking like that because I was unable to find any
> existing Grafana dashboards based on InfluxDB.
\o (telegraf)
> I am having hard
Quoting Cc君 (o...@qq.com):
> ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba) nautilus
> (stable)
>
> os: CentOS Linux release 7.7.1908 (Core)
> single node ceph cluster with 1 mon, 1 mgr, 1 mds, 1 rgw and 12 osds, but
> only cephfs is used.
> ceph -s is blocked after shutting down
Quoting Ernesto Puerta (epuer...@redhat.com):
> The default behaviour is that only perf-counters with priority
> PRIO_USEFUL (5) or higher are exposed (via `get_all_perf_counters` API
> call) to ceph-mgr modules (including Dashboard, DiskPrediction or
> Prometheus/InfluxDB/Telegraf exporters).
>
Quoting Stefan Kooman (ste...@bit.nl):
> 13.2.6 with this patch is running production now. We will continue the
> cleanup process that *might* have triggered this tomorrow morning.
For what it's worth ... that process completed successfully ... Time will
tell if it's really fix
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> Please check if https://github.com/ceph/ceph/pull/32020 works
Thanks!
13.2.6 with this patch is running production now. We will continue the
cleanup process that *might* have triggered this tomorrow morning.
Gr. Stefan
--
| BIT BV https://www.bi
Quoting Stefan Kooman (ste...@bit.nl):
> and it crashed again (and again) ... until we stopped the mds and
> deleted the mds0_openfiles.0 from the metadata pool.
>
> Here is the (debug) output:
>
> A specific workload that *might* have triggered this: recursively deletin
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> > please apply following patch, thanks.
> >
> > diff --git a/src/mds/OpenFileTable.cc b/src/mds/OpenFileTable.cc
> > index c0f72d581d..2ca737470d 100644
> > --- a/src/mds/OpenFileTable.cc
> > +++ b/src/mds/Op
Quoting John Hearns (j...@kheironmed.com):
> And me again for the second time in one day.
>
> ceph -w is now showing messages like this:
>
> 2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with
> expected crc
I have seen messages like this when there are daemons running with
d
Hi,
Quoting Yan, Zheng (uker...@gmail.com):
> > > I double checked the code, but didn't find any clue. Can you compile
> > > mds with a debug patch?
> >
> > Sure, I'll try to do my best to get a properly packaged Ceph Mimic
> > 13.2.6 with the debug patch in it (and / or get help to get it built)
Quoting Yan, Zheng (uker...@gmail.com):
> I double checked the code, but didn't find any clue. Can you compile
> mds with a debug patch?
Sure, I'll try to do my best to get a properly packaged Ceph Mimic
13.2.6 with the debug patch in it (and / or get help to get it built).
Do you already have th
Quoting Yan, Zheng (uker...@gmail.com):
> delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> of the crashed mds)
OK, MDS crashed again, restarted. I stopped it, deleted the object and
restarted the MDS. It became active right away.
Any idea on why the openfiles list (object
Quoting Yan, Zheng (uker...@gmail.com):
> delete 'mdsX_openfiles.0' object from cephfs metadata pool. (X is rank
> of the crashed mds)
Just to make sure I understand correctly. Current status is that the MDS
is active (no standby for now) and not in a "crashed" state (although it
has been crashin
Dear list,
Quoting Stefan Kooman (ste...@bit.nl):
> I wonder if this situation is more likely to be hit on Mimic 13.2.6 than
> on any other system.
>
> Any hints / help to prevent this from happening?
We have had this happening another two times now. In both cases the MDS
recov
Dear list,
Today our active MDS crashed with an assert:
2019-10-19 08:14:50.645 7f7906cb7700 -1
/build/ceph-13.2.6/src/mds/OpenFileTable.cc: In function 'void
OpenFileTable::commit(MDSInternalContextBase*, uint64_t, int)' thread
7f7906cb7700 time 2019-10-19 08:14:50.648559
/build/ceph-13.2.6/s
Hi,
According to [1] there are new parameters in place to have the MDS
behave more stably. Quoting that blog post: "One of the more recent
issues we've discovered is that an MDS with a very large cache (64+GB)
will hang during certain recovery events."
For all of us that are not (yet) running Nauti
>
> I created this issue: https://tracker.ceph.com/issues/42116
>
> Seems to be related to the 'crash' module not enabled.
>
> If you enable the module the problem should be gone. Now I need to check
> why this message is popping up.
Yup, crash module enabled and error message is gone. Either w
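For completeness, a sketch of enabling the module and verifying it is active:

  ceph mgr module enable crash
  ceph mgr module ls    # 'crash' should now appear under enabled_modules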
Quoting Stefan Kooman (ste...@bit.nl):
> Hi List,
>
> We are planning to move a filesystem workload (currently nfs) to CephFS.
> It's around 29 TB. The unusual thing here is the amount of directories
> in use to host the files. In order to combat a "too many files in on
Quoting Wido den Hollander (w...@42on.com):
> Hi,
>
> The Telemetry [0] module has been in Ceph since the Mimic release and
> when enabled it sends an anonymized JSON back to
> https://telemetry.ceph.com/ every 72 hours with information about the
> cluster.
>
> For example:
>
> - Version(s)
Dear list,
We recently switched the shared storage for our linux shared hosting
platforms from "nfs" to "cephfs". Performance improvement are
noticeable. It all works fine, however, there is one peculiar thing:
when Apache reloads after a logrotate of the "error" logs all but one
node will hang fo
Hi Paul,
Quoting Paul Emmerich (paul.emmer...@croit.io):
> https://static.croit.io/ceph-training-examples/ceph-training-example-admin-socket.pdf
Thanks for the link. So, what tool do you use to gather the metrics? We
are using the telegraf module of the Ceph manager. However, this module only
provide
Quoting Massimo Sgaravatto (massimo.sgarava...@gmail.com):
> Thank you
>
> But the algorithms used during backfilling and during rebalancing (to
> decide where data has to be placed) are different?
Yes, the balancer takes more factors into consideration. It also takes
into consideration all of
Quoting Massimo Sgaravatto (massimo.sgarava...@gmail.com):
> Just for my education, why is letting the balancer move the PGs to the new
> OSDs (CERN approach) better than throttled backfilling?
1) Because you can pause the process at any given moment and obtain
HEALTH_OK again. 2) The balanc
Quoting Kenneth Waegeman (kenneth.waege...@ugent.be):
> The cluster is healthy at this moment, and we have certainly enough space
> (see also osd df below)
It's not well balanced though ... do you use ceph balancer (with
balancer in upmap mode)?
Gr. Stefan
--
| BIT BV https://www.bit.nl/
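A sketch of turning the balancer on in upmap mode as suggested (note that upmap
requires all clients to be Luminous or newer):

  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status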
Hi,
While upgrading the monitors on a Nautilus test cluster warning messages
appear:
[WRN] failed to encode map e905 with expected crc
Is this expected?
I have only seen this in the past when mixing different releases (major
versions), not when upgrading within a release.
What is the impact of
Hi,
Just wondering, what are the units of the metrics logged by "perf dump"?
For example mds perf dump:
"reply_latency": {
"avgcount": 560013520,
"sum": 184229.305658729,
"avgtime": 0.000328972
Is the 'avgtime' in seconds, with "avgtime": 0.000328972
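Working it out from the numbers above, avgtime is just sum divided by avgcount,
which is consistent with the sum (and thus avgtime) being in seconds:

  184229.305658729 / 560013520 ≈ 0.000328972 s ≈ 0.33 ms average reply latency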
Quoting Nathan Fish (lordci...@gmail.com):
> MDS CPU load is proportional to metadata ops/second. MDS RAM cache is
> proportional to # of files (including directories) in the working set.
> Metadata pool size is proportional to total # of files, plus
> everything in the RAM cache. I have seen that
Quoting Peter Sabaini (pe...@sabaini.at):
> What kind of commit/apply latency increases have you seen when adding a
> large numbers of OSDs? I'm nervous how sensitive workloads might react
> here, esp. with spinners.
You mean when there is backfilling going on? Instead of doing "a big
bang" you ca
Hi List,
We are planning to move a filesystem workload (currently nfs) to CephFS.
It's around 29 TB. The unusual thing here is the amount of directories
in use to host the files. In order to combat a "too many files in one
directory" scenario a "let's make use of recursive directories" approach.
N
Hi,
Now that the release cadence has been set, it's time for another discussion
:-).
During Ceph day NL we had a panel q/a [1]. One of the things that was
discussed were backports. Occasionally users will ask for backports of
functionality in newer releases to older releases (that are still in
support
Quoting James Wilkins (james.wilk...@fasthosts.com):
> Hi all,
>
> Just want to (double) check something – we’re in the process of
> luminous -> mimic upgrades for all of our clusters – particularly this
> section regarding MDS steps
>
> • Confirm that only one MDS is online and is rank 0
Quoting Patrick Donnelly (pdonn...@redhat.com):
> Hi Stefan,
>
> Sorry I couldn't get back to you sooner.
NP.
> Looks like you hit the infinite loop bug in OpTracker. It was fixed in
> 12.2.11: https://tracker.ceph.com/issues/37977
>
> The problem was introduced in 12.2.8.
We've been quite lon
Quoting solarflow99 (solarflo...@gmail.com):
> can the bitmap allocator be set in ceph-ansible? I wonder why it is not the
> default in 12.2.12
We don't use ceph-ansible. But if ceph-ansible allows you to set specific
([osd]) settings in ceph.conf, I guess you can do it.
I don't know what the policy i
Quoting Max Vernimmen (vernim...@textkernel.nl):
> Thank you for the suggestion to use the bitmap allocator. I looked at the
> ceph documentation and could find no mention of this setting. This makes me
> wonder how safe and production ready this setting really is. I'm hesitant
> to apply that to o
Quoting Max Vernimmen (vernim...@textkernel.nl):
>
> This is happening several times per day after we made several changes at
> the same time:
>
>- add physical ram to the ceph nodes
>- move from fixed 'bluestore cache size hdd|sdd' and 'bluestore cache kv
>max' to 'bluestore cache au
Quoting Stefan Kooman (ste...@bit.nl):
> Hi Patrick,
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Stefan Kooman (ste...@bit.nl):
> > > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > > Thanks for the detailed notes. It looks like the MDS is s
Quoting Robert Sander (r.san...@heinlein-support.de):
> Hi,
>
> we have a small cluster at a customer's site with three nodes and 4 SSD-OSDs
> each.
> Connected with 10G the system is supposed to perform well.
>
> rados bench shows ~450MB/s write and ~950MB/s read speeds with 4MB objects
> but on
Quoting Robert Ruge (robert.r...@deakin.edu.au):
> Ceph newbie question.
>
> I have a disparity between the free space that my cephfs file system
> is showing and what ceph df is showing. As you can see below my
> cephfs file system says there is 9.5TB free however ceph df says there
> is 186TB w
Quoting Frank Schilder (fr...@dtu.dk):
>
> [root@ceph-01 ~]# ceph status # before the MDS failed over
> cluster:
> id: ###
> health: HEALTH_WARN
> 1 MDSs report slow requests
>
> services:
> mon: 3 daemons, quorum ceph-01,ceph-02,ceph-03
> mgr: ceph-01(active), st
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Yan and Stefan,
>
> it happened again and there were only very few ops in the queue. I
> pulled the ops list and the cache. Please find a zip file here:
> "https://files.dtu.dk/u/w6nnVOsp51nRqedU/mds-stuck-dirfrag.zip?l"; .
> Its a bit more than 100MB.
Quoting Jan Kasprzak (k...@fi.muni.cz):
> OK, many responses (thanks for them!) suggest chrony, so I tried it:
> With all three mons running chrony and being in sync with my NTP server
> with offsets under 0.0001 second, I rebooted one of the mons:
>
> There still was the HEALTH_WARN
Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> thanks for the fast reply. We encountered the problem again, this time in a
> much simpler situation; please see below. However, let me start with your
> questions first:
>
> What bug? -- In a single-active MDS set-up, should there ever
Quoting Frank Schilder (fr...@dtu.dk):
If at all possible I would:
Upgrade to 13.2.5 (there have been quite a few MDS fixes since 13.2.2).
Use more recent kernels on the clients.
Below settings for [mds] might help with trimming (you might already
have changed mds_log_max_segments to 128 accordi
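The [mds] settings themselves are cut off above; a sketch of the one explicitly
named (value as mentioned, ceph.conf placement assumed):

  [mds]
  mds_log_max_segments = 128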
Quoting Marc Schöchlin (m...@256bit.org):
> Our new setup is now:
> (12.2.10 on Ubuntu 16.04)
>
> [osd]
> osd deep scrub interval = 2592000
> osd scrub begin hour = 19
> osd scrub end hour = 6
> osd scrub load threshold = 6
> osd scrub sleep = 0.3
> osd snap trim sleep = 0.4
> pg max concurrent s
Quoting Patrick Hein (bagba...@googlemail.com):
> It should be as recent as possible. I think I would use the HWE kernel.
^^ This. Use linux-image-generic-hwe-16.04 (4.15 kernel). But ideally you go for
Ubuntu Bionic and use linux-image-generic-hwe-18.04 (4.18 kernel).
Added benefit of 4.18 kernel
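A sketch of installing the HWE kernels mentioned (standard Ubuntu package names):

  sudo apt install linux-image-generic-hwe-16.04   # Xenial, 4.15 kernel
  sudo apt install linux-image-generic-hwe-18.04   # Bionic, 4.18 kernel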
Quoting Ignat Zapolsky (ignat.zapol...@ammeon.com):
> Hi,
>
> Just a question: what is the recommended docker container image to use for
> Ceph?
>
> The Ceph website is saying that 12.2.x is LTR but there are at least 2
> more releases in dockerhub – 13 and 14.
>
> Would there be any advice on selection
Quoting Konstantin Shalygin (k0...@k0ste.ru):
> Because you set the new crush rule only for the `cephfs_metadata` pool and look for
> PGs in the `cephfs_data` pool.
ZOMG :-O
Yup, that was it. cephfs_metadata pool looks good.
Thanks!
Gr. Stefan
--
| BIT BV http://www.bit.nl/  Kamer van Koophandel 09
Quoting Gregory Farnum (gfar...@redhat.com):
> What's the output of "ceph -s" and "ceph osd tree"?
ceph -s
cluster:
id: 40003df8-c64c-5ddb-9fb6-d62b94b47ecf
health: HEALTH_OK
services:
mon: 3 daemons, quorum michelmon1,michelmon2,michelmon3 (age 2d)
mgr: michelmon2(active
Hi List,
I'm playing around with CRUSH rules and device classes and I'm puzzled
if it's working correctly. Platform specifics: Ubuntu Bionic with Ceph 14.2.1
I created two new device classes "cheaphdd" and "fasthdd". I made
sure these device classes are applied to the right OSDs and that the
(sha
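For context, a sketch of how such device classes and matching rules are usually set
up (OSD ids, rule and pool names here are illustrative):

  ceph osd crush rm-device-class osd.0 osd.1           # clear any auto-assigned class
  ceph osd crush set-device-class cheaphdd osd.0 osd.1
  ceph osd crush rule create-replicated cheap_rule default host cheaphdd
  ceph osd pool set cephfs_data crush_rule cheap_rule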
Hi,
> Any recommendations?
>
> .. found a lot of names already ..
> OpenStack
> CloudStack
> Proxmox
> ..
>
> But recommendations are truly welcome.
I would recommend OpenNebula. Adopters of the KISS methodology.
Gr. Stefan
--
| BIT BV http://www.bit.nl/  Kamer van Koophandel
Hi List,
TL;DR:
For those of you who are running a Ceph cluster with Intel SSD D3-S4510
and/or Intel SSD D3-S4610 with firmware version XCV10100, please upgrade
to firmware XCV10110 ASAP, at least before ~1700 power-up hours.
More information here:
https://support.microsoft.com/en-us/help/44996
Quoting Lars Täuber (taeu...@bbaw.de):
> > > This is something I was told to do, because a reconstruction of failed
> > > OSDs/disks would have a heavy impact on the backend network.
> >
> > Opinions vary on running "public" only versus "public" / "backend".
> > Having a separate "backend" netwo
Quoting Lars Täuber (taeu...@bbaw.de):
> > I'd probably only use the 25G network for both networks instead of
> > using both. Splitting the network usually doesn't help.
>
> This is something I was told to do, because a reconstruction of failed
> OSDs/disks would have a heavy impact on the backend
Quoting Paul Emmerich (paul.emmer...@croit.io):
> This also happened sometimes during a Luminous -> Mimic upgrade due to
> a bug in Luminous; however I thought it was fixed on the ceph-mgr
> side.
> Maybe the fix was (also) required in the OSDs and you are seeing this
> because the running OSDs hav
Quoting Stadsnet (jwil...@stads.net):
> On 26-3-2019 16:39, Ashley Merrick wrote:
> >Have you upgraded any OSD's?
>
>
> No, didn't go through with the OSDs
Just checking here: are you sure all PGs have been scrubbed while
running Luminous? As the release notes [1] mention this:
"If you are uns
Quoting Burkhard Linke (burkhard.li...@computational.bio.uni-giessen.de):
> Hi,
> Images:
>
> A straightforward attempt would be exporting all images with qemu-img from
> one cluster, and uploading them again on the second cluster. But this will
> break snapshots, protections etc.
You can use rbd-
Quoting Zack Brenton (z...@imposium.com):
> On Tue, Mar 12, 2019 at 6:10 AM Stefan Kooman wrote:
>
> > Hmm, 6 GiB of RAM is not a whole lot. Especially if you are going to
> > increase the amount of OSDs (partitions) like Patrick suggested. By
> > default it will take 4 G
Quoting Zack Brenton (z...@imposium.com):
> Types of devices:
> We run our Ceph pods on 3 AWS i3.2xlarge nodes. We're running 3 OSDs, 3
> Mons, and 2 MDS pods (1 active, 1 standby-replay). Currently, each pod runs
> with the following resources:
> - osds: 2 CPU, 6Gi RAM, 1.7Ti NVMe disk
> - mds: 3
Quoting Patrick Donnelly (pdonn...@redhat.com):
> On Thu, Feb 28, 2019 at 12:49 PM Stefan Kooman wrote:
> >
> > Dear list,
> >
> > After upgrading to 12.2.11 the MDSes are reporting slow metadata IOs
> > (MDS_SLOW_METADATA_IO). The metadata IOs would have been bl
Quoting Wido den Hollander (w...@42on.com):
> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
> OSDs as well. Over time their latency increased until we started to
> notice I/O-wait inside VMs.
On a Luminous 12.2.8 cluster with only SSDs we also hit this issue I
guess. After
Dear list,
After upgrading to 12.2.11 the MDSes are reporting slow metadata IOs
(MDS_SLOW_METADATA_IO). The metadata IOs would have been blocked for
more than 5 seconds. We have one active and one active standby MDS. All
storage on SSD (Samsung PM863a / Intel DC4500). No other (OSD) slow ops
repo
Quoting Matthew Vernon (m...@sanger.ac.uk):
> Hi,
>
> On our Jewel clusters, the mons keep a log of the cluster status e.g.
>
> 2019-01-24 14:00:00.028457 7f7a17bef700 0 log_channel(cluster) log [INF] :
> HEALTH_OK
> 2019-01-24 14:00:00.646719 7f7a46423700 0 log_channel(cluster) log [INF] :
> p
Hi Patrick,
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Stefan Kooman (ste...@bit.nl):
> > Quoting Patrick Donnelly (pdonn...@redhat.com):
> > > Thanks for the detailed notes. It looks like the MDS is stuck
> > > somewhere it's not even outputting any log mess
Quoting Mike Perez (mipe...@redhat.com):
> Hi Serkan,
>
> I'm currently working on collecting the slides to have them posted to
> the Ceph Day Berlin page as Lenz mentioned they would show up. I will
> notify once the slides are available on mailing list/twitter. Thanks!
FYI: The Ceph Day Berlin
Quoting Robert Sander (r.san...@heinlein-support.de):
> On 07.12.18 18:33, Scharfenberg, Buddy wrote:
>
> > We have 3 nodes set up, 1 with several large drives, 1 with a handful of
> > small ssds, and 1 with several nvme drives.
>
> This is a very unusual setup. Do you really have all your HDDs i
Jay Munsterman wrote on 7 December 2018 21:55:25 CET:
>Hey all,
>I hope this is a simple question, but I haven't been able to figure it
>out.
>On one of our clusters there seems to be a disparity between the global
>available space and the space available to pools.
>
>$ ceph df
>GLOBAL:
>SIZ
Quoting Cody (codeology@gmail.com):
> The Ceph OSD part of the cluster uses 3 identical servers with the
> following specifications:
>
> CPU: 2 x E5-2603 @1.8GHz
> RAM: 16GB
> Network: 1G port shared for Ceph public and cluster traffic
This will hamper throughput a lot.
> Journaling device
Quoting Dan van der Ster (d...@vanderster.com):
> Haven't seen that exact issue.
>
> One thing to note though is that if osd_max_backfills is set to 1,
> then it can happen that PGs get into backfill state, taking that
> single reservation on a given OSD, and therefore the recovery_wait PGs
> can'
Quoting Janne Johansson (icepic...@gmail.com):
> Yes, when you add a drive (or 10), some PGs decide they should have one or
> more
> replicas on the new drives, a new empty PG is created there, and
> _then_ that replica
> will make that PG get into the "degraded" mode, meaning if it had 3
> fine a
Quoting Robin H. Johnson (robb...@gentoo.org):
> On Fri, Nov 23, 2018 at 04:03:25AM +0700, Lazuardi Nasution wrote:
> > I'm looking example Ceph configuration and topology on full layer 3
> > networking deployment. Maybe all daemons can use loopback alias address in
> > this case. But how to set cl
Hi List,
Another interesting and unexpected thing we observed during cluster
expansion is the following. After we added extra disks to the cluster,
while "norebalance" flag was set, we put the new OSDs "IN". As soon as
we did that, a couple of hundred objects would become degraded. During
that ti
Hi list,
During cluster expansion (adding extra disks to existing hosts) some
OSDs failed (FAILED assert(0 == "unexpected error", _txc_add_transaction
error (39) Directory not empty not handled on operation 21 (op 1,
counting from 0), full details: https://8n1.org/14078/c534). We had
"norebalance"
Quoting Stefan Kooman (ste...@bit.nl):
> Quoting Patrick Donnelly (pdonn...@redhat.com):
> > Thanks for the detailed notes. It looks like the MDS is stuck
> > somewhere it's not even outputting any log messages. If possible, it'd
> > be helpful to get a coredump (e.g
Quoting Stefan Kooman (ste...@bit.nl):
> I'm pretty sure it isn't. I'm trying to do the same (force luminous
> clients only) but ran into the same issue. Even when running 4.19 kernel
> it's interpreted as a jewel client. Here is the list I made so
Quoting Ilya Dryomov (idryo...@gmail.com):
> On Sat, Nov 3, 2018 at 10:41 AM wrote:
> >
> > Hi.
> >
> > I tried to enable the "new smart balancing" - backend are on RH luminous
> > clients are Ubuntu 4.15 kernel.
[cut]
> > ok, so 4.15 kernel connects as a "hammer" (<1.0) client? Is there a
> > hu