A few things that you can try on the network side to shave off microseconds:
1) 10GBASE-T has noticeably higher latency than fiber or DAC. I've
measured 2 µs on Base-T vs. 0.3 µs on fiber for one link in one direction,
so that's roughly 7 µs you can save per round trip if it's client -> switch
-> OSD and
Well, what I was saying was "does it hurt to unconditionally run hdparm -W
0 on all disks?"
Which disk would suffer from this? I haven't seen any disk where this
would be a bad idea.
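A minimal sketch of what that could look like (assuming all data disks
show up as /dev/sd*; hdparm will simply fail on devices without write
cache control):
  for dev in /dev/sd[a-z]; do
      hdparm -W 0 "$dev" || echo "no write cache control on $dev"
  done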
Paul
Have a look at cephfs subvolumes:
https://docs.ceph.com/docs/master/cephfs/fs-volumes/#fs-subvolumes
They are internally just directories with a quota, a pool placement
layout, and a namespace, plus some mgr magic that makes it easier than
doing all of that by hand.
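A rough sketch using the mgr volumes module (fs/subvolume names and the
100 GB size are placeholders):
  ceph fs subvolume create myfs mysubvol --size 107374182400
  ceph fs subvolume getpath myfs mysubvol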
Paul
Has anyone ever encountered a drive with a write cache that actually
*helped*?
I haven't.
As in: would it be a good idea for the OSD to just disable the write cache
on startup? Worst case it doesn't do anything, best case it improves
latency.
Paul
complex, trying both v1 and v2 is the default.
You can simply write it like this if you are running on the default ports:
mon_host = 10.144.0.2, 10.144.0.3, 10.144.0.4
This has the advantage of being backwards-compatible with old clients.
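If you ever need to spell out the protocols explicitly, the equivalent
with the standard v2/v1 ports would look something like this:
  mon_host = [v2:10.144.0.2:3300,v1:10.144.0.2:6789],[v2:10.144.0.3:3300,v1:10.144.0.3:6789],[v2:10.144.0.4:3300,v1:10.144.0.4:6789]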
Paul
trade-off here is recovery speed/locked
objects vs. read amplification)
I think the formula shards = bucket_size / 100k shouldn't apply for buckets
with >= 100 million objects; shards should become bigger as the bucket size
increases.
Paul
On Mon, Jun 15, 2020 at 7:01 PM wrote:
> Ceph version 10.2.7
>
> ceph.conf
> [global]
> fsid = 75d6dba9-2144-47b1-87ef-1fe21d3c58a8
>
(...)
> mount_activate: Failed to activate
> ceph-disk: Error: No cluster conf found in /etc/ceph with fsid
> e1d7b4ae-2dcd-40ee-bea
don't get back (or broken disks that
you don't replace quickly)
Paul
On Tue, Jun 2, 2020 at 12:32 PM Thomas Byrn
"reweight 0" and "out" are the exact same thing
Paul
On Tue, Jun 2, 2020 at 9:30 AM Wido den Hollander
On Fri, May 29, 2020 at 2:15 AM Dave Hall wrote:
> Hello.
>
> A few days ago I offered to share the notes I'
Did you disable "osd scrub during recovery"?
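If not, disabling it is a one-liner (or the equivalent injectargs on
older releases):
  ceph config set osd osd_scrub_during_recovery false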
Paul
On Fri, May 29, 2020 at 12:04 AM Vytenis A wrote:
> Forg
There are two bugs that may cause the tag to be missing from the pools;
you can manually add these tags with "ceph osd pool application ...". I
think I posted these commands some time ago on tracker.ceph.com.
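From memory they looked roughly like this for a CephFS data and metadata
pool (pool and fs names are placeholders; double-check against the
tracker issue):
  ceph osd pool application set cephfs_data cephfs data myfs
  ceph osd pool application set cephfs_metadata cephfs metadata myfs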
Paul
Common problem for FileStore and really no point in debugging this: upgrade
everything to a recent version and migrate to BlueStore.
99% of random latency spikes are just fixed by doing that.
Paul
Hi,
since this bug may lead to data loss when several OSDs crash at the same
time (e.g., after a power outage): can we pull the release from the mirrors
and docker hub?
Paul
Don't optimize stuff without benchmarking *before and after*, and don't
apply random tuning tips from the Internet without benchmarking them.
My experience with jumbo frames: 3% more performance, on an NVMe-only
setup with a 100 Gbit/s network.
Paul
On Wed, May 20, 2020 at 5:36 PM Vytenis A wrote:
> Is it possible to get any finer prediction date?
>
Related question: did anyone actually observe any correlation between the
predicted failure time and the actual time until a failure occurs?
Paul
On Tue, May 19, 2020 at 3:11 PM thoralf schulze
wrote:
>
> On 5/19/20 2:13 PM, Paul Emmerich wrote:
> > 3) if necessary add more OSDs; common problem is having very
> > few dedicated OSDs for the index pool; running the index on
> > all OSDs (and having a fast DB d
On Tue, May 19, 2020 at 2:06 PM Igor Fedotov wrote:
> Hi Thoralf,
>
> given the following indication from your logs:
>
> May 18 21:12:34 ceph-osd-05 ceph-osd[2356578]: 2020-05-18 21:12:34.211
> 7fb25cc80700 0 bluestore(/var/lib/ceph/osd/ceph-293) log_latency_fn
> slow operation observed for _col
that part of an erasure profile is only used when a crush rule is created,
i.e., when creating a pool without explicitly specifying a crush rule
Paul
be challenging for erasure coding on
HDDs, but that's unrelated to rgw; you'd have the same problem with CephFS
Paul
"ceph osd df" is misleading when using external DB devices; they are
always counted as 100% full there.
Paul
On Wed, May 13, 2020
First thing I'd try is to use objectstore-tool to scrape the
inactive/broken PGs from the dead OSDs using its PG export feature.
Then import these PGs into any other OSD, which will automatically
recover them.
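A sketch of the export/import cycle (the OSDs must be stopped; OSD IDs,
the PG ID, and the file path are placeholders):
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --op export --pgid 2.1a --file /tmp/pg2.1a.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --op import --file /tmp/pg2.1a.export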
Paul
And many hypervisors will turn writing zeroes into an unmap/trim (qemu
detect-zeroes=unmap), so running trim on the entire empty disk is often the
same as writing zeroes.
So +1 for encryption being the proper way here
Paul
Check network connectivity on all configured networks between all hosts;
OSDs running but being marked as down is usually a network problem.
Paul
cally when encountering a read-only block device
Paul
On Mon, May 4, 2020 at 7:05 PM Void Star Nill
wrote:
> Thanks
On Fri, May 1, 2020 at 9:27 PM Paul Emmerich wrote:
> The OpenFileTable objects are safe to delete while the MDS is offline
> anyways, the RADOS object names are mds*_openfiles*
>
I should clarify this a little bit: you shouldn't touch the CephFS internal
state or data structures
The OpenFileTable objects are safe to delete while the MDS is offline
anyway; the RADOS object names are mds*_openfiles*
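Something like this, assuming the default metadata pool name and MDS
rank 0 (list first to see what's actually there):
  rados -p cephfs_metadata ls | grep '^mds0_openfiles'
  rados -p cephfs_metadata rm mds0_openfiles.0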
Paul
I've seen issues with client reconnects on older kernels, yeah. They
sometimes get stuck after a network failure.
Paul
Things to check:
* metadata is on SSD?
* try multiple active MDS servers
* try a larger cache for the MDS (see the example below)
* try a recent version of Ceph
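For the cache size, a sketch (16 GiB is an arbitrary example value):
  ceph config set mds mds_cache_memory_limit 17179869184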
Paul
s of crash reports)
Paul
On Thu, Apr 30, 2020 at 4:09 PM Francois Legrand
wrote:
> Hi everybody (again),
> We recen
automated upgrade assistant; it's just one button that
does all the right things in the right order.
Paul
On Wed, Apr 29, 202
On Tue, Apr 21, 2020 at 12:44 PM Brad Hubbard wrote:
>
> On Tue, Apr 21, 2020 at 6:35 PM Paul Emmerich wrote:
> >
> > On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
> > >
> > > Wait for recovery to finish so you know whether any data from the down
On Tue, Apr 21, 2020 at 3:20 AM Brad Hubbard wrote:
>
> Wait for recovery to finish so you know whether any data from the down
> OSDs is required. If not just reprovision them.
Recovery will not finish from this state as several PGs are down and/or stale.
Paul
>
> If data is required from the
the client
requirement; I don't know the command to do this off the top of my
head
Paul
>
> Many thanks and
bit 21 in the features bitmap is upmap support
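So given a features value from "ceph features" you can test the bit like
this (the hex value is just an example):
  echo $(( (0x3ffddff8ffacfffb >> 21) & 1 ))   # 1 = upmap supported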
Paul
On Mon, Apr 13, 2020 at 11:53 AM Frank Schilder wrote:
>
> De
On Sat, Apr 11, 2020 at 12:43 AM Reed Dier wrote:
> That said, as a straw man argument, ~380GiB free, times 60 OSDs, should be
> ~22.8TiB free, if all OSD's grew evenly, which they won't
Yes, that's the problem. They won't grow evenly. The fullest one will
grow faster than the others. Also, your
ly as usage patterns change over the lifetime of
a cluster.
Does anyone have any real-world experience with LVM cache?
Paul
On Fri
Quick & dirty solution if only one OSD is full (likely as it looks
very unbalanced): take down the full OSD, delete data, take it back
online
Paul
What's the CPU busy with while spinning at 100%?
Check "perf top" for a quick overview
Paul
On Wed, Apr 8,
On Tue, Apr 7, 2020 at 6:49 PM Void Star Nill wrote:
> So is there a way to tell ceph to release the lock if the client becomes
> unavailable?
That's the task of the new client trying to take the lock; it needs to
kick out the old client and blacklist the connection to ensure
consistency.
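For rbd that looks roughly like this (image, lock id, and locker are
placeholders; "rbd lock ls" shows the real values):
  rbd lock ls mypool/myimage
  rbd lock rm mypool/myimage "auto 123456789" client.4567
  ceph osd blacklist add 192.168.0.10:0/123456789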
A comm
The keyword to search for is "deferred writes", there are several
parameters that control the size and maximum number of ops that'll be
"cached". Increasing to 1 MB is probably a bad idea.
Paul
the
> osd - not actual data. Thus a data commit to the osd will still be dominated
> by the write latency of the underlying - very slow - HDD.
small writes (<= 32 kB, configurable) are written to the DB first and
written back to the slow disk asynchronously to the original request.
No, this is not supported. You must follow the upgrade order for
services. The reason is that many parts of RGW are implemented in the
OSDs themselves, so you can't run a new RGW against an old OSD.
Paul
Safe to ignore/increase the warning threshold. You are seeing this
because the warning level was recently reduced from 2M to 200k.
The file will be sharded in a newer version, which will clean this up.
Paul
tell it to create a
namespace)
Paul
On Wed, Mar 11, 2020 at 4:22 PM Rodrigo Severo - Fábrica
wrote:
>
> On Tue, 1
Encountered this one again today; I've updated the issue with new
information: https://tracker.ceph.com/issues/44184
Paul
This indicates that there's something wrong with the config on that
mon node. The command should work on any Ceph node that has the
keyring.
You should check ceph.conf on the monitor node, maybe there's some
kind of misconfiguration that might cause other problems in the
future.
Paul
There's an xattr for this: ceph.snap.btime IIRC
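For example with getfattr (the path is a placeholder, and as said the
xattr name is from memory):
  getfattr -n ceph.snap.btime /mnt/cephfs/mydir/.snap/mysnap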
Paul
On Tue, Mar 10, 2020 at 11:42 AM Marc Roos wrote:
"stop and then immediately remove before stopping
the next one"? Otherwise that's the problem.
>
> The 3 o
There's only one mon keyring that's shared by all mons; the mon user
therefore doesn't contain the mon name.
Try "-n mon."
Paul
"ceph df" is handled by the mgr, check if your mgr is up and running
and if the user has the necessary permissions for the mgr.
On Mon, Mar 2, 2020 at 7:19 PM Alex Chalkias
wrote:
>
> Thanks for the update. Are you doing a beta-release prior to the official
> launch?
the first RC was tagged a few weeks ago:
https://github.com/ceph/ceph/tree/v15.1.0
Paul
>
>
> On Mon, Mar 2, 2020 at 7:12 PM Sage Weil wrote:
>
> > It's
Also: make a backup using the PG export feature of objectstore-tool
before doing anything else.
Sometimes it's enough to export and delete the PG from the broken OSD
and import it into a different OSD using objectstore-tool.
Paul
-tool (but you should try to understand what
exactly is happening before running random ceph-objectstore-tool
commands)
Paul
I've also encountered this issue, but luckily without the crashing
OSDs, so marking as lost resolved it for us.
See https://tracker.ceph.com/issues/44286
Paul
Possible without downtime: Configure multi-site, create a new zone for
the new pool, let the cluster sync to itself, do a failover to the new
zone, delete old zone.
Paul
x2 replication is perfectly fine as long as you also keep min_size at 2 ;)
(But that means you're offline as soon as something is offline)
Paul
On Wed, Feb 19, 2020 at 10:03 AM Wido den Hollander wrote:
>
>
>
> On 2/19/20 8:49 AM, Sean Matheny wrote:
> > Thanks,
> >
> >> If the OSDs have a newer epoch of the OSDMap than the MON it won't work.
> >
> > How can I verify this? (i.e the epoch of the monitor vs the epoch of the
> > osd(s))
> >
On Wed, Feb 19, 2020 at 7:26 AM Wido den Hollander wrote:
>
>
>
> On 2/18/20 6:54 PM, Paul Emmerich wrote:
> > I've also seen this problem on Nautilus with no obvious reason for the
> > slowness once.
>
> Did this resolve itself? Or did you remove the pool
I've also seen this problem once on Nautilus, with no obvious reason
for the slowness.
In my case it was a rather old cluster that was upgraded all the way
from firefly
that's probably just https://tracker.ceph.com/issues/43893
(a harmless bug)
Restart the mons to get rid of the message
Paul
fine.
Unrelated: it's usually not recommended to run the default data pool
on an EC pool, but I guess it's fine if it is the only pool.
Paul
exporting it with Samba and not setting up Samba clustering. That's
super easy.
If you need high availability or the Ceph VFS, then the initial config
can be somewhat tricky (iSCSI HA is easier)
Paul
Are you running a multi-site setup?
In that case it's best to set the default number of shards to a large
enough value *before* enabling multi-site.
If you didn't do this: well... I think the only way is still to
completely re-sync the second site...
Paul
probably
also sufficient to just run "ceph osd down" on the primaries of the
affected PGs to get them to re-check.
Paul
On Mon,
The warning threshold recently changed; I'd just increase it in this
particular case. It just means you have lots of open files.
I think there's some work going on to split the openfiles object into
multiple objects, so that problem will be fixed.
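Raising the threshold would look something like this (500k is an
arbitrary example above the new 200k default):
  ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 500000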
Paul
f a 70k
> files linux source tree went from 15 s to 6 minutes on a local filesystem
> I have at hand.
Don't do it for every file: cp foo bar; sync
>
> Best regards,
> Håkan
If you don't care about the data: set
osd_find_best_info_ignore_history_les = true on the affected OSDs
temporarily.
This means losing data.
For anyone else reading this: don't ever use this option. It's evil
and causes data loss (but gets your PG back and active, yay!)
Paul
On Fri, Jan 31, 2020 at 2:06 PM EDH - Manuel Rios
wrote:
>
> Hmm change 40Gbps to 100Gbps networking.
>
> 40Gbps technology is just a bond of 4x10 links with some latency due to link
> aggregation.
> 100Gbps and 25Gbps have less latency and good performance. In Ceph, ~50% of
> the latency comes fr
Yes, data that is not synced is not guaranteed to be written to disk;
this is consistent with POSIX semantics.
Paul
On Mon
min_size in the crush rule and min_size in the pool are completely
different things that happen to share the same name.
Ignore min_size in the crush rule, it has virtually no meaning in
almost all cases (like this one).
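The pool-level setting is the one that matters (pool name is a
placeholder):
  ceph osd pool get mypool min_size
  ceph osd pool set mypool min_size 2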
Paul
pool application set cephfs
to work with "ceph fs authorize".
We automatically run this in croit on startup on all CephFS pools to
make the permissions work properly for our users.
Paul
Sorry, we no longer have these test drives :(
Paul
On Thu, Jan 16, 2020 at 1:48 PM wrote:
> Hi,
>
> The res
faster for writes, somewhat faster for reads in some scenarios
Paul
You got a ~300 MB object in there. BlueStore's default limit is 128 MB
(the option that controls it is osd_max_object_size).
I think the scrub warning is new/was backported, so the object is probably
older than BlueStore on this cluster; it's only now showing up as a warning.
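You can verify the object's size and, if you really must keep it, raise
the limit (pool/object names are placeholders; raising the limit is
usually the wrong fix):
  rados -p mypool stat the-big-object
  ceph config set osd osd_max_object_size 536870912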
Paul
https://tracker.ceph.com/issues/42583 ). We also had some problems during
upgrades in the earlier Nautilus releases, but that seems to be fixed.
Paul
Don't mix versions like that. Running Nautilus and Jewel at the same
time is unsupported. Upgrade everything and check if that solves your
problem.
Paul
We're also seeing unusually high mgr CPU usage on some setups; the only
thing they have in common seems to be > 300 OSDs.
The threads using the CPU are "mgr-fin" and "ms_dispatch"
Paul
An OSD that is down does not recover or backfill. Faster recovery or
backfill will not resolve down OSDs
Paul
On Mon, Dec
This message is expected.
But your current situation is a great example of why having a separate
cluster network is a bad idea in most situations.
First thing I'd do in this scenario is to get rid of the cluster network
and see if that helps
Paul
Home directories probably means lots of small objects. The default
minimum allocation size of BlueStore on HDD is 64 kiB, so there's a lot
of overhead for everything smaller.
Details: google "bluestore min alloc size"; it can only be changed during
OSD creation.
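For example, set this before (re)creating the OSDs (4 kiB here; existing
OSDs keep the value they were created with):
  ceph config set osd bluestore_min_alloc_size_hdd 4096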
Paul
Nautilus,
you have to scrub everything on Luminous first as the first scrub on
Luminous performs some data structure migrations that are no longer
supported on Nautilus.
Paul
It's pretty pointless to discuss erasure coding vs replicated without
knowing how it'll be used.
There are setups where erasure coding is faster than replicated. You
do need to write less data overall, so if that's your bottleneck then
erasure coding will be faster.
Paul
Gateway removal is indeed supported since ceph-iscsi 3.0 (or was it
2.7?) and it works while it is offline :)
Paul
On Tue
On Mon, Dec 2, 2019 at 4:55 PM Simon Ironside wrote:
>
> Any word on 14.2.5? Nervously waiting here . . .
Real soon; the release is 99% done (check the corresponding thread on
the devel mailing list)
Paul
>
> Thanks,
> Simon.
>
> On 18/11/2019 11:29, Simon Ironside wrote:
>
> > I will sit tig
thought about building a specialized cache mode that
just acts as a write buffer; there are quite a few applications that
would benefit from that.
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Te
It should take ~25 seconds by default to detect a network failure; the
config option that controls this is "osd heartbeat grace" (default 20
seconds, but it takes a little longer for it to really detect the
failure).
Check ceph -w while performing the test.
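If you want to experiment with the timeout (10 is just an example value;
the default is 20):
  ceph config set osd osd_heartbeat_grace 10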
Paul
s for disaster recovery
only; it'll guarantee durability if you lose a room but not
availability.
3+2 erasure coding cannot be split across two rooms in this way
because, well, you need 3 out of 5 shards to survive, so you cannot
lose half of them.
Paul
On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy
wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way we
> were able to figure out where exactly
> is to query the PG and compare the "up" and "acting" sets - only _one_
> of them had the 2^31-1 number in place
> of the c
There should be a warning that says something like "all OSDs are
running nautilus but require-osd-release nautilus is not set"
That warning did exist for older releases, pretty sure nautilus also has it?
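Setting it is a one-liner once all OSDs actually run Nautilus:
  ceph osd require-osd-release nautilus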
Paul
e new locations one by
one and then remove all upmaps and change the rule)... but that's quite
annoying to do and probably doesn't really help.
Paul
Correct, we don't package ceph-deploy, sorry.
ceph-deploy is currently unmaintained, I wouldn't use it for a
production setup at the moment.
Paul
You have way too few PGs in one of the roots. Many OSDs have so few
PGs that you should see a lot of health warnings because of it.
The other root has a factor-of-5 difference in disk size, which isn't
ideal either.
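Raising pg_num for the starved pool would look like this (pool name and
target count are placeholders; on Nautilus pgp_num follows automatically):
  ceph osd pool set mypool pg_num 512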
Paul
We maintain an unofficial mirror for Buster packages:
https://croit.io/2019/07/07/2019-07-07-debian-mirror
Paul
On Mon, Nov
On Wed, Nov 6, 2019 at 5:57 PM Hermann Himmelbauer wrote:
>
> Dear Vitaliy, dear Paul,
>
> Changing the block size for "dd" makes a huge difference.
>
> However, still some things are not fully clear to me:
>
> As recommended, I tried writing / reading directly to the rbd and this
> is blazingly f
We maintain Nautilus packages for Buster, see
https://croit.io/2019/07/07/2019-07-07-debian-mirror
However, Stretch will probably never support Nautilus because Debian
doesn't do backports of GCC
Paul
On Mon, Nov 4, 2019 at 11:44 PM Hermann Himmelbauer wrote:
>
> Hi,
> I recently upgraded my 3-node cluster to proxmox 6 / debian-10 and
> recreated my ceph cluster with a new release (14.2.4 bluestore) -
> basically hoping to gain some I/O speed.
>
> The installation went flawlessly, reading is fa
Looks like you didn't tell the whole story; please post the *full*
output of "ceph -s" and "ceph osd df tree".
Wild guess: you need to increase "mon max pg per osd"
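E.g. (400 is just an example; the default is 250):
  ceph config set global mon_max_pg_per_osd 400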
Paul
up automatically
Paul
On Thu, Oct 31, 2019 at 2:27 PM Thomas Schneider <74cmo...@gmail.com> wrote:
>
> Hi,
>
> a
On Fri, Oct 25, 2019 at 11:14 PM Maged Mokhtar wrote:
> 3. vmotion between Ceph datastore and an external datastore..this will be
> bad. This seems the case you are testing. It is bad because between 2
> different storage systems (iqns are served on different targets), vaai xcopy
> cannot be us