Hi Gary,
It looks like everything you did is fine. I think the "problem" is
that cephadm has/had some logic that tried to leave users with an odd
number of monitors. I'm pretty sure this is why two of them were
removed.
This code has been removed in pacific, and should probably be
backported to
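In the meantime, a workaround is to pin the monitor placement explicitly so cephadm stops adjusting it. A rough sketch (the count and hostnames here are only examples):

  # ask for an explicit number of monitors
  ceph orch apply mon 5
  # or pin them to named hosts
  ceph orch apply mon --placement="host1 host2 host3 host4 host5"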
Hi Alex,
Thanks for the report! I've opened
https://tracker.ceph.com/issues/50114. It looks like the
target_digests check needs to check for overlap instead of equality.
sage
On Fri, Apr 2, 2021 at 4:04 AM Alexander Sporleder
wrote:
>
> Hello Ceph user list!
>
> I tried to update Ceph 15.2.10
I'm a bit confused by the log messages--I'm not sure why the
target_digests aren't changing. Can you post the whole
ceph-mgr.mon-a-02.tvcrfq.log? (ceph-post-file
/var/log/ceph/*/ceph-mgr.mon-a-02.tvcrfq.log)
Thanks!
s
On Fri, Apr 2, 2021 at 12:08 PM Alexander Sporleder
wrote:
>
> Hello Sage, thank you for your response!
>
> I had some problems updating 15.2.8 -> 15.2.9 but after updating Podman
> to 3.0.1 and Ceph to 15.2.10 everything was fine again.
>
> Then I started the update 15.2.10 -> 16.2.0 and in the b
I have a proposed fix for this here: https://github.com/ceph/ceph/pull/40577
Unfortunately, this won't help you until it is tested and merged and
included in 16.2.1. If you'd like to finish your upgrade before then,
you can upgrade to the pacific branch tip with
ceph orch upgrade start quay.ce
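For reference, the upgrade command takes an image (or a version), roughly like this; the image tag below is only a placeholder, not the exact tag cut off above:

  ceph orch upgrade start --image quay.io/ceph/ceph:v16
  # watch progress
  ceph orch upgrade status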
Can you share the output of 'ceph log last cephadm'? I'm wondering if
you are hitting https://tracker.ceph.com/issues/50114
Thanks!
s
On Mon, Apr 5, 2021 at 4:00 AM Peter Childs wrote:
>
> I am attempting to upgrade a Ceph cluster that was deployed with
> Octopus 15.2.8 and upgraded to
You would normally tell cephadm to deploy another mgr with 'ceph orch
apply mgr 2'. In this case, the default placement policy for mgrs is
already either 2 or 3, though--the problem is that you only have 1
host in your cluster, and cephadm currently doesn't handle placing
multiple mgrs on a single
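For reference, checking and adjusting the mgr placement looks roughly like this (hostnames are placeholders):

  # see the current mgr service spec
  ceph orch ls mgr --export
  # once a second host is in the cluster, spread the mgrs across them
  ceph orch apply mgr --placement="host1 host2"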
Hi Seba,
The RGW HA mode is still buggy, and is getting reworked. I'm hoping
we'll have it sorted by the .2 release or so. In the meantime, you
can configure haproxy and/or keepalived yourself or use whatever other
load balancer you'd like...
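As a rough illustration of the DIY route, a minimal haproxy config fronting two RGWs could look like the sketch below (addresses, ports, and names are made up; add TLS, health checks, and timeouts for real use):

  frontend rgw_frontend
      bind *:80
      mode http
      default_backend rgw_backend
  backend rgw_backend
      mode http
      balance roundrobin
      server rgw1 192.168.1.11:8080 check
      server rgw2 192.168.1.12:8080 check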
s
On Sat, Apr 3, 2021 at 9:39 PM Seba chanel wrote
Hi!
I hit the same issue. This was a bug in 16.2.0 that wasn't completely
fixed, but I think we have it this time. Kicking off a 16.2.3 build
now to resolve the problem.
(Basically, sometimes docker calls the image docker.io/ceph/ceph:foo
and sometimes it's ceph/ceph:foo, and our attempt to nor
The root cause is a bug in conmon. If you can upgrade to >= 2.0.26
this will also fix the problem. What version are you using? The
kubic repos currently have 2.0.27. See
https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable
We'll make sure the next release has the verbosity
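To check what you have and pull in a newer conmon, something along these lines (package commands vary by distro; this assumes the kubic repo is already enabled):

  conmon --version
  # e.g. on an rpm-based host
  sudo dnf update conmon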
I'm arriving late to this thread, but a few things stood out that I
wanted to clarify.
On Wed, Jun 2, 2021 at 4:28 PM Oliver Freyermuth
wrote:
> To conclude, I strongly believe there's no one size fits all here.
>
> That was why I was hopeful when I first heard about the Ceph orchestrator
> idea
On Thu, Jun 3, 2021 at 2:18 AM Marc wrote:
> Not using cephadm, I would also question other things like:
>
> - If it uses docker and docker daemon fails what happens to you containers?
This is an obnoxious feature of docker; podman does not have this problem.
> - I assume the ceph-osd containers
On Wed, Jun 2, 2021 at 9:01 AM Daniel Baumann wrote:
> > * Ceph users will benefit from both approaches being supported into the
> > future
>
> this is rather important for us as well.
>
> we use systemd-nspawn based containers (that act and are managed like
> traditional VMs, just without the ov
Following up with some general comments on the main container
downsides and on the upsides that led us down this path in the first
place.
Aside from a few minor misunderstandings, it seems like most of the
objections to containers boil down to a few major points:
> Containers are more complicated
On Sat, Jun 19, 2021 at 3:43 PM Nico Schottelius
wrote:
> Good evening,
>
> as an operator running Ceph clusters based on Debian and later Devuan
> for years and recently testing ceph in rook, I would like to chime in to
> some of the topics mentioned here with short review:
>
> Devuan/OS package:
On Sun, Jun 20, 2021 at 9:51 AM Marc wrote:
> Remarks about your cephadm approach/design:
>
> 1. I am not interested in learning podman, rook or kubernetes. I am using
> mesos which is also on my osd nodes to use the extra available memory and
> cores. Furthermore your cephadm OC is limited to o
On Tue, Jun 22, 2021 at 11:58 AM Martin Verges wrote:
>
> > There is no "should be", there is no one answer to that, other than 42.
> Containers have been there before Docker, but Docker made them popular,
> exactly for the same reason as why Ceph wants to use them: ship a known
> good version (CI
On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman wrote:
> On 6/21/21 6:19 PM, Nico Schottelius wrote:
> > And while we are at claiming "on a lot more platforms", you are at the
> > same time EXCLUDING a lot of platforms by saying "Linux based
> > container" (remember Ceph on FreeBSD? [0]).
>
> Indeed
On Fri, Jun 25, 2021 at 10:27 AM Nico Schottelius
wrote:
> Hey Sage,
>
> Sage Weil writes:
> > Thank you for bringing this up. This is in fact a key reason why the
> > orchestration abstraction works the way it does--to allow other
> > runtime environments to be suppo
IIRC 'ceph health mute' is new in octopus (15.2.x). But disabling the
mon_warn_on_insecure_global_id_reclaim_allowed setting should be
sufficient to make the cluster be quiet...
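Concretely, either of these should quiet it down (check 'ceph health detail' for the exact health code on your cluster):

  # octopus and later
  ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
  # any release: disable the warning via the config option
  ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false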
On Mon, Jul 19, 2021 at 10:53 AM Siegfried Höllrigl
wrote:
>
> Hi !
>
> We have upgraded our Ceph Cluster to version 1
Hi everyone,
We set up a pad to collect Ceph-related job listings. If you're
looking for a job, or have a Ceph-related position to advertise, take
a look:
https://pad.ceph.com/p/jobs
sage
This fall I will be stepping back from a leadership role in the Ceph
project. My primary focus during the next two months will be to work with
developers and community members to ensure a smooth transition to a more
formal system of governance for the Ceph project. My last day at Red Hat
will be in
Hi Manuel,
I'm looking at the ticket for this issue (
https://tracker.ceph.com/issues/51463) and tried to reproduce. This was
initially trivial to do with vstart (rados bench paused for many seconds
afters stopping an osd) but it turns out that was because the vstart
ceph.conf includes `osd_fast_
memory serves, yes, but the notify_mon process can take more time than a
peer OSD getting ECONNREFUSED. The combination above is the recommended
combination (and the default).
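For reference, the two settings being discussed can be inspected like this (the notify_mon option only exists on newer releases):

  ceph config get osd osd_fast_shutdown
  ceph config get osd osd_fast_shutdown_notify_mon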
> These days I will test the fast_shutdown switch again and will share the
> corresponding logs with you.
>
Thanks!
s
process that follows needs to get OSDs' up_thru values to
update and there is delay there.
Thanks!
sage
On Thu, Nov 4, 2021 at 4:15 AM Manuel Lausch wrote:
> On Tue, 2 Nov 2021 09:02:31 -0500
> Sage Weil wrote:
>
>
> >
> > Just to be clear, you should try
Yeah, I think two different things are going on here.
The read leases were new, and I think the way that OSDs are marked down is
the key things that affects that behavior. I'm a bit surprised that the
_notify_mon option helps there, and will take a closer look at that Monday
to make sure it's doin
is empty, except for the epoch
> and creation date.
>
That is concerning. Can you set debug_mon = 20 and capture a minute or so
of logs? (Enough to include a few osdmap epochs.) You can use
ceph-post-file to send it to us.
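Roughly (adjust the log path to wherever your mons actually write):

  ceph config set mon debug_mon 20
  # wait a minute or two, enough to span a few osdmap epochs, then revert
  ceph config set mon debug_mon 1/5
  ceph-post-file /var/log/ceph/ceph-mon.*.log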
Thanks!
sage
>
>
> Manuel
>
>
> On Fri, 5 Nov
rue;
> + }
> if (!HAVE_FEATURE(recovery_state.get_min_upacting_features(),
> SERVER_OCTOPUS)) {
> return true;
>
>
>
> From: Peter Lieven
> Sent: Wednesday, 10 November 2021 11:37
> To: Manuel Lausch; Sage Weil
> Cc: ceph-users@ceph.io
> Be
load threshold = 1
> osd scrub priority = 1
> osd scrub thread suicide timeout = 0
> osd snap trim priority = 1
> osd snap trim sleep = 1.0
> public network = 10.88.7.0/24
>
> [mon]
> mon allow pool delete = false
> mon health preluminous compat warning = false
ributing factors (in this case, at least 3).
sage
On Tue, Nov 16, 2021 at 9:42 AM Sage Weil wrote:
> On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch
> wrote:
>
>> Hi Sage,
>>
>> its still the same cluster we talked about. I only upgraded it from
>> 16.2.5 to 16.2
> best regards,
>
> samuel
>
> --
> huxia...@horebdata.cn
>
>
> *From:* Sage Weil
> *Date:* 2021-11-18 22:02
> *To:* Manuel Lausch ; ceph-users
>
> *Subject:* [ceph-users] Re: OSD spend too much time on "waiting for
> rea
https://github.com/ceph/ceph/pull/44228
I don't think this has landed in a pacific backport yet, but probably will
soon!
s
On Tue, Jan 11, 2022 at 6:29 PM Bryan Stillwell
wrote:
> I recently had a server (named aladdin) that was part of my home cluster
> die. It held 6 out of 32 OSDs, so to p
On Thu, 8 Aug 2019, Christian Balzer wrote:
>
> Hello again,
>
> Getting back to this:
> On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote:
>
> > Hello,
> >
> > preparing the first production bluestore, nautilus (latest) based cluster
> > I've run into the same things other people and my
On Fri, 9 Aug 2019, Florian Haas wrote:
> Hi everyone,
>
> it seems there have been several reports in the past related to
> BlueStore OSDs crashing from unhandled errors in _txc_add_transaction:
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/03.html
> http://lists.ceph.co
Hi,
On Mon, 23 Sep 2019, Koebbe, Brian wrote:
> Our cluster has a little over 100 RBDs. Each RBD is snapshotted with a
> typical "frequently", hourly, daily, monthly type of schedule.
> A while back a 4th monitor was temporarily added to the cluster that took
> hours to synchronize with the oth
g a full record of past snap deletions was changed, so we may
need to make further improvements for octopus.
Thanks!
sage
>
> From: Sage Weil
> Sent: Monday, September 23, 2019 9:41 AM
> To: Koebbe, Brian
> Cc: ceph-users@ceph.io ; d...@ceph.io
> Subject: Re: [ce
On Mon, 30 Sep 2019, Reed Dier wrote:
> I currently have two roots in my crush map, one for HDD devices and one for
> SSD devices, and have had it that way since Jewel.
>
> I am currently on Nautilus, and have had my crush device classes for my OSD's
> set since Luminous.
>
> > ID CLASS WEIGHT
On Tue, 1 Oct 2019, f...@lpnhe.in2p3.fr wrote:
> Hi,
> We have a ceph+cephfs cluster runing nautilus version 14.2.4
> We have debian buster/ubuntu bionic clients mounting cephfs in kernel mode
> without problems.
> We now want to mount cephfs from our new centos 8 clients. Unfortunately,
> ceph-c
[adding dev]
On Wed, 9 Oct 2019, Aaron Johnson wrote:
> Hi all
>
> I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4.
> Monthly OS patching and reboots that go along with it have resulted in
> the cluster getting very unwell.
>
> Many of the servers in the cluster are OOM-ki
On Wed, 23 Oct 2019, Paul Emmerich wrote:
> Hi,
>
> I'm working on a curious case that looks like a bug in PG merging
> maybe related to FileStore.
>
> Setup is 14.2.1 that is half BlueStore half FileStore (being
> migrated), and the number of PGs on an RGW index pool were reduced,
> now one of t
On Fri, 25 Oct 2019, dhils...@performair.com wrote:
> All;
>
> We're setting up our second cluster, using version 14.2.4, and we've run into
> a weird issue: all of our OSDs are created with a size of 0 B. Weights are
> appropriate for the size of the underlying drives, but ceph -s shows this:
This was fixed a few weeks back. It should be resolved in 14.2.5.
https://tracker.ceph.com/issues/41567
https://github.com/ceph/ceph/pull/31100
sage
On Fri, 1 Nov 2019, Lars Täuber wrote:
> Is there anybody who can explain the overcommitment calculation?
>
> Thanks
>
>
> Mon, 28 Oct 2019 11
On Sat, 2 Nov 2019, Oliver Freyermuth wrote:
> Dear Cephers,
>
> interestingly, after:
> ceph device monitoring off
> the mgrs seem to be stable now - the active one still went silent a few
> minutes later,
> but the standby took over and was stable, and restarting the broken one, it's
> now st
on again,
> > and am waiting for them to become silent again. Let's hope the issue
> > reappears before the disks run full of logs ;-).
> >
> > Cheers,
> > Oliver
> >
> > Am 02.11.19 um 02:56 schrieb Sage Weil:
> >> On Sat, 2 Nov 20
ow disk health monitoring is disabled) -
> > but updating the mgrs alone should also be fine with us. I hope to
> > have time for the experiment later today ;-).
> >
> > Cheers,
> > Oliver
> >
> > Am 07.11.19 um 08:57 schrieb Thomas Schneider:
> >> H
On Sun, 10 Nov 2019, c...@elchaka.de wrote:
> IIRC there is a ~history_ignore option which could be of help in your test
> environment.
This option is dangerous and can lead to data loss if used incorrectly.
I suggest making backups of all PG instances with ceph-objectstore-tool
before using it.
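A sketch of what such a backup looks like (the OSD id, pgid, and paths are just examples; the OSD must be stopped first):

  systemctl stop ceph-osd@12
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
      --pgid 2.1f --op export --file /root/pg-2.1f.export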
Hi everyone,
We've identified a data corruption bug[1], first introduced[2] (by yours
truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption
appears as a rocksdb checksum error or assertion that looks like
os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available
Hi everyone,
We're pleased to announce that the next Cephalocon will be March 3-5 in
Seoul, South Korea!
https://ceph.com/cephalocon/seoul-2020/
The CFP for the conference is now open:
https://linuxfoundation.smapply.io/prog/cephalocon_2020
Main conference: March 4-5
Developer
This is the seventh bugfix release of the Mimic v13.2.x long term stable
release series. We recommend all Mimic users upgrade.
For the full release notes, see
https://ceph.io/releases/v13-2-7-mimic-released/
Notable Changes
MDS:
- Cache trimming is now throttled. Dropping the MDS cac
> > If you are not comfortable sharing device metrics, you can disable that
> > channel first before re-opting-in:
> >
> > ceph config set mgr mgr/telemetry/channel_crash false
>
> This should be channel_device, right?
Yep!
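So the device channel is disabled with:

  ceph config set mgr mgr/telemetry/channel_device false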
https://github.com/ceph/ceph/pull/32148
Thanks,
sage
Hi everyone,
The next Cephalocon is coming up on March 3-5 in Seoul! The CFP is open
until Friday (get your talks in!). We expect to have the program
ready for the first week of January. Registration (early bird) will be
available soon.
We're also looking for sponsors for the conference. T
On Tue, 10 Dec 2019, Sage Weil wrote:
> Hi everyone,
>
> The next Cephalocon is coming up on March 3-5 in Seoul! The CFP is open
> until Friday (get your talks in!). We expect to have the program
> ready for the first week of January. Registration (early bird) will be
On Fri, 13 Dec 2019, Manuel Lausch wrote:
> Hi,
>
> I am interested in el8 Packages as well.
> Is there any plan to provide el8 packages in the near future?
Ceph Octopus will be based on CentOS 8. It's due out in March.
The centos8 transition is awkward because our python 2 dependencies don't
On Wed, 18 Dec 2019, Bryan Stillwell wrote:
> After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5
> I'm seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H').
> Attaching to the thread with strace shows a lot of mmap and munmap calls.
> Here's the dis
Hi everyone,
Quick reminder that the early-bird registration for Cephalocon Seoul (Mar
3-5) ends tonight! We also have the hotel booking link and code up on the
site (finally--sorry for the delay).
https://ceph.io/cephalocon/seoul-2020/
Hope to see you there!
sage
On Tue, 28 Jan 2020, dhils...@performair.com wrote:
> All;
>
> I haven't had a single email come in from the ceph-users list at ceph.io
> since 01/22/2020.
>
> Is there just that little traffic right now?
I'm seeing 10-20 messages per day. Confirm your registration and/or check
your filters?
Hi everyone,
We are sorry to announce that, due to the recent coronavirus outbreak, we
are canceling Cephalocon for March 3-5 in Seoul.
More details will follow about how to best handle cancellation of hotel
reservations and so forth. Registrations will of course be
refunded--expect an email
[Moving this to ceph-users@ceph.io]
This looks like https://tracker.ceph.com/issues/43365, which *looks* like
it is an issue with the standard libraries in ubuntu 18.04. One user
said: "After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade)
with the 5.3.0-26-generic kernel, it see
There is a 'packaged' mode that does this, but it's a bit different:
- you have to install the cephadm package on each host
- the package sets up a cephadm user and sudoers.d file
- mgr/cephadm will ssh in as that user and sudo as needed
The net is that you have to make sure cephadm is installed
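Roughly, that setup looks like this on each host (package names and the exact user handling may differ by distro and release; treat this as a sketch):

  # install the cephadm package so the cephadm user and sudoers entry exist
  sudo dnf install cephadm
  # then point the orchestrator at that user instead of root
  ceph cephadm set-user cephadm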
If the pg in question can recover without that OSD, I would use
ceph-objectstore-tool to export and remove it, and then move on.
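That is, something like the following against the stopped OSD (the OSD id, pgid, and file names are examples; keep the export around until the PG is healthy again):

  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --pgid 3.1a --op export --file /root/pg-3.1a.export
  # recent releases require --force for the remove op
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
      --pgid 3.1a --op remove --force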
I hit a similar issue on my system (due to a bug in an early octopus
build) and it was super tedious to fix up manually (needed patched
code and manual modificat
It's getting close. My guess is 1-2 weeks away.
On Mon, 2 Mar 2020, Alex Chalkias wrote:
> Hello,
>
> I was looking for an official announcement for Octopus release, as the
> latest update (back in Q3/2019) on the subject said it was scheduled for
> March 1st.
>
> Any updates on that?
>
> BR,
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi all,
>
> There's something broken in our env when we try to add new mons to
> existing clusters, confirmed on two clusters running mimic and
> nautilus. It's basically this issue
> https://tracker.ceph.com/issues/42830
>
> In case something is wron
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi Sage,
>
> On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi all,
> > >
> > > There's something broken in our env when we try to add
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi Sage,
> > >
> > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote:
> > > >
> >
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote:
> >
> > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > On Thu, Mar 5, 2020 at 4:42 PM
On Sat, 7 Mar 2020, m...@silvenga.com wrote:
> Is there another way to disable telemetry then using:
>
> > ceph telemetry off
> > Error EIO: Module 'telemetry' has experienced an error and cannot handle
> > commands: cannot concatenate 'str' and 'UUID' objects
>
> I'm attempting to get all my cl
This is a known issue--it will be fixed in the next nautilus point
release.
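If the health error itself is getting in the way in the meantime, disabling the module clears it (a workaround, not a fix; re-enable after updating):

  ceph mgr module disable telemetry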
On Tue, 17 Mar 2020, Tecnologia Charne.Net wrote:
> Hello!
>
> I updated monitors to 14.2.8 and I have now:
>
> health: HEALTH_ERR
> Module 'telemetry' has failed: cannot concatenate 'str' and 'UUID'
> obje
Hi everyone,
As we wrap up Octopus and kick off development for Pacific, it seems
like a good idea to sort out what to call the Q release.
Traditionally/historically, these have always been names of cephalopod
species--usually the "common name", but occasionally a latin name
(infernalis).
On Tue, 24 Mar 2020, konstantin.ilya...@mediascope.net wrote:
> Is it possible to provide instructions about upgrading from CentOs7+
> ceph 14.2.8 to CentOs8+ceph 15.2.0 ?
You have ~2 options:
- First, upgrade Ceph packages to 15.2.0. Note that your dashboard will
break temporarily. Then, upg
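Either way, 'ceph versions' is a quick sanity check that every daemon is on the release you expect before and after each step:

  ceph versions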
One word of caution: there is one known upgrade issue if you
- upgrade from luminous to nautilus, and then
- run nautilus for a very short period of time (hours), and then
- upgrade from nautilus to octopus
that prevents OSDs from starting. We have a fix that will be in 15.2.1,
but until tha
Hi everyone,
I am taking time off from the Ceph project and from Red Hat, starting in
April and extending through the US election in November. I will initially
be working with an organization focused on voter registration and turnout
and combating voter suppression and disinformation campaigns.
On Mon, 30 Mar 2020, Ml Ml wrote:
> Hello List,
>
> is this a bug?
>
> root@ceph02:~# ceph cephadm generate-key
> Error EINVAL: Traceback (most recent call last):
> File "/usr/share/ceph/mgr/cephadm/module.py", line 1413, in _generate_key
> with open(path, 'r') as f:
> FileNotFoundError: [E