https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2040483
https://bugs.launchpad.net/ubuntu/+source/containerd-app/+bug/2065423
I wonder if you're running into fallout from the above bug. I believe a fix
should be rolling out soon, according to those bugs. We ran into a multitude of
seemingly
I'm not sure, but that's going to break a lot of people's Pacific
specifications when they upgrade. We heavily utilize this functionality, and
use different device class names for a lot of good reasons. This seems like a
regression to me.
David
On Thu, Oct 3, 2024, at 16:20, Eugen Block wrote:
CLT discussion on Sep 09, 2024
19.2.0 release:
* Cherry picked patch: https://github.com/ceph/ceph/pull/59492
* Approvals requested for re-runs
CentOS Stream/distribution discussions ongoing
* Significant implications in infrastructure for building/testing requiring
ongoing discussions/work to d
Not at all, you're doing the right thing. That's exactly how I would do things
if I were setting out to deploy Ceph on bare metal today. Pick a very stable
underlying distribution and run Ceph in containers. That's exactly what I'm
doing on a massive scale, and it's been one of the best decisions
What operating system/distribution are you running? What hardware?
David
On Tue, Aug 6, 2024, at 02:20, Nicola Mori wrote:
> I think I found the problem. Setting the cephadm log level to debug and
> then watching the logs during the upgrade:
>
> ceph config set mgr mgr/cephadm/log_to_cluster_level debug
>
>> On Apr 24, 2024, at 15:37, David Orman wrote:
>>
>> Did you ever figure out what was happening here?
>>
>> David
>>
>> On Mon, May 29, 2023, at 07:16, Hector Martin wrote:
>>> On 29/05/2023 20.55, Anthony D'Atri wrote:
>>>
Did you ever figure out what was happening here?
David
On Mon, May 29, 2023, at 07:16, Hector Martin wrote:
> On 29/05/2023 20.55, Anthony D'Atri wrote:
>> Check the uptime for the OSDs in question
>
> I restarted all my OSDs within the past 10 days or so. Maybe OSD
> restarts are somehow breaking
I would suggest considering EC vs. replication for index data, and
the latency implications. There's more than just the nvme vs. rotational
discussion to entertain, especially if using the more widely spread EC modes
like 8+3. It would be worth testing for your particular workload.
That tracker's last update indicates it's slated for inclusion.
On Thu, Feb 1, 2024, at 10:47, Zakhar Kirpichenko wrote:
> Hi,
>
> Please consider not leaving this behind:
> https://github.com/ceph/ceph/pull/55109
>
> It's a serious bug, which potentially affects a whole node stability if
> the
Hi,
Just looking back through PyO3 issues, it would appear this functionality was
never supported:
https://github.com/PyO3/pyo3/issues/3451
https://github.com/PyO3/pyo3/issues/576
It just appears that attempting to use this functionality (which does not
work/exist) wasn't successfully prevented previously.
The "right" way to do this is to not run your metrics system on the cluster you
want to monitor. Use the provided metrics via the exporter and ingest them
using your own system (ours is Mimir/Loki/Grafana + related alerting), so if
you have failures of nodes/etc you still have access to, at a minimum,
Happy 2024!
Today's CLT meeting covered the following:
1. 2024 brings a focus on performance of Crimson (some information here:
https://docs.ceph.com/en/reef/dev/crimson/crimson/ )
   1. Status is available here: https://github.com/ceph/ceph.io/pull/635
   2. There will be a new Crimson perform
timeouts will likely
happen, so the impact will be non-zero, but it also won't be catastrophic.
David
On Fri, Nov 17, 2023, at 10:09, David Orman wrote:
> Use BGP/ECMP with something like exabgp on the haproxy servers.
>
> David
>
> On Fri, Nov 17, 2023, at 04:09, Boris Behrens wrote:
Use BGP/ECMP with something like exabgp on the haproxy servers.
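As a very rough sketch of the exabgp side (the peer address, ASNs, and the
announced VIP below are made-up placeholders, and health-checking of the local
haproxy is left out), each haproxy node announces the shared service address
and the upstream routers ECMP across the announcers:

neighbor 192.0.2.1 {              # upstream router
    router-id 192.0.2.10;
    local-address 192.0.2.10;     # this haproxy node
    local-as 65010;
    peer-as 65000;

    static {
        # shared RGW service address, announced from every haproxy node
        route 203.0.113.80/32 next-hop self;
    }
}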
David
On Fri, Nov 17, 2023, at 04:09, Boris Behrens wrote:
> Hi,
> I am looking for some experience on how people make their RGW public.
>
> Currently we use the following:
> 3 IP addresses that get distributed via keepalived between the
I would suggest updating: https://tracker.ceph.com/issues/59580
We did notice it with 16.2.13, as well, after upgrading from .10, so likely
in-between those two releases.
David
On Fri, Sep 8, 2023, at 04:00, Loïc Tortay wrote:
> On 07/09/2023 21:33, Mark Nelson wrote:
>> Hi Rok,
>>
>> We're st
Hi,
I do not believe this is actively being worked on, but there is a tracker open;
if you can submit an update, it may help attract attention/develop a fix:
https://tracker.ceph.com/issues/59580
David
On Fri, Sep 8, 2023, at 03:29, Chris Palmer wrote:
> I first posted this on 17 April but did
https://github.com/ceph/ceph/pull/48070 may be relevant.
I think this may have gone out in 16.2.11. I would tend to agree; personally,
this feels quite noisy at default logging levels for production clusters.
David
On Thu, Aug 31, 2023, at 11:17, Zakhar Kirpichenko wrote:
> This is happening to
I'm hoping to see at least one more, if not more than that, but I have no
crystal ball. I definitely support this idea, and strongly suggest it's given
some thought. There have been a lot of delays/missed releases due to all of the
lab issues, and it's significantly impacted the release cadence
Someone who's got data regarding this should file a bug report; it sounds like
a quick fix for defaults if this holds true.
On Sat, May 20, 2023, at 00:59, Hector Martin wrote:
> On 17/05/2023 03.07, 胡 玮文 wrote:
>> Hi Sake,
>>
>> We are experiencing the same. I set “osd_mclock_cost_per_byte_usec
You may want to consider disabling deep scrubs and scrubs while attempting to
complete a backfill operation.
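A minimal sketch of doing that with the cluster-wide flags (remember to unset
them once the backfill finishes):

ceph osd set noscrub
ceph osd set nodeep-scrub
# ...after backfill completes...
ceph osd unset noscrub
ceph osd unset nodeep-scrub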
On Tue, Apr 18, 2023, at 01:46, Eugen Block wrote:
> I didn't mean you should split your PGs now, that won't help because
> there is already backfilling going on. I would revert the pg_n
I've seen what appears to be the same post on Reddit previously, and attempted
to assist. My suspicion is that a "stop" command was passed to ceph orch upgrade
in an attempt to stop it, but with the --image flag preceding it, setting the
image to "stop". I asked the user to do an actual upgrade stop,
If it's a test cluster, you could try:
root@ceph01:/# radosgw-admin bucket check -h |grep -A1 check-objects
--check-objects bucket check: rebuilds bucket index according to
actual objects state
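If the check reports inconsistencies, my understanding is that the same
subcommand can also attempt the rebuild; the bucket name below is a
placeholder, and I'd test this on a non-production cluster first:

radosgw-admin bucket check --bucket=my-bucket --check-objects --fix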
On Wed, Feb 22, 2023, at 02:22, Robert Sander wrote:
> On 21
MB, 10 MiB) copied, 0.0675647 s, 155 MB/s
>>> --> Zapping successful for:
>>>
>>>
>>> root@ceph-a2-01:/# ceph orch device ls
>>>
>>> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 21m ago *locked*
>>>
>>>
>>> It shows l
ph-Dashboard.
>
>
> pgs: 3236 active+clean
>
>
> This is the new disk shown as locked (because unzapped at the moment).
>
> # ceph orch device ls
>
> ceph-a1-06 /dev/sdm hdd TOSHIBA_X_X 16.0T 9m ago
> locked
>
>
> Best
What does "ceph orch osd rm status" show before you try the zap? Is your
cluster still backfilling to the other OSDs for the PGs that were on the failed
disk?
David
On Fri, Jan 27, 2023, at 03:25, mailing-lists wrote:
> Dear Ceph-Users,
>
> i am struggling to replace a disk. My ceph-cluster is
I think this would be valuable to have easily accessible during runtime,
perhaps submit a report (and patch if possible)?
David
On Fri, Jan 13, 2023, at 08:14, Robert Sander wrote:
> Hi,
>
> Am 13.01.23 um 14:35 schrieb Konstantin Shalygin:
>
> > ceph-kvstore-tool bluestore-kv /var/lib/ceph/os
or
> everything, but there must be numerous Ceph sites with hundreds of OSD nodes,
> so I'm a bit surprised this isn't more automated...
>
> Cheers,
>
> Erik
>
> --
> Erik Lindahl
> On 10 Jan 2023 at 00:09 +0100, Anthony D'Atri wrote:
data losses, but for us we figured
> it's worth replacing a few outlier drives to sleep better.
>
> Cheers,
>
> Erik
>
> --
> Erik Lindahl
> On 9 Jan 2023 at 23:06 +0100, David Orman wrote:
> > "dmesg" on all the linux hosts and look for
"dmesg" on all the linux hosts and look for signs of failing drives. Look at
smart data, your HBAs/disk controllers, OOB management logs, and so forth. If
you're seeing scrub errors, it's probably a bad disk backing an OSD or OSDs.
Is there a common OSD in the PGs you've run the repairs on?
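For example, a quick first pass over each host might look something like this
(device names are placeholders; adjust for NVMe vs. SATA/SAS):

# kernel-level I/O errors
dmesg -T | grep -iE 'i/o error|medium error|blk_update_request'
# SMART health summary and full attribute/error log for a suspect drive
smartctl -H /dev/sdx
smartctl -a /dev/sdx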
On
Today's CLT meeting had the following topics of discussion:
* Docs questions
* crushtool options could use additional documentation
* This is being addressed
* sticky header on documentation pages obscuring titles when anchor links
are used
* There will be a follow-up email solic
This was a short meeting, and in summary:
* Testing of upgrades for 17.2.4 in Gibba commenced and slowness during
upgrade has been investigated.
* Workaround available; not a release blocker
Yes. Rotational drives can generally do 100-200 IOPS (some outliers, of
course). Do you have all forms of caching disabled on your storage
controllers/disks?
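To illustrate checking the drive-level write cache (the device name is a
placeholder; SAS drives may need sdparm instead):

# SATA: show whether the volatile write cache is enabled
hdparm -W /dev/sdx
# via smartmontools, works for SATA and many SAS drives
smartctl -g wcache /dev/sdx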
On Tue, Sep 6, 2022 at 11:32 AM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:
> Setting osd_mclock_force_run_benchmark_on_init to true
https://github.com/ceph/ceph/pull/46480 - you can see the backports/dates
there.
Perhaps it isn't in the version you're running?
On Thu, Aug 4, 2022 at 7:51 AM Kenneth Waegeman
wrote:
> Hi all,
>
> I’m trying to deploy this spec:
>
> spec:
> data_devices:
> model: Dell Ent NVMe AGN MU U.2
Apologies, backport link should be: https://github.com/ceph/ceph/pull/46845
On Fri, Jul 15, 2022 at 9:14 PM David Orman wrote:
> I think you may have hit the same bug we encountered. Cory submitted a
> fix, see if it fits what you've encountered:
>
> https://github.com/cep
I think you may have hit the same bug we encountered. Cory submitted a fix,
see if it fits what you've encountered:
https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
https://github.com/ceph/ceph/pull/46877 )
https://tracker.ceph.com/issues/54172
On Fri, Jul 15, 2022 at 8:52 AM We
Is this something that makes sense to do the 'quick' fix on for the next
pacific release to minimize impact to users until the improved iteration
can be implemented?
On Tue, Jul 12, 2022 at 6:16 AM Igor Fedotov wrote:
> Hi Dan,
>
> I can confirm this is a regression introduced by
> https://githu
Here are the main topics of discussion during the CLT meeting today:
- make-check/API tests
- Ignoring the doc/ directory would skip an expensive git checkout
operation and save time
- Stale PRs
- Currently an issue with stalebot which is being investigated
- Cephalocon
Hi Robert,
We had the same question and ended up creating a PR for this:
https://github.com/ceph/ceph/pull/46480 - there are backports, as well, so
I'd expect it will be in the next release or two.
David
On Mon, Jun 27, 2022 at 8:07 AM Robert Reihs wrote:
> Hi,
> We are setting up a test clust
Are you thinking it might be a permutation of:
https://tracker.ceph.com/issues/53729 ? There are some posts in it to check
for the issue; #53 and #65 had a few potential ways to check.
On Fri, Jun 10, 2022 at 5:32 AM Marius Leustean
wrote:
> Did you check the mempools?
>
> ceph daemon osd.X dump_mempools
I agree with this: just because you can doesn't mean you should. It will
likely be significantly less painful to upgrade the infrastructure to
support doing this the more-correct way, vs. trying to layer swift on top
of cephfs. I say this having a lot of personal experience with Swift at
extremely
Is your client using the DeleteObjects call to delete 1000 per request?:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html
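As an illustration with the AWS CLI (the bucket name, endpoint, and keys are
placeholders), a client can batch up to 1000 keys per DeleteObjects request:

# objects.json lists the keys to delete in a single request
cat > objects.json <<'EOF'
{"Objects": [{"Key": "obj-0001"}, {"Key": "obj-0002"}], "Quiet": true}
EOF
aws s3api delete-objects --endpoint-url https://rgw.example.com \
  --bucket my-bucket --delete file://objects.json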
On Fri, Jun 3, 2022 at 9:35 AM J-P Methot
wrote:
> Read/writes are super fast. It's only deletes that are incredibly slow,
> both through the s3 api and
In your example, you can log in to the server in question with the OSD, and
run "ceph-volume lvm zap --osd-id <id> --destroy" and it will purge the
DB/WAL LV. You don't need to reapply your OSD spec; it will detect the
available space on the NVMe and redeploy that OSD.
On Wed, May 25, 2022 at 3:37 PM Ed
What was the largest cluster that you upgraded that didn't exhibit the new
> issue in 16.2.8 ? Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, May 17, 2022 at 10:24 AM
We had an issue with our original fix in 45963 which was resolved in
https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
handling for upgraded clusters. This is in the 16.2.8 release. I'm not sure
if it will resolve your problem (or help mitigate it) but it would be worth
trying
Hi,
I don't have any book suggestions, but in my experience, the best way to
learn is to set up a cluster and start intentionally breaking things, and
see how you can fix them. Perform upgrades, add load, etc.
I do suggest starting with Pacific (the upcoming 16.2.8 release would
likely be a good
https://tracker.ceph.com/issues/51429 with
https://github.com/ceph/ceph/pull/45088 for Octopus.
We're also working on: https://tracker.ceph.com/issues/55324 which is
somewhat related in a sense.
On Thu, Apr 21, 2022 at 11:19 AM Guillaume Nobiron
wrote:
> Yes, all the buckets in the reshard list
Is this a versioned bucket?
On Thu, Apr 21, 2022 at 9:51 AM Guillaume Nobiron
wrote:
> Hello,
>
> I have an issue on my ceph cluster (octopus 15.2.16) with several buckets
> raising a LARGE_OMAP_OBJECTS warning.
> I found the buckets in the resharding list but ceph fails to reshard them.
>
> The
We're definitely dealing with something that sounds similar, but hard to
state definitively without more detail. Do you have object lock/versioned
buckets in use (especially if one started being used around the time of the
slowdown)? Was this cluster always 16.2.7?
What is your pool configuration
Hi Gilles,
Did you ever figure this out? Also, your rados ls output indicates that the
prod cluster has fewer objects in the index pool than the backup cluster,
or am I misreading this?
David
On Wed, Dec 1, 2021 at 4:32 AM Gilles Mocellin <
gilles.mocel...@nuagelibre.org> wrote:
> Hello,
>
> We
We use it without major issues, at this point. There are still flaws, but
there are flaws in almost any deployment and management system, and this is
not unique to cephadm. I agree with the general sentiment that you need to
have some knowledge about containers, however. I don't think that's
necess
> > include that in the quincy release - and if not, we'll backport it to
> > quincy in an early point release
> >
> > can SSE-S3 with PutBucketEncryption satisfy your use case?
> >
> > [1]
> https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSide
Is RGW encryption for all objects at rest still testing only, and if not,
which version is it considered stable in?:
https://docs.ceph.com/en/latest/radosgw/encryption/#automatic-encryption-for-testing-only
David
What version of Ceph are you using? Newer versions deploy a dashboard and
prometheus module, which has some of this built in. It's a great start to
seeing what can be done using Prometheus and the built in exporter. Once
you learn this, if you decide you want something more robust, you can do an
ex
If performance isn't as big a concern, most servers have firmware settings
that enable more aggressive power saving, at the cost of added
latency/reduced cpu power/etc. HPE would be accessible/configurable via
HP's ILO, Dells with DRAC, etc. They'd want to test and see how much of an
impact it made
What are you trying to do that won't work? If you need resources from
outside the container, it doesn't sound like something you should need to
be entering a shell inside the container to accomplish.
On Fri, Jan 7, 2022 at 1:49 PM François RONVAUX
wrote:
> Thanks for the answer.
>
> I would want
What does iostat show for the drive in question? What you're seeing is the
cluster rebalancing initially; then, at the end, it's probably that single
drive being filled. I'd expect 25-100 MB/s to be the fill rate of the newly
added drive with backfills per OSD set to 2 or so (much more than that
doesn't
We've been testing RC1 since release on our 504 OSD / 21 host test cluster
with split db/wal, and have experienced no issues on upgrade or in operation
so far.
On Mon, Nov 29, 2021 at 11:23 AM Yuri Weinstein wrote:
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/53
> .72899 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 1 up
>
> Zach
>
> On 2021-12-01 5:20 PM, David Orman wrote:
>
> What's "ceph osd df" show?
>
> On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC)
> wrote:
>
>> I want
What's "ceph osd df" show?
On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC) wrote:
> I wanted to swap out an existing OSD, preserve the number, and then remove
> the HDD that had it (osd.14 in this case) and give the ID of 14 to a new
> SSD that would be taking its place in the same node. First
I suggest continuing with manual PG sizing for now. With 16.2.6 we have
seen the autoscaler scale up the device health metrics pool to 16000+ PGs on
brand new clusters, which we know is incorrect. It's on our company backlog
to investigate, but far down the backlog. It's bitten us enough times in
the past
The balancer does a pretty good job. It's the PG autoscaler that has bitten
us frequently enough that we always ensure it is disabled for all pools.
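For reference, a sketch of how to keep it off (the pool name is a placeholder):

# disable the autoscaler on an existing pool
ceph osd pool set my-pool pg_autoscale_mode off
# make 'off' the default for newly created pools
ceph config set global osd_pool_default_pg_autoscale_mode off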
David
On Mon, Nov 1, 2021 at 2:08 PM Alexander Closs wrote:
> I can add another 2 positive datapoints for the balancer, my personal and
> work clu
still looking for a more smooth way to do that.
>
> Luis Domingues
>
> ‐‐‐ Original Message ‐‐‐
>
> On Monday, October 4th, 2021 at 10:01 PM, David Orman <
> orma...@corenode.com> wrote:
>
> > We have an older cluster which has been iterated on many times.
If there's intent to use this for performance comparisons between releases,
I would propose that you include rotational drive(s), as well. It will be
quite some time before everyone is running pure NVME/SSD clusters with the
storage costs associated with that type of workload, and this should be
re
We have an older cluster which has been iterated on many times. It's
always been cephadm deployed, but I am certain the OSD specification
used has changed over time. I believe at some point, it may have been
'rm'd.
So here's our current state:
root@ceph02:/# ceph orch ls osd --export
service_type
It appears when an updated container for 16.2.6 (there was a remoto
version included with a bug in the first release) was pushed, the old
one was removed from quay. We had to update our 16.2.6 clusters to the
'new' 16.2.6 version, and just did the typical upgrade with the image
specified. This shou
We scrape all mgr endpoints since we use external Prometheus clusters,
as well. The query results will have {instance=activemgrhost}. The
dashboards in upstream don't have multiple cluster support, so we have
to modify them to work with our deployments since we have multiple
ceph clusters being polled.
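For reference, a minimal external scrape job along those lines might look like
the following; the hostnames and the default mgr prometheus module port (9283)
are assumptions for illustration:

scrape_configs:
  - job_name: 'ceph-mgr'
    # scrape every mgr; only the active one serves metrics, so results
    # carry instance=<active mgr host> as described above
    static_configs:
      - targets: ['ceph-mgr-01:9283', 'ceph-mgr-02:9283', 'ceph-mgr-03:9283']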
With recent releases, 'ceph config' is probably a better option; do
keep in mind this sets things cluster-wide. If you're just wanting to
target specific daemons, then tell may be better for your use case.
# get current value
ceph config get osd osd_max_backfills
# set a new value of 2, for example
ceph config set osd osd_max_backfills 2
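And for targeting a single daemon at runtime rather than cluster-wide config,
something along these lines (the OSD id is a placeholder):

# override on one daemon only, at runtime
ceph tell osd.12 config set osd_max_backfills 2
# or the older injectargs form
ceph tell osd.12 injectargs '--osd-max-backfills 2'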
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c
^^ if people want to test and provide feedback for a potential merge
to EPEL8 stable.
David
On Wed, Sep 22, 2021 at 11:43 AM David Orman wrote:
>
> I'm wondering if this was installed using pip/pypi before, and now
I'm wondering if this was installed using pip/pypi before, and now
switched to using EPEL? That would explain it - 1.2.1 may never have
been pushed to EPEL.
David
On Wed, Sep 22, 2021 at 11:26 AM David Orman wrote:
>
> We'd worked on pushing a change to fix
> https://trac
cy bug, as it impacts any deployments
with medium to large counts of OSDs or split db/wal devices, like many
modern deployments.
https://koji.fedoraproject.org/koji/packageinfo?packageID=18747
https://dl.fedoraproject.org/pub/epel/8/Everything/x86_64/Packages/p/
Same question here, for clarity, was this on upgrading to 16.2.6 from
16.2.5? Or upgrading
from some other release?
On Mon, Sep 20, 2021 at 8:57 AM Sean wrote:
>
> I also ran into this with v16. In my case, trying to run a repair totally
> exhausted the RAM on the box, and was unable to complete
For clarity, was this on upgrading to 16.2.6 from 16.2.5? Or upgrading
from some other release?
On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini wrote:
>
> I got the exact same error on one of my OSDs when upgrading to 16. I
> used it as an exercise in trying to fix a corrupt rocksdb. I spent a few
--
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> -------
>
> -Original Message-
> From: David Orman
> Sent: Tuesday, September 14, 2021 8:55 PM
> To: Eugen Block
> Cc: ceph-users
> Subject: [ceph-u
Keep in mind performance, as well. Once you start getting into higher
'k' values with EC, you've got a lot more drives involved that need to
return completions for operations, and on rotational drives this
becomes especially painful. We use 8+3 for a lot of our purposes, as
it's a good balance of e
No problem, and it looks like they will. Glad it worked out for you!
David
On Thu, Sep 9, 2021 at 9:31 AM mabi wrote:
>
> Thank you Eugen. Indeed the answer went to Spam :(
>
> So thanks to David for his workaround, it worked like a charm. Hopefully
> these patches can make it into the next pac
Exactly, we minimize the blast radius/data destruction by allocating
more devices for DB/WAL of smaller size than less of larger size. We
encountered this same issue on an earlier iteration of our hardware
design. With rotational drives and NVMEs, we are now aiming for a 6:1
ratio based on our CRUS
undeploy, then re-add the label, and it will
redeploy.
On Wed, Sep 8, 2021 at 7:03 AM David Orman wrote:
>
> This sounds a lot like: https://tracker.ceph.com/issues/51027 which is
> fixed in https://github.com/ceph/ceph/pull/42690
>
> David
>
> On Tue, Sep 7, 2021 a
This sounds a lot like: https://tracker.ceph.com/issues/51027 which is
fixed in https://github.com/ceph/ceph/pull/42690
David
On Tue, Sep 7, 2021 at 7:31 AM mabi wrote:
>
> Hello
>
> I have a test ceph octopus 16.2.5 cluster with cephadm out of 7 nodes on
> Ubuntu 20.04 LTS bare metal. I just u
It may be this:
https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62
Which we resolved with: https://github.com/alfredodeza/remoto/pull/63
What version of ceph are you running, and is it impacted by the above?
David
On Thu, Sep 2, 2021 at 9:53 AM fcid wrote:
>
>
> Without success. Also tried without the "filter_logic: AND" in the yaml file
> and the result was the same.
>
> Best regards,
> Eric
>
>
> -Original Message-
> From: David Orman [mailto:orma...@corenode.com]
> Sent: 27 August 2021 14:56
> To:
This was a bug in some versions of ceph, which has been fixed:
https://tracker.ceph.com/issues/49014
https://github.com/ceph/ceph/pull/39083
You'll want to upgrade Ceph to resolve this behavior, or you can use
size or something else to filter if that is not possible.
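As a rough illustration of filtering by size instead of model (the service id,
host pattern, and size range are placeholders), the spec could look something
like:

service_type: osd
service_id: osd_spec_by_size
placement:
  host_pattern: '*'
spec:
  data_devices:
    size: '10T:'      # only use data devices of 10 TB and larger
  db_devices:
    rotational: 0     # place DB/WAL on non-rotational devices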
David
On Thu, Aug 19, 2021
>
> - Am 9. Aug 2021 um 18:15 schrieb David Orman orma...@corenode.com:
>
> > Hi,
> >
> > We are seeing very similar behavior on 16.2.5, and also have noticed
> > that an undeploy/deploy cycle fixes things. Before we go rummaging
> > through the source code
Just adding our feedback - this is affecting us as well. We reboot
periodically to test durability of the clusters we run, and this is
fairly impactful. I could see power loss/other scenarios in which this
could end quite poorly for those with less than perfect redundancy in
DCs across multiple rac
Hi,
We are seeing very similar behavior on 16.2.5, and also have noticed
that an undeploy/deploy cycle fixes things. Before we go rummaging
through the source code trying to determine the root cause, has
anybody else figured this out? It seems odd that a repeatable issue
(I've seen other mailing l
https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62
If you're brave (YMMV, test first non-prod), we pushed an image with
the issue we encountered fixed as per above here:
https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 that
you can use to install
Hi Peter,
We fixed this bug: https://tracker.ceph.com/issues/47738 recently
here:
https://github.com/ceph/ceph/commit/b4316d257e928b3789b818054927c2e98bb3c0d6
which should hopefully be in the next release(s).
David
On Thu, Jun 17, 2021 at 12:13 PM Peter Childs wrote:
>
> Found the issue in the
make it clear.
On Tue, Jun 1, 2021 at 2:30 AM David Orman wrote:
>
> I do not believe it was in 16.2.4. I will build another patched version of
> the image tomorrow based on that version. I do agree, I feel this breaks new
> deploys as well as existing, and hope a point release will
us since we began using it in
> luminous/mimic, but situations such as this are hard to look past. It's
> really unfortunate as our existing production clusters have been rock solid
> thus far, but this does shake one's confidence, and I would wager that I'm
> not al
on reboot the disks disappear, not stop working but not
>> detected by Linux, which makes me think I'm hitting some kernel limit.
>>
>> At this point I'm going to cut my loses and give up and use the small
>> slightly more powerful 30x drive systems I have (with 256g
You may be running into the same issue we ran into (make sure to read
the first issue, there's a few mingled in there), for which we
submitted a patch:
https://tracker.ceph.com/issues/50526
https://github.com/alfredodeza/remoto/issues/62
If you're brave (YMMV, test first non-prod), we pushed an image
We've found that after doing the osd rm, you can use: "ceph-volume lvm
zap --osd-id 178 --destroy" on the server with that OSD as per:
https://docs.ceph.com/en/latest/ceph-volume/lvm/zap/#removing-devices
and it will clean things up so they work as expected.
On Tue, May 25, 2021 at 6:51 AM Kai Sti
We've created a PR to fix the root cause of this issue:
https://github.com/alfredodeza/remoto/pull/63
Thank you,
David
On Mon, May 10, 2021 at 7:29 PM David Orman wrote:
>
> Hi Sage,
>
> We've got 2.0.27 installed. I restarted all the manager pods, just in
> case, and
> the problem. What version are you using? The
> kubic repos currently have 2.0.27. See
> https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable
>
> We'll make sure the next release has the verbosity workaround!
>
> sage
>
> On Mon, May 10, 2021 at 5:4
WAL w/ 12 OSDs per NVME), even when new OSDs are not
being deployed, as it still tries to apply the OSD specification.
On Mon, May 10, 2021 at 4:03 PM David Orman wrote:
>
> Hi,
>
> We are seeing the mgr attempt to apply our OSD spec on the various
> hosts, then block. When we inv
Hi,
We are seeing the mgr attempt to apply our OSD spec on the various
hosts, then block. When we investigate, we see the mgr has executed
cephadm calls like so, which are blocking:
root 1522444 0.0 0.0 102740 23216 ? S 17:32 0:00
\_ /usr/bin/python3
/var/lib/ceph/X/cep
6.2.x. We
are using 16.2.3.
Thanks,
David
On Fri, May 7, 2021 at 9:06 AM David Orman wrote:
>
> Hi,
>
> I'm not attempting to remove the OSDs, but instead the
> service/placement specification. I want the OSDs/data to persist.
> --force did not work on the service, as noted
ption.
David
On Fri, May 7, 2021 at 4:21 PM Matt Benjamin wrote:
>
> Hi David,
>
> I think the solution is most likely the ops log. It is called for
> every op, and has the transaction id.
>
> Matt
>
> On Fri, May 7, 2021 at 4:58 PM David Orman wrote:
> >
can do
> that (and more) in "pacific" using lua scripting on the RGW:
> https://docs.ceph.com/en/pacific/radosgw/lua-scripting/
>
> Yuval
>
> On Thu, Apr 1, 2021 at 7:11 PM David Orman wrote:
>>
>> Hi,
>>
>> Is there any way to log the x-amz-request-id
r that everything was fine again. This is a Ceph 15.2.11 cluster on
> Ubuntu 20.04 and podman.
>
> Hope that helps.
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, May 7, 2021 1:24 AM, David Orman wrote:
>
> > Has anybody run into a 'stuck' OSD service specification?
Has anybody run into a 'stuck' OSD service specification? I've tried
to delete it, but it's stuck in 'deleting' state, and has been for
quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3:
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd.osd_spec5