Hi Dan,
it went unnoticed and is in all log files + rotated. I also wondered about the
difference in #auth keys and looked at it. However, we have only 23 auth keys
(it's a small test cluster). No idea what the 77/78 mean. Maybe they include some
history?
I went ahead and rebuilt the mon store bef
Hi Dan,
thanks for your answer. I don't have a problem with increasing osd_max_scrubs
(=1 at the moment) as such. I would simply prefer a somewhat finer-grained way
of controlling scrubbing than just doubling or tripling it right away.
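For example, I would rather tune things gradually with knobs like these (values
purely illustrative, assuming the central config database):
ceph config set osd osd_scrub_sleep 0.1
ceph config set osd osd_scrub_load_threshold 0.5
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6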
Some more info. These 2 pools are data pools for a large FS
Hi John,
firstly, image attachments are filtered out by the list. How about you upload
the image somewhere like https://imgur.com/ and post a link instead?
In my browser, the sticky header contains only "home" and "edit on github",
which are both entirely useless for a user. What exactly is "he
Hi,
can you clarify what exactly you did to get into this situation? What
about the undersized PGs? Is there any chance to bring those OSDs back online?
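To get a better picture, the output of something like the following would help
(the PG id below is just a placeholder):
ceph health detail
ceph osd tree
ceph pg 2.1a query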
Regarding the incomplete PGs, I'm not sure there's much you can do if
the OSDs are lost. To me it reads like you may have
destroyed/recreated more
Hi,
In a working all-in-one (AIO) test setup of OpenStack & Ceph (where
making the bucket public works from the browser):
radosgw-admin bucket list
[
"711138fc95764303b83002c567ce0972/demo"
]
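One way to compare the two setups is to check and set the bucket ACL from the
S3 side, for example with s3cmd (assuming a profile pointing at the respective
RGW endpoint; the bucket name is taken from the listing above):
s3cmd info s3://demo
s3cmd setacl s3://demo --acl-public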
I have another cluster where openstack and ceph are separate.
I have set the same config options
You need to stop all daemons, remove the mon stores and wipe the OSDs with
ceph-volume. Find out which OSDs were running on which host (ceph-volume
inventory DEVICE) and use
ceph-volume lvm zap --destroy --osd-id ID
on these hosts.
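For example, on each host (the OSD id and device path below are placeholders):
systemctl stop ceph.target
ceph-volume inventory /dev/sdb
ceph-volume lvm zap --destroy --osd-id 0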
Best regards,
Frank Schilder
AIT Risø Campus
I actually do not mind if I need to scroll up a line, but I also think
it is a good idea to remove it.
On Mon, 9 Jan 2023 at 11:06, Frank Schilder wrote:
>
> Hi John,
>
> firstly, image attachments are filtered out by the list. How about you upload
> the image somewhere like https://imgur
Hello team
I have an issue with a Ceph deployment using ceph-ansible. We have two
categories of disk, HDD and SSD, but while deploying Ceph only the HDDs are
appearing, no SSDs. The cluster is running on Ubuntu 20.04 and
unfortunately no errors appear. Did I miss something in the configuration?
h
Hi,
it appears that the SSDs were used as db devices (/dev/sd[efgh]).
According to [1] (I don't use ansible) the simple case is that:
[...] most of the decisions on how devices are configured to
provision an OSD are made by the Ceph tooling (ceph-volume lvm batch
in this case).
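A dry run with ceph-volume can show what the tooling would do with a given set
of devices before anything is deployed (the data device names below are
placeholders, the db devices mirror the ones mentioned above):
ceph-volume lvm batch --report /dev/sda /dev/sdb /dev/sdc /dev/sdd --db-devices /dev/sde /dev/sdf /dev/sdg /dev/sdh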
And I as
For anyone finding this thread down the road: I wrote to the poster yesterday
with the same observation. Browsing the ceph-ansible docs and code, to get
them to deploy as they want, one may pre-create LVs and enumerate them as
explicit data devices. Their configuration also enables primary affi
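As for the LV pre-creation mentioned above, a rough sketch with made-up VG/LV
names (the resulting LVs would then be listed as explicit data devices in the
ceph-ansible configuration, e.g. lvm_volumes):
pvcreate /dev/sde
vgcreate ceph-ssd-vg /dev/sde
lvcreate -l 100%FREE -n osd-data-ssd-0 ceph-ssd-vg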
Thank you very much Anthony and Eugen, I followed your instructions and
now it works fine, the classes are hdd and ssd, and we also now have 60 OSDs,
up from 48.
Thanks again
Michel
On Mon, 9 Jan 2023, 17:00 Anthony D'Atri wrote:
> For anyone finding this thread down the road: I wrote to the poster
> y
Hi,
if you intend to use those disks as OSDs you should wipe them;
depending on your OSD configuration (drivegroup.yml), OSDs will then
be created on them automatically. If you don't want that you might need to set:
ceph orch apply osd --all-available-devices --unmanaged=true
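And conversely, if you do want the orchestrator to pick a disk up, wiping it
via the orchestrator should make it available again (host and device path are
placeholders):
ceph orch device zap host01 /dev/sdb --force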
See [1] for more inf
Thanks for the insight Eugen.
Here's what basically happened:
- Upgrade from Nautilus to Quincy via migration to new cluster on temp
hardware;
- Data from Nautilus migrated successfully to older / lab-type equipment
running Quincy;
- Nautilus Hardware rebuilt for Quincy, data migrated back;
- As
-- Resending this mail, it seems ceph-users@ceph.io was down for the last
few days.
I see many users recently reporting that they have been struggling with
this Onode::put race condition issue[1] on both the latest Octopus and
Pacific.
Igor opened a PR [2] to address this issue, I've been reviewi
Hey all,
I'd like to pick up on this topic, since we also see regular scrub
errors recently.
Roughly one per week for around six weeks now.
It's always a different PG and the repair command always helps after a
while.
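For context, the round looks roughly like this, with listing the inconsistent
objects as an optional first look (the PG id is just an example):
ceph health detail
rados list-inconsistent-obj 4.12 --format=json-pretty
ceph pg repair 4.12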
But the regular recurrence seems a bit unsettling.
How to best troubleshoo
Hi Dongdong,
thanks a lot for your post, it's really helpful.
Thanks,
Igor
On 1/5/2023 6:12 AM, Dongdong Tao wrote:
I see many users recently reporting that they have been struggling
with this Onode::put race condition issue[1] on both the latest
Octopus and pacific.
Igor opened a PR [2]
Running ceph-pacific 16.2.9 using ceph orchestrator.
We made a mistake adding a disk to the cluster and immediately issued a command
to remove it using "ceph orch osd rm ### --replace --force".
This OSD had no data on it at the time and was removed after just a few minutes.
"ceph orch osd rm sta
"dmesg" on all the linux hosts and look for signs of failing drives. Look at
smart data, your HBAs/disk controllers, OOB management logs, and so forth. If
you're seeing scrub errors, it's probably a bad disk backing an OSD or OSDs.
Is there a common OSD in the PGs you've run the repairs on?
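For example, something along these lines on each host (the device path is a
placeholder):
dmesg -T | grep -iE 'i/o error|medium error|blk_update_request'
smartctl -a /dev/sdX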
On
Hi,
We too kept seeing this until a few months ago in a cluster with ~400 HDDs,
while all the drive SMART statistics were always A-OK. Since we use erasure
coding, each PG involves up to 10 HDDs.
It took us a while to realize we shouldn't expect scrub errors on healthy
drives, but eventually we
It's important to note we do not suggest using the SMART "OK" indicator as
proof that a drive is healthy. We monitor correctable/uncorrectable error counts, as you
can see a dramatic rise when the drives start to fail. 'OK' will be reported
for SMART health long after the drive is throwing many uncorrec
> On Jan 9, 2023, at 17:46, David Orman wrote:
>
> It's important to note we do not suggest using the SMART "OK" indicator as
> the drive being valid. We monitor correctable/uncorrectable error counts, as
> you can see a dramatic rise when the drives start to fail. 'OK' will be
> reported fo
Hi,
Good points; however, given that Ceph already collects all these statistics,
isn't there a way to set reasonable thresholds and actually have Ceph
detect the number of read errors and suggest that a given drive should be
replaced?
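There is the built-in device health tracking, e.g. (DEVID is a placeholder):
ceph device monitoring on
ceph device ls
ceph device get-health-metrics DEVID
but I'm not sure how far its thresholds and alerting can actually be tuned.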
It seems a bit strange that we all should have to wai
We ship all of this to our centralized monitoring system (and a lot more) and
have dashboards/proactive monitoring/alerting with 100PiB+ of Ceph. If you're
running Ceph in production, I believe host-level monitoring is critical, above
and beyond Ceph level. Things like inlet/outlet temperature,
Hi,
Normally I use rclone to migrate buckets across clusters.
However, this time the user has close to 1000 buckets, so I wonder what would be
the best approach rather than doing it bucket by bucket. Any idea?
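One option I'm considering is pointing rclone at the whole remote instead of
individual buckets, roughly like this (remote names are placeholders from an
rclone config, and I haven't verified how it handles bucket creation and
ownership on the destination):
rclone sync src-ceph: dst-ceph: --checksum --progress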
Thank you
Hi
Just a follow up, the issue was solved by running command
ceph pg 404.1ff mark_unfound_lost delete
-
Kai Stian Olstad
On 04.01.2023 13:00, Kai Stian Olstad wrote:
Hi
We are running Ceph 16.2.6 deployed with Cephadm.
Around Christmas OSDs 245 and 327 had about 20 read errors so I set th