ther hosts. This was also many years ago, so I hope their firmware does something different now.
What does the health look like on the remaining drives? How long were the
dead ones in service?
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of
setups and not containers. YMMV.
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
From: Tim Holloway
Sent: Saturday, April 12, 2025 1:13:05 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: nodes with high densit
I would like to have it on record that I completely agree with the points Chris
(and Janne) made.
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
From: Chris Palmer
Sent: Wednesday, February 5, 2025 11:17
'm asking are:
What vendors offer paid ceph support?
Do they have specific requirements? (e.g., must run their version of ceph vs. community, must be containerized vs. bare metal)
Thanks
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Ins
We did a proof of concept moving some compute into "the cloud" and exported our
cephfs shares using WireGuard as the tunnel. The performance impact on our
storage was completely latency- and bandwidth-dependent, with no noticeable
impact from the tunnel itself.
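A minimal sketch of what a cloud-side client can look like (interface, addresses, client name and secretfile path below are placeholders, not our actual config):

  # bring up the tunnel; assumes /etc/wireguard/wg0.conf already points at the on-prem endpoint
  wg-quick up wg0
  # mount cephfs through the tunnel; 10.20.0.1 stands in for a MON address reachable over wg0
  mount -t ceph 10.20.0.1:6789:/ /mnt/cephfs -o name=cloudclient,secretfile=/etc/ceph/cloudclient.secret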
-paul
--
Paul
We use shared directories for projects in our HPC cluster and use ACLs to
achieve this.
https://www.redhat.com/sysadmin/linux-access-control-lists
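A minimal sketch for one project directory (group name and path are made up):

  # grant the project group access to everything that already exists
  setfacl -R -m g:proj_team:rwX /shared/proj_team
  # default ACL so newly created files and directories inherit the same access
  setfacl -R -d -m g:proj_team:rwX /shared/proj_team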
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
Sent from my phone, please excuse typos and brevity
to refresh without a full restart. That's always been low
enough on my priority list that "future Paul" will do it. I love that guy; he
fixes all my mistakes.
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is
when snapshots
exist. This is what was going on when we hit our last outage, which had a huge
journal and thus a painful recovery.
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is a description, not a goal.”
al love if ceph is going to offer file services that
can match what the back end storage can actually provide.
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
Sent from my phone, please excuse typos and brevity
Fr
her server and starts
again.
We get around it by "wedging it in a corner" and removing the ability to
migrate. This is as simple as stopping all standby MDS services and just
waiting for the MDS to complete.
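On a package-based install that is roughly the following; unit names will differ under cephadm/containers, so treat it as a sketch:

  # on each host that only runs a standby MDS
  systemctl stop ceph-mds@$(hostname -s)
  # watch the remaining active MDS work through its queue
  ceph fs status
  # bring the standbys back once it has finished
  systemctl start ceph-mds@$(hostname -s)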
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Ins
"Me Too!" so there can be some traction behind
fixing this.
It's a longstanding bug so the version is less important, but we are on 17.2.7.
Thoughts?
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is a descriptio
d be standard procedure for new releases. Numerous Linux
distributions already operate this way to make sure mirrors get copies before
the thundering herds do upgrades.
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is a descriptio
with recovery speeds. We shall see.
-paul
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
“End users is a description, not a goal.”
From: Dan van der Ster
Sent: Thursday, July 6, 2023 6:04 PM
To:
er down looking for the sweet spot. It
seems to be very sensitive. A change from 0.75 to 0.70 changed my IOPS/BW from
480/1.48GB/s to 515/1.57GB/s. It's easy to overshoot and hit some other
bottleneck.
---
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute
When I have seen behavior like this it was a dying drive. It only became
obvious when I did a SMART long test and got failed reads. It still reported
SMART OK though, so that was a lie.
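If anyone wants to run the same check, the smartctl invocations are along these lines (device name is a placeholder):

  smartctl -t long /dev/sdX      # kick off the long self-test in the background
  smartctl -l selftest /dev/sdX  # read the self-test log once it has finished
  smartctl -H /dev/sdX           # overall health assessment, which can still claim the drive is fine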
--
Paul Mezzanini
Platform Engineer III
Research Computing
Rochester Institute of Technology
I looked into nocache vs direct. It looks like nocache just requests that the
caches be dumped before doing its operations, while direct uses direct IO.
Writes getting cached would make it appear much faster. Those tests are not
apples-to-apples.
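Assuming this was dd, the difference is roughly the following (path and sizes are placeholders):

  # direct I/O: bypasses the page cache, so you measure the storage itself
  dd if=/dev/zero of=/mnt/cephfs/ddtest bs=4M count=1024 oflag=direct
  # nocache: only asks the kernel to drop cached data; writes still land in the page cache first
  dd if=/dev/zero of=/mnt/cephfs/ddtest bs=4M count=1024 oflag=nocache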
I'm also trying to decode how you did your
I started installing the SAS version of these drives two years ago in our
cluster and I haven't had one fail yet. I've been working on replacing every
spinner we have with them. I know it's not helping you figure out what is
going on in your environment but hopefully a "the drive works for me"
smartctl can very much read SAS drives, so I would look into that chain first.
Are they behind a RAID controller that is masking the SMART commands?
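If they are behind a RAID controller, smartctl usually needs to be told how to reach the physical device. A couple of hedged examples (slot number and device names are made up):

  smartctl -a -d megaraid,4 /dev/sda   # MegaRAID: address the physical slot behind the controller
  smartctl -a /dev/sdX                 # plain SAS HBA passthrough usually just works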
As for monitoring, we run the smartd service to keep an eye on drives. More
often than not I notice weird things with ceph long before smart thr
I've been watching this thread with great interest because I've also noticed
that my benchmarked max capacities seem abnormally high. Out of the box, my 18T
SAS drives report ~2k as their IOPS capacity. "sdparm --get WCE /dev/sd*"
shows that the write cache is on across all my spinners. For s
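For reference, checking the cache state and turning it off (if you decide you want it off) looks like this:

  sdparm --get WCE /dev/sdX           # WCE 1 means the volatile write cache is enabled
  sdparm --clear WCE --save /dev/sdX  # clear it and save so it survives a power cycle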
We are using pool-level compression (aggressive) for our large EC tier. Since
we already had data in the pool when the feature was enabled, I was unable to do
in-depth testing and tuning to get the best results. "Low hanging fruit" put
654T under compression with 327T used. Not bad, but I know
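For anyone wanting to try it, enabling compression on an existing pool and checking the effect is roughly the following (pool name and algorithm are placeholders; only data written after the change gets compressed):

  ceph osd pool set ecpool compression_mode aggressive
  ceph osd pool set ecpool compression_algorithm lz4
  ceph df detail   # the per-pool compression columns show how much landed under compression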
If I remember correctly, mounting cephfs on OSD hosts was never an
issue. Mapping RBD images was where the issue would come up.
I've got a few single-node ceph clusters that I use for testing and they
have a cephfs mount to themselves that has never caused an issue (beyond
systemd not knowing ce
.6 from 16.2.5? Or upgrading
from some other release?
On Mon, Sep 20, 2021 at 8:33 AM Paul Mezzanini wrote:
I got the exact same error on one of my OSDs when upgrading to 16. I
used it as an exercise in trying to fix a corrupt rocksdb. I spent a few
days of poking with no success. I got mostl
I got the exact same error on one of my OSDs when upgrading to 16. I
used it as an exercise in trying to fix a corrupt rocksdb. I spent a few
days of poking with no success. I got mostly tool crashes like you are
seeing with no forward progress.
I eventually just gave up, purged the OSD, did
We are currently running 3 MONs. When one goes into silly town, the others get
wedged and won't respond well. I don't think more MONs would solve that... but
I'm not sure.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technolo
as it was rebalancing,
we had another MON go into silly mode. We recover from this situation by just
restarting the MON process on the hung node.
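The restart itself is nothing fancier than the following (unit name assumes a package-based install; cephadm unit names differ):

  systemctl restart ceph-mon@$(hostname -s)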
We are running 14.2.15.
I wish I could tell you what the problem actually is and how to fix it. At
least we aren't alone in this failure mode
owing away
the NVMe drives and recreating them...
From: Paul Mezzanini
Sent: Wednesday, January 13, 2021 4:56 PM
To: ceph-users@ceph.io
Subject: [ceph-users] OSDs in pool full : can't restart to clean
Hey all
We landed in a bad place (tm) with our nvm
Hey all
We landed in a bad place (tm) with our NVMe metadata tier. I'll root cause how
we got here after it's all back up. I suspect a pool got misconfigured
and just filled it all up.
Short version, the OSDs are all full (or full enough) that I can't get them to
spin back up. They c
ression off, see what the numbers look like, try to dance it
around, blow it away and do it again with compression on.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
ve a new inode number, but it feels suspect that the number is only
one digit higher. Probably largely because I did several runs in a row to
verify and it was just the next inode handed out.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technol
uts new data in via
globus or migration from a cluster job.
I will test what you proposed, though, by draining an OSD and refilling it, then
checking the stat dump to see what lives under compression and what does not.
-paul
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Comp
orks awesome. In practice, however, I'm not seeing the needles
move as my script goes through. Does anyone have any ideas about what I may
have missed for this dance?
-paul
Side note: People with file names that include quotes, pipes and spaces should
get their accounts disabled by defau
From how I understand it, that setting is a rev-limiter to prevent users from
creating HUGE sparse files and then wasting cluster resources firing off
deletes.
We have ours set to 32T and haven't seen any issues with large files.
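For anyone hunting for the knob, this is presumably the filesystem's max_file_size setting, which takes bytes; a sketch of setting it to 32T (filesystem name is a placeholder):

  ceph fs set cephfs max_file_size 35184372088832   # 32 TiB in bytes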
--
Paul Mezzanini
Sr Systems Administra
board and all commands appear to be working.
TL;DR: successfully used an octopus plunger. The light is green, the trap is
clean.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of
datapoint: "ceph crash" is also not working. This is smelling more and more
like a mgr issue.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-324
Still going on. I want to start using the balancer module, but all of those
commands are hanging.
I'm just doing shotgun debugging now.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester
While the "progress off" was hung, I did a systemctl restart of the active
ceph-mgr. The progress toggle command completed and reported that progress
disabled.
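For the record, a sketch of the two usual ways to bounce the mgr (names are placeholders):

  systemctl restart ceph-mgr@$(hostname -s)   # restart the active mgr in place
  ceph mgr fail <active-mgr-name>             # or force a failover to a standby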
All commands that were hanging before are still unresponsive. That was worth a
shot.
Thanks
--
Paul Mezzanini
"ceph progress off" is just hanging like the others.
I'll fiddle with it later tonight to see if I can get it to stick when I bounce
a daemon.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administrat
I had to roll an upstream version of smartmontools because everything with
Red Hat 7/8 was too old to support the JSON option.
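The flag in question, for anyone curious (needs smartmontools 7.0 or newer; device name is a placeholder):

  smartctl --json -a /dev/sdX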
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Inst
rsized pool. The cluster still has this issue when in health_ok.
I'm free to do a lot of debugging and poking around even though this is our
production cluster. The only service I refuse to play around with is the MDS.
That one bites back. Does anyone have more ideas on where to look to
/ smart monitoring and it killing managers.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu
Fro
Correct, just comma-separated IP addresses.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu
h forward and reverse DNS.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu
t been doing the scientific
shotgun (one slug at a time) approach to see what changes. Does anyone else
have any ideas?
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technolog
/increase-your-linux-server-internet-speed-with-tcp-bbr-congestion-control/
We have used it before with great success for long-distance, high-throughput
transfers.
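For reference, switching to BBR is just a couple of sysctls (persist them under /etc/sysctl.d/ if they work out for you):

  sysctl -w net.core.default_qdisc=fq
  sysctl -w net.ipv4.tcp_congestion_control=bbr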
-paul
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Fin
ings were obvious and
it took some trial and error. Now that we have a pure NVMe tier, I'll probably
try to turn it back on to see if we notice any changes.
Netdata also proved to be a valuable tool to make sure we had traffic in both
TCP and RDMA:
https://www.netdata.cloud/
--
| | + 0.10% tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned int)
| | + 0.30% PyObject_Free
| | + 0.20% dict_dealloc
| + 8.10% Gil::Gil(SafeThreadState&, bool)
n't need to dig
for..
-paul
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu
Sent from my phone. Please excuse any brevity or t
vlan links
(enp131s0f[0,1])
This was fine for frontnet because it came after "enp*", but backnet was an
issue. We cheated and just renamed backnet to "zbacknet". The quick-and-dirty
fix is to rename your bond interfaces to something that starts alphabetically
after "enp".
Annoying and harmless.
It makes perfect sense that it showed up for you when it did because there have
been a lot of improvements in the cephfs quota kernel code.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance &a
From memory, we are in the 700s.
--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfm...@rit.edu
Sent from my phone. Please excuse any brevit
Just wanted to say that we are seeing the same thing on our large cluster. It
manifested mainly in the form of Prometheus stats being totally broken (they
take too long to return, if at all, so the requesting program just gives up).
--
Paul Mezzanini
Sr Systems Administrator / Engineer