Hi Folks,
We had to delete some unfound objects in our cache to get our cluster
working again, but after an hour we saw OSDs crash.
We found that this is caused by the fact that we deleted the
"hit_set_8.3fc_archive_2021-09-09 08:25:58.520768Z_2021-09-09
08:26:18.907234Z" object.
Crash-Log can be
Thank you Eugen. Indeed the answer went to Spam :(
So thanks to David for his workaround, it worked like a charm. Hopefully these
patches can make it into the next pacific release.
‐‐‐ Original Message ‐‐‐
On Thursday, September 9th, 2021 at 2:33 PM, Eugen Block wrote:
> You must have
I don't think the bigger tier-1 enterprise vendors have really jumped
on this, but I've been curious to see if anyone would create a dense
hot-swap M.2 setup (possibly combined with traditional 3.5" HDD bays). The only
vendor I've really seen even attempt something like this is icydock:
https://ww
No problem, and it looks like they will. Glad it worked out for you!
David
On Thu, Sep 9, 2021 at 9:31 AM mabi wrote:
>
> Thank you Eugen. Indeed the answer went to Spam :(
>
> So thanks to David for his workaround, it worked like a charm. Hopefully
> these patches can make it into the next pac
Exactly, we minimize the blast radius/data destruction by allocating
more DB/WAL devices of smaller size rather than fewer of larger size. We
encountered this same issue on an earlier iteration of our hardware
design. With rotational drives and NVMe drives, we are now aiming for a 6:1
ratio based on our CRUS
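For readers who want to express such a layout declaratively: a cephadm OSD
service spec roughly like the one below (a sketch only; the device filters,
service id and file name are assumptions, not the poster's actual spec) lets
many HDDs share one NVMe for DB/WAL:

  # osd-spec.yaml (illustrative only; device selectors are placeholders)
  service_type: osd
  service_id: hdd_with_nvme_db
  placement:
    host_pattern: '*'
  spec:
    data_devices:
      rotational: 1    # spinning data disks
    db_devices:
      rotational: 0    # NVMe/SSD to carve DB/WAL from

  ceph orch apply -i osd-spec.yaml --dry-run

The --dry-run flag shows how cephadm would pair the devices before anything
is created.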
Ceph guarantees data consistency only when the data is written by Ceph.
When an NVMe dies, we replace it and backfill; for us, refilling an OSD host
normally takes about two weeks.
k
Sent from my iPhone
> On 9 Sep 2021, at 17:10, Michal Strnad wrote:
>
> 2. When the DB disk is not completely dead and only has relocated sectors
You are probably right! But this "verification" seems "stupid"!
I created an additional room (with no OSD) and then the dashboard
doesn't complain anymore!
Indeed, the rule does what we want because "step choose firstn 0 type
room" will select the different rooms (2 in our case) and for
On Thu, 9 Sep 2021 at 16:09, Michal Strnad wrote:
> When the disk with the DB dies
> it will cause inaccessibility of all dependent OSDs (six or eight in our
> environment),
> How do you do it in your environment?
Have two SSDs for 8 OSDs, so only half go away when one SSD dies.
--
May the most si
Hi all,
We are discussing different approaches for how to replace a disk holding the DB
(typically an SSD or NVMe disk) for BlueStore. When the disk with the DB dies,
it will cause inaccessibility of all dependent OSDs (six or eight in our
environment), so we are looking for a way to minimize data loss or time
f
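One common recovery flow when a shared DB device fails, sketched here with
made-up device paths and OSD ids (adapt before use; this is one possible
approach, not necessarily what the posters in this thread do): destroy the
affected OSDs, swap the NVMe, then recreate the OSDs against the new device
and let them backfill:

  ceph osd destroy 12 --yes-i-really-mean-it
  ceph osd destroy 13 --yes-i-really-mean-it
  # after physically replacing the failed NVMe:
  ceph-volume lvm zap /dev/sdb /dev/sdc --destroy
  ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc \
      --db-devices /dev/nvme0n1 --osd-ids 12 13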
You must have missed the response to your thread, I suppose:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YA5KLI5MFJRKVQBKUBG7PJG4RFYLBZFA/
Quoting mabi:
Hello,
A few days later the ceph status progress bar is still stuck and the
third mon is for some unknown reason s
Hi Francois,
I'm not an expert on CRUSH rule internals, but I checked the code, and
it takes the failure domain from the first choose/chooseleaf step, which here
is "room": since there are just 2 rooms vs. 3 replicas, it doesn't allow you
to create a pool with a rule that might not work optimally (keep i
Actually, no -- vfs_ceph doesn't really perform better.
Because Samba forks on new connections, each incoming
connection gets its own Ceph client. If multiple SMB clients are
accessing the same files, they tend to compete with one another for caps
and that causes performance to tank.
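For reference, the kind of share being discussed is a vfs_ceph share in
smb.conf, roughly like the sketch below (share name, path and cephx user are
placeholders); every forked smbd serving such a share opens its own
libcephfs client:

  [cephfs]
      path = /
      vfs objects = ceph
      ceph:config_file = /etc/ceph/ceph.conf
      ceph:user_id = samba
      kernel share modes = no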
Hello,
We have a Ceph cluster with CephFS and RBD images enabled; from Xen-NG we
connect directly to RBD images. Several times a day the VMs suffer from
high load/iowait which makes them temporarily inaccessible (around 10-30
seconds); in the logs on xen-ng I find this:
[Thu Sep 9 02:16:06 20
On Wed, 2021-09-08 at 16:39 +, Frank Schilder wrote:
> Hi all,
>
> I have a question about a ceph fs re-export via nfsd. For NFS v4 mounts the
> exports option sync is now the default instead of async. I have just found
> that using async gives more than a factor 10 performance
> i
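For readers unfamiliar with the knob being quoted: the sync/async choice is
an export option, e.g. in /etc/exports (the path and client range below are
placeholders in this sketch):

  # example export; path and client range are made up
  /mnt/cephfs  192.168.1.0/24(rw,async,no_subtree_check,fsid=100)
  # versus the sync default:
  # /mnt/cephfs  192.168.1.0/24(rw,sync,no_subtree_check,fsid=100)

With async, nfsd can acknowledge writes before they are stable on CephFS,
which explains the large speedup, at the cost of losing acknowledged writes
if the NFS server crashes.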
Hi all,
I have a test ceph cluster with 4 OSD servers, each containing 3 OSDs.
The crushmap uses 2 rooms with 2 servers in each room.
We use replica 3 for pools.
I have the following custom CRUSH rule to ensure that I have at least one
copy of the data in each room.
rule replicated3over2room
Hello,
A few days later the ceph status progress bar is still stuck and the third mon
is for some unknown reason still not deploying itself as can be seen from the
"ceph orch ls" output below:
ceph orch ls
NAME          PORTS       RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager  ?:9093,909
Hi Matthew,
Thanks for the update.
For the Part:
[my Query]
> Other Query:
> What if the complete cluster goes down, i.e. the mon crashes and another
> daemon crashes, can we try to restore the data in the OSDs, maybe by reusing
> the OSDs in another or new Ceph cluster or something to save the data?