On Tue, 2 Nov 2021 09:02:31 -0500
Sage Weil wrote:
>
> Just to be clear, you should try
> osd_fast_shutdown = true
> osd_fast_shutdown_notify_mon = false
I added some logs to the tracker ticket with these options set.
> You write that if the osd rejects messenger connections, because it is
> >
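For reference, a hedged sketch of applying those two options at runtime,
assuming the ceph config CLI (Mimic or later) is available:
```
# Hedged sketch: set the suggested shutdown options cluster-wide at runtime.
ceph config set osd osd_fast_shutdown true
ceph config set osd osd_fast_shutdown_notify_mon false
```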
Hello Benoît, (and others in this great thread),
Apologies for replying to this ancient thread.
We have been debugging similar issues during an ongoing migration to
new servers with TOSHIBA MG07ACA14TE hdds.
We see a similar commit_latency_ms issue on the new drives (~60ms in
our env vs ~20ms fo
Hi Dan,
I can't speak for those specific Toshiba drives, but we have absolutely
seen very strange behavior (sometimes with cache enabled and sometimes
not) with different drives and firmwares over the years from various
manufacturers. There was one especially bad case from back in the
Inkta
On Thu, 4 Nov 2021 at 13:37, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> In case of bucket replication is the replication happening on osd level or
> gateway layer?
bucket == gateway layer.
> Could that be a problem, that in my 3 clustered multisite environment the
> cluster networks are in 2 clus
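Since the sync runs through the gateways, a hedged way to inspect its state
from a gateway node (the bucket name is a placeholder):
```
# Overall zone sync state, as seen by the gateways.
radosgw-admin sync status
# Per-bucket view; "mybucket" is a placeholder name.
radosgw-admin bucket sync status --bucket=mybucket
```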
Thanks Mark.
With the help of the crowd on Telegram, we found that (at least here)
the drive cache needs to be disabled like this:
```
for x in /sys/class/scsi_disk/*/cache_type; do echo 'write through' > $x; done
```
This disables the cache (confirmed afterwards with hdparm) but more
importantl
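As a hedged aside, one way to confirm the cache state afterwards, as mentioned
above (assuming hdparm talks to these drives; /dev/sdX is a placeholder):
```
# Expect "write-caching = 0 (off)" once the cache is disabled.
hdparm -W /dev/sdX
```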
Hi Andras,
On Wed, Nov 3, 2021 at 10:18 AM Andras Pataki
wrote:
>
> Hi cephers,
>
> Recently we've started using cephfs snapshots more - and seem to be
> running into a rather annoying performance issue with the MDS. The
> cluster in question is on Nautilus 14.2.20.
>
> Typically, the MDS proces
Hi everybody,
we maintain three ceph clusters (2x octopus, 1x nautilus) that use three
zonegroups to sync metadata, without syncing the actual data (only one zone
per zonegroup).
Some customers have buckets with >4M objects in our largest cluster (the
other two are very fresh with close to 0 data in
On Tue, Nov 2, 2021 at 7:03 AM Sage Weil wrote:
> On Tue, Nov 2, 2021 at 8:29 AM Manuel Lausch
> wrote:
>
> > Hi Sage,
> >
> > The "osd_fast_shutdown" is set to "false"
> > When we upgraded to luminous I also had blocked IO issues with this
> > enabled.
> >
> > Some weeks ago I tried out the opti
AFAIK dynamic resharding is not supported for multisite setups, but you can
reshard manually (a rough command sketch follows below).
Note that this is a very expensive process which requires you to:
- Disable the sync of the bucket you want to reshard.
- Stop all the RGWs (no more access to your Ceph cluster).
- On a node of the master z
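A hedged sketch of that sequence (the bucket name, shard count, and the systemd
target for packaged installs are assumptions; check the docs for your release):
```
# 1. Disable multisite sync for the bucket ("mybucket" is a placeholder).
radosgw-admin bucket sync disable --bucket=mybucket
# 2. Stop all RGW daemons; S3 access to the cluster goes down here.
systemctl stop ceph-radosgw.target
# 3. On a node of the master zone, reshard to a new shard count (example value).
radosgw-admin bucket reshard --bucket=mybucket --num-shards=101
```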
Hello everybody,
I'm quite new to ceph and I'm facing a myriad of issues trying to use it. So
I've subscribed to this mailing list. Hopefully you guys can help me with some
of those issues.
My current goal is to set up local S3 storage -- i.e. a ceph "cluster" with
radosgw. In my test environ
If sharding is not an option at all, then you can increase the
osd_deep_scrub_large_omap_object_key_threshold setting, which is not the best idea.
I would still go with resharding, which might require taking at least the
secondary sites offline. In the future you can set a higher number of shards
during initial creatio
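If you do go the threshold route, a hedged example (the default is 200000 keys;
the new value here is arbitrary):
```
# Raise the large-omap warning threshold; 500000 is just an example value.
ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 500000
```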
Hello everybody,
as a ceph newbie I've tried setting up ceph pacific according to the official
documentation: https://docs.ceph.com/en/latest/cephadm/install/
The intention was to set up a single-node "cluster" with radosgw to provide
local S3 storage.
This failed because my ceph "cluster" woul
We're using cephadm with all 5 nodes on 16.2.6. Until today,
grafana has been running only on ceph05.
Before the 16.2.6 update, the embedded frames would pop up an
expected security error for self-signed certificates, but after
accepting would work. After the 16.2
Hi Carsten,
When I had problems on my physical hosts (recycled systems that we wanted to
just use in a test cluster) I found that I needed to use sgdisk --zap-all
/dev/sd{letter} to clean all partition maps off the disks before ceph would
recognize them as available. Worth a shot in your case, eve
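A hedged sketch of that cleanup (the device letter is a placeholder, and the
wipefs step is an extra suggestion beyond the post itself):
```
# Wipe GPT/MBR partition structures so the disk is seen as available again.
sgdisk --zap-all /dev/sdX
# Extra step, not from the post: clear leftover filesystem/LVM signatures too.
wipefs --all /dev/sdX
```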
Hi,
You should erase any partitions or LVM groups on the disks and restart the OSD
hosts so Ceph can detect the drives. I usually just do 'dd
if=/dev/zero of=/dev/<disk> bs=1M count=1024' and then reboot the host to make
sure it will definitely be clean. Or, alternatively, you can zap the
drives, or you
Hello,
I agree with that point. When ceph creates LVM volumes it adds LVM tags to
them. That's how ceph detects that they are occupied by ceph. So you
should remove the LVM volumes and, even better, clean all data on those LVM
volumes. Usually it's enough to clean just the head of the LVM partition where
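A hedged way to inspect those tags and clean up (the device name is a
placeholder, and ceph-volume is assumed to be installed on the host):
```
# Show the ceph.* tags that ceph-volume stamps on the LVs it creates.
lvs -o lv_name,vg_name,lv_tags
# One-stop cleanup of a device: removes the LVM volumes and wipes the data.
ceph-volume lvm zap --destroy /dev/sdX
```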
Argh - that was it. Tested in Microsoft Edge and it worked fine.
I was using Firefox as my primary browser, and the "enhanced
tracking protection" setting was the issue killing the iframe
loading. Once I disabled that for the mgr daemon's URL the embeds
started loadi
Can you try setting paxos_propose_interval to a smaller number, like 0.3 (by
default it is 2 seconds) and see if that has any effect.
It sounds like the problem is not related to getting the OSD marked down
(or at least that is not the only thing going on). My next guess is that
the peering proces
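A hedged runtime sketch of the suggestion above (it could equally go in
ceph.conf):
```
# Lower the mon proposal interval from its 2-second default to 0.3s.
ceph config set mon paxos_propose_interval 0.3
```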
Hi all!
I’m new to cephFS. My test file system uses a replicated pool on NVMe SSDs for
metadata and an erasure coded pool on HDDs for data. All OSDs use bluestore.
I used the ceph version 16.2.6 for all daemons - created with this version and
running this version. The linux kernel that I used f
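For readers following along, a hedged sketch of such a layout (pool names, PG
counts, and the EC profile are placeholders, not the poster's actual values):
```
# Replicated metadata pool and EC data pool; numbers are example PG counts.
ceph osd pool create testfs_metadata 64 64 replicated
ceph osd pool create testfs_data 128 128 erasure myprofile
# Required before an EC pool can serve as a CephFS data pool on bluestore.
ceph osd pool set testfs_data allow_ec_overwrites true
# --force is needed because the default data pool is erasure coded.
ceph fs new testfs testfs_metadata testfs_data --force
```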
Hi!
I've got a Ceph 16.2.6 cluster; the hardware is 6 x Supermicro SSG-6029P
nodes, each equipped with:
2 x Intel(R) Xeon(R) Gold 5220R CPUs
384 GB RAM
2 x boot drives
2 x 1.6 TB enterprise NVME drives (DB/WAL)
2 x 6.4 TB enterprise drives (storage tier)
9 x 9TB HDDs (storage tier)
2 x Intel XL71
Hi,
I'm trying to figure out if setting auth caps and/or adding a cache pool
are I/O-disruptive operations, i.e. if caps reset to 'none' for a brief
moment or client I/O momentarily stops for other reasons.
For example, I had the following auth setting in my 16.2.x cluster:
client.cinder
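For context, caps are changed with a single 'ceph auth caps' call that replaces
the entity's whole cap set atomically; a hedged example of the command shape
(the cap strings are typical RBD-client values, not the poster's actual ones):
```
# NOTE: this REPLACES all caps for the entity; include everything you want kept.
ceph auth caps client.cinder \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes'
```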
Hi,
I seem to have some stale monitoring alerts in my Mgr UI, which do not want
to go away. For example (I'm also attaching an image for your convenience):
MTU Mismatch: Node ceph04 has a different MTU size (9000) than the median
value on device storage-int.
The alerts appear to be active, but
Yes, it was an attempt to address poor performance, which didn't go well.
Btw, this isn't the first time I'm reading that cache tiering is "kind of
deprecated", but the documentation doesn't really say this; instead it explains
how to set up a cache tier. Perhaps it should be made clearer that
addin