Hi,
sometimes it helps to fail the MGR service. I just had this with a
customer last week where we had to fail it twice within a few hours
because the information was not updated. That was on the latest Octopus.
ceph mgr fail
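If it helps, a minimal sequence for this (assuming at least one standby mgr is configured) is something like:
ceph mgr stat        # show which mgr is currently active
ceph mgr fail        # fail the active mgr so a standby takes over
ceph -s | grep mgr   # confirm a standby has become active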
As for the MTU mismatch, I believe there was a thread a few weeks ago,
Hi Teoman,
I don't sync the bucket content, it's just the metadata that gets synced.
But turning off access to our S3 is not an option, because our customers
rely on it (they make backups and serve objects for their web applications
through it).
On Thu, 4 Nov 2021 at 18:20, Teoman
Cheers Istvan,
how do you do this?
On Thu, 4 Nov 2021 at 19:45, Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:
> This one you need to prepare for: you need to preshard any bucket that you
> know will hold millions of objects.
>
> I have a bucket where we store 1.2 bill
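For reference, presharding is usually done right after creating the bucket, while it is still empty; a sketch (bucket name and shard count are only placeholders, size the shard count for the expected object count):
radosgw-admin bucket reshard --bucket=big-bucket --num-shards=1021
radosgw-admin bucket limit check   # afterwards, verify objects-per-shard stays in a sane range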
Hi,
since you can't change a pool's EC profile afterwards, you have to
choose a reasonable number of chunks. If you need to start with those
6 hosts I would also recommend spanning the EC profile across all of those
nodes, but keep in mind that the cluster won't be able to recover if a
host fails.
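As a sketch, creating such a profile with the failure domain at host level (the profile name and k/m values are just an example) looks like:
ceph osd erasure-code-profile set ec-by-host k=4 m=2 crush-failure-domain=host
ceph osd erasure-code-profile get ec-by-host   # double-check before creating the pool; it cannot be changed for the pool afterwards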
Hi Zakhar,
I don't have much experience with Ceph, so you should read my words with
reasonable skepticism.
If your failure domain should be the host level, then k=4, m=2 is your most
space-efficient option for 6 servers that still allows you to do write IO when
one of the servers fails. Assuming
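To make that concrete (pool name and PG count are placeholders, and this assumes the ec-by-host profile sketched above): k=4/m=2 writes 6 chunks for every 4 data chunks, i.e. a 1.5x raw-space overhead, and with min_size = k+1 = 5 the pool still accepts IO with one of the 6 hosts down.
ceph osd pool create ecpool 128 128 erasure ec-by-host
ceph osd pool set ecpool min_size 5   # k+1: with one host (and thus one chunk) lost, 5 chunks remain and IO continues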
Hello,
I have tried with a native client under Linux and its performance is fine;
the performance under 1 GB is also fine on the Windows machine.
Kind regards
Gabryel
From: Radoslav Milanov
Sent: 01 November 2021 12:55
To: ceph-users@ceph.io
Subject: [ceph-user
Many thanks for your detailed advice, gents, I very much appreciate it!
I read in various places that for production environments it's advised to
keep (k+m) <= host count. Looks like for my setup it is 3+2 then. Would it
be best to proceed with 3+2, or should we go with 4+2?
/Z
On Fri, Nov 5,
Thanks! I'll stick to 3:2 for now then.
/Z
On Fri, Nov 5, 2021 at 1:55 PM Szabo, Istvan (Agoda)
wrote:
> With 6 servers I'd go with 3:2, with 7 you can go with 4:2.
>
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
>
I also use this method and I hate it.
Stopping all of the RGW clients is never an option! It shouldn't be.
Sharding is hell. I had 250M objects in a bucket and the reshard failed
after 2 days, and the object count somehow doubled! 2 days of downtime is not
an option.
I wonder if I stop the write-read
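If nothing else, the progress of a (manual or dynamic) reshard can at least be watched instead of guessed at, e.g.:
radosgw-admin reshard list                         # queued and ongoing reshard jobs
radosgw-admin reshard status --bucket=big-bucket   # per-shard status for one bucket (name is a placeholder)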
Hi Sage,
I tested again with paxos_propose_interval = 0.3.
Now stopping OSDs causes far fewer slow ops, and while starting OSDs the
slow ops seem to be gone.
With osd_fast_shutdown_notify_mon = true the slow ops are gone
completely, so I would like to keep the shutdown notify enabled.
As far as I understand
Am 04.11.21 um 23:51 schrieb Sage Weil:
> Can you try setting paxos_propose_interval to a smaller number, like .3 (by
> default it is 2 seconds) and see if that has any effect.
>
> It sounds like the problem is not related to getting the OSD marked down (or
> at least that is not the only thing g
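For anyone who wants to reproduce this, a sketch of applying those two settings at runtime via the config database (values as tested above):
ceph config set mon paxos_propose_interval 0.3
ceph config set osd osd_fast_shutdown_notify_mon true
ceph config dump | grep -e paxos_propose_interval -e osd_fast_shutdown_notify_mon   # verify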
Maybe this was me in an earlier mail.
It started at the point when all replica partners are on Octopus.
This makes sense if I look at this code snippet:
if (!HAVE_FEATURE(recovery_state.get_min_upacting_features(),
                  SERVER_OCTOPUS)) {
  dout(20) << __func__ << " not all upacting
There should not be any issues using RGW for other buckets while
re-sharding.
As for the object count doubling after the reshard, that is an interesting
situation. After a manual reshard is done, there might be leftovers from
the old bucket index, since during the reshard new .dir.new_bucket_index objects
are created
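If that is the case here, recent releases can list and clean up those leftover index instances, roughly like this (please check the documentation for your exact version, especially on multisite, before removing anything):
radosgw-admin reshard stale-instances list   # show leftover bucket index instances
radosgw-admin reshard stale-instances rm     # remove them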
Hi,
Did you build the Windows client yourself, or is it a SUSE or Cloudbase build?
Which version is the Ceph cluster that you're connecting to running?
Early versions had a few known bugs which might behave like that (e.g.
overflows or connection issues), but it shouldn't be the case wi
Hello,
I observed an interesting behavior change since upgrading to
Octopus and above: the OSD map epoch is constantly increasing.
Until Nautilus the epoch only changed if OSDs went
down/out/up/in, snapshots were created or deleted, recovery or
backfilling took place, or flags like noout were set
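A quick way to watch this, for anyone who wants to reproduce it:
watch -n 10 ceph osd stat   # prints the current osdmap epoch; on an otherwise idle cluster it should barely move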
Thank you, Magnus, for such a quick reply!
Good pointers in there!
Cheers,
Artur
On Tue, 2 Nov 2021 at 14:50, Magnus HAGDORN wrote:
> Hi Artur,
> we did write a script (in fact a series of scripts) that we use to
> manage our users and their quotas. Our script adds a new user to our
> LDAP and se
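Not knowing the exact setup here, but assuming CephFS directory quotas, the per-user part of such a script usually boils down to something like (path and size are placeholders):
mkdir /mnt/cephfs/home/newuser
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/home/newuser   # 100 GiB quota on the user's directory
getfattr -n ceph.quota.max_bytes /mnt/cephfs/home/newuser                   # verify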
Hello everyone!
I'm pleased to announce Cephalocon 2022 will be taking place April 5-7
in Portland, Oregon + Virtually!
The CFP is now open until December 10th, so don't delay! Registration
and sponsorship details will be available soon!
I am looking forward to seeing you all in person again soon!
Hi,
You can get two adjacent osdmap epochs (ceph osd getmap <epoch> -o map.<epoch>).
Then use osdmaptool to print those maps, hopefully revealing what is
changing between the two epochs.
Cheers, Dan
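Roughly like this (the epoch numbers are just examples; take two adjacent epochs from ceph osd stat):
ceph osd getmap 1234 -o map.1234
ceph osd getmap 1235 -o map.1235
osdmaptool --print map.1234 > map.1234.txt
osdmaptool --print map.1235 > map.1235.txt
diff -u map.1234.txt map.1235.txt   # shows what changed between the two epochs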
On Fri, Nov 5, 2021 at 4:54 PM Manuel Lausch wrote:
>
> Hello,
>
> I observed an interesting behavior change
Right, only setups with a single device for everything are affected.
On 11/5/2021 5:54 PM, J-P Methot wrote:
Hi,
I have a quick question regarding bug #53139, as the language in the
report is slightly confusing. This bug affects any setup where a
single OSD's data, BlueStore DB and WAL are all located on the same device?
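A quick way to check which case your OSDs fall into on each host (this just lists the devices backing each OSD):
ceph-volume lvm list   # per OSD: if no separate [db] or [wal] device is listed, data, DB and WAL share one device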
I haven't seen the beginning of the story - OSDs I was troubleshooting
were failing on startup. But I think the initial failure had happened
during regular operation - there is nothing specific to startup for that
issue to pop up...
On 11/5/2021 10:47 PM, J-P Methot wrote:
I see. This issue
Yeah, I think two different things are going on here.
The read leases were new, and I think the way that OSDs are marked down is
the key thing that affects that behavior. I'm a bit surprised that the
_notify_mon option helps there, and will take a closer look at that Monday
to make sure it's doing
Hi Stefan,
thank you for sharing your experience! After I read your mail I did some more
testing, and for me the issue is strictly related to snapshots and perfectly
reproducible. However, I made two new observations that were not clear to me
until now.
First, snapshots that were created before