Hi Nathan,
Thanks for the update. This seems to be a different and worse instance than the
CentOS 7 case. We are using CentOS 8 Stream for a few clients. I will check
whether they are affected.
Thanks and best regards,
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
__
Can you make the devs aware of the regression?
Best regards,
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Nathan Fish
Sent: 08 September 2021 19:33
To: ceph-users
Subject: [ceph-users] Re: Data loss on appends, prod outage
Hi all,
I have a question about a CephFS re-export via nfsd. For NFS v4 mounts the
exports option sync is now the default instead of async. In my experience,
using async gives more than a factor of 10 performance improvement. I couldn't
find any advice within the ceph community informat
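For reference, the option in question is set per export line in /etc/exports. A
minimal sketch, assuming a CephFS kernel mount at /mnt/cephfs and a client
subnet of 192.168.0.0/24 (both placeholders):
# async acknowledges writes before they reach stable storage (fast, but risks
# data loss if nfsd crashes); sync is the NFSv4 default and waits for commit
/mnt/cephfs  192.168.0.0/24(rw,async,no_subtree_check,fsid=100)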
I think it's just a compat setting for legacy (v1) clusters; the kernel behaves
the same way. Your cluster already has msgr2 enabled, so you don't need any
compat settings.
k
Sent from my iPhone
> On 8 Sep 2021, at 22:53, Tony Liu wrote:
>
> Good to know. Thank you Konstantin!
> Will test it out.
> Is this some kno
Good to know. Thank you Konstantin!
Will test it out.
Is this some known issue? Any tracker or fix?
Thanks!
Tony
From: Konstantin Shalygin
Sent: September 8, 2021 12:47 PM
To: Tony Liu
Cc: ceph-users@ceph.io; d...@ceph.io
Subject: Re: [ceph-users] debug RB
Try to simplify it to
[global]
fsid = 35d050c0-77c0-11eb-9242-2cea7ff9d07c
mon_host = 10.250.50.80:3300,10.250.50.81:3300,10.250.50.82:3300
And try again
We have found that on msgr2-only clusters, clients whose mon_host setting does
not hardcode the 3300 port may time out from time to
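A quick way to see which protocols and ports the monitors actually advertise (a
sketch; no cluster-specific names assumed):
$ ceph mon dump | grep -E 'v1:|v2:'
# an msgr2-only mon shows only a [v2:IP:3300/0] address; a client ceph.conf
# that points at the legacy 6789 port for such a mon can hang until timeout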
Thank you Xiubo. confirm=true worked and I was able to update via gwcli and
then get everything reset back to normal again. I’m stable for now but still
hoping that this fix can get in soon to make sure the crash doesn’t happen
again.
Appreciate all your help on this.
-Paul
On Sep 6, 2021, a
Here it is.
[global]
fsid = 35d050c0-77c0-11eb-9242-2cea7ff9d07c
mon_host = [v2:10.250.50.80:3300/0,v1:10.250.50.80:6789/0] [v2:10.250.50.81:3300/0,v1:10.250.50.81:6789/0] [v2:10.250.50.82:3300/0,v1:10.250.50.82:6789/0]
Thanks!
Tony
From: Konstantin Sha
In the previous email I asked you to show your ceph.conf...
k
> On 8 Sep 2021, at 22:20, Tony Liu wrote:
>
> Sorry Konstantin, I didn't get it. Could you elaborate a bit?
Hi Samuel,
I am not really fortunate enough to be in an environment that would allow me to
trace and resolve incidents with a ceph production cluster quickly before
clients start complaining.
So I am more or less forced to choose the path that is least likely to fail. RBD
has been around longer than c
The bug appears to have already been reported:
https://tracker.ceph.com/issues/51948
Also, it should be noted that the write append bug does sometimes
occur when writing from a single client, so controlling write patterns
is not sufficient to stop data loss.
On Wed, Sep 8, 2021 at 1:39 PM Frank S
This may be just a connection string problem
k
> On 8 Sep 2021, at 19:59, Tony Liu wrote:
>
> That's what I am trying to figure out, "what exactly could cause a timeout".
> User creates 10 VMs (boot on volume and an attached volume) by Terraform,
> then destroy them. Repeat the same, it works
Thanks Ernesto.
ceph dashboard set-grafana-api-url fixed the problem. I’m not sure how it got
set to the wrong server (I am using cephadm and I’m the only administrator) but
at least it’s fixed now, so I appreciate the help.
-Paul
On Sep 8, 2021, at 1:45 PM, Ernesto Puerta
epuer...@redh
Hi Paul,
You can check the currently set value with: [1]
$ ceph dashboard get-grafana-api-url
In some set-ups (multi-homed, proxied, ...), you might also need to set up
the user-facing IP: [2]
$ ceph dashboard set-grafana-frontend-api-url
If you're running a Cephadm-deployed cl
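For completeness, a hedged sketch of the relevant commands (the URLs are
placeholders, assuming Grafana listens on port 3000 on one of the manager
nodes):
$ ceph dashboard get-grafana-api-url
$ ceph dashboard set-grafana-api-url https://10.122.242.196:3000
$ ceph dashboard set-grafana-frontend-api-url https://grafana.example.com:3000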
Rolling back to kernel 5.4 has resolved the issue.
On Tue, Sep 7, 2021 at 3:51 PM Frank Schilder wrote:
>
> Hi Nathan,
>
> > Is this the bug you are referring to? https://tracker.ceph.com/issues/37713
>
> yes, it's one of them. I believe there were more such reports.
>
> > The main prod filesystem
That's what I am trying to figure out, "what exactly could cause a timeout".
A user creates 10 VMs (boot on volume and an attached volume) with Terraform,
then destroys them. Repeating the same, it works fine most of the time; the
timeout happens sometimes at different places, volume creation or volume deletion.
Sin
For some reason, the Grafana dashboards in the Ceph dashboard are all pointing
to a node that does not and has never run the Grafana / Prometheus services. I’m
not sure where this value is kept and how to change it back.
My two manager nodes are 10.122.242.196 and 10.122.242.198. For some reason,
t
Hello again.
I came back with a different question. If a bucket has "fill_status": "OVER
100.00%", do I need to use the --inconsistent-index parameter?
--inconsistent-index
When specified with bucket deletion and bypass-gc set to
true, ignores bucket index consistency.
mhnx
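For reference, a hedged sketch of the deletion command being discussed (the
bucket name is a placeholder; per the help text above, --inconsistent-index only
takes effect together with --bypass-gc):
$ radosgw-admin bucket rm --bucket=BIGBUCKET --purge-objects --bypass-gc --inconsistent-index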
Just create a new rule with your failure domain and switch the pool to it. Then
delete the old rule.
k
Sent from my iPhone
> On 8 Sep 2021, at 01:11, Budai Laszlo wrote:
>
> Thank you for your answers. Yes, I'm aware of this option, but this is not
> changing the failure domain of an existing rule.
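A hedged sketch of the sequence Konstantin describes above (rule name, pool
name, and failure-domain bucket type are placeholders):
$ ceph osd crush rule create-replicated replicated_rack default rack
$ ceph osd pool set mypool crush_rule replicated_rack
$ ceph osd crush rule rm replicated_host_old        # the previous rule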
On Wed, 8 Sep 2021 at 16:32, Sage Weil wrote:
> Hi everyone,
> We set up a pad to collect Ceph-related job listings. If you're
> looking for a job, or have a Ceph-related position to advertise, take
> a look:
> https://pad.ceph.com/p/jobs
Thanks. One position added in Scandinavia.
--
May the
What is the ceph.conf for this rbd client?
k
Sent from my iPhone
> On 7 Sep 2021, at 19:54, Tony Liu wrote:
>
>
> I have OpenStack Ussuri and Ceph Octopus. Sometimes I see a timeout when creating
> or deleting volumes. I can see an RBD timeout from cinder-volume. Has anyone seen
> such an issue? I'd lik
Hi everyone,
We set up a pad to collect Ceph-related job listings. If you're
looking for a job, or have a Ceph-related position to advertise, take
a look:
https://pad.ceph.com/p/jobs
sage
Thanks for the tip. I’ve just been using ‘docker exec -it
/bin/bash’ to get into the containers, but those commands sound useful. I think
I’ll install cephadm on all nodes just for this.
Thanks again,
-Paul
> On Sep 8, 2021, at 10:11 AM, Eugen Block wrote:
>
> Okay, I'm glad it worked!
>
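For reference, a hedged sketch of the cephadm equivalents of 'docker exec' (the
daemon name is a placeholder):
$ cephadm shell -- ceph -s            # temporary container with the cluster conf/keyring
$ cephadm enter --name osd.3          # shell inside a running daemon's container
$ cephadm logs --name osd.3           # that daemon's journal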
Okay, I'm glad it worked!
At first I tried cephadm rm-daemon on the bootstrap node that I usually do all
management from, and it indicated that it could not remove the daemon:
[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b
Hi,
I checked our environment (Nautilus) where I enabled the RGW dashboard
integration. Please note that we don't use RGW ourselves heavily and I
don't have access to our customer's RGWs, so this might look
different for an actual prod environment. Anyway, to get it up and
running it co
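For reference, on Nautilus the manual wiring is roughly the following (a sketch
only; the user id and the keys are placeholders for a dedicated system user):
$ radosgw-admin user create --uid=dashboard --display-name=dashboard --system
$ ceph dashboard set-rgw-api-access-key <access_key>
$ ceph dashboard set-rgw-api-secret-key <secret_key>
$ ceph dashboard set-rgw-api-host <rgw_host>   # only needed if autodetection fails
$ ceph dashboard set-rgw-api-port <rgw_port>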
Thanks Eugen.
At first I tried cephadm rm-daemon on the bootstrap node that I usually do all
management from, and it indicated that it could not remove the daemon:
[root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
ER
Hi,
On 06/09/2021 08:37, Lokendra Rathour wrote:
Thanks, Mathew, for the update.
The upgrade failed for some random, weird reasons. Checking further,
Ceph's status shows that "Ceph health is OK", and at times it gives certain
warnings, but I think that is ok.
OK...
but what if we see the Versio
I assume the cluster is used in roughly the same way as before the
upgrade and the load has not increased since, correct? What is the
usual load? Can you share some 'ceph daemonperf mds.' output? It
might be unrelated, but have you tried to compact the OSDs belonging to
this pool, online or
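A hedged sketch of both compaction variants (the OSD id is a placeholder; the
offline form requires the OSD daemon to be stopped first):
$ ceph tell osd.12 compact
# offline, with the OSD daemon stopped:
$ ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact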
We've started hitting this issue again, despite having the bitmap allocator
configured. The logs just before the crash look similar to before (pasted
below).
So perhaps this isn't a hybrid allocator issue after all?
I'm still struggling to collect the full set of diags / run ceph-bluestore-tool
c
I forgot to mention that the progress not updating is a separate bug; you
can fail the mgr (ceph mgr fail ceph1a.guidwn in your example) to
resolve that. On the monitor side, I assume you deployed using labels?
If so, just remove the label from the host where the monitor did not
start, let it fully un
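A hedged sketch of both steps, assuming label-based mon placement (the host name
and label are placeholders; only the mgr name is taken from the example above):
$ ceph mgr fail ceph1a.guidwn          # restart the active mgr to unstick progress
$ ceph orch host label rm ceph1b mon   # let the orchestrator remove the broken mon
$ ceph orch host label add ceph1b mon  # then re-add the label to redeploy it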
This sounds a lot like: https://tracker.ceph.com/issues/51027 which is
fixed in https://github.com/ceph/ceph/pull/42690
David
On Tue, Sep 7, 2021 at 7:31 AM mabi wrote:
>
> Hello
>
> I have a test ceph octopus 16.2.5 cluster with cephadm out of 7 nodes on
> Ubuntu 20.04 LTS bare metal. I just u
If you only configured one iscsi gateway but you see three running, have you
tried to destroy the extra ones with 'cephadm rm-daemon --name ...'? On the
active MGR host, run 'journalctl -f' and you'll see plenty of
information; it should also contain information about the iscsi
deployment. Or run 'cephadm logs -
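A hedged sketch of the cleanup being suggested (daemon name and fsid are
placeholders, modeled on the ones quoted elsewhere in the thread):
$ cephadm ls | grep iscsi                                    # what this host actually runs
$ cephadm rm-daemon --name iscsi.<host>.<id> --fsid <fsid> --force
$ journalctl -f                                              # on the active MGR host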
Hi,
from an older cloud version I remember having to increase these settings:
[DEFAULT]
block_device_allocate_retries = 300
block_device_allocate_retries_interval = 10
block_device_creation_timeout = 300
The question is what exactly could cause a timeout. You write that you
only see these ti
Dear Marc,
Is there a specific reason for "not to use the cephfs for important things"?
What are the major concerns then?
thanks,
samuel
huxia...@horebdata.cn
From: Marc
Date: 2021-09-07 20:37
To: Frank Schilder
CC: ceph-users
Subject: [ceph-users] Re: Kworker 100% with ceph-msgr (after upgrad