Thank you for your suggestion, tried it. It really seems like the other
OSDs think the OSD is dead (if I understand this right), however the
networking seems absolutely fine between the nodes (no issues in the graphs
etc.).
-13> 2018-08-08 09:13:58.466119 7fe053d41700 1 --
10.12.3.17:0/706864 <==
The formula seems correct for a 100 pg/OSD target.
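(The usual rule of thumb behind that, stated here as an assumption since the
original question isn't quoted: total PGs ≈ (number of OSDs × target PGs per
OSD) / replica count, rounded to the nearest power of two. For example, with
20 OSDs, a 100 pg/OSD target and size 3: 20 × 100 / 3 ≈ 667, so 512 or 1024
PGs spread across all pools.)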
> On 8 August 2018 at 04:21, Satish Patel wrote:
>
> Thanks!
>
> Do you have any comments on Question: 1 ?
>
> On Tue, Aug 7, 2018 at 10:59 AM, Sébastien VIGNERON
> wrote:
>> Question 2:
>>
>> ceph osd pool set-quota max_objects|max_bytes
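For completeness, the full form of that command takes a pool name and a value;
'mypool' and the numbers below are only placeholders:

# ceph osd pool set-quota mypool max_objects 1000000
# ceph osd pool set-quota mypool max_bytes 107374182400
# ceph osd pool get-quota mypool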
Hi, I find an old server which mounted cephfs and has the debug files.
# cat osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS
# cat monc
have monmap 2 want 3+
have osdmap 3507
have fsmap.user 0
have mdsmap 55 want 56+
fs_cluster_id -1
# cat mdsc
194 mds0 getattr #1036ae3
What does i
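For anyone else hunting for those files: with the kernel client they live under
debugfs, one directory per mounted client (paths below assume a standard setup):

# mount -t debugfs none /sys/kernel/debug   (only if not already mounted)
# ls /sys/kernel/debug/ceph/
# cat /sys/kernel/debug/ceph/*/osdc
# cat /sys/kernel/debug/ceph/*/mdsc
# cat /sys/kernel/debug/ceph/*/monc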
What is the load like on the osd host at the time and what does the
disk utilization look like?
Also, what does the transaction look like from one of the osds that
sends the "you died" message with debugging osd 20 and ms 1 enabled?
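If it helps, one way to raise those levels temporarily on a single OSD (osd.5
here is just an example id):

# ceph tell osd.5 injectargs '--debug_osd 20 --debug_ms 1'
(reproduce the flapping, grab /var/log/ceph/ceph-osd.5.log, then dial it back)
# ceph tell osd.5 injectargs '--debug_osd 1/5 --debug_ms 0/5'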
On Wed, Aug 8, 2018 at 5:34 PM, Josef Zelenka
wrote:
> Thank yo
Do you see "internal heartbeat not healthy" messages in the log of the
osd that suicides?
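A quick way to check (log path assumes the default naming scheme):

# grep -c 'internal heartbeat not healthy' /var/log/ceph/ceph-osd.*.log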
On Wed, Aug 8, 2018 at 5:45 PM, Brad Hubbard wrote:
> What is the load like on the osd host at the time and what does the
> disk utilization look like?
>
> Also, what does the transaction look like from one
Checked the system load on the host with the OSD that is suiciding
currently and it's fine, however I can see a noticeably higher IO
(around 700), though to me that seems rather like a symptom of the constant
flapping/attempting to come up (it's an SSD-based Ceph, so this
shouldn't cause much harm).
Hi All, exactly the same story today: same 8 OSDs and a lot of garbage
collection objects to process.
Below is the number of "cls_rgw.cc:3284: gc_iterate_entries end_key="
entries per OSD log file:
hostA:
/var/log/ceph/ceph-osd.58.log
1826467
hostB:
/var/log/ceph/ceph-osd.88.log
2924241
host
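For anyone wanting to reproduce the per-OSD counts, a grep along these lines
should do it (path assumes the default log naming):

# grep -c 'cls_rgw.cc:3284: gc_iterate_entries end_key=' /var/log/ceph/ceph-osd.*.log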
On Tue, Aug 7, 2018 at 11:41 PM Scott Petersen
wrote:
> We are using kernel 4.15.17 and we keep receiving this error
> mount.ceph: unrecognized mount option "mds_namespace", passing to kernel.
>
That message is harmless -- it just means that the userspace mount.ceph
utility doesn't do anything w
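In other words the option is simply passed through and handled by the kernel; a
typical mount line would look roughly like this (monitor address, secret file
and filesystem name are placeholders):

# mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=cephfs2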
Thx for the command line. I did take a look at it, but I don't really know
what to search for, my bad….
All this flapping is due to deep-scrub: when it starts on an OSD, things start to
go bad.
I set out all the OSDs that were flapping the most (1 by 1 after rebalancing)
and it looks better even
Hi,
We are still blocked by this problem on our end. Glen, did you or someone else
figure out something for this?
Regards
Jocelyn Thode
From: Glen Baars [mailto:g...@onsitecomputers.com.au]
Sent: Thursday, 2 August 2018 05:43
To: Erik McCormick
Cc: Thode Jocelyn ; Vasu Kulkarni ;
ceph-users@lists
So your OSDs are really too busy to respond to heartbeats.
You'll be facing this for some time, until the cluster load gets lower.
I would run `ceph osd set nodeep-scrub` until the heavy disk IO stops.
Maybe you can schedule it so deep scrub is enabled during the night and
disabled in the morning.
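A minimal sketch of that schedule as an /etc/cron.d entry (the file name is
hypothetical; it assumes the admin keyring is available to root on that node):

# /etc/cron.d/ceph-nodeep-scrub
# allow deep scrubs overnight, block them during the day
0 22 * * * root ceph osd unset nodeep-scrub
0 7  * * * root ceph osd set nodeep-scrub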
Regards,
Webert Lim
You could also see open sessions at the MDS server by issuing `ceph daemon
mds.XX session ls`
Regards,
Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
On Wed, Aug 8, 2018 at 5:08 AM Zhenshi Zhou wrote:
> Hi, I find an old server which mounted ce
Hi again Frederic,
It may be worth looking at a recovery sleep.
osd recovery sleep
Description:
Time in seconds to sleep before next recovery or backfill op. Increasing this
value will slow down recovery operation while client operations will be less
impacted.
Type:
Float
Default:
0
osd re
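If you want to try it without restarting the OSDs, it can be injected at
runtime (0.1 s is just an example value to start from):

# ceph tell osd.* injectargs '--osd_recovery_sleep 0.1'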
Hi Webert,
That command shows the current sessions, whereas the server from which I got the
files (osdc, mdsc, monc) has been disconnected for a long time.
So I cannot get useful information from the command you provided.
Thanks
Webert de Souza Lima wrote on Wednesday, August 8, 2018 at 10:10 PM:
> You could also see open sessions at the
Hi Zhenshi,
if you still have the client mount hanging but no session is connected, you
probably have some PID waiting with blocked IO from cephfs mount.
I face that now and then and the only solution is to reboot the server, as
you won't be able to kill a process with pending IO.
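To confirm that, you can usually spot the stuck processes in uninterruptible
sleep (state D), e.g.:

# ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'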
Regards,
Webert
I'm not using this feature, so maybe I'm missing something, but from
the way I understand cluster naming to work...
I still don't understand why this is blocking for you. Unless you are
attempting to mirror between two clusters running on the same hosts
(why would you do this?) then systemd doesn'
Hi,
Is there any other way except rebooting the server when the client hangs?
If the server is in a production environment, I can't restart it every time.
Webert de Souza Lima wrote on Wednesday, August 8, 2018 at 10:33 PM:
> Hi Zhenshi,
>
> if you still have the client mount hanging but no session is connected,
> you pro
You can only try to remount the cephfs dir. It will probably not work,
giving you I/O errors, so the fallback would be to use a fuse mount.
If I recall correctly you could do a lazy umount on the current dir (umount
-fl /mountdir) and remount it using the FUSE client.
It will work for new sessions.
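Roughly like this (a sketch assuming the default /etc/ceph/ceph.conf and admin
keyring are present on the client):

# umount -fl /mnt/cephfs
# ceph-fuse /mnt/cephfs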
Hi John,
With regard to memory pressure: does the cephfs fuse client also cause a
deadlock, or is this just the kernel client?
We run the fuse client on ten OSD nodes, and use parsync (parallel
rsync) to back up two BeeGFS systems (~1 PB).
Ordinarily fuse works OK, but any OSD problems can cause
On Wed, Aug 8, 2018 at 4:46 PM Jake Grimmett wrote:
>
> Hi John,
>
> With regard to memory pressure: does the cephfs fuse client also cause a
> deadlock, or is this just the kernel client?
TBH, I'm not expert enough on the kernel-side implementation of fuse
to say. Ceph does have the fuse_disab
On Tue, Aug 7, 2018 at 6:27 PM Raju Rangoju wrote:
> Hi,
>
>
>
> I have been running into some connection issues with the latest ceph-14
> version, so we thought the feasible solution would be to roll back the
> cluster to previous version (ceph-13.0.1) where things are known to work
> properly.
There is an undocumented part of the cephx authentication framework called
the 'auid' (auth uid) that assigns an integer identifier to cephx users
and to rados pools and allows you to craft cephx capabilities that apply
to those pools. This is leftover infrastructure from an ancient time in
wh
I looked at this a bit and it turns out anybody who's already in the slack
group can invite people with unrestricted domains. I think it's just part
of Slack that you need to specify which domains are allowed in by default?
Patrick set things up a couple years ago so I suppose our next community
ma
Hi,
I upgraded to 12.2.7 two weeks ago,
and I don't see any more memory increase! (Can't confirm that it was related
to your patch.)
Thanks again for helping!
Regards,
Alexandre Derumier
- Original message -
From: "Zheng Yan"
To: "aderumier"
Cc: "ceph-users"
Sent: Tuesday, 29 May
If, in the above case, osd 13 was not too busy to respond (resource
shortage) then you need to find out why else osd 5, etc. could not
contact it.
On Wed, Aug 8, 2018 at 6:47 PM, Josef Zelenka
wrote:
> Checked the system load on the host with the OSD that is suiciding currently
> and it's fine, h
Thanks Greg.
I think I have to re-install ceph v13 from scratch then.
-Raju
From: Gregory Farnum
Sent: 09 August 2018 01:54
To: Raju Rangoju
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] permission errors rolling back ceph cluster to v13
On Tue, Aug 7, 2018 at 6:27 PM Raju Rangoju
m
Hi Erik,
The thing is that the rbd-mirror service uses the /etc/sysconfig/ceph file to
determine which configuration file to use (from CLUSTER_NAME). So you need to
set this to the name you chose for rbd-mirror to work. However setting this
CLUSTER_NAME variable in /etc/sysconfig/ceph makes it
You could try flushing out the FileStore journals off the SSD and creating
new ones elsewhere (eg, colocated). This will obviously have a substantial
impact on performance but perhaps that’s acceptable during your upgrade
window?
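The usual FileStore sequence for that, sketched for a single OSD (id 12 is a
placeholder; stop the daemon first and adjust the journal path/symlink for
your deployment):

# systemctl stop ceph-osd@12
# ceph-osd -i 12 --flush-journal
(repoint the journal symlink or osd_journal setting at the new device/file)
# ceph-osd -i 12 --mkjournal
# systemctl start ceph-osd@12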
On Mon, Aug 6, 2018 at 12:32 PM Robert Stanford
wrote:
>
> Eugen: