No, there is no split-brain problem even with size/min_size 2/1. A PG will
not go active if it doesn't have the latest data because all other OSDs
that might have seen writes are currently offline.
That's what the history_ignore_les_bounds option effectively does: it tells
ceph to take a PG active
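Assuming the option meant here is the one usually spelled osd_find_best_info_ignore_history_les, a minimal sketch of applying it temporarily to a single OSD might look like this (osd.3 is just an example id; remove the setting again once the PG has peered and gone active):
[osd.3]
# sketch only: lets the OSD ignore the last_epoch_started history check while peering
osd find best info ignore history les = true
Then restart that one OSD so the setting takes effect:
systemctl restart ceph-osd@3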
I got this again today. I cannot unmount the filesystem, and it looks
like some OSDs are at 100% CPU utilization.
-Original Message-
From: Marc Roos
Sent: Monday, 20 May 2019 12:42
To: ceph-users
Subject: [ceph-users] cephfs causing high load on vm, taking down 15 min
later
I have evicted all client connections and still have a high load on the OSDs,
and ceph osd pool stats still shows client activity:
pool fs_data id 20
client io 565KiB/s rd, 120op/s rd, 0op/s wr
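A hedged way to double-check that no CephFS clients are still attached (mds.a is an assumption for the active daemon name; the client id is a placeholder):
# list sessions still known to the active MDS
ceph tell mds.a client ls
# evict a lingering session by id if one shows up
ceph tell mds.a client evict id=<client-id>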
-Original Message-
From: Marc Roos
Sent: Tuesday, 21 May 2019 9:51
To: ceph-users@lists.
Hi Marc,
Is there any scrub / deep-scrub running on the affected OSDs?
Best Regards,
Manuel
-Original Message-
From: ceph-users On behalf of Marc Roos
Sent: Tuesday, 21 May 2019 10:01
To: ceph-users; Marc Roos
Subject: Re: [ceph-users] cephfs causing high load on vm, taking d
No, but even if there were, I have never had any issues when running multiple scrubs.
-Original Message-
From: EDH - Manuel Rios Fernandez [mailto:mrios...@easydatahost.com]
Sent: Tuesday, 21 May 2019 10:03
To: Marc Roos; 'ceph-users'
Subject: RE: [ceph-users] cephfs causing high load on vm, taking d
Should a non-active MDS be doing something? When I restarted the non-active
mds.c, my client I/O on the fs_data pool disappeared.
services:
mon: 3 daemons, quorum a,b,c
mgr: c(active), standbys: a, b
mds: cephfs-1/1/1 up {0=a=up:active}, 1 up:standby
osd: 32 osds: 32 up, 3
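For reference, restarting just that one daemon is typically done via systemd on the host where mds.c runs (assuming the usual unit naming):
systemctl restart ceph-mds@c
# confirm the standby count comes back
ceph -s | grep mds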
Hi Poul,
maybe we misunderstood each other here, or I'm misunderstanding something. My HA
comment was not about PGs becoming active/inactive or about data loss.
As far as I understand the discussions, the OSD flapping itself may be caused
by the 2-member HA group, because the OSDs keep marking each other
Hi Marc,
have you configured the other MDS to be standby-replay for the active
MDS? I have three MDS servers, one is active, the second is
active-standby and the third just standby. If the active fails, the
second takes over within seconds. This is what I have in my ceph.conf:
[mds.]
mds_
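For comparison, a minimal standby-replay snippet in the pre-Nautilus ceph.conf style (the daemon name b and the followed daemon a are assumptions, not the poster's actual config):
[mds.b]
# let this daemon tail the active MDS journal so failover is fast
mds standby replay = true
mds standby for name = a
On Nautilus the same effect is normally achieved on the filesystem itself with: ceph fs set cephfs allow_standby_replay true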
On Tue, 21 May 2019 at 02:12, mr. non non wrote:
> Has anyone had this issue before? From what I have researched, many people have
> issues with rgw.index that are related to too small a number of index shards (too
> many objects per index shard).
> I also check on this thread
> http://lists.ceph.com/pipermail/ceph-us
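If it does turn out to be under-sharded bucket indexes, a hedged way to check and reshard (the bucket name and shard count are placeholders):
# show buckets whose objects-per-shard count exceeds the warning threshold
radosgw-admin bucket limit check
# queue a reshard to a higher shard count, then run it
radosgw-admin reshard add --bucket=<bucket> --num-shards=128
radosgw-admin reshard process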
I have not configured anything for the MDS except this:
[mds]
# 100k+ files in 2 folders
mds bal fragment size max = 12
# maybe for nfs-ganesha problems?
# http://docs.ceph.com/docs/master/cephfs/eviction/
mds_session_blacklist_on_timeout = false
mds_session_blacklist_on_evict = false
mds_c
[@ceph]# ps -aux | grep D
USER       PID %CPU %MEM    VSZ   RSS TTY   STAT START TIME COMMAND
root     12527  0.0  0.0 123520   932 pts/1 D+   09:26 0:00 umount
/home/mail-archive
root     14549  0.2  0.0      0     0 ?     D    09:29 0:09
[kworker/0:0]
root 23350 0.0 0.
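A hedged way to see where the stuck umount is blocked in the kernel (PID 12527 is taken from the ps output above):
cat /proc/12527/stack
dmesg | grep -iE 'ceph|libceph' | tail -20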
epel-testing has an ansible 2.8 package
/Torben
On 21.05.2019 03:14, solarflow99 wrote:
> Does anyone know the necessary steps to install ansible 2.8 in rhel7? I'm
> assuming most people are doing it with pip?
>
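A sketch of both routes on RHEL 7 (assuming EPEL is already configured; the pip pin is just one way to do it):
# from the epel-testing repo
yum install --enablerepo=epel-testing ansible
# or via pip, pinned to the 2.8 series
pip install 'ansible==2.8.*'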
Hello Jason,
On 20.05.19 at 23:49, Jason Dillaman wrote:
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>> Hello cephers,
>>
>> we have a few systems which utilize an rbd-nbd map/mount to get access to an
>> rbd volume.
>> (This problem seems to be related to "[ceph-users] Slow requests f
I have this on a cephfs client. I had ceph-common on 12.2.11 and upgraded to
12.2.12 while having this error. They write here [0] that you need to upgrade
the kernel and that it is fixed in 12.2.2
[@~]# uname -a
Linux mail03 3.10.0-957.5.1.el7.x86_6
[Tue May 21 11:23:26 2019] libceph: mon2 192.
On Tue, May 21, 2019 at 11:28 AM Marc Schöchlin wrote:
>
> Hello Jason,
>
> On 20.05.19 at 23:49, Jason Dillaman wrote:
>
> On Mon, May 20, 2019 at 2:17 PM Marc Schöchlin wrote:
>
> Hello cephers,
>
> we have a few systems which utilize an rbd-nbd map/mount to get access to an rbd
> volume.
> (Thi
Hi Jason,
Should we disable the fstrim service inside VMs which run on top of RBD?
I recall Ubuntu has a weekly fstrim cron job enabled by default, while we
have to enable the fstrim service manually on Debian and CentOS.
Kind regards,
Charles Alva
Sent from Gmail Mobile
On Tue, May 21, 2019, 4:49 A
On Tue, May 21, 2019 at 12:03 PM Charles Alva wrote:
>
> Hi Jason,
>
> Should we disable the fstrim service inside VMs which run on top of RBD?
It has the potential to be a thundering-herd issue if you have lots of
VMs all issuing discards at the same time and your RBD images do
not have object-ma
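A hedged way to check for and enable the object-map feature on an existing image (pool and image names are placeholders; object-map needs exclusive-lock, and a rebuild afterwards makes the map consistent):
rbd info <pool>/<image> | grep features
rbd feature enable <pool>/<image> exclusive-lock object-map fast-diff
rbd object-map rebuild <pool>/<image>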
Dear all,
I am doing some tests with Nautilus and CephFS on an erasure-coded pool.
I noticed something strange between k+m in my erasure profile and size+min_size
in the pool that was created:
> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
> crush-device-class=
> crush-failure-domain=os
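For reference, a sketch of the sequence that should reproduce the observation (assuming the profile really is k=4, m=2 as its name suggests; the pool name ec42 is a placeholder, and --force is only needed if the profile already exists):
ceph osd erasure-code-profile set ecpool-4-2 k=4 m=2 crush-failure-domain=osd
ceph osd pool create ec42 64 64 erasure ecpool-4-2
ceph osd pool get ec42 size       # expect k+m = 6
ceph osd pool get ec42 min_size   # expect k+1 = 5 by default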
Got it. Thanks for the explanation, Jason!
Kind regards,
Charles Alva
Sent from Gmail Mobile
On Tue, May 21, 2019 at 5:16 PM Jason Dillaman wrote:
> On Tue, May 21, 2019 at 12:03 PM Charles Alva
> wrote:
> >
> > Hi Jason,
> >
> > Should we disable the fstrim service inside VMs which run on top
Hi,
this question comes up regularly and is being discussed right now:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html
Regards,
Eugen
Quoting Yoann Moulin:
Dear all,
I am doing some tests with Nautilus and CephFS on an erasure-coded pool.
I noticed something strang
I am still stuck in this situation and do not want to restart (reset)
this host. I tried bringing down the Ethernet interface connected to the client
network for a while, but after bringing it back up, I am getting the same messages
-Original Message-
From: Marc Roos
Sent: Tuesday, 21 May 2019 11:42
>> I am doing some tests with Nautilus and CephFS on an erasure-coded pool.
>>
>> I noticed something strange between k+m in my erasure profile and
>> size+min_size in the pool created:
>>
>>> test@icadmin004:~$ ceph osd erasure-code-profile get ecpool-4-2
>>> crush-device-class=
>>> crush-failure-d
Setup:
Ceph version: 13.2.4
OpenStack release: Rocky
We have Rados GW setup with keystone integration.
The integration seems to be working fine, apart from a strange
issue with multipart copy operations.
Test:
Using the test program at
https://javiermunhoz.com/blog/content/mpu-part-copy/multipart-upload-co
On Tue, 21 May 2019 at 19:32, Yoann Moulin wrote:
>
> >> I am doing some tests with Nautilus and CephFS on an erasure-coded pool.
[...]
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-May/034867.html
>
> Oh thanks, I missed that thread; that makes sense. I agree with the comments that
> it i
Hi,
we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?
regards,
Kevin
On 20.05.19 at 9:22 a.m., Kevin Flöh wrote:
Hi Frederic,
we do not have access to the original OSDs. We exported the remaining
shar
On 5/21/19 4:48 PM, Kevin Flöh wrote:
> Hi,
>
> we gave up on the incomplete pgs since we do not have enough complete
> shards to restore them. What is the procedure to get rid of these pgs?
>
You need to start with marking the OSDs as 'lost' and then you can
force_create_pg to get the PGs bac
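A hedged sketch of that sequence (OSD id 7 and PG 20.3f are placeholders for the actual lost OSDs and incomplete PGs; newer releases require the --yes-i-really-mean-it flag):
ceph osd lost 7 --yes-i-really-mean-it
ceph osd force-create-pg 20.3f --yes-i-really-mean-it
# the recreated PG comes back empty, so any data it held is gone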
Hi,
Thank you so much for sharing your case.
2 weeks ago, one of my users purged old swift objects manually with a custom script
but didn't use the object expiry feature. This might be the cause.
I will leave the HEALTH_WARN message as-is if it has no impact.
Regards,
Arnondh
Fro
I'm at a new job working with Ceph again and am excited to be back in the
community!
I can't find any documentation to support this, so please help me
understand if I got this right.
I've got a Jewel cluster with CephFS and we have an inconsistent PG. All
copies of the object are zero size, but the
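A hedged way to look at what the scrub actually flagged before deciding how to repair (the PG id is a placeholder):
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>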
Hello cephers,
I know that there was a similar question posted 5 years ago; however, the answer
was inconclusive for me.
I installed a new Nautilus 14.2.1 cluster and started pre-production testing.
I followed a Red Hat document and simulated a soft disk failure with
# echo 1 > /sys/block/sdc/devic
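In case it helps, a sketch of bringing the device and the OSD back after that kind of simulated removal (host0 and osd.2 are assumptions for the SCSI host and the affected OSD):
# rescan the SCSI bus so sdc reappears
echo "- - -" > /sys/class/scsi_host/host0/scan
systemctl start ceph-osd@2
ceph osd tree | grep osd.2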
The simple answer is because k+1 is the default min_size for EC pools.
min_size means that the pool will still accept writes if that many failure
domains are still available. If you set min_size to k then you have entered
the dangerous territory where, if you lose another failure domain (OSD or
host
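If you do want to relax it on a given pool anyway, it is a one-liner (the pool name is a placeholder; leaving min_size at k+1 is the safer default):
ceph osd pool get <ec-pool> min_size
ceph osd pool set <ec-pool> min_size 4   # k for the 4+2 profile discussed above; use with care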
Hi, Folks,
I just encountered an OSD that is down and cannot be brought up again. Attached
below are the log messages. Can anyone tell what is wrong with the OSD, and what
should I do?
thanks in advance,
Samuel
***
# tail -500 /var/lo