[ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Ajitha Robert
I have an RBD mirroring setup with primary and secondary clusters as peers, and a pool enabled in image mode. In this pool I created an RBD image with journaling enabled. But whenever I enable mirroring on the image, I get errors in rbdmirror.log and osd.log. I have increased the timeouts.
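
For reference, a minimal sketch of how journaling-based mirroring is enabled on a single image; the pool and image names are placeholders, not taken from the thread:

    # enable journaling on an existing image (exclusive-lock is a prerequisite)
    rbd feature enable mypool/myimage exclusive-lock journaling
    # with the pool in "image" mirror mode, enable mirroring per image
    rbd mirror image enable mypool/myimage
    # check replication state and progress
    rbd mirror image status mypool/myimage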

Re: [ceph-users] [Disarmed] Re: ceph-ansible firewalld blocking ceph comms

2019-07-26 Thread Nathan Harper
The firewalld service 'ceph' includes the range of ports required. Not sure why it helped, but after a reboot of each OSD node the issue went away! On Thu, 25 Jul 2019 at 23:14, wrote: > Nathan; > > I'm not an expert on firewalld, but shouldn't you have a list of open > ports? > > ports: ?
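
The firewalld service definitions mentioned here can be checked and applied roughly as below; a sketch, assuming the stock firewalld service files and the default zone:

    firewall-cmd --permanent --add-service=ceph-mon   # monitor ports (3300, 6789)
    firewall-cmd --permanent --add-service=ceph       # OSD/MDS/MGR port range (6800-7300)
    firewall-cmd --reload
    firewall-cmd --list-all                           # verify the services are actually active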

[ceph-users] loaded dup inode (but no mds crash)

2019-07-26 Thread Dan van der Ster
Hi all, Last night we had 60 ERRs like this: 2019-07-26 00:56:44.479240 7efc6cca1700 0 mds.2.cache.dir(0x617) _fetched badness: got (but i already had) [inode 0x10006289992 [...2,head] ~mds2/stray1/10006289992 auth v14438219972 dirtyparent s=116637332 nl=8 n(v0 rc2019-07-26 00:56:17.199090 b116

[ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Stefan Kooman
Hi List, We are planning to move a filesystem workload (currently NFS) to CephFS. It's around 29 TB. The unusual thing here is the number of directories in use to host the files. In order to combat a "too many files in one directory" scenario, a "let's make use of recursive directories" approach. N

Re: [ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Mykola Golub
On Fri, Jul 26, 2019 at 12:31:59PM +0530, Ajitha Robert wrote: > I have a rbd mirroring setup with primary and secondary clusters as peers > and I have a pool enabled image mode.., In this i created a rbd image , > enabled with journaling. > But whenever i enable mirroring on the image, I m getti

Re: [ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Ajitha Robert
Thank you for the clarification. But I was trying with openstack-cinder: when I load some data into the volume, around 50 GB, the image sync stops at 5% or so, somewhere within 15%. What could be the reason? On Fri, Jul 26, 2019 at 3:01 PM Mykola Golub wrote: > On Fri, Jul 26, 2019 at 1

Re: [ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Ajitha Robert
Thank you for the clarification. But I was trying with openstack-cinder: when I load some data into the volume, around 50 GB, either it says ImageReplayer: 0x7f7264016c50 [17/244d1ab5-8147-45ed-8cd1-9b3613f1f104] handle_shut_down: mirror image no longer exists, or the image sync will stop by

Re: [ceph-users] New best practices for osds???

2019-07-26 Thread Mark Nelson
On 7/25/19 9:27 PM, Anthony D'Atri wrote: We run a few hundred HDD OSDs for our backup cluster; we set one RAID 0 per HDD in order to be able to use the battery-protected write cache of the RAID controller. It really improves performance, for both bluestore and filestore OSDs. Having run someth

[ceph-users] Adding block.db afterwards

2019-07-26 Thread Frank Rothenstein
Hi, I'm running a small (3 hosts) ceph cluster. At the moment I want to speed up my cluster by adding separate block.db SSDs. The OSDs were created on pure spinning HDDs, with no "--block.db /dev/sdxx" parameter, so there is no block.db symlink in /var/lib/ceph/osd/ceph-xx/. There is a "ceph-bluestore-tool bluefs-bd

Re: [ceph-users] Adding block.db afterwards

2019-07-26 Thread Igor Fedotov
Hi Frank, you can specify the new DB size in the following way: CEPH_ARGS="--bluestore-block-db-size 107374182400" ceph-bluestore-tool bluefs-bdev-new-db Thanks, Igor On 7/26/2019 2:49 PM, Frank Rothenstein wrote: Hi, I'm running a small (3 hosts) ceph cluster. ATM I want to speed up my
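
Putting Igor's hint together with Frank's setup, the full invocation would look roughly like the sketch below; the OSD id and target device are placeholders, and the OSD has to be stopped first:

    systemctl stop ceph-osd@12
    CEPH_ARGS="--bluestore-block-db-size 107374182400" \
      ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-12 \
        --dev-target /dev/sdxx1           # new SSD partition (or LV) to hold block.db
    systemctl start ceph-osd@12

Depending on the release, a follow-up ceph-bluestore-tool bluefs-bdev-migrate may still be needed to move already-written RocksDB data from the slow device onto the new one.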

Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
What kind of commit/apply latency increases have you seen when adding a large number of OSDs? I'm nervous about how sensitive workloads might react here, esp. with spinners. cheers, peter. On 24.07.19 20:58, Reed Dier wrote: > Just chiming in to say that this too has been my preferred method for > add

Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Stefan Kooman
Quoting Peter Sabaini (pe...@sabaini.at): > What kind of commit/apply latency increases have you seen when adding a > large numbers of OSDs? I'm nervous how sensitive workloads might react > here, esp. with spinners. You mean when there is backfilling going on? Instead of doing "a big bang" you ca
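
Stefan's reply is truncated above, but a common way to avoid a "big bang" is to pause and throttle data movement while the new OSDs come in; a sketch, with illustrative values:

    ceph osd set norebalance                       # hold off rebalancing while OSDs are added
    # ... deploy and activate the new OSDs ...
    ceph osd unset norebalance                     # let backfill begin
    ceph config set osd osd_max_backfills 1        # limit concurrent backfills per OSD
    ceph config set osd osd_recovery_max_active 1  # keep recovery gentle on spinners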

Re: [ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Mykola Golub
On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote: > Thank you for the clarification. > > But i was trying with openstack-cinder.. when i load some data into the > volume around 50gb, the image sync will stop by 5 % or something within > 15%... What could be the reason? I suppose you

Re: [ceph-users] Error in ceph rbd mirroring (rbd::mirror::InstanceWatcher: C_NotifyInstanceRequest finish: resending after timeout)

2019-07-26 Thread Jason Dillaman
On Fri, Jul 26, 2019 at 9:26 AM Mykola Golub wrote: > > On Fri, Jul 26, 2019 at 04:40:35PM +0530, Ajitha Robert wrote: > > Thank you for the clarification. > > > > But i was trying with openstack-cinder.. when i load some data into the > > volume around 50gb, the image sync will stop by 5 % or som
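
For a stalled image sync like the one described, the usual first steps are to inspect the replication state and, if needed, request a fresh resync; image names below are placeholders:

    rbd mirror image status mypool/myvolume   # shows state, description and last_update
    rbd mirror image resync mypool/myvolume   # run against the non-primary copy to restart the sync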

Re: [ceph-users] pools limit

2019-07-26 Thread M Ranga Swami Reddy
Ceph is used for RBD only. On Wed, Jul 24, 2019 at 12:55 PM Wido den Hollander wrote: > > > On 7/16/19 6:53 PM, M Ranga Swami Reddy wrote: > > Thanks for your reply.. > > Here, new pool creations and pg auto scale may cause rebalance..which > > impact the ceph cluster performance.. > > > > Please s
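
On the rebalance concern, the PG autoscaler can be inspected and pinned per pool so pools do not resize unexpectedly; a sketch with a placeholder pool name:

    ceph osd pool autoscale-status                    # expected vs. actual PG counts per pool
    ceph osd pool set mypool pg_autoscale_mode warn   # only report, instead of acting ("on"/"off" also valid)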

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
MDS CPU load is proportional to metadata ops/second. MDS RAM cache is proportional to # of files (including directories) in the working set. Metadata pool size is proportional to total # of files, plus everything in the RAM cache. I have seen that the metadata pool can balloon 8x between being idle

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Stefan Kooman
Quoting Nathan Fish (lordci...@gmail.com): > MDS CPU load is proportional to metadata ops/second. MDS RAM cache is > proportional to # of files (including directories) in the working set. > Metadata pool size is proportional to total # of files, plus > everything in the RAM cache. I have seen that

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
Ok, great. Some numbers for you: I have a filesystem of 50 million files, 5.4 TB. The data pool is on HDD OSDs with Optane DB/WAL, size=3. The metadata pool (Optane OSDs) has 17GiB "stored", 20GiB "used", at size=3. 5.18M objects. When doing parallel rsyncs, with ~14M inodes open, the MDS cache goe
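
The cache sizes quoted here are bounded by the MDS memory target; a sketch of setting and checking it (the 32 GiB value and the MDS name "a" are only examples):

    ceph config set mds mds_cache_memory_limit 34359738368   # ~32 GiB cache target
    ceph daemon mds.a cache status                            # run on the MDS host; shows current cache usage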

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Burkhard Linke
Hi, one particularly interesting point in setups with a large number of active files/caps is the failover. If your MDS fails (assuming a single MDS; multiple MDSes with multiple active ranks behave the same way for _each_ rank), the monitors will detect the failure and update the mds map. Cep

Re: [ceph-users] MDS / CephFS behaviour with unusual directory layout

2019-07-26 Thread Nathan Fish
Yes, definitely enable standby-replay. I saw sub-second failovers with standby-replay, but when I restarted the new rank 0 (previously 0-s) while the standby was syncing up to become 0-s, the failover took several minutes. This was with ~30GiB of cache. On Fri, Jul 26, 2019 at 12:41 PM Burkhard Li
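
On Nautilus and later, the standby-replay behaviour discussed here is enabled per filesystem; a sketch with a placeholder filesystem name:

    ceph fs set cephfs allow_standby_replay true
    ceph fs status cephfs    # a standby daemon should now appear as standby-replay for rank 0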

Re: [ceph-users] Upgrading and lost OSDs

2019-07-26 Thread Alfredo Deza
On Thu, Jul 25, 2019 at 7:00 PM Bob R wrote: > I would try 'mv /etc/ceph/osd{,.old}' then run 'ceph-volume simple scan' > again. We had some problems upgrading due to OSDs (perhaps initially > installed as firefly?) missing the 'type' attribute and iirc the > 'ceph-volume simple scan' command re
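
Bob's suggestion spelled out as commands; the move simply forces ceph-volume to regenerate its per-OSD JSON metadata:

    mv /etc/ceph/osd /etc/ceph/osd.old   # keep the old JSON files around, just in case
    ceph-volume simple scan              # rescan the running OSDs and rewrite /etc/ceph/osd/*.json
    ceph-volume simple activate --all    # recreate the systemd units from the new metadata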

Re: [ceph-users] How to add 100 new OSDs...

2019-07-26 Thread Peter Sabaini
On 26.07.19 15:03, Stefan Kooman wrote: > Quoting Peter Sabaini (pe...@sabaini.at): >> What kind of commit/apply latency increases have you seen when adding a >> large numbers of OSDs? I'm nervous how sensitive workloads might react >> here, esp. with spinners. > > You mean when there is backfilli

Re: [ceph-users] Should I use "rgw s3 auth order = local, external"

2019-07-26 Thread Christian
Hi, I found this (rgw s3 auth order = local, external) on the web: https://opendev.org/openstack/charm-ceph-radosgw/commit/3e54b570b1124354704bd5c35c93dce6d260a479 which seems to be exactly what I need to circumvent the higher latency when switching on Keystone authentication. In fact it even im
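
The option from that commit goes into the RGW section of ceph.conf; a sketch, with the instance name and Keystone endpoint as placeholders:

    [client.rgw.gateway1]
        rgw s3 auth order = local, external
        rgw s3 auth use keystone = true
        rgw keystone url = https://keystone.example.com:5000

With "local" tried first, requests signed with radosgw's own keys never take the round trip to Keystone, which is exactly the latency being avoided here.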

Re: [ceph-users] New best practices for osds???

2019-07-26 Thread Anthony D'Atri
> This is worse than I feared, but very much in the realm of concerns I > had with using single-disk RAID0 setups. Thank you very much for > posting your experience! My money would still be on using *high write > endurance* NVMes for DB/WAL and whatever I could afford for block. yw. Of cou