Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-22 Thread Nick Tan
Hi Christian, > > Hi David, > > > > The planned usage for this CephFS cluster is scratch space for an image > > processing cluster with 100+ processing nodes. > > Lots of clients, how much data movement would you expect, how many images > come in per timeframe, lets say an hour? > Typical size o

[ceph-users] Cache tier unevictable objects

2017-08-22 Thread Eugen Block
Hi list, we have a production Hammer cluster for our OpenStack cloud, and recently a colleague added a cache tier consisting of 2 SSDs, also with a pool size of 2; we're still experimenting with this topic. Now we have some hardware maintenance to do and need to shut down nodes, one at a time

[ceph-users] WBThrottle

2017-08-22 Thread Ranjan Ghosh
Hi Ceph gurus, I've got the following problem with our Ceph installation (Jewel): There are various websites served from the CephFS mount. Sometimes, when I copy many new (large?) files onto this mount, it seems that after a certain delay, everything grinds to a halt. No websites are served;

Re: [ceph-users] Cache tier unevictable objects

2017-08-22 Thread Christian Balzer
On Tue, 22 Aug 2017 09:54:34 + Eugen Block wrote: > Hi list, > > we have a productive Hammer cluster for our OpenStack cloud and > recently a colleague added a cache tier consisting of 2 SSDs and also > a pool size of 2, we're still experimenting with this topic. > Risky, but I guess you

Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-22 Thread Christian Balzer
Hello, On Tue, 22 Aug 2017 16:51:47 +0800 Nick Tan wrote: > Hi Christian, > > > > > > Hi David, > > > > > > The planned usage for this CephFS cluster is scratch space for an image > > > processing cluster with 100+ processing nodes. > > > > Lots of clients, how much data movement would you

Re: [ceph-users] RBD encryption options?

2017-08-22 Thread Marc Roos
I had some issues with the iscsi software starting too early, maybe this can give you some ideas.

systemctl show target.service -p After
mkdir /etc/systemd/system/target.service.d
cat << 'EOF' > /etc/systemd/system/target.service.d/10-waitforrbd.conf
[Unit]
After=systemd-journald.socket sys-

[ceph-users] Blocked requests problem

2017-08-22 Thread Ramazan Terzi
Hello, I have a Ceph Cluster with specifications below: 3 x Monitor node 6 x Storage Node (6 disks per Storage Node, 6TB SATA Disks, all disks have SSD journals) Distributed public and private networks. All NICs are 10Gbit/s osd pool default size = 3 osd pool default min size = 2 Ceph version is

Re: [ceph-users] PG reported as inconsistent in status, but no inconsistencies visible to rados

2017-08-22 Thread Edward R Huyer
Neat, hadn't seen that command before. Here's the fsck log from the primary OSD: https://pastebin.com/nZ0H5ag3 Looks like the OSD's bluestore "filesystem" itself has some underlying errors, though I'm not sure what to do about them. -Original Message- From: Brad Hubbard [mailto:bhubb.

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh
Hi Ramazan, I'm no Ceph expert, but what I can say from my experience using Ceph is: 1) During "Scrubbing", Ceph can be extremely slow. This is probably where your "blocked requests" are coming from. BTW: Perhaps you can even find out which processes are currently blocking with: ps aux | grep
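
Besides looking at processes, a more direct way to see what the blocked requests are waiting on is to query an OSD's admin socket. A rough sketch only, assuming it runs on the OSD host, that "osd.0" is one of the OSDs reporting blocked requests, and that the ceph CLI is in PATH (field names in the JSON output vary between releases):

    import json
    import subprocess

    # Ask the OSD's admin socket for its in-flight operations.
    # "osd.0" is a placeholder; use an OSD that reports blocked requests.
    out = subprocess.check_output(["ceph", "daemon", "osd.0", "dump_ops_in_flight"])
    ops = json.loads(out.decode()).get("ops", [])

    for op in ops:
        # Long-running requests show up here with their age and a description
        # of what they are doing / waiting on.
        print(op.get("age"), op.get("description"))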

[ceph-users] Help with file system with failed mds daemon

2017-08-22 Thread Bryan Banister
Hi all, I'm still new to ceph and cephfs. Trying out the multi-fs configuration on a Luminous test cluster. I shut down the cluster to do an upgrade, and when I brought the cluster back up I now have a warning that one of the file systems has a failed mds daemon: 2017-08-21 17:00:00.81 m

Re: [ceph-users] Help with file system with failed mds daemon

2017-08-22 Thread John Spray
On Tue, Aug 22, 2017 at 4:58 PM, Bryan Banister wrote: > Hi all, > > > > I’m still new to ceph and cephfs. Trying out the multi-fs configuration on > at Luminous test cluster. I shutdown the cluster to do an upgrade and when > I brought the cluster back up I now have a warnings that one of the f

[ceph-users] OSDs in EC pool flapping

2017-08-22 Thread george.vasilakakos
Hey folks, I'm staring at a problem that I have found no solution for and which is causing major issues. We've had a PG go down with the first 3 OSDs all crashing and coming back only to crash again with the following error in their logs: -1> 2017-08-22 17:27:50.961633 7f4af4057700 -1 osd.

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ramazan Terzi
Hi Ranjan, Thanks for your reply. I did set the noscrub and nodeep-scrub flags, but the active scrubbing operation still isn't completing. The scrubbing operation is always on the same PG (20.1e). $ ceph pg dump | grep scrub dumped all in format plain pg_stat objects mip degr misp unf bytes log
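
If it helps to watch that stuck PG over time, the same pg dump query can also be driven from the python-rados bindings. A rough sketch, assuming admin credentials in /etc/ceph; the JSON layout of pg dump differs between releases, hence the guarded lookup:

    import json
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumes admin keyring
    cluster.connect()

    # Ask the monitors for a JSON pg dump and pick out PGs in a scrubbing state.
    ret, outbuf, errs = cluster.mon_command(
        json.dumps({"prefix": "pg dump", "format": "json"}), b"")
    data = json.loads(outbuf.decode())
    # Older releases expose pg_stats at the top level, newer ones nest it.
    pg_stats = data.get("pg_stats") or data.get("pg_map", {}).get("pg_stats", [])

    for pg in pg_stats:
        if "scrub" in pg["state"]:
            print(pg["pgid"], pg["state"], pg.get("last_scrub_stamp"))

    cluster.shutdown()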

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh
Hm. That's quite weird. On our cluster, when I set "noscrub" and "nodeep-scrub", scrubbing always stops pretty quickly (within a few minutes). I wonder why this doesn't happen on your cluster. When exactly did you set the flags? Perhaps it just needs some more time... Or there might be a disk problem w

[ceph-users] Small-cluster performance issues

2017-08-22 Thread fcid
Hello everyone, I've been using ceph to provide storage via RBD for 60 KVM virtual machines running on proxmox. The ceph cluster we have is very small (2 OSDs + 1 mon per node, and a total of 3 nodes) and we are having some performance issues, like big latency times (apply lat: ~0.5 s; commi

Re: [ceph-users] OSDs in EC pool flapping

2017-08-22 Thread Paweł Woszuk
Have you experienced huge memory consumption by the flapping OSD daemons? The restarts could be triggered by running out of memory (OOM killer). If so, this could be connected with an OSD device error (bad blocks?), but we've experienced something similar on the Jewel release, not Kraken. The solution was to find the PG that cause

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread Maged Mokhtar
It is likely that your 2 spinning disks cannot keep up with the load. Things are likely to improve if you double your OSDs, hooking them up to your existing SSD journal. Technically it would be nice to run a load/performance tool (atop, collectl, or sysstat) and measure how busy your resources are, but
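
For a quick look at how busy the spinners actually are (in the spirit of atop/collectl/sysstat), a small sampler can be put together with psutil. A sketch only, assuming Linux, psutil installed, and that sdb and sdc are the OSD data disks (adjust to your layout):

    import time
    import psutil

    # Sample per-disk utilisation over one second, roughly like `iostat -x 1`.
    disks = ["sdb", "sdc"]  # hypothetical OSD data disks

    before = psutil.disk_io_counters(perdisk=True)
    time.sleep(1)
    after = psutil.disk_io_counters(perdisk=True)

    for dev in disks:
        # busy_time is in milliseconds on Linux; the delta over a 1 s window
        # approximates the %util column of iostat.
        busy_ms = after[dev].busy_time - before[dev].busy_time
        print("%s: ~%d%% busy" % (dev, min(busy_ms / 10.0, 100)))

Disks sitting near 100% busy under normal client load would support the idea that the spindles, rather than the journals, are the bottleneck.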

Re: [ceph-users] Exclusive-lock Ceph

2017-08-22 Thread lista
Dear all, some days ago I read about the commands rbd lock add and rbd lock remove. Will these commands continue to be maintained in future Ceph versions, or is the preferred way to lock in Ceph going to be exclusive-lock, with these commands deprecated? Thanks a lot, Marcelo On 24/07/2017, Jaso

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread Mazzystr
Also examine your network layout. Any saturation in the private cluster network or client-facing network will be felt in clients / libvirt / virtual machines. As OSD count increases... - Ensure client network / private cluster network separation - different nics, different wires, different sw

Re: [ceph-users] Help with file system with failed mds daemon

2017-08-22 Thread Bryan Banister
Hi John, Seems like you're right... strange that it seemed to work with only one mds before I shut the cluster down. Here is the `ceph fs get` output for the two file systems: [root@carf-ceph-osd15 ~]# ceph fs get carf_ceph_kube01 Filesystem 'carf_ceph_kube01' (2) fs_name carf_ceph_kube0

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread David Turner
I would run some benchmarking throughout the cluster environment to see where your bottlenecks are before putting time and money into something that might not be your limiting resource. Sebastien Han put together a great guide for benchmarking your cluster here: https://www.sebastien-han.fr/blog/
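
As a first-order number before diving into rados bench or fio, a tiny write probe can also be run from any client with the python-rados bindings. A sketch only, assuming admin credentials and a scratch pool named "rbd" (use a throwaway pool, not a production one):

    import time
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")  # assumes admin keyring
    cluster.connect()
    ioctx = cluster.open_ioctx("rbd")  # hypothetical scratch pool

    payload = b"x" * (4 * 1024 * 1024)  # 4 MiB objects, like rados bench's default
    count = 32
    start = time.time()
    for i in range(count):
        ioctx.write_full("bench_obj_%d" % i, payload)  # synchronous full-object write
    elapsed = time.time() - start
    print("%.1f MB/s, %.0f ms/object" % (count * 4 / elapsed, elapsed / count * 1000))

    # Remove the probe objects afterwards.
    for i in range(count):
        ioctx.remove_object("bench_obj_%d" % i)
    ioctx.close()
    cluster.shutdown()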

Re: [ceph-users] Help with file system with failed mds daemon

2017-08-22 Thread John Spray
On Tue, Aug 22, 2017 at 8:49 PM, Bryan Banister wrote: > Hi John, > > > > Seems like you're right... strange that it seemed to work with only one mds > before I shut the cluster down. Here is the `ceph fs get` output for the > two file systems: > > > > [root@carf-ceph-osd15 ~]# ceph fs get carf_c

Re: [ceph-users] Help with file system with failed mds daemon

2017-08-22 Thread Bryan Banister
All sounds right to me... looks like this is a little too bleeding edge for my taste! I'll probably drop it at this point and just wait till we are actually on a 4.8 kernel before checking on status again. Thanks for your help! -Bryan -Original Message- From: John Spray [mailto:jsp...@

[ceph-users] Anybody gotten boto3 and ceph RGW working?

2017-08-22 Thread Bryan Banister
Hello, I have the boto python API working with our ceph cluster, but haven't figured out a way to get boto3 to communicate with our RGWs yet. Anybody have a simple example? Cheers for any help! -Bryan
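
For reference, boto3 mostly just needs the RGW address passed explicitly as endpoint_url. A minimal sketch, where the endpoint, credentials and bucket name are placeholders:

    import boto3

    # Point boto3's S3 client at the RGW instead of AWS.
    s3 = boto3.client(
        "s3",
        endpoint_url="http://rgw.example.com:7480",  # placeholder RGW host:port
        aws_access_key_id="ACCESS_KEY",              # placeholder S3 credentials
        aws_secret_access_key="SECRET_KEY",
    )

    s3.create_bucket(Bucket="boto3-test")
    s3.put_object(Bucket="boto3-test", Key="hello.txt", Body=b"hello from boto3")

    for bucket in s3.list_buckets()["Buckets"]:
        print(bucket["Name"])

If the RGW sits behind HTTPS with a self-signed certificate, boto3.client() also accepts a verify argument pointing at a CA bundle (or False for testing).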

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread fcid
Thanks for your advice, Maged and Chris. I'll answer below. On 08/22/2017 04:30 PM, Mazzystr wrote: Also examine your network layout. Any saturation in the private cluster network or client facing network will be felt in clients / libvirt / virtual machines As OSD count increases... * Ensur

Re: [ceph-users] Small-cluster performance issues

2017-08-22 Thread fcid
Hi David I'll try to perform these tests soon. Thank you. On 08/22/2017 04:52 PM, David Turner wrote: I would run some benchmarking throughout the cluster environment to see where your bottlenecks are before putting time and money into something that might not be your limiting resource. Seb

Re: [ceph-users] ceph-fuse hanging on df with ceph luminous >= 12.1.3

2017-08-22 Thread Patrick Donnelly
On Mon, Aug 21, 2017 at 5:37 PM, Alessandro De Salvo wrote: > Hi, > > when trying to use df on a ceph-fuse mounted cephfs filesystem with ceph > luminous >= 12.1.3 I'm having hangs with the following kind of messages in > the logs: > > > 2017-08-22 02:20:51.094704 7f80addb7700 0 client.174216 ms_

Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-22 Thread Nick Tan
Thanks for the advice Christian. I think I'm leaning more towards the 'traditional' storage server with 12 disks - as you say they give a lot more flexibility with the performance tuning/network options etc. The cache pool is an interesting idea but as you say it can get quite expensive for the c

Re: [ceph-users] pros/cons of multiple OSD's per host

2017-08-22 Thread Christian Balzer
On Wed, 23 Aug 2017 13:38:25 +0800 Nick Tan wrote: > Thanks for the advice Christian. I think I'm leaning more towards the > 'traditional' storage server with 12 disks - as you say they give a lot > more flexibility with the performance tuning/network options etc. > > The cache pool is an intere