Re: [ceph-users] Poor ceph cluster performance

2018-11-28 Thread Paul Emmerich
Cody :
> > And this exact problem was one of the reasons why we migrated
> > everything to PXE boot where the OS runs from RAM.
>
> Hi Paul,
>
> I totally agree with and admire your diskless approach. If I may ask,
> what kind of OS image do you use? 1GB footprint sounds really small.

It's based

Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Paul Emmerich
And this exact problem was one of the reasons why we migrated everything to PXE boot where the OS runs from RAM. That kind of failure is just the worst to debug... Also, 1 GB of RAM is cheaper than a separate OS disk. -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https:

Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Cody
Hi everyone, Many, many thanks to all of you! The root cause was a failed OS drive on one storage node. The server responded to ping, but I was unable to log in. After a reboot via IPMI, the docker daemon failed to start due to I/O errors, and dmesg complained about the failing OS disk. I failed

Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Vitaliy Filippov
CPU: 2 x E5-2603 @1.8GHz
RAM: 16GB
Network: 1G port shared for Ceph public and cluster traffic
Journaling device: 1 x 120GB SSD (SATA3, consumer grade)
OSD device: 2 x 2TB 7200rpm spindle (SATA3, consumer grade)

0.84 MB/s sequential write is impossibly bad; it's not normal with any kind of de
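For scale, the reported 0.84 MB/s can be set against the raw line rate of the shared 1G port. This is only a back-of-the-envelope sketch: it uses decimal units and ignores protocol overhead and replication traffic, and the variable names are illustrative.

```shell
#!/bin/sh
# Raw line rate of a 1 Gbit/s port in MB/s (decimal units, no protocol overhead).
link_mb_per_s=$(( 1000000000 / 8 / 1000000 ))

# Sequential write throughput reported in the thread.
observed_mb_per_s=0.84

# How many times below the link ceiling the observed figure sits.
ratio=$(awk -v link="$link_mb_per_s" -v seen="$observed_mb_per_s" \
    'BEGIN { printf "%.0f", link / seen }')

echo "link ceiling: ${link_mb_per_s} MB/s"
echo "observed:     ${observed_mb_per_s} MB/s (about ${ratio}x below the ceiling)"
```

Even with generous allowances for overhead, a gap of this size is hard to attribute to the 1G link alone.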

Re: [ceph-users] Poor ceph cluster performance

2018-11-27 Thread Darius Kasparavičius
Hi,

Most likely the issue is with your consumer-grade journal SSD. Run this against your SSD to check how it performs:

fio --filename= --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test

On Tue, Nov 27, 2018 at 2:06 AM Cody wro
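The fio command above leaves `--filename=` empty, so a target has to be supplied before it can run. A minimal sketch of a wrapper that fills it in and only prints the resulting command (a dry run) might look like the following. The function name `journal_test_cmd` and the device path `/dev/sdX` are hypothetical placeholders, not taken from the thread.

```shell
#!/bin/sh
# Build the journal-test fio command from the thread for a given target.
# WARNING: running the printed command against a raw device overwrites
# data on that device. /dev/sdX below is a hypothetical placeholder.
journal_test_cmd() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: journal_test_cmd <device>" >&2
        return 1
    fi
    # Same flags as in the thread: 4k synchronous direct writes, queue depth 1.
    printf 'fio --filename=%s --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test\n' "$target"
}

# Dry run: print the command instead of executing it.
journal_test_cmd /dev/sdX
```

Printing rather than executing keeps the sketch safe to try. If the target were a regular file instead of a device, fio would additionally need a `--size` option.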

Re: [ceph-users] Poor ceph cluster performance

2018-11-26 Thread Stefan Kooman
Quoting Cody (codeology@gmail.com):
> The Ceph OSD part of the cluster uses 3 identical servers with the
> following specifications:
>
> CPU: 2 x E5-2603 @1.8GHz
> RAM: 16GB
> Network: 1G port shared for Ceph public and cluster traffic

This will hamper throughput a lot.

> Journaling device

[ceph-users] Poor ceph cluster performance

2018-11-26 Thread Cody
Hello, I have a Ceph cluster deployed together with OpenStack using TripleO. While the Ceph cluster shows a healthy status, its performance is painfully slow. After ruling out network issues, I have zeroed in on the Ceph cluster itself, but have no experience in further debugging