Any other options or ideas?

Thanks,
Dinu
On Oct 31, 2013, at 6:35 PM, Dinu Vlad <dinuvla...@gmail.com> wrote:
>
> I tested the osd performance from a single node. For this purpose I deployed
> a new cluster (using ceph-deploy, as before) on fresh/repartitioned drives.
> I created a single pool with 1800 pgs and ran rados bench both on the osd
> server and on a remote one. The cluster configuration stayed "default", with
> the same xfs mount & mkfs.xfs additions as before.
>
> With a single host, the pgs were "stuck unclean" (active only, not
> active+clean):
>
> # ceph -s
>   cluster ffd16afa-6348-4877-b6bc-d7f9d82a4062
>    health HEALTH_WARN 1800 pgs stuck unclean
>    monmap e1: 3 mons at {cephmon1=10.4.0.250:6789/0,cephmon2=10.4.0.251:6789/0,cephmon3=10.4.0.252:6789/0}, election epoch 4, quorum 0,1,2 cephmon1,cephmon2,cephmon3
>    osdmap e101: 18 osds: 18 up, 18 in
>     pgmap v1055: 1800 pgs: 1800 active; 0 bytes data, 732 MB used, 16758 GB / 16759 GB avail
>    mdsmap e1: 0/0/1 up
>
> Test results:
> Local test, 1 process, 16 threads: 241.7 MB/s
> Local test, 8 processes, 128 threads: 374.8 MB/s
> Remote test, 1 process, 16 threads: 231.8 MB/s
> Remote test, 8 processes, 128 threads: 366.1 MB/s
>
> Maybe it's just me, but this seems on the low side too.
>
> Thanks,
> Dinu
>
>
> On Oct 30, 2013, at 8:59 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
>
>> On 10/30/2013 01:51 PM, Dinu Vlad wrote:
>>> Mark,
>>>
>>> The SSDs are
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/ssd/enterprise-sata-ssd/?sku=ST240FN0021
>>> and the HDDs are
>>> http://www.seagate.com/internal-hard-drives/enterprise-hard-drives/hdd/constellation/?sku=ST91000640SS.
>>>
>>> The chassis is a "SiliconMechanics C602" - but I don't have the exact model.
>>> It's based on Supermicro, has 24 slots in front and 2 in the back, and a
>>> SAS expander.
>>>
>>> I did a fio test (raw partitions, 4M blocksize, io queue maxed out
>>> according to what the driver reports in dmesg). Here are the results
>>> (filtered):
>>>
>>> Sequential:
>>> Run status group 0 (all jobs):
>>>   WRITE: io=176952MB, aggrb=2879.0MB/s, minb=106306KB/s, maxb=191165KB/s, mint=60444msec, maxt=61463msec
>>>
>>> Individually, the HDDs had best:worst 103:109 MB/s while the SSDs gave
>>> 153:189 MB/s.
>>
>> Ok, that looks like what I'd expect to see given the controller being used.
>> The SSDs are probably limited by total aggregate throughput.
>>
>>> Random:
>>> Run status group 0 (all jobs):
>>>   WRITE: io=106868MB, aggrb=1727.2MB/s, minb=67674KB/s, maxb=106493KB/s, mint=60404msec, maxt=61875msec
>>>
>>> Individually (best:worst): HDD 71:73 MB/s, SSD 68:101 MB/s (with only one
>>> out of 6 doing 101).
>>>
>>> This is on just one of the osd servers.
>>
>> Were the ceph tests to one OSD server or across all servers?  It might be
>> worth trying tests against a single server with no replication, using
>> multiple rados bench instances, and just seeing what happens.
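
For reference, the multi-instance test suggested above can be run by starting several rados bench processes in parallel against the same pool and adding up the reported bandwidths. A minimal sketch follows; the pool name, the 8 x 16 process/thread split and the 60-second runtime are placeholders based on the numbers already used in this thread:

  # start 8 parallel writers with 16 concurrent ops each against one test pool;
  # each rados bench instance names and removes its own benchmark objects
  for i in $(seq 1 8); do
      rados -p bench1 bench 60 write -t 16 &
  done
  wait

Summing the "Bandwidth (MB/sec)" lines from the individual runs gives the aggregate figure for the host.
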
>>
>>>
>>> Thanks,
>>> Dinu
>>>
>>>
>>> On Oct 30, 2013, at 6:38 PM, Mark Nelson <mark.nel...@inktank.com> wrote:
>>>
>>>> On 10/30/2013 09:05 AM, Dinu Vlad wrote:
>>>>> Hello,
>>>>>
>>>>> I've been doing some tests on a newly installed ceph cluster:
>>>>>
>>>>> # ceph osd pool create bench1 2048 2048
>>>>> # ceph osd pool create bench2 2048 2048
>>>>> # rbd -p bench1 create test
>>>>> # rbd -p bench1 bench-write test --io-pattern rand
>>>>> elapsed: 483  ops: 396579  ops/sec: 820.23  bytes/sec: 2220781.36
>>>>>
>>>>> # rados -p bench2 bench 300 write --show-time
>>>>> # (run 1)
>>>>> Total writes made:      20665
>>>>> Write size:             4194304
>>>>> Bandwidth (MB/sec):     274.923
>>>>>
>>>>> Stddev Bandwidth:       96.3316
>>>>> Max bandwidth (MB/sec): 748
>>>>> Min bandwidth (MB/sec): 0
>>>>> Average Latency:        0.23273
>>>>> Stddev Latency:         0.262043
>>>>> Max latency:            1.69475
>>>>> Min latency:            0.057293
>>>>>
>>>>> These results seem quite poor for the configuration:
>>>>>
>>>>> MON: dual-cpu Xeon E5-2407 2.2 GHz, 48 GB RAM, 2x SSD for OS
>>>>> OSD: dual-cpu Xeon E5-2620 2.0 GHz, 64 GB RAM, 2x SSD for OS (on-board
>>>>> controller), 18x 1 TB 7.2K rpm SAS HDDs for OSD drives and 6x SATA SSDs
>>>>> for journals, attached to an LSI 9207-8i controller.
>>>>> All servers have dual 10GE network cards, connected to a pair of
>>>>> dedicated switches. Each SSD has 3x 10 GB partitions for journals.
>>>>
>>>> Agreed, you should see much higher throughput with that kind of storage
>>>> setup.  What brand/model SSDs are these?  Also, what brand and model of
>>>> chassis?  With 24 drives and 8 SSDs I could push 2GB/s (no replication
>>>> though) with a couple of concurrent rados bench processes going on our
>>>> SC847A chassis, so ~550MB/s aggregate throughput for 18 drives and 6 SSDs
>>>> is definitely on the low side.
>>>>
>>>> I'm actually not too familiar with what the RBD benchmarking commands are
>>>> doing behind the scenes.  Typically I've tested fio on top of a filesystem
>>>> on RBD.
>>>>
>>>>> Using ubuntu 13.04, ceph 0.67.4, XFS for backend storage. The cluster was
>>>>> installed using ceph-deploy; ceph.conf is pretty much out of the box
>>>>> (diff from default follows):
>>>>>
>>>>> osd_journal_size = 10240
>>>>> osd mount options xfs = "rw,noatime,nobarrier,inode64"
>>>>> osd mkfs options xfs = "-f -i size=2048"
>>>>>
>>>>> [osd]
>>>>> public network = 10.4.0.0/24
>>>>> cluster network = 10.254.254.0/24
>>>>>
>>>>> All tests were run from a server outside the cluster, connected to the
>>>>> storage network with 2x 10 GE nics.
>>>>>
>>>>> I've done a few other tests of the individual components:
>>>>> - network: avg. 7.6 Gbit/s (iperf, mtu=1500), 9.6 Gbit/s (mtu=9000)
>>>>> - md raid0 write across all 18 HDDs: 1.4 GB/s sustained throughput
>>>>> - fio SSD write (xfs, 4k blocks, directio): ~250 MB/s, ~55K IOPS
>>>>
>>>> What you might want to try doing is 4M direct IO writes using libaio and a
>>>> high iodepth to all drives (spinning disks and SSDs) concurrently and see
>>>> how both the per-drive and aggregate throughput look.
>>>>
>>>> With just SSDs, I've been able to push the 9207-8i up to around 3GB/s with
>>>> Ceph writes (1.5GB/s if you don't count journal writes), but perhaps there
>>>> is something interesting about the way the hardware is set up on your
>>>> system.
>>>>
>>>>> I'd appreciate any suggestion that might help improve the performance or
>>>>> identify a bottleneck.
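
To illustrate the suggestion above, here is a minimal fio sketch for concurrent 4M libaio direct writes at a high queue depth, one job per raw device. The device names, the iodepth of 64 and the 60-second runtime are assumptions to be adjusted to the actual drive list on the OSD host; note that this destroys any data on the devices listed:

  # 4M sequential direct writes with libaio, one job per drive; add one
  # --name/--filename pair per additional HDD or SSD under test
  fio --ioengine=libaio --direct=1 --rw=write --bs=4M --iodepth=64 \
      --runtime=60 --time_based=1 \
      --name=sdb --filename=/dev/sdb \
      --name=sdc --filename=/dev/sdc

Leaving group_reporting off keeps one result block per drive, while the "Run status group" summary at the end still shows the aggregate bandwidth.
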
>>>>>
>>>>> Thanks
>>>>> Dinu
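
One side note on the single-host run quoted above: with the default CRUSH rule, replicas are placed on different hosts, so a one-host cluster can report pgs as active but never active+clean, which matches the HEALTH_WARN output shown. For benchmarking only, two possible workarounds are sketched below; the pool name is a placeholder reused from earlier in the thread, and neither setting is intended for production:

  # option 1: for a cluster that has not been deployed yet, let CRUSH place
  # replicas on different OSDs of the same host (ceph.conf, [global] section)
  osd crush chooseleaf type = 0

  # option 2: for an existing throwaway test pool, disable replication
  ceph osd pool set bench1 size 1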