Sorry about the repost from the cbt list, but it was suggested I post here as 
well:

I am attempting to track down some performance issues in a recently deployed 
Ceph cluster.  Our configuration is as follows:
        3 storage nodes, each with:
                - 8 Cores
                - 64GB of RAM
                - 2x 1TB 7200 RPM spinning disks
                - 1x 120GB Intel SSD
                - 2x 10Gbit NICs (in an LACP port-channel)

The OSD pool min_size is set to “1” and “size” is set to “3”.  When creating a 
new pool and running RADOS benchmarks, performance isn’t bad; it’s about what I 
would expect from this hardware configuration:

WRITES:
Total writes made:      207
Write size:             4194304
Bandwidth (MB/sec):     80.017 

Stddev Bandwidth:       34.9212
Max bandwidth (MB/sec): 120
Min bandwidth (MB/sec): 0
Average Latency:        0.797667
Stddev Latency:         0.313188
Max latency:            1.72237
Min latency:            0.253286

RAND READS:
Total time run:        10.127990
Total reads made:     1263
Read size:            4194304
Bandwidth (MB/sec):    498.816 

Average Latency:       0.127821
Max latency:           0.464181
Min latency:           0.0220425
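
For reference, the numbers above came from invocations along these lines 
(“bench-pool” is a placeholder name; 4194304 bytes, i.e. 4MB, is the rados 
bench default object size):

        # ~10-second 4MB write test, keeping the objects for the read test
        rados bench -p bench-pool 10 write --no-cleanup
        # ~10-second 4MB random read test against the objects left behind
        rados bench -p bench-pool 10 rand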

This all looks fine, until we try to use the cluster for its intended purpose, 
which is to house images for qemu-kvm, accessed using librbd.  I/O inside the 
VMs shows excessive wait times (in the hundreds of ms at times, making some 
operating systems, like Windows, unusable) and throughput struggles to exceed 
10MB/s.  Looking at ceph status, we see very low op/s and throughput numbers, 
and the number of blocked requests seems very high.  Any ideas as to what to 
look at here?

     health HEALTH_WARN
            8 requests are blocked > 32 sec
     monmap e3: 3 mons at 
{storage-1=10.0.0.1:6789/0,storage-2=10.0.0.2:6789/0,storage-3=10.0.0.3:6789/0}
            election epoch 128, quorum 0,1,2 storage-1,storage-2,storage-3
     osdmap e69615: 6 osds: 6 up, 6 in
      pgmap v3148541: 224 pgs, 1 pools, 819 GB data, 227 kobjects
            2726 GB used, 2844 GB / 5571 GB avail
                 224 active+clean
  client io 3957 B/s rd, 3494 kB/s wr, 30 op/s
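
In case it helps, the blocked requests and per-OSD latency can be inspected 
with something like the following (osd.0 is just an example id):

        # show which OSDs the slow requests are stuck on
        ceph health detail
        # per-OSD commit/apply latency
        ceph osd perf
        # dump recent slow ops on a given OSD (run on the host carrying it)
        ceph daemon osd.0 dump_historic_ops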

Of note, on the other list, I was asked to provide the following:
        - ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
        - The SSD is split into 8GB partitions, which are used as journal 
devices and specified in /etc/ceph/ceph.conf.  For example:
                [osd.0]
                host = storage-1
                osd journal = /dev/mapper/INTEL_SSDSC2BB120G4_CVWL4363006R120LGNp1
        - rbd_cache is enabled and qemu cache is set to “writeback” (see the 
sketch after this list)
        - rbd_concurrent_management_ops is unset, so it appears the default is 
“10”
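
For completeness, a minimal sketch of how the caching is wired up on the 
client side.  The [client] stanza below shows illustrative defaults, and the 
libvirt disk line is an assumption about our qemu setup, not copied verbatim 
from our config:

        # /etc/ceph/ceph.conf on the hypervisor
        [client]
        rbd cache = true
        rbd cache writethrough until flush = true

        # libvirt disk definition selecting the qemu writeback cache mode
        <driver name='qemu' type='raw' cache='writeback'/>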

Thanks,

--
Kenneth Van Alstyne
Systems Architect
Knight Point Systems, LLC
Service-Disabled Veteran-Owned Business
1775 Wiehle Avenue Suite 101 | Reston, VA 20190
c: 228-547-8045 f: 571-266-3106
www.knightpoint.com 
DHS EAGLE II Prime Contractor: FC1 SDVOSB Track
GSA Schedule 70 SDVOSB: GS-35F-0646S
GSA MOBIS Schedule: GS-10F-0404Y
ISO 20000 / ISO 27001

