My workaround for your single-threaded performance issue was to increase the thread count of the tgtd process (I added --nr_iothreads=128 as an argument to tgtd). This does help my workload.
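For reference, the change is just an extra flag on the daemon. A sketch of how I'd apply it; the unit name and binary path below are assumptions that vary by distro, and 128 is simply the value from my setup:

```shell
# One-off, run as root (flag taken verbatim from the post above):
tgtd --nr_iothreads=128

# To persist it under systemd (unit name 'tgtd' is an assumption; some
# distros call it 'tgt'), add a drop-in override:
#   systemctl edit tgtd
# and in the editor:
#   [Service]
#   ExecStart=
#   ExecStart=/usr/sbin/tgtd --nr_iothreads=128 -f
```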
FWIW, below are my rados bench numbers from my cluster with 1 thread.

This first one is a "cold" run. This is a test pool and it's not in use; this is the first time I've written to it in a week (but I have written to it before).

Total time run:         60.049311
Total writes made:      1196
Write size:             4194304
Bandwidth (MB/sec):     79.668
Stddev Bandwidth:       80.3998
Max bandwidth (MB/sec): 208
Min bandwidth (MB/sec): 0
Average Latency:        0.0502066
Stddev Latency:         0.47209
Max latency:            12.9035
Min latency:            0.013051

This next one is the 6th run. I honestly don't understand why there is such a huge performance difference.

Total time run:         60.042933
Total writes made:      2980
Write size:             4194304
Bandwidth (MB/sec):     198.525
Stddev Bandwidth:       32.129
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 0
Average Latency:        0.0201471
Stddev Latency:         0.0126896
Max latency:            0.265931
Min latency:            0.013211

75 OSDs, all 2TB SAS spinners, spread over 9 OSD servers, each with a 2GB BBU RAID cache. I have tuned my CPU c-states and frequency to max; I have 8x 2.5GHz cores, so just about one core per OSD. I have 40G networking. I don't use journals, but I have the RAID cache enabled.

Nick,

What NFS server are you using?

Jake

On Thursday, July 21, 2016, Nick Fisk <n...@fisk.me.uk> wrote:
> I've had a lot of pain with this; smaller block sizes are even worse. You
> want to try and minimize latency at every point, as there
> is no buffering happening in the iSCSI stack. This means:
>
> 1. Fast journals (NVMe or NVRAM)
> 2. 10GbE or better networking
> 3. Fast CPUs (GHz)
> 4. Fix CPU c-states to C1
> 5. Fix CPU frequency to max
>
> Also, I can't be sure, but I think there is a metadata update happening
> with VMFS, particularly if you are using thin VMDKs; this
> can also be a major bottleneck. For my use case, I've switched over to NFS,
> as it has given much more performance at scale and less
> headache.
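As for the cold/warm gap in my runs: with -t 1 there is exactly one 4 MB write in flight, so throughput is purely latency-bound, and the reported bandwidths fall straight out of the average latencies. A back-of-envelope check (my own arithmetic, not from the bench tool):

```shell
# Single-threaded rados bench: one 4 MB write outstanding, so
#   bandwidth (MB/s) = block size (MB) / average latency (s)
cold=$(awk 'BEGIN { printf "%.1f", 4 / 0.0502066 }')   # cold-run avg latency
warm=$(awk 'BEGIN { printf "%.1f", 4 / 0.0201471 }')   # 6th-run avg latency
echo "cold: $cold MB/s, warm: $warm MB/s"
# prints: cold: 79.7 MB/s, warm: 198.5 MB/s
```

Those match the reported 79.668 and 198.525 MB/s. The cold run's 12.9 s max latency (versus 0.27 s warm) is what drags its average up; with one outstanding op, every stall shows up directly in throughput.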
>
> For the RADOS run, here you go (400GB P3700):
>
> Total time run:         60.026491
> Total writes made:      3104
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     206.842
> Stddev Bandwidth:       8.10412
> Max bandwidth (MB/sec): 224
> Min bandwidth (MB/sec): 180
> Average IOPS:           51
> Stddev IOPS:            2
> Max IOPS:               56
> Min IOPS:               45
> Average Latency(s):     0.0193366
> Stddev Latency(s):      0.00148039
> Max latency(s):         0.0377946
> Min latency(s):         0.015909
>
> Nick
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Horace
> > Sent: 21 July 2016 10:26
> > To: w...@globe.de
> > Cc: ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Ceph + VMware + Single Thread Performance
> >
> > Hi,
> >
> > Same here. I've read some blogs saying that VMware will frequently verify
> > the locking on VMFS over iSCSI, hence it will have much
> > slower performance than NFS (with its different locking mechanism).
> >
> > Regards,
> > Horace Ng
> >
> > ----- Original Message -----
> > From: w...@globe.de
> > To: ceph-users@lists.ceph.com
> > Sent: Thursday, July 21, 2016 5:11:21 PM
> > Subject: [ceph-users] Ceph + VMware + Single Thread Performance
> >
> > Hi everyone,
> >
> > we see relatively slow single-thread performance on the iSCSI nodes of our cluster.
> >
> > Our setup:
> >
> > 3 racks:
> >
> > 18x data nodes, 3 mon nodes, 3 iSCSI gateway nodes with tgt (rbd cache off).
> >
> > 2x Samsung SM863 enterprise SSDs for journals (3 OSDs per SSD) and 6x WD
> > Red 1TB per data node as OSDs.
> >
> > Replication = 3
> >
> > chooseleaf = 3 type rack in the crush map
> >
> > We get only ca. 90 MByte/s on the iSCSI gateway servers with:
> >
> > rados bench -p rbd 60 write -b 4M -t 1
> >
> > If we test with:
> >
> > rados bench -p rbd 60 write -b 4M -t 32
> >
> > we get ca. 600-700 MByte/s.
> >
> > We plan to replace the Samsung SSDs with Intel DC P3700 PCIe NVMe for
> > the journal to get better single-thread performance.
> >
> > Is anyone out there who has an Intel P3700 for the journal and can
> > give me test results for:
> >
> > rados bench -p rbd 60 write -b 4M -t 1
> >
> > Thank you very much!!
> >
> > Kind regards!!
> >
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
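For anyone wanting the concrete commands behind points 4 and 5 in Nick's list (pin c-states, pin frequency), this is roughly what I'd run. A sketch only: package names and boot-loader steps vary by distro, and the intel_idle driver needs the kernel parameter rather than runtime tuning:

```shell
# Pin the CPU frequency governor to performance on all cores
# (cpupower ships in linux-tools / kernel-tools on most distros):
cpupower frequency-set -g performance

# Limit C-states to C1 at boot: append to the kernel command line,
# e.g. GRUB_CMDLINE_LINUX in /etc/default/grub:
#   intel_idle.max_cstate=1 processor.max_cstate=1
# then regenerate the grub config (update-grub or grub2-mkconfig) and reboot.
```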
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com