> On 02 Sep 2015, at 17:50, Robert LeBlanc <rob...@leblancnet.us> wrote:
> 
> Thanks for the responses.
> 
> I forgot to include the fio test for completeness:
> 
> 8 job QD=8
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=8
> numjobs=8
> thread
> group_reporting
> time_based
> direct=1
> 
> 
> 1 job QD=1
> [ext4-test]
> runtime=150
> name=ext4-test
> readwrite=randrw
> size=15G
> blocksize=4k
> ioengine=sync
> iodepth=1
> numjobs=1
> thread
> group_reporting
> time_based
> direct=1
> 
> I have not disabled all of the power management, I've only prevented the CPU 
> from going to an idle state below C1. I'll have to check on Jan's suggestion 
> of swapping out the intel_idle driver to see what difference it makes. I did 
> not run powertop as I did the testing because it (or cpupower monitor) 
> impacted performance and would have thrown off the results. I'll do some runs 
> with lower clocks and make sure that it is staying at the lower speeds. Here 
> is some additional output:

AFAIK TurboBoost doesn't kick in unless some cores are in C2; someone should go 
and take a look at the specs :-)
> 
> # cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor              
> userspace
> # cpupower monitor
>     |Nehalem                    || Mperf              || Idle_Stats         
> CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-A | C6-A 
>    0|  0.00| 94.19|  0.00|  0.00||  5.70| 94.30|  1299||  0.00|  0.00| 94.32
>    1|  0.00| 99.39|  0.00|  0.00||  0.53| 99.47|  1298||  0.00|  0.00| 99.48
>    2|  0.00| 99.60|  0.00|  0.00||  0.38| 99.62|  1299||  0.00|  0.00| 99.61
>    3|  0.00| 99.63|  0.00|  0.00||  0.36| 99.64|  1299||  0.00|  0.00| 99.64
>    4|  0.00| 99.84|  0.00|  0.00||  0.11| 99.89|  1301||  0.00|  0.00| 99.97
>    5|  0.00| 99.57|  0.00|  0.00||  0.40| 99.60|  1299||  0.00|  0.00| 99.61
>    6|  0.00| 99.72|  0.00|  0.00||  0.27| 99.73|  1299||  0.00|  0.00| 99.73
>    7|  0.00| 99.98|  0.00|  0.00||  0.01| 99.99|  1321||  0.00|  0.00| 99.99
> # cat /sys/devices/system/cpu/cpuidle/current_driver 
> intel_idle
> 
> I then echo "1" into /dev/cpu_dma_latency. We can see that the idle time 
> moves from C6 to C1
> 
This should not work with a plain echo. You need to keep the file descriptor 
open after writing the value, because the request is dropped as soon as the 
descriptor is closed; it's not a sysfs/proc-type tunable.
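
For reference, here is a minimal sketch of holding the request open from 
Python for the duration of a test run. The helper name hold_cpu_dma_latency 
and its path parameter are my own (not from any library), it assumes the 
kernel accepts a native 32-bit integer written to the device, and it needs 
root on a real system:

```python
import os
import struct
from contextlib import contextmanager


@contextmanager
def hold_cpu_dma_latency(max_latency_us, path="/dev/cpu_dma_latency"):
    """Keep a wakeup-latency request active for as long as the block runs.

    Hypothetical helper: the name and the `path` parameter are illustrative.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # Write the limit (in microseconds) as a native 32-bit signed integer.
        os.write(fd, struct.pack("i", max_latency_us))
        yield fd  # the request only holds while this descriptor stays open
    finally:
        os.close(fd)  # closing the descriptor drops the request


# Intended usage (run_benchmark is a placeholder for the actual fio run):
#
#     with hold_cpu_dma_latency(1):
#         run_benchmark()
```

The point of the context manager is just that the descriptor cannot be 
closed by accident before the benchmark has finished.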

> # cpupower monitor
>     |Nehalem                    || Mperf              || Idle_Stats         
> CPU | C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1-A | C6-A 
>    0|  0.00|  0.00|  0.00|  0.00||  0.37| 99.63|  1299||  0.00| 99.63|  0.00
>    1|  0.00|  0.00|  0.00|  0.00||  0.16| 99.84|  1299||  0.00| 99.84|  0.00
>    2|  0.00|  0.00|  0.00|  0.00||  0.47| 99.53|  1299||  0.00| 99.53|  0.00
>    3|  0.00|  0.00|  0.00|  0.00||  0.43| 99.57|  1299||  0.00| 99.57|  0.00
>    4|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91|  1300||  0.00| 99.91|  0.00
>    5|  0.00|  0.00|  0.00|  0.00||  0.06| 99.94|  1298||  0.00| 99.94|  0.00
>    6|  0.00|  0.00|  0.00|  0.00||  0.09| 99.91|  1300||  0.00| 99.91|  0.00
>    7|  0.00|  0.00|  0.00|  0.00||  0.28| 99.72|  1299||  0.00| 99.72|  0.00
> # cat /sys/devices/system/cpu/cpu0/cpuidle/state*/latency
> 0
> 2
> 15
> # cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_{min,max,cur}_freq 
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 1200000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 2401000
> 1200000
> 1200000
> 1200000
> 1600000
> 1200000
> 1200000
> 1200000
> 1200000
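
The listing above is easy to misread: the shell expands the braces before the 
glob, so it prints all the min values, then all the max values, then all the 
cur values. If I'm reading the ordering right, only cpu0 has its max pinned 
at 1200000 and one core is currently at 1600000. A small sketch that prints 
the same files grouped per CPU instead (the function names and the root 
parameter are my own; only the sysfs file names come from the command above):

```python
import glob
import os


def read_scaling_freqs(root="/sys/devices/system/cpu"):
    """Return {cpu_number: {"min": kHz, "max": kHz, "cur": kHz}}."""
    freqs = {}
    # CPUs without a cpufreq directory are simply skipped.
    for cpufreq_dir in glob.glob(os.path.join(root, "cpu[0-9]*", "cpufreq")):
        cpu_name = os.path.basename(os.path.dirname(cpufreq_dir))  # e.g. "cpu3"
        entry = {}
        for key in ("min", "max", "cur"):
            with open(os.path.join(cpufreq_dir, "scaling_%s_freq" % key)) as f:
                entry[key] = int(f.read().strip())
        freqs[int(cpu_name[3:])] = entry
    return freqs


def format_scaling_freqs(freqs):
    """One line per CPU, in CPU order."""
    lines = []
    for cpu in sorted(freqs):
        e = freqs[cpu]
        lines.append("cpu%d min=%d max=%d cur=%d kHz"
                     % (cpu, e["min"], e["max"], e["cur"]))
    return "\n".join(lines)
```

Calling print(format_scaling_freqs(read_scaling_freqs())) before and after a 
run would show whether every core really stayed at the requested speed.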
> 
> Thanks for taking the time to collaborate with me on this.
> 
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> On Wed, Sep 2, 2015 at 3:21 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> I think this may be related to what I had to do, it rings a bell at least.
> 
> http://unix.stackexchange.com/questions/153693/cant-use-userspace-cpufreq-governor-and-set-cpu-frequency
> 
> The P-state driver (intel_pstate) doesn't support the userspace governor, so 
> you need to disable it and make Linux use the old ACPI driver (acpi-cpufreq) 
> instead.
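
A quick way to see which case applies on a given box, as a sketch only: the 
helper name cpufreq_status and its root parameter are hypothetical, while 
scaling_driver, scaling_governor and scaling_available_governors are the 
usual cpufreq sysfs attributes. What they contain depends on the kernel and 
the hardware.

```python
import os


def cpufreq_status(cpu=0, root="/sys/devices/system/cpu"):
    """Report the cpufreq driver, the active governor, and whether the
    userspace governor is offered for the given CPU."""
    base = os.path.join(root, "cpu%d" % cpu, "cpufreq")

    def read(attr):
        with open(os.path.join(base, attr)) as f:
            return f.read().strip()

    governors = read("scaling_available_governors").split()
    return {
        "driver": read("scaling_driver"),
        "governor": read("scaling_governor"),
        "userspace_available": "userspace" in governors,
    }
```

If userspace_available comes back False, setting a fixed frequency through 
the userspace governor is not going to behave as expected on that machine.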
> 
> > -----Original Message-----
> > From: Nick Fisk [mailto:n...@fisk.me.uk]
> > Sent: 01 September 2015 22:21
> > To: 'Robert LeBlanc' <rob...@leblancnet.us>
> > Cc: ceph-users@lists.ceph.com
> > Subject: RE: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> >
> > > -----Original Message-----
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > > Of Robert LeBlanc
> > > Sent: 01 September 2015 21:48
> > > To: Nick Fisk <n...@fisk.me.uk>
> > > Cc: ceph-users@lists.ceph.com
> > > Subject: Re: [ceph-users] Ceph SSD CPU Frequency Benchmarks
> > >
> > > Nick,
> > >
> > > I've been trying to replicate your results without success. Can you
> > > help me understand what I'm doing that is not the same as your test?
> > >
> > > My setup is two boxes, one is a client and the other is a server. The
> > > server has an Intel(R) Atom(TM) CPU C2750 @ 2.40GHz, 32 GB RAM and 2
> > > Intel S3500 240 GB SSD drives. The boxes have Infiniband FDR cards
> > > connected to a QDR switch using IPoIB. I set up OSDs on the 2 SSDs and
> > > set pool size=1. I mapped a 200GB RBD using the kernel module and ran
> > > fio on the RBD. I adjusted the number of cores, clock speed and C-states
> > > of the server and here are my results:
> > >
> > > Adjusted core number and set the processor to a set frequency using
> > > the userspace governor.
> > >
> > > 8 jobs 8 depth   Cores
> > >                   1    2     3     4     5     6     7     8
> > > Frequency  2.4  387  762  1121  1432  1657  1900  2092  2260
> > > GHz        2    386  758  1126  1428  1657  1890  2090  2232
> > >            1.6  382  756  1127  1428  1656  1894  2083  2201
> > >            1.2  385  756  1125  1431  1656  1885  2093  2244
> > >
> >
> > I tested at QD=1 as this tends to highlight the difference in clock speed,
> > whereas a higher queue depth will probably scale with both frequency and
> > cores. I'm not sure this is your problem, but to make sure your environment
> > is doing what you want I would suggest QD=1 and 1 job to start with.
> >
> > But thank you for sharing these results regardless of your current frequency
> > scaling issues. Information like this is really useful for people trying to
> > decide on hardware purchases. Those Atom boards look like they could support
> > 12x normal HDDs quite happily, assuming 80 IOPS x 12.
> >
> > I wonder if we can get enough data from various people to generate an
> > IOPS/CPU frequency comparison for various CPU architectures?
> >
> >
> > > I then adjusted the processor to not go into a deeper sleep state than
> > > C1 and also tested setting the highest CPU frequency with the ondemand
> > > governor.
> > >
> > > 1 job 1 depth
> > > Cores  1
> > >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > > Frequency 2.4  381               381                379                 381
> > > GHz       2    382               380                381                 381
> > >           1.6  380               381                379                 382
> > >           1.2  383               378                379                 383
> > > Cores  8
> > >                <=C1, freq range  C0-C6, freq range  C0-C6, static freq  <=C1, static freq
> > > Frequency 2.4  629               580                584                 629
> > > GHz       2    630               579                584                 634
> > >           1.6  630               579                584                 634
> > >           1.2  632               581                582                 634
> > >
> > > Here I'm seeing a correlation with the number of cores and C-states, but
> > > not with frequency.
> > >
> > > Frequency was controlled with:
> > > cpupower frequency-set -d 1.2GHz -u 1.2GHz -g userspace
> > > and
> > > cpupower frequency-set -d 1.2GHz -u 2.0GHz -g ondemand
> > >
> > > Core count adjusted by:
> > > for i in {1..7}; do echo 0 > /sys/devices/system/cpu/cpu$i/online; done
> > >
> > > C-states controlled by:
> > > # python
> > > Python 2.7.5 (default, Jun 24 2015, 00:41:19)
> > > [GCC 4.8.3 20140911 (Red Hat 4.8.3-9)] on linux2
> > > Type "help", "copyright", "credits" or "license" for more information.
> > > >>> fd = open('/dev/cpu_dma_latency','wb')
> > > >>> fd.write('1')
> > > >>> fd.flush()
> > > >>> fd.close() # Don't run this until the tests are completed (the handle has to stay open).
> > > >>>
> > >
> > > I'd like to replicate your results. I'd also like it if you could verify
> > > some of mine in your set-up around C-states and cores.
> >
> > I can't remember exactly, but I think I had to do something to get the
> > userspace governor to behave as I expected it to. I tend to recall setting
> > the frequency low and yet still seeing it bursting up to max. I will have a look
> > through my notes tomorrow and see if I can recall anything. One thing I do
> > remember though is that the Intel powertop utility was very useful in
> > confirming what the actual CPU frequency was. It might be worth installing
> > and running this and seeing what the CPU cores are doing.
> >
> >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > ----------------
> > > Robert LeBlanc
> > > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> > >
> > > On Sat, Jun 13, 2015 at 8:58 AM, Nick Fisk <n...@fisk.me.uk> wrote:
> > > Hi All,
> > >
> > > I know there have been lots of discussions around needing fast CPUs to
> > > get the most out of SSDs. However, I have never really seen any
> > > solid numbers to make a comparison about how much difference a faster
> > > CPU makes and whether Ceph scales linearly with clock speed. So I did a
> > > little experiment today.
> > >
> > > I set up a 1 OSD Ceph instance on a desktop PC. The desktop has an i5
> > > Sandy Bridge CPU with the CPU turbo overclocked to 4.3GHz. By using
> > > the userspace governor in Linux, I was able to set static clock speeds
> > > to see the possible performance effects on Ceph. My PC only has an old
> > > X25M-G2 SSD, so I had to limit the IO testing to 4KB QD=1, as
> > > otherwise the SSD ran out of puff when I got to the higher clock
> > > speeds.
> > >
> > > CPU MHz  4Kb Write IO  Min Latency (us)  Avg Latency (us)  CPU usr  CPU sys
> > > 1600     797           886               1250              10.14    2.35
> > > 2000     815           746               1222              8.45     1.82
> > > 2400     1161          630               857               9.5      1.6
> > > 2800     1227          549               812               8.74     1.24
> > > 3300     1320          482               755               7.87     1.08
> > > 4300     1548          437               644               7.72     0.9
> > >
> > > The figures show a fairly linear trend right through the clock range
> > > and clearly show the importance of having fast CPUs (GHz, not cores)
> > > if you want to achieve high IO, especially at low queue depths.
> > >
> > >
> > > Things to Note
> > > - These figures are from a desktop CPU; no doubt Xeons will be slightly
> > >   faster at the same clock speed
> > > - I'm assuming that using the userspace governor in this way is a
> > >   realistic way to simulate different CPU clock speeds?
> > > - My old SSD is probably skewing the figures slightly
> > > - I have complete control over the turbo settings and big cooling; many
> > >   server CPUs will limit the max turbo if multiple cores are under load
> > >   or get too hot
> > > - Ceph SSD OSD nodes are probably best with high-end E3 CPUs as they
> > >   have the highest clock speeds
> > > - HDDs with journals will probably benefit slightly from higher clock
> > >   speeds, if the disk isn't the bottleneck (i.e. small block sequential
> > >   writes)
> > > - These numbers are for Replica=1; at 2 or 3 these numbers will be at
> > >   least half, I would imagine
> > >
> > >
> > > I hope someone finds this useful
> > >
> > > Nick
> > >
> > >
> > >
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
