I'm not speaking to anything other than your configuration. "I am using 2 x 10Gb bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1Gb for public." It might not be a bad idea to forgo the public network on the 1Gb interface and either put everything on one network or use VLANs on the 10Gb connections. I lean toward that in particular because your public network doesn't have a bond on it. Just as a note, communication between the OSDs and the MONs is all done on the public network; if that interface goes down, the OSDs are likely to be marked down/out in your cluster. I'm a fan of VLANs, but if you don't have the equipment or expertise to go that route, then just using the same subnet for public and cluster is a decent way to go.
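If you go the single-network route, the ceph.conf side of it is small. A minimal sketch, assuming you reuse the 192.168.0.0/24 subnet that already lives on your bond (swap in whatever subnet you actually keep):

    [global]
    # All traffic -- client, MON, heartbeat, and replication -- rides the bond.
    # With cluster_network unset, Ceph uses the public network for everything.
    public_network = 192.168.0.0/24

Bear in mind the OSDs only rebind on restart, and moving the MONs onto a different public network also means updating the monmap, so that part takes more planning.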
On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <ste...@gmail.com> wrote:
> I did test with rados bench; here are the results.
>
> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p ssdpool 300 -t 12 seq
>
> Total time run:         300.322608
> Total writes made:      10632
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     141.608
> Stddev Bandwidth:       74.1065
> Max bandwidth (MB/sec): 264
> Min bandwidth (MB/sec): 0
> Average IOPS:           35
> Stddev IOPS:            18
> Max IOPS:               66
> Min IOPS:               0
> Average Latency(s):     0.33887
> Stddev Latency(s):      0.701947
> Max latency(s):         9.80161
> Min latency(s):         0.015171
>
> Total time run:       300.829945
> Total reads made:     10070
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   133.896
> Average IOPS:         33
> Stddev IOPS:          14
> Max IOPS:             68
> Min IOPS:             3
> Average Latency(s):   0.35791
> Max latency(s):       4.68213
> Min latency(s):       0.0107572
>
> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p scbench256 300 -t 12 seq
>
> Total time run:         300.747004
> Total writes made:      10239
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     136.181
> Stddev Bandwidth:       75.5
> Max bandwidth (MB/sec): 272
> Min bandwidth (MB/sec): 0
> Average IOPS:           34
> Stddev IOPS:            18
> Max IOPS:               68
> Min IOPS:               0
> Average Latency(s):     0.352339
> Stddev Latency(s):      0.72211
> Max latency(s):         9.62304
> Min latency(s):         0.00936316
> hints = 1
>
> Total time run:       300.610761
> Total reads made:     7628
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   101.5
> Average IOPS:         25
> Stddev IOPS:          11
> Max IOPS:             61
> Min IOPS:             0
> Average Latency(s):   0.472321
> Max latency(s):       15.636
> Min latency(s):       0.0188098
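The SSD and HDD pools landing within a few MB/sec of each other makes me suspect the wire more than the disks. It's cheap to rule the links out with iperf3, assuming it's installed on both ends (the addresses below are placeholders for your nodes' cluster and public IPs):

    # on osd01
    iperf3 -s
    # from another node, test both networks toward osd01
    iperf3 -c 192.168.0.x -t 30        # cluster (2 x 10Gb bond)
    iperf3 -c 10.10.30.x -t 30         # public (single 1Gb)
    iperf3 -c 192.168.0.x -t 30 -P 4   # LACP hashes per flow; parallel streams exercise both slaves
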
> On 22 January 2018 at 11:34, Steven Vacaroaia <ste...@gmail.com> wrote:
>> Sorry, sent the message too soon. Here is more info.
>>
>> Vendor Id    : SEAGATE
>> Product Id   : ST600MM0006
>> State        : Online
>> Disk Type    : SAS,Hard Disk Device
>> Capacity     : 558.375 GB
>> Power State  : Active
>>
>> (SSD is in slot 0)
>>
>> megacli -LDGetProp -Cache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>>
>> [root@osd01 ~]# megacli -LDGetProp -DskCache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>> CentOS 7, kernel 3.10.0-693.11.6.el7.x86_64
>>
>> sysctl -p
>> net.ipv4.tcp_sack = 0
>> net.core.netdev_budget = 600
>> net.ipv4.tcp_window_scaling = 1
>> net.core.rmem_max = 16777216
>> net.core.wmem_max = 16777216
>> net.core.rmem_default = 16777216
>> net.core.wmem_default = 16777216
>> net.core.optmem_max = 40960
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 65536 16777216
>> net.ipv4.tcp_syncookies = 0
>> net.core.somaxconn = 1024
>> net.core.netdev_max_backlog = 20000
>> net.ipv4.tcp_max_syn_backlog = 30000
>> net.ipv4.tcp_max_tw_buckets = 2000000
>> net.ipv4.tcp_tw_reuse = 1
>> net.ipv4.tcp_slow_start_after_idle = 0
>> net.ipv4.conf.all.send_redirects = 0
>> net.ipv4.conf.all.accept_redirects = 0
>> net.ipv4.conf.all.accept_source_route = 0
>> vm.min_free_kbytes = 262144
>> vm.swappiness = 0
>> vm.vfs_cache_pressure = 100
>> fs.suid_dumpable = 0
>> kernel.core_uses_pid = 1
>> kernel.msgmax = 65536
>> kernel.msgmnb = 65536
>> kernel.randomize_va_space = 1
>> kernel.sysrq = 0
>> kernel.pid_max = 4194304
>> fs.file-max = 100000
>>
>> ceph.conf:
>>
>> public_network = 10.10.30.0/24
>> cluster_network = 192.168.0.0/24
>>
>> osd_op_num_threads_per_shard = 2
>> osd_op_num_shards = 25
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 1  # Allow writing 1 copy in a degraded state
>> osd_pool_default_pg_num = 256
>> osd_pool_default_pgp_num = 256
>> osd_crush_chooseleaf_type = 1
>> osd_scrub_load_threshold = 0.01
>> osd_scrub_min_interval = 137438953472
>> osd_scrub_max_interval = 137438953472
>> osd_deep_scrub_interval = 137438953472
>> osd_max_scrubs = 16
>> osd_op_threads = 8
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
>> osd_recovery_op_priority = 1
>>
>> debug_lockdep = 0/0
>> debug_context = 0/0
>> debug_crush = 0/0
>> debug_buffer = 0/0
>> debug_timer = 0/0
>> debug_filer = 0/0
>> debug_objecter = 0/0
>> debug_rados = 0/0
>> debug_rbd = 0/0
>> debug_journaler = 0/0
>> debug_objectcacher = 0/0
>> debug_client = 0/0
>> debug_osd = 0/0
>> debug_optracker = 0/0
>> debug_objclass = 0/0
>> debug_filestore = 0/0
>> debug_journal = 0/0
>> debug_ms = 0/0
>> debug_monc = 0/0
>> debug_tp = 0/0
>> debug_auth = 0/0
>> debug_finisher = 0/0
>> debug_heartbeatmap = 0/0
>> debug_perfcounter = 0/0
>> debug_asok = 0/0
>> debug_throttle = 0/0
>> debug_mon = 0/0
>> debug_paxos = 0/0
>> debug_rgw = 0/0
>>
>> [mon]
>> mon_allow_pool_delete = true
>>
>> [osd]
>> osd_heartbeat_grace = 20
>> osd_heartbeat_interval = 5
>> bluestore_block_db_size = 16106127360
>> bluestore_block_wal_size = 1073741824
>>
>> [osd.6]
>> host = osd01
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
>> crush_location = root=ssds host=osd01-ssd
>>
>> [osd.7]
>> host = osd02
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
>> crush_location = root=ssds host=osd02-ssd
>>
>> [osd.8]
>> host = osd04
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
>> crush_location = root=ssds host=osd04-ssd
>>
>> On 22 January 2018 at 11:29, Steven Vacaroaia <ste...@gmail.com> wrote:
>>> Hi David,
>>>
>>> Yes, I meant no separate partitions for WAL and DB.
>>>
>>> I am using 2 x 10Gb bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1Gb for public.
>>>
>>> Disks are:
>>> Vendor Id    : TOSHIBA
>>> Product Id   : PX05SMB040Y
>>> State        : Online
>>> Disk Type    : SAS,Solid State Device
>>> Capacity     : 372.0 GB
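Before tuning anything in Ceph, I'd also verify what that PX05SMB040Y does on its own for small synchronous writes, since that's the pattern journaling stresses. A quick smoke test with fio, pointed at a scratch file on a filesystem backed by the SSD (never at the raw device of a live OSD, which would destroy it; the path and size here are just examples):

    fio --name=sync-4k-write --filename=/mnt/ssd-scratch/fio.test --size=1G \
        --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 \
        --numjobs=1 --runtime=60 --time_based

An enterprise SAS SSD should post sync-write numbers far beyond any HDD in that chassis. If it doesn't, look at the controller: your megacli output shows VD 0 (the SSD) is WriteThrough with its disk cache disabled, unlike the HDD VDs.
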
>>> On 22 January 2018 at 11:24, David Turner <drakonst...@gmail.com> wrote:
>>>> Disk models and other hardware information, including CPU and network config?
>>>> You say you're using Luminous, but then say journal on the same device. I'm
>>>> assuming you mean that you just have the bluestore OSD configured without a
>>>> separate WAL or DB partition? Any more specifics you can give will be helpful.
>>>>
>>>> On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <ste...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'd appreciate it if you could provide some guidance / suggestions regarding
>>>>> performance issues on a test cluster (3 x Dell R620, 1 enterprise SSD,
>>>>> 3 x 600 GB enterprise HDD, 8 cores, 64 GB RAM).
>>>>>
>>>>> I created 2 pools (replication factor 2), one with only SSDs and the other
>>>>> with only HDDs (journal on the same disk for both).
>>>>>
>>>>> The performance is quite similar, although I was expecting it to be at least
>>>>> 5 times better. No issues noticed using atop.
>>>>>
>>>>> What should I check / tune?
>>>>>
>>>>> Many thanks
>>>>> Steven
>>>>>
>>>>> HDD based pool (journal on the same disk):
>>>>>
>>>>> ceph osd pool get scbench256 all
>>>>> size: 2
>>>>> min_size: 1
>>>>> crash_replay_interval: 0
>>>>> pg_num: 256
>>>>> pgp_num: 256
>>>>> crush_rule: replicated_rule
>>>>> hashpspool: true
>>>>> nodelete: false
>>>>> nopgchange: false
>>>>> nosizechange: false
>>>>> write_fadvise_dontneed: false
>>>>> noscrub: false
>>>>> nodeep-scrub: false
>>>>> use_gmt_hitset: 1
>>>>> auid: 0
>>>>> fast_read: 0
>>>>>
>>>>> rbd bench --io-type write image1 --pool=scbench256
>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>>>>>   SEC       OPS   OPS/SEC     BYTES/SEC
>>>>>     1     46816  46836.46  191842139.78
>>>>>     2     90658  45339.11  185709011.80
>>>>>     3    133671  44540.80  182439126.08
>>>>>     4    177341  44340.36  181618100.14
>>>>>     5    217300  43464.04  178028704.54
>>>>>     6    259595  42555.85  174308767.05
>>>>> elapsed: 6  ops: 262144  ops/sec: 42694.50  bytes/sec: 174876688.23
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops] [eta 00m:00s]
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops] [eta 00m:00s]
>>>>>
>>>>> SSD based pool:
>>>>>
>>>>> ceph osd pool get ssdpool all
>>>>> size: 2
>>>>> min_size: 1
>>>>> crash_replay_interval: 0
>>>>> pg_num: 128
>>>>> pgp_num: 128
>>>>> crush_rule: ssdpool
>>>>> hashpspool: true
>>>>> nodelete: false
>>>>> nopgchange: false
>>>>> nosizechange: false
>>>>> write_fadvise_dontneed: false
>>>>> noscrub: false
>>>>> nodeep-scrub: false
>>>>> use_gmt_hitset: 1
>>>>> auid: 0
>>>>> fast_read: 0
>>>>>
>>>>> rbd -p ssdpool create --size 52100 image2
>>>>>
>>>>> rbd bench --io-type write image2 --pool=ssdpool
>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>>>>>   SEC       OPS   OPS/SEC     BYTES/SEC
>>>>>     1     42412  41867.57  171489557.93
>>>>>     2     78343  39180.86  160484805.88
>>>>>     3    118082  39076.48  160057256.16
>>>>>     4    155164  38683.98  158449572.38
>>>>>     5    192825  38307.59  156907885.84
>>>>>     6    230701  37716.95  154488608.16
>>>>> elapsed: 7  ops: 262144  ops/sec: 36862.89  bytes/sec: 150990387.29
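One caveat on those rbd bench numbers: the default pattern is sequential, and sequential 4k writes batch well enough that they can mask the SSD/HDD gap. A random-write pass is usually more telling; something like the following should work on Luminous (double-check the flags against rbd help bench):

    rbd bench --io-type write --io-pattern rand --io-size 4096 \
        --io-threads 16 --io-total 1G image2 --pool=ssdpool
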
>>>>> [root@osd01 ~]# fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops] [eta 00m:00s]
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops] [eta 00m:00s]
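For anyone following along: the job file itself wasn't posted, but the fio headers above imply something along these lines. This is a reconstruction, not the actual write_256.fio; the rbd connection options are placeholders:

    ; hypothetical reconstruction based on the printed headers
    ; (ioengine=rbd, bs=4k, iodepth=32)
    [write-4M]
    ioengine=rbd
    clientname=admin       ; placeholder cephx user
    pool=scbench256        ; or ssdpool for the SSD runs
    rbdname=image1
    rw=write               ; the read runs used rw=randread
    bs=4k
    iodepth=32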
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com