I'm not speaking to anything other than your configuration. "I am using 2 x 10Gb bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1Gb for public." It might not be a bad idea to forgo the public network on the 1Gb interface and either put everything on one network or use VLANs on the 10Gb connections. I lean toward that in particular because your public network doesn't have a bond on it. Just as a note, communication between the OSDs and the MONs is all done on the public network; if that interface goes down, the OSDs are likely to be marked down/out in your cluster. I'm a fan of VLANs, but if you don't have the equipment or expertise to go that route, then just using the same subnet for public and cluster is a decent way to go.
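If you go the single-network route, the ceph.conf side of it is small. A minimal sketch, assuming you reuse the 192.168.0.0/24 subnet that already lives on your bond (swap in whatever subnet you actually keep):

    [global]
    # All traffic -- client, MON, heartbeat, and replication -- rides the bond.
    # With cluster_network unset, Ceph uses the public network for everything.
    public_network = 192.168.0.0/24

Bear in mind the OSDs only rebind on restart, and moving the MONs onto a different public network also means updating the monmap, so that part takes more planning.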
On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia <ste...@gmail.com> wrote:
> I did test with rados bench; here are the results.
>
> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p ssdpool 300 -t 12 seq
>
> Total time run:         300.322608
> Total writes made:      10632
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     141.608
> Stddev Bandwidth:       74.1065
> Max bandwidth (MB/sec): 264
> Min bandwidth (MB/sec): 0
> Average IOPS:           35
> Stddev IOPS:            18
> Max IOPS:               66
> Min IOPS:               0
> Average Latency(s):     0.33887
> Stddev Latency(s):      0.701947
> Max latency(s):         9.80161
> Min latency(s):         0.015171
>
> Total time run:       300.829945
> Total reads made:     10070
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   133.896
> Average IOPS:         33
> Stddev IOPS:          14
> Max IOPS:             68
> Min IOPS:             3
> Average Latency(s):   0.35791
> Max latency(s):       4.68213
> Min latency(s):       0.0107572
>
> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p scbench256 300 -t 12 seq
>
> Total time run:         300.747004
> Total writes made:      10239
> Write size:             4194304
> Object size:            4194304
> Bandwidth (MB/sec):     136.181
> Stddev Bandwidth:       75.5
> Max bandwidth (MB/sec): 272
> Min bandwidth (MB/sec): 0
> Average IOPS:           34
> Stddev IOPS:            18
> Max IOPS:               68
> Min IOPS:               0
> Average Latency(s):     0.352339
> Stddev Latency(s):      0.72211
> Max latency(s):         9.62304
> Min latency(s):         0.00936316
> hints = 1
>
> Total time run:       300.610761
> Total reads made:     7628
> Read size:            4194304
> Object size:          4194304
> Bandwidth (MB/sec):   101.5
> Average IOPS:         25
> Stddev IOPS:          11
> Max IOPS:             61
> Min IOPS:             0
> Average Latency(s):   0.472321
> Max latency(s):       15.636
> Min latency(s):       0.0188098
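The SSD and HDD pools landing within a few MB/sec of each other makes me suspect the wire more than the disks. It's cheap to rule the links out with iperf3, assuming it's installed on both ends (the addresses below are placeholders for your nodes' cluster and public IPs):

    # on osd01
    iperf3 -s
    # from another node, test both networks toward osd01
    iperf3 -c 192.168.0.x -t 30        # cluster (2 x 10Gb bond)
    iperf3 -c 10.10.30.x -t 30         # public (single 1Gb)
    iperf3 -c 192.168.0.x -t 30 -P 4   # LACP hashes per flow; parallel streams exercise both slaves
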
> On 22 January 2018 at 11:34, Steven Vacaroaia <ste...@gmail.com> wrote:
>> Sorry, sent the message too soon. Here is more info.
>>
>> Vendor Id    : SEAGATE
>> Product Id   : ST600MM0006
>> State        : Online
>> Disk Type    : SAS,Hard Disk Device
>> Capacity     : 558.375 GB
>> Power State  : Active
>>
>> (SSD is in slot 0)
>>
>> megacli -LDGetProp -Cache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct, No Write Cache if bad BBU
>>
>> [root@osd01 ~]# megacli -LDGetProp -DskCache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>> CentOS 7, kernel 3.10.0-693.11.6.el7.x86_64
>>
>> sysctl -p
>> net.ipv4.tcp_sack = 0
>> net.core.netdev_budget = 600
>> net.ipv4.tcp_window_scaling = 1
>> net.core.rmem_max = 16777216
>> net.core.wmem_max = 16777216
>> net.core.rmem_default = 16777216
>> net.core.wmem_default = 16777216
>> net.core.optmem_max = 40960
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 65536 16777216
>> net.ipv4.tcp_syncookies = 0
>> net.core.somaxconn = 1024
>> net.core.netdev_max_backlog = 20000
>> net.ipv4.tcp_max_syn_backlog = 30000
>> net.ipv4.tcp_max_tw_buckets = 2000000
>> net.ipv4.tcp_tw_reuse = 1
>> net.ipv4.tcp_slow_start_after_idle = 0
>> net.ipv4.conf.all.send_redirects = 0
>> net.ipv4.conf.all.accept_redirects = 0
>> net.ipv4.conf.all.accept_source_route = 0
>> vm.min_free_kbytes = 262144
>> vm.swappiness = 0
>> vm.vfs_cache_pressure = 100
>> fs.suid_dumpable = 0
>> kernel.core_uses_pid = 1
>> kernel.msgmax = 65536
>> kernel.msgmnb = 65536
>> kernel.randomize_va_space = 1
>> kernel.sysrq = 0
>> kernel.pid_max = 4194304
>> fs.file-max = 100000
>>
>> ceph.conf:
>>
>> public_network = 10.10.30.0/24
>> cluster_network = 192.168.0.0/24
>>
>> osd_op_num_threads_per_shard = 2
>> osd_op_num_shards = 25
>> osd_pool_default_size = 2
>> osd_pool_default_min_size = 1  # Allow writing 1 copy in a degraded state
>> osd_pool_default_pg_num = 256
>> osd_pool_default_pgp_num = 256
>> osd_crush_chooseleaf_type = 1
>> osd_scrub_load_threshold = 0.01
>> osd_scrub_min_interval = 137438953472
>> osd_scrub_max_interval = 137438953472
>> osd_deep_scrub_interval = 137438953472
>> osd_max_scrubs = 16
>> osd_op_threads = 8
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
>> osd_recovery_op_priority = 1
>>
>> debug_lockdep = 0/0
>> debug_context = 0/0
>> debug_crush = 0/0
>> debug_buffer = 0/0
>> debug_timer = 0/0
>> debug_filer = 0/0
>> debug_objecter = 0/0
>> debug_rados = 0/0
>> debug_rbd = 0/0
>> debug_journaler = 0/0
>> debug_objectcacher = 0/0
>> debug_client = 0/0
>> debug_osd = 0/0
>> debug_optracker = 0/0
>> debug_objclass = 0/0
>> debug_filestore = 0/0
>> debug_journal = 0/0
>> debug_ms = 0/0
>> debug_monc = 0/0
>> debug_tp = 0/0
>> debug_auth = 0/0
>> debug_finisher = 0/0
>> debug_heartbeatmap = 0/0
>> debug_perfcounter = 0/0
>> debug_asok = 0/0
>> debug_throttle = 0/0
>> debug_mon = 0/0
>> debug_paxos = 0/0
>> debug_rgw = 0/0
>>
>> [mon]
>> mon_allow_pool_delete = true
>>
>> [osd]
>> osd_heartbeat_grace = 20
>> osd_heartbeat_interval = 5
>> bluestore_block_db_size = 16106127360
>> bluestore_block_wal_size = 1073741824
>>
>> [osd.6]
>> host = osd01
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
>> crush_location = root=ssds host=osd01-ssd
>>
>> [osd.7]
>> host = osd02
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
>> crush_location = root=ssds host=osd02-ssd
>>
>> [osd.8]
>> host = osd04
>> osd_journal = /dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
>> crush_location = root=ssds host=osd04-ssd
>>
>> On 22 January 2018 at 11:29, Steven Vacaroaia <ste...@gmail.com> wrote:
>>> Hi David,
>>>
>>> Yes, I meant no separate partitions for WAL and DB.
>>>
>>> I am using 2 x 10Gb bonded (BONDING_OPTS="mode=4 miimon=100 xmit_hash_policy=1 lacp_rate=1") for cluster and 1 x 1Gb for public.
>>>
>>> Disks are:
>>> Vendor Id    : TOSHIBA
>>> Product Id   : PX05SMB040Y
>>> State        : Online
>>> Disk Type    : SAS,Solid State Device
>>> Capacity     : 372.0 GB
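Before tuning anything in Ceph, I'd also verify what that PX05SMB040Y does on its own for small synchronous writes, since that's the pattern journaling stresses. A quick smoke test with fio, pointed at a scratch file on a filesystem backed by the SSD (never at the raw device of a live OSD, which would destroy it; the path and size here are just examples):

    fio --name=sync-4k-write --filename=/mnt/ssd-scratch/fio.test --size=1G \
        --direct=1 --sync=1 --rw=write --bs=4k --iodepth=1 \
        --numjobs=1 --runtime=60 --time_based

An enterprise SAS SSD should post sync-write numbers far beyond any HDD in that chassis. If it doesn't, look at the controller: your megacli output shows VD 0 (the SSD) is WriteThrough with its disk cache disabled, unlike the HDD VDs.
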
>>> On 22 January 2018 at 11:24, David Turner <drakonst...@gmail.com> wrote:
>>>> Disk models and other hardware information, including CPU and network config?
>>>> You say you're using Luminous, but then say journal on the same device. I'm
>>>> assuming you mean that you just have the bluestore OSD configured without a
>>>> separate WAL or DB partition? Any more specifics you can give will be helpful.
>>>>
>>>> On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia <ste...@gmail.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I'd appreciate it if you could provide some guidance / suggestions regarding
>>>>> performance issues on a test cluster (3 x Dell R620, 1 enterprise SSD,
>>>>> 3 x 600 GB enterprise HDD, 8 cores, 64 GB RAM).
>>>>>
>>>>> I created 2 pools (replication factor 2), one with only SSDs and the other
>>>>> with only HDDs (journal on the same disk for both).
>>>>>
>>>>> The performance is quite similar, although I was expecting it to be at least
>>>>> 5 times better. No issues noticed using atop.
>>>>>
>>>>> What should I check / tune?
>>>>>
>>>>> Many thanks
>>>>> Steven
>>>>>
>>>>> HDD based pool (journal on the same disk):
>>>>>
>>>>> ceph osd pool get scbench256 all
>>>>> size: 2
>>>>> min_size: 1
>>>>> crash_replay_interval: 0
>>>>> pg_num: 256
>>>>> pgp_num: 256
>>>>> crush_rule: replicated_rule
>>>>> hashpspool: true
>>>>> nodelete: false
>>>>> nopgchange: false
>>>>> nosizechange: false
>>>>> write_fadvise_dontneed: false
>>>>> noscrub: false
>>>>> nodeep-scrub: false
>>>>> use_gmt_hitset: 1
>>>>> auid: 0
>>>>> fast_read: 0
>>>>>
>>>>> rbd bench --io-type write image1 --pool=scbench256
>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>>>>>   SEC       OPS   OPS/SEC     BYTES/SEC
>>>>>     1     46816  46836.46  191842139.78
>>>>>     2     90658  45339.11  185709011.80
>>>>>     3    133671  44540.80  182439126.08
>>>>>     4    177341  44340.36  181618100.14
>>>>>     5    217300  43464.04  178028704.54
>>>>>     6    259595  42555.85  174308767.05
>>>>> elapsed: 6  ops: 262144  ops/sec: 42694.50  bytes/sec: 174876688.23
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops] [eta 00m:00s]
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops] [eta 00m:00s]
>>>>>
>>>>> SSD based pool:
>>>>>
>>>>> ceph osd pool get ssdpool all
>>>>> size: 2
>>>>> min_size: 1
>>>>> crash_replay_interval: 0
>>>>> pg_num: 128
>>>>> pgp_num: 128
>>>>> crush_rule: ssdpool
>>>>> hashpspool: true
>>>>> nodelete: false
>>>>> nopgchange: false
>>>>> nosizechange: false
>>>>> write_fadvise_dontneed: false
>>>>> noscrub: false
>>>>> nodeep-scrub: false
>>>>> use_gmt_hitset: 1
>>>>> auid: 0
>>>>> fast_read: 0
>>>>>
>>>>> rbd -p ssdpool create --size 52100 image2
>>>>>
>>>>> rbd bench --io-type write image2 --pool=ssdpool
>>>>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
>>>>>   SEC       OPS   OPS/SEC     BYTES/SEC
>>>>>     1     42412  41867.57  171489557.93
>>>>>     2     78343  39180.86  160484805.88
>>>>>     3    118082  39076.48  160057256.16
>>>>>     4    155164  38683.98  158449572.38
>>>>>     5    192825  38307.59  156907885.84
>>>>>     6    230701  37716.95  154488608.16
>>>>> elapsed: 7  ops: 262144  ops/sec: 36862.89  bytes/sec: 150990387.29
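One caveat on those rbd bench numbers: the default pattern is sequential, and sequential 4k writes batch well enough that they can mask the SSD/HDD gap. A random-write pass is usually more telling; something like the following should work on Luminous (double-check the flags against rbd help bench):

    rbd bench --io-type write --io-pattern rand --io-size 4096 \
        --io-threads 16 --io-total 1G image2 --pool=ssdpool
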
>>>>> [root@osd01 ~]# fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops] [eta 00m:00s]
>>>>>
>>>>> fio /home/cephuser/write_256.fio
>>>>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>>>>> fio-2.2.8
>>>>> Starting 1 process
>>>>> rbd engine: RBD version: 1.12.0
>>>>> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops] [eta 00m:00s]
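For anyone following along: the job file itself wasn't posted, but the fio headers above imply something along these lines. This is a reconstruction, not the actual write_256.fio; the rbd connection options are placeholders:

    ; hypothetical reconstruction based on the printed headers
    ; (ioengine=rbd, bs=4k, iodepth=32)
    [write-4M]
    ioengine=rbd
    clientname=admin       ; placeholder cephx user
    pool=scbench256        ; or ssdpool for the SSD runs
    rbdname=image1
    rw=write               ; the read runs used rw=randread
    bs=4k
    iodepth=32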
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com