Re: [ceph-users] KVM problems when rebalance occurs
Hi, benchmarking is done via fio and different blocksizes. I compared with benchmarks I did before the ceph.conf change and encountered very similar numbers. Thanks for the hint with mysql benchmarking. I will try it out. Cheers Nick On Friday, January 08, 2016 06:59:13 AM Josef Johansson wrote: > Hi, > > How did you benchmark? > > I would recommend to have a lot of mysql with a lot of innodb tables that > are utilised heavily. During a recover you should see the latency rise at > least. Maybe using one of the tools here > https://dev.mysql.com/downloads/benchmarks.html > > Regards, > Josef > > On 7 Jan 2016 16:36, "Robert LeBlanc" wrote: > > With these min,max settings, we didn't have any problem going to more > > backfills. > > > > Robert LeBlanc > > > > Sent from a mobile device please excuse any typos. > > > > On Jan 7, 2016 8:30 AM, "nick" wrote: > >> Heya, > >> thank you for your answers. We will try to set 16/32 as values for > >> osd_backfill_scan_[min|max]. I also set the debug logging config. Here is > >> an > >> excerpt of our new ceph.conf: > >> > >> """ > >> [osd] > >> osd max backfills = 1 > >> osd backfill scan max = 32 > >> osd backfill scan min = 16 > >> osd recovery max active = 1 > >> osd recovery op priority = 1 > >> osd op threads = 8 > >> > >> [global] > >> debug optracker = 0/0 > >> debug asok = 0/0 > >> debug hadoop = 0/0 > >> debug mds migrator = 0/0 > >> debug objclass = 0/0 > >> debug paxos = 0/0 > >> debug context = 0/0 > >> debug objecter = 0/0 > >> debug mds balancer = 0/0 > >> debug finisher = 0/0 > >> debug auth = 0/0 > >> debug buffer = 0/0 > >> debug lockdep = 0/0 > >> debug mds log = 0/0 > >> debug heartbeatmap = 0/0 > >> debug journaler = 0/0 > >> debug mon = 0/0 > >> debug client = 0/0 > >> debug mds = 0/0 > >> debug throttle = 0/0 > >> debug journal = 0/0 > >> debug crush = 0/0 > >> debug objectcacher = 0/0 > >> debug filer = 0/0 > >> debug perfcounter = 0/0 > >> debug filestore = 0/0 > >> debug rgw = 0/0 > >> debug monc = 0/0 > >> debug rbd = 0/0 > >> debug tp = 0/0 > >> debug osd = 0/0 > >> debug ms = 0/0 > >> debug mds locker = 0/0 > >> debug timer = 0/0 > >> debug mds log expire = 0/0 > >> debug rados = 0/0 > >> debug striper = 0/0 > >> debug rbd replay = 0/0 > >> debug none = 0/0 > >> debug keyvaluestore = 0/0 > >> debug compressor = 0/0 > >> debug crypto = 0/0 > >> debug xio = 0/0 > >> debug civetweb = 0/0 > >> debug newstore = 0/0 > >> """ > >> > >> I already made a benchmark on our staging setup with the new config and > >> fio, but > >> did not really get different results than before. > >> > >> For us it is hardly possible to reproduce the 'stalling' problems on the > >> staging cluster so I will have to wait and test this in production. > >> > >> Does anyone know if 'osd max backfills' > 1 could have an impact as well? > >> The > >> default seems to be 10... > >> > >> Cheers > >> Nick > >> > >> On Wednesday, January 06, 2016 09:17:43 PM Josef Johansson wrote: > >> > Hi, > >> > > >> > Also make sure that you optimize the debug log config. There's a lot on > >> > >> the > >> > >> > ML on how to set them all to low values (0/0). > >> > > >> > Not sure how it's in infernalis but it did a lot in previous versions. > >> > > >> > Regards, > >> > Josef > >> > > >> > On 6 Jan 2016 18:16, "Robert LeBlanc" wrote: > >> > > -BEGIN PGP SIGNED MESSAGE- > >> > > Hash: SHA256 > >> > > > >> > > There has been a lot of "discussion" about osd_backfill_scan[min,max] > >> > > lately. 
My experience with hammer has been opposite that of what > >> > > people have said before. Increasing those values for us has reduced > >> > > the load of recovery and has prevented a lot of the disruption seen > >> > > in > >> > > our cluster caused by backfilling. It does increase the amount of > >> > > time > >> > > to do the recovery (a new node added to the cluster took about 3-4 > >> > > hours before, now takes about 24 hours). > >> > > > >> > > We are currently using these values and seem to work well for us. > >> > > osd_max_backfills = 1 > >> > > osd_backfill_scan_min = 16 > >> > > osd_recovery_max_active = 1 > >> > > osd_backfill_scan_max = 32 > >> > > > >> > > I would be interested in your results if you try these values. > >> > > -BEGIN PGP SIGNATURE- > >> > > Version: Mailvelope v1.3.2 > >> > > Comment: https://www.mailvelope.com > >> > > > >> > > wsFcBAEBCAAQBQJWjUu/CRDmVDuy+mK58QAArdMQAI+0Er/sdN7TF7knGey2 > >> > > 5wJ6Ie81KJlrt/X9fIMpFdwkU2g5ET+sdU9R2hK4XcBpkonfGvwS8Ctha5Aq > >> > > XOJPrN4bMMeDK9Z4angK86ioLJevTH7tzp3FZL0U4Kbt1s9ZpwF6t+wlvkKl > >> > > mt6Tkj4VKr0917TuXqk58AYiZTYcEjGAb0QUe/gC24yFwZYrPO0vUVb4gmTQ > >> > > klNKAdTinGSn4Ynj+lBsEstWGVlTJiL3FA6xRBTz1BSjb4vtb2SoIFwHlAp+ > >> > > GO+bKSh19YIasXCZfRqC/J2XcNauOIVfb4l4viV23JN2fYavEnLCnJSglYjF > >> > > Rjxr0wK+6NhRl7naJ1yGNtdMkw+h+nu/xsbYhNqT0EVq1d0nhgzh6ZjAhW1w > >> > > oRiHYA4KNn2uWiUgigpISFi4hJSP4CEPToO8jbhXhARs0H6v33oWrR8RYKxO > >> > > dFz+Lxx969rpDkk+1nRks9hTeIF+oFnW7eezSiR6TIL
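For measuring the kind of guest-side latency this thread is about, a single-threaded synchronous 4k write test run while a rebalance is in progress is usually more revealing than plain bandwidth numbers. A minimal sketch (the target file, size and runtime are only placeholders - point it at a scratch file or disk inside the VM, not at data you care about):

fio --name=latency-during-recovery --filename=/root/fio-test.bin --size=2G --rw=randwrite --ioengine=libaio --direct=1 --bs=4k --iodepth=1 --runtime=300 --time_based --group_reporting

Watching the clat percentiles while a backfill is running tends to show the stalls that VMs experience much more clearly than the average bandwidth does.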
[ceph-users] can rbd block_name_prefix be changed?
Hi, can the rbd block_name_prefix be changed? Is it constant for an RBD image? Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Swift use Rados backend
Hi,

How can I use Ceph as a backend for Swift? I am following these git repositories:

https://github.com/stackforge/swift-ceph-backend
https://github.com/enovance/swiftceph-ansible

I tried to install manually, but I am stuck configuring the ring entries. Which device should I use in 'swift-ring-builder account.builder add z1-10.10.10.53:6002/sdb1 100' if I use RADOS?

Thanks and regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] cephfs (ceph-fuse) and file-layout: "operation not supported" in a client Ubuntu Trusty
Hi @all,

I'm using Ceph Infernalis (9.2.0) on both the client and the cluster side. I have an Ubuntu Trusty client where cephfs is mounted via ceph-fuse, and I would like to put a sub-directory of cephfs in a specific pool (an SSD pool). In the cluster, I have:

~# ceph auth get client.cephfs
exported keyring for client.cephfs
[client.cephfs]
key = XX==
caps mds = "allow"
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=cephfsdata, allow rwx pool=poolssd"

~# ceph fs ls
name: cephfs, metadata pool: cephfsmetadata, data pools: [cephfsdata poolssd ]

Now, on the Ubuntu Trusty client, I have installed the "attr" package and I try this:

~# mkdir /mnt/cephfs/ssd
~# setfattr -n ceph.dir.layout.pool -v poolssd /mnt/cephfs/ssd/
setfattr: /mnt/cephfs/ssd/: Operation not supported

~# getfattr -n ceph.dir.layout /mnt/cephfs/
/mnt/cephfs/: ceph.dir.layout: Operation not supported

Here is my fstab line which mounts the cephfs:

id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/data1 /mnt/cephfs fuse.ceph noatime,defaults,_netdev 0 0

Where is my problem? Thanks in advance for your help. ;)

--
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Intel P3700 PCI-e as journal drives?
Hi,

I want to start another round of SSD discussion, since we are about to buy some new servers for our Ceph cluster. We plan to use hosts with 12x 4TB drives and two SSD journal drives. I'm fancying Intel P3700 PCI-e drives, but Sebastien Han's blog does not contain performance data for these drives yet. Is anyone able to share some benchmark results for Intel P3700 PCI-e drives?

Best regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph osd tree output
That is not set as far as I can tell. Actually it is strange that I don't see that setting at all. [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep crush [root@cpn1 ~]# grep update /etc/ceph/ceph.conf [root@cpn1 ~]# On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen wrote: > > > Hi, > > Do you have by any chance disabled automatic crushmap updates in your ceph > config? > > osd crush update on start = false > > If this is the case, and you move disks around hosts, they won't update > their position/host in the crushmap, even if the crushmap does not reflect > reality. > > Regards, > > Mart > > > > > > On 01/08/2016 02:16 AM, Wade Holler wrote: > > Sure. Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per > node, but I will only include a sample: > > ceph osd tree | head -35 > > ID WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > > -1 130.98450 root default > > -2 5.82153 host cpn1 > > 4 0.72769 osd.4 up 1.0 1.0 > > 14 0.72769 osd.14 up 1.0 1.0 > > 3 0.72769 osd.3 up 1.0 1.0 > > 24 0.72769 osd.24 up 1.0 1.0 > > 5 0.72769 osd.5 up 1.0 1.0 > > 2 0.72769 osd.2 up 1.0 1.0 > > 17 0.72769 osd.17 up 1.0 1.0 > > 69 0.72769 osd.69 up 1.0 1.0 > > -3 6.54922 host cpn3 > > 7 0.72769 osd.7 up 1.0 1.0 > > 8 0.72769 osd.8 up 1.0 1.0 > > 9 0.72769 osd.9 up 1.0 1.0 > > 0 0.72769 osd.0 up 1.0 1.0 > > 28 0.72769 osd.28 up 1.0 1.0 > > 10 0.72769 osd.10 up 1.0 1.0 > > 1 0.72769 osd.1 up 1.0 1.0 > > 6 0.72769 osd.6 up 1.0 1.0 > > 29 0.72769 osd.29 up 1.0 1.0 > > -4 2.91077 host cpn4 > > > Compared with the actual processes that are running: > > > [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd > > ceph 92638 1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f > --cluster ceph --id 6 --setuser ceph --setgroup ceph > > ceph 92667 1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f > --cluster ceph --id 0 --setuser ceph --setgroup ceph > > ceph 92673 1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f > --cluster ceph --id 8 --setuser ceph --setgroup ceph > > ceph 92681 1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f > --cluster ceph --id 7 --setuser ceph --setgroup ceph > > ceph 92701 1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f > --cluster ceph --id 12 --setuser ceph --setgroup ceph > > ceph 92748 1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f > --cluster ceph --id 10 --setuser ceph --setgroup ceph > > ceph 92756 1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f > --cluster ceph --id 9 --setuser ceph --setgroup ceph > > ceph 92758 1 17 16:19 ?00:39:28 /usr/bin/ceph-osd -f > --cluster ceph --id 13 --setuser ceph --setgroup ceph > > ceph 92777 1 19 16:19 ?00:46:17 /usr/bin/ceph-osd -f > --cluster ceph --id 1 --setuser ceph --setgroup ceph > > ceph 92988 1 18 16:19 ?00:42:47 /usr/bin/ceph-osd -f > --cluster ceph --id 5 --setuser ceph --setgroup ceph > > ceph 93058 1 18 16:19 ?00:43:18 /usr/bin/ceph-osd -f > --cluster ceph --id 11 --setuser ceph --setgroup ceph > > ceph 93078 1 17 16:19 ?00:41:38 /usr/bin/ceph-osd -f > --cluster ceph --id 14 --setuser ceph --setgroup ceph > > ceph 93127 1 15 16:19 ?00:36:29 /usr/bin/ceph-osd -f > --cluster ceph --id 4 --setuser ceph --setgroup ceph > > ceph 93130 1 17 16:19 ?00:40:44 /usr/bin/ceph-osd -f > --cluster ceph --id 2 --setuser ceph --setgroup ceph > > ceph 93173 1 21 16:19 ?00:49:37 /usr/bin/ceph-osd -f > --cluster ceph --id 3 --setuser ceph --setgroup ceph > > [root@cpx1 ~]# ssh cpn3 ps -ef | grep ceph\-osd > > ceph 82454 1 18 16:19 ?00:43:58 /usr/bin/ceph-osd -f > --cluster ceph --id 25 --setuser ceph --setgroup ceph > > ceph 82464 1 24 16:19 ?00:55:40 /usr/bin/ceph-osd -f > --cluster ceph 
--id 21 --setuser ceph --setgroup ceph > > ceph 82473 1 21 16:19 ?00:50:14 /usr/bin/ceph-osd -f > --cluster ceph --id 17 --setuser ceph --setgroup ceph > > ceph 82612 1 19 16:19 ?00:45:25 /usr/bin/ceph-osd -f > --cluster ceph --id 22 --setuser ceph --setgroup ceph > > ceph 82629 1 20 16:19 ?00:48:38 /usr/bin/cep
Re: [ceph-users] Intel P3700 PCI-e as journal drives?
Hi, Quick results for 1/5/10 jobs: # fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 fio-2.1.3 Starting 1 process Jobs: 1 (f=1): [W] [100.0% done] [0KB/373.2MB/0KB /s] [0/95.6K/0 iops] [eta 00m:00s] journal-test: (groupid=0, jobs=1): err= 0: pid=99634: Fri Jan 8 13:51:53 2016 write: io=21116MB, bw=360373KB/s, iops=90093, runt= 6msec clat (usec): min=7, max=14738, avg=10.79, stdev=29.04 lat (usec): min=7, max=14738, avg=10.84, stdev=29.04 clat percentiles (usec): | 1.00th=[8], 5.00th=[8], 10.00th=[8], 20.00th=[8], | 30.00th=[8], 40.00th=[8], 50.00th=[9], 60.00th=[9], | 70.00th=[9], 80.00th=[ 12], 90.00th=[ 18], 95.00th=[ 22], | 99.00th=[ 34], 99.50th=[ 37], 99.90th=[ 50], 99.95th=[ 54], | 99.99th=[ 72] bw (KB /s): min=192456, max=394392, per=99.97%, avg=360254.66, stdev=46490.05 lat (usec) : 10=73.77%, 20=18.79%, 50=7.33%, 100=0.10%, 250=0.01% lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01% lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01% cpu : usr=15.92%, sys=13.08%, ctx=5405192, majf=0, minf=27 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=5405592/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): WRITE: io=21116MB, aggrb=360372KB/s, minb=360372KB/s, maxb=360372KB/s, mint=6msec, maxt=6msec Disk stats (read/write): nvme0n1: ios=0/5397207, merge=0/0, ticks=0/42596, in_queue=42596, util=71.01% # fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=5 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 ... 
journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 fio-2.1.3 Starting 5 processes Jobs: 5 (f=5): [W] [100.0% done] [0KB/1023MB/0KB /s] [0/262K/0 iops] [eta 00m:00s] journal-test: (groupid=0, jobs=5): err= 0: pid=99932: Fri Jan 8 13:57:07 2016 write: io=57723MB, bw=985120KB/s, iops=246279, runt= 60001msec clat (usec): min=7, max=23102, avg=20.00, stdev=78.26 lat (usec): min=7, max=23102, avg=20.05, stdev=78.26 clat percentiles (usec): | 1.00th=[8], 5.00th=[9], 10.00th=[ 10], 20.00th=[ 12], | 30.00th=[ 14], 40.00th=[ 15], 50.00th=[ 16], 60.00th=[ 18], | 70.00th=[ 21], 80.00th=[ 25], 90.00th=[ 29], 95.00th=[ 36], | 99.00th=[ 62], 99.50th=[ 77], 99.90th=[ 193], 99.95th=[ 612], | 99.99th=[ 1816] bw (KB /s): min=139512, max=225144, per=19.99%, avg=196941.33, stdev=20911.73 lat (usec) : 10=6.84%, 20=59.99%, 50=31.33%, 100=1.61%, 250=0.14% lat (usec) : 500=0.03%, 750=0.02%, 1000=0.01% lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01% cpu : usr=8.79%, sys=7.32%, ctx=14776785, majf=0, minf=138 IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued: total=r=0/w=14777043/d=0, short=r=0/w=0/d=0 Run status group 0 (all jobs): WRITE: io=57723MB, aggrb=985119KB/s, minb=985119KB/s, maxb=985119KB/s, mint=60001msec, maxt=60001msec Disk stats (read/write): nvme0n1: ios=0/14754265, merge=0/0, ticks=0/253092, in_queue=254880, util=100.00% # fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=10 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 ... journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1 fio-2.1.3 Starting 10 processes Jobs: 10 (f=10): [WW] [100.0% done] [0KB/1026MB/0KB /s] [0/263K/0 iops] [eta 00m:00s] journal-test: (groupid=0, jobs=10): err= 0: pid=14: Fri Jan 8 13:58:24 2016 write: io=65679MB, bw=1094.7MB/s, iops=280224, runt= 60001msec clat (usec): min=7, max=23679, avg=35.33, stdev=118.33 lat (usec): min=7, max=23679, avg=35.39, stdev=118.34 clat percentiles (usec): | 1.00th=[8], 5.00th=[9], 10.00th=[ 10], 20.00th=[ 12], | 30.00th=[ 14], 40.00th=[ 17], 50.00th=[ 22], 60.00th=[ 27], | 70.00th=[ 33], 80.00th=[ 45], 90.00th=[ 68], 95.00th=[ 90], | 99.00th=[ 167], 99.50th=[ 231], 99.90th=[ 1064], 99.95th=[ 1528], | 99.99th=[ 2416] bw (KB /s): min=66600, max=141064, per=10.01%, avg=112165.00, stdev=16560.67 lat (usec) : 10=6.54%, 20=38.42%, 50=37.34%, 100=1
[ceph-users] pg is stuck stale (osd.21 still removed)
Hi,

we had a hardware problem with osd.21 today. The OSD daemon was down and "smartctl" reported hardware errors, so I decided to remove the HDD:

ceph osd out 21
ceph osd crush remove osd.21
ceph auth del osd.21
ceph osd rm osd.21

But afterwards I saw that I have some stuck PGs pointing at osd.21:

root@ceph-admin:~# ceph -w
cluster c7b12656-15a6-41b0-963f-4f47c62497dc
health HEALTH_WARN 50 pgs stale 50 pgs stuck stale
monmap e4: 3 mons at {ceph-mon1=192.168.135.31:6789/0,ceph-mon2=192.168.135.32:6789/0,ceph-mon3=192.168.135.33:6789/0} election epoch 404, quorum 0,1,2 ceph-mon1,ceph-mon2,ceph-mon3
mdsmap e136: 1/1/1 up {0=ceph-mon1=up:active}
osdmap e18259: 23 osds: 23 up, 23 in
pgmap v47879105: 6656 pgs, 10 pools, 23481 GB data, 6072 kobjects
54974 GB used, 30596 GB / 85571 GB avail
6605 active+clean
50 stale+active+clean
1 active+clean+scrubbing+deep

root@ceph-admin:~# ceph health
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale

root@ceph-admin:~# ceph health detail
HEALTH_WARN 50 pgs stale; 50 pgs stuck stale; noout flag(s) set
pg 34.225 is stuck stale for 98780.399254, current state stale+active+clean, last acting [21]
pg 34.186 is stuck stale for 98780.399195, current state stale+active+clean, last acting [21]
...

root@ceph-admin:~# ceph pg 34.225 query
Error ENOENT: i don't have pgid 34.225
root@ceph-admin:~# ceph pg 34.225 list_missing
Error ENOENT: i don't have pgid 34.225
root@ceph-admin:~# ceph osd lost 21 --yes-i-really-mean-it
osd.21 is not down or doesn't exist

# checking the crushmap
ceph osd getcrushmap -o crush.map
crushtool -d crush.map -o crush.txt
root@ceph-admin:~# grep 21 crush.txt
-> nothing here

Of course, I cannot start osd.21, because it's not available anymore - I removed it. Is there a way to remap the stuck PGs to OSDs other than osd.21? How can I help my cluster (ceph 0.94.2)?

best regards
Danny
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
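A couple of commands that can help when narrowing down stale PGs like these; the PG and pool IDs below are simply the ones from the report above:

ceph pg dump_stuck stale        # list every PG currently flagged stuck stale
ceph pg map 34.225              # show the up/acting set CRUSH currently computes for that PG
ceph osd dump | grep 'pool 34'  # check size/min_size of the affected pool

The fact that the stale PGs show 'last acting [21]' with only a single OSD suggests the pool may have had only one replica on the removed disk; that is an assumption based on the output above, not a confirmed diagnosis, but if it is the case the data in those PGs cannot be recovered from the cluster itself.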
Re: [ceph-users] cephfs (ceph-fuse) and file-layout: "operation not supported" in a client Ubuntu Trusty
Hi,

Some news... On 08/01/2016 12:42, Francois Lafont wrote:

> ~# mkdir /mnt/cephfs/ssd
>
> ~# setfattr -n ceph.dir.layout.pool -v poolssd /mnt/cephfs/ssd/
> setfattr: /mnt/cephfs/ssd/: Operation not supported
>
> ~# getfattr -n ceph.dir.layout /mnt/cephfs/
> /mnt/cephfs/: ceph.dir.layout: Operation not supported
>
> Here is my fstab line which mounts the cephfs:
>
> id=cephfs,keyring=/etc/ceph/ceph.client.cephfs.keyring,client_mountpoint=/data1 /mnt/cephfs fuse.ceph noatime,defaults,_netdev 0 0

In fact, I retried the same thing without the "noatime" mount option and after that it worked. Then I retried _with_ "noatime" to be sure and... it worked too. Now it just works with or without the option. So I have 2 possible explanations:

1. Removing noatime and mounting just once unblocked something...
2. Or there is another explanation, a rather embarrassing one for me: maybe during my first attempt the cephfs was simply not mounted. Indeed, I now have a doubt on this point, because a few minutes after the attempt I saw that the cephfs was not mounted (and I don't know why).

--
François Lafont
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
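A quick way to rule out the "not actually mounted" possibility before testing layout xattrs is to confirm the ceph-fuse mount is live first; the mount point below is just the one used in this thread:

mount | grep /mnt/cephfs                   # a ceph-fuse entry should be listed
df -hT /mnt/cephfs                         # the type column should show a fuse.ceph* type, not the parent filesystem
getfattr -n ceph.dir.layout /mnt/cephfs    # should return a layout once the mount is live and the client supports layout vxattrs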
Re: [ceph-users] can rbd block_name_prefix be changed?
It's constant for an RBD image and is tied to the image's internal unique ID. -- Jason Dillaman - Original Message - > From: "min fang" > To: "ceph-users" > Sent: Friday, January 8, 2016 4:50:08 AM > Subject: [ceph-users] can rbd block_name_prefix be changed? > Hi, can rbd block_name_prefix be changed? Is it constant for a rbd image? > thanks. > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
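If it helps, the prefix can be inspected (though not changed) with 'rbd info'; the pool and image names here are only placeholders:

rbd info rbd/myimage
# the output contains a line like:
#   block_name_prefix: rbd_data.<internal-id>   (rbd_data.* for format 2 images, rb.* for format 1)

Every RADOS object backing the image is named <block_name_prefix>.<object-number>, which is why the prefix stays fixed for the lifetime of the image.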
[ceph-users] using cache-tier with writeback mode, rados bench result degrade
Hi,guyes Recentlly,I am testing cache-tier using writeback mode.but I found a strange things. the performance using rados bench degrade.Is it correct? If so,how to explain.following some info about my test: storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the other one used as OSD),four sata as OSD. before using cache-tier: root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup Total time run: 301.236355 Total writes made: 6041 Write size: 4194304 Bandwidth (MB/sec): 80.216 Stddev Bandwidth: 10.5358 Max bandwidth (MB/sec): 104 Min bandwidth (MB/sec): 0 Average Latency:0.797838 Stddev Latency: 0.619098 Max latency:4.89823 Min latency:0.158543 root@ceph1:/root/cluster# rados bench -p coldstorage 300 seq Total time run:133.563980 Total reads made: 6041 Read size:4194304 Bandwidth (MB/sec):180.917 Average Latency: 0.353559 Max latency: 1.83356 Min latency: 0.027878 after configure cache-tier: root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage hotstorage pool 'hotstorage' is now (or already was) a tier of 'coldstorage' root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode hotstorage writeback set cache-mode for pool 'hotstorage' to writeback root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay coldstorage hotstorage overlay for 'coldstorage' is now (or already was) 'hotstorage' oot@ubuntu:~# ceph osd dump|grep storage pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 flags hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0 pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes 1000 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 3600s x6 stripe_width 0 - rados bench -p coldstorage 300 write --no-cleanup Total time run: 302.207573 Total writes made: 4315 Write size: 4194304 Bandwidth (MB/sec): 57.113 Stddev Bandwidth: 23.9375 Max bandwidth (MB/sec): 104 Min bandwidth (MB/sec): 0 Average Latency: 1.1204 Stddev Latency: 0.717092 Max latency: 6.97288 Min latency: 0.158371 root@ubuntu:/# rados bench -p coldstorage 300 seq Total time run: 153.869741 Total reads made: 4315 Read size: 4194304 Bandwidth (MB/sec): 112.173 Average Latency: 0.570487 Max latency: 1.75137 Min latency: 0.039635 ceph.conf: [global] fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831 mon_initial_members = ceph2, ceph3, ceph4 mon_host = 10.**.**.241,10.**.**.242,10.**.**.243 auth_cluster_required = cephx auth_service_required = cephx auth_client_required = cephx filestore_xattr_use_omap = true osd_pool_default_size = 3 osd_pool_default_min_size = 1 auth_supported = cephx osd_journal_size = 10240 osd_mkfs_type = xfs osd crush update on start = false [client] rbd_cache = true rbd_cache_writethrough_until_flush = false rbd_cache_size = 33554432 rbd_cache_max_dirty = 25165824 rbd_cache_target_dirty = 16777216 rbd_cache_max_dirty_age = 1 rbd_cache_block_writes_upfront = false [osd] filestore_omap_header_cache_size = 4 filestore_fd_cache_size = 4 filestore_fiemap = true client_readahead_min = 2097152 client_readahead_max_bytes = 0 client_readahead_max_periods = 4 filestore_journal_writeahead = false filestore_max_sync_interval = 10 filestore_queue_max_ops = 500 filestore_queue_max_bytes = 1048576000 filestore_queue_committing_max_ops = 5000 filestore_queue_committing_max_bytes = 1048576000 
keyvaluestore_queue_max_ops = 500
keyvaluestore_queue_max_bytes = 1048576000
journal_queue_max_ops = 3
journal_queue_max_bytes = 3355443200
osd_op_threads = 20
osd_disk_threads = 8
filestore_op_threads = 4
osd_mount_options_xfs = rw,noatime,nobarrier,inode64,logbsize=256k,delaylog

[mon]
mon_osd_allow_primary_affinity=true

--
Using the Opera e-mail client: http://www.opera.com/mail/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
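One thing worth checking with a writeback tier like the 'hotstorage' pool above is whether the flush/evict thresholds match the real SSD capacity: if target_max_bytes is set too low the tier flushes almost constantly, and if it is unset the dirty/full ratios have nothing to act on. A hedged sketch of the usual knobs (the values are examples, not recommendations):

ceph osd pool set hotstorage target_max_bytes 100000000000     # cap the cache pool below the SSDs' usable capacity
ceph osd pool set hotstorage cache_target_dirty_ratio 0.4      # start flushing dirty objects at 40% of the target
ceph osd pool set hotstorage cache_target_full_ratio 0.8       # start evicting clean objects at 80% of the target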
Re: [ceph-users] ceph osd tree output
Yeah,this setting can not see in asok config. You just set it in ceph.conf and restart mon and osd service(sorry I forget if these restart is necessary) what I use this config is when I changed crushmap manually,and I do not want the service init script to rebuild crushmap as default way. maybe this is not siut for your problem.just have a try. 在 Fri, 08 Jan 2016 21:51:32 +0800,Wade Holler 写道: That is not set as far as I can tell. Actually it is strange that I don't see that setting at all. [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep crush [root@cpn1 ~]# grep update /etc/ceph/ceph.conf [root@cpn1 ~]# On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen wrote: Hi, Do you have by any chance disabled automatic crushmap updates in your ceph config? osd crush update on start = false If this is the case, and you move disks around hosts, they won't update their position/host in the crushmap, even if the crushmap does not reflect reality. Regards, Mart On 01/08/2016 02:16 AM, Wade Holler wrote: Sure. Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per node, but I will only include a sample: ceph osd tree | head -35 ID WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 130.98450 root default -2 5.82153 host cpn1 4 0.72769 osd.4 up 1.0 1.0 14 0.72769 osd.14 up 1.0 1.0 3 0.72769 osd.3 up 1.0 1.0 24 0.72769 osd.24 up 1.0 1.0 5 0.72769 osd.5 up 1.0 1.0 2 0.72769 osd.2 up 1.0 1.0 17 0.72769 osd.17 up 1.0 1.0 69 0.72769 osd.69 up 1.0 1.0 -3 6.54922 host cpn3 7 0.72769 osd.7 up 1.0 1.0 8 0.72769 osd.8 up 1.0 1.0 9 0.72769 osd.9 up 1.0 1.0 0 0.72769 osd.0 up 1.0 1.0 28 0.72769 osd.28 up 1.0 1.0 10 0.72769 osd.10 up 1.0 1.0 1 0.72769 osd.1 up 1.0 1.0 6 0.72769 osd.6 up 1.0 1.0 29 0.72769 osd.29 up 1.0 1.0 -4 2.91077 host cpn4 Compared with the actual processes that are running: [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd ceph 92638 1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph ceph 92667 1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph ceph 92673 1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph ceph 92681 1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph ceph 92701 1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f --cluster ceph --id 12 --setuser ceph --setgroup ceph ceph 92748 1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph ceph 92756 1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph ceph 92758 1 17 16:19 ?00:39:28 /usr/bin/ceph-osd -f --cluster ceph --id 13 --setuser ceph --setgroup ceph ceph 92777 1 19 16:19 ?00:46:17 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph ceph 92988 1 18 16:19 ?00:42:47 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph ceph 93058 1 18 16:19 ?00:43:18 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph ceph 93078 1 17 16:19 ?00:41:38 /usr/bin/ceph-osd -f --cluster ceph --id 14 --setuser ceph --setgroup ceph ceph 93127 1 15 16:19 ?00:36:29 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph ceph 93130 1 17 16:19 ?00:40:44 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph ceph 93173 1 21 16:19 ?00:49:37 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph [root@cpx1 ~]# ssh cpn3 ps -ef | grep ceph\-osd ceph 82454 1 18 16:19 ?00:43:58 
/usr/bin/ceph-osd -f --cluster ceph --id 25 --setuser ceph --setgroup ceph ceph 82464 1 24 16:19 ?00:55:40 /usr/bin/ceph-osd -f --
Re: [ceph-users] ceph osd tree output
It is not set in the conf file. So why do I still have this behavior ? On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: > Yeah,this setting can not see in asok config. > You just set it in ceph.conf and restart mon and osd service(sorry I > forget if these restart is necessary) > > what I use this config is when I changed crushmap manually,and I do not > want the service init script to rebuild crushmap as default way. > > maybe this is not siut for your problem.just have a try. > > 在 Fri, 08 Jan 2016 21:51:32 +0800,Wade Holler 写道: > > That is not set as far as I can tell. Actually it is strange that I don't > see that setting at all. > > [root@cpn1 ~]# ceph daemon osd.0 config show | grep update | grep > crush > > [root@cpn1 ~]# grep update /etc/ceph/ceph.conf > > [root@cpn1 ~]# > > On Fri, Jan 8, 2016 at 1:50 AM Mart van Santen wrote: > >> >> >> Hi, >> >> Do you have by any chance disabled automatic crushmap updates in your >> ceph config? >> >> osd crush update on start = false >> >> If this is the case, and you move disks around hosts, they won't update >> their position/host in the crushmap, even if the crushmap does not reflect >> reality. >> >> Regards, >> >> Mart >> >> >> >> >> >> On 01/08/2016 02:16 AM, Wade Holler wrote: >> >> Sure. Apologies for all the text: We have 12 Nodes for OSDs, 15 OSDs per >> node, but I will only include a sample: >> >> ceph osd tree | head -35 >> >> ID WEIGHTTYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY >> >> -1 130.98450 root default >> >> -2 5.82153 host cpn1 >> >> 4 0.72769 osd.4 up 1.0 1.0 >> >> 14 0.72769 osd.14 up 1.0 1.0 >> >> 3 0.72769 osd.3 up 1.0 1.0 >> >> 24 0.72769 osd.24 up 1.0 1.0 >> >> 5 0.72769 osd.5 up 1.0 1.0 >> >> 2 0.72769 osd.2 up 1.0 1.0 >> >> 17 0.72769 osd.17 up 1.0 1.0 >> >> 69 0.72769 osd.69 up 1.0 1.0 >> >> -3 6.54922 host cpn3 >> >> 7 0.72769 osd.7 up 1.0 1.0 >> >> 8 0.72769 osd.8 up 1.0 1.0 >> >> 9 0.72769 osd.9 up 1.0 1.0 >> >> 0 0.72769 osd.0 up 1.0 1.0 >> >> 28 0.72769 osd.28 up 1.0 1.0 >> >> 10 0.72769 osd.10 up 1.0 1.0 >> >> 1 0.72769 osd.1 up 1.0 1.0 >> >> 6 0.72769 osd.6 up 1.0 1.0 >> >> 29 0.72769 osd.29 up 1.0 1.0 >> >> -4 2.91077 host cpn4 >> >> >> Compared with the actual processes that are running: >> >> >> [root@cpx1 ~]# ssh cpn1 ps -ef | grep ceph\-osd >> >> ceph 92638 1 26 16:19 ?01:00:55 /usr/bin/ceph-osd -f >> --cluster ceph --id 6 --setuser ceph --setgroup ceph >> >> ceph 92667 1 20 16:19 ?00:48:04 /usr/bin/ceph-osd -f >> --cluster ceph --id 0 --setuser ceph --setgroup ceph >> >> ceph 92673 1 18 16:19 ?00:42:48 /usr/bin/ceph-osd -f >> --cluster ceph --id 8 --setuser ceph --setgroup ceph >> >> ceph 92681 1 19 16:19 ?00:45:52 /usr/bin/ceph-osd -f >> --cluster ceph --id 7 --setuser ceph --setgroup ceph >> >> ceph 92701 1 15 16:19 ?00:36:05 /usr/bin/ceph-osd -f >> --cluster ceph --id 12 --setuser ceph --setgroup ceph >> >> ceph 92748 1 14 16:19 ?00:34:07 /usr/bin/ceph-osd -f >> --cluster ceph --id 10 --setuser ceph --setgroup ceph >> >> ceph 92756 1 16 16:19 ?00:38:40 /usr/bin/ceph-osd -f >> --cluster ceph --id 9 --setuser ceph --setgroup ceph >> >> ceph 92758 1 17 16:19 ?00:39:28 /usr/bin/ceph-osd -f >> --cluster ceph --id 13 --setuser ceph --setgroup ceph >> >> ceph 92777 1 19 16:19 ?00:46:17 /usr/bin/ceph-osd -f >> --cluster ceph --id 1 --setuser ceph --setgroup ceph >> >> ceph 92988 1 18 16:19 ?00:42:47 /usr/bin/ceph-osd -f >> --cluster ceph --id 5 --setuser ceph --setgroup ceph >> >> ceph 93058 1 18 16:19 ?00:43:18 /usr/bin/ceph-osd -f >> --cluster ceph --id 11 --setuser ceph --setgroup ceph >> >> ceph 
93078 1 17 16:19 ?00:41:38 /usr/bin/ceph-osd -f >> --cluster ceph --id 14 --setuser ceph --setgroup ceph >> >> ceph 93127 1 15 16:19 ?00:36:29 /usr/bin/ceph-osd -f >> --cluster ceph --id 4 --setuser ceph --setgroup ceph >> >> ceph 93130 1 17 16:19 ?00:40:44 /usr/bin/ceph-osd -f >> --cluster ceph --id 2 --setuser ceph --setgroup ceph >> >> ceph 93173 1 21 16:19 ?00:49:37 /usr/bin/ceph-osd -f >> --cluster ceph --id 3 --set
Re: [ceph-users] using cache-tier with writeback mode, rados bench result degrade
My experience is performance degrades dramatically when dirty objects are flushed. Best Regards, Wade On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: > Hi,guyes > Recentlly,I am testing cache-tier using writeback mode.but I found a > strange things. > the performance using rados bench degrade.Is it correct? > If so,how to explain.following some info about my test: > > storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the other > one used as OSD),four sata as OSD. > > before using cache-tier: > root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup > > Total time run: 301.236355 > Total writes made: 6041 > Write size: 4194304 > Bandwidth (MB/sec): 80.216 > > Stddev Bandwidth: 10.5358 > Max bandwidth (MB/sec): 104 > Min bandwidth (MB/sec): 0 > Average Latency:0.797838 > Stddev Latency: 0.619098 > Max latency:4.89823 > Min latency:0.158543 > > root@ceph1:/root/cluster# rados bench -p coldstorage 300 seq > Total time run:133.563980 > Total reads made: 6041 > Read size:4194304 > Bandwidth (MB/sec):180.917 > > Average Latency: 0.353559 > Max latency: 1.83356 > Min latency: 0.027878 > > after configure cache-tier: > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage > hotstorage > pool 'hotstorage' is now (or already was) a tier of 'coldstorage' > > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode > hotstorage writeback > set cache-mode for pool 'hotstorage' to writeback > > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay > coldstorage hotstorage > overlay for 'coldstorage' is now (or already was) 'hotstorage' > > oot@ubuntu:~# ceph osd dump|grep storage > pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0 > object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 flags > hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0 > pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1 > object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags > hashpspool,incomplete_clones tier_of 6 cache_mode writeback target_bytes > 1000 hit_set bloom{false_positive_probability: 0.05, target_size: > 0, seed: 0} 3600s x6 stripe_width 0 > - > rados bench -p coldstorage 300 write --no-cleanup > Total time run: 302.207573 > Total writes made: 4315 > Write size: 4194304 > Bandwidth (MB/sec): 57.113 > > Stddev Bandwidth: 23.9375 > Max bandwidth (MB/sec): 104 > Min bandwidth (MB/sec): 0 > Average Latency: 1.1204 > Stddev Latency: 0.717092 > Max latency: 6.97288 > Min latency: 0.158371 > > root@ubuntu:/# rados bench -p coldstorage 300 seq > Total time run: 153.869741 > Total reads made: 4315 > Read size: 4194304 > Bandwidth (MB/sec): 112.173 > > Average Latency: 0.570487 > Max latency: 1.75137 > Min latency: 0.039635 > > > ceph.conf: > > [global] > fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831 > mon_initial_members = ceph2, ceph3, ceph4 > mon_host = 10.**.**.241,10.**.**.242,10.**.**.243 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 3 > osd_pool_default_min_size = 1 > auth_supported = cephx > osd_journal_size = 10240 > osd_mkfs_type = xfs > osd crush update on start = false > > [client] > rbd_cache = true > rbd_cache_writethrough_until_flush = false > rbd_cache_size = 33554432 > rbd_cache_max_dirty = 25165824 > rbd_cache_target_dirty = 16777216 > rbd_cache_max_dirty_age = 1 > rbd_cache_block_writes_upfront = false > [osd] > filestore_omap_header_cache_size = 4 > filestore_fd_cache_size = 4 > 
filestore_fiemap = true > client_readahead_min = 2097152 > client_readahead_max_bytes = 0 > client_readahead_max_periods = 4 > filestore_journal_writeahead = false > filestore_max_sync_interval = 10 > filestore_queue_max_ops = 500 > filestore_queue_max_bytes = 1048576000 > filestore_queue_committing_max_ops = 5000 > filestore_queue_committing_max_bytes = 1048576000 > keyvaluestore_queue_max_ops = 500 > keyvaluestore_queue_max_bytes = 1048576000 > journal_queue_max_ops = 3 > journal_queue_max_bytes = 3355443200 > osd_op_threads = 20 > osd_disk_threads = 8 > filestore_op_threads = 4 > osd_mount_options_xfs = rw,noatime,nobarrier,inode64,logbsize=256k,delaylog > > [mon] > mon_osd_allow_primary_affinity=true > > -- > 使用Opera的电子邮件客户端:http://www.opera.com/mail/ > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] using cache-tier with writeback mode, rados bench result degrade
There was/is a bug in Infernalis and older, where objects will always get promoted on the 2nd read/write regardless of what you set the min_recency_promote settings to. This can have a dramatic effect on performance. I wonder if this is what you are experiencing? This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 . You can compile the changes above to see if it helps or I have a .deb for Infernalis where this is fixed if it's easier. Nick > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Wade Holler > Sent: 08 January 2016 16:14 > To: hnuzhoulin ; ceph-de...@vger.kernel.org > Cc: ceph-us...@ceph.com > Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench > result degrade > > My experience is performance degrades dramatically when dirty objects are > flushed. > > Best Regards, > Wade > > > On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: > Hi,guyes > Recentlly,I am testing cache-tier using writeback mode.but I found a > strange things. > the performance using rados bench degrade.Is it correct? > If so,how to explain.following some info about my test: > > storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the > other > one used as OSD),four sata as OSD. > > before using cache-tier: > root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup > > Total time run: 301.236355 > Total writes made: 6041 > Write size: 4194304 > Bandwidth (MB/sec): 80.216 > > Stddev Bandwidth: 10.5358 > Max bandwidth (MB/sec): 104 > Min bandwidth (MB/sec): 0 > Average Latency:0.797838 > Stddev Latency: 0.619098 > Max latency:4.89823 > Min latency:0.158543 > > root@ceph1:/root/cluster# rados bench -p coldstorage 300 seq > Total time run:133.563980 > Total reads made: 6041 > Read size:4194304 > Bandwidth (MB/sec):180.917 > > Average Latency: 0.353559 > Max latency: 1.83356 > Min latency: 0.027878 > > after configure cache-tier: > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage > hotstorage > pool 'hotstorage' is now (or already was) a tier of 'coldstorage' > > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode > hotstorage writeback > set cache-mode for pool 'hotstorage' to writeback > > root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay > coldstorage hotstorage > overlay for 'coldstorage' is now (or already was) 'hotstorage' > > oot@ubuntu:~# ceph osd dump|grep storage > pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0 > object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 > flags > hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0 > pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1 > object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags > hashpspool,incomplete_clones tier_of 6 cache_mode writeback > target_bytes > 1000 hit_set bloom{false_positive_probability: 0.05, target_size: > 0, seed: 0} 3600s x6 stripe_width 0 > - > rados bench -p coldstorage 300 write --no-cleanup > Total time run: 302.207573 > Total writes made: 4315 > Write size: 4194304 > Bandwidth (MB/sec): 57.113 > > Stddev Bandwidth: 23.9375 > Max bandwidth (MB/sec): 104 > Min bandwidth (MB/sec): 0 > Average Latency: 1.1204 > Stddev Latency: 0.717092 > Max latency: 6.97288 > Min latency: 0.158371 > > root@ubuntu:/# rados bench -p coldstorage 300 seq > Total time run: 153.869741 > Total reads made: 4315 > Read size: 4194304 > Bandwidth (MB/sec): 112.173 > > Average Latency: 0.570487 > Max latency: 1.75137 > Min latency: 0.039635 > > 
> ceph.conf: > > [global] > fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831 > mon_initial_members = ceph2, ceph3, ceph4 > mon_host = 10.**.**.241,10.**.**.242,10.**.**.243 > auth_cluster_required = cephx > auth_service_required = cephx > auth_client_required = cephx > filestore_xattr_use_omap = true > osd_pool_default_size = 3 > osd_pool_default_min_size = 1 > auth_supported = cephx > osd_journal_size = 10240 > osd_mkfs_type = xfs > osd crush update on start = false > > [client] > rbd_cache = true > rbd_cache_writethrough_until_flush = false > rbd_cache_size = 33554432 > rbd_cache_max_dirty = 25165824 > rbd_cache_target_dirty = 16777216 > rbd_cache_max_dirty_age = 1 > rbd_cache_block_writes_upfront = false > [osd] > filestore_omap_header_cache_size = 4 > filestore_fd_cache_size = 4 > filestore_fiemap = true > client_readahead_min = 2097152 > client_readahead_max_bytes = 0 > client_readahead_max_periods = 4 > filestore_journal_writeahead = false > filestore_max_sync_interval = 10 > filestore_queue_max_ops = 500 > filestore_queue_max_bytes = 1048576000 > filestore_queue_committin
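For reference, the recency settings the fix above relates to can be checked and adjusted per cache pool; a small sketch using the pool name from this thread (on Infernalis and older the promotion behaviour may still ignore these values, as described above):

ceph osd pool get hotstorage min_read_recency_for_promote
ceph osd pool set hotstorage min_read_recency_for_promote 2
ceph osd pool get hotstorage hit_set_count   # the recency check is evaluated against this many recent HitSets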
[ceph-users] Infernalis
Hi Cephers

Just fired up first Infernalis cluster on RHEL7.1. The following:

[root@citrus ~]# systemctl status ceph-osd@0.service
ceph-osd@0.service - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled)
   Active: active (running) since Fri 2016-01-08 15:57:11 GMT; 1h 8min ago
 Main PID: 7578 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
           └─7578 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Jan 08 15:57:10 citrus.arch.velocix.com systemd[1]: Starting Ceph object storage daemon...
Jan 08 15:57:10 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: getopt: unrecognized option '--setuser'
Jan 08 15:57:10 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: getopt: unrecognized option '--setgroup'
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd-prestart.sh[7520]: create-or-move updating item name 'osd.0' weight 0.2678 at location {host=citrus,root=default} to crush map
Jan 08 15:57:11 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd[7578]: starting osd.0 at :/0 osd_data /var/lib/ceph/osd/ceph-0 /var/lib/ceph/osd/ceph-0/journal
Jan 08 15:57:11 citrus.arch.velocix.com ceph-osd[7578]: 2016-01-08 15:57:11.743134 7f61ee37e900 -1 osd.0 0 log_to_monitors {default=true}
Jan 08 15:57:11 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.
Jan 08 15:57:12 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.
Jan 08 15:57:14 citrus.arch.velocix.com systemd[1]: Started Ceph object storage daemon.

Shows some warnings:
- setuser unrecognised option (and setgroup) - Is this an error?
- why 5 msgs about starting the Ceph object storage daemon? Is this also an error of some kind?

Paul
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
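To dig into the repeated "Started" messages and the getopt warnings, the full per-unit journal and the unit file itself are the places to look; the timestamp below is just the one from the status output above:

journalctl -u ceph-osd@0.service --since "2016-01-08 15:57:00"
cat /usr/lib/systemd/system/ceph-osd@.service

Judging by the log lines shown, the getopt warnings come from the ExecStartPre helper (ceph-osd-prestart.sh) being handed the --setuser/--setgroup options it does not parse, and the OSD itself still starts; that suggests they are cosmetic, but treat that as an assumption rather than a confirmed answer.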
Re: [ceph-users] using cache-tier with writeback mode, rados bench result degrade
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Are you backporting that to hammer? We'd love it. - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Jan 8, 2016 at 9:28 AM, Nick Fisk wrote: > There was/is a bug in Infernalis and older, where objects will always get > promoted on the 2nd read/write regardless of what you set the > min_recency_promote settings to. This can have a dramatic effect on > performance. I wonder if this is what you are experiencing? > > This has been fixed in Jewel https://github.com/ceph/ceph/pull/6702 . > > You can compile the changes above to see if it helps or I have a .deb for > Infernalis where this is fixed if it's easier. > > Nick > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of >> Wade Holler >> Sent: 08 January 2016 16:14 >> To: hnuzhoulin ; ceph-de...@vger.kernel.org >> Cc: ceph-us...@ceph.com >> Subject: Re: [ceph-users] using cache-tier with writeback mode, raods bench >> result degrade >> >> My experience is performance degrades dramatically when dirty objects are >> flushed. >> >> Best Regards, >> Wade >> >> >> On Fri, Jan 8, 2016 at 11:08 AM hnuzhoulin wrote: >> Hi,guyes >> Recentlly,I am testing cache-tier using writeback mode.but I found a >> strange things. >> the performance using rados bench degrade.Is it correct? >> If so,how to explain.following some info about my test: >> >> storage node:4 machine,two INTEL SSDSC2BB120G4(one for systaem,the >> other >> one used as OSD),four sata as OSD. >> >> before using cache-tier: >> root@ceph1:~# rados bench -p coldstorage 300 write --no-cleanup >> >> Total time run: 301.236355 >> Total writes made: 6041 >> Write size: 4194304 >> Bandwidth (MB/sec): 80.216 >> >> Stddev Bandwidth: 10.5358 >> Max bandwidth (MB/sec): 104 >> Min bandwidth (MB/sec): 0 >> Average Latency:0.797838 >> Stddev Latency: 0.619098 >> Max latency:4.89823 >> Min latency:0.158543 >> >> root@ceph1:/root/cluster# rados bench -p coldstorage 300 seq >> Total time run:133.563980 >> Total reads made: 6041 >> Read size:4194304 >> Bandwidth (MB/sec):180.917 >> >> Average Latency: 0.353559 >> Max latency: 1.83356 >> Min latency: 0.027878 >> >> after configure cache-tier: >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier add coldstorage >> hotstorage >> pool 'hotstorage' is now (or already was) a tier of 'coldstorage' >> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier cache-mode >> hotstorage writeback >> set cache-mode for pool 'hotstorage' to writeback >> >> root@ubuntu:~/benchmarkcollect/Monitor# ceph osd tier set-overlay >> coldstorage hotstorage >> overlay for 'coldstorage' is now (or already was) 'hotstorage' >> >> oot@ubuntu:~# ceph osd dump|grep storage >> pool 6 'coldstorage' replicated size 3 min_size 1 crush_ruleset 0 >> object_hash rjenkins pg_num 512 pgp_num 512 last_change 216 lfor 216 >> flags >> hashpspool tiers 7 read_tier 7 write_tier 7 stripe_width 0 >> pool 7 'hotstorage' replicated size 3 min_size 1 crush_ruleset 1 >> object_hash rjenkins pg_num 128 pgp_num 128 last_change 228 flags >> hashpspool,incomplete_clones tier_of 6 cache_mode writeback >> target_bytes >> 1000 hit_set bloom{false_positive_probability: 0.05, target_size: >> 0, seed: 0} 3600s x6 stripe_width 0 >> - >> rados bench -p coldstorage 300 write --no-cleanup >> Total time run: 302.207573 >> Total writes made: 4315 >> Write size: 4194304 >> Bandwidth (MB/sec): 57.113 >> >> Stddev Bandwidth: 23.9375 >> Max bandwidth (MB/sec): 104 >> Min bandwidth (MB/sec): 0 >> 
Average Latency: 1.1204 >> Stddev Latency: 0.717092 >> Max latency: 6.97288 >> Min latency: 0.158371 >> >> root@ubuntu:/# rados bench -p coldstorage 300 seq >> Total time run: 153.869741 >> Total reads made: 4315 >> Read size: 4194304 >> Bandwidth (MB/sec): 112.173 >> >> Average Latency: 0.570487 >> Max latency: 1.75137 >> Min latency: 0.039635 >> >> >> ceph.conf: >> >> [global] >> fsid = 4ec1eb64-226c-4d90-8c5c-b6b6644be831 >> mon_initial_members = ceph2, ceph3, ceph4 >> mon_host = 10.**.**.241,10.**.**.242,10.**.**.243 >> auth_cluster_required = cephx >> auth_service_required = cephx >> auth_client_required = cephx >> filestore_xattr_use_omap = true >> osd_pool_default_size = 3 >> osd_pool_default_min_size = 1 >> auth_supported = cephx >> osd_journal_size = 10240 >> osd_mkfs_type = xfs >> osd crush update on start = false >> >> [client] >> rbd_cache = true >> rbd_cache_writethrough_until_flush = false >> rbd_cache_size = 33554432 >> rbd_cache_max_dirty = 25165824 >> rbd_cache_target_dirty = 16777216 >> rbd_cache_max_dirty_age = 1 >> rbd_cache_block_writes_upfront = false >> [os
Re: [ceph-users] Unable to see LTTng tracepoints in Ceph
Have you started ceph-osd with LD_PRELOAD=/usr/lib64/liblttng-ust-fork.so [matched to correct OS path]? I just tested ceph-osd on the master branch and was able to generate OSD trace events. You should also make sure that AppArmor / SElinux isn't denying access to /dev/shm/lttng-ust-*. What tracing events do you see being generated from ceph-mon? I didn't realize it had any registered tracepoint events. -- Jason Dillaman - Original Message - > From: "Aakanksha Pudipeddi-SSI" > To: ceph-users@lists.ceph.com > Sent: Wednesday, January 6, 2016 10:36:13 PM > Subject: [ceph-users] Unable to see LTTng tracepoints in Ceph > Hello Cephers, > A very happy new year to you all! > I wanted to enable LTTng tracepoints for a few tests with infernalis and > configured Ceph with the –with-lttng option. Seeing a recent post on conf > file options for tracing, I added these lines: > osd_tracing = true > osd_objectstore_tracing = true > rados_tracing = true > rbd_tracing = true > However, I am unable to see LTTng tracepoints within ceph-osd. I can see > tracepoints in ceph-mon though. The main difference with respect to tracing > between ceph-mon and ceph-osd seems to be TracepointProvider and I thought > the addition in my config file should do the trick but that didn’t change > anything. I do not know if this is relevant but I also checked with lsof and > I see ceph-osd is accessing the lttng library as is ceph-mon. Did anyone > come across this issue and if so, could you give me some direction on this? > Thanks a lot for your help! > Aakanksha > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
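In case it is useful, a minimal LTTng session for checking whether the OSD tracepoints are registered might look like this; paths and daemon arguments are only examples, adjust them to your install:

LD_PRELOAD=/usr/lib64/liblttng-ust-fork.so /usr/bin/ceph-osd -f --cluster ceph --id 0 &
lttng list --userspace        # the ceph tracepoint providers should show up once the daemon is running
lttng create ceph-osd-trace
lttng enable-event --userspace --all
lttng start
# ... generate some I/O ...
lttng stop
lttng view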
Re: [ceph-users] pg is stuck stale (osd.21 still removed)
One more thing - I tried to recreate the PG, but now this PG is "stuck inactive":

root@ceph-admin:~# ceph pg force_create_pg 34.225
pg 34.225 now creating, ok

root@ceph-admin:~# ceph health detail
HEALTH_WARN 49 pgs stale; 1 pgs stuck inactive; 49 pgs stuck stale; 1 pgs stuck unclean
pg 34.225 is stuck inactive since forever, current state creating, last acting []
pg 34.225 is stuck unclean since forever, current state creating, last acting []
pg 34.186 is stuck stale for 118481.013632, current state stale+active+clean, last acting [21]
...

Maybe somebody has an idea how to fix this situation?

regards
Danny
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] [Ceph-Users] The best practice, "ceph.conf"
Hello, Since ""ceph.conf"" is getting more complicated because there has been a bunch of parameters. It's because of bug fixes, performance optimization or whatever making the Ceph cluster more strong, stable and something. I'm pretty sure that I have not been able to catch up -; [ceph@ceph01 ~]$ ceph --version ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b) [ceph@ceph01 ~]$ ceph --show-config | wc -l 840 [ceph@ceph-stack src]$ ./ceph --show-config | wc -l *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH *** 2016-01-08 18:05:15.195421 7fbce7462700 -1 WARNING: the following dangerous and experimental features are enabled: * 946 I know it depends but I would like to be suggested to about what is the best practice of setting of ""ceph.conf"". And where should I go to get specific explanation of each parameter, and know exactly what each parameter means, and how it works. Rgds, Shinobu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] very high OSD RAM usage values
Hi, I would say this is normal. 1GB of ram per 1TB is what we designed the cluster for, I would believe that an EC-pool demands a lot more. Buy more ram and start everything 32GB ram is quite little, when the cluster is operating OK you'll see that extra ram getting used as file cache which makes the cluster faster. Regards, Josef On 6 Jan 2016 12:12, "Kenneth Waegeman" wrote: > Hi all, > > We experienced some serious trouble with our cluster: A running cluster > started failing and started a chain reaction until the ceph cluster was > down, as about half the OSDs are down (in a EC pool) > > Each host has 8 OSDS of 8 TB (i.e. RAID 0 of 2 4TB disk) for an EC pool > (10+3, 14 hosts) and 2 cache OSDS and 32 GB of RAM. > The reason we have the Raid0 of the disks, is because we tried with 16 > disk before, but 32GB didn't seem enough to keep the cluster stable > > We don't know for sure what triggered the chain reaction, but what we > certainly see, is that while recovering, our OSDS are using a lot of > memory. We've seen some OSDS using almost 8GB of RAM (resident; virtual > 11GB) > So right now we don't have enough memory to recover the cluster, because > the OSDS get killed by OOMkiller before they can recover.. > And I don't know doubling our memory will be enough.. > > A few questions: > > * Does someone has seen this before? > * 2GB was still normal, but 8GB seems a lot, is this expected behaviour? > * We didn't see this with an nearly empty cluster. Now it was filled about > 1/4 (270TB). I guess it would become worse when filled half or more? > * How high can this memory usage become ? Can we calculate the maximum > memory of an OSD? Can we limit it ? > * We can upgrade/reinstall to infernalis, will that solve anything? > > This is related to a previous post of me : > http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259 > > > Thank you very much !! > > Kenneth > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] very high OSD RAM usage values
Maybe changing the number of concurrent back fills could limit the memory usage. On 9 Jan 2016 05:52, "Josef Johansson" wrote: > Hi, > > I would say this is normal. 1GB of ram per 1TB is what we designed the > cluster for, I would believe that an EC-pool demands a lot more. Buy more > ram and start everything 32GB ram is quite little, when the cluster is > operating OK you'll see that extra ram getting used as file cache which > makes the cluster faster. > > Regards, > Josef > On 6 Jan 2016 12:12, "Kenneth Waegeman" wrote: > >> Hi all, >> >> We experienced some serious trouble with our cluster: A running cluster >> started failing and started a chain reaction until the ceph cluster was >> down, as about half the OSDs are down (in a EC pool) >> >> Each host has 8 OSDS of 8 TB (i.e. RAID 0 of 2 4TB disk) for an EC pool >> (10+3, 14 hosts) and 2 cache OSDS and 32 GB of RAM. >> The reason we have the Raid0 of the disks, is because we tried with 16 >> disk before, but 32GB didn't seem enough to keep the cluster stable >> >> We don't know for sure what triggered the chain reaction, but what we >> certainly see, is that while recovering, our OSDS are using a lot of >> memory. We've seen some OSDS using almost 8GB of RAM (resident; virtual >> 11GB) >> So right now we don't have enough memory to recover the cluster, because >> the OSDS get killed by OOMkiller before they can recover.. >> And I don't know doubling our memory will be enough.. >> >> A few questions: >> >> * Does someone has seen this before? >> * 2GB was still normal, but 8GB seems a lot, is this expected behaviour? >> * We didn't see this with an nearly empty cluster. Now it was filled >> about 1/4 (270TB). I guess it would become worse when filled half or more? >> * How high can this memory usage become ? Can we calculate the maximum >> memory of an OSD? Can we limit it ? >> * We can upgrade/reinstall to infernalis, will that solve anything? >> >> This is related to a previous post of me : >> http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259 >> >> >> Thank you very much !! >> >> Kenneth >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
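Following up on the suggestion above, the backfill/recovery concurrency can be lowered at runtime and made persistent; a small sketch, with the conservative values discussed elsewhere in this digest (no guarantee they prevent the OOM kills described here):

ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# and in ceph.conf so the values survive OSD restarts:
[osd]
osd max backfills = 1
osd recovery max active = 1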