Thanks Josh! I am now able to add the noshare option to the image mapping. Looking at the dmesg output, I found that it was indeed the secret key problem. Block performance is scaling now.
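
For reference, the mapping command that ended up working looks roughly like the sketch below. The monitor addresses and the base64 secret are placeholders; the secret is the value that 'ceph auth list' shows for client.admin.

# map the image with its own rados client instance (noshare),
# authenticating with the base64 secret instead of key=client.admin
echo '<mon-1>:6789,<mon-2>:6789,<mon-3>:6789 name=admin,secret=<base64-key-for-client.admin>,noshare test_rbd ceph_block_test' > /sys/bus/rbd/add
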
Regards,
Somnath

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin
Sent: Thursday, September 19, 2013 12:24 PM
To: Somnath Roy
Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Scaling RBD module

On 09/19/2013 12:04 PM, Somnath Roy wrote:
> Hi Josh,
> Thanks for the information. I am trying to add the following but am hitting a permission issue.
>
> root@emsclient:/etc# echo '<mon-1>:6789,<mon-2>:6789,<mon-3>:6789 name=admin,key=client.admin,noshare test_rbd ceph_block_test' > /sys/bus/rbd/add
> -bash: echo: write error: Operation not permitted

If you check dmesg, it will probably show an error trying to authenticate to the cluster. Instead of key=client.admin, you can pass the base64 secret value as shown in 'ceph auth list' with the secret=XXXXXXXXXXXXXXXXXXXXX option.

BTW, there's a ticket for adding the noshare option to rbd map, so that using the sysfs interface like this won't be necessary:
http://tracker.ceph.com/issues/6264

Josh

> Here are the contents of the rbd directory:
>
> root@emsclient:/sys/bus/rbd# ll
> total 0
> drwxr-xr-x  4 root root    0 Sep 19 11:59 ./
> drwxr-xr-x 30 root root    0 Sep 13 11:41 ../
> --w-------  1 root root 4096 Sep 19 11:59 add
> drwxr-xr-x  2 root root    0 Sep 19 12:03 devices/
> drwxr-xr-x  2 root root    0 Sep 19 12:03 drivers/
> -rw-r--r--  1 root root 4096 Sep 19 12:03 drivers_autoprobe
> --w-------  1 root root 4096 Sep 19 12:03 drivers_probe
> --w-------  1 root root 4096 Sep 19 12:03 remove
> --w-------  1 root root 4096 Sep 19 11:59 uevent
>
> I checked that even when I am logged in as root, I can't write anything to /sys.
>
> Here is the Ubuntu version I am using:
>
> root@emsclient:/etc# lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 13.04
> Release:        13.04
> Codename:       raring
>
> Here is the mount information:
>
> root@emsclient:/etc# mount
> /dev/mapper/emsclient--vg-root on / type ext4 (rw,errors=remount-ro)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> none on /sys/fs/cgroup type tmpfs (rw)
> none on /sys/fs/fuse/connections type fusectl (rw)
> none on /sys/kernel/debug type debugfs (rw)
> none on /sys/kernel/security type securityfs (rw)
> udev on /dev type devtmpfs (rw,mode=0755)
> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
> tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
> none on /run/shm type tmpfs (rw,nosuid,nodev)
> none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
> /dev/sda1 on /boot type ext2 (rw)
> /dev/mapper/emsclient--vg-home on /home type ext4 (rw)
>
> Any idea what went wrong here?
>
> Thanks & Regards,
> Somnath
>
> -----Original Message-----
> From: Josh Durgin [mailto:josh.dur...@inktank.com]
> Sent: Wednesday, September 18, 2013 6:10 PM
> To: Somnath Roy
> Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Scaling RBD module
>
> On 09/17/2013 03:30 PM, Somnath Roy wrote:
>> Hi,
>> I am running Ceph on a 3-node cluster, and each of my server nodes is running 10 OSDs, one per disk. I have one admin node, and all the nodes are connected with 2 x 10G networks: one for the cluster and the other configured as the public network.
>>
>> Here is the status of my cluster.
>>
>> ~/fio_test# ceph -s
>>
>> cluster b2e0b4db-6342-490e-9c28-0aadf0188023
>> health HEALTH_WARN clock skew detected on mon.<server-name-2>, mon.<server-name-3>
>> monmap e1: 3 mons at {<server-name-1>=xxx.xxx.xxx.xxx:6789/0, <server-name-2>=xxx.xxx.xxx.xxx:6789/0, <server-name-3>=xxx.xxx.xxx.xxx:6789/0}, election epoch 64, quorum 0,1,2 <server-name-1>,<server-name-2>,<server-name-3>
>> osdmap e391: 30 osds: 30 up, 30 in
>> pgmap v5202: 30912 pgs: 30912 active+clean; 8494 MB data, 27912 MB used, 11145 GB / 11172 GB avail
>> mdsmap e1: 0/0/1 up
>>
>> I started with the rados bench command to benchmark the read performance of this cluster on a large pool (~10K PGs) and found that each rados client has a limitation: each client can only drive up to a certain mark. Each server node's CPU utilization shows it is around 85-90% idle, and the admin node (from where the rados client is running) is around 80-85% idle. I am trying with a 4K object size.
>
> Note that rados bench with 4k objects is different from rbd with 4k-sized I/Os - rados bench sends each request to a new object, while rbd objects are 4M by default.
>
>> Now, I started running more clients on the admin node, and the performance scales until it hits the client CPU limit. The servers still have 30-35% idle CPU. With a small object size, I must say that the per-OSD CPU utilization of Ceph is not promising!
>>
>> After this, I started testing the rados block interface with the kernel rbd module from my admin node. I created 8 images mapped on the pool having around 10K PGs, and I am not able to scale up the performance by running fio (either by creating a software RAID or by running on individual /dev/rbd* instances). For example, when running multiple fio instances (one on /dev/rbd1 and the other on /dev/rbd2), the performance I get is half of what I get when running one instance. Here is my fio job script.
>>
>> [random-reads]
>> ioengine=libaio
>> iodepth=32
>> filename=/dev/rbd1
>> rw=randread
>> bs=4k
>> direct=1
>> size=2G
>> numjobs=64
>>
>> Let me know if I am following the proper procedure or not.
>>
>> But, if my understanding is correct, the kernel rbd module is acting as a client to the cluster, and on one admin node I can run only one such kernel instance. If so, I am then limited by the client bottleneck that I stated earlier. The CPU utilization on the server side is around 85-90% idle, so it is clear that the client is not driving enough load.
>>
>> My question is: is there any way to hit the cluster with more clients from a single box while testing the rbd module?
>
> You can run multiple librbd instances easily (for example with multiple runs of the rbd bench-write command).
>
> The kernel rbd driver uses the same rados client instance for multiple block devices by default. There's an option (noshare) to use a new rados client instance for a newly mapped device, but it's not exposed by the rbd cli. You need to use the sysfs interface that 'rbd map' uses instead.
>
> Once you've used rbd map once on a machine, the kernel will already have the auth key stored, and you can use:
>
> echo '1.2.3.4:6789 name=admin,key=client.admin,noshare poolname imagename' > /sys/bus/rbd/add
>
> where 1.2.3.4:6789 is the address of a monitor and you're connecting as client.admin.
>
> You can use 'rbd unmap' as usual.
>
> Josh
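
P.S. In case it helps anyone reproducing this test, here is a rough, untested sketch of how the noshare mapping described above can be repeated for several images, with one fio instance per mapped device. The monitor addresses, the base64 secret, and the ceph_block_test1..4 image names are placeholders; substitute your own.

# map four images from pool test_rbd, each with its own rados client instance (noshare)
for i in 1 2 3 4; do
    echo "<mon-1>:6789,<mon-2>:6789,<mon-3>:6789 name=admin,secret=<base64-secret>,noshare test_rbd ceph_block_test$i" > /sys/bus/rbd/add
done

# run one 4k random-read fio job per mapped device, all in parallel
for dev in /dev/rbd*; do
    fio --name="randread-${dev##*/}" --filename="$dev" --ioengine=libaio \
        --iodepth=32 --rw=randread --bs=4k --direct=1 --size=2G --numjobs=64 &
done
wait

Each device mapped this way gets its own rados client instance, so the single-client bottleneck discussed above is spread across several connections. Devices can be unmapped afterwards with 'rbd unmap' as usual.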