This "noshare" option may have just helped me a ton -- I sure wish I would
have asked similar questions sooner, because I have seen the same failure
to scale.  =)

One question -- when using the "noshare" option (or really, even without
it), are there any practical limits on the number of RBDs that can be
mapped?  I have servers with ~100 RBDs each, and am wondering whether
anything is going to blow up, use a ton more memory, etc., if I switch them
all over to using "noshare".  Even without noshare, are there any known
limits to how many RBDs can be mapped?
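
For context, here is roughly how I imagine scripting that many mappings via
the sysfs interface -- an untested sketch, with the monitor address, secret,
pool, and image names as placeholders:

  MON=1.2.3.4:6789
  SECRET=XXXXXXXXXXXXXXXXXXXXX   # base64 secret from 'ceph auth list'
  POOL=mypool
  for IMG in img001 img002 img003; do
      # noshare gives each mapping its own rados client instance
      echo "$MON name=admin,secret=$SECRET,noshare $POOL $IMG" > /sys/bus/rbd/add
  done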

Thanks!

 - Travis


On Thu, Sep 19, 2013 at 8:03 PM, Somnath Roy <somnath....@sandisk.com> wrote:

> Thanks Josh !
> I am able to successfully add this noshare option in the image mapping
> now. Looking at the dmesg output, I found that it was indeed the secret key
> problem. Block performance is scaling now.
>
> Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org [mailto:
> ceph-devel-ow...@vger.kernel.org] On Behalf Of Josh Durgin
> Sent: Thursday, September 19, 2013 12:24 PM
> To: Somnath Roy
> Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Scaling RBD module
>
> On 09/19/2013 12:04 PM, Somnath Roy wrote:
> > Hi Josh,
> > Thanks for the information. I am trying to add the following but hitting
> some permission issue.
> >
> > root@emsclient:/etc# echo '<mon-1>:6789,<mon-2>:6789,<mon-3>:6789
> > name=admin,key=client.admin,noshare test_rbd ceph_block_test' >
> > /sys/bus/rbd/add
> > -bash: echo: write error: Operation not permitted
>
> If you check dmesg, it will probably show an error trying to authenticate
> to the cluster.
>
> Instead of key=client.admin, you can pass the base64 secret value as shown
> in 'ceph auth list' with the secret=XXXXXXXXXXXXXXXXXXXXX option.
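>
> For example, something like this (the secret value is just a placeholder):
>
> echo '<mon-1>:6789,<mon-2>:6789,<mon-3>:6789 name=admin,secret=XXXXXXXXXXXXXXXXXXXXX,noshare test_rbd ceph_block_test' > /sys/bus/rbd/add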
>
> BTW, there's a ticket for adding the noshare option to rbd map so using
> the sysfs interface like this is never necessary:
>
> http://tracker.ceph.com/issues/6264
>
> Josh
>
> > Here is the contents of rbd directory..
> >
> > root@emsclient:/sys/bus/rbd# ll
> > total 0
> > drwxr-xr-x  4 root root    0 Sep 19 11:59 ./
> > drwxr-xr-x 30 root root    0 Sep 13 11:41 ../
> > --w-------  1 root root 4096 Sep 19 11:59 add
> > drwxr-xr-x  2 root root    0 Sep 19 12:03 devices/
> > drwxr-xr-x  2 root root    0 Sep 19 12:03 drivers/
> > -rw-r--r--  1 root root 4096 Sep 19 12:03 drivers_autoprobe
> > --w-------  1 root root 4096 Sep 19 12:03 drivers_probe
> > --w-------  1 root root 4096 Sep 19 12:03 remove
> > --w-------  1 root root 4096 Sep 19 11:59 uevent
> >
> >
> > I checked that even if I am logged in as root, I can't write anything on
> /sys.
> >
> > Here is the Ubuntu version I am using..
> >
> > root@emsclient:/etc# lsb_release -a
> > No LSB modules are available.
> > Distributor ID: Ubuntu
> > Description:    Ubuntu 13.04
> > Release:        13.04
> > Codename:       raring
> >
> > Here is the mount information....
> >
> > root@emsclient:/etc# mount
> > /dev/mapper/emsclient--vg-root on / type ext4 (rw,errors=remount-ro)
> > proc on /proc type proc (rw,noexec,nosuid,nodev)
> > sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> > none on /sys/fs/cgroup type tmpfs (rw)
> > none on /sys/fs/fuse/connections type fusectl (rw)
> > none on /sys/kernel/debug type debugfs (rw)
> > none on /sys/kernel/security type securityfs (rw)
> > udev on /dev type devtmpfs (rw,mode=0755)
> > devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
> > tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
> > none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
> > none on /run/shm type tmpfs (rw,nosuid,nodev)
> > none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
> > /dev/sda1 on /boot type ext2 (rw)
> > /dev/mapper/emsclient--vg-home on /home type ext4 (rw)
> >
> >
> > Any idea what went wrong here ?
> >
> > Thanks & Regards
> > Somnath
> >
> > -----Original Message-----
> > From: Josh Durgin [mailto:josh.dur...@inktank.com]
> > Sent: Wednesday, September 18, 2013 6:10 PM
> > To: Somnath Roy
> > Cc: Sage Weil; ceph-de...@vger.kernel.org; Anirban Ray;
> > ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] Scaling RBD module
> >
> > On 09/17/2013 03:30 PM, Somnath Roy wrote:
> >> Hi,
> >> I am running Ceph on a 3-node cluster and each of my server nodes is
> running 10 OSDs, one per disk. I have one admin node and all the nodes
> are connected with 2 x 10G networks: one network is for the cluster and
> the other is configured as the public network.
> >>
> >> Here is the status of my cluster.
> >>
> >> ~/fio_test# ceph -s
> >>
> >>     cluster b2e0b4db-6342-490e-9c28-0aadf0188023
> >>      health HEALTH_WARN clock skew detected on mon. <server-name-2>,
> mon. <server-name-3>
> >>      monmap e1: 3 mons at {<server-name-1>=xxx.xxx.xxx.xxx:6789/0,
> <server-name-2>=xxx.xxx.xxx.xxx:6789/0,
> <server-name-3>=xxx.xxx.xxx.xxx:6789/0}, election epoch 64, quorum 0,1,2
> <server-name-1>,<server-name-2>,<server-name-3>
> >>      osdmap e391: 30 osds: 30 up, 30 in
> >>       pgmap v5202: 30912 pgs: 30912 active+clean; 8494 MB data, 27912
> MB used, 11145 GB / 11172 GB avail
> >>      mdsmap e1: 0/0/1 up
> >>
> >>
> >> I started with the rados bench command to benchmark the read performance
> of this cluster on a large pool (~10K PGs) and found that each rados client
> has a limitation: each client can only drive throughput up to a certain
> mark. Each server node's CPU utilization shows it is around 85-90% idle,
> and the admin node (from where the rados client is running) is around
> ~80-85% idle. I am testing with a 4K object size.
> >
> > Note that rados bench with 4k objects is different from rbd with
> 4k-sized I/Os - rados bench sends each request to a new object, while rbd
> objects are 4M by default.
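> >
> > For instance (illustrative commands, using the pool name from this thread):
> >
> > # rados bench with -b 4096: every request goes to its own new 4k object
> > rados -p test_rbd bench 30 write -b 4096 -t 32
> > # rbd + fio: 4k random I/Os land inside 4M rbd objects
> > fio --name=rbd4k --filename=/dev/rbd1 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=32 --size=2G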
> >
> >> Now, I started running more clients on the admin node and the
> performance is scaling until it hits the client CPU limit. The server CPUs
> are still around 30-35% idle. With small object sizes, I must say that the
> Ceph per-OSD CPU utilization is not promising!
> >>
> >> After this, I started testing the rados block interface with kernel rbd
> module from my admin node.
> >> I have created 8 images mapped on the pool with around 10K PGs, and I
> am not able to scale up the performance by running fio (either by creating
> a software RAID or running on individual /dev/rbd* instances). For example,
> when running multiple fio instances (one on /dev/rbd1 and the other on
> /dev/rbd2), the performance I am getting is half of what I get when running
> a single instance. Here is my fio job script.
> >>
> >> [random-reads]
> >> ioengine=libaio
> >> iodepth=32
> >> filename=/dev/rbd1
> >> rw=randread
> >> bs=4k
> >> direct=1
> >> size=2G
> >> numjobs=64
> >>
> >> Let me know if I am following the proper procedure or not.
> >>
> >> But, if my understanding is correct, the kernel rbd module is acting as
> a client to the cluster, and on one admin node I can run only one such
> kernel instance.
> >> If so, I am then limited to the client bottleneck that I stated
> earlier. The server-side CPUs are around 85-90% idle, so it is clear that
> the client is not driving enough load.
> >>
> >> My question is, is there any way to hit the cluster with more clients
> from a single box while testing the rbd module?
> >
> > You can run multiple librbd instances easily (for example with multiple
> runs of the rbd bench-write command).
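> >
> > For example, something along these lines (image names are placeholders,
> > and each process gets its own librbd/rados client):
> >
> > rbd bench-write image1 --io-size 4096 --io-threads 16 --io-total 1073741824 &
> > rbd bench-write image2 --io-size 4096 --io-threads 16 --io-total 1073741824 &
> > wait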
> >
> > The kernel rbd driver uses the same rados client instance for multiple
> block devices by default. There's an option (noshare) to use a new rados
> client instance for a newly mapped device, but it's not exposed by the rbd
> cli. You need to use the sysfs interface that 'rbd map' uses instead.
> >
> > Once you've used rbd map once on a machine, the kernel will already have
> the auth key stored, and you can use:
> >
> > echo '1.2.3.4:6789 name=admin,key=client.admin,noshare poolname
> > imagename' > /sys/bus/rbd/add
> >
> > Where 1.2.3.4:6789 is the address of a monitor, and you're connecting
> as client.admin.
> >
> > You can use 'rbd unmap' as usual.
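> >
> > For example, to see what is currently mapped and then unmap one device:
> >
> > rbd showmapped
> > rbd unmap /dev/rbd1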
> >
> > Josh
> >
> >
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
