Hi, I am just wondering if anyone has any thoughts on the questions below...I would like to order some additional hardware ASAP...and the order that I place may change depending on the feedback that I receive.
Thanks again,
Shain

Sent from my iPhone

> On Jan 9, 2015, at 2:45 PM, Shain Miley <smi...@npr.org> wrote:
>
> Although it seems like having a regularly scheduled cron job to do a recursive directory listing may be ok for us as a bit of a workaround...I am still in the process of trying to improve performance.
>
> A few other questions have come up as a result.
>
> a) I am in the process of looking at specs for a new rbd 'headnode' that will be used to mount our 100TB rbd image. At some point in the future we may look into the performance and multi-client access that cephfs could offer...is there any reason that I would not be able to use this new server as both an rbd client and an mds server (assuming the hardware is good enough)? I know that some cluster functions should not and cannot be mixed on the same server...is this by any chance one of them?
>
> b) Currently the 100TB rbd image is acting as one large repository for our archive...this will only grow over time. I understand that ceph is pool based...however I am wondering if I would somehow see better per-rbd-image performance if, for example, instead of having 1 x 100TB rbd image, I had 4 x 25TB rbd images (since we really could split these up based on our internal groups).
>
> c) Would adding a few ssd drives (in the right quantity) to each node help out with reads as well as writes?
>
> d) I am a bit confused about how to enable the rbd cache option on the client...is this a change that only needs to be made to the ceph.conf file on the rbd kernel client server...or do the mds and osd servers need their ceph.conf files modified as well and their services restarted?
>
> Other options that I might be looking into going forward are moving some of this data (the data actually needed by our php apps) to rgw...although that option adds some more complexity and unfamiliarity for our users.
>
> Thanks again for all the help so far.
>
> Shain
>
>> On 01/07/2015 03:40 PM, Shain Miley wrote:
>> Just to follow up on this thread, the main reason that the rbd directory listing latency was an issue for us was that we were seeing a large amount of IO delay in a PHP app that reads from that rbd image.
>>
>> It occurred to me (based on Robert's cache_dir suggestion below) that maybe doing a recursive find or a recursive directory listing inside the one folder in question might speed things up.
>>
>> After doing the recursive find...the directory listing seems much faster and the responsiveness of the PHP app has increased as well.
>>
>> Hopefully nothing else will need to be done here, however it seems that worst case...a daily or weekly cron job that traverses the directory tree in that folder might be all we need.
>>
>> Thanks again for all the help.
>>
>> Shain
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649
>>
>> ________________________________________
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Shain Miley [smi...@npr.org]
>> Sent: Tuesday, January 06, 2015 8:16 PM
>> To: Christian Balzer; ceph-us...@ceph.com
>> Subject: Re: [ceph-users] rbd directory listing performance issues
>>
>> Christian,
>>
>> Each of the OSD server nodes is running on a Dell R-720xd with 64 GB of RAM.
>>
>> We have 107 OSDs, so I have not checked all of them...however the ones I have checked with xfs_db have shown anywhere from 1% to 4% fragmentation.
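>>
>> (For anyone curious, that figure comes from xfs_db's read-only fragmentation report, run against each OSD's data partition...the device name below is just an example:
>>
>>   xfs_db -r -c frag /dev/sdb1
>>
>> ...which prints the actual and ideal extent counts along with a "fragmentation factor" percentage.)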
>>
>> I'll try to upgrade the client server to 32 or 64 GB of RAM at some point soon...however at this point all the tuning that I have done has not yielded all that much in terms of results.
>>
>> It may be a simple fact that I need to look into adding some SSDs, and that the overall bottleneck here is the 4TB 7200 rpm disks we are using.
>>
>> In general, when looking at the graphs in Calamari, we see around 20ms latency (await) for our OSDs; however there are lots of times where we see (via the graphs) spikes of 250ms to 400ms as well.
>>
>> Thanks again,
>>
>> Shain
>>
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media | smi...@npr.org | 202.513.3649
>>
>> ________________________________________
>> From: Christian Balzer [ch...@gol.com]
>> Sent: Tuesday, January 06, 2015 7:34 PM
>> To: ceph-us...@ceph.com
>> Cc: Shain Miley
>> Subject: Re: [ceph-users] rbd directory listing performance issues
>>
>> Hello,
>>
>>> On Tue, 6 Jan 2015 15:29:50 +0000 Shain Miley wrote:
>>>
>>> Hello,
>>>
>>> We currently have a 12 node (3 monitor + 9 OSD) ceph cluster, made up of 107 x 4TB drives formatted with xfs. The cluster is running ceph version 0.80.7:
>>
>> I assume journals on the same HDD then.
>>
>> How much memory per node?
>>
>> [snip]
>>
>>> A while back I created an 80 TB rbd image to be used as an archive repository for some of our audio and video files. We are still seeing good rados and rbd read and write throughput performance, however we seem to be having quite a long delay in response times when we try to list out the files in directories with a large number of folders, files, etc.
>>>
>>> Subsequent directory listing times seem to run a lot faster (but I am not sure for how long that is the case before we see another instance of slowness), however the initial directory listings can take 20 to 45 seconds.
>>
>> Basically the same thing(s) that Robert said.
>> How big is "large"?
>> How much memory on the machine you're mounting this image?
>> Ah, never mind, just saw your follow-up.
>>
>> Definitely add memory to this machine if you can.
>>
>> The initial listing is always going to be slow-ish, depending on a number of things in the cluster.
>>
>> As in, how busy is it (IOPS)? With journals on disk your HDDs are going to be sluggish individually, and your directory information might reside mostly in one object (on one OSD), thus limiting you to the speed of that particular disk.
>>
>> And this is also where the memory of your storage nodes comes in: if it is large enough, your "hot" objects will get cached there as well.
>> To see if that's the case (at least temporarily), drop the caches on all of your storage nodes (echo 3 > /proc/sys/vm/drop_caches), mount your image, do the "ls -l" until it's "fast", umount it, mount it again and do the listing again.
>> In theory, unless your cluster is extremely busy or your storage nodes have very little pagecache, the re-mounted image should get all the info it needs from said pagecache on your storage nodes, never having to go to the actual OSD disks, and thus be fast(er) than the initial test.
>>
>> Finally, to potentially improve the initial scan that obviously has to come from the disks, see how fragmented your OSDs are and, depending on the results, defrag them.
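>>
>> Spelled out, that cache test might look something like this (the device name, mount point and directory here are only examples):
>>
>>   # on every storage node, as root:
>>   echo 3 > /proc/sys/vm/drop_caches
>>
>>   # on the client:
>>   mount /dev/rbd0 /mnt/archive
>>   time ls -l /mnt/archive/some/large/dir   # repeat until it is fast
>>   umount /mnt/archive
>>   mount /dev/rbd0 /mnt/archive
>>   time ls -l /mnt/archive/some/large/dir   # should now be served mostly from the storage nodes' pagecache
>>
>> And for the defrag itself, xfs_fsr is the usual XFS tool, run against each OSD's data filesystem (the mount point below is just an example):
>>
>>   xfs_fsr -v /var/lib/ceph/osd/ceph-0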
>>
>> Christian
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>> http://www.gol.com/