Hi,
I am just wondering if anyone has any thoughts on the questions below. I would 
like to order some additional hardware ASAP, and the order that I place may 
change depending on the feedback that I receive.

Thanks again,

Shain

Sent from my iPhone

> On Jan 9, 2015, at 2:45 PM, Shain Miley <smi...@npr.org> wrote:
> 
> Although it seems like having a regularly scheduled cron job to do a 
> recursive directory listing may be OK for us as a bit of a workaround, I am 
> still in the process of trying to improve performance.
> 
> A few other questions have come up as a result.
> 
> a) I am in the process of looking at specs for a new rbd 'headnode' that will 
> be used to mount our 100TB rbd image.  At some point in the future we may 
> look into the performance and multi-client access that cephfs could 
> offer; is there any reason that I would not be able to use this new server 
> as both an rbd client and an MDS server (assuming the hardware is good 
> enough)?  I know that some cluster functions should not and cannot be mixed 
> on the same server; is this by any chance one of them?
> 
> b) Currently the 100TB rbd image is acting as one large repository for our 
> archive, and this will only grow over time.  I understand that ceph is pool 
> based; however, I am wondering whether I would see better per-image 
> performance if, for example, instead of having 1 x 100TB rbd image I had 
> 4 x 25TB rbd images (since we really could split these up based on our 
> internal groups).
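> 
> (For illustration only: if we did split things up, I assume it would just be 
> a matter of creating several smaller images in the same pool and mapping each 
> one separately, along these lines -- the image names below are made up, and 
> on firefly the --size argument is given in MB:
> 
>    # two of the four hypothetical 25TB images
>    rbd create archive-group1 --pool rbd --size 26214400
>    rbd create archive-group2 --pool rbd --size 26214400
>    rbd map archive-group1       # then mkfs/mount the resulting /dev/rbdN
> 
> Please correct me if splitting the images would not actually change anything 
> on the OSD side.)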
> 
> c) Would adding a few SSD drives (in the right quantity) to each node help out 
> with reads as well as writes?
> 
> d) I am a bit confused about how to enable the rbd cache option on the 
> client. Is this a change that only needs to be made to the ceph.conf file 
> on the rbd kernel client server, or do the MDS and OSD servers need their 
> ceph.conf files modified as well and their services restarted?
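> 
> (To frame the question a bit: my current, possibly wrong, understanding is 
> that rbd cache is a librbd setting that lives in the [client] section of 
> ceph.conf on the client machine, roughly like this -- the values are just 
> placeholders:
> 
>    [client]
>        rbd cache = true
>        rbd cache size = 67108864
>        rbd cache max dirty = 50331648
> 
> I have also read that the kernel rbd client does not use librbd's cache at 
> all and relies on the normal page cache instead, which is part of why I am 
> unsure where this setting actually takes effect.)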
> 
> Other options that I might look into going forward include moving some of 
> this data (the data actually needed by our PHP apps) to RGW, although that 
> option adds some more complexity and unfamiliarity for our users.
> 
> Thanks again for all the help so far.
> 
> Shain
> 
>> On 01/07/2015 03:40 PM, Shain Miley wrote:
>> Just to follow up on this thread: the main reason that the rbd directory 
>> listing latency was an issue for us was that we were seeing a large amount 
>> of IO delay in a PHP app that reads from that rbd image.
>> 
>> It occurred to me (based on Robert's cache_dir suggestion below) that maybe 
>> doing a recursive find or a recursive directory listing inside the one 
>> folder in question might speed things up.
>> 
>> After doing the recursive find, the directory listing seems much faster and 
>> the responsiveness of the PHP app has increased as well.
>> 
>> Hopefully nothing else will need to be done here; however, it seems that, 
>> worst case, a daily or weekly cron job that traverses the directory tree in 
>> that folder might be all we need.
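>> 
>> If we do go that route, I am picturing nothing fancier than a crontab entry 
>> that re-walks the tree to keep the metadata warm -- the mount point and 
>> schedule below are just examples:
>> 
>>    # re-read the directory tree nightly at 02:30 (path is hypothetical)
>>    30 2 * * * find /mnt/archive -type d > /dev/null 2>&1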
>> 
>> Thanks again for all the help.
>> 
>> Shain
>> 
>> 
>> 
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
>> smi...@npr.org | 202.513.3649
>> 
>> ________________________________________
>> From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Shain 
>> Miley [smi...@npr.org]
>> Sent: Tuesday, January 06, 2015 8:16 PM
>> To: Christian Balzer; ceph-us...@ceph.com
>> Subject: Re: [ceph-users] rbd directory listing performance issues
>> 
>> Christian,
>> 
>> Each of the OSD server nodes is a Dell R720xd with 64 GB of 
>> RAM.
>> 
>> We have 107 OSDs, so I have not checked all of them; however, the ones I have 
>> checked with xfs_db have shown anywhere from 1% to 4% fragmentation.
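>> 
>> For reference, the check I have been running is just the xfs_db 
>> fragmentation report against each OSD's data device, along the lines of 
>> (device name is only an example):
>> 
>>    xfs_db -r -c frag /dev/sdb1
>> 
>> and the 1% to 4% numbers above are the "fragmentation factor" it reports.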
>> 
>> I'll try to upgrade the client server to 32 or 64 GB of RAM at some point 
>> soon; however, at this point all the tuning that I have done has not yielded 
>> all that much in terms of results.
>> 
>> It may simply be that I need to look into adding some SSDs, and that the 
>> overall bottleneck here is the 4TB 7200 RPM disks we are using.
>> 
>> In general, when looking at the graphs in Calamari, we see around 20ms 
>> latency (await) for our OSDs; however, there are lots of times where we see 
>> (via the graphs) spikes of 250ms to 400ms as well.
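>> 
>> As far as I can tell this is the same "await" value that extended iostat 
>> reports, so the raw numbers can be spot-checked on any OSD node with 
>> something like:
>> 
>>    iostat -x 5
>> 
>> and reading the await column for the 4TB data disks.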
>> 
>> Thanks again,
>> 
>> Shain
>> 
>> 
>> Shain Miley | Manager of Systems and Infrastructure, Digital Media | 
>> smi...@npr.org | 202.513.3649
>> 
>> ________________________________________
>> From: Christian Balzer [ch...@gol.com]
>> Sent: Tuesday, January 06, 2015 7:34 PM
>> To: ceph-us...@ceph.com
>> Cc: Shain Miley
>> Subject: Re: [ceph-users] rbd directory listing performance issues
>> 
>> Hello,
>> 
>>> On Tue, 6 Jan 2015 15:29:50 +0000 Shain Miley wrote:
>>> 
>>> Hello,
>>> 
>>> We currently have a 12 node (3 monitor+9 OSD) ceph cluster, made up of
>>> 107 x 4TB drives formatted with xfs. The cluster is running ceph version
>>> 0.80.7:
>> I assume journals on the same HDD then.
>> 
>> How much memory per node?
>> 
>> [snip]
>>> A while back I created an 80 TB rbd image to be used as an archive
>>> repository for some of our audio and video files. We are still seeing
>>> good rados and rbd read and write throughput performance; however, we
>>> seem to be having quite a long delay in response times when we try to
>>> list out the files in directories with a large number of folders, files,
>>> etc.
>>> 
>>> Subsequent directory listing times seem to run a lot faster (but I am
>>> not sure how long that remains the case before we see another instance of
>>> slowness); however, the initial directory listings can take 20 to 45
>>> seconds.
>> Basically the same thing(s) that Robert said.
>> How big is "large"?
>> How much memory on the machine you're mounting this image?
>> Ah, never mind, just saw your follow-up.
>> 
>> Definitely add memory to this machine if you can.
>> 
>> The initial listing is always going to be somewhat slow, depending on
>> a number of things in the cluster.
>> 
>> As in, how busy is it (IOPS)? With journals on disk, your HDDs are going to
>> be sluggish individually, and your directory information might reside
>> mostly in one object (on one OSD), thus limiting you to the speed of that
>> particular disk.
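>> 
>> (If you want to check that, one rough way -- pool and object names below are
>> only placeholders -- is to pick an object belonging to the image and ask
>> ceph where it lives:
>> 
>>    rados -p rbd ls | head             # grab an object name from the image's pool
>>    ceph osd map rbd <object-name>     # shows the PG and acting OSDs for it
>> 
>> which at least tells you which disk a given "hot" object sits on.)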
>> 
>> And this is also where the memory of your storage nodes comes in: if it is
>> large enough, your "hot" objects will get cached there as well.
>> To see if that's the case (at least temporarily), drop the caches on all
>> of your storage nodes (echo 3 > /proc/sys/vm/drop_caches), mount your
>> image, do the "ls -l" until it's "fast", umount it, mount it again and do
>> the listing again.
>> In theory, unless your cluster is extremely busy or your storage nodes have
>> very little pagecache, the re-mounted image should get all the info it
>> needs from said pagecache on your storage nodes, never having to go to the
>> actual OSD disks and thus be fast(er) than the initial test.
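>> 
>> As a concrete sketch of that test (device and paths below are only examples):
>> 
>>    # on every storage node, as root: flush and drop the page cache
>>    sync; echo 3 > /proc/sys/vm/drop_caches
>> 
>>    # on the client
>>    mount /dev/rbd0 /mnt/archive
>>    time ls -l /mnt/archive/big/dir    # repeat until consistently fast
>>    umount /mnt/archive
>>    mount /dev/rbd0 /mnt/archive
>>    time ls -l /mnt/archive/big/dir    # should now be served from the OSDs' pagecache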
>> 
>> Finally, to potentially improve the initial scan, which obviously has to come
>> from the disks, see how fragmented your OSDs are and, depending on the
>> results, defrag them.
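>> 
>> For example, the online XFS defragmenter can be pointed at each OSD's mount
>> point (path below is only an example):
>> 
>>    xfs_fsr -v /var/lib/ceph/osd/ceph-12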
>> 
>> Christian
>> --
>> Christian Balzer        Network/Systems Engineer
>> ch...@gol.com           Global OnLine Japan/Fusion Communications
>> http://www.gol.com/
> 
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
