On Wed, Jan 9, 2019 at 4:34 PM Patrick Donnelly <pdonn...@redhat.com> wrote:

> Hello Jonathan,
>
> On Wed, Jan 9, 2019 at 5:37 AM Jonathan Woytek <woy...@dryrose.com> wrote:
> > While working on examining performance under load at scale, I see a
> > marked performance improvement whenever I would restart certain mds
> > daemons. I was able to duplicate the performance improvement by issuing a
> > "daemon mds.blah cache drop". The performance bump lasts for quite a long
> > time--far longer than it takes for the cache to "fill" according to the
> > stats.
>
> What version of Ceph are you running? Can you expand on what this
> performance improvement is?
>

Sure. We're running Mimic 13.2.3 across all cluster nodes and clients.

I have to quantify the performance improvement in terms of our file
processing speed. The particular process I'm framing this with performs a
few operations on each file. It looks at a file in a temporary directory
(located in ceph), determines whether it is already present in the larger
data store (also in ceph), and if not, creates the necessary directories and
then moves the file out of temporary storage (technically, this is a
copy-and-unlink). File sizes range from hundreds of bytes to hundreds of
megabytes, but they tend to average out somewhere around 5MB, if I remember
correctly. We process millions of new files per day.
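
Roughly, each worker thread does something like this per file (a simplified
Python sketch; how the destination path gets computed from the file is our
own logic and is omitted here):

    import os
    import shutil

    def ingest(tmp_path, dest_path):
        # dest_path is where the file should live in the larger data store;
        # deriving it from the file itself is omitted in this sketch.
        if os.path.exists(dest_path):
            # Already present in the data store; nothing to move.
            return

        # Create any missing directories along the destination path.
        os.makedirs(os.path.dirname(dest_path), exist_ok=True)

        # The "move" is really a copy followed by an unlink, because the
        # temporary space and the data store are separate directory trees.
        shutil.copy2(tmp_path, dest_path)
        os.unlink(tmp_path)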

After clearing the cache OR restarting all mds daemons, the process moving
these files spikes up to about 5-6k files/minute (the work is distributed
across six systems running about 120 total threads), and it stays there for
about an hour. Performance then starts to trail off, dropping fairly quickly
to 2-3k/minute. We stay in that window for a few hours, after which we're
down to between 700 and 1k/minute.
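
For a rough sense of scale, at the ~5MB average that peak works out to
something like 5-6k files/minute x 5MB, or very roughly 400-500 MB/s written
into the data store (with about the same amount read back out of the
temporary space), on top of all the metadata operations.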

There are many other operations happening on this filesystem at the same
time, mostly read-only, but with a few writers as well (specifically, the
processes that drop new files into the temporary space for the above process
to examine and handle).

The copy-and-unlink process seems to be I/O bound, as adding more threads
does not change the performance. I understand that the operation is fairly
metadata-intensive: it checks whether the file already exists, checks
whether the destination directories exist, creates them if needed, copies
the file, and finally unlinks the source. That said, I still find it a
little surprising that dropping the cache makes performance so much better.
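
For clarity, the drop I'm doing is the admin socket command I mentioned
above, run against each active MDS:

    ceph daemon mds.<name> cache drop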

jonathan



> --
> Patrick Donnelly
>


-- 
Jonathan Woytek
http://www.dryrose.com
KB3HOZ
PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC
