On Thu, Jan 10, 2019 at 8:02 AM Jonathan Woytek <woy...@dryrose.com> wrote:
>
> On Wed, Jan 9, 2019 at 4:34 PM Patrick Donnelly <pdonn...@redhat.com> wrote:
>>
>> Hello Jonathan,
>>
>> On Wed, Jan 9, 2019 at 5:37 AM Jonathan Woytek <woy...@dryrose.com> wrote:
>> > While working on examining performance under load at scale, I see a
>> > marked performance improvement whenever I restart certain mds daemons.
>> > I was able to duplicate the performance improvement by issuing a
>> > "daemon mds.blah cache drop". The performance bump lasts for quite a
>> > long time--far longer than it takes for the cache to "fill" according
>> > to the stats.
>>
>> What version of Ceph are you running? Can you expand on what this
>> performance improvement is?
>
>
> Sure. We're running Mimic 13.2.3 across all cluster nodes and clients.
>
> I have to quantify the performance improvement in terms of our file
> processing speed. More specifically, the particular process I'm framing
> this with performs a few operations on files on disk. It looks at a file
> in a temporary directory (located in ceph), determines whether it is
> already in the larger data store (also in ceph), and if not, creates the
> necessary directories and then moves the file out of temporary storage
> (technically, this is a copy-and-unlink). File size can be anywhere from
> hundreds of bytes to hundreds of megabytes, but sizes tend to average
> out somewhere around 5MB, if I remember correctly. We process millions
> of new files per day.
>
> After clearing the cache OR restarting all mds daemons, the process
> moving these files will spike up to about 5-6k files/minute (this
> process is distributed across six systems and running about 120 total
> threads), and it stays there for about an hour. Performance starts to
> trail off after that, dropping pretty quickly to 2-3k/minute. We stay in
> that window for a few hours, by which time we're down to between 700 and
> 1k files/minute.
>
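(The check-create-move sequence described above, as a rough shell
sketch; the paths, directory scheme, and variable names here are
illustrative placeholders, not the actual pipeline:)

    # Hypothetical sketch of the per-file copy-and-unlink step.
    # $TMP_AREA and $STORE stand in for the temporary and data-store
    # trees, both living on cephfs.
    src="$TMP_AREA/incoming/file.dat"
    dst="$STORE/derived/path/file.dat"
    if [ ! -e "$dst" ]; then             # already in the data store?
        mkdir -p "$(dirname "$dst")"     # create directories if needed
        cp "$src" "$dst" && rm "$src"    # "move" = copy, then unlink
    fi
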
Could you please run the following command (for each active mds) when
operations are fast, and again when operations are slow:

    for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops > mds.xxx.$i; sleep 1; done

Then send the results to us.

Regards
Yan, Zheng

> There are many other operations happening on this filesystem while this
> operation is taking place, many read-only, but a few writers as well
> (specifically, the processes that are dropping new files into the
> temporary space for the above process to examine and handle).
>
> The copy-and-unlink process seems to be I/O bound, as adding more
> threads does not change the performance. I understand that the
> operation is somewhat metadata-intensive: it checks whether a file
> exists, then checks whether the directories exist where we need to move
> the file (creating them if needed), then actually copies the file, and
> finally unlinks the source. That said, I still find it a little
> surprising that dropping the cache makes performance so much better.
>
> jonathan
>
>>
>> --
>> Patrick Donnelly
>
>
> --
> Jonathan Woytek
> http://www.dryrose.com
> KB3HOZ
>   PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC
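(For anyone wanting to capture this, the one-liner above expanded into a
small script that tags the output files for the fast and slow windows;
the script name, the daemon name, and the label are placeholders:)

    #!/bin/sh
    # Take ten snapshots, one second apart, of recent MDS operations via
    # the daemon's admin socket, naming the output files so captures
    # from fast and slow periods can be compared side by side.
    # Usage: ./dump_ops.sh mds.xxx fast   (later: ./dump_ops.sh mds.xxx slow)
    mds="$1"      # daemon name as known to the admin socket
    label="$2"    # free-form tag, e.g. "fast" or "slow"
    for i in `seq 10`; do
        ceph daemon "$mds" dump_historic_ops > "$mds.$label.$i"
        sleep 1
    done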