On Thu, Jan 10, 2019 at 8:02 AM Jonathan Woytek <woy...@dryrose.com> wrote:
>
> On Wed, Jan 9, 2019 at 4:34 PM Patrick Donnelly <pdonn...@redhat.com> wrote:
>>
>> Hello Jonathan,
>>
>> On Wed, Jan 9, 2019 at 5:37 AM Jonathan Woytek <woy...@dryrose.com> wrote:
>> > While examining performance under load at scale, I see a marked
>> > performance improvement whenever I restart certain mds daemons. I was
>> > able to duplicate the improvement by issuing a "daemon mds.blah cache
>> > drop". The performance bump lasts for quite a long time--far longer
>> > than it takes for the cache to "fill" according to the stats.
>>
>> What version of Ceph are you running? Can you expand on what this
>> performance improvement is?
>
>
> Sure. We're running Mimic 13.2.3 across all cluster nodes and clients.
>
> I have to quantify the performance improvement in terms of our file
> processing speed. More specifically, the process I'm framing this with
> performs a few operations per file: it looks at a file in a temporary
> directory (located in ceph), determines whether it is already in the
> larger data store (also in ceph), and if not, creates the necessary
> directories and then moves the file out of temporary storage
> (technically, a copy-and-unlink). File sizes range from hundreds of
> bytes to hundreds of megabytes, but tend to average out around 5MB, if I
> remember correctly. We process millions of new files per day.
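>
> In shell terms, each worker does roughly the following (the paths and
> two-level directory layout here are made up for illustration, not our
> real scheme):
>
>     src=/ceph/incoming/example.dat
>     dst=/ceph/store/ab/cd/example.dat    # destination derived from the file
>     if [ ! -e "$dst" ]; then             # already in the data store?
>         mkdir -p "$(dirname "$dst")"     # create directories if needed
>         cp "$src" "$dst" && rm "$src"    # the copy-and-unlink "move"
>     fi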
>
> After clearing the cache OR restarting all mds daemons, the process moving
> these files will spike up to about 5-6k files/minute (this process is
> distributed across six systems, running about 120 total threads), and it
> stays there for about an hour. Performance then starts to trail off,
> dropping fairly quickly to 2-3k/minute. We stay in that window for a few
> hours, after which we're down to between 700 and 1k/minute.
>

Could you please run the following command (for each active MDS), once
while operations are fast and once while they are slow:

  for i in `seq 10`; do ceph daemon mds.xxx dump_historic_ops > mds.xxx.$i; sleep 1; done
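
For example, with a placeholder daemon name (mds.a), that could be one
pass while the workload is fast and another once it has slowed:

  for i in `seq 10`; do ceph daemon mds.a dump_historic_ops > mds.a.fast.$i; sleep 1; done
  # later, during the slow phase:
  for i in `seq 10`; do ceph daemon mds.a dump_historic_ops > mds.a.slow.$i; sleep 1; done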

Then send us the results.

Regards
Yan, Zheng


> There are many other operations happening on this filesystem while this
> process runs, most of them read-only, but there are a few writers as well
> (specifically, the processes that are dropping new files into the
> temporary space for the process above to examine and handle).
>
> The copy-and-unlink process seems to be I/O bound, as adding more threads
> does not change the performance. I understand that this operation is
> fairly metadata-intensive: it checks whether the file already exists,
> then checks whether the destination directories exist (creating them if
> needed), then copies the file, and finally unlinks the source. That said,
> I still find it a little surprising that dropping the cache makes
> performance so much better.
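>
> For reference, the cache stats I'm going by come from the MDS admin
> socket, along these lines (mds.xxx is a placeholder for the daemon name):
>
>     ceph daemon mds.xxx cache status    # current cache size
>     ceph daemon mds.xxx perf dump       # the mds_mem section shows
>                                         # inode/dentry counts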
>
> jonathan
>
>
>>
>> --
>> Patrick Donnelly
>
>
>
> --
> Jonathan Woytek
> http://www.dryrose.com
> KB3HOZ
> PGP:  462C 5F50 144D 6B09 3B65  FCE8 C1DC DEC4 E8B6 AABC
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
