Hi, I've been playing with KIO speed improvements for quite a while now and have found some very interesting issues with KIO in combination with my SSD, a "Samsung SSD 840 PRO Series".
My test case is one directory filled with 500,000 files to test directory listing speed. Please don't comment on this big number; I'm well aware that it's insane! However, it exposes bottlenecks that are there but don't become visible with small folders of, say, 1,000 entries.

Some numbers for listing the directory with 500,000 files:

- Plain C++ and Qt (so QT_STATBUF, QT_READDIR and QT_LSTAT -- those are just platform defines, nothing custom is done there): ~700ms
- The same test using KIO::listDir: ~4,500ms

Note: my recent incremental cleanups already made it about a second faster than it used to be (it used to be around 5,500ms). However, the current speed is still ~7x slower than raw C++ and Qt. My goal is to get it within 2x of raw C++/Qt.

Now you could say that KIO uses a multiprocess approach and can therefore be expected to be a bit slower. That is exactly what I expect as well, but not a 7x difference. So I did another benchmark: measuring how long the KIO slave itself (the file slave) takes to list the directory without sending anything back to the client. That gives me an accurate timing for listing a folder inside the slave. Be aware that I also disabled batching for this, so it really is only doing the listdir and UDSEntry creation. The number:

- File slave only, nothing sent to the client: ~2,700ms -- the patch I used to measure this: http://p.sc2.nl/p4cs6a26o

And that is very surprising to me. It means that all the per-entry CPU work we do is enough to slow down the IO. And my CPU isn't slow by any means, yet enough time is spent on the CPU to hold back my SSD's performance. Whatever you think of this insane optimization target, I think the CPU should never be the cause of slowing the SSD down. So I want to fix this. The CPU should not slow the SSD down, but this can't be easily done; in fact, my SSD seems so fast that I can't even manage it on the same thread without slowing the SSD down.
My theoretical solution (and I really hope to get feedback on this) is to introduce a worker thread in slavebase.cpp. The slave itself should be reduced in functionality: it should not create a UDSEntry anymore. Instead, it should send the stat result to slavebase.cpp (there is no function for that yet). SlaveBase then puts it in a QVector that is shared with the worker thread. The worker thread only reads from the QVector, which takes away the need to care about thread safety. It should then process the entries (in a new short-lived vector, I guess), create the UDSEntry objects and send them over the socket connection to the client. This way the IO can run as fast as possible, while the remaining time is spent in the worker thread.

The only real issue I see here is how to handle the current SlaveBase::send function. It is executed in the same thread as the slave and will therefore still block IO while sending a batch. I think I need to move it into the worker thread as well, right?

I'm looking forward to your feedback.

Cheers,
Mark

_______________________________________________
Kde-frameworks-devel mailing list
Kde-frameworks-devel@kde.org
https://mail.kde.org/mailman/listinfo/kde-frameworks-devel
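P.S. To make the proposal above concrete, here is a rough sketch of the producer/consumer split I have in mind. All names are hypothetical stand-ins (std::thread instead of Qt threading, a stub instead of KIO::UDSEntry, a queue instead of the shared QVector); the real code would live in slavebase.cpp:

```cpp
// Sketch of the proposed split: the IO thread only produces raw stat results,
// while a worker thread turns them into (stand-in) UDSEntry batches and
// would send them over the socket. Names are hypothetical, not KIO API.
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct StatResult { std::string name; long long size = 0; };   // raw lstat() data
struct UDSEntryStub { std::string name; long long size = 0; }; // stand-in for KIO::UDSEntry

class ListJobPipeline {
public:
    // Called from the IO thread for each directory entry; cheap on purpose.
    void produce(StatResult r) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(std::move(r));
        m_cond.notify_one();
    }
    // Called by the IO thread once the whole directory has been read.
    void finish() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_done = true;
        m_cond.notify_one();
    }
    // Runs on the worker thread: drains the queue, builds the entries.
    // The real code would serialize and send batches to the client here.
    std::vector<UDSEntryStub> consumeAll() {
        std::vector<UDSEntryStub> entries;
        for (;;) {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cond.wait(lock, [this] { return m_done || !m_queue.empty(); });
            while (!m_queue.empty()) {
                StatResult r = std::move(m_queue.front());
                m_queue.pop();
                entries.push_back({r.name, r.size}); // UDSEntry creation off the IO thread
            }
            if (m_done)
                return entries;
        }
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<StatResult> m_queue;
    bool m_done = false;
};
```

The point of the sketch is only the division of labour: the IO thread does nothing per entry beyond handing over the raw stat data, so the disk is never waiting on UDSEntry construction or on SlaveBase::send.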