I was in a similar situation: our index was far too big for the RAM on the nodes. I was seeing constant 100% disk read utilization, query timeouts, and dead nodes, because the default directory reader (NRTCaching) was trying to cache a different part of the index in memory for every other request, while queries were rarely against the same collection. Our disks could read 1 GB per second, yet a single simple query would cause 40 seconds of constant reading to return just a few documents. From time to time Solr became completely unresponsive until it had finished the disk reads for previous requests.
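For readers following along, here is a minimal sketch of where this reader is selected. It assumes a stock solrconfig.xml like the one in Solr's default configset; your file may differ.

```xml
<!-- solrconfig.xml, stock default: the class is taken from the
     solr.directoryFactory system property and falls back to NRTCaching. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

<!-- Explicit override to the NIOFS reader discussed in this thread:
<directoryFactory name="DirectoryFactory" class="solr.NIOFSDirectoryFactory"/>
-->
```

With the stock line, the class can also be chosen at startup with `-Dsolr.directoryFactory=solr.NIOFSDirectoryFactory` instead of editing the file. Either way, the core most likely needs a reload or restart before the new reader takes effect.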
I switched to the NIOFS reader and the disk problem was solved. Just don't expect Solr to be as fast as it was with a small index that fit in RAM.

-ufuk

From: Jayesh Shende
Sent: Friday, August 4, 2023 8:44 PM
To: users@solr.apache.org
Subject: Re: Changing Solr collection's DirectoryFactory

Hi Shawn,

Thanks for responding so quickly.

The server box is shared by multiple Solr nodes, and each node has more than 100 GB of disk usage (~2-4 replicas of different collections on one Solr node). NRTCachingDirectoryFactory tries to cache as many segments as possible in memory, but the queries are for different collections and vary a lot (few repeated query terms), so I think these cached segments are not very useful here. The RAM (apart from what is assigned to the JVM) is not enough to cache even 10% of the index for each Solr node running.

Also, this is an existing Solr installation and I am trying to improve its performance. As we know, NIO is better than IO in Java, and I can increase IOPS and throughput for the disk, so I wanted to understand how the change would affect it.

Before changing anything, I will try removing the explicit configuration for directoryFactory to see how it works and how it picks the best option for the underlying OS, as this should not affect the underlying indexed data for the collections.

Thanks.
Jayesh Shende

On Fri, 4 Aug 2023, 22:35 Shawn Heisey, <apa...@elyograg.org> wrote:

> On 8/4/23 09:56, Jayesh Shende wrote:
> > Using: Solr 8.11.2 with rhel9
> >
> > Currently using "solr.NRTCachingDirectoryFactory" for a collection,
> > the collection has grown big in size, but don't want to add more RAM to
> > machine(AWS),
> > I can increase IOPS and throughput for data volume.
> >
> > Was thinking of using "solr.NIOFSDirectoryFactory",
> > but wanted to know, how will it impact to existing collection?
> > May be it is just a way to read index files, but to be sure, will it
> > affect my existing indexed data?
>
> It's generally not a good idea to explicitly configure the directory
> factory. That should only be done in very unusual circumstances. Your
> situation probably does not qualify.
>
> Remove any config for that and let Solr/Lucene pick the class that's
> best for the environment. It will probably choose
> NRTCachingDirectoryFactory. If a better option becomes available in a
> newer Solr version, it will most likely be automatically chosen as long
> as the value isn't explicitly configured.
>
> Looking at the source, I cannot tell for sure whether NIOFS uses mmap,
> but I suspect it does not. For nearly all use cases, you want a
> directory implementation that uses mmap, which the NRTCaching
> implementation does.
>
> Changing the directory factory is very unlikely to cause any problems
> with the existing index. But I am curious why you want to change
> that... what have you encountered and why do you think you should go
> with a non-default class?
>
> If you have enough memory installed, the disk speed will have very
> little impact on performance. Disk performance only becomes important
> in situations where you do not have enough spare memory for effective
> disk caching. Memory is faster than disk, even if the disk is extremely
> fast SSD.
>
> A directory implementation that uses mmap will be the fastest option.
>
> Thanks,
> Shawn