I have a problem with my SolrCloud cluster: when I request a few
stored fields, the disk read rate stays pinned at its maximum for long
periods, but when I request no fields at all, response time is
consistently within a few seconds.
My cluster has 4 nodes, with an index size of 400GB per node. Each
node has 96GB of RAM, 24GB of which is allocated to the Solr heap. All
data is persisted on SSDs.
These tests are run while no other reads are being made (iotop shows 0
reads), though indexing is ongoing, writing about 200 kilobytes per
second to disk.
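For reference, I watch disk activity with roughly this (batch mode,
printing only the threads that are actually doing I/O, once per
second):

  sudo iotop -o -b -d 1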
When I send a query with rows=0, response time is consistently around
0.5-2 seconds.
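The fast query looks roughly like this (host, collection name, and
query are placeholders for my real ones):

  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0'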
But when I request rows=5000 and a single stored field (field type
text_general with stored=true), response time jumps to 3-10 minutes,
during which the disk read rate is pegged at 1000MB/s (the maximum my
disks can do) and stays there until the request finishes. Documents
are around 1-4KB and a typical result set is 50-1000 docs. If I send a
few requests at the same time, it gets even worse and I start to get
errors.
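The slow variant differs only in asking for rows and a single stored
field via the fl parameter, roughly (again with placeholder names):

  curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=5000&fl=my_text_field'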
Why does Solr need to read hundreds of gigabytes of data to return a few
hundred kilobytes of stored fields?
I have been reading up on how the index and stored fields are
organized, to find out whether this is expected. If queries with
rows=0 were slow too, I'd simply say the index is too big for my
machines.
Do you have any pointers for this issue?
--uyilmaz