For me personally, any system with swap disabled is in a much better situation, 
as is setting the JVM Xms and Xmx to exactly the same value, no higher than 
31 GB -- above that the JVM loses compressed object pointers and GC actually 
gets slower. And yes, Solr on a network disk is going to be slow unless it's an 
SSD-based SAN; otherwise you can get an SSD on Amazon for about $100, drop it 
in the machine, and be far better off for next to no money.  The nice part 
about them is that they are cheap, and as long as you already have redundancy 
in place you don't need to worry about reliability. Just buy a couple per Solr 
server, keep them on hand, and replicate the index back in when one fails.  
Another useful trick is to make sure the Solr server itself, or at least its 
log files, is on an SSD. 
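As a rough sketch, the heap and swap settings above might look like this on a 
typical Linux install (paths and the SOLR_HEAP variable assume the stock 
solr.in.sh shipped with Solr; adjust for your setup):

```shell
# /etc/default/solr.in.sh -- pin min and max heap to the same value,
# staying under 32 GB so the JVM keeps compressed object pointers
SOLR_HEAP="31g"

# Disable swap now and keep it off across reboots (run as root)
swapoff -a
sed -i '/\sswap\s/s/^/#/' /etc/fstab    # comment out swap entries
```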

> On Jul 23, 2021, at 10:14 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
> On 7/22/2021 11:53 AM, Jon Morisi wrote:
> 
>> RE Shawn and Michael,
>> I am just looking for a way to speed it up.  Mike Drob had mentioned 
>> docvalues, which is why I was researching that route.
>> 
>> I am running my search tests from the Solr admin UI, no facets, no sorting.  
>> I am using -Dsolr.directoryFactory=HdfsDirectoryFactory
> 
> Getting good caching with HDFS is something I am not sure how to do.  I would 
> bet that you have to assign a whole bunch of memory to the Solr heap and then 
> allocate a lot of that to the HDFS client for caching purposes.
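As a sketch of the kind of tuning involved, assuming the standard solr.hdfs.* 
system properties (check the reference guide for your Solr version -- names, 
defaults, and the slab/direct-memory sizing here are illustrative, not tested 
values):

```shell
# solr.in.sh fragment: enable the HDFS block cache and put its slabs in
# off-heap (direct) memory so index blocks are cached outside the JVM heap
SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.blockcache.enabled=true"
SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.blockcache.direct.memory.allocation=true"
SOLR_OPTS="$SOLR_OPTS -Dsolr.hdfs.blockcache.slab.count=16"   # roughly 128 MB per slab
# direct memory must be sized large enough to hold all the slabs
SOLR_OPTS="$SOLR_OPTS -XX:MaxDirectMemorySize=4g"
```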
> 
> You can take a look at this wiki page I wrote, but keep in mind that it is 
> tailored for local disks, not HDFS:
> 
> https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems
> 
> Is there any way you can switch to local disks instead of HDFS?  Solr tends 
> to perform badly with its indexes on network storage rather than local disk. 
> What are you trying to achieve with your usage of HDFS?
> 
>> URL:
>> . /select?q=ptokens:8974561 AND ptokens:9844554 AND ptokens:8564484 AND 
>> ptokens:9846541&echoParams=all
>> 
>> Response once it ran (timeout on first attempt, waited 5min for re-try):
>> responseHeader
>>   zkConnected: true
>>   status: 0
>>   QTime: 2411
>>   params
>>     q: "ptokens:243796009 AND ptokens:410512000 AND ptokens:410604004 AND 
>>         ptokens:408729009"
>>     df: "data"
>>     rows: "10"
>>     echoParams: "all"
> 
> What is the field definition for ptokens, and what is the fieldType 
> definition for the type it references?  If this field is set up as a numeric 
> Point type, you're running into a known limitation -- single-value lookups on 
> Point fields are slow, and if the field cardinality is high, that becomes 
> VERY slow.  The workaround would be to switch to either a String type or a 
> Trie type and completely reindex, though Trie types are deprecated and will 
> eventually be removed from Solr.  Or you could turn the query into a range 
> query, which would work much better -- Point types are EXCELLENT for range 
> queries.
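To illustrate that range-query workaround (a sketch; the field name and token 
values are taken from the query above), each exact-match clause can be 
rewritten as a one-element range, which Point fields evaluate efficiently:

```shell
# Build the range-query equivalent of
#   q=ptokens:243796009 AND ptokens:410512000 AND ...
tokens="243796009 410512000 410604004 408729009"
q=""
for t in $tokens; do
  # [t TO t] is a single-value range over the Point field
  q="${q:+$q AND }ptokens:[$t TO $t]"
done
echo "$q"
```

URL-encode the result before passing it as the q parameter to /select.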
> 
>> dashboard info:
>> System load average: 0.16 0.13 0.14
>> Physical Memory: 97.7% (368.77 GB of 377.39 GB used)
>> Swap Space: 4.7% (193.25 MB of 4.00 GB used)
>> File Descriptor Count: 0.2% (226 of 128000 used)
>> JVM-Memory: 22.7% (15.33 GB / 15.33 GB)
> 
> If disabling swap as Michael is suggesting DOES make performance better, I 
> think that would be an indication of some very strange system level problems. 
>  I don't expect it to change anything.
> 
> Thanks,
> Shawn
> 
