Advice in order to optimise resource usage of a huge server
Hi,

One of our customers has huge servers:

- Bare-metal
- 64 CPUs
- 512 GB RAM
- 6x2TB disks in RAID 6 (so 2TB disk space available)

I think the best way to optimize resource usage of these servers is to install several Solr instances.

I imagine 2 scenarios to be tested according to data volumes, update rate, request volume, ...

Do not configure the disks in RAID 6 but leave 6 standard volumes (more disk space, more I/O available).
Install 3 or 6 Solr instances, each one using 1 or 2 disk volumes.

Obviously, replicate shards and verify that replicas of a shard are not located on the same physical server.

What I am not sure about is how MMapDirectory will work with several Solr instances. Will off-heap memory be correctly managed and shared between several Solr instances?

Thank you for your advice.

Dominique
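As a concrete illustration of the multi-instance scenario (purely a sketch; the ports, paths, ZooKeeper addresses and heap sizes below are assumptions, not part of the thread), two independent SolrCloud nodes on the same host could be started roughly like this, each with its own Solr home on a separate volume:

    # Hypothetical sketch: two Solr nodes on one host, each on its own port,
    # its own disk volume, and with its own (modest) heap.
    /opt/solr/bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 \
        -p 8983 -s /vol1/solr-home -m 16g
    /opt/solr/bin/solr start -c -z zk1:2181,zk2:2181,zk3:2181 \
        -p 8984 -s /vol2/solr-home -m 16g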
Re: Advice in order to optimise resource usage of a huge server
Why do you want to split it up at all?

On Thu, Oct 6, 2022 at 3:58 AM Dominique Bejean wrote:
> [original message quoted in full; trimmed]
Re: Advice in order to optimise resource usage of a huge server
It depends... on your data, on your usage, etc. The best answers are obtained by testing various configurations, if possible by replaying captured query load from production. There is (for all Java programs) an advantage to staying under 32 GB of heap, but without an idea of the number of machines you describe, the size of the corpus (docs and disk), and what your expected usage patterns are (both indexing and query), one can't say if you need more heap than that, either in one JVM or across several JVMs.

To understand how "unallocated" memory not assigned to the Java heap (or other processes) is utilized to improve search performance, this article is helpful:
https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

-Gus

On Thu, Oct 6, 2022 at 8:31 AM matthew sporleder wrote:
> Why do you want to split it up at all?
> [rest of quoted thread trimmed]

--
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
Re: Advice in order to optimise resource usage of a huge server
What would the IOPS look like?

Deepak
"The greatness of a nation can be judged by the way its animals are treated - Mahatma Gandhi"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"
Make In India : http://www.makeinindia.com/home

On Thu, Oct 6, 2022 at 8:49 PM Gus Heck wrote:
> [quoted thread trimmed]
Re: Advice in order to optimise resource usage of a huge server
On 2022-10-06 2:57 AM, Dominique Bejean wrote:
> Do not configure the disks in RAID 6 but leave 6 standard volumes (more disk space, more I/O available)

If they're running Linux: throw out the RAID controller and replace it with ZFS on 2 SSDs and 4 spinning-rust drives. You're not going to have more I/O than your drives and bus can support, but with ZFS's 2-level read cache (RAM and SSD) you could get close to saturating the bus. In theory. You get a hot-resizeable storage pool as a bonus.

Dima
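For readers unfamiliar with the layout Dima describes, a rough sketch (device names and sizes are assumptions) of a ZFS pool with spinning disks for capacity and an SSD as a second-level read cache might look like this:

    # Hedged sketch: four HDDs as two mirrored pairs, one SSD as an L2ARC
    # read cache; the second SSD could serve as a spare or the system disk.
    zpool create solrpool mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd
    zpool add solrpool cache /dev/nvme0n1
    zfs create -o mountpoint=/var/solr/data solrpool/solr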
Re: Advice in order to optimise resource usage of a huge server
On 10/6/22 01:57, Dominique Bejean wrote:
> One of our customers has huge servers:
> - Bare-metal
> - 64 CPUs
> - 512 GB RAM
> - 6x2TB disks in RAID 6 (so 2TB disk space available)
>
> I think the best way to optimize resource usage of these servers is to install several Solr instances.

That is not what I would do.

> Do not configure the disks in RAID 6 but leave 6 standard volumes (more disk space, more I/O available).
> Install 3 or 6 Solr instances, each one using 1 or 2 disk volumes.

RAID10 will get you the best performance. Six 2TB drives in RAID10 give 6TB of total space. The ONLY disadvantage that RAID10 has is that you pay for twice the usable storage. Disks are relatively cheap, though hard to get in quantity these days. I would recommend going with the largest stripe size your hardware can support. 1MB is typically where that maxes out.

Any use of RAID5 or RAID6 has two major issues: 1) A serious performance problem that also affects reads if there are ANY writes happening. 2) If a disk fails, performance across the board is terrible. When the bad disk is replaced, performance is REALLY terrible as long as a rebuild is happening, and I have seen a RAID5/6 rebuild take 24 to 48 hours with 2TB disks on a busy array. It would take even longer with larger disks.

> What I am not sure about is how MMapDirectory will work with several Solr instances. Will off-heap memory be correctly managed and shared between several Solr instances?

With symlinks or multiple mount points in the solr home, you can have a single instance handle indexes on multiple storage devices. One instance has less overhead, particularly in memory, than multiple instances. Off-heap memory for the disk cache should function as expected with multiple instances or one instance.

Thanks,
Shawn
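A minimal sketch of the symlink approach Shawn mentions (all paths and core names are illustrative assumptions; it presumes the core directories already exist in the Solr home): one Solr home, with individual core data directories pointed at different physical volumes.

    # Place each core's index data on the volume you want, then symlink it
    # back into the single Solr home.
    mkdir -p /vol1/solr-data/products_shard1_replica_n1/data
    ln -s /vol1/solr-data/products_shard1_replica_n1/data \
          /var/solr/data/products_shard1_replica_n1/data

    mkdir -p /vol2/solr-data/orders_shard1_replica_n1/data
    ln -s /vol2/solr-data/orders_shard1_replica_n1/data \
          /var/solr/data/orders_shard1_replica_n1/data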
Re: Advice in order to optimise resource usage of a huge server
I know these machines. Sharding is kind of useless. Set the 2TB SSD drives up in the fastest RAID configuration for reads that is available, 31 GB Xms/Xmx, one Solr instance. Buy backup SSD drives for when you burn one out and it fails over to the master server. Multiple Solr instances on one machine make little sense unless they have different purposes, like an ML instance and a text-highlighting instance, but even then you get no performance improvement.

> On Oct 6, 2022, at 12:21 PM, Shawn Heisey wrote:
> [quoted message trimmed]
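In solr.in.sh that single-instance heap setting might look roughly like this (a sketch; the 31 GB figure follows Dave's suggestion and is not a universal recommendation):

    # Fixed heap, kept just below the ~32 GB compressed-oops threshold.
    SOLR_JAVA_MEM="-Xms31g -Xmx31g"
    # or, equivalently:
    # SOLR_HEAP="31g"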
Re: Advice in order to optimise resource usage of a huge server
We have kept a 72-CPU machine busy with a single Solr process, so I doubt that multiple processes are needed.

The big question is the size of the index. If it is too big to fit in RAM (OS file buffers), then the system is I/O bound and CPU doesn't really matter. Everything will depend on the speed and capacity of the disk system.

If the index does fit in RAM, then you should be fine. You may want to spend some effort on reducing index size if it is near the limit.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 6, 2022, at 8:18 AM, Gus Heck wrote:
> [quoted thread trimmed]
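A quick back-of-the-envelope check of that question (paths are assumptions) is simply to compare the total on-disk index size against the memory left over for the OS page cache:

    du -sh /var/solr/data/*/data/index    # on-disk index size per core
    free -g                               # RAM available for the page cache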
Utilizing the Script Update Processor
The documentation here:
https://solr.apache.org/guide/solr/latest/configuration-guide/script-update-processor.html#javascript
provides an example using the Script Update Processor. When invoking the script, the example references it in the request parameters:
https://solr.apache.org/guide/solr/latest/configuration-guide/script-update-processor.html#try-it-out

My question is: is this the only way to utilize the processor? Do you always have to include it in the parameters, or can you configure the handler in solrconfig.xml so that the script is always called without the added parameter? If so, how would that be configured in solrconfig.xml?

Thank you,
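One way this is commonly done (a sketch only; the chain name, script name, and processor list below are assumptions rather than taken from that documentation page) is to declare the chain in solrconfig.xml and either mark it as the default chain or bind it to the update handler via initParams:

    <!-- Sketch: a scripting chain that runs on every update without an
         explicit update.chain request parameter. -->
    <updateRequestProcessorChain name="script" default="true">
      <processor class="solr.ScriptUpdateProcessorFactory">
        <str name="script">update-script.js</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

    <!-- Alternative: leave the chain non-default and attach it to the
         update handler(s) instead. -->
    <initParams path="/update/**">
      <lst name="defaults">
        <str name="update.chain">script</str>
      </lst>
    </initParams>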
Re: Advice in order to optimise resource usage of a huge server
Hi,

Thank you all for your responses. I will try to answer your questions in one single message.

We are starting to investigate performance issues with a new customer. There are several bad practices (commits, sharding, replica count and types, heap size, ...) that can explain these issues, and we will work on them in the next few days. I agree we need to better understand the specific usage and make some tests after fixing the bad practices.

Anyway, one of the specific aspects is these huge servers, so I am trying to see what is the best way to use all these resources.

* Why do you want to split it up at all?

Because one of the bad practices is a huge heap size (80 GB). I am pretty sure this heap size is not required, and anyway it doesn't respect the 31 GB limit. After determining the best heap size, if this size is near 31 GB, I imagine it is better to have several Solr JVMs with a smaller heap size each: for instance 2 Solr JVMs with 20 GB each, or 4 Solr JVMs with 10 GB each.

According to Walter's response and Matthew's question, that doesn't seem like a good idea.

* MMapDirectory JVM sharing

This point is the main reason for my message. If several Solr JVMs are running on one server, will MMapDirectory work fine or will the JVMs fight with each other in order to use off-heap memory?

According to Shawn's response it should work fine.

* What would the IOPS look like?

Not monitored yet.

Storage configuration is the second point that I would like to investigate in order to better share disk resources. Instead of having one single RAID 6 volume, isn't it better to have one distinct non-RAID volume per Solr node (if multiple Solr nodes are running on the server), or multiple non-RAID volumes used by a single Solr JVM (if only one Solr node is running on the server)?

I note the various suggestions in your answers (ZFS, RAID 10, ...).

Thank you Dima and Shawn.

Regards

Dominique

On Thu, Oct 6, 2022 at 09:57, Dominique Bejean wrote:
> [original message quoted in full; trimmed]
Re: Solr Admin Connection reset when connecting to Zookeeper
Hi All,

I did a further search and found a known bug (now resolved) in Solr:
https://issues.apache.org/jira/browse/SOLR-15849

A cursory reading of the notes suggests that this may be the root cause of the issue mentioned below. Can anyone confirm?

We are planning to upgrade our Solr environment from 8.11.1 to 8.11.2. Wanted to check with someone in the know before we proceed with the work.

Regards
Vishal

From: Vishal Shanbhag
Date: September 26, 2022 at 4:07:48 PM IST
To: users@solr.apache.org
Subject: Solr Admin Connection reset when connecting to Zookeeper

All,

I am having trouble setting up a Solr + Zookeeper setup with nodes present in different data centers. I have raised a question on Stack Overflow:
https://stackoverflow.com/questions/73852766/solr-admin-connection-reset-when-connecting-to-zookeeper

I need help with troubleshooting. Appreciate any ideas.

Regards
Vishal
Custom sharding
Hello, dear Solr team. I hope you are doing well. Would you help me with a little question?

I need to separate data by shard when I commit with SolrJ. For example, fields of my collection:

phoneNo - +9985612525, country - UZ
phoneNo - +3809523636, country - UKR

All UKR goes to Shard 1 of my collection "phones"; all others go to Shard 2. I can separate them logically with Java, but how can I send them to the exact shard with a SolrJ commit?

    CloudSolrClient solr = new CloudHttp2SolrClient.Builder(urls).build();
    solr.setDefaultCollection("phones");
    solr.addBeans(phoneList);
    solr.commit();

I am sorry. I was looking for an answer on the internet for several days before asking you. I hope for your help. Thank you so much.

--
Why do I need this? Because after adding data to the collection with default sharding and then doing a facet search, we are getting count data from only one shard. We don't know whether it is a bug or not. That's why we decided to separate data by shards. Thanks.

--
___
Kind regards
Dmitry Prus
Skype: live:31bd8a868323415e
e-mail: pru...@inbox.ru
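One way to get this kind of routing (a sketch only; the collection layout, shard names, and field names below are assumptions) is to create the collection with the implicit router and a router.field, so Solr sends each document to the shard whose name matches the routing field's value. Note that with this router the field value must equal a shard name exactly, so in practice you would compute a dedicated routing value (e.g. "UKR" vs "OTHER") before indexing:

    // Hedged SolrJ sketch, not a drop-in solution.
    import java.util.List;
    import org.apache.solr.client.solrj.impl.CloudHttp2SolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class RoutedIndexing {
      public static void main(String[] args) throws Exception {
        try (CloudHttp2SolrClient solr = new CloudHttp2SolrClient.Builder(
                List.of("http://localhost:8983/solr")).build()) {

          // One-time setup: an implicit-routed collection with named shards.
          CollectionAdminRequest
              .createCollectionWithImplicitRouter("phones", "_default", "UKR,OTHER", 1)
              .setRouterField("route")
              .process(solr);

          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", "1");
          doc.addField("phoneNo", "+3809523636");
          doc.addField("country", "UKR");
          doc.addField("route", "UKR");   // sent to the shard named "UKR"
          solr.add("phones", doc);
          solr.commit("phones");
        }
      }
    }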
SOLR internal error
Hello Community Members, I am doing a query in SOLR and solr is throwing error as given below: "error":{ "msg":"0", "trace":"java.lang.ArrayIndexOutOfBoundsException: 0\n\tat org.apache.lucene.util.QueryBuilder.newSynonymQuery(QueryBuilder.java:653)\n\tat org.apache.solr.parser.SolrQueryParserBase.newSynonymQuery(SolrQueryParserBase.java:617)\n\tat org.apache.lucene.util.QueryBuilder.analyzeGraphBoolean(QueryBuilder.java:533)\n\tat org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:320)\n\tat org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:240)\n\tat org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:524)\n\tat org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:62)\n\tat org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1072)\n\tat org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:806)\n\tat org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)\n\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)\n\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:222)\n\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\tat org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\tat org.apache.solr.parser.QueryParser.Query(QueryParser.java:222)\n\tat org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)\n\tat org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:260)\n\tat org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)\n\tat org.apache.solr.search.QParser.getQuery(QParser.java:173)\n\tat org.apache.solr.search.ExtendedDismaxQParser.getBoostQueries(ExtendedDismaxQParser.java:566)\n\tat org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:187)\n\tat org.apache.solr.search.QParser.getQuery(QParser.java:173)\n\tat org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:159)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)\n\tat com.knovel.solr.util.ModifiedHypenKnovelSearchHandler.handleRequestBody(ModifiedHypenKnovelSearchHandler.java:47)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat com.knovel.solr.util.ModifiedHypenKnovelSearchHandler.handleRequest(ModifiedHypenKnovelSearchHandler.java:462)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(Handler
Re: SOLR removal
Hello dear members!

I registered to receive Solr community emails because we were using Solr at my previous company. Now I am no longer using Solr in my assessments.

COULD ADMIN PLEASE REMOVE MY EMAIL FROM THE MAILING LIST? MY EMAIL: leoncehavugim...@gmail.com

Thank you in advance!

Kind regards,
Léonce

On Fri, 7 Oct 2022, 00:04 Biswas, Akash (ELS-BLR), wrote:
> [quoted message and stack trace trimmed]
Re: Utilizing the Script Update Processor
I attempted to add the scripting module as described here by adding SOLR_MODULES=scripting in solr.xml but got the following error message when I attempted to start: HTTP ERROR 404 javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. URI:/solr/ STATUS: 404 MESSAGE:javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. SERVLET:default CAUSED BY: javax.servlet.ServletException: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. CAUSED BY: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. Caused by: javax.servlet.ServletException: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:162) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:400) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:645) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:392) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) at java.lang.Thread.run(Unknown Source) Caused by: javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down. 
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:376) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) ... 20 more _
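For context on where that setting normally lives (a hedged aside, not from the original message): SOLR_MODULES is an environment variable that is usually placed in solr.in.sh / solr.in.cmd rather than in solr.xml, while recent solr.xml files have a differently shaped setting of their own. Both forms below are illustrative sketches:

    # In solr.in.sh (environment include, not solr.xml):
    SOLR_MODULES=scripting

    <!-- Or, in solr.xml (Solr 9.x), inside the <solr> element (sketch): -->
    <str name="modules">scripting</str>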
Re: Advice in order to optimise resource usage of a huge server
On 2022-10-06 4:54 PM, Dominique Bejean wrote:
> Storage configuration is the second point that I would like to investigate in order to better share disk resources.
> Instead of having one single RAID 6 volume, isn't it better to have one distinct non-RAID volume per Solr node (if multiple Solr nodes are running on the server), or multiple non-RAID volumes used by a single Solr JVM (if only one Solr node is running on the server)?

The best option is to have the indexes in the RAM cache. The 2nd best option is the 2-level cache w/ RAM + SSD -- that's what you get with ZFS, and you can use the cheaper HDDs for primary storage. The next one is all SSDs -- in that case RAID-1(0) may give you better read performance than a dedicated drive, but probably not enough to notice. There's very little point in going RAID-5 or 6 on SSDs.

In terms of performance, RAID5/6 on HDDs is likely the worst option, and a single RAID6 volume is also the worst option in terms of flexibility and maintenance. If your customer doesn't have money to fill those slots with SSDs, I'd probably go with one small SSD for system + swap, a 4-disk RAID-10, and a hot spare for it.

Dima
Re: Advice in order to optimise resource usage of a huge server
Thank you Dima,

Updates are highly multi-threaded batch processes running at any time.
We won't have the whole index in the RAM cache.
Disks are SSDs.

Dominique

On Fri, Oct 7, 2022 at 00:28, dmitri maziuk wrote:
> [quoted message trimmed]
Re: Advice in order to optimise resource usage of a huge server
You should never index directly into your query servers, by the way. Index to the indexing server and replicate out to your query servers, and tune each as needed.

> On Oct 6, 2022, at 6:52 PM, Dominique Bejean wrote:
> [quoted message trimmed]
Re: Advice in order to optimise resource usage of a huge server
On 2022-10-06 5:52 PM, Dominique Bejean wrote:
> Thank you Dima,
> Updates are highly multi-threaded batch processes running at any time.
> We won't have the whole index in the RAM cache.
> Disks are SSDs.

You'd have to benchmark, pref. with your real jobs, on RAID-10 (as per my previous e-mail) vs JBOD. I suspect you won't see much practical difference, but who knows.

Dima
Re: Advice in order to optimise resource usage of a huge server
Run a GC analyzer on that JVM. I cannot imagine that they need 80 GB of heap. I've never run with more than 16 GB, even for a collection with 70 million documents.

Look at the amount of heap used after full collections (full GCs). Add a safety factor to that, then use that as the heap size.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 6, 2022, at 2:54 PM, Dominique Bejean wrote:
> [quoted message trimmed]
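To get data for such an analyzer (GCEasy, GCViewer, ...), GC logging can be turned on in solr.in.sh; the log path and rotation settings below are illustrative assumptions:

    # Unified JVM logging (Java 9+); feed the resulting log to a GC analyzer.
    GC_LOG_OPTS="-Xlog:gc*:file=/var/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M"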
Re: Advice in order to optimise resource usage of a huge server
A reason for sharding on a single server is the ~2.1 billion max-docs-per-core limitation.

On Thu, Oct 6, 2022, 12:51 PM Dave wrote:
> [quoted thread trimmed]
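A sketch of what that looks like in practice (collection name and shard count are assumptions): creating a multi-shard collection even though there is only one node, so that no single core has to hold more than ~2.1 billion documents.

    # Collections API sketch; on Solr 8.x you also need maxShardsPerNode
    # (the parameter was removed in 9.x, where placement is not limited by default).
    curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=bigcollection&numShards=4&replicationFactor=1&maxShardsPerNode=4"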
Re: Advice in order to optimise resource usage of a huge server
Hi Dave,

Are you suggesting using the historical Solr master/slave architecture?

In a SolrCloud / SolrJ architecture, this can be achieved by creating only TLOG replicas, then using FORCELEADER so the leader is located on a specific server (the indexing server), and searching only on the TLOG replicas with the parameter "shards.preference=replica.type:TLOG".

Is this what you are suggesting?

Regards

Dominique

On Fri, Oct 7, 2022 at 00:59, Dave wrote:
> [quoted thread trimmed]
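For completeness, a sketch of how that preference is passed from SolrJ at query time (the collection name, Solr URL and preference value are assumptions; the same parameter also accepts other criteria such as replica.location):

    // Hedged SolrJ sketch of shards.preference at query time.
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudHttp2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PreferTlogQuery {
      public static void main(String[] args) throws Exception {
        try (CloudHttp2SolrClient solr = new CloudHttp2SolrClient.Builder(
                List.of("http://localhost:8983/solr")).build()) {
          SolrQuery q = new SolrQuery("*:*");
          q.set("shards.preference", "replica.type:TLOG");  // prefer TLOG replicas
          QueryResponse rsp = solr.query("phones", q);
          System.out.println("hits: " + rsp.getResults().getNumFound());
        }
      }
    }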