Ports to be opened for SolrCloud to connect to ZooKeeper

2024-11-26 Thread Mingchun Zhao
Hi, We are building a cluster environment on Azure using Solr 9.7.0 and ZooKeeper 3.9.2. Regarding communication from SolrCloud to ZooKeeper servers, aside from the client port 2181, are there any other ports that need to be allowed? In our previous environment with Solr 8.4.0 and ZooKeeper 3.5.7

Re: Sharing a post: "How to fork: Best practices and guide"

2024-11-26 Thread Dave
I used to fork my Solr indexer across 64 CPU cores. Memory consumption was my major issue, so I just threw SSDs and RAM at it; with a 500 GB index, a commit followed by an optimize later, it worked fine. Obviously the last commit was heavy, but it didn’t need to be real time, so I had that advan

Sharing a post: "How to fork: Best practices and guide"

2024-11-26 Thread David Smiley
I discovered this well-written post from someone who has forked projects a number of times and wrote down his lessons learned. It might be interesting to some of you that have forks of Solr. The advice made sense from my experience as well. https://joaquimrocha.com/2024/09/22/how-to-fork/ Obvio

Re: Solr Langid backwards compatibility with "langid.whitelist" is borken?

2024-11-26 Thread Alex Z.
I have a PR ready with changes (locally). I am just waiting for my JIRA account to arrive. On Tue, Nov 26, 2024 at 2:37 PM Alex Z. wrote: > Hi Jan, > > Thank you. I just applied for the ASF JIRA account. I will raise a ticket > once my account is approved. I can try attempt a PR as well. > > Reg

Re: Solr Langid backwards compatibility with "langid.whitelist" is borken?

2024-11-26 Thread Alex Z.
Hi Jan, Thank you. I just applied for the ASF JIRA account. I will raise a ticket once my account is approved. I can try to attempt a PR as well. Regards On Tue, Nov 26, 2024 at 1:43 PM Jan Høydahl wrote: > Hi, > > Thanks for finding this. Although I have not checked the code paths you > mention,

Re: RAMDirectoryFactory with Solrj 9.7.0

2024-11-26 Thread Gus Heck
IIRC the ByteBuffersDirectoryFactory is what new code should be using: https://issues.apache.org/jira/browse/SOLR-12861 On Tue, Nov 26, 2024 at 4:13 PM Péter Király wrote: > Dear all, > > I am developing an application that intensively use Apache Solr, that > among others makes library catalogue

Re: RAMDirectoryFactory with Solrj 9.7.0

2024-11-26 Thread Jan Høydahl
You might want to try org.apache.solr.core.MockDirectoryFactory in your tests when using embedded. See https://github.com/apache/solr/pull/2598 for a similar issue in the main tests. Jan > 26. nov. 2024 kl. 22:13 skrev Péter Király : > > Dear all, > > I am developing an application that inten

Re: Solr Langid backwards compatibility with "langid.whitelist" is borken?

2024-11-26 Thread Jan Høydahl
Hi, Thanks for finding this. Although I have not checked the code paths you mention, I think this warrants a JIRA issue and a bug fix. Would you like to file a JIRA issue for us, and perhaps also attempt a GitHub Pull Request with a fix? Ideally the PR would add a unit test that fails due to the

RAMDirectoryFactory with Solrj 9.7.0

2024-11-26 Thread Péter Király
Dear all, I am developing an application that uses Apache Solr intensively and that, among other things, makes library catalogue records searchable. In order to test indexing features of the application I wrote some JUnit tests that utilized EmbeddedSolr server. The intention is that these tests create in memor

RE: Solr Langid backwards compatibility with "langid.whitelist" is borken?

2024-11-26 Thread Alex Z.
Hello Solr Community, I’m seeking your feedback regarding an issue I’ve encountered when configuring the Solr Langid module, specifically when using the deprecated langid.whitelist property instead of Solr’s newer langid.allowlist property to define allowed language codes. As you are likely aware

Re: Solr in write-only mode?

2024-11-26 Thread Walter Underwood
Sorry, incompletely edited. Should be “I use moderate sized batches…” The two threads per CPU thing also works if you want to keep some CPU available for queries. So a 4 CPU machine being indexed with 4 threads will have roughly 2 CPUs available for queries. Very roughly. wunder > On Nov 26,

Re: Solr in write-only mode?

2024-11-26 Thread Walter Underwood
Use multiple threads to send batches. I use two moderate sized batches and two threads per CPU. You can tune it until you see near 100% CPU utilization. Why two client threads per CPU? Roughly, one batch being processed by the CPU and one batch in flight over the network, so it is ready to be p
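
The batching advice in the message above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: `make_batches`, `send_batch`, and `index_all` are hypothetical names, the batch size of 500 is an arbitrary placeholder, and `send_batch` is a stand-in that only counts documents where a real client would post the batch to Solr. The CPU count of the Solr host has to be supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(docs, batch_size):
    """Split docs into consecutive lists of at most batch_size items."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def send_batch(batch):
    """Hypothetical stand-in for posting one batch to Solr.

    A real sender would issue an update request here; this one only
    reports how many documents it was handed.
    """
    return len(batch)


def index_all(docs, solr_cpus, batch_size=500, threads_per_cpu=2, sender=send_batch):
    """Send all batches using threads_per_cpu client threads per Solr CPU.

    The idea from the thread: with two threads per CPU, one batch can be
    processed while the next is already in flight over the network.
    Returns the total number of documents the sender reported.
    """
    workers = max(1, solr_cpus * threads_per_cpu)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sender, make_batches(docs, batch_size)))
```

Calling `index_all(docs, solr_cpus=4)` would use eight client threads. Both `batch_size` and `threads_per_cpu` are knobs to tune while watching CPU utilization on the Solr host, as the message suggests.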

Re: Solr in write-only mode?

2024-11-26 Thread ufuk yılmaz
Hello Noah, I remember a trick, but I haven’t tried it myself. Turn off all soft and hard commits and do a single manual commit at the end. I don’t know if it can work for the whole 40 million documents but it might speed up indexing when done in large chunks. --ufuk -- > On Nov 26, 20

Re: [WRONG SIGNATURE]Re: Solr Index Size Analysis Tool on cloudless installation

2024-11-26 Thread Josef Svoboda
Hi Jan, Thanks for the reply. Now I see I was not clear enough and I also forgot to attach the URL. So, I would like something that is shown at https://solr.apache.org/guide/8_2/collection-management.html#colstatus when you grep for "fieldsBySize". I would like to know which indexes (by na

Solr in write-only mode?

2024-11-26 Thread Noah Torp-Smith
Hello, We have a setup where we periodically index a solr “offline” and then copy the data folder to a storage location. When we then deploy our solrs to production, the containers then download that data folder to the right place in the file system before the solr server is started. After the

Re: Solr Index Size Analysis Tool on cloudless installation

2024-11-26 Thread Jan Høydahl
Hi, can you elaborate? What do you mean by aggregate? Sum of all local cores? Why uncompressed size? Stored fields are compressed on disk. I’m not aware of the tool you refer to, I only remember a spreadsheet we had long ago to estimate index size. The simplest tool is to measure size on disk.