[jira] [Commented] (SOLR-17492) Introduce recommendations of WAYS of running Solr from small to massive

Arda (Jira) Sat, 30 Nov 2024 08:20:42 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-17492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902048#comment-17902048
 ]


Arda commented on SOLR-17492:
-----------------------------

Whether you're just getting started with Solr or looking to fine-tune an 
existing setup, these practical tips and real-world scenarios may help you get 
the most out of this powerful search platform.

*Best Practices for Using Solr*

*1. Run Solr as a Cluster for Better Performance*
Solr works best when deployed as a cluster. Start with at least three nodes for 
fault tolerance and scalability, and scale horizontally as your needs grow.
 * {*}Sharding and Replication{*}: Break your data into shards for parallel 
processing and use replicas for redundancy. A good starting point is two 
replicas per shard, but adjust this based on your workload.
 * {*}Optimize Indexing{*}: Carefully plan your schema to ensure efficient 
indexing and querying. Use *dynamic fields* and *copy fields* where appropriate 
to keep things flexible without overloading your system.
 * {*}Caching for Speed{*}: Solr provides several caches, including the filter 
cache, query result cache, and document cache. Size them for frequently 
accessed data to speed up queries significantly.
 * {*}Tune the JVM{*}: Since Solr is Java-based, JVM tuning is crucial. Adjust 
heap size to balance memory usage against garbage collection pauses. Monitor GC 
logs and experiment with collectors such as G1GC (or CMS on older JVMs only, 
since it was removed in JDK 14) for optimal performance.
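
As a rough illustration of the sharding and replication advice above, the 
sketch below builds a Collections API request that creates a collection with a 
chosen number of shards and two replicas per shard. The host, collection name, 
and shard count are made-up placeholders; the request parameters ({{action}}, 
{{name}}, {{numShards}}, {{replicationFactor}}) are the standard ones. It only 
builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=2):
    """Build a Collections API CREATE URL (nothing is sent over the network)."""
    if num_shards < 1 or replication_factor < 1:
        raise ValueError("num_shards and replication_factor must be >= 1")
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "products", num_shards=3)
print(url)
```

The default of two replicas per shard mirrors the starting point suggested 
above; pass a different {{replication_factor}} once you know your workload.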

*2. Always Use Solr in Cloud Mode*
For a robust, scalable setup, *Solr Cloud Mode* is the way to go. This setup 
requires {*}ZooKeeper{*}, which manages cluster coordination, leader election, 
and configuration.
 * {*}ZooKeeper’s Role{*}: ZooKeeper ensures your Solr cluster runs smoothly by 
handling shard placement, failover, and configuration changes dynamically.
 * {*}Backups and Security{*}:
 ** Always back up your Solr and ZooKeeper data regularly. Use Solr's built-in 
backup tools or external snapshot mechanisms for safety.
 ** Secure your cluster with {*}SSL/TLS{*}, and set up role-based access 
control, ideally with tools like {*}Apache Ranger{*}. If Ranger isn’t an 
option, manual permissions management works too.
 * {*}Monitoring is Essential{*}: Keeping an eye on your Solr cluster is 
crucial for smooth operations. A good place to start is the {*}Solr Admin 
UI{*}, which lets you monitor query performance, index health, and cache usage 
and quickly spot issues. For more advanced needs, you can integrate tools like 
*Prometheus* and *Grafana* for custom dashboards and alerting, although I don’t 
have direct experience using either of them with Solr.
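
To make the backup and monitoring points above a little more concrete, here is 
a minimal sketch that builds two requests: a Collections API {{BACKUP}} call 
and a Metrics API query. The host, collection, backup name, and backup path are 
hypothetical, and the {{memory.heap}} prefix is an assumption about which JVM 
metrics you want to look at. Nothing is sent over the network.

```python
from urllib.parse import urlencode


def backup_url(base_url, collection, backup_name, location):
    """Build a Collections API BACKUP request for one collection."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


def metrics_url(base_url, group, prefix):
    """Build a Metrics API request limited to one group and a name prefix."""
    params = {"group": group, "prefix": prefix}
    return f"{base_url.rstrip('/')}/admin/metrics?{urlencode(params)}"


# Hypothetical values, for illustration only.
BASE = "http://localhost:8983/solr"
backup = backup_url(BASE, "products", "products-2024-11-30", "/mnt/solr-backups")
heap = metrics_url(BASE, "jvm", "memory.heap")
print(backup)
print(heap)
```

For file-system backups the {{location}} would normally be a path that every 
node can reach, such as a shared mount.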

h3. *Usage Scenarios: Real-World Applications of Solr*

*1. Managing Solr for a Large Dataset*
I used open-source Solr as a search engine for a mobile app. Instead of 
interacting with Solr directly, I managed the setup via ZooKeeper APIs. Here’s 
what that looked like:
 * {*}Cluster Configuration{*}:
 ** The cluster handled over *100 TB of data* spread across {*}11 physical 
machines{*}, each running {*}16 Solr instances{*}.
 * {*}Sharding and Replication{*}:
 ** Data was stored in {*}shards{*}, with each shard having *two replicas* to 
ensure fault tolerance and load balancing.
 * {*}Data Storage{*}:
 ** Data was stored directly on the local file system, which was a great fit 
for this use case.
 * {*}Management Approach{*}:
 ** Instead of accessing Solr directly, I managed the system via ZooKeeper 
APIs. This approach, even with an {*}embedded ZooKeeper{*}, worked efficiently 
under heavy load.
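
For a sense of scale, the figures above work out roughly as follows. This is 
only back-of-the-envelope arithmetic: it assumes the data was spread evenly and 
that the 100 TB figure already includes replicas, neither of which is stated 
above.

```python
# Figures taken from the cluster description above.
machines = 11
instances_per_machine = 16
total_data_tb = 100  # "over 100 TB", so treat this as a lower bound

total_instances = machines * instances_per_machine
# Assumption: even distribution, and the total already includes replicas.
tb_per_instance = total_data_tb / total_instances
tb_per_machine = total_data_tb / machines

print(f"{total_instances} Solr instances")
print(f"at least {tb_per_instance:.2f} TB per instance")
print(f"at least {tb_per_machine:.1f} TB per machine")
```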

*2. Using Solr with Cloudera and HDFS*
Another scenario involved deploying Solr in a *Cloudera ecosystem* with *HDFS* 
for storage. Here’s what worked and what didn’t:
 * {*}Cluster Management{*}:
 ** ZooKeeper handled cluster coordination, while *Ranger* (and previously 
{*}Sentry{*}) managed permissions.
 * {*}Challenges{*}:
 ** Occasionally, node failures caused {*}HDFS file locks{*}, which were 
difficult to resolve without downtime. These required manual fixes and a lot of 
patience!

If you’ve got questions or need help with something specific, just let me know. 
I’m happy to share more!

> Introduce recommendations of WAYS of running Solr from small to massive
> -----------------------------------------------------------------------
>
>                 Key: SOLR-17492
>                 URL: https://issues.apache.org/jira/browse/SOLR-17492
>             Project: Solr
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Eric Pugh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://solr.apache.org/guide/solr/latest/deployment-guide/installing-solr.html]
>  makes SolrCloud sound like it's only for crazy scale



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
