[ https://issues.apache.org/jira/browse/SOLR-17492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17902048#comment-17902048 ]
Arda commented on SOLR-17492:
-----------------------------

Whether you're just getting started with Solr or fine-tuning an existing setup, these practical tips and real-world scenarios may help you get the most out of this search platform.

*Best Practices for Using Solr*

*1. Run Solr as a Cluster for Better Performance*

Solr works best when deployed as a cluster. Start with at least three nodes for fault tolerance and scalability, and scale horizontally as your needs grow.
* {*}Sharding and Replication{*}: Break your data into shards for parallel processing and use replicas for redundancy. A good starting point is two replicas per shard, but adjust this based on your workload.
* {*}Optimize Indexing{*}: Plan your schema carefully so that indexing and querying stay efficient. Use *dynamic fields* and *copy fields* where appropriate to keep the schema flexible, but use them sparingly: every copy field adds to index size and indexing time.
* {*}Caching for Speed{*}: Solr provides several caches, including the query result, document, and filter caches. Size them for frequently repeated queries and filters to speed up query times.
* {*}Tune the JVM{*}: Since Solr is Java-based, JVM tuning is crucial. Size the heap to balance memory usage against garbage-collection pauses. Monitor GC logs and experiment with collectors such as G1GC. CMS is an option only on older JVMs, since it was removed in JDK 14.

*2. Always Run Solr in SolrCloud Mode*

For a robust, scalable setup, *SolrCloud mode* is the way to go. This setup requires {*}ZooKeeper{*}, which manages cluster coordination, leader election, and configuration.
* {*}ZooKeeper’s Role{*}: ZooKeeper stores the cluster state and shared configuration and coordinates leader election, which lets the Solr cluster handle failover and configuration changes dynamically.
* {*}Backups and Security{*}:
** Back up your Solr and ZooKeeper data regularly. Use Solr's built-in backup tools or external snapshot mechanisms.
** Secure your cluster with {*}SSL/TLS{*}, and set up role-based access control, ideally with tools like {*}Apache Ranger{*}.
If Ranger isn’t an option, managing permissions manually works too.
* {*}Monitoring is Essential{*}: Keep an eye on your Solr cluster so that problems are caught early. A good place to start is the {*}Solr Admin UI{*}, a web interface that shows metrics such as query performance, index health, and cache usage. It's easy to use and well suited to spotting issues quickly. For more advanced needs, you may integrate tools like *Prometheus* and *Grafana* for custom dashboards and alerting. However, I should mention that I don’t have direct experience with Prometheus or Grafana specifically when working with Solr.

h3. *Usage Scenarios: Real-World Applications of Solr*

*1. Managing Solr for a Large Dataset*

I used open-source Solr as a search engine for a mobile app. Here’s what that looked like:
* {*}Cluster Configuration{*}:
** The cluster handled over *100 TB of data* spread across {*}11 physical machines{*}, each running {*}16 Solr instances{*}.
* {*}Sharding and Replication{*}:
** Data was stored in {*}shards{*}, with each shard having *two replicas* to ensure fault tolerance and load balancing.
* {*}Data Storage{*}:
** Data was stored directly on the local file system, which was a great fit for this use case.
* {*}Management Approach{*}:
** Instead of accessing Solr directly, I managed the system via ZooKeeper APIs. This approach, even with an {*}embedded ZooKeeper{*}, worked efficiently under heavy load.

*2. Using Solr with Cloudera and HDFS*

Another scenario involved deploying Solr in a *Cloudera ecosystem* with *HDFS* for storage. Here’s what worked and what didn’t:
* {*}Cluster Management{*}:
** ZooKeeper handled cluster coordination, while *Ranger* (and previously {*}Sentry{*}) managed permissions.
* {*}Challenges{*}:
** Occasionally, node failures caused {*}HDFS file locks{*}, which were difficult to resolve without downtime.
These required manual fixes and a lot of patience!

If you’ve got questions or need help with something specific, just let me know. I’m happy to share more!

> Introduce recommendations of WAYS of running Solr from small to massive
> -----------------------------------------------------------------------
>
>                 Key: SOLR-17492
>                 URL: https://issues.apache.org/jira/browse/SOLR-17492
>             Project: Solr
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Eric Pugh
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://solr.apache.org/guide/solr/latest/deployment-guide/installing-solr.html]
> makes solrcloud sound like its for only crazy scale

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
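
As a rough illustration of the sharding and replication advice in the comment, the sketch below builds a Collections API {{CREATE}} request URL with {{numShards}} and {{replicationFactor}}. It only constructs the URL and sends nothing. The base URL, the collection name {{products}}, and the shard count are hypothetical placeholders, and mapping "two replicas per shard" to {{replicationFactor=2}} is an assumption about what the comment intends.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor=2):
    """Build a SolrCloud Collections API CREATE URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        # Assumption: "two replicas per shard" maps to replicationFactor=2.
        "replicationFactor": replication_factor,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
url = create_collection_url("http://localhost:8983/solr", "products", num_shards=4)
print(url)
```

The resulting URL could be opened with any HTTP client against a running SolrCloud node. Suitable values depend on data volume and query load, so treat the numbers here as starting points rather than recommendations.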