Re: REINDEXCOLLECTION gradually slows
A bit more data on this to help. We'd really welcome any ideas, as the slowdown over time means we're never really able to finish a reindex via REINDEXCOLLECTION.

We have disabled auto commits on the destination collection during the reindex, however we're still seeing commits happen as the reindexing runs. Does REINDEXCOLLECTION perform a commit after every batch, or every N seconds? They seem to be bang on about 30 seconds apart.

```
/karl-test/config/updateHandler
{
  responseHeader: {
    status: 0,
    QTime: 0
  },
  config: {
    updateHandler: {
      indexWriter: {
        closeWaitsForMerges: true
      },
      commitWithin: {
        softCommit: true
      },
      autoCommit: {
        maxDocs: -1,
        maxTime: -1,
        openSearcher: false
      },
      autoSoftCommit: {
        maxDocs: -1,
        maxTime: 90
      }
    }
  }
}
```

This in turn seems to be causing cache warming, which gets increasingly slower and adds increasing load.

From: Karl Stoney
Sent: 17 March 2021 19:16
To: solr-u...@lucene.apache.org
Subject: REINDEXCOLLECTION gradually slows

Hey,
So we're trying to use REINDEXCOLLECTION again (Solr 8.8) and, similar to the last time we tried it (8.1), we see it gradually slow down, as you can see from the logs below. I terminated it early but you can see the trend in the processingRatePerSecond.

I'm wondering if anyone has any ideas why this might happen? 🤷‍♂️ The machine isn't CPU or memory constrained, so it makes reindexing larger collections really hard (we've had to code a separate app to do it).
Thanks

```
{"timestamp":"2021-03-17T19:04:44.703Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":156000,"targetDocs":787176,"percentComplete":20,"durationSeconds":125,"processingRatePerSecond":1248}
{"timestamp":"2021-03-17T19:04:49.719Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":156000,"targetDocs":787176,"percentComplete":20,"durationSeconds":130,"processingRatePerSecond":1200}
{"timestamp":"2021-03-17T19:04:54.736Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":172000,"targetDocs":787176,"percentComplete":22,"durationSeconds":135,"processingRatePerSecond":1274}
{"timestamp":"2021-03-17T19:04:59.751Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":172000,"targetDocs":787176,"percentComplete":22,"durationSeconds":140,"processingRatePerSecond":1229}
{"timestamp":"2021-03-17T19:05:04.765Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":188000,"targetDocs":787176,"percentComplete":24,"durationSeconds":145,"processingRatePerSecond":1297}
{"timestamp":"2021-03-17T19:05:09.782Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":188000,"targetDocs":787176,"percentComplete":24,"durationSeconds":150,"processingRatePerSecond":1253}
{"timestamp":"2021-03-17T19:05:14.798Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":202000,"targetDocs":787176,"percentComplete":26,"durationSeconds":155,"processingRatePerSecond":1303}
{"timestamp":"2021-03-17T19:05:19.816Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":202000,"targetDocs":787176,"percentComplete":26,"durationSeconds":161,"processingRatePerSecond":1255}
{"timestamp":"2021-03-17T19:05:24.832Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":212000,"targetDocs":787176,"percentComplete":27,"durationSeconds":166,"processingRatePerSecond":1277}
{"timestamp":"2021-03-17T19:05:29.850Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":212000,"targetDocs":787176,"percentComplete":27,"durationSeconds":171,"processingRatePerSe
```
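The logged processingRatePerSecond appears to be a cumulative average (for example 156000 / 125 = 1248), which hides how quickly individual batches are slowing. A rough sketch of the per-interval rate, using only the durationSeconds and processedDocs values from the log excerpt above, might look like the following. The function name is made up for illustration, and because the log is sampled every 5 seconds the interval boundaries are approximate.

```python
# (durationSeconds, processedDocs) pairs copied from the log excerpt above.
samples = [
    (125, 156000), (130, 156000), (135, 172000), (140, 172000),
    (145, 188000), (150, 188000), (155, 202000), (161, 202000),
    (166, 212000), (171, 212000),
]


def batch_rates(samples):
    """Approximate docs/second between successive changes in processedDocs."""
    # Keep only the first sample at which each processedDocs value was seen.
    firsts = []
    for secs, docs in samples:
        if not firsts or docs != firsts[-1][1]:
            firsts.append((secs, docs))
    # Rate over each interval between consecutive "first seen" samples.
    return [
        round((d2 - d1) / (s2 - s1))
        for (s1, d1), (s2, d2) in zip(firsts, firsts[1:])
    ]


print(batch_rates(samples))  # [1600, 1600, 1400, 909]
```

On these numbers the interval rate falls from roughly 1600 to roughly 900 docs per second within about 40 seconds, while the cumulative figure still reads as 1250 to 1300.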
Problem with Backup - Standalone Mode
I have a problem while performing a backup. I have a Solr instance in standalone mode, and I use the following command to take the backup:

http://solr-content:8983/solr/core_example/replication?command=backup&location=\\resource\backup\core_example&name=core_example

At first I can see the files in the \\resource\backup directory, but after a few minutes there are no files left in \\resource\backup\core_example. The files have disappeared. The core is 127 GB in size. When I back up data that is 500 MB in size, the backup completes successfully.

What has happened? What could be the reason for this?

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
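One way to gather more information, offered as a sketch rather than a diagnosis: the replication handler's backup command returns before the snapshot is finished, and the handler's `command=details` response normally includes the status of the most recent backup. The snippet below only builds the two URLs, using the host, core and location from the message above; `replication_url` is a hypothetical helper, and `wt=json` is just a convenience for reading the response.

```python
from urllib.parse import urlencode


def replication_url(base_url, core, **params):
    """Build a replication-handler URL for a standalone core (illustrative helper)."""
    return f"{base_url}/solr/{core}/replication?{urlencode(params)}"


BASE = "http://solr-content:8983"  # host taken from the message above

backup_url = replication_url(
    BASE, "core_example",
    command="backup",
    location=r"\\resource\backup\core_example",
    name="core_example",
)
# Polling this after starting the backup should show whether it failed or completed.
details_url = replication_url(BASE, "core_example", command="details", wt="json")

print(backup_url)
print(details_url)
```

If the details response reports a failure for the 127 GB core, the exception it carries, together with the Solr log at the same timestamp, would be the first place to look.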
Re: REINDEXCOLLECTION gradually slows
And a bit more... it's almost certainly to do with the commits happening after every batch. When I set rows rather high (for example, 25k) I hold a sustained throughput for longer, and on the smaller index things finish faster. The downside is obviously more memory being used (but we have quite a bit).

Which really leads me down the route of "how to minimise the ever-increasing cost of the commits".

FYI the index is about 30 GB.

This is my merge policy: 128 5 5 5000 25 10 15

I'm thinking I could disable cache autowarming via the Config API on the index for the reindex duration?
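The idea of turning off cache autowarming through the Config API for the duration of the reindex could be sketched roughly as below. This is an assumption-laden illustration: it assumes `query.filterCache.autowarmCount` and `query.queryResultCache.autowarmCount` are among the properties your Solr version allows `set-property` to edit, the base URL is a placeholder, and the helper names are made up. It only builds the request; nothing is sent.

```python
import json

# Caches whose autowarming we would like to switch off (assumed property names).
CACHES = ("filterCache", "queryResultCache")


def autowarm_payload(count):
    """JSON body for a Config API set-property call setting autowarmCount."""
    props = {f"query.{cache}.autowarmCount": count for cache in CACHES}
    return json.dumps({"set-property": props}, indent=2)


def config_endpoint(base_url, collection):
    """Config API endpoint for a collection (base_url is a placeholder)."""
    return f"{base_url}/solr/{collection}/config"


print("POST", config_endpoint("http://localhost:8983", "karl-test"))
print(autowarm_payload(0))
```

A Config API change reloads the collection, and another thread in this archive reports that changing the destination's config while REINDEXCOLLECTION was running killed the reindex, so a change like this would presumably have to be in place before the reindex starts.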
Re: Disable commits during a REINDEXCOLLECTION
Commits need to happen or you'll get a memory leak. There are parts of the transaction log which are held in memory for real-time get. Commits release this memory as indexing occurs. I'm not aware of a way to disable this behavior.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Mar 17, 2021 at 6:47 PM Karl Stoney wrote:

> Bit of a hacky workaround but:
>
> * create a collection from the target config name before reindex
> * set property on that (which writes overlay to zookeeper)
> * delete the collection
> * run reindex
> * unset-property on the created collection
>
> From: Karl Stoney
> Sent: 17 March 2021 22:20
> To: users@solr.apache.org
> Subject: Re: Disable commits during a REINDEXCOLLECTION
>
> That's the source collection, I'm referring to the destination.
>
> @David - seeing as REINDEX creates the target collection, I had to start the process and then attempt to set the config on the destination. However, setting the destination config kills the REINDEX:
>
> ```
> 22:18:40.389 [DaemonStream-karl-test-20015-thread-1-processing-n:solr-0.search-solr.svc.cluster.local:80_solr x:at-uk-002_shard1_replica_n1 c:at-uk-002 s:shard1 r:core_node2] ERROR org.apache.solr.client.solrj.io.stream.DaemonStream - Error in DaemonStream: karl-test
> java.io.IOException: org.apache.solr.common.SolrException: Could not find a healthy node to handle the request.
>     at org.apache.solr.client.solrj.io.stream.TopicStream.persistCheckpoints(TopicStream.java:472) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
>     at org.apache.solr.client.solrj.io.stream.TopicStream.close(TopicStream.java:342) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
>     at org.apache.solr.client.solrj.io.stream.PushBackStream.close(PushBackStream.java:75) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
>     at org.apache.solr.client.solrj.io.stream.UpdateStream.close(UpdateStream.java:147) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
>     at org.apache.solr.client.solrj.io.stream.CommitStream.close(CommitStream.java:155) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
>     at org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.stream(DaemonStream.java:380) ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp - 2021-02-16 15:52:04]
> ```
>
> If I remove the set-config, it's fine.
>
> From: Vivaldi
> Sent: 17 March 2021 21:59
> To: users@solr.apache.org
> Subject: Re: Disable commits during a REINDEXCOLLECTION
>
> I thought the REINDEXCOLLECTION puts the source collection in read-only mode, am I wrong? Doesn't that also disable commits?
>
> Sent from my iPhone
>
> > On 18 Mar 2021, at 00:28, Karl Stoney wrote:
> >
> > Can you believe I had never come across that before! Thanks!
> >
> > Every day's a school day
> >
> > From: David Smiley
> > Sent: Wednesday, March 17, 2021 8:48:23 PM
> > To: users@solr.apache.org
> > Subject: Re: Disable commits during a REINDEXCOLLECTION
> >
> > Hi Karl,
> >
> > Look into the "Config API". Let us know how it goes!
> > Ideally, this feature would do this automatically.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> > http://www.linkedin.com/in/davidwsmiley
> >
> > On Wed, Mar 17, 2021 at 2:02 PM Karl Stoney wrote:
> >
> >> Hi all,
> >> We're wanting to use REINDEXCOLLECTION, but our config has a relatively aggressive autoCommit interval configured by default (intentionally).
> >>
> >> Ideally I'd like to be able to disable hard commits for the duration of the reindex, but can't see a way to do that without pushing a whole new config and reloading that collection.
> >>
> >> Does anyone know of a ninja way at runtime to disable autoCommits on a collection (SolrCloud)?
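The "hacky workaround" described in this thread can be written out as an ordered list of HTTP calls. The sketch below only builds the requests and does not send them. The base URL, the `numShards=1` value and the helper name are assumptions for illustration; the collection and config names are the ones that appear in the logs elsewhere in this archive, and the property shown is one plausible choice for "disable hard commits".

```python
from urllib.parse import urlencode


def workaround_steps(base_url, source, target, config_name, prop, value):
    """Return (method, url, json_body) tuples for the five steps, in order."""
    def admin(**params):
        return f"{base_url}/solr/admin/collections?{urlencode(params)}"

    config_url = f"{base_url}/solr/{target}/config"
    return [
        # 1. Create a throwaway collection from the target config name.
        ("GET", admin(action="CREATE", name=target, numShards=1,
                      **{"collection.configName": config_name}), None),
        # 2. Set the property; the overlay is written to ZooKeeper.
        ("POST", config_url, {"set-property": {prop: value}}),
        # 3. Delete the throwaway collection (the overlay stays with the config).
        ("GET", admin(action="DELETE", name=target), None),
        # 4. Run the reindex, which recreates the target from that config.
        ("GET", admin(action="REINDEXCOLLECTION", name=source,
                      target=target, configName=config_name), None),
        # 5. Remove the override once the reindex has finished.
        ("POST", config_url, {"unset-property": prop}),
    ]


steps = workaround_steps(
    "http://localhost:8983", "at-uk-002", "karl-test", "at-uk-002-509",
    "updateHandler.autoCommit.maxTime", -1,
)
for method, url, body in steps:
    print(method, url, body or "")
```

Keeping the steps as data makes it easy to log or dry-run the sequence before pointing it at a real cluster. Step 5 has to wait until REINDEXCOLLECTION has completed, which this sketch does not model.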
Re: Cannot find Solr 7.4.1 release
Hi,

Thanks for your answer. It was just to be sure. I created the binary package in the meantime.

Best regards,
Olivier

On Thu, 18 Feb 2021 at 14:30, Shawn Heisey wrote:

> On 2/18/2021 1:05 AM, Olivier Tavard wrote:
> > I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release in http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then directly 7.5.
> > Of course I can build from source code, but this is frustrating because I can see that in the 7_4_branch there is a fix that I need (SOLR-12594) with the status fixed in the 7.4.1 and 7.5 versions. Everything seems to have been prepared to release 7.4.1, but I cannot find it.
> > Does this release exist?
>
> That release does not exist. There was never any discussion about it on the dev list.
>
> 7.4.1 was added to Jira for tracking purposes, and the code change for that issue was saved to branch_7_4 just in case somebody felt a 7.4.1 release was required. That issue deals with a problem in metrics, which is outside of basic Solr functionality -- not critical enough to warrant a point release.
>
> The release process for 7.5.0 was underway about a month after that issue was committed.
>
> If 7.5.0 (or one of the many later releases) will not work for your needs, then you will need to compile branch_7_4 yourself. I have used custom-compiled versions before in production because we needed a bugfix that was not deemed severe enough for a new point release.
>
> You can create binary packages similar to what is available for download by running "ant package" in the solr directory of your code checkout. I think that build target only works on *NIX systems -- Windows is missing some of the required pieces.
>
> Thanks,
> Shawn
How to set maxExpansions parameter for fuzzy search
Hi,

I have a question regarding fuzzy search. From reading previous questions on the mailing list, I saw that the max expansions parameter is set to 50 in the code. I see the same behavior as other users: on a SolrCloud cluster with many shards, a fuzzy search returns more results than on a single server with a single shard, because the max expansions limit applies per shard.

So I would like to increase the value but, to my knowledge, I need to change it in the code and recompile Solr. Is there a way to set it directly at query time? I did not find anything in the documentation. I am aware that it can cause poor search performance, but I need to increase it.

I saw that Elasticsearch recently made a change to allow setting it at query time, but as far as I can see Solr has not. Am I correct?

Thank you,
Olivier
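The per-shard effect described above can be illustrated with a toy model. This is not Lucene's actual expansion logic, which ranks candidate terms before truncating; it only shows how a cap applied independently on each shard lets the combined set of expanded terms exceed the cap. The term names and shard count are invented for the example.

```python
MAX_EXPANSIONS = 50  # the limit the message above says is fixed in the code


def expand(candidates, max_expansions=MAX_EXPANSIONS):
    """Toy stand-in for fuzzy expansion: keep at most max_expansions terms."""
    return set(sorted(candidates)[:max_expansions])


# 120 hypothetical index terms that all match the fuzzy query.
all_terms = [f"term{i:03d}" for i in range(120)]

# Single shard: one index holds every term, so the cap applies once.
single = expand(all_terms)

# Four shards: each holds 30 of the terms and applies the cap on its own.
shards = [all_terms[i::4] for i in range(4)]
sharded = set().union(*(expand(shard) for shard in shards))

print(len(single), len(sharded))  # 50 120
```

With one shard, only 50 of the 120 matching terms are searched; with four shards, each shard stays under the cap and all 120 terms contribute, which matches the larger result counts reported on the sharded cluster.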
Solr config and schema linting and validation
Hi all,

I'm trying to add some tests to the pipeline which deploys our Solr config. Right now I'm working with bash/xmllint/xmlstarlet, and it's a bit crude.

Does anyone have any tips or pointers on how best to validate solrconfig.xml and schema.xml? Are there published schemas for specific Solr versions that we can validate against?

Thanks
Karl

Unless expressly stated otherwise in this email, this e-mail is sent on behalf of Auto Trader Limited Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 03909628). Auto Trader Limited is part of the Auto Trader Group Plc group. This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
Re: Solr config and schema linting and validation
I am not aware of any, and I looked. In fact, I have two pages of notes on what a service like that would look like if I chose to create one. My main thought was that solrconfig and schema are effectively a graph database, and that can be used for validation both against rules and for unused/wrong references.

So, I would be very interested in any tips/pointers as well, or even in "ideas" around that. Some of my own are at: https://github.com/arafalov/SolrLint

Regards,
Alex.

On Thu, 18 Mar 2021 at 15:35, Karl Stoney wrote:
>
> Hi all,
> I'm trying to add some tests to the pipeline which deploys our Solr config, and right now I'm working with bash/xmllint/xmlstarlet, and it's a bit crude.
>
> Does anyone have any tips/pointers on how best to validate solrconfig.xml and schema.xml? Are there published schemas for specific Solr versions we can validate against?
>
> Thanks
> Karl
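As a small illustration of the "wrong references" idea, one check that needs no published schema is to confirm that every `field` and `dynamicField` in schema.xml points at a `fieldType` that is actually defined. The sketch below uses only the Python standard library. The function name and the sample schema are invented for the example, and it is far from a full linter (it ignores copyField, uniqueKey and everything in solrconfig.xml).

```python
import xml.etree.ElementTree as ET


def undefined_field_types(schema_xml):
    """Return (field name, type) pairs whose type has no matching fieldType.

    ET.fromstring raises ParseError on malformed XML, so this doubles as a
    well-formedness check.
    """
    root = ET.fromstring(schema_xml)
    # Accept both spellings; older schemas sometimes use lowercase "fieldtype".
    defined = {
        ft.get("name")
        for tag in ("fieldType", "fieldtype")
        for ft in root.iter(tag)
    }
    problems = []
    for tag in ("field", "dynamicField"):
        for field in root.iter(tag):
            if field.get("type") not in defined:
                problems.append((field.get("name"), field.get("type")))
    return problems


# Invented sample: "title" refers to a type that is never declared.
SAMPLE = """<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string"/>
</schema>"""

print(undefined_field_types(SAMPLE))  # [('title', 'text_general')]
```

In a deploy pipeline this could run as a unit test over the real schema file and fail the build when the returned list is non-empty.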