Re: REINDEXCOLLECTION gradually slows

2021-03-18 Thread Karl Stoney
A bit more data on this to help... I'd really welcome any ideas, as the 
slowdown over time means we're never really able to finish a reindex via 
REINDEXCOLLECTION.

We have disabled auto commits on the destination collection during the reindex; 
however, we're still seeing commits happen as the reindexing proceeds.  Does 
REINDEXCOLLECTION perform a commit after every batch or every N seconds?  They 
seem to be bang on about 30 seconds apart.

```
/karl-test/config/updateHandler
{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "config": {
    "updateHandler": {
      "indexWriter": {
        "closeWaitsForMerges": true
      },
      "commitWithin": {
        "softCommit": true
      },
      "autoCommit": {
        "maxDocs": -1,
        "maxTime": -1,
        "openSearcher": false
      },
      "autoSoftCommit": {
        "maxDocs": -1,
        "maxTime": 90
      }
    }
  }
}
```

This in turn seems to be causing cache warming, which obviously gets 
increasingly slower and increasingly adds load.
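
In the output above, autoCommit is disabled (maxTime -1) while autoSoftCommit.maxTime is still 90. A minimal sketch of the Config API request that would override both intervals, assuming the `set-property` command and the `updateHandler.*` property names from the Solr Reference Guide. The host is hypothetical, "karl-test" is the destination collection from this thread, and the request is only built, not sent:

```python
import json
import urllib.request


def commit_override_payload(auto_commit_ms=-1, auto_soft_commit_ms=-1):
    """Build a Config API body that overrides both auto-commit intervals."""
    return {
        "set-property": {
            "updateHandler.autoCommit.maxTime": auto_commit_ms,
            "updateHandler.autoSoftCommit.maxTime": auto_soft_commit_ms,
        }
    }


def build_config_request(base_url, collection, payload):
    """Return an unsent POST request for /solr/<collection>/config."""
    url = f"{base_url.rstrip('/')}/solr/{collection}/config"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; the collection name comes from the thread.
request = build_config_request(
    "http://localhost:8983", "karl-test", commit_override_payload()
)
print(request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it; omitted to avoid side effects.
```

Elsewhere in this digest it is reported that changing the destination config while a reindex is running killed the reindex, so an override like this would presumably have to be in place before REINDEXCOLLECTION starts.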


From: Karl Stoney 
Sent: 17 March 2021 19:16
To: solr-u...@lucene.apache.org 
Subject: REINDEXCOLLECTION gradually slows

Hey,
So we're trying to use REINDEXCOLLECTION again (Solr 8.8) and, similar to the 
last time we tried it (8.1), we see it gradually slow down, as you can see from 
the logs below.

I terminated it early but you can see the trend in the processingRatePerSecond.
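
The processingRatePerSecond field in the logs below is a cumulative average (processedDocs divided by durationSeconds), which smooths out the slowdown. A small sketch that computes the rate per interval instead; the sample values are the processedDocs and durationSeconds pairs from the log lines in this message, and the function name is hypothetical:

```python
def interval_rates(entries):
    """Docs/second between consecutive entries where processedDocs advanced."""
    rates = []
    previous = entries[0]
    for entry in entries[1:]:
        docs = entry["processedDocs"] - previous["processedDocs"]
        seconds = entry["durationSeconds"] - previous["durationSeconds"]
        if docs > 0 and seconds > 0:
            rates.append((entry["durationSeconds"], docs / seconds))
            previous = entry  # only move the baseline when progress was logged
    return rates


# (durationSeconds, processedDocs) pairs copied from the log lines below.
SAMPLE = [
    {"durationSeconds": d, "processedDocs": p}
    for d, p in [
        (125, 156000), (130, 156000), (135, 172000), (140, 172000),
        (145, 188000), (150, 188000), (155, 202000), (161, 202000),
        (166, 212000), (171, 212000),
    ]
]

for at_second, rate in interval_rates(SAMPLE):
    print(f"t={at_second}s  {rate:.0f} docs/s")
# t=135s  1600 docs/s
# t=145s  1600 docs/s
# t=155s  1400 docs/s
# t=166s  909 docs/s
```

Each log line is JSON, so `json.loads` on every line would produce dictionaries of the shape used here.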

I'm wondering if anyone has any ideas why this might happen? 🤷‍♂️ The machine 
isn't CPU or memory constrained, so it makes reindexing larger collections 
really hard (we've had to code a separate app to do it).

Thanks



```
{"timestamp":"2021-03-17T19:04:44.703Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":156000,"targetDocs":787176,"percentComplete":20,"durationSeconds":125,"processingRatePerSecond":1248}
{"timestamp":"2021-03-17T19:04:49.719Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":156000,"targetDocs":787176,"percentComplete":20,"durationSeconds":130,"processingRatePerSecond":1200}
{"timestamp":"2021-03-17T19:04:54.736Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":172000,"targetDocs":787176,"percentComplete":22,"durationSeconds":135,"processingRatePerSecond":1274}
{"timestamp":"2021-03-17T19:04:59.751Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":172000,"targetDocs":787176,"percentComplete":22,"durationSeconds":140,"processingRatePerSecond":1229}
{"timestamp":"2021-03-17T19:05:04.765Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":188000,"targetDocs":787176,"percentComplete":24,"durationSeconds":145,"processingRatePerSecond":1297}
{"timestamp":"2021-03-17T19:05:09.782Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":188000,"targetDocs":787176,"percentComplete":24,"durationSeconds":150,"processingRatePerSecond":1253}
{"timestamp":"2021-03-17T19:05:14.798Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":202000,"targetDocs":787176,"percentComplete":26,"durationSeconds":155,"processingRatePerSecond":1303}
{"timestamp":"2021-03-17T19:05:19.816Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":202000,"targetDocs":787176,"percentComplete":26,"durationSeconds":161,"processingRatePerSecond":1255}
{"timestamp":"2021-03-17T19:05:24.832Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":212000,"targetDocs":787176,"percentComplete":27,"durationSeconds":166,"processingRatePerSecond":1277}
{"timestamp":"2021-03-17T19:05:29.850Z","level":"info","module":"solr","message":"reindex still running","sourceCollection":"at-uk-002","destinationCollection":"karl-test","configName":"at-uk-002-509","rows":2000,"processedDocs":212000,"targetDocs":787176,"percentComplete":27,"durationSeconds":171,"processingRatePerSe
```

Problem with Backup - Standalone Mode

2021-03-18 Thread adamw
I have a problem while performing a backup. I have a Solr instance in
Standalone Mode.

I use the following command to do the backup:

http://solr-content:8983/solr/core_example/replication?command=backup&location=\\resource\backup\core_example&name=core_example

At first I can see the files in the \\resource\backup directory, but after
a few minutes there are no files left in the location
\\resource\backup\core_example. The files have disappeared. The core is
127 GB in size.

When I back up a core that is 500 MB in size, the backup completes
successfully.

What has happened? What could be the reason for this?
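
This does not explain why the files vanish, but as a starting point for diagnosis, here is a hedged sketch that builds the backup URL used above plus a `command=details` URL for the same replication handler, whose response normally includes the status of the last backup. The helper name is hypothetical; the host, core and location are the ones from this message, and the `wt=json` parameter is an addition:

```python
from urllib.parse import urlencode


def replication_url(base_url, core, command, **params):
    """Build a replication handler URL with properly escaped parameters."""
    query = urlencode({"command": command, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/replication?{query}"


# Values taken from the message above; the backslashes in the UNC path are
# percent-encoded by urlencode.
backup_url = replication_url(
    "http://solr-content:8983",
    "core_example",
    "backup",
    location=r"\\resource\backup\core_example",
    name="core_example",
)
details_url = replication_url("http://solr-content:8983", "core_example", "details")

print(backup_url)
print(details_url)
```

Polling the details URL while the 127 GB backup runs may show whether it ended with an exception rather than with success.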





Re: REINDEXCOLLECTION gradually slows

2021-03-18 Thread Karl Stoney
And a bit more... it's almost certainly to do with the commits happening after 
every batch.
When I set rows rather high (for example, 25k) I hold a sustained throughput 
for a longer duration, and on the smaller index things finish faster.

The downside is obviously more memory being used (but we have quite a bit).

Which really leads me down the route of "how to minimise the ever-increasing 
cost of the commits".

FYI the index is about 30 GB.

This is my merge policy:

  
128

  5
  5
  5000
  25


  10
  15


  

I'm thinking I could disable cache autowarming on the index via the Config API 
for the duration of the reindex?
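
A rough sketch of what that could look like, assuming the Config API's `set-property` and `unset-property` commands and the `query.<cache>.autowarmCount` property names listed in the Solr Reference Guide. The function names and the choice of caches are hypothetical; each payload would be POSTed to `/solr/<collection>/config`:

```python
import json

# Caches that support autowarming; adjust to whatever solrconfig.xml defines.
CACHES = ("filterCache", "queryResultCache")


def autowarm_off_payload(caches=CACHES):
    """One set-property command that sets autowarmCount to 0 for each cache."""
    return {"set-property": {f"query.{c}.autowarmCount": 0 for c in caches}}


def autowarm_restore_payloads(caches=CACHES):
    """One unset-property command per cache, removing the overlay value so
    the setting from solrconfig.xml applies again."""
    return [{"unset-property": f"query.{c}.autowarmCount"} for c in caches]


print(json.dumps(autowarm_off_payload(), indent=2))
for payload in autowarm_restore_payloads():
    print(json.dumps(payload))
```

The restore step is kept as one command per property because that form is the one shown in the reference guide.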


Re: Disable commits during a REINDEXCOLLECTION

2021-03-18 Thread Joel Bernstein
Commits need to happen or you'll get a memory leak. There are parts of the
transaction log which are held in memory for real-time get. Commits release
this memory as indexing occurs. I'm not aware of a way to disable this
behavior.





Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Mar 17, 2021 at 6:47 PM Karl Stoney
 wrote:

> Bit of a hacky workaround but:
>
>
>   *   create a collection from the target config name before reindex
>   *   set property on that (which writes overlay to zookeeper)
>   *   delete the collection
>   *   run reindex
>   *   unset-property on the created collection
>
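
The five steps listed above can be sketched as an ordered request plan. This is only an illustration under assumptions: the Collections API actions (CREATE, DELETE, REINDEXCOLLECTION) and the Config API commands are real, but the function names, the host, `numShards=1` and the example property are hypothetical, and nothing is sent:

```python
from urllib.parse import urlencode


def collections_url(base_url, **params):
    """Build a Collections API URL from keyword parameters."""
    return f"{base_url.rstrip('/')}/solr/admin/collections?{urlencode(params)}"


def workaround_plan(base_url, source, target, config_name, prop, value):
    """Return the five steps as (method, url, json_body) tuples, in order."""
    base = base_url.rstrip("/")
    create_params = {
        "action": "CREATE",
        "name": target,
        "numShards": 1,  # assumption; any valid shard count would do
        "collection.configName": config_name,
    }
    config_url = f"{base}/solr/{target}/config"
    return [
        # 1. create a collection from the target config name
        ("GET", collections_url(base, **create_params), None),
        # 2. set the property, which writes the overlay to ZooKeeper
        ("POST", config_url, {"set-property": {prop: value}}),
        # 3. delete the collection again
        ("GET", collections_url(base, action="DELETE", name=target), None),
        # 4. run the reindex, which recreates the target collection
        ("GET", collections_url(base, action="REINDEXCOLLECTION", name=source,
                                target=target, configName=config_name), None),
        # 5. remove the override once the reindex has finished
        ("POST", config_url, {"unset-property": prop}),
    ]


plan = workaround_plan(
    "http://localhost:8983", "at-uk-002", "karl-test", "at-uk-002-509",
    "updateHandler.autoCommit.maxTime", -1,
)
for method, url, body in plan:
    print(method, url, body)
```

Step 5 should only run after the reindex has completed, since the thread reports that changing the destination config mid-reindex stops it.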
> 
> From: Karl Stoney 
> Sent: 17 March 2021 22:20
> To: users@solr.apache.org 
> Subject: Re: Disable commits during a REINDEXCOLLECTION
>
> That's the source collection, I'm referring to the destination.
>
> @David - seeing as REINDEX creates the target collection, I had to start
> the process and then attempt to set the config on the destination.
> However setting the destination config kills the REINDEX:
>
> ```
> 22:18:40.389
> [DaemonStream-karl-test-20015-thread-1-processing-n:solr-0.search-solr.svc.cluster.local:80_solr
> x:at-uk-002_shard1_replica_n1 c:at-uk-002 s:shard1 r:core_node2] ERROR
> org.apache.solr.client.solrj.io.stream.DaemonStream - Error in
> DaemonStream: karl-test
> java.io.IOException: org.apache.solr.common.SolrException: Could not find
> a healthy node to handle the request.
> at
> org.apache.solr.client.solrj.io.stream.TopicStream.persistCheckpoints(TopicStream.java:472)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> at
> org.apache.solr.client.solrj.io.stream.TopicStream.close(TopicStream.java:342)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> at
> org.apache.solr.client.solrj.io.stream.PushBackStream.close(PushBackStream.java:75)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> at
> org.apache.solr.client.solrj.io.stream.UpdateStream.close(UpdateStream.java:147)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> at
> org.apache.solr.client.solrj.io.stream.CommitStream.close(CommitStream.java:155)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> at
> org.apache.solr.client.solrj.io.stream.DaemonStream$StreamRunner.stream(DaemonStream.java:380)
> ~[solr-solrj-8.8.1.jar:8.8.1 64f3b496bfee762a9d2dbff40700f457f4464dfe - tjp
> - 2021-02-16 15:52:04]
> ```
>
> If I remove the set-config, it's fine.
> 
> From: Vivaldi 
> Sent: 17 March 2021 21:59
> To: users@solr.apache.org 
> Subject: Re: Disable commits during a REINDEXCOLLECTION
>
> I thought REINDEXCOLLECTION puts the source collection in read-only
> mode, am I wrong? Doesn’t that also disable commits?
>
> Sent from my iPhone
>
> > On 18 Mar 2021, at 00:28, Karl Stoney 
> wrote:
> >
> > Can you believe I had never come across that before!  Thanks!
> >
> > Every day’s a school day
> >
> > 
> > From: David Smiley 
> > Sent: Wednesday, March 17, 2021 8:48:23 PM
> > To: users@solr.apache.org 
> > Subject: Re: Disable commits during a REINDEXCOLLECTION
> >
> > Hi Karl,
> >
> > Look into the "Config API".  Let us know how it goes!
> > Ideally, this feature would do this automatically.
> >
> > ~ David Smiley
> > Apache Lucene/Solr Search Developer
> >
> > http://www.linkedin.com/in/davidwsmiley
> >
> >
> > On Wed, Mar 17, 2021 at 2:02 PM Karl Stoney
> >  wrote:
> >
> >> Hi all,
> >> We're wanting to use REINDEXCOLLECTION, but our config has a relatively
> >> aggressive autoCommit interval configured by default (intentionally).
> >>
> >> Ideally I'd like to be able to disable hard commits for the duration of
> >> the reindex, but can't see a way to do that without pushing a whole new
> >> config and reloading that collection.
> >>
> >> Does anyone know of a ninja way at runtime to disable autoCommits on a
> >> collection (solr cloud)?
> >>

Re: Cannot find Solr 7.4.1 release

2021-03-18 Thread Olivier Tavard
Hi,

Thanks for your answer. I just wanted to be sure. I created the binary
package in the meantime.

Best regards,

Olivier

On Thu, 18 Feb 2021 at 14:30, Shawn Heisey wrote:

> On 2/18/2021 1:05 AM, Olivier Tavard wrote:
> > I wanted to download Solr 7.4.1, but I cannot find the 7.4.1 release in
> > http://archive.apache.org/dist/lucene/solr/ : there is Solr 7.4 and then
> > directly 7.5.
> > Of course I can build from source, but this is frustrating because I can
> > see that the 7_4_branch contains a fix that I need (SOLR-12594), with the
> > status "fixed" in versions 7.4.1 and 7.5. Everything seems to have been
> > prepared to release 7.4.1, but I cannot find it.
> > Does this release exist?
>
> That release does not exist.  There was never any discussion about it on
> the dev list.
>
> 7.4.1 was added to Jira for tracking purposes, and the code change for
> that issue was saved to branch_7_4 just in case somebody felt a 7.4.1
> release was required.  That issue deals with a problem in metrics, which
> is outside of basic Solr functionality -- not critical enough to warrant
> a point release.
>
> The release process for 7.5.0 was underway about a month after that
> issue was committed.
>
> If 7.5.0 (or one of the many later releases) will not work for your
> needs, then you will need to compile branch_7_4 yourself.  I have used
> custom-compiled versions before in production because we needed a bugfix
> that was not deemed severe enough for a new point release.
>
> You can create binary packages similar to what is available for download
> by running "ant package" in the solr directory of your code checkout.  I
> think that build target only works on *NIX systems -- Windows is missing
> some of the required pieces.
>
> Thanks,
> Shawn
>


How to set maxExpansions parameter for fuzzy search

2021-03-18 Thread Olivier Tavard
Hi,

I have a question regarding fuzzy search.
From previous questions on the mailing list I saw that the max expansions
parameter is set to 50 in the code.
I see the same behavior as other users: on a SolrCloud cluster with many
shards, a fuzzy search returns more results than on a single server with a
single shard, because the max expansions limit applies per shard.
So I would like to increase the value but, to my knowledge, I need to change
it in the code and recompile Solr. Is there a way to set it directly in the
query? I did not find anything in the documentation. I am aware that it can
cause poor search performance, but I need to increase it. I saw that
Elasticsearch recently added a way to set it at query time, but as far as I
can see Solr has not. Am I correct?
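
To illustrate why a per-shard limit can yield more matches on a sharded index, here is a toy model, not Lucene's actual implementation: each shard expands the fuzzy term to at most `max_expansions` terms from its own dictionary, and the distributed query matches the union. Candidates are ranked by edit distance only, which is a simplification, and all names and terms are made up:

```python
def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def expand(shard_terms, query, max_edits=2, max_expansions=50):
    """Closest terms in one shard's dictionary, capped at max_expansions."""
    scored = sorted((edit_distance(query, t), t) for t in set(shard_terms))
    return [t for d, t in scored if d <= max_edits][:max_expansions]


def matched_terms(shards, query, max_expansions):
    """Union of the per-shard expansions, as a distributed query would see."""
    matched = set()
    for terms in shards:
        matched.update(expand(terms, query, max_expansions=max_expansions))
    return matched


terms = ["solr", "sole", "sold", "soar", "salr", "slr"]
single = matched_terms([terms], "solr", max_expansions=2)
sharded = matched_terms([terms[0:2], terms[2:4], terms[4:6]], "solr", max_expansions=2)
print(len(single), len(sharded))  # 2 6
```

With the cap at 2, the single shard contributes two terms in total, while three shards contribute two terms each.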

Thank you,

Olivier


Solr config and schema linting and validation

2021-03-18 Thread Karl Stoney
Hi all,
I'm trying to add some tests to the pipeline which deploys our Solr config, and 
right now I'm working with bash/xmllint/xmlstarlet, and it's a bit crude.

Does anyone have any tips/pointers on how best to validate solrconfig.xml and 
schema.xml?  Are there published schemas for specific Solr versions we can 
validate against?  etc etc.

Thanks
Karl
Unless expressly stated otherwise in this email, this e-mail is sent on behalf 
of Auto Trader Limited Registered Office: 1 Tony Wilson Place, Manchester, 
Lancashire, M15 4FN (Registered in England No. 03909628). Auto Trader Limited 
is part of the Auto Trader Group Plc group. This email and any files 
transmitted with it are confidential and may be legally privileged, and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error please notify the sender. 
This email message has been swept for the presence of computer viruses.


Re: Solr config and schema linting and validation

2021-03-18 Thread Alexandre Rafalovitch
I am not aware of any, and I looked. In fact I have two pages of notes
of what a service like that would look like if I chose to create one.
My main thought was that solrconfig and schema are effectively a graph
database and that can be used for validation both against rules and
for unused/wrong references.
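
As one small example of the reference checking described above, here is a sketch that parses a schema with Python's standard library and reports fields pointing at undefined field types, as well as field types nothing uses. It is not an existing Solr tool; the function names and the sample schema are invented for illustration, and parsing also fails loudly on XML that is not well-formed:

```python
import xml.etree.ElementTree as ET


def undefined_field_types(schema_xml):
    """Return (field name, type) pairs whose type has no <fieldType>."""
    root = ET.fromstring(schema_xml)  # raises ParseError on malformed XML
    defined = {ft.get("name") for ft in root.iter("fieldType")}
    problems = []
    for tag in ("field", "dynamicField"):
        for field in root.iter(tag):
            if field.get("type") not in defined:
                problems.append((field.get("name"), field.get("type")))
    return sorted(problems)


def unused_field_types(schema_xml):
    """Return names of <fieldType> elements no field refers to."""
    root = ET.fromstring(schema_xml)
    used = {
        f.get("type")
        for tag in ("field", "dynamicField")
        for f in root.iter(tag)
    }
    return sorted(
        ft.get("name") for ft in root.iter("fieldType") if ft.get("name") not in used
    )


# Invented sample: "title" refers to a type that is not defined.
SAMPLE_SCHEMA = """<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="plong" class="solr.LongPointField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
</schema>"""

print(undefined_field_types(SAMPLE_SCHEMA))  # [('title', 'text_general')]
print(unused_field_types(SAMPLE_SCHEMA))     # ['plong']
```

A check like this could run in a deployment pipeline next to xmllint, with copyField sources and destinations as an obvious next reference to verify.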

So, I would be very interested in any tips/pointers as well, or even
in "ideas" around that. Some of my own are at:
https://github.com/arafalov/SolrLint

Regards,
   Alex.

On Thu, 18 Mar 2021 at 15:35, Karl Stoney
 wrote:
>
> Hi all,
> I'm trying to add some tests to the pipeline which deploys our Solr config, 
> and right now I'm working with bash/xmllint/xmlstarlet, and it's a bit crude.
>
> Does anyone have any tips/pointers on how best to validate solrconfig.xml and 
> schema.xml?  Are there published schemas for specific Solr versions we can 
> validate against?  etc etc.
>
> Thanks
> Karl