Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)

2021-03-20 Thread jay harkhani
Hello,

We are using Solr 6.1.0. We have 2 shards, each with one replica, and 7 
ZooKeeper nodes in our live environment.

Our Solr URL is
http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments

I would like to understand the correct configuration for the use of the LB and 
the shards parameter in the URL. We use the URL above to search the collection.

1) In the current configuration the base URL (https://solr2.xyz.com) is the LB, 
which points to the IPs of the shards (Shard1 & Shard2) and replicas (replica1 
& replica2). Is this correct, or would providing the shard IPs directly be 
more effective?

2) What is the importance of the shards parameter? Should we use the LB URL or 
the direct IPs of the shards and replicas?

Regards,
Jay Harkhani.



Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)

2021-03-20 Thread vishal patel
We are using Solr 6.1.0. We have 2 shards, each with one replica, and 7 
ZooKeeper nodes in our live environment.

Our Solr URL is
http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments

I would like to understand the correct configuration for the use of the LB and 
the shards parameter in the URL. We use the URL above to search the collection.

1) In the current configuration the base URL (https://solr2.xyz.com) is the LB, 
which points to the IPs of the shards (Shard1 & Shard2) and replicas (replica1 
& replica2). Is this correct, or would providing the shard IPs directly be 
more effective?

2) What is the importance of the shards parameter? Should we use the LB URL or 
the direct IPs of the shards and replicas?

Regards,
Vishal Patel


Re: Solr complains about unknown field during atomic indexing

2021-03-20 Thread Shawn Heisey

On 3/19/2021 3:36 PM, gnandre wrote:

While performing  atomic indexing, I run into an error which says 'unknown
field X' where X is not a field specified in the schema. It is a
discontinued field. After deleting that field from the schema, I have
restarted Solr but I have not re-indexed the content back, so the deleted
field data still might be there in Solr index.

The way I understand how atomic indexing works, it tries to index all
stored values again, but why is it trying to index stored value of a field
that does not exist in the schema?



Solr's Atomic Update feature works by grabbing the existing document, 
all of it, performing the atomic update instructions on that document, 
and then indexing the results as a new document.  If the uniqueKey 
feature is enabled (which would be required for Atomic Updates to work 
properly), the old document is deleted as the new document is added.  I 
haven't looked at the code, but the existing fields are likely added to 
the document that is being built all at once and without consulting the 
schema.  So if field X is in the document that's already in the index, 
it will be in the new document too.  If X is deleted from the schema, 
you'll get the error you're getting.
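
[Editor's sketch, not from the thread: the request body below shows the shape 
of an atomic-update command; the collection, field names, and document id are 
made-up examples. Solr merges these instructions into the full stored document, 
which is why a stored value for a now-deleted field still gets re-indexed.]

```python
import json

# Hedged sketch of an atomic-update request body (sent as JSON to
# /solr/<collection>/update). The id and field names are hypothetical.
doc_id = "doc-1"
atomic_update = [
    {
        "id": doc_id,                    # uniqueKey, required for atomic updates
        "title": {"set": "New title"},   # replace the stored value
        "tags": {"add": "solr"},         # append to a multiValued field
        "views": {"inc": 1},             # increment a numeric field
    }
]

# Solr fetches the existing document, applies these instructions, and
# re-indexes the result as a whole new document.
payload = json.dumps(atomic_update)
print(payload)
```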


It would be a fair amount of work to have Solr take the schema into 
account for atomic updates.  Not impossible, just slightly 
time-consuming.  I think we (the Solr developers) would want it to still 
fail indexing in this situation, the failure would just happen at a 
different place in the code than it does now, during atomic document 
assembly.  Fail earlier and faster.


What you'll need to do for your circumstances is leave X in the schema, but 
change it to a type that will be completely ignored at indexing time.


Something like this:

   <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
   <field name="X" type="ignored"/>

You could then add the following to take care of any and all unknown fields:

   <dynamicField name="*" type="ignored" multiValued="true"/>
Or you could name individual fields like that, which I think would be a 
better option than the wildcard dynamic field.


My source for the config snippets: 
https://stackoverflow.com/questions/46509259/solr-7-managed-schema-how-to-ignore-unnamed-fields


Thanks,
Shawn


Re: Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)

2021-03-20 Thread Shawn Heisey

On 3/20/2021 1:04 AM, jay harkhani wrote:

Hello,

We are using Solr 6.1.0. We have 2 shards and each has one replica with 7 
zookeepers in our live environment.


So you've got it running in SolrCloud mode (with ZooKeeper)?


Our Solr URL is
http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments

I would like to understand correct configuration for use of LB and shards 
parameters in URL. Above URL we use for search in collection.

1) In current configuration base URL (https://solr2.xyz.com) is of LB which point to 
IPs of shards (Shard1 & Shard2) and replicas (replica1 & replica2). Is it 
correct? or instead provide IP of shard be more effective?

2) What is importance of shards parameters? What should we use LB URL or direct 
IP of shards and replicas?


If you want Solr to do load balancing automatically across all servers 
containing parts of that collection, you should completely remove the 
shards parameter.  It is not necessary and probably prevents load 
balancing from working.


If you're talking about using your own load balancer, I would still 
remove the shards parameter.  Solr will automatically figure out where 
every shard replica is when you query the collection.
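
[Editor's sketch, not from the thread: the same query with the shards 
parameter dropped, using the anonymized hostnames from the message above.]

```python
from urllib.parse import urlencode

# Build the query without a "shards" parameter: in SolrCloud, any node can act
# as coordinator and fan the request out to one replica per shard on its own.
base = "http://solr2.xyz.com:8983/solr/actionscomments/select"
params = {
    "q": "+resource_id:(123) +entity_type:(4) +action_status:(0)",
    "sort": "action_date desc,id desc",
    "rows": 1000,
    "fl": "action_id",
    "shards.tolerant": "true",
    # note: no "shards" parameter here
}
url = base + "?" + urlencode(params)
print(url)
```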


Thanks,
Shawn


Check for ongoing REINDEXCOLLECTION

2021-03-20 Thread Karl Stoney
Hi,
I’m aware I can check the status of a reindex if I know both the source and 
destination collection, or I can check the progress of the async request via 
the async API.

However, if I know neither of these, and I just want to check whether any 
REINDEXCOLLECTION operations are running on the cluster at a given time, is 
there a programmatic way to do this?

The reason behind this is that we’ve got a custom application which starts the 
reindex, and I want it to first validate there aren’t any other reindexes 
running.

Thanks
Karl
Unless expressly stated otherwise in this email, this e-mail is sent on behalf 
of Auto Trader Limited Registered Office: 1 Tony Wilson Place, Manchester, 
Lancashire, M15 4FN (Registered in England No. 03909628). Auto Trader Limited 
is part of the Auto Trader Group Plc group. This email and any files 
transmitted with it are confidential and may be legally privileged, and 
intended solely for the use of the individual or entity to whom they are 
addressed. If you have received this email in error please notify the sender. 
This email message has been swept for the presence of computer viruses.


RE: Check for ongoing REINDEXCOLLECTION

2021-03-20 Thread ufuk yılmaz
I think calling:

/solr/collectionReindexCommandIsSentTo/stream?action=list

Lists running daemon processes, which should contain running reindexing 
operations. Be careful of https://issues.apache.org/jira/browse/SOLR-13245 
though, if you have more than one replica on the same node. (Joel Bernstein’s 
comment under the issue).
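
[Editor's sketch, not from the thread: the JSON shape below is an assumption 
based on the streaming-expressions response format, and the daemon id is made 
up; check it against your Solr version.]

```python
# Extract the ids of running daemons from a /stream?action=list response.
def running_daemons(stream_list_response: dict) -> list:
    docs = stream_list_response.get("result-set", {}).get("docs", [])
    # The terminating EOF marker is returned as a doc too; skip it.
    return [d["id"] for d in docs if "id" in d and not d.get("EOF")]

# Hypothetical response for illustration.
sample = {
    "result-set": {
        "docs": [
            {"id": "reindex-daemon-1", "state": "RUNNING"},
            {"EOF": True, "RESPONSE_TIME": 2},
        ]
    }
}
print(running_daemons(sample))  # prints ['reindex-daemon-1']
```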

Sent from Mail for Windows 10




Re: tlog size issue- solr cloud 6.6

2021-03-20 Thread Ritvik Sharma
Hi Guys

We have faced an issue where the tlog size is increasing unnecessarily. We are
using a "heavy indexing, heavy query" approach. We have enabled hard commits
as well.

solr cloud: 6.6
zk: 3.4.10
shards: 2, replication factor= 2


solrconfig:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <!-- <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime> -->
  </autoSoftCommit>

In every replica the tlog grows to more than 200 GB, which exhausts the disk
space. Please suggest something.


Re: How to set maxExpansions parameter for fuzzy search

2021-03-20 Thread David Smiley
It appears this setter needs to be
called: org.apache.solr.parser.SolrQueryParserBase#setFuzzyMinSim  and it
could be done by edismax.  PR welcome!

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, Mar 18, 2021 at 12:28 PM Olivier Tavard 
wrote:

> Hi,
>
> I have a question regarding the fuzzy search.
> By reading previous questions on the ML in the past I saw that the
> parameter max expansions is set to 50 in the code.
> I have the same behavior that for other users meaning that if I have a
> Solrcloud cluster with many shards, I obtain more results for a fuzzy
> search than if I were on a monoserver with a single shard cause of the max
> expansions parameter because the value is per shard.
> So I would like to increase the value but to my knowledge, I need to change
> it in the code and recompile Solr. Is there a way to set it directly at the
> query, I did not find anything in the documentation. I am aware that it can
> cause poor search performance but I need to increase it. I saw that in
> ElasticSearch, recently they did a modification to set it at query time but
> not in Solr as I see, am I correct ?
>
> Thank you,
>
> Olivier
>


facet alias with "duplicate" uniqueKey

2021-03-20 Thread buchman
Hi folks,
I've noticed the following warning in the aliases documentation:
"...Reindexing a document with a different route value for the same ID
produces two distinct documents with the same ID accessible via the
alias..."
When testing such a case it seems that really only one doc is retrieved, but
when facets are turned on they are not aligned with the result set.
My test:
1) create two collections, test1 and test2, and an alias named test covering both
2) index a doc with the same id to both collections:
{"id":123}
3) query the alias as follows, with structured explain debug enabled:
http://localhost:8983/solr/test/select?debug.explain.structured=true&debugQuery=on&facet.field=id&facet=on&q=*%3A*
{
  "responseHeader":{
"zkConnected":true,
"status":0,
"QTime":25,
"params":{
  "q":"*:*",
  "facet.field":"id",
  "debug.explain.structured":"true",
  "facet":"on",
  "debugQuery":"on",
  "_":"1616269705741"}},
 
"response":{"numFound":1,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
  {
"id":"123",
"_version_":1694670492462481408}]
  },
  "facet_counts":{
"facet_queries":{},
"facet_fields":{
  "id":[
    "123",2]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}},
  "debug":{
"track":{
  "rid":"-31",
  "EXECUTE_QUERY":{
"http://some_ip:8983/solr/test2_shard1_replica_n1/":{
  "QTime":"3",
  "ElapsedTime":"10",
  "RequestPurpose":"GET_TOP_IDS,GET_FACETS,SET_TERM_STATS",
  "NumFound":"1",
 
"Response":"{responseHeader={zkConnected=true,status=0,QTime=3,params={df=_text_,distrib=false,fl=[id,
score],shards.purpose=16404,fsv=true,shard.url=http://some_ip:8983/solr/test2_shard1_replica_n1/,rid=-31,wt=javabin,_=1616269705741,facet.field=id,f.id.facet.mincount=0,debug=[false,
timing,
track],start=0,f.id.facet.limit=160,collection=test1,test2,rows=10,debug.explain.structured=true,version=2,q=*:*,omitHeader=false,requestPurpose=GET_TOP_IDS,GET_FACETS,SET_TERM_STATS,NOW=1616270594521,isShard=true,facet=on,debugQuery=false}},response={numFound=1,numFoundExact=true,start=0,maxScore=1.0,docs=[SolrDocument{id=123,
score=1.0}]},sort_values={},facet_counts={facet_queries={},facet_fields={id={123=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}},debug={facet-debug={elapse=0,sub-facet=[{processor=SimpleFacets,elapse=0,action=field
facet,maxThreads=0,sub-facet=[{elapse=0,requestedMethod=not
specified,appliedMethod=FC,inputDocSetSize=1,field=id,numBuckets=2}]}]},timing={time=2.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=2.0,query={time=0.0},facet={time=1.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}"},
"http://some_ip:8983/solr/test1_shard1_replica_n1/":{
  "QTime":"2",
  "ElapsedTime":"12",
  "RequestPurpose":"GET_TOP_IDS,GET_FACETS,SET_TERM_STATS",
  "NumFound":"1",
 
"Response":"{responseHeader={zkConnected=true,status=0,QTime=2,params={df=_text_,distrib=false,fl=[id,
score],shards.purpose=16404,fsv=true,shard.url=http://some_ip:8983/solr/test1_shard1_replica_n1/,rid=-31,wt=javabin,_=1616269705741,facet.field=id,f.id.facet.mincount=0,debug=[false,
timing,
track],start=0,f.id.facet.limit=160,collection=test1,test2,rows=10,debug.explain.structured=true,version=2,q=*:*,omitHeader=false,requestPurpose=GET_TOP_IDS,GET_FACETS,SET_TERM_STATS,NOW=1616270594521,isShard=true,facet=on,debugQuery=false}},response={numFound=1,numFoundExact=true,start=0,maxScore=1.0,docs=[SolrDocument{id=123,
score=1.0}]},sort_values={},facet_counts={facet_queries={},facet_fields={id={123=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}},debug={facet-debug={elapse=0,sub-facet=[{processor=SimpleFacets,elapse=0,action=field
facet,maxThreads=0,sub-facet=[{elapse=0,requestedMethod=not
specified,appliedMethod=FC,inputDocSetSize=1,field=id,numBuckets=2}]}]},timing={time=2.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=2.0,query={time=0.0},facet={time=1.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}"}},
  "GET_FIELDS":{
"http://some_ip:8983/solr/test2_shard1_replica_n1/":{
  "QTime":"5",
  "ElapsedTime":"8",
  "RequestPurpose":"GET_FIELDS,GET_DEBUG,SET_TERM_STATS",
  "NumFound":"1",
 
"Response":"{responseHeader={zkConnected=true,status=0,QTime=5,params={facet.field=id,df=_text_,distrib=false,debug=[timing,
track],shards.purpose=16704,collection=test1,test2,shard.url=http://some_ip:8983/solr/test2_shard1_replica_n1/,rows=10,rid=-31,

Re: tlog size issue- solr cloud 6.6

2021-03-20 Thread Dominique Bejean
Hi,

By heavy query, do you mean you have a high query rate, and/or that you need
index updates to be searchable within a few seconds (NRT search)?

Do you see the hard commits in the logs?
Can you try increasing autoSoftCommit to 30 seconds or more?
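
[Editor's sketch of what that change would look like in solrconfig.xml; the
30000 value is just the 30-second figure suggested above:]

```xml
<autoSoftCommit>
  <!-- open a new searcher at most every 30 seconds -->
  <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
</autoSoftCommit>
```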

Regards

Dominique



Re: tlog size issue- solr cloud 6.6

2021-03-20 Thread Ritvik Sharma
Hi Dominique

Heavy query means a high query rate on Solr.

Actually, for the last 2-3 days we have stopped queries on Solr entirely; we
are only indexing. As you can see, we have enabled hard commits to keep the
tlog size down, as the Solr docs recommend, yet the same behaviour still
occurs on the replicas. On some replicas the tlog is 300 GB.

We could increase the soft commit timeout, but to my understanding that only
affects how soon documents become searchable before a hard commit.

>