Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)
Hello, We are using Solr 6.1.0. We have 2 shards, each with one replica, and 7 ZooKeeper nodes in our live environment. Our Solr URL is http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments I would like to understand the correct configuration for the LB and the shards parameter in the URL. We use the URL above to search the collection. 1) In the current configuration, the base URL (https://solr2.xyz.com) is that of the LB, which points to the IPs of the shards (Shard1 & Shard2) and replicas (replica1 & replica2). Is this correct, or would providing a shard's IP directly be more effective? 2) What is the importance of the shards parameter? Should we use the LB URL or the direct IPs of the shards and replicas? Regards, Jay Harkhani.
Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)
We are using Solr 6.1.0. We have 2 shards, each with one replica, and 7 ZooKeeper nodes in our live environment. Our Solr URL is http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments I would like to understand the correct configuration for the LB and the shards parameter in the URL. We use the URL above to search the collection. 1) In the current configuration, the base URL (https://solr2.xyz.com) is that of the LB, which points to the IPs of the shards (Shard1 & Shard2) and replicas (replica1 & replica2). Is this correct, or would providing a shard's IP directly be more effective? 2) What is the importance of the shards parameter? Should we use the LB URL or the direct IPs of the shards and replicas? Regards, Vishal Patel
Re: Solr complains about unknown field during atomic indexing
On 3/19/2021 3:36 PM, gnandre wrote: While performing atomic indexing, I run into an error that says 'unknown field X', where X is not a field specified in the schema. It is a discontinued field. After deleting that field from the schema, I restarted Solr, but I have not re-indexed the content, so the deleted field's data may still be in the Solr index. As I understand atomic indexing, it tries to index all stored values again, but why is it trying to index the stored value of a field that does not exist in the schema? Solr's Atomic Update feature works by grabbing the existing document, all of it, performing the atomic update instructions on that document, and then indexing the result as a new document. If the uniqueKey feature is enabled (which is required for Atomic Updates to work properly), the old document is deleted as the new document is added. I haven't looked at the code, but the existing fields are likely added to the document being built all at once, without consulting the schema. So if field X is in the document that's already in the index, it will be in the new document too. If X has been deleted from the schema, you'll get the error you're getting. It would be a fair amount of work to have Solr take the schema into account for atomic updates. Not impossible, just somewhat time-consuming. I think we (the Solr developers) would want it to still fail indexing in this situation; the failure would just happen at a different place in the code than it does now, during atomic document assembly. Fail earlier and faster. What you'll need to do for your circumstances is leave X in the schema, but change it to a type that will be completely ignored on indexing. Something like this: You could then add the following to take care of any and all unknown fields: Or you could name individual fields like that, which I think would be a better option than the wildcard dynamic field.
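The schema snippets themselves did not survive in this archive; based on the StackOverflow answer linked below, they were presumably along these lines (the field type and field names here are illustrative):

```xml
<!-- A field type whose values are neither indexed nor stored: anything
     mapped to it is silently dropped at index time. -->
<fieldType name="ignored" class="solr.StrField"
           indexed="false" stored="false" multiValued="true"/>

<!-- Wildcard dynamic field: any field not otherwise defined is ignored. -->
<dynamicField name="*" type="ignored" multiValued="true"/>

<!-- Or, instead of the wildcard, name the discontinued field explicitly: -->
<field name="X" type="ignored"/>
```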
My source for the config snippets: https://stackoverflow.com/questions/46509259/solr-7-managed-schema-how-to-ignore-unnamed-fields Thanks, Shawn
Re: Use of LB in base URL and shards parameter for search in Solr (ver: 6.1.0)
On 3/20/2021 1:04 AM, jay harkhani wrote: Hello, We are using Solr 6.1.0. We have 2 shards and each has one replica with 7 zookeepers in our live environment. So you've got it running SolrCloud mode (with ZooKeeper)? Our Solr URL is http://solr2.xyz.com:8983/solr/actionscomments/select?q=+resource_id:(123)+and+entity_type:(4)++action_status:(0)++is_active:(true)+and+recipient_id:(5941841)&sort=action_date+desc,id+desc&start=0&rows=1000&fq=&fl=action_id&indent=off&shards.tolerant=true&shards=s3.xyz.com:8983/solr/actionscomments|s3r1.xyz.com:8983/solr/actionscomments,s4.xyz.com:8983/solr/actionscomments|s4r1.xyz.com:8983/solr/actionscomments I would like to understand correct configuration for use of LB and shards parameters in URL. Above URL we use for search in collection. 1) In current configuration base URL (https://solr2.xyz.com) is of LB which point to IPs of shards (Shard1 & Shard2) and replicas (replica1 & replica2). Is it correct? or instead provide IP of shard be more effective? 2) What is importance of shards parameters? What should we use LB URL or direct IP of shards and replicas? If you want Solr to do load balancing automatically across all servers containing parts of that collection, you should completely remove the shards parameter. It is not necessary and probably prevents load balancing from working. If you're talking about using your own load balancer, I would still remove the shards parameter. Solr will automatically figure out where every shard replica is when you query the collection. Thanks, Shawn
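Shawn's advice amounts to issuing the same query with the shards parameter dropped. A minimal sketch of the simplified request, using the hostnames and parameters from the question (only URL construction is shown, since any HTTP client can send it; the query string is lightly normalized with uppercase AND, which the Lucene parser requires for the operator):

```python
from urllib.parse import urlencode

# Parameters from the original query, minus the hand-maintained shards list:
# in SolrCloud mode the receiving node discovers every shard replica via
# ZooKeeper and load-balances sub-requests across them on its own.
params = {
    "q": "+resource_id:(123) AND +entity_type:(4) +action_status:(0) "
         "+is_active:(true) AND +recipient_id:(5941841)",
    "sort": "action_date desc,id desc",
    "start": 0,
    "rows": 1000,
    "fl": "action_id",
    "indent": "off",
    "shards.tolerant": "true",
}

url = "http://solr2.xyz.com:8983/solr/actionscomments/select?" + urlencode(params)
print(url)
```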
Check for ongoing REINDEXCOLLECTION
Hi, I’m aware I can check the status of a reindex if I know both the source and destination cluster, or I can check the progress of the async request via the async API. However, if I know neither of these, and I just want to check whether there are any REINDEXCOLLECTION operations running on the cluster at any given time, is there a programmatic way to do this? The reason behind this is that we’ve got a custom application which starts the reindex, and I want it to first validate there aren’t any other reindexes running. Thanks Karl Unless expressly stated otherwise in this email, this e-mail is sent on behalf of Auto Trader Limited Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 03909628). Auto Trader Limited is part of the Auto Trader Group Plc group. This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.
RE: Check for ongoing REINDEXCOLLECTION
I think calling: /solr/collectionReindexCommandIsSentTo/stream?action=list lists the running daemon processes, which should include running reindexing operations. Be careful of https://issues.apache.org/jira/browse/SOLR-13245 though, if you have more than one replica on the same node (see Joel Bernstein’s comment under the issue). From: Karl Stoney Sent: 20 March 2021 19:40 To: solr-u...@lucene.apache.org Subject: Check for ongoing REINDEXCOLLECTION Hi, I’m aware I can check the status of a reindex if I know both the source and destination cluster, or I can check the progress of the async request via the async API. However, if I know neither of these, and I just want to check if there are any REINDEXCOLLECTION operations running on the cluster at any given time, is there a programmatic way to do this? The reason behind this is that we’ve got a custom application which starts the reindex, and I want it to first validate there aren’t any other reindexes running. Thanks Karl
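To make the check concrete, a small sketch of how an application might build that request (the base URL and collection name here are hypothetical; the endpoint and action=list parameter are the ones named above, and actually sending the request and inspecting the returned daemon list is left out):

```python
from urllib.parse import urlencode

def daemon_list_url(base_url: str, collection: str) -> str:
    """Build the URL that lists running streaming-expression daemons on
    the given collection; a running REINDEXCOLLECTION shows up there."""
    return f"{base_url}/solr/{collection}/stream?" + urlencode({"action": "list"})

# A custom app would GET this before starting its own reindex and refuse
# to proceed if the returned list of daemons is non-empty.
url = daemon_list_url("http://localhost:8983", "my_collection")
print(url)
```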
Re: tlog size issue- solr cloud 6.6
Hi Guys We have faced an issue where the tlog size is increasing unnecessarily. We are using a "heavy indexing, heavy query" approach. We have enabled hard commit also.

solr cloud: 6.6
zk: 3.4.10
shards: 2, replication factor = 2

solrconfig:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <maxDocs>1</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!--
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
  </autoSoftCommit>
  -->

In every replica, the tlog grows to more than 200 GB, which exhausts disk space. Please suggest something. On Fri, 19 Mar 2021 at 16:36, Ritvik Sharma wrote: > Hi Guys > > We have faced an issue where tlog size is increasing unnecessarily. We are > using a "heavy indexing, heavy query" approach. We enabled hard commit > also, > > solr cloud: 6.6 > zk: 3.4.10 > shards: 2, replication factor= 2 > > > solrconfig, > > >${solr.autoCommit.maxTime:15000} > 1 >false > > > > > > > >${solr.autoSoftCommit.maxTime:15000} --> > > >
Re: How to set maxExpansions parameter for fuzzy search
It appears this setter needs to be called: org.apache.solr.parser.SolrQueryParserBase#setFuzzyMinSim and it could be done by edismax. PR welcome! ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Thu, Mar 18, 2021 at 12:28 PM Olivier Tavard wrote: > Hi, > > I have a question regarding fuzzy search. > Reading previous questions on the ML, I saw that the maxExpansions > parameter is set to 50 in the code. > I see the same behavior as other users: on a SolrCloud cluster with many > shards, I get more results for a fuzzy search than on a single server with > a single shard, because the maxExpansions value is applied per shard. > So I would like to increase the value, but to my knowledge I need to change > it in the code and recompile Solr. Is there a way to set it directly in the > query? I did not find anything in the documentation. I am aware that it can > cause poor search performance, but I need to increase it. I saw that > Elasticsearch recently made a change to allow setting it at query time, but > not Solr as far as I can see. Am I correct? > > Thank you, > > Olivier >
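Not from the thread, but a toy illustration of the per-shard effect Olivier describes: each shard applies the expansion cap to its own term dictionary, so the union across shards can exceed what a single index would return. (Real Lucene keeps the best-scoring candidates rather than the first N, and the term lists here are made up, but the counting effect is the same.)

```python
MAX_EXPANSIONS = 50  # the hard-coded per-shard limit mentioned above

# Hypothetical term dictionaries: each shard indexes different terms,
# with some overlap between them.
shard_terms = [
    [f"term{i:03d}" for i in range(0, 80)],    # shard 1
    [f"term{i:03d}" for i in range(60, 140)],  # shard 2
]

def expand(terms, limit=MAX_EXPANSIONS):
    # Each index independently keeps at most `limit` candidate terms.
    return set(terms[:limit])

# One big index: a single cap of 50 applies to all candidate terms.
single_server = expand(shard_terms[0] + shard_terms[1])
# Sharded: the cap applies per shard, and results are unioned.
sharded = set().union(*(expand(t) for t in shard_terms))

print(len(single_server), len(sharded))
```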
facet alias with "duplicate" uniqueKey
Hi folks, I've noticed the following warning in the aliases documentation: "...Reindexing a document with a different route value for the same ID produces two distinct documents with the same ID accessible via the alias..." When I tested such a case, only one doc is actually retrieved, but when facets are turned on they aren't aligned with the result set. My test: 1) create two collections, test1 and test2, and an alias named test covering both 2) index a doc with the same id to both collections: {"id":123} 3) query the alias as follows, with structured explain debug (note that numFound is 1 while the facet count for id 123 is 2): http://localhost:8983/solr/test/select?debug.explain.structured=true&debugQuery=on&facet.field=id&facet=on&q=*%3A* { "responseHeader":{ "zkConnected":true, "status":0, "QTime":25, "params":{ "q":"*:*", "facet.field":"id", "debug.explain.structured":"true", "facet":"on", "debugQuery":"on", "_":"1616269705741"}}, "response":{"numFound":1,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[ { "id":"123", "_version_":1694670492462481408}] }, "facet_counts":{ "facet_queries":{}, "facet_fields":{ "id":[ "123",2]}, "facet_ranges":{}, "facet_intervals":{}, "facet_heatmaps":{}}, "debug":{ "track":{ "rid":"-31", "EXECUTE_QUERY":{ "http://some_ip:8983/solr/test2_shard1_replica_n1/":{ "QTime":"3", "ElapsedTime":"10", "RequestPurpose":"GET_TOP_IDS,GET_FACETS,SET_TERM_STATS", "NumFound":"1", "Response":"{responseHeader={zkConnected=true,status=0,QTime=3,params={df=_text_,distrib=false,fl=[id, score],shards.purpose=16404,fsv=true,shard.url=http://some_ip:8983/solr/test2_shard1_replica_n1/,rid=-31,wt=javabin,_=1616269705741,facet.field=id,f.id.facet.mincount=0,debug=[false, timing, track],start=0,f.id.facet.limit=160,collection=test1,test2,rows=10,debug.explain.structured=true,version=2,q=*:*,omitHeader=false,requestPurpose=GET_TOP_IDS,GET_FACETS,SET_TERM_STATS,NOW=1616270594521,isShard=true,facet=on,debugQuery=false}},response={numFound=1,numFoundExact=true,start=0,maxScore=1.0,docs=[SolrDocument{id=123, 
score=1.0}]},sort_values={},facet_counts={facet_queries={},facet_fields={id={123=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}},debug={facet-debug={elapse=0,sub-facet=[{processor=SimpleFacets,elapse=0,action=field facet,maxThreads=0,sub-facet=[{elapse=0,requestedMethod=not specified,appliedMethod=FC,inputDocSetSize=1,field=id,numBuckets=2}]}]},timing={time=2.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=2.0,query={time=0.0},facet={time=1.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}"}, "http://some_ip:8983/solr/test1_shard1_replica_n1/":{ "QTime":"2", "ElapsedTime":"12", "RequestPurpose":"GET_TOP_IDS,GET_FACETS,SET_TERM_STATS", "NumFound":"1", "Response":"{responseHeader={zkConnected=true,status=0,QTime=2,params={df=_text_,distrib=false,fl=[id, score],shards.purpose=16404,fsv=true,shard.url=http://some_ip:8983/solr/test1_shard1_replica_n1/,rid=-31,wt=javabin,_=1616269705741,facet.field=id,f.id.facet.mincount=0,debug=[false, timing, track],start=0,f.id.facet.limit=160,collection=test1,test2,rows=10,debug.explain.structured=true,version=2,q=*:*,omitHeader=false,requestPurpose=GET_TOP_IDS,GET_FACETS,SET_TERM_STATS,NOW=1616270594521,isShard=true,facet=on,debugQuery=false}},response={numFound=1,numFoundExact=true,start=0,maxScore=1.0,docs=[SolrDocument{id=123, score=1.0}]},sort_values={},facet_counts={facet_queries={},facet_fields={id={123=1}},facet_ranges={},facet_intervals={},facet_heatmaps={}},debug={facet-debug={elapse=0,sub-facet=[{processor=SimpleFacets,elapse=0,action=field facet,maxThreads=0,sub-facet=[{elapse=0,requestedMethod=not 
specified,appliedMethod=FC,inputDocSetSize=1,field=id,numBuckets=2}]}]},timing={time=2.0,prepare={time=0.0,query={time=0.0},facet={time=0.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}},process={time=2.0,query={time=0.0},facet={time=1.0},facet_module={time=0.0},mlt={time=0.0},highlight={time=0.0},stats={time=0.0},expand={time=0.0},terms={time=0.0},debug={time=0.0}"}}, "GET_FIELDS":{ "http://some_ip:8983/solr/test2_shard1_replica_n1/":{ "QTime":"5", "ElapsedTime":"8", "RequestPurpose":"GET_FIELDS,GET_DEBUG,SET_TERM_STATS", "NumFound":"1", "Response":"{responseHeader={zkConnected=true,status=0,QTime=5,params={facet.field=id,df=_text_,distrib=false,debug=[timing, track],shards.purpose=16704,collection=test1,test2,shard.url=http://some_ip:8983/solr/test2_shard1_replica_n1/,rows=10,rid=-31,
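To summarize what the debug output above demonstrates: each collection behind the alias reports the document once, the document merge deduplicates on the uniqueKey, but facet counts are summed across shards. A toy simulation of that merge (hypothetical data structures, not Solr internals):

```python
# Each "shard response" mirrors the per-shard debug output above:
# numFound=1 and a facet count of 1 for id 123 from each collection.
shard_responses = [
    {"docs": [{"id": "123"}], "facets": {"123": 1}},  # from test1
    {"docs": [{"id": "123"}], "facets": {"123": 1}},  # from test2
]

def merge(responses):
    # The top-docs merge keys on the uniqueKey, so the duplicate collapses...
    merged_docs = {d["id"]: d for r in responses for d in r["docs"]}
    # ...but facet counts are simply summed per value across the shards.
    merged_facets = {}
    for r in responses:
        for value, count in r["facets"].items():
            merged_facets[value] = merged_facets.get(value, 0) + count
    return list(merged_docs.values()), merged_facets

docs, facets = merge(shard_responses)
print(len(docs), facets)  # one doc retrieved, but a facet count of 2
```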
Re: tlog size issue- solr cloud 6.6
Hi, By heavy query, do you mean you have a high query rate, and/or do you need index updates to be searchable within a few seconds (NRT search)? Do you see the hard commits in the logs? Can you try to increase autoSoftCommit to 30 seconds or more? Regards Dominique Le sam. 20 mars 2021 à 18:53, Ritvik Sharma a écrit : > Hi Guys > > We have faced an issue where tlog size is increasing unnecessarily. We are > using a "heavy indexing, heavy query" approach. We enabled hard commit > also, > > solr cloud: 6.6 > zk: 3.4.10 > shards: 2, replication factor= 2 > > > solrconfig, > > >${solr.autoCommit.maxTime:15000} > 1 >false > > > > > > > >${solr.autoSoftCommit.maxTime:15000} --> > > > In every replica, tlog is increasing for more than 200GB which exhaust disk > space. Please suggest someting > > On Fri, 19 Mar 2021 at 16:36, Ritvik Sharma wrote: > > > Hi Guys > > > > We have faced an issue where tlog size is increasing unnecessarily. We > are > > using a "heavy indexing, heavy query" approach. We enabled hard commit > > also, > > > > solr cloud: 6.6 > > zk: 3.4.10 > > shards: 2, replication factor= 2 > > > > > > solrconfig, > > > > > >${solr.autoCommit.maxTime:15000} > > 1 > >false > > > > > > > > > > > > > > > >${solr.autoSoftCommit.maxTime:15000} --> > > > > > > >
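Dominique's suggestion, sketched as a solrconfig.xml fragment (the element names are standard, but the exact values are illustrative):

```xml
<autoCommit>
  <!-- Flush segments and roll the transaction log every 15 seconds,
       without opening a new searcher. -->
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <!-- Make updates searchable every 30 seconds, as suggested. -->
  <maxTime>${solr.autoSoftCommit.maxTime:30000}</maxTime>
</autoSoftCommit>
```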
Re: tlog size issue- solr cloud 6.6
Hi Dominique, Heavy query means a high query rate on Solr. Honestly, for the last 2-3 days we have stopped queries on Solr entirely; we are only doing indexing. As you can see, we have also enabled hard commit to keep the tlog size down, as the Solr docs describe. Still the same behaviour occurs on the replicas; in some replicas the tlog is 300 GB. We could increase the soft commit timeout, but as I understand it that only affects when updates become searchable, before they are committed to the index. On Sun, 21 Mar 2021 at 4:29 AM, Dominique Bejean wrote: > Hi, > > By heavy query, do you mean you have a high query rate and/or you need > index update be available fast after few seconds (NRT search) ? > > Do you see the hard commits in logs ? > Can you try to increase autoSoftCommit to 30 seconds or more ? > > Regards > > Dominique > > Le sam. 20 mars 2021 à 18:53, Ritvik Sharma a > écrit : > > > Hi Guys > > > > We have faced an issue where tlog size is increasing unnecessarily. We > are > > using a "heavy indexing, heavy query" approach. We enabled hard commit > > also, > > > > solr cloud: 6.6 > > zk: 3.4.10 > > shards: 2, replication factor= 2 > > > > > > solrconfig, > > > > > >${solr.autoCommit.maxTime:15000} > > 1 > >false > > > > > > > > > > > > > > > >${solr.autoSoftCommit.maxTime:15000} --> > > > > > > In every replica, tlog is increasing for more than 200GB which exhaust > disk > > space. Please suggest someting > > > > On Fri, 19 Mar 2021 at 16:36, Ritvik Sharma > wrote: > > > > > Hi Guys > > > > > > We have faced an issue where tlog size is increasing unnecessarily. We > > are > > > using a "heavy indexing, heavy query" approach. We enabled hard commit > > > also, > > > > > > solr cloud: 6.6 > > > zk: 3.4.10 > > > shards: 2, replication factor= 2 > > > > > > > > > solrconfig, > > > > > > > > >${solr.autoCommit.maxTime:15000} > > > 1 > > >false > > > > > > > > > > > > > > > > > > > > > > > >${solr.autoSoftCommit.maxTime:15000} --> > > > > > > > > > > > >