Need help to configure DocExpirationUpdateProcessorFactory to delete documents older than 6 months based on column name in document
Hello Users, I am using Solr version 7.4.0. I have around 10 collections, which were created using the default solrconfig.xml. A few collections have the field *solr_created_dttm*, which is indexed. I want to delete all documents from these collections which are older than 6 months. I understand there is *DocExpirationUpdateProcessorFactory* which needs to be added to solrconfig.xml. Can someone please help with the configuration I need to add? I tried a few things like adding an expiration field, but wasn't sure how to calculate the expiration period to check whether *solr_created_dttm* is 6 months old. Also, since all the collections share the same config file, I assume this processor won't affect those collections which don't have *solr_created_dttm*. Please confirm if my understanding is correct, as I only want to delete documents from collections having the field *solr_created_dttm*. Thanks in advance !! Regards, Amruta
Re: query.queryResultMaxDocCached not editable via overlay
Haha, thanks for the spot! On 26/03/2021, 01:26, "Koji Sekiguchi" wrote: It seems the reference guide has a typo for the parameter. Try query.queryResultMaxDocsCached. Koji On 2021/03/25 22:12, Karl Stoney wrote: > Hey, > https://solr.apache.org/guide/8_8/config-api.html states the field is editable, however I get a 400 back from Solr: > > 'query.queryResultMaxDocCached' is not an editable property > > Any ideas? > > Can change other fields fine.
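For what it's worth, a minimal sketch of editing the correctly-spelled property through the Config API (the collection name and value here are placeholders, not from the thread):

    curl -X POST http://localhost:8983/solr/mycollection/config \
      -H 'Content-Type: application/json' \
      -d '{"set-property": {"query.queryResultMaxDocsCached": 300}}'

set-property only accepts the properties the reference guide lists as editable, which is why the misspelled name came back with a 400.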
Re: Frequent OutOfMemoryError on production solr
Hello Parshant, I can't see anything particularly wrong with your query. You could try using the CollapsingQParser plus the ExpandComponent instead of grouping to see if you get any improvement (see the sketch after this thread). Also, you should check how much memory your caches use, and check for evictions and the hit rate. Maybe you do not need your caches to be that big. You should especially check the document cache. The documentation says the following: "The size for the documentCache should always be greater than max_results times the max_concurrent_queries, to ensure that Solr does not need to refetch a document during a request. The more fields you store in your documents, the higher the memory usage of this cache will be." Also, can you please share the GC log from before an OOM? Thanks, Florin Babeş On Fri, 26 Mar 2021 at 08:53, Parshant Kumar wrote: > Hi Florin, > > Please check the info and let me know if some improvements can be made. > > Query example: > shards=good&mc1="190850"&mc3="190850A"&mc4="190850B"&mc5="190850LS"&mc6="190850SS"&mc7="190850P"&mc12="190850CA"&mcHigh=190850&mcHighA=190850&mcHighB=190850B&mcHighAB=190850&q=bags&ps=2&rows=14&group=true&group.limit=5&group.field=glid&group.ngroups=true&lat=0&lon=0&spellcheck=true&fq=wt:[0 TO 1299]&fq=-wt:(1259 1279 1289)&fq=id:(someids) OR ( titles:("imsw bags imsw") AND titles:("bagimsw") )&boost=map(query({!dismax qf=id v=$mc3 pf=""}),0,0,map(query({!dismax qf=id v=$mc4 pf=""}),0,0,map(query({!dismax qf=id mm=0 v=$mcHighA pf=""}),0,0,map(query({!dismax qf=id mm=0 v=$mcHighB pf=""}),0,0,map(query({!dismax qf=id v=$mc12 pf=""}),0,0,map(query({!dismax qf=id v=$mc1 pf=""}),0,0,1,1.1 ),2.0),80.0),105.0),175.0),250.0)&some more similar boosts > Cache configuration: initialSize="2000" autowarmCount="100" />, autowarmCount="100" />, autowarmCount="512" /> > JVM GC configuration: -XX:CICompilerCount=4 -XX:ConcGCThreads=3 -XX:G1HeapRegionSize=8388608 -XX:GCLogFileSize=20971520 -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=17179869184 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=17179869184 -XX:MaxNewSize=10301210624 -XX:MinHeapDeltaBytes=8388608 -XX:NumberOfGCLogFiles=9 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:ThreadStackSize=256 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCLogFileRotation > Heap size: 16GB > Grouping field cardinality: the cardinality of the group field is around 6.2 million > On Thu, Mar 25, 2021 at 2:28 PM Florin Babes wrote: > > Hello, > > Can you give a query example, cache configuration, JVM GC configuration and heap size, and the grouping field cardinality? > > Thanks, > > Florin Babeş > > On Thu, 25 Mar 2021 at 10:16, Parshant Kumar wrote: > > > Yes, we are using grouped queries. > > > On Thu, Mar 25, 2021, 1:42 PM Saurabh Sharma <saurabh.infoe...@gmail.com> wrote: > > > > Are you doing lots of group queries? Sometimes, due to huge data scans, you will face high GC activity, which may lead to OOM errors. > > > > On Thu, Mar 25, 2021, 1:24 PM Parshant Kumar wrote: > > > > > We have 4 Solr servers which contain the same data, 100GB each. 
> > > > > Each server has the following configuration: > > > > > Solr version: 6.5 > > > > > RAM: 96 GB > > > > > 14 processors > > > > > Disk space: 350GB for the data folder > > > > > The request rate on our servers is around 20/second. > > > > > Our servers go OutOfMemory quite often, either when replication completes (not a full replication, a partial one) or when there is a spike in the request count. > > > > > It's not the case that it goes OOM with every replication cycle, but it does sometimes. > > > > > We are not able to figure out the reason for this. > > > > > Any help would be appreciated.
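Not from the thread itself, but a sketch of the collapse/expand rewrite Florin suggests, using the glid field from the query above (other parameters omitted; expand.rows mirrors the group.limit=5 from the original query):

    q=bags&fq={!collapse field=glid}&expand=true&expand.rows=5&rows=14

One caveat: collapse+expand returns a flat list of group heads plus an "expanded" section rather than the grouped response format, and it has no direct equivalent of group.ngroups=true, which is often exactly the expensive part on a field with ~6.2 million distinct values. Also note that in a sharded setup, collapse assumes all documents of a group live on the same shard.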
Error creating TLOG collection
I've tried to post this several times, but have not seen it in the discussion; maybe my account was removed? Anyway, I need help with this issue. It looks like a bug to me. We use Solr 7.7.2 in a SolrCloud consisting of 4 nodes. Each collection has 2 shards, and we try to place a replica on each node; we don't want to have more than one replica for a collection on the same node. One of our collections has very specific requirements that force us to use TLOG replicas for the collection. We only have 4 nodes, and we don't want more than one replica for the collection on a node. From what I see in the documentation, I would expect that adding the rule parameter to the collection creation call would give me what I want. For most of our collections it does. However, it fails if I try this with a TLOG collection. http://localhost:8983/solr/admin/collections?action=CREATE&name=b2b-catalog-material-20210320T&collection.configName=b2b-catalog-material-20210320&numShards=2&rule=replica:<2,node:*&tlogReplicas=2 It fails with the error: TLOG or PULL replica types not supported with placement rules or cluster policies This seems like a bug to me, especially after I saw this Jira: https://issues.apache.org/jira/browse/SOLR-10233 According to that, all replica types should be supported. Is there a workaround for this? How can I place TLOG replicas on separate nodes? Personally, I find the relevancy values for TLOG replicas much better than for NRT replicas and think TLOG should be the default replica type. This was the complete response: { "responseHeader": { "status": 500, "QTime": 574 }, "Operation create caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: TLOG or PULL replica types not supported with placement rules or cluster policies", "exception": { "msg": "TLOG or PULL replica types not supported with placement rules or cluster policies", "rspCode": 500 }, "error": { "metadata": [ "error-class", "org.apache.solr.common.SolrException", "root-error-class", "org.apache.solr.common.SolrException" ], "msg": "TLOG or PULL replica types not supported with placement rules or cluster policies", "trace": "org.apache.solr.common.SolrException: TLOG or PULL replica types not supported with placement rules or cluster policies\n\tat org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)\n\tat org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:275)\n\tat org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:247)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)\n\tat org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat org.eclipse.jetty.server.handler.StatisticsHandler.handle(St
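Not suggested in the thread, but one possible workaround to sketch, assuming ADDREPLICA honors the node and type parameters on 7.7 (host names below are placeholders): create the collection without placement rules, pinning the first round of replicas with createNodeSet, then add the remaining TLOG replicas on explicit nodes:

    http://localhost:8983/solr/admin/collections?action=CREATE&name=b2b-catalog-material-20210320T&collection.configName=b2b-catalog-material-20210320&numShards=2&tlogReplicas=1&createNodeSet=host1:8983_solr,host2:8983_solr

    http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=b2b-catalog-material-20210320T&shard=shard1&type=TLOG&node=host3:8983_solr
    http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=b2b-catalog-material-20210320T&shard=shard2&type=TLOG&node=host4:8983_solr

Manual placement sidesteps the rules engine entirely, at the cost of managing node assignments yourself.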
Re: Need help to configure DocExpirationUpdateProcessorFactory to delete documents older than 6 months based on column name in document
: Few collections have the column : *solr_created_dttm* which is indexed. I : want to delete all the documents from these collections which are older : than 6 months. : : I understand there is *DocExpirationUpdateProcessorFactory* which needs to : be added to solrconfig.xml. : : Can someone please help with the configuration I need to add. I tried a If you already have an existing collection, containing existing documents, then adding DocExpirationUpdateProcessorFactory at this point will not help you delete documents where the "created" date field is more than 6 months old. What the DocExpirationUpdateProcessorFactory does is provide 2 features that can be used together or independently: 1) add an *additional* "expirationField" at indexing time, containing an absolute datetime computed from some relative "TTL" (Time To Live) that can be specified per document (either as a field or as a request param when indexing) 2) provide a periodic timer to delete documents whose expirationField is in the past. So, hypothetically, you could add DocExpirationUpdateProcessorFactory to solrconfig.xml, along with a DefaultValueUpdateProcessorFactory that included a "+6MONTH" TTL field in every document, and then re-index all of your documents; DocExpirationUpdateProcessorFactory could then start periodically deleting all docs based on this new expire field. Or ... you could just set up a small cron job to periodically run a DeleteByQuery request against your existing solr_created_dttm field using similar -- but negated -- date math... solr_created_dttm:[* TO NOW-6MONTH] -Hoss http://www.lucidworks.com/
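To make Hoss's hypothetical concrete, a minimal solrconfig.xml sketch (untested; the chain name, the _ttl_/_expire_at_ field names, and the 300-second delete period are illustrative choices, not requirements -- _expire_at_ must exist as a date field in the schema):

    <updateRequestProcessorChain name="add-six-month-ttl" default="true">
      <processor class="solr.DefaultValueUpdateProcessorFactory">
        <!-- stamp every incoming doc with a relative TTL -->
        <str name="fieldName">_ttl_</str>
        <str name="value">+6MONTHS</str>
      </processor>
      <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
        <str name="ttlFieldName">_ttl_</str>
        <str name="expirationFieldName">_expire_at_</str>
        <!-- scan for and delete expired docs every 5 minutes -->
        <int name="autoDeletePeriodSeconds">300</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

And a sketch of the cron alternative, posting Hoss's query as a delete (the collection name is a placeholder):

    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
      -H 'Content-Type: text/xml' \
      --data-binary '<delete><query>solr_created_dttm:[* TO NOW-6MONTH]</query></delete>'

The delete-by-query route also answers the shared-config concern in the original question: the cron job only targets the collections that actually have solr_created_dttm, whereas a default-on update chain would stamp a TTL on documents in every collection sharing the config.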
Highlighting with Span queries
Hi all, I am trying to get highlighting working with span queries. My span query looks like this (my query parser is an extension of the edismax query parser): *spanNear([stemmed_text:tintin, stemmed_text:haddock], 4, false)* When I change the query to *+stemmed_text:tintin +stemmed_text:haddock* I get highlights. I'm using the unified highlighter, and I also tried setting usePhraseHighlighter explicitly, with no luck. Should highlighting be supported, and if so, am I missing something? Thanks! Sjoerd [The attached unstemmed_text schema snippet and highlight component config arrived mangled; the surviving fragments (hl.fragsize 100/70, hl.regex.slop 0.5, hl.regex.pattern [-\w ,/\n\"']{20,200}, hl.simple.pre/post, hl.tag.pre/post, hl.bs.maxScan 10, hl.bs.chars .,!?, hl.bs.type WORD, en-US) match the stock highlight searchComponent from the default solrconfig.xml.]
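Not an answer from the list, but a request sketch that may help narrow this down; hl.weightMatches is an assumption here, since it only exists in Solr 8.x and the thread doesn't mention a version:

    q={!edismax qf=stemmed_text}tintin haddock&hl=true&hl.method=unified&hl.fl=stemmed_text&hl.usePhraseHighlighter=true&hl.weightMatches=true

The unified highlighter can only highlight what it can match in the hl.fl field, so it's worth confirming that hl.fl names the same field the span query targets (stemmed_text here, not unstemmed_text).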