Need help to configure DocExpirationUpdateProcessorFactory to delete documents older than 6 months based on column name in document

2021-03-26 Thread amruta dale
Hello Users,


I am using Solr version 7.4.0.  I have around 10 collections which were
created using the default solrconfig.xml.

A few collections have the field *solr_created_dttm*, which is indexed.  I
want to delete all the documents from these collections which are older
than 6 months.

I understand there is *DocExpirationUpdateProcessorFactory*, which needs to
be added to solrconfig.xml.

Can someone please help with the configuration I need to add?  I tried a
few things, like adding an expiration field, but wasn't sure how to calculate
the expiration period to check whether *solr_created_dttm* is 6 months old.

Also, all the collections share the same config file, and my understanding
is that this processor won't affect the collections which don't have
*solr_created_dttm*.  Please confirm whether that is correct, as I only want
to delete documents from collections that have the field *solr_created_dttm*.


Thanks in advance !!


Regards,
Amruta


Re: query.queryResultMaxDocCached not editable via overlay

2021-03-26 Thread Karl Stoney
Haha, thanks for the spot!

On 26/03/2021, 01:26, "Koji Sekiguchi"  wrote:

It seems the reference guide has a typo for that parameter. Try 
query.queryResultMaxDocsCached.
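
A quick sketch of the Config API call with the corrected name (the
collection name and the example value here are placeholders):

curl http://localhost:8983/solr/your_collection/config \
  -H 'Content-Type: application/json' \
  -d '{"set-property": {"query.queryResultMaxDocsCached": 200}}'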

Koji


On 2021/03/25 22:12, Karl Stoney wrote:
> Hey,
> 
> https://solr.apache.org/guide/8_8/config-api.html states the field is
> editable, however I get a 400 back from solr:
>
> ‘query.queryResultMaxDocCached' is not an editable property
>
> Any ideas?
>
> Can change other fields fine.



Re: Frequent OutOfMemoryError on production solr

2021-03-26 Thread Florin Babes
Hello Parshant,
I can't see anything particularly wrong with your query. You could try using
the CollapsingQParserPlugin and the ExpandComponent instead of grouping, to
see if you get any improvements.
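As a rough sketch (assuming glid remains the grouping key, and mapping your
group.limit=5 onto expand.rows=5), that part of the request might become:

q=bags&rows=14&fq={!collapse field=glid}&expand=true&expand.rows=5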
You should also check how much memory your caches use, look for evictions,
and check the hit rate; maybe you do not need your caches to be that big.
In particular, check the document cache. The documentation says the
following:
"The size for the documentCache should always be greater than max_results
times the max_concurrent_queries, to ensure that Solr does not need to
refetch a document during a request. The more fields you store in your
documents, the higher the memory usage of this cache will be."
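
As a minimal sketch: if your worst case were, say, 50 concurrent queries
each touching up to 100 documents (these numbers are illustrative, not
yours), the documentCache would need size >= 5000. Note the documentCache
cannot be autowarmed, so autowarmCount stays 0:

<documentCache class="solr.LRUCache"
               size="5120"
               initialSize="5120"
               autowarmCount="0"/>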

Also, can you please share the GC log from before an OOM?

Thanks,

Florin Babeş



On Fri, Mar 26, 2021 at 08:53, Parshant Kumar
 wrote:

> Hi Florin,
>
> Please check the info and let me know if any improvements can be made.
>
> Query Example
> shards=good&mc1="190850"&mc3="190850A"&mc4="190850B"&mc5="190850LS"&mc6="190850SS"&mc7="190850P"&mc12="190850CA"&mcHigh=190850&mcHighA=190850&mcHighB=190850B&mcHighAB=190850&q=bags&ps=2&rows=14&group=true&group.limit=5&group.field=glid&group.ngroups=true&lat=0&lon=0&spellcheck=true&fq=wt:[0
> TO 1299]&fq=-wt:(1259 1279 1289)&fq=id:(someids) OR ( titles:("imsw bags
> imsw") AND titles:("bagimsw") )&boost=map(query({!dismax qf=id v=$mc3
> pf=""}),0,0,map(query({!dismax qf=id v=$mc4 pf=""}),0,0,map(query({!dismax
> qf=id mm=0 v=$mcHighA pf=""}),0,0,map(query({!dismax qf=id mm=0 v=$mcHighB
> pf=""}),0,0,map(query({!dismax qf=id v=$mc12 pf=""}),0,0,map(query({!dismax
> qf=id v=$mc1 pf=""}),0,0,1,1.1 ),2.0),80.0),105.0),175.0),250.0)&some more
> similar boosts
> cache configuration (XML tag names stripped in transit):
> ... initialSize="2000" autowarmCount="100" />
> ... autowarmCount="100" />
> ... autowarmCount="512" />
> JVM gc configuration -XX:CICompilerCount=4 -XX:ConcGCThreads=3
> -XX:G1HeapRegionSize=8388608 -XX:GCLogFileSize=20971520
> -XX:+HeapDumpOnOutOfMemoryError -XX:InitialHeapSize=17179869184
> -XX:MarkStackSize=4194304 -XX:MaxHeapSize=17179869184
> -XX:MaxNewSize=10301210624 -XX:MinHeapDeltaBytes=8388608
> -XX:NumberOfGCLogFiles=9 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:ThreadStackSize=256
> -XX:+UseCompressedClassPointers -XX:+UseCompressedOops
> -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCLogFileRotation
> heap size: 16 GB
> grouping field cardinality: around 6.2 million
>
> On Thu, Mar 25, 2021 at 2:28 PM Florin Babes 
> wrote:
>
> > Hello,
> > Can you give a query example, cache configuration, JVM gc configuration
> and
> > heap size, grouping field cardinality?
> > Thanks,
> >
> > Florin Babeş
> >
> >
> > On Thu, Mar 25, 2021 at 10:16, Parshant Kumar
> >  wrote:
> >
> > > Yes we are using grouped queries
> > >
> > > On Thu, Mar 25, 2021, 1:42 PM Saurabh Sharma <
> saurabh.infoe...@gmail.com
> > >
> > > wrote:
> > >
> > > > Are you doing lots of group queries? Sometimes, due to huge data
> > > > scans, you will see high GC activity, which may lead to OOM errors.
> > > >
> > > > On Thu, Mar 25, 2021, 1:24 PM Parshant Kumar
> > > >  wrote:
> > > >
> > > > > We have 4 Solr servers which contain the same data, 100GB each.
> > > > > Each server has following configuration:
> > > > >
> > > > > Solr version - 6.5
> > > > > RAM 96 GB
> > > > > 14 Processors
> > > > > DiskSpace 350GB for data folder
> > > > >
> > > > > Request Rate on our servers is around 20/second.
> > > > >
> > > > > Our servers go OutOfMemory quite often, either when replication
> > > > > completes (not a full replication, a partial one) or when there is
> > > > > a spike in request count.
> > > > >
> > > > > It's not the case that it goes OOM with every replication cycle,
> > > > > but sometimes it does.
> > > > >
> > > > > We are not able to figure out the reason for this.
> > > > > Any help would be appreciated.
> > > > >


Error creating TLOG collection

2021-03-26 Thread Webster Homer
I've tried to post this several times, but have not seen it in the discussion; 
maybe my account was removed? Anyway, I need help with this issue. It looks 
like a bug to me.

We use Solr 7.7.2 in a SolrCloud consisting of 4 nodes. Each collection has 2 
shards, and we try to place a replica on each node; we don't want to have more 
than one replica for a collection on the same node.

One of our collections has very specific requirements that force us to use TLOG 
replicas for the collection.

We only have 4 nodes, and we don't want more than one replica for the 
collection on a node. From what I see in the documentation, I would expect 
that adding the rule parameter to the collection creation call would give me 
what I want. For most of our collections it does. However, it fails if I try 
this with a TLOG collection.

http://localhost:8983/solr/admin/collections?action=CREATE&name=b2b-catalog-material-20210320T&collection.configName=b2b-catalog-material-20210320&numShards=2&rule=replica:<2,node:*&tlogReplicas=2

It fails with the error: TLOG or PULL replica types not supported with 
placement rules or cluster policies

This seems like a bug to me, especially after I saw this Jira: 
https://issues.apache.org/jira/browse/SOLR-10233
According to that, all replica types should be supported.

Is there a workaround for this? How can I place TLOG replicas on separate 
nodes?
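
One possible workaround I have been considering (untested against 7.7.2; the 
host names below are placeholders): create the collection with 
createNodeSet=EMPTY and no placement rule, then place each TLOG replica 
explicitly with ADDREPLICA and its node parameter:

http://localhost:8983/solr/admin/collections?action=CREATE&name=b2b-catalog-material-20210320T&collection.configName=b2b-catalog-material-20210320&numShards=2&createNodeSet=EMPTY
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=b2b-catalog-material-20210320T&shard=shard1&type=TLOG&node=host1:8983_solr
http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=b2b-catalog-material-20210320T&shard=shard2&type=TLOG&node=host2:8983_solr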

Personally I find the relevancy values for TLOG replicas much better than NRT 
replicas and think tlog should be the default replica type.

This was the complete response:

{
  "responseHeader": {
    "status": 500,
    "QTime": 574
  },
  "Operation create caused exception:": "org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: TLOG or PULL replica types not supported with placement rules or cluster policies",
  "exception": {
    "msg": "TLOG or PULL replica types not supported with placement rules or cluster policies",
    "rspCode": 500
  },
  "error": {
    "metadata": [
      "error-class",
      "org.apache.solr.common.SolrException",
      "root-error-class",
      "org.apache.solr.common.SolrException"
    ],
    "msg": "TLOG or PULL replica types not supported with placement rules or cluster policies",
    "trace": "org.apache.solr.common.SolrException: TLOG or PULL replica types not supported with placement rules or cluster policies
        at org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:53)
        at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:275)
        at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:247)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.StatisticsHandler.handle(St

Re: Need help to configure DocExpirationUpdateProcessorFactory to delete documents older than 6 months based on column name in document

2021-03-26 Thread Chris Hostetter


: A few collections have the field *solr_created_dttm*, which is indexed.  I
: want to delete all the documents from these collections which are older
: than 6 months.
: 
: I understand there is *DocExpirationUpdateProcessorFactory*, which needs to
: be added to solrconfig.xml.
: 
: Can someone please help with the configuration I need to add?  I tried a

If you already have an existing collection, containing existing documents, 
then adding DocExpirationUpdateProcessorFactory at this point will not 
help you delete documents where the "created" date field is more than 
6 months old.

What the DocExpirationUpdateProcessorFactory does is provide 2 features 
that can be used together or independently:

1) add an *additional* "expirationField" at indexing time, containing 
an absolute datetime computed from some relative "TTL" (Time To Live) that 
can be specified per document (either as a field or as a request param 
when indexing)

2) provide a periodic timer to delete documents whose expirationField is 
in the past.


So, hypothetically, you could add DocExpirationUpdateProcessorFactory to 
solrconfig.xml, along with a DefaultValueUpdateProcessorFactory that 
includes a "+6MONTHS" TTL field in every document, and then re-index all of 
your documents; DocExpirationUpdateProcessorFactory could then start 
periodically deleting all docs based on this new expire field.
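
A sketch of what that chain might look like (the _ttl_ and expire_at_dt 
field names are placeholders; both would need to exist in your schema):

<updateRequestProcessorChain name="add-expiration" default="true">
  <!-- stamp a default TTL on every incoming document -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+6MONTHS</str>
  </processor>
  <!-- compute expire_at_dt from _ttl_ and delete expired docs once a day -->
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">expire_at_dt</str>
    <int name="autoDeletePeriodSeconds">86400</int>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>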

Or ... you could just set up a small cron job to periodically run a 
DeleteByQuery request, using similar -- but negated -- date math against 
your existing solr_created_dttm field...

solr_created_dttm:[* TO NOW-6MONTH]
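
...e.g., a cron-driven curl along these lines, run once per collection that 
has the field (the collection name is a placeholder):

curl -X POST 'http://localhost:8983/solr/your_collection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '{"delete": {"query": "solr_created_dttm:[* TO NOW-6MONTH]"}}'

Since the job is pointed at specific collections, the collections without 
solr_created_dttm are never touched.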



-Hoss
http://www.lucidworks.com/


Highlighting with Span queries

2021-03-26 Thread Sjoerd Smeets
Hi all,

I am trying to get highlighting working with Span queries. My span query
looks like this (my query parser is an extension of the edismax query parser):

*spanNear([stemmed_text:tintin, stemmed_text:haddock], 4, false)*

When I change the query to
*+stemmed_text:tintin +stemmed_text:haddock*

I get highlights. I'm using the unified highlighter, and I also tried
setting usePhraseHighlighter explicitly, with no luck.

Should highlighting be supported and if so, am I missing something?
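
For reference, the highlighting parameters I am testing look roughly like 
this (hl.weightMatches is an assumption on my part; if I read the ref guide 
right, it is available from Solr 7.4 on):

hl=true&hl.method=unified&hl.fl=stemmed_text&hl.usePhraseHighlighter=true&hl.weightMatches=true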

Thanks!
Sjoerd

unstemmed_text schema: [XML stripped by the list archive]

highlight component: [XML stripped by the list archive; the only surviving
fragment values are 100, 70, 0.5, the fragmenter regex
[-\w ,/\n\"']{20,200}, hl.simple.post and hl.tag.post markup, 10, ".,!? ",
WORD, and enUS]

