tlog size issue - SolrCloud 6.6

2021-03-19 Thread Ritvik Sharma
Hi Guys

We are facing an issue where the tlog size keeps growing unnecessarily. We
are using a "heavy indexing, heavy query" approach.  We have also enabled
hard commits.

solr cloud: 6.6
zk: 3.4.10
shards: 2, replication factor= 2


solrconfig:

   ${solr.autoCommit.maxTime:15000}
   1
   false

   ${solr.autoSoftCommit.maxTime:15000} -->
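
Since a hard commit is what closes the current transaction log and starts a new one, one way to check whether the commits are actually taking effect is to watch the file count and total size of a core's tlog directory over time. The following is only a minimal diagnostic sketch; the directory path in the comment is a hypothetical example, not taken from this setup.

```python
import os


def tlog_dir_size(tlog_dir):
    """Return (file_count, total_bytes) for the files directly inside tlog_dir."""
    count = 0
    total = 0
    with os.scandir(tlog_dir) as entries:
        for entry in entries:
            if entry.is_file():
                count += 1
                total += entry.stat().st_size
    return count, total


# Example call (hypothetical path; use the tlog directory of a real core):
#   tlog_dir_size("/var/solr/data/mycollection_shard1_replica1/data/tlog")
```

If the file count and byte total keep climbing across several autoCommit intervals, that would suggest the hard commits are not rolling the log as expected on that replica.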
 


Distributed IDF for Solr using ExactStatsCache issue

2021-03-19 Thread Cameron M VandenBerg
Hello,

I am using Solr in a distributed environment where I have split my collection 
into parts, which I have running on different nodes.  When I create each part 
of the collection, I set numShards and replicationFactor to 1.  The query speed 
is most important to us, and we are not worried about load on the system.

I want a Distributed IDF across all parts of the collection so I have added 
this line to my solrconfig.xml:

<statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>

This seems to work about 90% of the time, but if I run the same request over 
and over again, sometimes I get scores with a local IDF for just one part of 
the collection.  Here is a request example:
/solr/collection1,collection2/query?q=fulltext:shark&rows=500&fl=id,url,title,score&sort=score+desc

I still get documents from both collection1 and collection2, but sometimes I 
get the same scores as when I query only collection1.  I believe that in that 
case it is using only collection1's document frequency for the term.

Should I use a different configuration?  I would like to make sure the IDF is 
always distributed and the same every time I run the same query.  Is there any 
technique I could use to ensure that this happens?

Thank you,
Cameron VandenBerg
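
To illustrate why a local IDF changes the scores, here is a small sketch that compares per-collection IDF with the IDF computed from summed statistics. It assumes the BM25 IDF formula used by Lucene's BM25Similarity, ln(1 + (N - n + 0.5) / (n + 0.5)); the document counts and frequencies are made-up numbers for illustration only.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics for one term: collection -> (docFreq, docCount)
stats = {"collection1": (50, 10_000), "collection2": (900, 10_000)}

# IDF as each collection would compute it from its own index only
local_idf = {name: bm25_idf(df, n) for name, (df, n) in stats.items()}

# IDF from the statistics summed over all collections (the distributed case)
global_df = sum(df for df, _ in stats.values())
global_n = sum(n for _, n in stats.values())
global_idf = bm25_idf(global_df, global_n)

for name, idf in local_idf.items():
    print(f"{name}: local idf = {idf:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With numbers like these the global IDF falls between the two local values, so a response scored with collection1's local IDF would be visibly different from one scored with the distributed statistics.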



Re: Conflict between atomic update and highlighting constraints

2021-03-19 Thread gnandre
This is useful. Thanks, David.
Although in my case, I don't think the solution proposed in the issue you
mentioned resolves my problem.

Here is why -
I don't want to point to other fields for highlighting; I want to highlight
on the copyField destination fields themselves.
This is because the copyField destination fields in my case are 'exact
match' fields, so it is important that highlighting works on those
exact match fields (unstemmed, no synonyms, etc.).

Also, if I did have to point to other fields for highlighting, couldn't I
just do that with the hl.fl parameter? That is, use field f1 in the qf
param for searching, but field f2 in the hl.fl param? Why is there a need
for a contentField param?
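
For reference, the qf / hl.fl split described above could be expressed as request parameters roughly like this. It is only a sketch: f1 and f2 are the placeholder field names from this message, the query text is made up, and the field named in hl.fl would still have to be stored for the highlighter to read its text.

```python
from urllib.parse import urlencode


def build_select_params(user_query, search_field, highlight_field):
    """Build a query string that searches one field and highlights another."""
    params = {
        "defType": "edismax",
        "q": user_query,
        "qf": search_field,        # field(s) used for matching
        "hl": "true",
        "hl.fl": highlight_field,  # field(s) used for highlighting
    }
    return urlencode(params)


print(build_select_params("shark", "f1", "f2"))
```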



On Sun, Mar 14, 2021 at 12:02 AM David Smiley wrote:

> Good point.  Using a different stored field for highlights has been asked
> for; there's a JIRA issue & a patch:
> https://issues.apache.org/jira/browse/SOLR-1105
> I'm too busy to push this forward by myself but if you can take over there,
> I can work with you (or anyone) to get it in.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Fri, Mar 12, 2021 at 12:32 PM gnandre wrote:
>
> > Hi,
> >
> > I am running into a conflict between two constraints.
> >
> > Atomic updates require copy-field destinations to be stored=false.
> > However, if we want to use these copy-field destination fields in
> > highlighting then they need to be stored=true.
> >
> > How to resolve this conflict?
> >
>


Solr complains about unknown field during atomic indexing

2021-03-19 Thread gnandre
While performing atomic indexing, I run into an error that says 'unknown
field X', where X is no longer a field in the schema. It is a
discontinued field. After deleting that field from the schema, I
restarted Solr but did not re-index the content, so data for the deleted
field may still be in the Solr index.

As I understand it, atomic indexing re-indexes all stored values of the
document, but why is it trying to index the stored value of a field
that does not exist in the schema?
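
A likely explanation can be shown with a toy model. This is not Solr's code, only a sketch under the assumption described above: the existing document is rebuilt from its stored values, the update is applied on top, and the whole document is then validated against the current schema. Any stored value left over from a deleted field is carried along and fails that validation. The field names and values are hypothetical.

```python
def apply_atomic_update(stored_doc, update, schema_fields):
    """Toy model of an atomic update: rebuild the full document, then validate it."""
    doc = dict(stored_doc)  # start from everything stored for the old document
    for field, op in update.items():
        if field == "id":
            continue
        if isinstance(op, dict) and "set" in op:
            doc[field] = op["set"]
        else:
            raise ValueError(f"unsupported operation for field {field!r}")
    # The rebuilt document is checked as a whole, including leftover fields.
    for field in doc:
        if field not in schema_fields:
            raise ValueError(f"unknown field {field!r}")
    return doc


schema = {"id", "title"}                               # X was removed from the schema
stored = {"id": "1", "title": "old", "X": "legacy"}    # but X is still in the index
try:
    apply_atomic_update(stored, {"id": "1", "title": {"set": "new"}}, schema)
except ValueError as err:
    print(err)
```

Under this model the error would go away once the affected documents are fully re-indexed without X, or while X is still declared in the schema.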


Re: Conflict between atomic update and highlighting constraints

2021-03-19 Thread David Smiley
I'm not 100% clear; perhaps you could use a hypothetical example where you
more clearly state which of f1 or f2 is stored, which way the copyField
goes, and which field(s) you are already listing in "qf" for search.

It's been a while since I last looked at that JIRA issue but I'm imagining a
feature where the UH (Unified Highlighter) might see that a particular hl.fl
field is not "stored" yet accommodate this by automatically detecting that
this field is the destination of one copyField declaration whose source is
"stored", or the reverse -- is copied to some other field that is stored.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


Deprecation of QueryElevationComponent's support for elevate.xml in data directories

2021-03-19 Thread David Smiley
Hey Solr community,

I found myself doing some maintenance inside QueryElevationComponent
related to some much larger refactoring.  I'm seeing a fair amount of
complexity around its support of "elevate.xml" in a Solr *data* directory
(where the index is).  I don't know of any other Solr component that does
anything like this.  This feature of QEC existed from the start in Jan
2008.  In October of that same year, Solr's replication handler gained the
ability to replicate config files (see "confFiles" option) between cores.
I think this aspect of QEC would not have existed if QEC had come later.  In
SolrCloud, you're expected to use ZooKeeper of course.

I plan to remove this mechanism in 9.0 on the grounds that it is obsolete.

To preserve the ability to automatically detect elevate.xml changes and
reload on a commit, I'm adding the same mechanism for elevate.xml in the
configSet/conf.  I'm not doing so for SolrCloud at this time because of
overhead concerns but it could be added later with some work.
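
The reload-on-change idea can be sketched as follows. This is not the QueryElevationComponent implementation (which is Java); it is only a minimal illustration of re-reading a file when its modification time differs from the last one seen. The class name and the parse callback are hypothetical.

```python
import os


class ReloadOnChange:
    """Cache a parsed file and re-parse it only when its mtime changes."""

    def __init__(self, path, parse):
        self.path = path
        self.parse = parse
        self._mtime_ns = None  # mtime of the version currently cached
        self._value = None

    def get(self):
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # File is new or has changed since the last load: read it again.
            with open(self.path, "r", encoding="utf-8") as f:
                self._value = self.parse(f.read())
            self._mtime_ns = mtime_ns
        return self._value
```

Calling get() at each commit keeps the cost to a single stat call when nothing has changed, and the file is opened only briefly when it has.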

I also plan to remove QEC's support for a "versioned" elevate.xml file
-- e.g. elevate.xml.001, elevate.xml.002.  The underlying
mechanism was added in
https://issues.apache.org/jira/browse/SOLR-351 where it was argued to be
useful on Windows, which cannot replace an already open file.  Obviously
this is a non-issue in SolrCloud, and perhaps not an issue if you use the
confFiles replication feature to edit it either.  Regardless, I highly
doubt it's an issue at all since this file shouldn't be held open
on a sustained basis; it's only loaded on core load / replication and with
my changes above, on a commit if the modification time changes.

If you have concerns, let me know.

SOLR-15274 - QueryElevationComponent: auto-load file changes; remove data
dir support 

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley