Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Mikhail Khludnev
Here's the Lucene change
https://github.com/apache/lucene/issues/10247
It would be interesting to experiment with the old logic of deeply nesting
spans, but using intervals instead.

On Mon, Nov 6, 2023 at 11:00 PM Karun G  wrote:

> Hello,
>
> I'm trying to upgrade from Solr 8.9.0 to 9.2 and hit the issue below. I'm
> unable to find which query characters cause this problem. This was not a
> problem in the older version, which returned 200 with no results, and I
> don't want to increase maxClauseCount from 1024.
>
> Returns 500 :
>
>
> http://localhost:8983/solr/techproducts/select?start=0&rows=0&defType=edismax&pf=title+nouns+synonyms&ps=5&debugQuery=true&q=(ccoco%25e2%2580%2586oco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o)
>
> Query contains too many nested clauses;
> maxClauseCount is set to 4096
>  name="trace">org.apache.lucene.search.IndexSearcher$TooManyNestedClauses:
> Query contains too many nested clauses; maxClauseCount is set to 4096
>
>
> DebugQuery Example to produce below expansion.
>
>
> http://localhost:8983/solr/techproducts/select?start=0&rows=0&defType=edismax&pf=title+nouns+synonyms&ps=5&debugQuery=true&q=(ccoco%25e2%2580%2586oco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o)
>
>  managed-schema entry
>
>multiValued="false" indexed="true"  stored="true"/>
>
>multiValued="true" indexed="true" stored="true"/>
>
>multiValued="true" indexed="true" stored="true"/>
>
>
>
>  solrconfig.xml
>
>title
>
>
>
> DebugQuery Output  :
>
>
>
> "parsedquery_toString": "+(((title:ccoco title:\"ccoco e\")
> (title:28086 title:\"2 80 86\") (title:oco title:\"oco e\") (title:28086
> title:\"2 80 86\") (title:oe title:\"o e\") (title:28086 title:\"2 80 86\")
> (title:mcoe title:\"mco e\") (title:28086 title:\"2 80 86\") title:o))
> ((nouns:\"ccoco 28086 oco 28086 oe 28086 mcoe 28086 o\"~5 nouns:\"ccoco
> 28086 oco 28086 oe 28086 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 28086
> oe 28086 mco e 28086 o\"~5 nouns:\"ccoco 28086 oco 28086 oe 28086 mco e 2
> 80 86 o\"~5 nouns:\"ccoco 28086 oco 28086 oe 2 80 86 mcoe 28086 o\"~5
> nouns:\"ccoco 28086 oco 28086 oe 2 80 86 mcoe 2 80 86 o\"~5 nouns:\"ccoco
> 28086 oco 28086 oe 2 80 86 mco e 28086 o\"~5 nouns:\"ccoco 28086 oco 28086
> oe 2 80 86 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 28086 o e 28086 mcoe
> 28086 o\"~5 nouns:\"ccoco 28086 oco 28086 o e 28086 mcoe 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco 28086 o e 28086 mco e 28086 o\"~5 nouns:\"ccoco
> 28086 oco 28086 o e 28086 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 28086
> o e 2 80 86 mcoe 28086 o\"~5 nouns:\"ccoco 28086 oco 28086 o e 2 80 86 mcoe
> 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 28086 o e 2 80 86 mco e 28086 o\"~5
> nouns:\"ccoco 28086 oco 28086 o e 2 80 86 mco e 2 80 86 o\"~5 nouns:\"ccoco
> 28086 oco 2 80 86 oe 28086 mcoe 28086 o\"~5 nouns:\"ccoco 28086 oco 2 80 86
> oe 28086 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 oe 28086 mco e
> 28086 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 oe 28086 mco e 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco 2 80 86 oe 2 80 86 mcoe 28086 o\"~5 nouns:\"ccoco
> 28086 oco 2 80 86 oe 2 80 86 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 2
> 80 86 oe 2 80 86 mco e 28086 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 oe 2 80
> 86 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 o e 28086 mcoe 28086
> o\"~5 nouns:\"ccoco 28086 oco 2 80 86 o e 28086 mcoe 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco 2 80 86 o e 28086 mco e 28086 o\"~5 nouns:\"ccoco
> 28086 oco 2 80 86 o e 28086 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 2
> 80 86 o e 2 80 86 mcoe 28086 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 o e 2 80
> 86 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 o e 2 80 86 mco e
> 28086 o\"~5 nouns:\"ccoco 28086 oco 2 80 86 o e 2 80 86 mco e 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco e 28086 oe 28086 mcoe 28086 o\"~5 nouns:\"ccoco
> 28086 oco e 28086 oe 28086 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e
> 28086 oe 28086 mco e 28086 o\"~5 nouns:\"ccoco 28086 oco e 28086 oe 28086
> mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e 28086 oe 2 80 86 mcoe 28086
> o\"~5 nouns:\"ccoco 28086 oco e 28086 oe 2 80 86 mcoe 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco e 28086 oe 2 80 86 mco e 28086 o\"~5 nouns:\"ccoco
> 28086 oco e 28086 oe 2 80 86 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e
> 28086 o e 28086 mcoe 28086 o\"~5 nouns:\"ccoco 28086 oco e 28086 o e 28086
> mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e 28086 o e 28086 mco e 28086
> o\"~5 nouns:\"ccoco 28086 oco e 28086 o e 28086 mco e 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco e 28086 o e 2 80 86 mcoe 28086 o\"~5 nouns:\"ccoco
> 28086 oco e 28086 o e 2 80 86 mcoe 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e
> 28086 o e 2 80 86 mco e 28086 o\"~5 nouns:\"ccoco 28086 oco e 28086 o e 2
> 80 86 mco e 2 80 86 o\"~5 nouns:\"ccoco 28086 oco e 2 80 86 oe 28086 mcoe
> 28086 o\"~5 nouns:\"ccoco 28086 oco e 2 80 86 oe 28086 mcoe 2 80 86 o\"~5
> nouns:\"ccoco 28086 oco e 2 80 86 oe 28086 mco e 28086 o\"~5 nouns:\"c
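
One way to see which characters the q parameter actually carries is to
URL-decode it step by step. The following is a minimal sketch using only the
Python standard library. It assumes the server decodes the URL once; the idea
that the tokens "e", "2", "80" and "86" in the debug output come from the
literal text %e2%80%86 is an inference from that output, not something
confirmed in the thread.

```python
from urllib.parse import unquote
import unicodedata

# The q value exactly as it appears in the shorter request URL.
raw_q = "(ccoco%25e2%2580%2586oco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o)"

# One round of decoding turns %25 into a literal percent sign.
once = unquote(raw_q)
# A second round shows the character the client presumably meant to send.
twice = unquote(once)

print(once)
print(sorted({unicodedata.name(ch) for ch in twice if not ch.isascii()}))
```

After a single decode the query still contains the text "%e2%80%86", so the
separator looks double-encoded; sending it encoded only once (or stripping
such whitespace characters on the client) may be worth testing.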

Re: Disk cleanup of deleted data

2023-11-07 Thread Koen De Groote
Hello Jan,

Thanks for the article and suggestions.

On Fri, Nov 3, 2023 at 2:11 PM Jan Høydahl  wrote:

> Hi,
>
> What's going on is that Lucene merges segments, and when it does so,
> all documents marked for deletion in the old segments are dropped, freeing
> up space. I recommend planning for 2-3x more disk space than your minimum
> need, to cover worst-case merges, future growth and flexibility (e.g.
> re-indexing to another collection).
> See more about merges in this article:
> https://lucidworks.com/post/solr-segment-merge-frees-wasted-space-caused-by-deleted-documents/
>
> Jan
>
> > 3. nov. 2023 kl. 10:42 skrev Koen De Groote :
> >
> > Hello,
> >
> > I'm reading online that deleting items doesn't actually remove them from
> > disk, it merely marks them as "deleted".
> >
> > And that in order to free disk space by removing deleted items, I need to
> > do an "optimize" or "expungeDeletes" call.
> >
> > Having run a bunch of deletes on a collection, over a few hours, I noticed
> > the used disk space for this collection did indeed go down. Looking at
> > metrics, it seems every hour or so, a chunk of disk space was freed up.
> >
> >
> > So what's actually going on here? I was under the impression that nothing
> > would be freed up and I would need to call "optimize".
> >
> >
> > Regards,
> > Koen
>
>
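
For reference, the two explicit calls mentioned in the question are ordinary
update-handler requests. The sketch below only builds the URLs and does not
send them; the host, the collection name "my_collection" and the helper
update_url are placeholders for illustration. As the reply explains, regular
background merges usually reclaim the space without either call.

```python
from urllib.parse import urlencode

def update_url(base_url, collection, **params):
    """Build a Solr update-handler URL with the given request parameters."""
    # Booleans are rendered as the lowercase strings Solr expects.
    query = urlencode({key: str(value).lower() for key, value in params.items()})
    return f"{base_url}/solr/{collection}/update?{query}"

# Commit and ask Solr to merge away segments' deleted documents.
expunge = update_url("http://localhost:8983", "my_collection",
                     commit=True, expungeDeletes=True)
# Force a full merge; considerably more expensive on a large index.
optimize = update_url("http://localhost:8983", "my_collection", optimize=True)

print(expunge)
print(optimize)
```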


Re: Solr Operator Tutorial

2023-11-07 Thread Jason Gerlowski
Hey, thanks for sharing.

Each version of the operator supports a range of Solr versions.  The latest
operator version (0.8.0) only supports Solr versions >= 8.11.  It looks
like the tutorial you were following along with hasn't been updated to
match the range of Solr versions, which is definitely a doc bug.

I'll take a crack at updating that.  If anyone else sees older Solr
versions mentioned in the docs, please let me know!

Best,

Jason

On Thu, Oct 26, 2023 at 11:06 AM Solr User  wrote:

> I went through the tutorial documented at
> https://apache.github.io/solr-operator/docs/local_tutorial and the solr
> instances would not start.  I saw that there was an error parsing solr.xml
> on the field "allowPaths" so decided to try version 8.7 since it looks like
> this was introduced in 8.6:
> https://solr.apache.org/guide/8_6/format-of-solr-xml.html.
>
> When using 8.7 it worked!  The tutorial explains how to do an upgrade as
> well, so perhaps it should start with 8.6 and then upgrade to 8.7?  I did
> not try out 8.6, but perhaps someone could validate that if they agree with
> my findings.
>
> Anyway, just wanted to share this.  Thanks!
>


Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Karun G
Mikhail, thanks for the comment. Is this considered a bug, or a new
feature with no plan to address this type of query expansion in future
releases?

thanks.

On Tue, Nov 7, 2023 at 3:00 AM Mikhail Khludnev  wrote:

> Here's the Lucene change
> https://github.com/apache/lucene/issues/10247
> It will be interesting to experiment with old logic of deeply nesting
> spans, but with intervals.
>

Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Mikhail Khludnev
It's a feature. There is no plan to address it, besides my earlier idea to
prototype expansion to intervals, but I couldn't find anyone interested in
evaluating it, so I suspended the prototyping. So, it's worth revising the
problem definition and approach.

On Tue, Nov 7, 2023 at 5:40 PM Karun G  wrote:

> Mikhail, Thanks for the comment. Is this considered a bug or as a new
> feature with no plan to address this type of query expansion in further
> releases?
>
> thanks.
>
> On Tue, Nov 7, 2023 at 3:00 AM Mikhail Khludnev  wrote:
>
> > Here's the Lucene change
> > https://github.com/apache/lucene/issues/10247
> > It will be interesting to experiment with old logic of deeply nesting
> > spans, but with intervals.
> >

Rerank top-k by field

2023-11-07 Thread Tomasz Elendt
Hey,

If I want to rerank (rq={!ltr ...}) the window of top-k results, where that
top-k is selected not by a regular query-to-doc similarity score but by the
value of a date field (so, technically, rerank the k most recent documents),
how should I do it?

Is ?q={!func}ms(my_date_field) the only way?
Or is there a faster way? And how do I deal with precision loss? I believe that
scoring uses a float, which is not big enough to encode date timestamps (long)
precisely.

Best,
Tomasz
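
The precision concern can be checked with a little arithmetic. The sketch
below assumes, as the question does, that the score is stored as a 32-bit
float; the helper to_float32 and the sample timestamp are made up for
illustration. It rounds epoch-millisecond values to single precision and
compares timestamps that are one second apart.

```python
import struct

def to_float32(value):
    """Round a number to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]

# Hypothetical timestamp in epoch milliseconds (mid-November 2023),
# chosen so that it is exactly representable as a float32.
t_ms = 1_699_872_768_000

# Two timestamps one second apart collapse to the same float32 value.
same = to_float32(t_ms) == to_float32(t_ms + 1_000)
# Distance to the next representable float32 at this magnitude.
step = to_float32(t_ms + 131_072) - to_float32(t_ms)

print(same, step)
```

At this magnitude, adjacent float32 values are 131072 ms apart, so documents
dated within roughly two minutes of each other could tie on such a score.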

Re: Vector math with Streaming Expressions?

2023-11-07 Thread Eric Pugh
Just got to give this a try and it worked GREAT! Here is the working example
(that will be in the upcoming “How to use Vectors” tutorial):

let(
  a=select(
    search(films,
      qt="/select",
      q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
      fl="id,name,film_vector"),
    film_vector),
  b=col(a, film_vector),
  m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
  average=scalarDivide(3, sumColumns(m))
)
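
For readers who want to sanity-check the result outside Solr, the same
element-by-element average can be sketched in plain Python. The helper names
below only mirror the streaming functions and are not Solr's API; the sketch
assumes scalarDivide divides each element by the scalar, as the working
expression implies, and the three short vectors are made-up stand-ins for
film_vector values.

```python
def sum_columns(matrix):
    """Add the rows of a matrix element by element (one total per column)."""
    return [sum(column) for column in zip(*matrix)]

def scalar_divide(scalar, vector):
    """Divide every element of the vector by the scalar."""
    return [value / scalar for value in vector]

# Three short made-up vectors, one per film.
rows = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]

# Sum the columns, then divide by the number of vectors.
average = scalar_divide(len(rows), sum_columns(rows))
print(average)
```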


> On Oct 15, 2023, at 11:53 PM, Joel Bernstein  wrote:
> 
> This would in theory return the average of the vectors:
> 
> let(a=select(search(...), film_vector),
> b=col(a, film_vector),
> m=matrix(valueAt(b, 0), valueAt(b, 1), valueAt(b, 2)),
> av=scalarDivide(3, sumColumns(m)))
> 
> 
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Sat, Oct 14, 2023 at 2:50 PM ufuk yılmaz 
> wrote:
> 
>> The main thing which converts search result fields to arrays is the “col”
>> function
>> https://solr.apache.org/guide/8_4/vectorization.html#creating-a-vector-with-the-col-function
>> 
>> You may also need “let” to use variables etc. Rest is  just employing
>> available math functions.
>> 
>> But they don’t play well with multivalued fields, it’s hard to work with
>> them. They look like arrays but are not exactly arrays. It’s just a bunch
>> of values sticking together. For example afaik there’s no way to refer to
>> 1st, 2nd element of a multivalued field. When you enable docValues and use
>> the export handler, those values would be returned in ascending order,
>> losing position information.
>> 
>> For example if the ratings were from different movie raters, such as imdb,
>> rottentomatoes etc and every rating were in a different field, it would be
>> much easier to work with, as Solr expects to build arrays and matrices from
>> such formatted documents.
>> 
>> I’d be happy to learn if someone more knowledgeable has a better answer.
>> 
>> Sent from Mail for Windows
>> 
>> From: Eric Pugh
>> Sent: Saturday, October 14, 2023 8:05 PM
>> To: users@solr.apache.org
>> Subject: Re: Vector math with Streaming Expressions?
>> 
>> By average them, I mean the first version.   So at the end, I get a set of
>> numbers that represents the average vector.
>> 
>> Here is an example of the vector..
>> https://github.com/apache/solr/blob/main/solr/example/films/films.json#L8365
>> 
>> In the existing docs on searching vectors, we make a statement that we
>> have the average vector of three movies:
>> https://github.com/apache/solr/blob/main/solr/example/films/README.md?plain=1#L154
>> 
>> I’d actually like to figure out how to calculate that vector from data we
>> have in Solr already.
>> 
>> 
>> 
>>> On Oct 14, 2023, at 12:50 PM, ufuk yılmaz 
>> wrote:
>>> 
>>> By “average them” do you mean to calculate the simple arithmetic average
>> element by element of the all returned film ratings? Eg. sum first element
>> of all arrays and divide by the number of arrays, do it again for the
>> second element etc..
>>> 
>>> Or find the average of the array for each movie, producing a single
>> number for each movie
>>> 
>>> ~ufuk
>>> 
>>> —
>>> 
>>>> On 14 Oct 2023, at 19:19, Eric Pugh wrote:
>>>>
>>>> I’m trying to average three arrays of floats and am not quite making the
>>>> conceptual jump from “I defined an array of numbers”, in the way that the
>>>> https://github.com/apache/lucene-solr/blob/visual-guide/solr/solr-ref-guide/src/vector-math.adoc#element-by-element-vector-math
>>>> example expects, to “I made a query and got back an array of numbers”.
>>>>
>>>> I’m using the films example, so: bin/solr start -c -e films
>>>>
>>>> Then, I want to get the vectors for three films and average them.
>>>>
>>>> The streaming expression grabs the three vectors, but I can’t figure
>>>> out how to wrap it in something to average them.
>>>>
>>>> select(
>>>>   search(films,
>>>>     qt="/select",
>>>>     q="name:"Finding Nemo" OR name:"Bee Movie" OR name:"Harry Potter and the Chamber of Secrets"",
>>>>     fl="id,name,film_vector"),
>>>>   film_vector
>>>> )
>>>>
>>>> produces:
>>>>
>>>> {
>>>>   "result-set": {
>>>>     "docs": [
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.2758314",
>>>>           "-0.14416906",
>>>>           "-0.11316811",
>>>>           "0.2745105",
>>>>           "0.040616427",
>>>>           "-4.2628963E-4",
>>>>           "-0.120363355",
>>>>           "0.0752",
>>>>           "0.036417373",
>>>>           "-0.29541242"
>>>>         ]
>>>>       },
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.11665395",
>>>>           "0.04247921",
>>>>           "-0.13233364",
>>>>           "0.52578413",
>>>>           "-0.1739291",
>>>>           "-0.01880563",
>>>>           "-0.06670809",
>>>>           "-0.11242808",
>>>>           "0.09724514",
>>>>           "-0.11909142"
>>>>         ]
>>>>       },
>>>>       {
>>>>         "film_vector": [
>>>>           "-0.14272659",
>>>>           "0.13051921",
>>>>           "-0.19087574",
>>>>           "

Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Karun G
Mikhail, thanks for the clarification. Many regards..

On Tue, Nov 7, 2023 at 10:04 AM Mikhail Khludnev  wrote:

> It's a feature. No plan to address, beside my earlier idea to prototype
> expansion to intervals, but I couldn't find anyone who's interested in
> evaluation, and suspended prototyping. So, it's worth revising the problem
> definition and approach.
>
> On Tue, Nov 7, 2023 at 5:40 PM Karun G  wrote:
>
> > Mikhail, Thanks for the comment. Is this considered a bug or as a new
> > feature with no plan to address this type of query expansion in further
> > releases?
> >
> > thanks.
> >
> > On Tue, Nov 7, 2023 at 3:00 AM Mikhail Khludnev  wrote:
> >
> > > Here's the Lucene change
> > > https://github.com/apache/lucene/issues/10247
> > > It will be interesting to experiment with old logic of deeply nesting
> > > spans, but with intervals.
> > >

Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Chris Hostetter


: Mikhail, Thanks for the comment. Is this considered a bug or as a new
: feature with no plan to address this type of query expansion in further
: releases?

It is a documented feature in Lucene (to prevent query explosion) with the 
change in behavior called out in the "Major Changes" section of the 
Solr upgrade notes when the behavior was changed in Solr 9.0...

https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#querying-and-indexing-2

"solr.xml maxBooleanClauses is now enforced recursively. Users who upgrade 
from prior versions of Solr may find that some requests involving complex 
internal query structures (Example: long query strings using edismax with 
many qf and pf fields that include query time synonym expansion) which 
worked in the past now hit this limit and fail. Users in this situation 
are advised to consider the complexity of their queries/configuration, and 
increase the value of maxBooleanClauses if warranted."


-Hoss
http://www.lucidworks.com/
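
To get a feel for why this particular query runs into the limit, the
expansion seen in the debug output can be reproduced with a small sketch. It
assumes each token position has the alternatives visible in the title clauses
for the shorter query, and that every combination becomes one phrase per pf
field. How Lucene actually counts nested clauses may differ, so the numbers
are only an order-of-magnitude illustration.

```python
from itertools import product

# Alternatives per token position, read off the title clauses in the
# debug output for the shorter query.
positions = [
    ("ccoco", "ccoco e"),
    ("28086", "2 80 86"),
    ("oco", "oco e"),
    ("28086", "2 80 86"),
    ("oe", "o e"),
    ("28086", "2 80 86"),
    ("mcoe", "mco e"),
    ("28086", "2 80 86"),
    ("o",),
]

# Every combination of alternatives yields one sloppy phrase.
phrases = [" ".join(choice) for choice in product(*positions)]
# Total number of terms across all generated phrases for a single field.
total_terms = sum(len(phrase.split()) for phrase in phrases)

print(len(phrases), total_terms)
print(phrases[0])
print(phrases[1])
```

Eight positions with two alternatives each already give 256 phrases per
field, and every extra ambiguous position doubles that, which is why the
longer query fails while the shorter one still parses.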


Re: pf (Phrase Fields) Query expansion and maxClauseCount nested clause issue in Solr 9.2

2023-11-07 Thread Karun G
Hoss, thanks for the update. Any idea how to handle this exact query? It
returned no results before the upgrade, and now it fails with the
maxBooleanClauses error after a very large query expansion. Many regards.

http://localhost:8983/solr/techproducts/select?start=0&rows=0&defType=edismax&pf=title+nouns+synonyms&ps=5&debugQuery=true&q=(ccoco%25e2%2580%2586oco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o%25e2%2580%2586mco%25e2%2580%2586o)


On Tue, Nov 7, 2023 at 12:50 PM Chris Hostetter 
wrote:

>
> : Mikhail, Thanks for the comment. Is this considered a bug or as a new
> : feature with no plan to address this type of query expansion in further
> : releases?
>
> It is a documented feature in Lucene (to prevent query explosion) with the
> change in behavior called out in the "Major Changes" section of the
> Solr upgrade notes when the behavior was changed in Solr 9.0...
>
>
> https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#querying-and-indexing-2
>
> "solr.xml maxBooleanClauses is now enforced recursively. Users who upgrade
> from prior versions of Solr may find that some requests involving complex
> internal query structures (Example: long query strings using edismax with
> many qf and pf fields that include query time synonym expansion) which
> worked in the past now hit this limit and fail. Users in this situation
> are advised to consider the complexity of their queries/configuration, and
> increase the value of maxBooleanClauses if warranted."
>
>
> -Hoss
> http://www.lucidworks.com/
>