Re: 3x+ performance reduction for the prefixed wildcard fl (like fl=abc_*) in 9.5.0 compared to 9.4.1

2024-02-26 Thread Justin Sweeney
I can take another look at this. I changed the implementation for matching
glob patterns to avoid a third-party dependency, but it seems we may just
want to go back to the original implementation from Apache Commons to avoid
the performance degradation. Is the best practice here to open a PR against
the original Jira, or to create a new Jira?

On Sat, Feb 24, 2024 at 10:54 AM Gus Heck  wrote:

> Likely Introduced by SOLR-17022: Support for glob patterns for fields in
> Export handler, Stream handler and with SelectStream streaming expression
> (#1996)
>
> On Sat, Feb 24, 2024 at 10:51 AM Gus Heck  wrote:
>
> > Well... awesome that you have identified and documented this. Not awesome
> > that it happened of course. Definitely Jira worthy.
> >
> > On Fri, Feb 23, 2024 at 9:55 PM Ishan Chattopadhyaya <
> > ichattopadhy...@gmail.com> wrote:
> >
> >> Awesome. Please feel free to open a JIRA issue about it.
> >>
> >> On Fri, 23 Feb, 2024, 5:03 pm Oleksandr Tkachuk wrote:
> >>
> >> > We have ~17 dynamic fields like abc_xxx, and requests like
> >> > /select?fl=abc_* took ~180ms with 9.4.1, but after upgrading to 9.5.0
> >> > such requests now take ~620ms to execute.
> >> >
> >> > It seems that in 9.5.0
> >> > org.apache.solr.common.util.GlobPatternUtil.matches
> >> > is used instead of
> >> > org.apache.commons.io.FilenameUtils.wildcardMatch
> >> > which leads to a large loss in performance.
> >> >
> >> > Here is the call tree from the profiler:
> >> > 9.4.1
> >> > https://i.imgur.com/2gubfDr.png
> >> >
> >> > 9.5.0
> >> > https://i.imgur.com/JIZ1E9u.png
> >> >
> >>
> >
> >
> > --
> > http://www.needhamsoftware.com (work)
> > https://a.co/d/b2sZLD9 (my fantasy fiction book)
> >
>
>
>
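
For readers following along: with fl=abc_*, every candidate field name has
to be tested against the glob, so the cost of a single match call is
multiplied across fields and documents. The sketch below is a minimal
illustration in Python of a linear-scan wildcard matcher supporting * and ?.
It is not the code of GlobPatternUtil or FilenameUtils; the names
wildcard_match and select_fields and the sample field names are assumptions
made up for this example.

```python
def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if text matches a glob with '*' (any run) and '?' (one char)."""
    p = t = 0
    star = -1   # index of the most recent '*' in pattern, -1 if none seen
    mark = 0    # position in text where that '*' started matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            # Remember the star and first try to match it against nothing.
            star, mark = p, t
            p += 1
        elif p < len(pattern) and (pattern[p] == "?" or pattern[p] == text[t]):
            p += 1
            t += 1
        elif star != -1:
            # Mismatch: backtrack and let the last '*' absorb one more char.
            p = star + 1
            mark += 1
            t = mark
        else:
            return False
    # Any trailing '*' in the pattern can match the empty string.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


def select_fields(fl_pattern: str, field_names: list[str]) -> list[str]:
    """Keep only the field names that match the fl glob (hypothetical helper)."""
    return [name for name in field_names if wildcard_match(fl_pattern, name)]
```

Calling select_fields("abc_*", ["abc_price", "id", "abc_color"]) keeps the
two abc_ fields. No matcher object is built per call here, which is the kind
of per-call overhead worth looking for in the profiler output linked above.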


Re: 3x+ performance reduction for the prefixed wildcard fl (like fl=abc_*) in 9.5.0 compared to 9.4.1

2024-02-26 Thread Mike Drob
Since the change was released, please create a new Jira. That way future
software archaeologists can track which version has the degradation and
which version has the fix.



RE: Re: 3x+ performance reduction for the prefixed wildcard fl (like fl=abc_*) in 9.5.0 compared to 9.4.1

2024-02-26 Thread Oleksandr Tkachuk
https://issues.apache.org/jira/projects/SOLR/issues/SOLR-17179



Re: firstSearcher listener replaying queries 3 times

2024-02-26 Thread rajani m
That makes sense, thank you.

On Fri, Feb 23, 2024 at 6:59 PM Chris Hostetter 
wrote:

>
> The obvious answer that comes to mind is that your collection has 3 shards
> and you have one replica of each shard on the node where you see this
> listener triggering 3 times on collection reload (or some other situation
> that puts 3 replicas on this one node).
>
> firstSearcher and newSearcher events are processed on individual
> SolrIndexSearchers, and each replica has its own SolrIndexSearcher.
>
> : Date: Tue, 13 Feb 2024 16:57:59 -0500
> : From: rajani m 
> : Reply-To: users@solr.apache.org
> : To: solr-user 
> : Subject: firstSearcher listener replaying queries 3 times
> :
> : Hi Solr Users,
> :
> :   The firstSearcher listener replays the queries configured under the
> : listener 3 times. What could be the reason for this?
> :
> : In the example below, when the collection is reloaded, the "q" is replayed
> : 3 times; I expected it to be replayed once. Is it a bug, or does the first
> : searcher trigger some other listener?
> :
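> : <listener event="firstSearcher" class="solr.QuerySenderListener">
> :   <arr name="queries">
> :     <lst><str name="q">cats</str></lst>
> :   </arr>
> : </listener>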
> :
>
> -Hoss
> http://www.lucidworks.com/
>
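
One way to check this explanation is to count replicas per node in the
output of the Collections API CLUSTERSTATUS action. A minimal sketch in
Python, assuming the response has already been fetched and parsed as JSON.
The collection name, the node names, and the helper replicas_per_node are
hypothetical, and the sample dictionary only mimics the
cluster/collections/shards/replicas/node_name nesting of that response.

```python
from collections import Counter


def replicas_per_node(cluster_status: dict, collection: str) -> Counter:
    """Count how many replicas of a collection live on each node."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    counts = Counter()
    for shard in shards.values():
        for replica in shard["replicas"].values():
            counts[replica["node_name"]] += 1
    return counts


# Hypothetical sample: 3 shards, each with one replica on host1.
sample_status = {
    "cluster": {"collections": {"products": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "host1:8983_solr"},
            "core_node2": {"node_name": "host2:8983_solr"},
        }},
        "shard2": {"replicas": {
            "core_node3": {"node_name": "host1:8983_solr"},
        }},
        "shard3": {"replicas": {
            "core_node4": {"node_name": "host1:8983_solr"},
        }},
    }}}}
}

counts = replicas_per_node(sample_status, "products")
```

With this sample data host1 holds three replicas, so a firstSearcher
listener configured once would be expected to run three times on that node
after a collection reload, once per SolrIndexSearcher.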


Multiple query parsers syntax

2024-02-26 Thread rajani m
Hi Solr Users,

  Could you please help me with an example query syntax that uses more than
one query parser in the same query?

I tried the following, edismax to search against the description field and
lucene parser to search against the keywords field,  but it does not work.
What is wrong?

host:port/solr/v9/select?q={!edismax qf=description}white roses OR
{!lucene}keywords:(white AND roses)&debug=true

The parsed query string reported by Solr is as follows:

 "parsedquery_toString": "+((description:white roses) (description:OR)
(description:{!lucene}keywords:(white) (description:AND)
(description:roses)))",

Thank you,
Rajani


Re: Backtick character in field data breaks streaming query

2024-02-26 Thread Rahul Goswami
Floating this once again in case anyone has any thoughts.

Thanks,
Rahul

On Sun, Feb 25, 2024 at 11:51 PM Rahul Goswami 
wrote:

> Hello,
> I am running Solr 8.11.1 and running into an issue with the stream API. It
> looks like searches break when the data contains the backtick character ( ` ).
> E.g.:
>
> http://host-name:8983/solr/MyCollection/stream?expr=search(MyCollection,q="My_Field:Foto`s",fl="field1",qt="/export")
> 
>
> Upon further investigation, I found that the change below was introduced in
> Solr 8.5 to replace ` with " in StreamExpressionParser, which is the root
> cause.
>
>
> https://github.com/apache/solr/blob/main/solr/solrj-streaming/src/java/org/apache/solr/client/solrj/io/stream/expr/StreamExpressionParser.java#L138
>
> Associated JIRA:
> https://issues.apache.org/jira/browse/SOLR-14139
>
> Is there a workaround to get the streaming query working? This is across
> multiple shards so I can't use /export directly.
>
> If there is no workaround, in my humble opinion, this seems like a
> breaking change and should be considered for rolling back, or the
> implementation should be rethought.
>
> Thanks,
> Rahul
>
>
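
The effect described above can be reproduced with a simplified simulation.
This is not the actual StreamExpressionParser code, only a sketch in Python
assuming that every backtick in the expression string is replaced with a
double quote before parsing. The helper names are hypothetical, and the
quote check deliberately ignores escaped quotes.

```python
def simulate_backtick_rewrite(expr: str) -> str:
    """Assumed behaviour: every backtick becomes a double quote."""
    return expr.replace("`", '"')


def has_balanced_quotes(expr: str) -> bool:
    """Crude check: an even number of double quotes (escapes not handled)."""
    return expr.count('"') % 2 == 0


original = 'search(MyCollection,q="My_Field:Foto`s",fl="field1",qt="/export")'
rewritten = simulate_backtick_rewrite(original)
```

Here the original expression has six double quotes, while the rewritten one
has seven, so the q parameter value is cut off after Foto and the rest of
the expression no longer parses as intended.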


Re: Multiple query parsers syntax

2024-02-26 Thread Robi Petersen
Hi Rajani

This may help. I'm thinking you might want to do some kind of query alias
substitution? Also, the first 'OR' might be forcing your whole query into
Lucene query parser mode?

I like Hoss' breakdown in this presentation for query substitution
syntax... :)

The Lucene/Solr Revolution 2016 presentation by Hoss - see the slideshow
link at top...

Best
Robi



Re: Multiple query parsers syntax

2024-02-26 Thread Mikhail Khludnev
Hi,

IIRC eDismax allows some advanced q ops syntax. The JSON query DSL may also
be convenient for structuring queries.
With regard to the old local params syntax, it struggles to determine where
the substring to be processed ends; e.g. if we put a space at the beginning
of the string, dismax takes only the word 'white'.
My bet is
/select?q.op=OR&q={!edismax qf=description v=$wr} {!lucene df=keywords
q.op=AND v=$wr}&wr=white roses&debug=true
or the latter part might be written as keywords:(+white +roses).
On the preferred syntax, see https://lucidworks.com/post/solr-boolean-operators/
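
When trying the suggestion above from a script, the local-params characters
({, !, $ and spaces) need URL encoding. A minimal sketch in Python using
only the standard library: the base URL is a placeholder, build_select_url
and decode_params are hypothetical helpers, and whether the query parses as
intended should still be verified with debug=true.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base: str, terms: str) -> str:
    """Build a /select URL for the two-parser query, with $wr dereferencing."""
    params = {
        "q.op": "OR",
        # Both sub-queries read their text from the wr parameter via v=$wr.
        # q.op is spelled in lower case here, as Solr documents the parameter.
        "q": "{!edismax qf=description v=$wr} "
             "{!lucene df=keywords q.op=AND v=$wr}",
        "wr": terms,
        "debug": "true",
    }
    return base + "/select?" + urlencode(params)


def decode_params(url: str) -> dict:
    """Round-trip helper: decode the query string back into a dict of lists."""
    return parse_qs(urlsplit(url).query)


# Placeholder host and collection; substitute your own.
url = build_select_url("http://localhost:8983/solr/v9", "white roses")
```

Keeping the user's text in a separate wr parameter means it is encoded once
and never has to be spliced into the local-params string by hand.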



-- 
Sincerely yours
Mikhail Khludnev