SOLR security scan question

2023-02-09 Thread Razvan Bolocan
Hi,

We are using Solr 8.11.2, both classic and containerised (Docker).
Our internal security scanner runs several types of scans.
The latest scans report:

Severity  CVE             Package
Critical  CVE-2015-1832   org.apache.derby:derby 10.9.1.0
Critical  CVE-2017-15095  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2018-11307  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2018-14718  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2018-5968   com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2018-7489   com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-14540  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-14893  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-16335  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-16942  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-16943  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-17267  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-17531  com.fasterxml.jackson.core:jackson-databind 2.4.0
Critical  CVE-2019-20330  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2020-10650  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2020-35490  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2020-35491  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2020-36518  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2021-22573  com.google.oauth-client:google-oauth-client 1.32.1
High      CVE-2021-33813  org.jdom:jdom2 2.0.6
Critical  CVE-2021-37404  org.apache.hadoop:hadoop-common 3.2.2
High      CVE-2022-2048   org.eclipse.jetty.http2:http2-server 9.4.44.v20210927
Critical  CVE-2022-25168  org.apache.hadoop:hadoop-common 3.2.2
High      CVE-2022-25647  com.google.code.gson:gson 2.7
Critical  CVE-2022-26612  org.apache.hadoop:hadoop-common 3.2.2
High      CVE-2022-3171   com.google.protobuf:protobuf-java 3.11.0
High      CVE-2022-36364  org.apache.calcite.avatica:avatica-core 1.18.0
Critical  CVE-2022-39135  org.apache.calcite:calcite-core 1.27.0
High      CVE-2022-40151  com.fasterxml.woodstox:woodstox-core 6.2.4
High      CVE-2022-40152  com.fasterxml.woodstox:woodstox-core 6.2.4
Critical  CVE-2022-41853  org.hsqldb:hsqldb 2.4.0
High      CVE-2022-42003  com.fasterxml.jackson.core:jackson-databind 2.13.4
High      CVE-2022-42003  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2022-42004  com.fasterxml.jackson.core:jackson-databind 2.4.0
High      CVE-2022-47629  libksba 1.3.5-8.el8_6


We know some of them are covered in
https://solr.apache.org/security.html#cve-reports-for-apache-solr-dependencies
but not all.
We have also seen the thread at
https://lists.apache.org/thread/539bkq8r11msjpl3yo1ssvy77kmdrps7
Can we have a resolution for the above?

Thanks,
Razvan Bolocan
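
A minimal sketch of one way to triage a report like this: group the findings by artifact and version, so that each outdated jar is looked at once rather than once per CVE. The `findings` list is a hand-copied sample of the entries above, and `group_by_artifact` is a hypothetical helper, not part of Solr or of any scanner.

```python
from collections import defaultdict

# Sample of (severity, CVE id, artifact, version) tuples copied from the report.
findings = [
    ("Critical", "CVE-2015-1832", "org.apache.derby:derby", "10.9.1.0"),
    ("Critical", "CVE-2017-15095", "com.fasterxml.jackson.core:jackson-databind", "2.4.0"),
    ("High", "CVE-2018-5968", "com.fasterxml.jackson.core:jackson-databind", "2.4.0"),
    ("High", "CVE-2022-42003", "com.fasterxml.jackson.core:jackson-databind", "2.13.4"),
    ("Critical", "CVE-2021-37404", "org.apache.hadoop:hadoop-common", "3.2.2"),
    ("Critical", "CVE-2022-25168", "org.apache.hadoop:hadoop-common", "3.2.2"),
]


def group_by_artifact(rows):
    """Map (artifact, version) -> list of (severity, CVE id), in input order."""
    grouped = defaultdict(list)
    for severity, cve, artifact, version in rows:
        grouped[(artifact, version)].append((severity, cve))
    return dict(grouped)


if __name__ == "__main__":
    for (artifact, version), cves in sorted(group_by_artifact(findings).items()):
        print(f"{artifact} {version}: {len(cves)} finding(s)")
        for severity, cve in cves:
            print(f"  {severity:<8} {cve}")
```

Applied to the full report, more than half of the findings fall on a single artifact, jackson-databind 2.4.0.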



Re: MoreLikeThis highlighting question

2023-02-09 Thread Grace Sainsbury
Okay. Thanks for the response.


On Wed, 8 Feb 2023 at 16:23, Mikhail Khludnev  wrote:

> Hello, Grace.
> Now, the MoreLikeThis handler carries the burden of invoking FacetComponent's
> internals, but it does nothing for highlighting, and there is no way to
> plug HighlightingComponent into it via solrconfig.xml.
> So, there are two options: someone gets their hands dirty and plugs
> HighlightingComponent into MLTHandler, or you switch to MLT Query,
> which should be highlighted by the standard SearchHandler flow.
>
> On Wed, Feb 8, 2023 at 11:57 PM Grace Sainsbury  wrote:
>
> > Thanks again for fixing the NPE issue so quickly. I've noticed another
> > issue, but I'm not sure if it's a configuration issue on my end. If I
> > enable highlighting by providing the hl and hl.q parameters, I don't get
> > any highlighting in the results. There is no highlighting section in the
> > returned results.
> >
> > Is there something I need to do to enable the highlighting component in the
> > MoreLikeThis handler?
> >
> > Thanks,
> > Grace
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>


Re: MoreLikeThis highlighting question

2023-02-09 Thread Anshum Gupta
Hi Grace,

I'd like to reiterate what Mikhail said, but with emphasis on using the
MLTQParser. Issues like this are why the MLT query parser was added
(https://issues.apache.org/jira/browse/SOLR-6248).



-- 
Anshum Gupta
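
A minimal sketch of the MLT query parser route, assuming a collection named `gettingstarted`, a text field `title_txt`, and a source document whose uniqueKey is `1` (all three are placeholders). The query goes to the ordinary `/select` handler, so the usual `hl` parameters go through the standard SearchHandler flow. The helper `mlt_highlight_url` is hypothetical and only builds the request URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def mlt_highlight_url(base_url, collection, doc_id, field, rows=10):
    """Build a /select URL that runs an MLT query and requests highlighting."""
    params = {
        # {!mlt qf=<field>}<id>: documents similar to the one with this uniqueKey.
        "q": f"{{!mlt qf={field}}}{doc_id}",
        "hl": "true",     # turn highlighting on
        "hl.fl": field,   # field(s) to highlight
        "rows": rows,
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(mlt_highlight_url("http://localhost:8983/solr", "gettingstarted", "1", "title_txt"))
```

Whether the highlighted snippets are useful depends on the terms the MLT parser selects, so this is a starting point to test against real data rather than a guaranteed result.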


Re: SOLR security scan question

2023-02-09 Thread Kevin Watters
Hi Razvan,
We maintain a forked branch of Solr 8.11.2 that fixes, I think, all of
these. We also publish a container for it. If you're interested in
learning more, let me know.
Best,
  -Kevin
   https://kmwllc.com



Suggester configuration? (Solr 8.11.2)

2023-02-09 Thread Mike
Hello!

I'm doing something wrong when configuring the suggester:
it didn't work for me with Solr 9, and now it also fails with Solr 8.11.2:

{
"servlet":"default",
"message":"Not Found",
"url":"/solr/core01/suggest",
"status":"404"
}

This is my fresh solrconfig.xml file:
https://paste.debian.net/1270165/

I did it as described in this documentation:
https://solr.apache.org/guide/8_11/suggester.html


Can anyone tell me what I'm doing wrong?

Thank you!

Mike


multi-term synonym prevents single-term match -- known issue?

2023-02-09 Thread Rudi Seitz
Is this known behavior or is it worth a JIRA ticket?

Searching against a text_general field in Solr 9.1, if my edismax query is
"foo bar" I should be able to get matches for "foo" without "bar" and vice
versa. However, if there happens to be a synonym rule applied at query
time, like "foo bar,zzz" I can no longer get single-term matches against
"foo" or "bar." Both terms are now required, but can occur in either order.
If we change the text_general analysis chain to apply synonyms at index
time instead of query time, this behavior goes away and single-term matches
are again possible.

To reproduce, use the _default configset with "foo bar,zzz" added to
synonyms.txt. Index these four docs:

{"id":"1", "title_txt":"foo"}
{"id":"2", "title_txt":"bar"}
{"id":"3", "title_txt":"foo bar"}
{"id":"4", "title_txt":"bar foo"}

Issue a query for "foo bar" (i.e.
defType=edismax&q.op=OR&qf=title_txt&q=foo bar)
Result: Only docs 3 and 4 come back

Issue a query for "bar foo"
Result: All four docs come back; the synonym rule is not invoked

Looking at the explain output for "foo bar" we see:

+((title_txt:zzz (+title_txt:foo +title_txt:bar)))


Looking at the explain output for "bar foo" we see:

+((title_txt:bar) (title_txt:foo))

So, the observed behavior makes sense according to the low-level query
structure. But -- is this how it's "supposed" to work?

Why not expand the "foo bar" query like this instead?

+((title_txt:zzz (title_txt:foo title_txt:bar)))

Rudi
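
The difference between the two expansions can be checked with a toy model of the four documents. This is only a sketch of the boolean logic shown in the explain output, using plain Python sets in place of Lucene; `current_expansion` and `proposed_expansion` are hypothetical names for the two query shapes.

```python
# The four test documents, keyed by id, with their title_txt values.
docs = {"1": "foo", "2": "bar", "3": "foo bar", "4": "bar foo"}


def terms(text):
    """Whitespace tokenization, standing in for the text_general analyzer."""
    return set(text.split())


def current_expansion(t):
    # +((title_txt:zzz (+title_txt:foo +title_txt:bar)))
    return "zzz" in t or ("foo" in t and "bar" in t)


def proposed_expansion(t):
    # +((title_txt:zzz (title_txt:foo title_txt:bar)))
    return "zzz" in t or "foo" in t or "bar" in t


def matches(predicate):
    """Ids of the documents whose terms satisfy the predicate."""
    return sorted(doc_id for doc_id, text in docs.items() if predicate(terms(text)))


if __name__ == "__main__":
    print("current: ", matches(current_expansion))   # ['3', '4']
    print("proposed:", matches(proposed_expansion))  # ['1', '2', '3', '4']
```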


Re: Memory cost of unused indexed and docValues properties

2023-02-09 Thread Heinz Hölzer
Hi Mikhail,

Thanks a lot for your answer. This means we will only pay a small heap price
for both docValues and indexed, right? Thanks for your `rawSize=true` finding;
it will be helpful for inspecting disk usage.

Best,
Heinz

Am Di., 7. Feb. 2023 um 21:21 Uhr schrieb Mikhail Khludnev :

> Hello, Heinz.
> These data structures reside on disk, and only small excerpts (i.e., the head)
> are kept in heap. So by dropping them you should get some gain in heap, but I
> don't think it's a lot. The indexing footprint should be reduced as well.
> Regarding heap size: I've found a pretty cool param, rawSize=true, in
> http://localhost:8983/solr/admin/collections?action=COLSTATUS&collection=gettingstarted&coreInfo=true&segments=true&fieldInfo=true&sizeInfo=true&rawSize=true
> which estimates field size, but it's a disk size, not heap usage. It seems
> to me that heap size estimation has been abandoned.
>
> On Tue, Feb 7, 2023 at 2:34 PM Heinz Hölzer
>  wrote:
>
> > Hi,
> >
> > We are currently optimizing and cleaning up our complex Solr Schema and we
> > are wondering if the corresponding data structures of a field with
> > indexed="true" and docValues="true" (inverted index, column-oriented
> > mapping) are loaded from disk into memory when both features are never used
> > or accessed? In other words, can we decrease our memory usage by setting
> > indexed and docValues to false even if those features are never used at
> > runtime?
> >
> > Thx and best regards,
> > Heinz
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>
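
A small sketch for building the COLSTATUS request quoted above for any collection. The parameter names are the ones in that URL; the helper name `colstatus_url` and the default host are assumptions. As noted in the thread, the sizes reported are disk sizes, not heap usage.

```python
from urllib.parse import urlencode


def colstatus_url(base_url, collection):
    """Build a Collections API COLSTATUS URL with per-field raw size info."""
    params = {
        "action": "COLSTATUS",
        "collection": collection,
        "coreInfo": "true",
        "segments": "true",
        "fieldInfo": "true",
        "sizeInfo": "true",
        "rawSize": "true",  # estimates field sizes on disk
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    print(colstatus_url("http://localhost:8983/solr", "gettingstarted"))
```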


Re: Query time

2023-02-09 Thread Mike
Thank you all



Mikhail Khludnev  schrieb am Mi., 8. Feb. 2023, 19:47:

> I suppose that with sort=_docid_ asc the search stops after collecting `rows`
> documents. Obviously, this disables all scoring, but perhaps you can pre-sort
> the index.
>
> On Wed, Feb 8, 2023 at 7:36 PM Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>
> > > Then, the long answer is that Apache Solr already implements approaches for
> > > 'early termination', such as Block Max WAND from Solr 8 (thanks to Lucene for
> > > this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7148045/), to optimise
> > > query time and 'skip un-worthy candidates'.
> > >
> >
> > Note that this is not used by default; you need to specify the
> > “minExactCount” parameter:
> >
> > https://solr.apache.org/guide/solr/latest/query-guide/common-query-parameters.html#minexactcount-parameter
> >
> >
> > > For your second question, you can use the timeAllowed parameter:
> > > https://solr.apache.org/guide/6_6/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter
> > > Be aware that it's not a panacea and that it had some bugs in older
> > > versions, so make sure you use a Solr version that has fixed the problem
> > > (https://issues.apache.org/jira/browse/SOLR-9882).
> > >
> > > Cheers
> > > --
> > > *Alessandro Benedetti*
> > > Director @ Sease Ltd.
> > > *Apache Lucene/Solr Committer*
> > > *Apache Solr PMC Member*
> > >
> > > e-mail: a.benede...@sease.io
> > >
> > >
> > > *Sease* - Information Retrieval Applied
> > > Consulting | Training | Open Source
> > >
> > > Website: Sease.io
> > > LinkedIn | Twitter | Youtube | Github
> > >
> > >
> > > On Wed, 8 Feb 2023 at 16:02, Andy Lester  wrote:
> > >
> > > > Please include your schema and some sample queries so we have specifics to
> > > > go on.
> > > >
> > > > > On Feb 8, 2023, at 9:00 AM, Mike  wrote:
> > > > >
> > > > > I have a standalone Solr server and an index of millions of documents.
> > > > > Some queries that match, e.g., more than 1 million documents take a long time.
> > > > > I only need the first 100 results; can I make Solr stop ranking and sorting
> > > > > after the first 100 hits?
> > > > > How can I limit the search time, which is sometimes more than 10 seconds?
> > > > >
> > > > > Thanks
> > > > > Mike
> > > >
> > > >
> > >
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> https://t.me/MUST_SEARCH
> A caveat: Cyrillic!
>
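
The parameters mentioned in this thread can be combined in a single request. The sketch below only builds the URL; the core name `core01`, the 2000 ms budget, and the helper name `bounded_query_url` are assumptions for illustration. `timeAllowed` is given in milliseconds, and a request that hits the limit may return partial results.

```python
from urllib.parse import urlencode


def bounded_query_url(base_url, core, query, rows=100,
                      time_allowed_ms=2000, skip_scoring=False):
    """Build a /select URL that limits the work Solr does for a query."""
    params = {
        "q": query,
        "rows": rows,
        # Hits only need to be counted exactly up to this number, which
        # allows Solr to skip non-competitive documents.
        "minExactCount": rows,
        # Upper bound on search time, in milliseconds.
        "timeAllowed": time_allowed_ms,
    }
    if skip_scoring:
        # Return documents in index order; relevance ranking is lost.
        params["sort"] = "_docid_ asc"
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(bounded_query_url("http://localhost:8983/solr", "core01", "foo"))
    print(bounded_query_url("http://localhost:8983/solr", "core01", "foo",
                            skip_scoring=True))
```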


Re: Memory cost of unused indexed and docValues properties

2023-02-09 Thread Mikhail Khludnev
Please see my replies inline below.

On Fri, Feb 10, 2023 at 9:21 AM Heinz Hölzer
 wrote:

> Hi Mikhail,
>
> Thanks a lot for your answer. This means we will only pay a small heap price
> for both docValues and indexed, right?

Exactly. Perhaps this idea is expressed in
https://www.youtube.com/watch?v=T5RmMNDR5XI
Some time ago Lucene had an API to report the heap used by internal
components... until https://issues.apache.org/jira/browse/LUCENE-9387


> Thx for your `rawSize=true` finding,
> this will be helpful to inspect disk usage.
>
> Best,
> Heinz


-- 
Sincerely yours
Mikhail Khludnev
https://t.me/MUST_SEARCH
A caveat: Cyrillic!