Keeping certain stored fields uncompressed

2024-01-26 Thread Srijan
Hi All,

I'm currently facing a significant performance challenge with Apache Solr
9.x and would greatly appreciate any insights or suggestions you might have.

Context:
In my Solr setup, I have a custom post filter that is critical to our
search process. This filter needs to read a specific stored field from each
document during the search. The field in question contains access control
information that's essential for the filter's logic.

Problem:
Since upgrading to Solr 9.x, I've observed a drastic decrease in
performance – approximately 10 to 20 times slower than before. And this
stems from the fact that stored fields in Solr 9.x are now compressed.
Decompressing these fields during each search query has introduced a
substantial performance overhead.

One potential solution for me is to use a binary field instead of a string
field, since binary data is not compressed in Lucene. But the issue with this
approach is that I would have to load the entire document to read my binary
field. And since I have a large number of stored fields, this could kill my
search performance.

I'm reaching out to see if anyone in the community has encountered a
similar issue or has suggestions on how to handle this scenario. Is there a
way for me to force Lucene to keep a field uncompressed?

Thanks,
Seez


Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread kumar gaurav
Hi Everyone

I am using vector search in Solr 9.4. I am using cosine similarity with knn
parser.

Same as the documentation
https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Schema
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Query
q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

The problem is that it always returns docs even if they are not relevant.
Even if I search for a nonsense keyword like xyz, the knn parser still
returns documents, which is useless. I want to control the similarity of the
returned documents; I need highly similar documents only. Does Solr have any
parameter in the knn parser that controls the similarity threshold?

*How can I control the minimum similarity threshold with the knn parser?*

Please help. Thanks in advance.


-- 
Thanks & Regards
Kumar Gaurav


Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Srijan
I have been testing dense vector search on Solr and it's been working great
for me so far. Mine is an image search use case using OpenAI's CLIP model
but the configurations are pretty much the same as yours. What embedding
model are you using? And can you share a portion of the actual query?

On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav  wrote:

> HI Everyone
>
> I am using vector search in Solr 9.4. I am using cosine similarity with knn
> parser.
>
> Same as the documentation
>
> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
>
> Schema
> <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
> <field name="vector" type="knn_vector" indexed="true" stored="true"/>
>
> Query
> q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
>
> The problem is it always returns docs even if it's not relevant. Even if I
> am using the xyz keyword, knn parser is returning the documents which is
> useless. I want to control the similarity of documents. I need highly
> similar documents only. Does Solr have any parameter in the knn parser
> which controls the similarity threshold ?
>
> *How can I control the minimum Similarity threshold with knn parser ?*
>
> Please help. Thanks in advance.
>
>
> --
> Thanks & Regards
> Kumar Gaurav
>


[dev help wanted] /admin/segments handler: expose the term count

2024-01-26 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Hi Everyone,

Have you used or are you curious about the segments info handler and/or screen? 
https://solr.apache.org/guide/solr/latest/configuration-guide/index-segments-merging.html#segments-info-screen

If so, would you be interested in contributing to the
https://issues.apache.org/jira/browse/SOLR-17038 issue?

Thanks,
Christine



Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread kumar gaurav
Hi Srijan

Thanks for replying.

I am using an open source BERT model to generate vectors. Are you aware of
any minSimilarity threshold parameter in the knn parser?

I am working with an e-commerce dataset, and I get the same non-relevant
results with the same score if I search for an invalid token that is not
present in my index.

I want to apply some kind of minimum similarity threshold so I can
throw out the outliers and get only the nearest documents.



On Fri, 26 Jan 2024 at 17:05, Srijan  wrote:

> I have been testing dense vector search on Solr and it's been working great
> for me so far. Mine is an image search use case using OpenAI's CLIP model
> but the configurations are pretty much the same as yours. What embedding
> model are you using? And can you share a portion of the actual query?
>
> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav  wrote:
>
> > HI Everyone
> >
> > I am using vector search in Solr 9.4. I am using cosine similarity with
> knn
> > parser.
> >
> > Same as the documentation
> >
> >
> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> >
> > Schema
> > <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
> > <field name="vector" type="knn_vector" indexed="true" stored="true"/>
> >
> > Query
> > q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
> >
> > The problem is it always returns docs even if it's not relevant. Even if
> I
> > am using the xyz keyword, knn parser is returning the documents which is
> > useless. I want to control the similarity of documents. I need highly
> > similar documents only. Does Solr have any parameter in the knn parser
> > which controls the similarity threshold ?
> >
> > *How can I control the minimum Similarity threshold with knn parser ?*
> >
> > Please help. Thanks in advance.
> >
> >
> > --
> > Thanks & Regards
> > Kumar Gaurav
> >
>


Re: Expanding child document matches with parent fields

2024-01-26 Thread Frederic Font Corbera
Hi,

Thanks for your suggestion. I already tried that, but unfortunately it is
not what I need, because it will not sort results according to the child
score (which I need), and it would return only one parent even if several of
its children matched.

My current solution, using the domain property of the JSON facets API and
duplicating the grouping fields in the child documents, works perfectly.
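
For reference, a JSON Facet request using the blockParent domain change
Frederic describes might look roughly like this (the parent filter query and
field names are illustrative, not taken from his actual schema):

```json
{
  "query": "event_description:\"Dog barking\"",
  "facet": {
    "durations": {
      "type": "terms",
      "field": "duration",
      "domain": { "blockParent": "content_type:parent" }
    }
  }
}
```

The main query matches child documents, while the `blockParent` domain switch
moves the facet computation to the enclosing parent documents.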

Thanks a lot for your help!


frederic


--
Frederic Font - ffont.github.io
Music Technology Group, UPF - mtg.upf.edu 
Freesound - freesound.org



On Thu, 25 Jan 2024 at 19:31, Mikhail Khludnev  wrote:

> Probably you are talking about searching parents and then roll over parents
> to children via
>
> https://solr.apache.org/guide/solr/latest/query-guide/document-transformers.html#child-childdoctransformerfactory
>
> On Thu, Jan 25, 2024 at 7:16 PM Frederic Font Corbera <
> frederic.f...@upf.edu>
> wrote:
>
> > Hi Mikhail,
> >
> > Thanks a lot for your quick response! I did not know about that and this
> > seems to be exactly what I was looking for. I did some quick tests with
> the
> > JSON facets API (previously I was using the non-JSON faceting method) and
> > it allows me to query child document but facet by parents, just as you
> > described. This is perfect for me.
> >
> > There is one extra issue that I did not mention in my previous email
> which
> > is, similar to the faceting problem which is now solved, I have a
> grouping
> > problem because I'd like to group child document by a field of the
> parent.
> > Again I could fix that by indexing the parent fields with the child (and
> > because I only need one field it would not be too bad in this case). But
> > maybe there is a similar solution to that of the facets? I searched the
> > docs but could not find it.
> >
> > Thanks a lot!!!
> >
> >
> > frederic
> >
> >
> > --
> > Frederic Font - ffont.github.io
> > Music Technology Group, UPF - mtg.upf.edu 
> > Freesound - freesound.org
> >
> >
> >
> > On Thu, 25 Jan 2024 at 13:02, Mikhail Khludnev  wrote:
> >
> > > Hello Federic,
> > > It sounds like blockParent domain change see.
> > >
> > >
> >
> https://solr.apache.org/guide/solr/latest/query-guide/json-faceting-domain-changes.html#block-join-domain-changes
> > >
> > > On Thu, Jan 25, 2024 at 12:15 PM Frederic Font Corbera <
> > > frederic.f...@upf.edu> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I'm one of the developers behind the Freesound website (
> > > > https://freesound.org, a sound sharing website), we use Solr as our
> > > search
> > > > engine and I'm currently experimenting with a new feature that I'd
> like
> > > to
> > > > implement using Solr. In summary, we have a Solr index with one
> > document
> > > > per sound in our database and we do standard search operations there.
> > > > However, I'd like to add child documents to each of the main
> documents
> > > > which include specific information about the sounds at different
> points
> > > in
> > > > time. For example, I have a main document with basic properties like
> > > sound
> > > > title and tags, but then have N child documents that have a timestamp
> > > field
> > > > and some extra information  associated with that time stamp. Here is
> > > > simplified example of a document that could be indexed (normally my
> > child
> > > > documents would include also dense vector fields):
> > > >
> > > > [
> > > >   {
> > > > "ID": "1",
> > > > "title": "Recording of a street ambience",
> > > > "tags": ['urban', 'ambience', 'dogs', 'birds'],
> > > > "duration": 1:21,
> > > > "events": [{
> > > > "ID": "1/events#0",
> > > > "timestamp": 0:23,
> > > > "event_description": "Dog barking"
> > > >   },{
> > > > "ID": "1/events#1",
> > > > "timestamp": 0:47,
> > > > "event_description": "Bird calls"
> > > >   },{
> > > > "ID": "1/events#2",
> > > > "timestamp": 1:05,
> > > > "event_description": "Dog barking"
> > > >   },
> > > >   ...
> > > > ]
> > > >   },
> > > >   ...
> > > > ]
> > > >
> > > > What I want to achieve is to do a query that matches child documents
> > and
> > > > sorts them according to some score, but I want to do faceting based
> on
> > > > parent document fields. For example, I want to get all documents in
> > > which a
> > > > "Dog barking" event happens (and if a document has 2 such events like
> > in
> > > > the example, I want the document returned 2 times), I want them
> sorted
> > by
> > > > the score of the child document, but I want to include faceting data
> > for,
> > > > e.g. the "duration" field (which refers to the parent document).
> > > >
> > > > One solution would be to duplicate all the parent document fields in
> > > every
> > > > child document at index time. This would work, but then I would get a
> > lot
> > > > of redundant information in the index.
> > > >
> > > > What I think wo

Setting up Basic Authentication on Solr Cloud

2024-01-26 Thread Flowerday, Matthew J
Hi There

I have been tasked with setting up Basic Authentication on our SolrCloud 
cluster running ZooKeeper 3.8 and Solr 9.1.1. I think I have got it working, 
but there are a few things I would like to check.

I set up a security.json file and placed it in the server/solr folder of a 
single-server Solr system, restarted Solr, and used the Admin UI to set up a 
new user (XX) with a password and disable the default solr user. I then copied 
the modified security.json file into the SolrCloud server/solr folder.
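
For readers following along, a minimal security.json enabling Basic Auth
typically has this shape (the credentials value below is a placeholder for the
base64 SHA-256 hash and salt, not a real secret):

```json
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { "solr": "BASE64-SHA256-HASH BASE64-SALT" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "solr": "admin" },
    "permissions": [ { "name": "security-edit", "role": "admin" } ]
  }
}
```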

In the solrCloud solr.in.cmd file I set up these variables

set SOLR_AUTH_TYPE=basic
set SOLR_AUTHENTICATION_OPTS=-Dbasicauth=XX:yyy

where XX is the username and yyy is the password.

In the ZooKeeper zkServer.cmd I added an extra parameter

call %JAVA% "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.log.file=%ZOO_LOG_FILE%" "-Dzookeeper.DigestAuthenticationProvider.superDigest=XX:mAeU2NK3etoPziu6TdqmLmGjmSN56YUq5hAZhpsPk48= O7L5Ofmg8IaXaFbn237ENlKFcC9+NWgCcphYdVJu3es=" "-XX:+HeapDumpOnOutOfMemoryError" "-XX:OnOutOfMemoryError=cmd /c taskkill /pid p /t /f" -cp "%CLASSPATH%" %ZOOMAIN% "%ZOOCFG%" %*

where XX is the username.

I then deployed the security.json across the ZooKeepers using the command

solr zk cp file:\security.json zk:/security.json -z localhost:

I then stopped solrCloud processes and all the ZooKeepers and then restarted 
all the ZooKeepers and SolrCloud processes. The Admin Tool then needed the 
username and password to log on.

I changed the wildfly standalone.conf.bat file of our application to have this 
additional java option

REM set up Solr Basic Authentication
set JAVA_OPTS=%JAVA_OPTS% -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory -Dbasicauth=XX:yyy

and bounced wildfly. I then logged on to the application and could carry out 
solr searches and updates OK. So, all looks good.

My concern is whether I have properly configured ZooKeeper for basic 
authentication. In the solr.in.cmd file there is this setup

REM Settings for ZK ACL
REM set SOLR_ZK_CREDS_AND_ACLS=-DzkACLProvider=org.apache.solr.common.cloud.DigestZkACLProvider ^
REM  -DzkCredentialsProvider=org.apache.solr.common.cloud.DigestZkCredentialsProvider ^
REM  -DzkCredentialsInjector=org.apache.solr.common.cloud.VMParamsZkCredentialsInjector ^
REM  -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME-ADMIN-PASSWORD ^
REM  -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD
REM set SOLR_OPTS=%SOLR_OPTS% %SOLR_ZK_CREDS_AND_ACLS%

Which I have not altered. Should I have set up 'admin-user' and 
'readonly-user' and given each a password to properly configure ZooKeeper, or 
is what I have done (which seems to work OK) fine, with solr/ZooKeeper 
correctly using the solr user details I set up?
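
For comparison, uncommenting and filling in that block would look something
like the following (passwords are placeholders). Note this is a sketch: these
settings secure the ZooKeeper znodes with digest ACLs, which is a separate
concern from Solr's Basic Auth for HTTP requests.

```bat
set SOLR_ZK_CREDS_AND_ACLS=-DzkACLProvider=org.apache.solr.common.cloud.DigestZkACLProvider ^
 -DzkCredentialsProvider=org.apache.solr.common.cloud.DigestZkCredentialsProvider ^
 -DzkCredentialsInjector=org.apache.solr.common.cloud.VMParamsZkCredentialsInjector ^
 -DzkDigestUsername=admin-user -DzkDigestPassword=STRONG-ADMIN-PASSWORD ^
 -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=STRONG-READONLY-PASSWORD
set SOLR_OPTS=%SOLR_OPTS% %SOLR_ZK_CREDS_AND_ACLS%
```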

Many Thanks

Matthew

Matthew Flowerday | Consultant | IMS - Information Management System
Unisys | matthew.flower...@unisys.com
ENIGMA, Wavendon Business Park, Milton Keynes, MK17 8LX





Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Charlie Hull

Hi Kumar,

kNN will return the k closest vectors, which as you've found out may not 
be very close at all. Most of the approaches we're seeing as we work 
with e-commerce clients involve combining kNN with a standard, lexical 
search in some way - combining the results from both, or using one to 
boost certain results. You might find this blog useful, as it discusses 
some strategies for coping with what you've found: 
https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
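
Charlie's point can be sketched with a toy brute-force kNN in a few lines.
This is not Solr code, just an illustration: top-k always returns k documents
however dissimilar they are, and a client-side similarity floor is one crude
way to drop the junk.

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# A tiny in-memory "index" of document vectors.
docs = {
    "doc1": [1.0, 0.0],
    "doc2": [0.9, 0.1],
    "doc3": [-1.0, 0.0],  # points the opposite way: a very poor match
}

def knn(query, k):
    # Brute-force top-k: it always returns k docs, however dissimilar.
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]

query = [1.0, 0.0]
top3 = knn(query, 3)  # doc3 is returned even though its cosine is -1.0
# A client-side workaround: drop results below a similarity floor.
kept = [d for d in top3 if cosine(query, docs[d]) >= 0.5]
```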


best

Charlie


On 26/01/2024 12:18, kumar gaurav wrote:

HI Srijan

Thanks for replying.

I am using the BERT open source model to generate vectors. Are you aware of
any minSimilary parameter threshold in knn parser ?

I am working with an ecommerce dataset. So I am getting the same non
relevant results and the same score if I am using any invalid search token
which is not present in my index.

I want to apply some kind of minimum similarity threshold so I can
throw out the outliers and can get very nearest documents only.



On Fri, 26 Jan 2024 at 17:05, Srijan  wrote:


I have been testing dense vector search on Solr and it's been working great
for me so far. Mine is an image search use case using OpenAI's CLIP model
but the configurations are pretty much the same as yours. What embedding
model are you using? And can you share a portion of the actual query?

On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav  wrote:


HI Everyone

I am using vector search in Solr 9.4. I am using cosine similarity with

knn

parser.

Same as the documentation



https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html

Schema
<fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
<field name="vector" type="knn_vector" indexed="true" stored="true"/>

Query
q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]

The problem is it always returns docs even if it's not relevant. Even if

I

am using the xyz keyword, knn parser is returning the documents which is
useless. I want to control the similarity of documents. I need highly
similar documents only. Does Solr have any parameter in the knn parser
which controls the similarity threshold ?

*How can I control the minimum Similarity threshold with knn parser ?*

Please help. Thanks in advance.


--
Thanks & Regards
Kumar Gaurav


--
Charlie Hull - Managing Consultant at OpenSource Connections Limited
Founding member of The Search Network and co-author of Searching the Enterprise
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828

OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
Amtsgericht Charlottenburg | HRB 230712 B
Geschäftsführer: John M. Woodell | David E. Pugh
Finanzamt: Berlin Finanzamt für Körperschaften II



Re: Solr 9.4 - Help regarding vector search min Similarity threshold with knn parser

2024-01-26 Thread Alessandro Benedetti
Hi Kumar,
Knn search in Apache Solr doesn't support any min-threshold parameter.
To be honest, even if it did, you wouldn't be in a much better position:
your perceived relevance won't necessarily match the 0-1 cosine similarity
between your query and your vectors, and what you consider highly relevant
may have a score of 0.35 for one query and 0.96 for another.
Having such a parameter would just delegate to the user the pain of setting up
a useful threshold, which, trust me, is not an easy (or maybe even doable) job.

It's on my roadmap to add a sort of auto-cutoff functionality based on the
document score, and Lucene has also added a threshold-based search (which we
may or may not port to Apache Solr).
In the meantime, you can play with Hybrid Search (which will also be
improved in the future):
https://sease.io/2023/12/hybrid-search-with-apache-solr.html
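
Alessandro's point about why a fixed threshold is hard to tune can be shown
numerically. The score mapping used below, (1 + cos) / 2, is an assumption
based on how Lucene's COSINE vector similarity scales scores into [0, 1];
verify it against the docs for your version.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def lucene_cosine_score(a, b):
    # Assumed Lucene scaling of cosine similarity into [0, 1].
    return (1.0 + cosine(a, b)) / 2.0

q1, best_for_q1 = [1.0, 0.0], [0.99, 0.12]  # a near-duplicate match
q2, best_for_q2 = [1.0, 0.0], [0.5, 0.7]    # the best doc available for q2

s1 = lucene_cosine_score(q1, best_for_q1)
s2 = lucene_cosine_score(q2, best_for_q2)
# A fixed global threshold of, say, 0.9 would keep q1's best hit but silently
# drop q2's best hit, even though each is the most relevant doc for its query.
```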

Cheers

--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io 
LinkedIn  | Twitter
 | Youtube
 | Github



On Fri, 26 Jan 2024 at 17:01, Charlie Hull 
wrote:

> Hi Kumar,
>
> kNN will return the k closest vectors, which as you've found out may not
> be very close at all. Most of the approaches we're seeing as we work
> with e-commerce clients involve combining kNN with a standard, lexical
> search in some way - combining the results from both, or using one to
> boost certain results. You might find this blog useful as it discusses
> some strategies for coping with what you've found
>
> https://opensourceconnections.com/blog/2023/03/22/building-vector-search-in-chorus-a-technical-deep-dive/
>
> best
>
> Charlie
>
>
> On 26/01/2024 12:18, kumar gaurav wrote:
> > HI Srijan
> >
> > Thanks for replying.
> >
> > I am using the BERT open source model to generate vectors. Are you aware
> of
> > any minSimilary parameter threshold in knn parser ?
> >
> > I am working with an ecommerce dataset. So I am getting the same non
> > relevant results and the same score if I am using any invalid search
> token
> > which is not present in my index.
> >
> > I want to apply some kind of minimum similarity threshold so I can
> > throw out the outliers and can get very nearest documents only.
> >
> >
> >
> > On Fri, 26 Jan 2024 at 17:05, Srijan  wrote:
> >
> >> I have been testing dense vector search on Solr and it's been working
> great
> >> for me so far. Mine is an image search use case using OpenAI's CLIP
> model
> >> but the configurations are pretty much the same as yours. What embedding
> >> model are you using? And can you share a portion of the actual query?
> >>
> >> On Fri, Jan 26, 2024 at 6:16 AM kumar gaurav  wrote:
> >>
> >>> HI Everyone
> >>>
> >>> I am using vector search in Solr 9.4. I am using cosine similarity with
> >> knn
> >>> parser.
> >>>
> >>> Same as the documentation
> >>>
> >>>
> >>
> https://solr.apache.org/guide/solr/latest/query-guide/dense-vector-search.html
> >>> Schema
> >>> <fieldType name="knn_vector" class="solr.DenseVectorField" vectorDimension="768" similarityFunction="cosine"/>
> >>> <field name="vector" type="knn_vector" indexed="true" stored="true"/>
> >>>
> >>> Query
> >>> q={!knn f=vector topK=10}[1.0, 2.0, 3.0, 4.0]
> >>>
> >>> The problem is it always returns docs even if it's not relevant. Even
> if
> >> I
> >>> am using the xyz keyword, knn parser is returning the documents which
> is
> >>> useless. I want to control the similarity of documents. I need highly
> >>> similar documents only. Does Solr have any parameter in the knn parser
> >>> which controls the similarity threshold ?
> >>>
> >>> *How can I control the minimum Similarity threshold with knn parser ?*
> >>>
> >>> Please help. Thanks in advance.
> >>>
> >>>
> >>> --
> >>> Thanks & Regards
> >>> Kumar Gaurav
> >>>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> Founding member of The Search Network and co-author of Searching the
> Enterprise
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>
> OpenSource Connections Europe GmbH | Pappelallee 78/79 | 10437 Berlin
> Amtsgericht Charlottenburg | HRB 230712 B
> Geschäftsführer: John M. Woodell | David E. Pugh
> Finanzamt: Berlin Finanzamt für Körperschaften II
>
>


Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Shawn Heisey

On 1/26/24 03:38, Srijan wrote:

Since upgrading to Solr 9.x, I've observed a drastic decrease in
performance – approximately 10 to 20 times slower than before. And this
stems from the fact that stored fields in Solr 9.x are now compressed.
Decompressing these fields during each search query has introduced a
substantial performance overhead.


Stored fields have been compressed since Solr 4.1.0.  Unless you 
upgraded from a VERY old version, stored field compression is unlikely 
to be the source of your performance issue.


You can have the source of the data uncompressed if you set stored to 
false, docValues to true, and useDocValuesAsStored to true.  Not all 
fieldtype classes support docValues, though -- in particular TextField 
does not.
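
Shawn's suggestion would look roughly like this in the schema (the field name
is made up for illustration; `string` here is the usual StrField-based type,
which does support docValues):

```xml
<field name="acl_info" type="string" stored="false"
       docValues="true" useDocValuesAsStored="true"/>
```

With this, the value comes from the columnar docValues structures rather than
the compressed stored-fields file, and it still appears in query responses
thanks to useDocValuesAsStored.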


Thanks,
Shawn



Re: [dev help wanted] /admin/segments handler: expose the term count

2024-01-26 Thread Rahul Goswami
I would love to take this up.

On Fri, Jan 26, 2024 at 6:46 AM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hi Everyone,
>
> Have you used or are you curious about the segments info handler and/or
> screen?
> https://solr.apache.org/guide/solr/latest/configuration-guide/index-segments-merging.html#segments-info-screen
>
> If so then would you be interested in contributing to the
> https://issues.apache.org/jira/browse/SOLR-17038 issue?
>
> Thanks,
> Christine
>
>


Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Mikhail Khludnev
Hello.
Agreed. By default it's BEST_SPEED, which is LZ4, so it can hardly get any
faster or less compressed. A binary docValues field should be the answer.
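
The access pattern behind the slowdown can be sketched in a few lines.
Stored fields are compressed together in blocks, so reading one small field of
one document means decompressing the whole block first. This is only a sketch:
zlib stands in for Lucene's LZ4 (the Python stdlib has no LZ4), and the
document layout is invented; the point is the access pattern, not the codec.

```python
import json
import zlib

# A fake stored-fields "block": many docs compressed together.
docs = [{"id": i, "acl": f"group-{i % 5}", "body": "x" * 500} for i in range(100)]
block = zlib.compress(json.dumps(docs).encode("utf-8"))

def read_acl(doc_index):
    # Even to fetch one tiny field, the entire block is decompressed first.
    decoded = json.loads(zlib.decompress(block).decode("utf-8"))
    return decoded[doc_index]["acl"]
```

A per-document read path like a post filter pays this decompression cost on
every hit, which is why docValues (columnar, per-field access) is the usual
recommendation for this use case.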

On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey 
wrote:

> On 1/26/24 03:38, Srijan wrote:
> > Since upgrading to Solr 9.x, I've observed a drastic decrease in
> > performance – approximately 10 to 20 times slower than before. And this
> > stems from the fact that stored fields in Solr 9.x are now compressed.
> > Decompressing these fields during each search query has introduced a
> > substantial performance overhead.
>
> Stored fields have been compressed since Solr 4.1.0.  Unless you
> upgraded from a VERY old version, stored field compression is unlikely
> to be the source of your performance issue.
>
> You can have the source of the data uncompressed if you set stored to
> false, docValues to true, and useDocValuesAsStored to true.  Not all
> fieldtype classes support docValues, though -- in particular TextField
> does not.
>
> Thanks,
> Shawn
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Expanding child document matches with parent fields

2024-01-26 Thread Mikhail Khludnev
Hi, I don't fully follow, but I remember that there's a function for
sorting parents by matching children
https://solr.apache.org/guide/solr/latest/query-guide/function-queries.html#childfieldfield-function
unfortunately its vice versa is stuck in implementation:
https://issues.apache.org/jira/browse/SOLR-10701
Can you confirm that sort=parentfield() would suit your problem?

On Fri, Jan 26, 2024 at 4:01 PM Frederic Font Corbera 
wrote:

> Hi,
>
> Thanks for your suggestion. I already tried that, but unfortunately it is
> not what I need because it will not sort results according the the child
> score (which I need), and also it would return only one parent even if
> several of its children would match.
>
> My current solution using the domain property of the JSON facets mode and
> duplication the grouping fields in child documents works perfectly.
>
> Thanks a lot for your help!
>
>
> frederic
>
>
> --
> Frederic Font - ffont.github.io
> Music Technology Group, UPF - mtg.upf.edu 
> Freesound - freesound.org
>
>
>
> On Thu, 25 Jan 2024 at 19:31, Mikhail Khludnev  wrote:
>
> > Probably you are talking about searching parents and then roll over
> parents
> > to children via
> >
> >
> https://solr.apache.org/guide/solr/latest/query-guide/document-transformers.html#child-childdoctransformerfactory
> >
> > On Thu, Jan 25, 2024 at 7:16 PM Frederic Font Corbera <
> > frederic.f...@upf.edu>
> > wrote:
> >
> > > Hi Mikhail,
> > >
> > > Thanks a lot for your quick response! I did not know about that and
> this
> > > seems to be exactly what I was looking for. I did some quick tests with
> > the
> > > JSON facets API (previously I was using the non-JSON faceting method)
> and
> > > it allows me to query child document but facet by parents, just as you
> > > described. This is perfect for me.
> > >
> > > There is one extra issue that I did not mention in my previous email
> > which
> > > is, similar to the faceting problem which is now solved, I have a
> > grouping
> > > problem because I'd like to group child document by a field of the
> > parent.
> > > Again I could fix that by indexing the parent fields with the child
> (and
> > > because I only need one field it would not be too bad in this case).
> But
> > > maybe there is a similar solution to that of the facets? I searched the
> > > docs but could not find it.
> > >
> > > Thanks a lot!!!
> > >
> > >
> > > frederic
> > >
> > >
> > > --
> > > Frederic Font - ffont.github.io
> > > Music Technology Group, UPF - mtg.upf.edu <
> https://www.upf.edu/web/mtg/>
> > > Freesound - freesound.org
> > >
> > >
> > >
> > > On Thu, 25 Jan 2024 at 13:02, Mikhail Khludnev 
> wrote:
> > >
> > > > Hello Federic,
> > > > It sounds like blockParent domain change see.
> > > >
> > > >
> > >
> >
> https://solr.apache.org/guide/solr/latest/query-guide/json-faceting-domain-changes.html#block-join-domain-changes
> > > >
> > > > On Thu, Jan 25, 2024 at 12:15 PM Frederic Font Corbera <
> > > > frederic.f...@upf.edu> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'm one of the developers behind the Freesound website (
> > > > > https://freesound.org, a sound sharing website), we use Solr as
> our
> > > > search
> > > > > engine and I'm currently experimenting with a new feature that I'd
> > like
> > > > to
> > > > > implement using Solr. In summary, we have a Solr index with one
> > > document
> > > > > per sound in our database and we do standard search operations
> there.
> > > > > However, I'd like to add child documents to each of the main
> > documents
> > > > > which include specific information about the sounds at different
> > points
> > > > in
> > > > > time. For example, I have a main document with basic properties
> like
> > > > sound
> > > > > title and tags, but then have N child documents that have a
> timestamp
> > > > field
> > > > > and some extra information  associated with that time stamp. Here
> is
> > > > > simplified example of a document that could be indexed (normally my
> > > child
> > > > > documents would include also dense vector fields):
> > > > >
> > > > > [
> > > > >   {
> > > > > "ID": "1",
> > > > > "title": "Recording of a street ambience",
> > > > > "tags": ['urban', 'ambience', 'dogs', 'birds'],
> > > > > "duration": 1:21,
> > > > > "events": [{
> > > > > "ID": "1/events#0",
> > > > > "timestamp": 0:23,
> > > > > "event_description": "Dog barking"
> > > > >   },{
> > > > > "ID": "1/events#1",
> > > > > "timestamp": 0:47,
> > > > > "event_description": "Bird calls"
> > > > >   },{
> > > > > "ID": "1/events#2",
> > > > > "timestamp": 1:05,
> > > > > "event_description": "Dog barking"
> > > > >   },
> > > > >   ...
> > > > > ]
> > > > >   },
> > > > >   ...
> > > > > ]
> > > > >
> > > > > What I want to achieve is to do a query that matches child
> documents
> > > and
> > > > > sorts them according to some 

Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Srijan
I stand corrected. It looks like my stored fields were compressed in Solr 8.11
too. But something seems to have changed in 9.x: decompression is awfully
slow. A new algorithm?

Regarding the binary field: Solr doesn't allow docValues on binary fields
(though Lucene does). So I tried using a stored binary field, but now I have
to load the entire document to read that one stored field (and I have tons of
stored fields).


On Fri, Jan 26, 2024, 2:44 PM Mikhail Khludnev  wrote:

> Hello.
> Agreed. By default it's BEST_SPEED which is LZ4. So, it can't be faster
> or less compressive. Binary DocValues Field should be an answer.
>
> On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey  >
> wrote:
>
> > On 1/26/24 03:38, Srijan wrote:
> > > Since upgrading to Solr 9.x, I've observed a drastic decrease in
> > > performance – approximately 10 to 20 times slower than before. And this
> > > stems from the fact that stored fields in Solr 9.x are now compressed.
> > > Decompressing these fields during each search query has introduced a
> > > substantial performance overhead.
> >
> > Stored fields have been compressed since Solr 4.1.0.  Unless you
> > upgraded from a VERY old version, stored field compression is unlikely
> > to be the source of your performance issue.
> >
> > You can have the source of the data uncompressed if you set stored to
> > false, docValues to true, and useDocValuesAsStored to true.  Not all
> > fieldtype classes support docValues, though -- in particular TextField
> > does not.
> >
> > Thanks,
> > Shawn
> >
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Keeping certain stored fields uncompressed

2024-01-26 Thread ufuk yılmaz
Hi, just curious, may I ask how you came to the conclusion that field 
compression is the cause of the slowness in 9.4?

—

> On 26 Jan 2024, at 23:13, Srijan  wrote:
> 
> I stand corrected. Looks like my stored fields were compressed in Solr 8.11
> too. But something seems to have changed in 9.x. Decompression is awfully
> slow. New algorithm?
> 
> Regarding binary field, Solr doesn't allow docvalues for binary field (btw
> Lucene does). So I tried using stored binary field but now I have to load
> the entire document to read this one stored field (and I have tons of
> stored fields)
> 
> 
>> On Fri, Jan 26, 2024, 2:44 PM Mikhail Khludnev  wrote:
>> 
>> Hello.
>> Agreed. By default it's BEST_SPEED which is LZ4. So, it can't be faster
>> or less compressive. Binary DocValues Field should be an answer.
>> 
>> On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey >> 
>> wrote:
>> 
 On 1/26/24 03:38, Srijan wrote:
 Since upgrading to Solr 9.x, I've observed a drastic decrease in
 performance – approximately 10 to 20 times slower than before. And this
 stems from the fact that stored fields in Solr 9.x are now compressed.
 Decompressing these fields during each search query has introduced a
 substantial performance overhead.
>>> 
>>> Stored fields have been compressed since Solr 4.1.0.  Unless you
>>> upgraded from a VERY old version, stored field compression is unlikely
>>> to be the source of your performance issue.
>>> 
>>> You can have the source of the data uncompressed if you set stored to
>>> false, docValues to true, and useDocValuesAsStored to true.  Not all
>>> fieldtype classes support docValues, though -- in particular TextField
>>> does not.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> 



Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Walter Underwood
You seem to be jumping to conclusions about causes. Might want to step back and 
do some measurements.

Try eliminating parts of the query one at a time, including returning fields. 
You might need to do this with a query set of a few thousand queries to avoid 
cache effects.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jan 26, 2024, at 12:13 PM, Srijan  wrote:
> 
> I stand corrected. Looks like my stored fields were compressed in Solr 8.11
> too. But something seems to have changed in 9.x. Decompression is awfully
> slow. New algorithm?
> 
> Regarding binary field, Solr doesn't allow docvalues for binary field (btw
> Lucene does). So I tried using stored binary field but now I have to load
> the entire document to read this one stored field (and I have tons of
> stored fields)
> 
> 
> On Fri, Jan 26, 2024, 2:44 PM Mikhail Khludnev  wrote:
> 
>> Hello.
>> Agreed. By default it's BEST_SPEED which is LZ4. So, it can't be faster
>> or less compressive. Binary DocValues Field should be an answer.
>> 
>> On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey >> 
>> wrote:
>> 
>>> On 1/26/24 03:38, Srijan wrote:
 Since upgrading to Solr 9.x, I've observed a drastic decrease in
 performance – approximately 10 to 20 times slower than before. And this
 stems from the fact that stored fields in Solr 9.x are now compressed.
 Decompressing these fields during each search query has introduced a
 substantial performance overhead.
>>> 
>>> Stored fields have been compressed since Solr 4.1.0.  Unless you
>>> upgraded from a VERY old version, stored field compression is unlikely
>>> to be the source of your performance issue.
>>> 
>>> You can have the source of the data uncompressed if you set stored to
>>> false, docValues to true, and useDocValuesAsStored to true.  Not all
>>> fieldtype classes support docValues, though -- in particular TextField
>>> does not.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> 



Re: Keeping certain stored fields uncompressed

2024-01-26 Thread Mikhail Khludnev
On Fri, Jan 26, 2024 at 11:14 PM Srijan  wrote:

> Regarding binary field, Solr doesn't allow docvalues for binary field (btw
> Lucene does).

https://solr.apache.org/guide/solr/latest/indexing-guide/field-types-included-with-solr.html
mentions BinaryField.
FWIW, BinaryDocValues has no compression anymore:
https://issues.apache.org/jira/browse/LUCENE-9843


> So I tried using stored binary field but now I have to load
> the entire document to read this one stored field (and I have tons of
> stored fields)
>
Well, the intention of docValues was to read a single field, in contrast to a
stored field, which always reads the whole record (document) despite the
"filter".
Presumably some Solr logic might somehow impact it, but that is what I
expect with docValues: reading only the requested field.
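The row-versus-column trade-off described here can be illustrated with a small self-contained toy (plain JDK code, not Solr or Lucene internals; the document layout and all names are invented for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy illustration (plain JDK, not Solr/Lucene code) of the trade-off:
// stored fields live in compressed per-document "rows", so fetching one
// field means inflating the whole document, while a docValues-style
// column keeps each field's values separately, with nothing to inflate.
public class RowVsColumn {

    // Compress one "row": all stored fields of a document concatenated.
    static byte[] compressRow(String wholeDocument) {
        Deflater deflater = new Deflater();
        deflater.setInput(wholeDocument.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        byte[] buf = new byte[8192];
        int n = deflater.deflate(buf);
        deflater.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    static String decompressRow(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] buf = new byte[8192];
        int n = inflater.inflate(buf);
        inflater.end();
        return new String(buf, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // A document with many stored fields; only "acl" is ever needed.
        String doc = "title=report;body=lots of text here;acl=group:admins";
        byte[] row = compressRow(doc);

        // Stored-field path: inflate everything, then pick out one field.
        String aclFromRow = decompressRow(row).split("acl=")[1];

        // docValues-style path: the column holds just that field, plain.
        String[] aclColumn = { "group:admins" }; // one value per document
        String aclFromColumn = aclColumn[0];

        System.out.println(aclFromRow.equals(aclFromColumn)); // prints true
    }
}
```

The point is only that the stored-field path must inflate the whole row to reach one field, while the column holds the value directly; the real Lucene codecs are of course far more sophisticated.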


>
>
> On Fri, Jan 26, 2024, 2:44 PM Mikhail Khludnev  wrote:
>
> > Hello.
> > Agreed. By default it's BEST_SPEED which is LZ4. So, it can't be faster
> > or less compressive. Binary DocValues Field should be an answer.
> >
> > On Fri, Jan 26, 2024 at 9:41 PM Shawn Heisey
>  > >
> > wrote:
> >
> > > On 1/26/24 03:38, Srijan wrote:
> > > > Since upgrading to Solr 9.x, I've observed a drastic decrease in
> > > > performance – approximately 10 to 20 times slower than before. And
> this
> > > > stems from the fact that stored fields in Solr 9.x are now
> compressed.
> > > > Decompressing these fields during each search query has introduced a
> > > > substantial performance overhead.
> > >
> > > Stored fields have been compressed since Solr 4.1.0.  Unless you
> > > upgraded from a VERY old version, stored field compression is unlikely
> > > to be the source of your performance issue.
> > >
> > > You can have the source of the data uncompressed if you set stored to
> > > false, docValues to true, and useDocValuesAsStored to true.  Not all
> > > fieldtype classes support docValues, though -- in particular TextField
> > > does not.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>


-- 
Sincerely yours
Mikhail Khludnev


Re: Setting up Basic Authentication on Solr Cloud

2024-01-26 Thread Jan Høydahl
Hi,

You probably want to enable SSL for Solr if you use BasicAuth.
For ZK, ACL protection could be the first step, as described in 
https://solr.apache.org/guide/solr/latest/deployment-guide/zookeeper-access-control.html
Protecting the ZK connection with SSL is probably also smart, but it is 
unfortunately not well documented in the ref guide, so you will have to find 
documentation elsewhere.

Jan

> 26. jan. 2024 kl. 17:17 skrev Flowerday, Matthew J 
> :
> 
> Hi There
>  
> I have been tasked with setting up Basic Authentication on our SolrCloud 
> database running ZooKeeper 3.8 and Solr 9.1.1. I think I have got it working, 
> but there are a few things I would like to check.
>  
> I set up a security.json file and placed it in the server/solr folder in a 
> single server solr system, restarted solr and used the Admin Tool to set up a 
> new user (XX) and password and disabled the default solr user. The 
> modified security.json file I then copied into the solrCloud server/solr 
> folder.
>  
> In the solrCloud solr.in.cmd file I set up these variables
>  
> set SOLR_AUTH_TYPE=basic
> set SOLR_AUTHENTICATION_OPTS=-Dbasicauth=XX:yyy
>  
> where XX is the username and yyy is the password.
>  
> In the ZooKeeper zkServer.cmd I added an extra parameter
>  
> call %JAVA% "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" 
> "-Dzookeeper.log.file=%ZOO_LOG_FILE%" 
> "-Dzookeeper.DigestAuthenticationProvider.superDigest=XX:mAeU2NK3etoPziu6TdqmLmGjmSN56YUq5hAZhpsPk48=
>  O7L5Ofmg8IaXaFbn237ENlKFcC9+NWgCcphYdVJu3es=" 
> "-XX:+HeapDumpOnOutOfMemoryError" "-XX:OnOutOfMemoryError=cmd /c taskkill 
> /pid p /t /f" -cp "%CLASSPATH%" %ZOOMAIN% "%ZOOCFG%" %*
>  
> where XX is the username.
>  
> I then deployed the security.json across the ZooKeepers using the command
>  
> solr zk cp file:\security.json zk:/security.json -z localhost:  id>
>  
> I then stopped solrCloud processes and all the ZooKeepers and then restarted 
> all the ZooKeepers and SolrCloud processes. The Admin Tool then needed the 
> username and password to log on.
>  
> I changed the wildfly standalone.conf.bat file of our application  to have 
> this additional java option
>  
> REM set up Solr Basic Authentication
> set JAVA_OPTS=%JAVA_OPTS% 
> -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
> -Dbasicauth=XX:yyy
>  
> and bounced wildfly. I then logged on to the application and could carry out 
> solr searches ok and updates. So, all looks good.
>  
> My concern is have I properly configured ZooKeeper for basic authentication.  
> In the solr.in.cmd file there is this set up
>  
> REM Settings for ZK ACL
> REM set 
> SOLR_ZK_CREDS_AND_ACLS=-DzkACLProvider=org.apache.solr.common.cloud.DigestZkACLProvider
>  ^
> REM  
> -DzkCredentialsProvider=org.apache.solr.common.cloud.DigestZkCredentialsProvider
>  ^
> REM  
> -DzkCredentialsInjector=org.apache.solr.common.cloud.VMParamsZkCredentialsInjector
>  ^
> REM  -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME-ADMIN-PASSWORD 
> ^
> REM  -DzkDigestReadonlyUsername=readonly-user 
> -DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD
> REM set SOLR_OPTS=%SOLR_OPTS% %SOLR_ZK_CREDS_AND_ACLS%
>  
> Which I have not altered. Should I have set up ‘admin-user’ and 
> ‘readonly-user’ and given a password for each to have properly configured 
> ZooKeeper, or is what I have done (which seems to work ok) fine, with 
> solr/ZooKeeper using the solr user details I set up correctly?
>  
> Many Thanks
>  
> Matthew
>  
> Matthew Flowerday | Consultant | IMS – Information Management System
> Unisys | matthew.flower...@unisys.com 
> ENIGMA, Wavendon Business Park, Milton Keynes, MK17 8LX
>  
>  
>  
> THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY 
> MATERIAL and is for use only by the intended recipient. If you received this 
> in error, please contact the sender and delete the e-mail and its attachments 
> from all devices.
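If the ZK ACL route from Jan's reply is taken, the commented block quoted above from solr.in.cmd would be enabled roughly like this (a sketch only; the usernames and passwords are placeholders, not real values):

```bat
REM Sketch: the ZK ACL block from solr.in.cmd with the REM markers
REM removed; credentials below are placeholders for illustration.
set SOLR_ZK_CREDS_AND_ACLS=-DzkACLProvider=org.apache.solr.common.cloud.DigestZkACLProvider ^
 -DzkCredentialsProvider=org.apache.solr.common.cloud.DigestZkCredentialsProvider ^
 -DzkCredentialsInjector=org.apache.solr.common.cloud.VMParamsZkCredentialsInjector ^
 -DzkDigestUsername=admin-user -DzkDigestPassword=CHANGEME-ADMIN-PASSWORD ^
 -DzkDigestReadonlyUsername=readonly-user -DzkDigestReadonlyPassword=CHANGEME-READONLY-PASSWORD
set SOLR_OPTS=%SOLR_OPTS% %SOLR_ZK_CREDS_AND_ACLS%
```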