Re: Performance Suggestion for Dense Vectors

2024-03-29 Thread Alessandro Benedetti
Hi Rajani,
the discussion about a centralised Apache Solr blog is in progress (one
that would allow both linking to private blogs to gather more views and
writing directly there); I'll give you an update as soon as the community
finalises the solution.

In the meantime, as Ishan said, posting a guest blog is a possibility on
many blogs.
We've done that as well with an initiative in collaboration with the
University of Padua.

Cheers


--
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io
LinkedIn | Twitter | Youtube | Github



On Fri, 29 Mar 2024 at 07:02, Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:

> Hi Rajani,
>
> Please feel free to submit guest posts to our SearchScale blog. We welcome
> posts on vector search.
>
> https://SearchScale.com/blog
>
> Thanks,
> Ishan
>
> On Fri, 29 Mar, 2024, 1:18 am rajani m,  wrote:
>
> > @Alessandro,
> > Is there a Solr blog site where we can submit work/articles, or are you
> > suggesting I post on my own site and share a link here? I prefer the
> > former if there is one, because back when I had my own site it hardly
> > had any views, and on top of that Google's blogging service made me
> > migrate from blogs to sites, and sites got deprecated. Is there, or can
> > we have, a Solr-specific wiki/blog site where Solr users can submit
> > common feature configs/module configs/examples/performance metrics and
> > so on, and maybe have voting/likes to confirm it works? We would then
> > have one common place to submit to and look in.
> >
> >
> >
> > On Thu, Mar 28, 2024 at 3:33 PM rajani m  wrote:
> >
> > > Run the same knn queries at a slow throughput for 30-60 minutes; this
> > > should warm up the disk caches with the HNSW index files, and then you
> > > should see a significant drop in query time. Also make use of "fq" to
> > > reduce the document space as much as you can (a minimal sketch follows
> > > below).
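> > >
> > > A minimal warm-up sketch in SolrJ (hedged: the base URL, collection
> > > name "films", field "content_vector", and the tiny query vector are
> > > placeholders; a real vector must match the field's vectorDimension):
> > >
> > > import org.apache.solr.client.solrj.SolrQuery;
> > > import org.apache.solr.client.solrj.impl.Http2SolrClient;
> > > import org.apache.solr.client.solrj.response.QueryResponse;
> > >
> > > public class KnnWarmup {
> > >   public static void main(String[] args) throws Exception {
> > >     try (Http2SolrClient client =
> > >         new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
> > >       SolrQuery q = new SolrQuery();
> > >       // {!knn} is Solr 9's dense-vector query parser.
> > >       q.setQuery("{!knn f=content_vector topK=100}[0.1, 0.2, 0.3, 0.4]");
> > >       // fq prunes the candidate document space, as suggested above.
> > >       q.addFilterQuery("category:books");
> > >       QueryResponse rsp = client.query("films", q);
> > >       System.out.println("hits: " + rsp.getResults().getNumFound());
> > >     }
> > >   }
> > > }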
> > >
> > > On Thu, Mar 28, 2024 at 12:50 PM Iram Tariq wrote:
> > >
> > >> Hi  Alessandro,
> > >>
> > >> Thank you for the feedback. Kindly see my comments below,
> > >>
> > >> *Ale*:
> > >> https://www.elastic.co/blog/accelerating-vector-search-simd-instructions
> > >> I suggest experimenting with the SIMD vector improvements (unless you
> > >> are already doing it).
> > >>
> > >> *We will try this soon.*
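> > >>
> > >> A hedged configuration note (verify against your Solr version's docs):
> > >> Lucene 9's SIMD code paths rely on the incubating Panama Vector API,
> > >> which the JVM only exposes when the module is enabled, e.g. in
> > >> solr.in.sh:
> > >>
> > >> SOLR_OPTS="$SOLR_OPTS --add-modules jdk.incubator.vector"
> > >>
> > >> Without it, Lucene falls back to scalar implementations.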
> > >>
> > >> *Ale*: What about the machine memory?
> > >>
> > >> Following is the system specification: Linux (CPU: 64, RAM: 488 GB,
> > >> OS: Ubuntu 20.04.6)
> > >>
> > >> *Ale*: you can fine-tune the hyper-parameters (hnswBeamWidth,
> > >> hnswMaxConnections) to compromise a bit on recall in favour of
> > >> performance.
> > >>
> > >> I am trying this as a first step. But I am sure it will impact recall.
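> > >>
> > >> A minimal schema sketch for reference (hedged: the attribute values
> > >> are illustrative, not recommendations; Solr 9's documented defaults
> > >> are hnswMaxConnections=16 and hnswBeamWidth=100, and lowering them
> > >> trades recall for speed):
> > >>
> > >> <fieldType name="knn_vector" class="solr.DenseVectorField"
> > >>            vectorDimension="768" similarityFunction="cosine"
> > >>            knnAlgorithm="hnsw"
> > >>            hnswMaxConnections="8" hnswBeamWidth="60"/>
> > >> <field name="content_vector" type="knn_vector" indexed="true"
> > >>        stored="true"/>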
> > >>
> > >> Regards,
> > >>
> > >>
> > >> Iram Tariq | Software Architect
> > >>
> > >> NorthBay
> > >>
> > >> Direct:  +1 (902) 329-7329
> > >>
> > >> iram.ta...@northbaysolutions.net
> > >>
> > >> www.northbaysolutions.com
> > >>
> > >>
> > >>
> > >>
> > >> On Thu, Mar 28, 2024 at 5:42 AM Alessandro Benedetti <
> > >> a.benede...@sease.io> wrote:
> > >>
> > >> > That's interesting.
> > >> > I think it's vital to get some performance tests back from the
> > >> > community. Since my contribution to support vector search in Apache
> > >> > Solr was merged, we have received little to no feedback on its
> > >> > performance in real-world use cases.
> > >> > Blogs, open benchmarks, or even just this sort of mail message are
> > >> > welcome.
> > >> > Let me reply inline:
> > >> > --
> > >> > *Alessandro Benedetti*
> > >> > Director @ Sease Ltd.
> > >> > *Apache Lucene/Solr Committer*
> > >> > *Apache Solr PMC Member*
> > >> >
> > >> > e-mail: a.benede...@sease.io
> > >> >
> > >> >
> > >> > *Sease* - Information Retrieval Applied
> > >> > Consulting | Training | Open Source
> > >> >
> > >> > Website: Sease.io
> > >> > LinkedIn | Twitter | Youtube | Github
> > >> >
> > >> >
> > >> >
> > >> > On Wed, 27 Mar 2024 at 21:06, Kent Fitch wrote:
> > >> >
> > >> > > Hi Iram,
> > >> > >
> > >> > > Is the machine doing lots of IO? If the HNSW graphs are not
> > >> > > entirely in memory, performance will be poor. What JVM? You may
> > >> > > get some benefit from SIMD support in Java 21. Can you use the
> > >> > > latest quantisation changes in Lucene to reduce the memory
> > >> > > footprint of the HNSW graphs? That's a large topK,
> > >> > 

Re: solr9.5.0/solrj9.5.0 bugs in shard request

2024-03-29 Thread Yue Yu
Hi Christine,

Thank you for testing it out. Yes, it should be a straightforward fix;
I'll open a ticket then.

Best,

Yue

On Tue, Mar 26, 2024 at 4:50 AM Christine Poerschke (BLOOMBERG/ LONDON) <
cpoersc...@bloomberg.net> wrote:

> Hello Yue,
>
> I'm not familiar with this part of the code, but wanted to share that
> changing the Http2SolrClient.java code locally on the main branch like this
>
> - Fields fields = new Fields();
> + Fields fields = new Fields(true);
>
> does pass the tests when run locally, though that could perhaps be due to
> a lack of test coverage for usage similar to the
> f.case_sensitive_field.facet.limit=5 &
> f.CASE_SENSITIVE_FIELD.facet.limit=99 example you mention.
>
> Hope that helps.
>
> Christine
>
> From: users@solr.apache.org At: 03/25/24 16:09:15 UTC
> To: users@solr.apache.org
> Subject: solr9.5.0/solrj9.5.0 bugs in shard request
>
> Hello,
>
> I found an issue in solr9.5.0/solrj9.5.0 regarding shard requests:
> As of now, the multi-shard requests are sent through Http2SolrClient, and
> this function composes the actual Jetty Request object:
>
> > private Request fillContentStream(
> >     Request req, Collection<ContentStream> streams,
> >     ModifiableSolrParams wparams, boolean isMultipart) throws IOException {
> >   if (isMultipart) { // multipart/form-data
> >     try (MultiPartRequestContent content = new MultiPartRequestContent()) {
> >       Iterator<String> iter = wparams.getParameterNamesIterator();
> >       while (iter.hasNext()) {
> >         String key = iter.next();
> >         String[] vals = wparams.getParams(key);
> >         if (vals != null) {
> >           for (String val : vals) {
> >             content.addFieldPart(key, new StringRequestContent(val), null);
> >           }
> >         }
> >       }
> >       if (streams != null) {
> >         for (ContentStream contentStream : streams) {
> >           String contentType = contentStream.getContentType();
> >           if (contentType == null) contentType = "multipart/form-data"; // default
> >           String name = contentStream.getName();
> >           if (name == null) name = "";
> >           HttpFields.Mutable fields = HttpFields.build(1);
> >           fields.add(HttpHeader.CONTENT_TYPE, contentType);
> >           content.addFilePart(name, contentStream.getName(),
> >               new InputStreamRequestContent(contentStream.getStream()), fields);
> >         }
> >       }
> >       req.body(content);
> >     }
> >   } else { // application/x-www-form-urlencoded
> >     Fields fields = new Fields();
> >     Iterator<String> iter = wparams.getParameterNamesIterator();
> >     while (iter.hasNext()) {
> >       String key = iter.next();
> >       String[] vals = wparams.getParams(key);
> >       if (vals != null) {
> >         for (String val : vals) fields.add(key, val);
> >       }
> >     }
> >     req.body(new FormRequestContent(fields, FALLBACK_CHARSET));
> >   }
> >   return req;
> > }
> >
> The problem is the use of *Fields fields = new Fields();*, where
> caseSensitive=false by default. This leads to case-sensitive Solr params
> being merged together, for example f.case_sensitive_field.facet.limit=5 &
> f.CASE_SENSITIVE_FIELD.facet.limit=99.
>
> Not sure if this is intentional for some reason?
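>
> A minimal sketch of the behaviour (assuming Jetty's
> org.eclipse.jetty.util.Fields API; the demo class itself is
> hypothetical):
>
> import org.eclipse.jetty.util.Fields;
>
> public class FieldsCaseDemo {
>   public static void main(String[] args) {
>     // Default constructor: case-insensitive names, so the two per-field
>     // params below collapse into a single field with merged values.
>     Fields merged = new Fields();
>     merged.add("f.case_sensitive_field.facet.limit", "5");
>     merged.add("f.CASE_SENSITIVE_FIELD.facet.limit", "99");
>     System.out.println(merged.getSize()); // 1
>
>     // Case-sensitive constructor keeps them distinct, which is what
>     // Solr's per-field parameters need.
>     Fields distinct = new Fields(true);
>     distinct.add("f.case_sensitive_field.facet.limit", "5");
>     distinct.add("f.CASE_SENSITIVE_FIELD.facet.limit", "99");
>     System.out.println(distinct.getSize()); // 2
>   }
> }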
>
> Best,
>
> Yue
>
>
>


Re: Getting NPE while doing atomic updates using add-distinct for a multivalued field [Solr 8.11.2]

2024-03-29 Thread Susmit Shukla
Ran across a similar NullPointerException on atomic update. Some pointers:
the input update contained more than one update for the same unique id,
and the atomic update operation was doing both 'remove' and 'add' while
some value was null. Using add-distinct without a null value fixed it;
a hedged sketch follows below.
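
A hedged SolrJ sketch of the safe shape (the field names mirror the input
document quoted below; the null guard is the point):

import java.util.Collections;
import org.apache.solr.common.SolrInputDocument;

public class AddDistinctUpdate {
  public static void main(String[] args) {
    String promotion = "WHOLESALE"; // may arrive as null in real code
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "10001");
    // Skip the add-distinct modifier entirely when the value is null;
    // a null value here is what triggered the NPE in doAddDistinct.
    if (promotion != null) {
      doc.addField("promotionType",
          Collections.singletonMap("add-distinct", promotion));
    }
    doc.addField("lastUpdatedTime",
        Collections.singletonMap("set", System.currentTimeMillis()));
    System.out.println(doc);
  }
}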

On Thu, Mar 28, 2024 at 3:54 AM Akhilesh kumar wrote:

> Hello all,
>
> We’ve been using Solr for our e-commerce platform for many years. We do a
> lot of real-time updates and mostly rely on atomic updates to update the
> documents. We ran into an issue with atomic updates with add-distinct in
> the production environment, where we are getting a NullPointerException.
>
> Input document: [SolrInputDocument(fields: [id=10001,
> promotionType={add-distinct=WHOLESALE},
> lastUpdatedTime={set=1711433462524}])],
>
> update field definition: <field name="promotionType" ...
> docValues="true" multiValued="true" indexed="false" stored="true"/>
>
> Solr version: 8.11.2 running in cloud mode.
>
> We were unable to replicate this locally (a Solr setup in cloud mode
> running a single-sharded collection with only a few documents), while in
> production we have millions of docs in a multi-sharded collection.
>
> Attaching the logs:
> org.apache.solr.client.solrj.impl.CloudSolrClient$RouteException: Error
> from server at {hostname}/solr/collection_shard6_replica_t2737:
> java.lang.NullPointerException at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.doAddDistinct(AtomicUpdateDocumentMerger.java:466)
>  at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeDocHavingSameId(AtomicUpdateDocumentMerger.java:174)
>   at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDocRecursive(AtomicUpdateDocumentMerger.java:115)
> at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.merge(AtomicUpdateDocumentMerger.java:106)
>  at
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:730)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:380)
>   at
> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:343)
>at
> org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)  at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:343)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:229)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
> at
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
>   at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:344)
>  at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:292)
>  at
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
>  at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)  at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:245)
> at
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
>  at
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)  at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
>   at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:131)
>at
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
>   at
> org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
>  at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
>   at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:791)  at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:564) at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
>at
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)   at
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
>   at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
>  at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper