Re: Solr Search not working

2023-11-29 Thread Markus Jelsma
Hi - this is a task for Solr's spellchecker component [1]. Even when the
misspelled "dialblo" gives results (because the content itself contains the
misspelling), the spellchecker component can still check the input and come
up with a suitable suggestion.
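
As a rough illustration, a request that asks the spellcheck component for suggestions could be built like this. It is only a sketch: the host and port are hypothetical, the collection name is taken from the log lines in this thread, and it assumes a spellcheck component is already configured on the /select handler.

```python
from urllib.parse import urlencode

# Hypothetical host/port; "drupalcollection" is the core name seen in the logs.
SOLR_SELECT = "http://localhost:8983/solr/drupalcollection/select"


def spellcheck_url(user_input: str) -> str:
    """Build a /select URL that asks the spellcheck component for suggestions."""
    params = {
        "q": user_input,
        "spellcheck": "true",          # enable the component for this request
        "spellcheck.q": user_input,    # text to spellcheck, independent of q
        "spellcheck.count": "5",       # maximum number of suggestions per term
        "spellcheck.collate": "true",  # also ask for a rewritten query
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = spellcheck_url("dialblo")
print(url)
```

Whether a suggestion such as "diablo" comes back depends on the dictionary the component is built from, so treat this as a starting point only.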

Regards,
Markus

[1] https://solr.apache.org/guide/8_11/spell-checking.html

On Wed, 29 Nov 2023 at 12:48, Raj Krishna wrote:

> Hi Team,
>
> Search is not returning the results I expect. Please see the example below.
>
> When I search for “diablo”, it returns results. The log shows:
>
> 2023-11-29 11:07:21.839 INFO  (qtp1311844206-227) [   x:drupalcollection]
> o.a.s.c.S.Request [drupalcollection]  webapp=/solr path=/select
> params={f.ss_name_1.facet.limit=-1&facet.field={!key%3Dss_name+ex%3Dfacet:name}ss_name&facet.field={!key%3Dss_name_2+ex%3Dfacet:name_2}ss_name_2&facet.field={!key%3Dss_name_1+ex%3Dfacet:name_1}ss_name_1&json.nl=flat&f.ss_name.facet.missing=false&TZ=America/Toronto&fl=ss_search_api_id,ss_search_api_language,score,hash&f.ss_name_2.facet.limit=-1&start=0&f.ss_name_1.facet.missing=false&facet.missing=false&sort=sort_X3b_en_field_version_number+desc,ds_created+desc,score+desc,its_field_product+asc,its_field_release+asc&fq=(%2Bbs_status:"true"+%2Bss_type:"book")&fq=%2Bindex_id:solr_index&fq=ss_search_api_language:("en"+"und")&rows=12&f.ss_name.facet.limit=-1&f.ss_name_2.facet.missing=false&q={!boost+b%3Dboost_document}++{!payload_score+f%3Dboost_term+v%3D"diablo"+func%3Dmax}+(tm_X3b_en_body:("diablo")^1+tm_X3b_und_body:("*diablo*")^1+tm_X3b_en_title:("diablo")^8+tm_X3b_und_title:("*diablo*")^8)&facet.limit=10&omitHeader=true&facet.mincount=1&wt=json&facet=true&facet.sort=count}
> hits=861 status=0 QTime=2
>
> When I search for “dialblo”, it returns no results. The log shows:
>
> 2023-11-29 10:54:20.899 INFO  (qtp1311844206-221) [   x:drupalcollection] 
> o.a.s.c.S.Request [drupalcollection]  webapp=/solr path=/select 
> params={f.ss_name_1.facet.limit=-1&facet.field={!key%3Dss_name+ex%3Dfacet:name}ss_name&facet.field={!key%3Dss_name_2+ex%3Dfacet:name_2}ss_name_2&facet.field={!key%3Dss_name_1+ex%3Dfacet:name_1}ss_name_1&json.nl=flat&f.ss_name.facet.missing=false&TZ=America/Toronto&fl=ss_search_api_id,ss_search_api_language,score,hash&f.ss_name_2.facet.limit=-1&start=0&f.ss_name_1.facet.missing=false&facet.missing=false&sort=sort_X3b_en_field_version_number+desc,ds_created+desc,score+desc,its_field_product+asc,its_field_release+asc&fq=(%2Bbs_status:"true"+%2Bss_type:"book")&fq=%2Bindex_id:solr_index&fq=ss_search_api_language:("en"+"und")&rows=12&f.ss_name.facet.limit=-1&f.ss_name_2.facet.missing=false&q={!boost+b%3Dboost_document}++{!payload_score+f%3Dboost_term+v%3D"dialblo"+func%3Dmax}+(tm_X3b_en_body:("dialblo")^1+tm_X3b_und_body:("dialblo")^1+tm_X3b_en_title:("dialblo")^8+tm_X3b_und_title:("dialblo")^8)&facet.limit=10&omitHeader=true&facet.mincount=1&wt=json&facet=true&facet.sort=count}
>  hits=0 status=0 QTime=0
>
> Both words are present in the same content.
>
> Can you suggest why this is happening, how to debug it, and how to fix it?
>
> Note that everything is indexed.
>
> I have also attached a config.zip file.
>
> Thanks
> Raj


Re: Solr Search not working

2023-11-29 Thread Mikhail Khludnev
Hello Raj.
I think there should be a hit, since "dialblo" seems to occur in the
original content. Unfortunately, there are plenty of reasons why a document
might not match, and the devil is in the details.
Perhaps the most straightforward way is to request debugQuery=true for
"diablo", pick a matching document id, and then pass it as
explainOther=id:123 on the "dialblo" query to find out why it doesn't match.
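
The two requests could look roughly like the following. This is a sketch under assumptions: the host and port are hypothetical, the query is simplified to a single title field from the logs, and "id" as the unique key field (and 123 as a document id) simply follows the example above.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical host/port; "drupalcollection" is the core name seen in the logs.
SOLR_SELECT = "http://localhost:8983/solr/drupalcollection/select"


def debug_url(query: str, explain_other: Optional[str] = None) -> str:
    """Build a /select URL with debugQuery on, optionally explaining another doc."""
    params = {
        "q": query,
        "fl": "id,score",      # assumption: the unique key field is called "id"
        "rows": "5",
        "debugQuery": "true",  # adds parsed query and score explanations
        "wt": "json",
    }
    if explain_other is not None:
        # Explain how the given document scores (or fails to match) for this query.
        params["explainOther"] = explain_other
    return SOLR_SELECT + "?" + urlencode(params)


# Step 1: run the working query and note the id of a matching document.
step1 = debug_url('tm_X3b_en_title:"diablo"')
# Step 2: run the failing query and ask Solr to explain that document against it.
step2 = debug_url('tm_X3b_en_title:"dialblo"', explain_other="id:123")
print(step1)
print(step2)
```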



-- 
Sincerely yours
Mikhail Khludnev


Prevent Loss of Documents after Implicit Sharding

2023-11-29 Thread Saksham Gupta
Hi Solr Developers,

Problem Statement

We have been using SolrCloud with implicit sharding. The collection's data
was divided into 8 shards. To reduce response time, we planned to split the
data further, into 56 shards. Under the new sharding strategy, one of the
values of a multivalued field decides which shard a document goes to.

But this has led to loss of documents.

How is the loss happening? Here is an example:

Consider 3 Solr documents:

Doc1
{
FieldA: id21, id29, id60P;
Field2: val2;
}

Doc2
{
FieldA: id19, id9, id8P;
Field2: val1;
}

Doc3
{
FieldA: id101, id29, id108P;
Field2: val4;
}

While querying Solr:

Let’s consider the query fq=FieldA: id21+id8+id108;

Under the previous sharding, Doc1, Doc2, and Doc3 are all returned,
because the filter query matches at least one value present in each
document, i.e. id21 in Doc1, id8 in Doc2 and id108 in Doc3.

Under the new sharding, only Doc2 and Doc3 are returned. Doc1 is not
included because the query is routed only to the shards corresponding to
the values in the filter query, i.e. shard21, shard8 and shard108, while
Doc1 is stored on shard60.

[Diagrams omitted: INDEXING; QUERYING ON THIS COLLECTION]

The query never reaches the shard that contains Doc1, so Doc1 is not
returned in the results.
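
The behaviour described above can be reproduced with a small in-memory simulation. This is only a sketch and makes two assumptions that are not stated explicitly in the message: the FieldA value with the "P" suffix is the one that decides the shard, and shard names are "shard" plus the numeric part of the value. No Solr calls are involved.

```python
# The three example documents and their FieldA values.
docs = {
    "Doc1": ["id21", "id29", "id60P"],
    "Doc2": ["id19", "id9", "id8P"],
    "Doc3": ["id101", "id29", "id108P"],
}


def value_id(value):
    """Strip the assumed routing marker: 'id60P' -> 'id60'."""
    return value[:-1] if value.endswith("P") else value


def shard_of(value):
    """Map a field value to its shard name: 'id60P' -> 'shard60'."""
    return "shard" + value_id(value)[2:]


def home_shard(values):
    """Assumption: the value marked with 'P' decides where the document lives."""
    return shard_of(next(v for v in values if v.endswith("P")))


# Build the index: shard name -> names of the documents stored there.
index = {}
for name, values in docs.items():
    index.setdefault(home_shard(values), []).append(name)


def search(filter_ids, routed):
    """Return matching docs, visiting either all shards or only the routed ones."""
    targets = {shard_of(v) for v in filter_ids} if routed else set(index)
    hits = []
    for shard in targets:
        for name in index.get(shard, []):
            if any(value_id(v) in filter_ids for v in docs[name]):
                hits.append(name)
    return sorted(hits)


query = ["id21", "id8", "id108"]
all_shards = search(query, routed=False)   # every shard is searched
routed_only = search(query, routed=True)   # only shard21, shard8, shard108
print(all_shards, routed_only)
```

In this model Doc1 lives on shard60, which is not among the routed shards, so it drops out of the routed result even though it contains id21.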

Possible Solutions

To deal with this, we could index the same document on multiple shards, one
for each value of the field. But handling indexing and deletion when the
values of this field change would be very complicated, so such an index
could be very complex to maintain.
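
To give an idea of the bookkeeping involved, here is a minimal sketch of what an update would have to work out under the multi-shard approach. The function names and the "shard" plus number naming are hypothetical, and no Solr calls are made; it only computes which shards need the new copy and which hold a stale one.

```python
def shard_of(value):
    """Map a field value to a shard name: 'id60P' -> 'shard60' (assumed naming)."""
    core = value[:-1] if value.endswith("P") else value
    return "shard" + core[2:]


def target_shards(values):
    """Every shard that must hold a copy of a document with these values."""
    return {shard_of(v) for v in values}


def plan_update(old_values, new_values):
    """Work out where to write the new version and where to delete the old one."""
    old, new = target_shards(old_values), target_shards(new_values)
    return {
        "index_on": sorted(new),           # re-index the new version everywhere it belongs
        "delete_from": sorted(old - new),  # remove stale copies from shards no longer used
    }


# Doc1 loses id29 and gains a hypothetical new value id77.
plan = plan_update(["id21", "id29", "id60P"], ["id21", "id60P", "id77"])
print(plan)
```

Note that the old values have to be known at update time to compute the deletions, and a query that fans out to several shards holding the same document would return it more than once unless duplicates are removed.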

Is this the best approach, or is there a better way to achieve the goal
and avoid losing any documents?


Re: Help Solr Newsletter October 2023 with links, blogs, articles

2023-11-29 Thread David Smiley
Alejandro,

How's the newsletter project going?  I see October is still in DRAFT state
and I see a new one for November.

Perhaps you were discouraged by the Twitter account status matter; I
recommend completely ignoring that!  When it's time to publish, we'll
publish.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Tue, Oct 31, 2023 at 8:23 AM Arrieta, Alejandro <
aarri...@perrinsoftware.com> wrote:

> Hello Team :-)
>
> Thanks for your articles.
> I will work on the newsletter later today adding the last few days of news
> and will ping here if someone wants to do the last-minute check.
>
> Kind Regards,
> Alejandro Arrieta
>
> On Tue, Oct 31, 2023 at 7:35 AM Lisa Biella  wrote:
>
> > Hi Alejandro,
> >
> > I will follow up on the other email to add some cool stuff we are going to
> > do in November!
> >
> >
> > *Meetups or conferences that will take place in November:*
> > We run the London Information Retrieval Meetup, a free event that this
> > time will take place in London as a satellite event to the annual Search
> > Solution conference. You can read more here:
> > https://www.meetup.com/london-information-retrieval-meetup-group/events/297065775
> > This one is scheduled for the 20th of November.
> > --- If you want to take a look at past talks we had, you can check our
> > YouTube channel here:
> > https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ
> > One of the most recent ones is about our contribution to Lucene (Word2Vec
> > Model To Generate Synonyms on the Fly in Apache Lucene):
> > https://www.youtube.com/watch?v=CeLTxnXq1CY&t=77s
> >
> > *Blog posts*
> > We publish a lot about Apache Solr/Lucene and search in general. You can
> > find our blog here: https://sease.io/our-blog
> > If you are looking for Solr-related articles, this is the page:
> > https://sease.io/?s=solr
> > Among relevant and recent posts, I would go with these:
> > - https://sease.io/2023/10/apache-lucene-solr-ai-roadmap-do-you-want-to-make-it-happen.html
> >   (Apache Lucene/Solr AI Roadmap – Do You Want to Make It Happen?)
> > - https://sease.io/2023/10/apache-lucene-solr-the-top-10-pain-points-community-over-code-2023-edition.html
> >   (Apache Lucene/Solr: the Top 10 Pain Points – Community Over Code 2023
> >   Edition)
> > - https://sease.io/2023/02/benchmark-apache-solr-performance-with-apache-jmeter.html
> >   (Benchmark Apache Solr Performance with Apache JMeter)
> > - https://sease.io/2023/01/apache-solr-neural-search-tutorial.html
> >   (Apache Solr Neural Search Tutorial)
> > - https://sease.io/2022/12/impact-of-large-stored-fields-on-apache-solr-query-performance.html
> >   (Impact of Large Stored fields on Apache Solr Query Performance)
> >
> > *Tools compatible with Solr*
> > You may already know this, but we publish two tools for search quality
> > evaluation that are fully compatible with Solr.
> > I'm talking about:
> > Rated Ranking Evaluator
> > (https://github.com/SeaseLtd/rated-ranking-evaluator)
> > Rated Ranking Evaluator Enterprise
> > (https://sease.io/rated-ranking-evaluator-enterprise)
> >
> > Plus, we just announced the new Apache Solr Neural Highlighting Plugin, a
> > plugin that empowers your search engine to identify the most relevant
> > paragraphs for a query, right within the search results.
> > (https://sease.io/apache-solr-neural-highlighting-plugin)
> >
> > *Other*
> > I think Alessandro has already mentioned our efforts to improve the search
> > community by launching a new Information Retrieval Forum!
> > The forum is pretty much ready, and here is the link (we'll start promoting
> > it these days!): https://ir-relevant.net/
> > Anyone who would like to get involved in this project is very welcome!
> >
> >
> > Hope this has been of help to you with the newsletter!
> >
> >
> > *Lisa Biella*
> > Digital Marketing Manager
> > e-mail: l.bie...@sease.io
> >
> > *Sease* - Information Retrieval Applied
> > Consulting | Training | Open Source
> >
> > Website: Sease.io
> > LinkedIn | Twitter | Youtube | Github
> >
> > On Thu, Oct 26, 2023 at 9:03 PM Alessandro Benedetti
> > <a.benede...@sease.io> wrote:
> >
> > > Plenty of cool stuff!
> > > I'll have our digital marketing manager @Lisa Biella
> > > <l.bie...@sease.io>, who reads in copy, to add our part as well!
> > >
> > > Cheers
> > > --
> > > *Alessandro Benedetti*
> > > Director @ 

Re: Prevent Loss of Documents after Implicit Sharding

2023-11-29 Thread Saksham Gupta
Hi All,
Pinging again for some assistance.
