Improper Solr Search results

Raj Krishna Mon, 21 Nov 2022 01:26:06 -0800

Hi solr team,

The Solr search is not returning the expected results.

Here is what I am looking for:

Scenario 1
Let's say I search for "ABC DEF" with the "Contains all of these words"
configuration.
Results I get:
.......ABC........................DEF.........
.......DEF...........ABC.............
.......DEF......................
.......ABC............

Expected Result:
..........ABC DEF.......

In scenario 1, when I open the actual page for one of the partial search
results (say, the 3rd one), I sometimes find the exact match on a different
line, not in the excerpt that is displayed in the result.
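
To make the distinction concrete, here is a minimal Python sketch of the two
matching behaviours described above: "contains all words" (any order, any
distance) versus an exact phrase. It is only an illustration on made-up sample
strings, not how Solr itself evaluates queries; the function names are
hypothetical.

```python
import re

def contains_all_words(text: str, query: str) -> bool:
    """True if every query word occurs somewhere in the text, in any order."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in query.lower().split())

def contains_phrase(text: str, query: str) -> bool:
    """True only if the query words occur adjacent to each other and in order."""
    tokens = re.findall(r"\w+", text.lower())
    terms = query.lower().split()
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

# Made-up documents mirroring the result shapes listed in scenario 1.
docs = [
    "xx ABC xx xx DEF xx",
    "xx DEF xx ABC xx",
    "xx DEF xx",
    "xx ABC DEF xx",
]

all_words = [d for d in docs if contains_all_words(d, "ABC DEF")]
phrase = [d for d in docs if contains_phrase(d, "ABC DEF")]
```

Here all_words keeps three of the four documents, while phrase keeps only the
one where "ABC DEF" appears as a unit, which is the result I expect.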

Scenario 2
Let's say I search for "ABC DEF" with the "Contains all of these words"
configuration.
Results I get:
.......DEF......................
.......ABC............

Expected Result:
..........ABC DEF.......

In scenario 2, I don't get the exact match at all.
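
One way to narrow this down might be to query Solr directly and compare an
all-words query with a quoted phrase query. The Python sketch below only builds
the request parameters. The names defType, q, q.op, qf and pf are standard Solr
query parameters; the field name body_text is a placeholder, and whether the
Search API module sends equivalent parameters is an assumption that would need
to be checked against the actual request.

```python
from urllib.parse import urlencode

def build_params(keywords: str, field: str = "body_text", exact: bool = False) -> dict:
    """Build Solr request parameters for an all-words or exact-phrase search.

    `field` is a hypothetical field name; replace it with the indexed one.
    """
    words = keywords.split()
    # A quoted query string asks for the words as one phrase.
    q = '"' + " ".join(words) + '"' if exact else " ".join(words)
    return {
        "defType": "edismax",  # extended DisMax query parser
        "q": q,
        "q.op": "AND",         # every word must match
        "qf": field,           # field(s) to search
        "pf": field,           # boost documents where the words form a phrase
    }

all_words_query = urlencode(build_params("ABC DEF"))
phrase_query = urlencode(build_params("ABC DEF", exact=True))
```

Appending either string to the core's /select URL gives two requests whose
result lists can be compared side by side.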

Here are the settings I have used.

1. Home
2. Administration
3. Configuration
4. Search and Metadata
5. Search API
6. Solr index
7. Solr index
Index name Machine name: solr_index
Enter the displayed name for the index.
Machine-readable name
A unique machine-readable name. Can only contain lowercase letters, numbers,
and underscores.
Datasources
Comment
Provides Comment entities for indexing and searching.
Contact message
Provides Contact message entities for indexing and searching.
Content
Provides Content entities for indexing and searching.
Content moderation state
Provides Content moderation state entities for indexing and searching.
Custom block
Provides Custom block entities for indexing and searching.
Custom menu link
Provides Custom menu link entities for indexing and searching.
File
Provides File entities for indexing and searching.
Media
Provides Media entities for indexing and searching.
Search task
Provides Search task entities for indexing and searching.
Shortcut link
Provides Shortcut link entities for indexing and searching.
Simplenews subscriber
Provides Simplenews subscriber entities for indexing and searching.
Solr Document
Search through external Solr content. (Only works if this index is attached to
a Solr-based server.)
Solr Multisite Document
Search through a different site's content. (Only works if this index is
attached to a Solr-based server.)
Taxonomy term
Provides Taxonomy term entities for indexing and searching.
URL alias
Provides URL alias entities for indexing and searching.
User
Provides User entities for indexing and searching.
Webform submission
Provides Webform submission entities for indexing and searching.
Workflow scheduled transition
Provides Workflow scheduled transition entities for indexing and searching.
Workflow transition
Provides Workflow transition entities for indexing and searching.
Select one or more datasources of items that will be stored in this index.
CONFIGURE THE CONTENT DATASOURCE
BUNDLES
LANGUAGES
CONFIGURE THE DEFAULT TRACKER
Default index tracker which uses a simple database table for tracking items.
Indexing order
Index items in the same order in which they were saved
Index the most recent items first
The order in which items will be indexed.
Server
- No server -
solr index server
Select the server this index should use. Indexes cannot be enabled without a
connection to a valid, enabled server.
Enabled
Only enabled indexes can be used for indexing and searching. This setting will
only take effect if the selected server is also enabled.
Description
Enter a description for the index.
INDEX OPTIONS
Read only
Do not write to this index or track the status of items in this index.
Index items immediately
Immediately index new or updated items instead of waiting for the next cron
run. This might have serious performance drawbacks and is generally not advised
for larger sites.
Track changes in referenced entities
Automatically queue items for re-indexing if one of the field values indexed
from entities they reference is changed. (For instance, when indexing the name
of a taxonomy term in a Content index, this would lead to re-indexing when the
term's name changes.) Enabling this setting can lead to performance problems on
large sites when saving some types of entities (an often-used taxonomy term in
our example). However, when the setting is disabled, fields from referenced
entities can go stale in the search index and other steps should be taken to
prevent this.
Cron batch size
Set how many items will be indexed at once when indexing items during a cron
run. "0" means that no items will be indexed by cron for this index, "-1" means
that cron should index all items at once.
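
As a reading aid, the batch size rule above can be written as a small Python
function. This is only a sketch of the described semantics, not the module's
code; the function name and the handling of values below -1 are assumptions.

```python
def items_for_cron_run(batch_size: int, pending: int) -> int:
    """Return how many pending items one cron run would index."""
    if batch_size == 0:
        return 0            # "0": cron indexes nothing for this index
    if batch_size == -1:
        return pending      # "-1": cron indexes everything at once
    if batch_size < -1:
        # Not described in the form text; rejected here as an assumption.
        raise ValueError("batch_size must be -1, 0, or a positive number")
    return min(batch_size, pending)
```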
SOLR SPECIFIC INDEX OPTIONS
Finalize index before first search
If enabled, other modules could hook in to apply "finalizations" to the index
after updates or deletions happened to index items.
MULTILINGUAL
Limit to current content language.
Limit all search results for custom queries or search pages not managed by
Views to current content language if no language is specified in the query.
Include language independent content in search results.
This option will include content without a language assigned in the results of
custom queries or search pages not managed by Views. For example, if you search
for English content, but have an article with language of "undefined", you
will see those results as well. If you disable this option, you will only see
content that matches the language.
HIGHLIGHTER
If "Retrieve result data from Solr" and "Highlight retrieved data" are selected
for the Solr backend on the server edit page, these highlighting settings will
be used.
maxAnalyzedChars
Specifies the number of characters into a document that Solr should look for
suitable snippets.
fragmenter
Specifies a text snippet generator for highlighted text. The standard
fragmenter is gap, which creates fixed-sized fragments with gaps for
multi-valued fields. Another option is regex, which tries to create fragments
that resemble a specified regular expression. This parameter accepts per-field
overrides.
REGEX
regex.slop
When using the regex fragmenter, this parameter defines the factor by which the
fragmenter can stray from the ideal fragment size (given by fragsize) to
accommodate a regular expression. For instance, a slop of 0.2 with fragsize=100
should yield fragments between 80 and 120 characters in length. It is usually
good to provide a slightly smaller fragsize value when using the regex
fragmenter.
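
The arithmetic in the slop example can be checked with a short Python sketch.
The function name is hypothetical and the rounding is an assumption made for
illustration.

```python
from typing import Tuple

def fragment_bounds(fragsize: int, slop: float) -> Tuple[int, int]:
    """Return the (min, max) fragment length allowed by a slop factor."""
    low = round(fragsize * (1 - slop))
    high = round(fragsize * (1 + slop))
    return low, high

bounds = fragment_bounds(100, 0.2)  # the example from the form text
```

With fragsize=100 and a slop of 0.2 this gives (80, 120), matching the range
stated above.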
regex.pattern
Specifies the regular expression for fragmenting. This could be used to extract
sentences.
regex.maxAnalyzedChars
Instructs Solr to analyze only this many characters from a field when using the
regex fragmenter (after which, the fragmenter produces fixed-sized fragments).
Applying a complicated regex to a huge field is computationally expensive.
usePhraseHighlighter
If set, Solr will highlight phrase queries (and other advanced
position-sensitive queries) accurately. If false, the parts of the phrase will
be highlighted everywhere instead of only when it forms the given phrase.
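
The difference this option makes can be imitated with a simplified Python
sketch. It is not Solr's highlighter: it wraps a whole phrase match in one pair
of tags, whereas Solr's real markup may differ, and the function name and
default tags are assumptions.

```python
import re

def highlight(text: str, phrase: str, phrase_aware: bool,
              pre: str = "<em>", post: str = "</em>") -> str:
    """Highlight a phrase either as a unit or term by term."""
    terms = [re.escape(t) for t in phrase.split()]
    if phrase_aware:
        # Only highlight where the terms occur together, in order.
        pattern = r"\b" + r"\s+".join(terms) + r"\b"
    else:
        # Highlight every occurrence of any single term.
        pattern = r"\b(?:" + "|".join(terms) + r")\b"
    return re.sub(pattern, lambda m: pre + m.group(0) + post, text,
                  flags=re.IGNORECASE)

sample = "DEF alone, then ABC DEF together"
as_phrase = highlight(sample, "ABC DEF", phrase_aware=True)
as_terms = highlight(sample, "ABC DEF", phrase_aware=False)
```

as_phrase marks only "ABC DEF"; as_terms also marks the stray "DEF" at the
start, which resembles the excerpts I am seeing.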
highlightMultiTerm
If set, Solr will highlight wildcard queries (and other MultiTermQuery
subclasses). If false, they won't be highlighted at all.
preserveMulti
If set, multi-valued fields will return all values in the order they were saved
in the index. If false, only values that match the highlight request will be
returned.
mergeContiguous
Instructs Solr to collapse contiguous fragments into a single fragment. A value
of true indicates contiguous fragments will be collapsed into single fragment.
This parameter accepts per-field overrides. The default value, false, is also
the backward-compatible setting.
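
What "collapsing contiguous fragments" means can be sketched in Python using
character offsets. Representing fragments as (start, end) pairs is an
assumption made for illustration; it is not Solr's internal data structure.

```python
from typing import List, Tuple

def merge_contiguous(fragments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge fragments where one ends exactly where the next begins."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(fragments):
        if merged and merged[-1][1] == start:
            # Extend the previous fragment instead of starting a new one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged

result = merge_contiguous([(0, 100), (100, 200), (300, 400)])
```

The first two fragments touch at offset 100 and become one; the third stays
separate.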
requireFieldMatch
If set, highlights terms only if they appear in the specified field. If not
set, terms are highlighted in all requested fields regardless of which field
matched the query.
snippets
Specifies maximum number of highlighted snippets to generate per field. It is
possible for any number of snippets from zero to this value to be generated.
This parameter accepts per-field overrides.
fragsize
Specifies the size, in characters, of fragments to consider for highlighting. 0
indicates that no fragmenting should be considered and the whole field value
should be used. This parameter accepts per-field overrides.
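
For testing against Solr directly, the highlighter settings above correspond to
the standard hl.* request parameters. The Python sketch below assembles them
into a query string. The parameter names are Solr's; the values and the field
name body_text are placeholders, not the values configured on my site.

```python
from urllib.parse import urlencode

def highlight_params(field: str, snippets: int = 3, fragsize: int = 200) -> dict:
    """Build Solr highlighting parameters for one field (placeholder values)."""
    return {
        "hl": "true",
        "hl.fl": field,                     # field(s) to highlight
        "hl.snippets": snippets,            # max snippets per field
        "hl.fragsize": fragsize,            # fragment size in characters
        "hl.usePhraseHighlighter": "true",  # highlight phrases as phrases
        "hl.requireFieldMatch": "true",     # only highlight the matching field
        "hl.mergeContiguous": "false",
        "hl.maxAnalyzedChars": 51200,
    }

query_string = urlencode(highlight_params("body_text"))
```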
MLT (MORELIKETHIS)
TERM MODIFIERS
ADVANCED

Manage processors for search index Solr index

Configure processors which will pre- and post-process data at index and search
time. Find more information on the processors documentation page.
ENABLED
Boost more recent dates
Boost more recent documents and penalize older documents.
Content access
Adds content access checks for nodes and comments.
Double Quote Workaround
Replaces double quotes in field values and query to work around a bug in Solr
streaming expressions.
Entity status
Exclude inactive users and unpublished entities (which have a "Published"
state) from being indexed.
Highlight
Adds a highlighted excerpt to results and highlights returned fields.
HTML filter
Strips HTML tags from fulltext fields and decodes HTML entities. Use this
processor when indexing HTML data - for example, node bodies for certain text
formats. The processor also allows to boost (or ignore) the contents of
specific elements.
Ignore case
Makes searches case-insensitive on selected fields.
It is recommended not to use this processor with the selected server.
Ignore characters
Configure types of characters which should be ignored for searches.
Index hierarchy
Allows the indexing of values along with all their ancestors for hierarchical
fields (like taxonomy term references)
Number field-based boosting
Adds a boost to indexed items based on the value of a numeric field.
Regular expression based replacements
Regular expression based replacements.
Reverse entity references
Allows indexing of entities that link to the indexed entity.
Role-based access
Adds an access check based on a user's roles. This may be sufficient for sites
where access is primarily granted or denied based on roles and permissions. For
grants-based access checks on "Content" or "Comment" entities the "Content
access" processor may be a suitable alternative.
Solr dummy fields
Adds dummy fields to all datasources to register pseudo field names that get
their values via API, for example hook_search_api_solr_documents_alter().
Stemmer
Stems search terms (for example, talking to talk). Currently, this only acts on
English language content. It uses the Porter 2 stemmer algorithm (More
information). For best results, use after tokenizing.
It is recommended not to use this processor with the selected server.
Stopwords
Allows you to define stopwords which will be ignored in searches. Caution: Only
use after both 'Ignore case' and 'Tokenizer' have run.
It is recommended not to use this processor with the selected server.
Tokenizer
Splits text into individual words for searching.
It is recommended not to use this processor with the selected server.
Transliteration
Makes searches insensitive to accents and other non-ASCII characters.
It is recommended not to use this processor with the selected server.
Type-specific boosting
Adds a boost to indexed items based on their datasource and/or bundle.
PROCESSOR ORDER
PREPROCESS INDEX
HTML filter

PREPROCESS QUERY
HTML filter
Content access
Boost more recent dates

POSTPROCESS QUERY
Highlight

Processor settings
* Boost more recent dates: Enabled
* Highlight: Enabled (active tab)
* HTML filter: Enabled
Highlight returned field data
Select whether returned fields should be highlighted.
Highlight partial matches
When enabled, matches in parts of words will be highlighted as well.
Create excerpt
When enabled, an excerpt will be created for searches with keywords, containing
all occurrences of keywords in a fulltext field.
Create excerpt even if no search keys are available
When enabled, an excerpt will be created even with an empty query string.
Excerpt length
The requested length of the excerpt, in characters
Exclude fields from excerpt
Body (body)
Title (title)
Exclude certain fulltext fields from being included in the excerpt.
Highlighting prefix
Text/HTML that will be prepended to all occurrences of search keywords in
highlighted text
Highlighting suffix
Text/HTML that will be appended to all occurrences of search keywords in
highlighted text
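
The effect of the prefix, suffix and "Highlight partial matches" settings can
be sketched in Python as follows. This is not the processor's implementation;
the function name and the default tags are assumptions.

```python
import re
from typing import List

def wrap_keywords(text: str, keywords: List[str],
                  prefix: str = "<strong>", suffix: str = "</strong>",
                  partial: bool = False) -> str:
    """Wrap each keyword occurrence in the highlighting prefix and suffix."""
    if not keywords:
        return text
    alternatives = "|".join(re.escape(k) for k in keywords)
    # Without partial matching, a keyword must be a whole word.
    pattern = f"(?:{alternatives})" if partial else rf"\b(?:{alternatives})\b"
    return re.sub(pattern, lambda m: prefix + m.group(0) + suffix, text,
                  flags=re.IGNORECASE)

whole = wrap_keywords("ABCD and ABC", ["ABC"])
parts = wrap_keywords("ABCD and ABC", ["ABC"], partial=True)
```

With partial matching off, only the standalone "ABC" is wrapped; with it on,
the "ABC" inside "ABCD" is wrapped as well.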

Please triage this issue.
Feel free to ask me for clarification or further details.

Thanks
Raj
