Improper Solr Search results

2022-11-21 Thread Raj Krishna
Hi solr team,

The Solr search is not showing the proper results.

Here is what I am looking for:

Scenario 1
Let's say I searched for "ABC DEF" with the "Contains all of these words"
configuration.
Results I get:
...ABCDEF.
...DEF...ABC.
...DEF..
...ABC

Expected Result:
..ABC DEF...

In Scenario 1, in some cases when I go to the actual page of a partial search 
result (let's say the 3rd one), I find the exact match on a different line, 
not in the excerpt displayed in the result.

Scenario 2
Let's say I searched for "ABC DEF" with the "Contains all of these words"
configuration.
Results I get:
...DEF..
...ABC

Expected Result:
..ABC DEF...

In Scenario 2, I don't get the exact match at all.
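
At the Solr level, an "all of these words" search is normally sent as an
all-terms (AND) query rather than a phrase query, so documents containing
ABC and DEF anywhere can match. A rough sketch of the two request forms,
assuming the edismax query parser (parameter values are illustrative only):

  # All of these words: both terms must occur, anywhere and in any order
  q=ABC DEF&defType=edismax&q.op=AND

  # Exact phrase: the terms must occur next to each other, in order
  q="ABC DEF"&defType=edismax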




Here are the settings I have used.



Home » Administration » Configuration » Search and Metadata » Search API » Solr index » Solr index
Index name
Enter the displayed name for the index.
Machine-readable name: solr_index
A unique machine-readable name. Can only contain lowercase letters, numbers, 
and underscores.
Datasources
 Comment
Provides Comment entities for indexing and searching.
 Contact message
Provides Contact message entities for indexing and searching.
 Content
Provides Content entities for indexing and searching.
 Content moderation state
Provides Content moderation state entities for indexing and searching.
 Custom block
Provides Custom block entities for indexing and searching.
 Custom menu link
Provides Custom menu link entities for indexing and searching.
 File
Provides File entities for indexing and searching.
 Media
Provides Media entities for indexing and searching.
 Search task
Provides Search task entities for indexing and searching.
 Shortcut link
Provides Shortcut link entities for indexing and searching.
 Simplenews subscriber
Provides Simplenews subscriber entities for indexing and searching.
 Solr Document
Search through external Solr content. (Only works if this index is attached to 
a Solr-based server.)
 Solr Multisite Document
Search through a different site's content. (Only works if this index is 
attached to a Solr-based server.)
 Taxonomy term
Provides Taxonomy term entities for indexing and searching.
 URL alias
Provides URL alias entities for indexing and searching.
 User
Provides User entities for indexing and searching.
 Webform submission
Provides Webform submission entities for indexing and searching.
 Workflow scheduled transition
Provides Workflow scheduled transition entities for indexing and searching.
 Workflow transition
Provides Workflow transition entities for indexing and searching.
Select one or more datasources of items that will be stored in this index.
CONFIGURE THE CONTENT DATASOURCE
BUNDLES | LANGUAGES
CONFIGURE THE DEFAULT TRACKER
Default index tracker which uses a simple database table for tracking items.
Indexing order
 Index items in the same order in which they were saved
 Index the most recent items first
The order in which items will be indexed.
Server
 - No server -
 solr index server
Select the server this index should use. Indexes cannot be enabled without a 
connection to a valid, enabled server.
 Enabled
Only enabled indexes can be used for indexing and searching. This setting will 
only take effect if the selected server is also enabled.
Description
Enter a description for the index.
INDEX OPTIONS
 Read only
Do not write to this index or track the status of items in this index.
 Index items immediately
Immediately index new or updated items instead of waiting for the next cron 
run. This might have serious performance drawbacks and is generally not advised 
for larger sites.
 Track changes in referenced entities
Automatically queue items for re-indexing if one of the field values indexed 
from entities they reference is changed. (For instance, when indexing the name 
of a taxonomy term in a Content index, this would lead to re-indexing when the 
term's name changes.) Enabling this setting can lead to performance problems on 
large sites when saving some types of entities (an often-used taxonomy term in 
our example). However, when the setting is disabled, fields from referenced 
entities can go stale in the search index and other steps should be taken to 
prevent this.
Cron batch size
Set how many items will be indexed at once when indexing items during a cron 
run. "0" means that no items will be indexed by cron for this index, "-1" means 
that cron should index all items at once.
SOLR SPECIFIC INDEX OPTIONS
 Finalize index before first search
If enabled, other modules could hook in to apply "finalizations" to the index 
after updates or deletions happened to index items.
MULTILINGUAL
 Limit to current content language.
Limit all search results for custom queries or search pages not managed by 
Views to current content language if no langu

Re: Commit Process

2022-11-21 Thread Maulin Rathod
Thanks, Alessandro.
I will send a mail to this group when the pull request is ready so that
committers can review it.
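
The steps in the quoted message below roughly translate to the following
commands; the JIRA key, GitHub user, and branch name are placeholders:

  # 1-2) Fork apache/solr on GitHub, then create a feature branch from main
  git clone https://github.com/<your-github-user>/solr.git
  cd solr
  git checkout -b SOLR-XXXXX-page-streaming main

  # 3) Implement the change and commit it
  git commit -am "SOLR-XXXXX: Add PageStreaming decorator"

  # 4) Run the full build checks before raising the PR
  ./gradlew check

  # 5) Push the branch and open a pull request against apache/solr main
  git push origin SOLR-XXXXX-page-streaming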




On Sun, Nov 20, 2022 at 11:07 PM Alessandro Benedetti 
wrote:

> After point 5 you need to draw the attention of one or more available
> committers who will need to review the pull request.
> If they agree the contribution is valid and in an acceptable form, the code
> will be merged.
>
> This will require some time, especially for the first contributions.
> The more familiar you become with the process, the higher the quality of
> your code, and the happier committers are with your contributions, the
> faster the end-to-end process will be.
>
> After some time of valuable contributions, support, and work in the
> community, the Apache Solr PMC will invite you to become a committer :)
>
>
>
>
> On Sun, 20 Nov 2022, 14:31 Maulin Rathod,  wrote:
>
> > Hi,
> >
> > For our requirement, we have implemented a PageStreaming decorator to allow
> > results to be displayed with pagination. We want to contribute it to the
> > Solr main branch.
> >
> > Do I need any permission for committing?
> >
> > I understand we need to perform the following steps to commit it in
> > Solr:
> >
> > 1) Create a Solr JIRA ticket.
> > 2) Fork the repository and create a new feature branch from main.
> > 3) Implement the code.
> > 4) Run "gradlew check" to ensure all tests are working fine.
> > 5) Raise a PR for this.
> >
> > Please let me know if there is anything else I need to take care of.
> >
> > Regards,
> >
> > Maulin
> >
>


-- 
Regards,

Maulin Rathod
Development Director
Asite Solution Pvt Ltd.

M: 9723286945
E : mnrat...@gmail.com
W: www.asite.com


Apache Solr is vulnerable to CVE-2022-39135 via /sql handler

2022-11-21 Thread David Smiley
Vendor:

  The Apache Software Foundation


Versions Affected:

  Solr 6.5 to 8.11.2

  Solr 9.0


Description:

  Apache Calcite has a vulnerability, CVE-2022-39135, that is exploitable
in Apache Solr in SolrCloud mode.  If an untrusted user can supply SQL
queries to Solr’s “/sql” handler (even indirectly via proxies / other
apps), then the user could perform an XML External Entity (XXE) attack.  This
handler might have been exposed by some deployers of Solr so that internal
analysts could use JDBC-based tooling, but it is unlikely to have been
granted to wider audiences.


Impact:

  An XXE attack may lead to the disclosure of confidential data, denial of
service, server side request forgery (SSRF), port scanning from the Solr
node, and other system impacts.


Mitigation:

  Most Solr installations don’t make use of the SQL functionality.  For
such users, the standard Solr security advice of using a firewall should be
adequate.  Nonetheless, the functionality can be disabled.  As of Solr 9,
it has been modularized and thus became opt-in, so nothing is needed for
Solr 9 users that don’t use it.  Users *not* using SolrCloud can’t use the
functionality at all.  For other users that wish to disable it, you must
register a request handler that masks the underlying functionality in
solrconfig.xml like so:
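
A sketch of such an override; the handler class name here is an assumption,
so verify that solr.NotFoundRequestHandler is available in your Solr version:

  <!-- Masks the built-in /sql handler so requests to it no longer reach
       the SQL/Calcite code path (class name is an assumption). -->
  <requestHandler name="/sql" class="solr.NotFoundRequestHandler"/>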

  


  Users needing this SQL functionality are forced to upgrade to Solr 9.1.
If Solr 8.11.3 is released, then it will be an option as well.  Simply
replacing Calcite and other JAR files may mostly work but could fail
depending on the particulars of the query.  Users interested in this or in
patching their own versions of Solr should examine SOLR-16421 for a source
patch.


Credit:

  Andreas Hubold at CoreMedia GmbH


References:

https://nvd.nist.gov/vuln/detail/CVE-2022-39135

https://issues.apache.org/jira/browse/SOLR-16421


Re: accessing an array of objects in a JSON payload.

2022-11-21 Thread Matthew Castrigno
So, one answer here is to use the split parameter. This creates multiple 
documents, which I believe is the intention.

However, this reveals a number of really strange behaviors when used in 
combination with a script in the updateRequestProcessorChain.

It appears that processAdd(cmd) is run for each document created, but the 
other functions are not.

The problem arises when trying to collect the fields to add across all of 
these documents. If I save them at the scope of the overall script they are 
kept, but if I attempt to add them in another of the required functions, 
which is only run once (oddly enough), the addField method cannot be 
accessed; it throws an error treating it as a property. If I add the fields 
in the processAdd() function, they get added n! times because the dictionary 
gets bigger each time. If I attempt to add just one field each time 
processAdd() runs, only the last field actually gets added.

I am getting the result that I wanted, but I would really like to know how to 
make this happen without adding the fields n! times!

In the end, I have one document added that has my new fields found and 
processed in the script, but indexing performance seems very poor due to the 
reasons above.

Is it the intention that the addField() method is only accessible in the 
processAdd() function?
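
For reference, a minimal sketch of the update script structure being
described, assuming the JavaScript engine; the added field name is made up
for illustration:

  // All of the functions below must be defined for the scripting update
  // processor; in practice only processAdd() sees each split document.
  function processAdd(cmd) {
    var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
    // Add derived fields here, once per document, rather than collecting
    // them in script-global state that accumulates across calls.
    doc.addField("processed_by_script_b", true);  // hypothetical field name
  }
  function processDelete(cmd) { }
  function processMergeIndexes(cmd) { }
  function processCommit(cmd) { }
  function processRollback(cmd) { }
  function finish() { }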

From: Matthew Castrigno 
Sent: Thursday, November 17, 2022 4:40 PM
To: users@solr.apache.org 
Subject: accessing an array of objects in a JSON payload.


I am attempting to write a script for the ScriptingUpdateProcessor. In the JSON 
below I want to pull out the objects in the "Fields" array and create fields 
for them based on the values provided in the object.
However, if I attempt to iterate over the fields in the solrDoc like this:
var doc = cmd.solrDoc;
var field_names = doc.getFieldNames().toArray();
for (var i = 0; i < field_names.length; i++) { ...

I will find the first item in the array but none of the subsequent ones.

How can I access this array so that my script can process it?

I attempted to use rsp.getJSON(), but this returns null.

Thank you for any insight.



{
  "doc_id": "45",
  "content": {
    "Page": {
      "Id": "2ff99d1a-a21b-4391-9c47-af2865acb753",
      "Name": "Ronald McDonald House Idaho meals",
      "Url": "/blogs/st-lukes/news-and-community/2021/jan/ronald-mcdonald-house-idaho-meals",
      "ContentType": "Blog",
      "Body": {
        "Fields": [
          {"Name": "Heading Background Image", "Type": "Image", "Value": ""},
          {"Name": "Blog Post Name", "Type": "Single-Line Text", "Value": "Ronald McDonald House, St. Luke’s Children’s find new ways to help families"}
        ],
        "Facets": ["Blogs", "Article"],
        "Title": "Ronald McDonald House, St. Luke’s Children’s find new ways to help families",
        "Summary": ""
      }
    }
  }
}
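
For what it's worth, a sketch of reading every value of a multi-valued field
inside processAdd(): getFieldValue() returns only the first value, while
getFieldValues() returns all of them. The field name below is an assumption
about how the split parameter flattens this payload:

  function processAdd(cmd) {
    var doc = cmd.solrDoc;
    // getFieldValues() returns a java.util.Collection with every value of
    // the field, or null if the field is absent. Field name is hypothetical.
    var values = doc.getFieldValues("content.Page.Body.Fields.Name");
    if (values != null) {
      var it = values.iterator();
      while (it.hasNext()) {
        logger.info("Fields.Name value: " + it.next());
      }
    }
  }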


  

--
"This message is intended for the use of the person or entity to which it is 
addressed and may contain information that is confidential or privileged, the 
disclosure of which is governed by applicable law. If the reader of this 
message is not the intended recipient, you are hereby notified that any 
dissemination, distribution, or copying of this information is strictly 
prohibited. If you have received this message by error, please notify us 
immediately and destroy the related message."




Solr is restarting automatically

2022-11-21 Thread gnandre
Hi,

I am using Solr 8.5.2 in legacy mode (non-cloud).

Some of the Solr nodes are automatically getting restarted after a few
days. There is no clear pattern to the restart times, no pattern in the
number or nature of incoming queries, and no particular pattern in the
errors found in the Solr logs.

I am going to turn on debug logging to see what is happening in Solr when
it goes down. I have not been able to reproduce the issue in one of our
non-prod performance testing environments, where I am recreating the same
traffic as production using the access logs.

Any other ideas about how I should go about debugging or reproducing this
issue? TIA.


Re: Solr is restarting automatically

2022-11-21 Thread Shawn Heisey

On 11/21/22 15:01, gnandre wrote:

I am using Solr 8.5.2 in legacy mode (non-cloud).

Some of the Solr nodes are automatically getting restarted after a few
days. There is no clear pattern to the restart times, no pattern in the
number or nature of incoming queries, and no particular pattern in the
errors found in the Solr logs.

I am going to turn on debug logging to see what is happening in Solr when
it goes down. I have not been able to reproduce the issue in one of our
non-prod performance testing environments, where I am recreating the same
traffic as production using the access logs.

Any other ideas about how I should go about debugging or reproducing this
issue? TIA.


As shipped, if Solr dies, it will NOT restart automatically.  So that 
must have been something you added.


What OS do you have it running on?

If everything is sized correctly, Solr will never crash.  Java programs 
are VERY stable if written correctly and run with plenty of system 
resources.


On non-Windows systems, Solr starts with an option that will cause it to 
commit suicide if Java's OutOfMemoryError exception is thrown.  There 
are several resource depletions that can cause OOME, and some of them 
are NOT related to memory.  This capability will not exist on Windows 
until Solr 9.2.0 is released.


https://issues.apache.org/jira/browse/SOLR-8803

Most operating systems have a process that is called an "out of memory 
killer" ... if available memory gets too low, this will find a program 
on the system that is using a lot of memory and terminate it.  On most 
installs, the process using the most memory will be Solr.
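
If the OOM killer is suspected, its activity shows up in the kernel log; on
a typical Linux host a quick check (assuming root or sudo access) looks
something like this:

  # Search the kernel ring buffer for OOM-killer activity
  dmesg -T | grep -i -E "out of memory|killed process"

  # On systemd systems with persistent journaling, check the previous boot too
  journalctl -k -b -1 | grep -i -E "out of memory|killed process"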


I strongly recommend NOT restarting Solr automatically if it ever dies.  
Chances are that the reason it died is that the system needs some 
attention, and restarting it is simply going to result in it dying 
again, over and over.


Thanks,
Shawn