Re: Solr throws errors on empty fields on ingestion

2025-03-19 Thread Colvin Cowie
Hello,

re the "400 OK". I don't see that happening myself locally, I have the
correct "Bad Request" status line when making requests directly to the
/update handler.
Perhaps it's an issue in Solarium-PHP?
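
If anyone wants to double-check on their side, this is roughly how I verified
it (just a sketch; the collection name and the deliberately title-less test
document are placeholders, not the original setup):

curl -s -i -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update' \
  --data-binary '[{"id":"status-line-test"}]' | head -n 1
# Against a schema where "title" is required, the first line printed should be
# "HTTP/1.1 400 Bad Request". If a client reports "400 OK" instead, the reason
# phrase is being rewritten somewhere between Solr and the application.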


On Wed, 19 Mar 2025 at 13:26, Ehrenleitner Robert Harald <
robert.ehrenleit...@plus.ac.at> wrote:

> Hi,
>
> that was fast.
>
> Actually, I see that the documents which do not have a title are also
> missing in the index of the older Solr version, which is still fed by the
> older version of Solarium-PHP. So the newer version of Solarium-PHP probably
> exposes an error which was there before but was not logged. I don't want to
> check this now.
>
> As a side note: it seems like Solr responds with the HTTP status line "400
> OK", which is not a good idea. It should be "400 Bad Request".
>
> Thanks for the advice about the filename, that's a good idea. I will modify
> the crawler to fall back to the slug (a special term from WordPress) or to the
> filename if the title is empty.
>
> Kind regards,
>
>
>
>
> Mag.phil. Robert Ehrenleitner, BEng.
> --
>
> Mag.phil. Robert Ehrenleitner, BEng.
>
> Web-Developer
>
> IT-Services | Application & Digitalization Services
>
> Hellbrunner Straße 34 | 5020 Salzburg | Austria
>
> Tel.: +43/(0)662/8044 - 6778
>
> *www.plus.ac.at *
>
>
>
> --
> *Von:* Colvin Cowie 
> *Gesendet:* Mittwoch, 19. März 2025 11:51
> *An:* users@solr.apache.org 
> *Betreff:* Re: Solr throws errors on empty fields on ingestion
>
> Required fields need non-empty values; as far as I know there are no
> exceptions to that.
>
> Take this from the UX/end user perspective. If a document has no title, or
> an empty title, what does a user expect to see and do with that?
> If they expect to see *something* then yes I think you should insert a
> suitable default or a fallback value like the file name or url.
> If they don't expect to see something (and you can't always provide a
> title), then the title shouldn't be marked as required.
>
> On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <
> robert.ehrenleit...@plus.ac.at> wrote:
>
> >
> >
> > Hi all,
> >
> > we have a crawler of our own, built on Solarium-PHP, which feeds Solr.
> > Since I have upgraded from 9.6.1 to 9.8.0, I see errors in the log of
> > the crawler. It tells me that Solr complains that the field "title" is
> > missing. Actually, it is part of the request, but it's just empty.
> >
> > This is a snippet of the request body (for this to be output, I have
> > inserted a var_dump() in an appropriate place of Solarium-PHP):
> >
> > Content-Disposition: form-data; name="literal.publishDate"
> > Content-Type: text/plain;charset=UTF-8
> >
> > 2023-01-12T10:25:06Z
> > --0280
> > Content-Disposition: form-data; name="literal.title"
> > Content-Type: text/plain;charset=UTF-8
> >
> >
> > --0280
> > Content-Disposition: form-data; name="literal.number"
> >
> > And this is the response:
> >
> > Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> > HTTP error: OK (400)
> > {
> >   "responseHeader":{
> > "status":400,
> > "QTime":121
> >   },
> >   "error":{
> >
> >
> "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > "msg":"[doc=141396] missing required field: title",
> > "code":400
> >   }
> > }
> >
> > I cannot fix the PDF file having no title (for various non-technical
> > reasons); nevertheless, it was working fine before the upgrade.
> >
> > The schema was created with this JSON data, especially its title field:
> > {
> > /* something left out here */
> > {
> > "name": "title",
> > "type": "text_general",
> > "stored": true,
> > "indexed": true,
> > "multiValued": false,
> > "required": true
> > },
> > /* something left out here */
> > }
> >
> > The document is not being indexed.
> >
> > How can I fix this? Is there perhaps something in the schema (JSON data)
> > I have to change? Or is it better to replace empty titles with some
> > constant non-empty string (this can be done in the crawler)?
> >
> > I have noticed that in the documentation regarding the field option
> > "required", it says:
> >
> > Instructs Solr to reject any attempts to add a document which does not
> > have a value for this field. This property defaults to false.
> >
> > This is ambiguous to me. What is meant by "does not have a value"?
> > Well, the value is present, but it is an empty string.
> >
> > Kind regards,
> >
> > Mag.phil. Robert Ehrenleitner, BEng.
> > --
> >
> > Mag.phil. Robert Ehrenleitner, BEng.
> >
> > Web-Developer
> >
> > IT-Services | Application & Digitalization Services

Re: Solr upgrade from 8.11 to 9.x (latest if possible)

2025-03-19 Thread Jan Høydahl
Hi,

It's fairly well described in the reference guide
https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html

Normally we recommend not to "upgrade" your existing nodes on a major upgrade, 
but instead stand up a clean new 9.x cluster/server and do a full re-index.

It could, given certain conditions, be possible to do an in-place / rolling 
upgrade of an 8.11 cluster to 9.x while keeping the index, but it only works if 
your 8.x index was first created with Solr 8, and there are some gotchas 
detailed in the upgrade notes. I'd go with a fresh install. And of course 
simulate the upgrade locally, then in a test environment etc.

Feel free to ask more specific questions once you have decided on your strategy.

Jan


> On 19 Mar 2025, at 00:53, AlvaradoAlonso, Francisco wrote:
> 
> Hello,
> 
> Does anyone have the specific steps to upgrade from Solr 8.11 to 9.x? I've 
> seen that part of the process is extracting the new tgz/zip file into a new 
> location and copying over the configuration files and data files from the 
> current 8.11 version, but I haven't found any official tutorial with these steps.
> 
> We currently have the on-prem version on some of our servers; one of them 
> runs a single instance and the others use a leader/follower configuration.
> 
> Thank you!
> 
> Regards,
> 
> Francisco Javier Alvarado Alonso
> JAVA Apps Dev Sr  - BT West
> Gulfstream Aerospace
> Mexicali, Baja California 21376
> Work Cellphone: +52 686 221 8824
> 



Re: Solr upgrade from 8.11 to 9.x (latest if possible)

2025-03-19 Thread Corrado Fiore

Dear All,

On 19 Mar 2025, at 11:08, Jan Høydahl wrote:

Normally we recommend not to "upgrade" your existing nodes on a major 
upgrade, but instead stand up a clean new 9.x cluster/server and do a 
full re-index.


I second that.  We did a similar upgrade recently (from a stand-alone 
8.5.2 to SolrCloud 9.8.1), so I think I can share a few tips:


1. Very likely, you will need to upgrade your configuration files and 
port (merge) any existing modifications into the 9.8.1 default 
configuration file.  This alone is a good reason to start fresh with a 
new install IMHO.


2. Re: full re-index, please note that the source can be your existing 
8.11 node.  Instead of trying to copy and upgrade the Lucene index files, 
you can do a “logical” transfer, meaning that you can create a client script 
to read data from the source Solr in JSON format in batches of, say, 500-1000 
documents each and feed those documents into the destination Solr as regular 
add (insert) requests.  It is quite a robust method, as any indexing errors 
will be immediately visible.  A minimal sketch of such a script follows right 
after point 3 below.


3. The re-indexing will also allow you to apply changes in the schema if 
needed (e.g. some fields might benefit from using DocValues).
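
As promised above, a minimal sketch of the transfer script (bash + curl + jq; 
the hostnames, collection name, uniqueKey "id", batch size and the assumption 
that every field you need is stored or docValues-retrievable are all 
placeholders for your actual setup):

#!/usr/bin/env bash
SRC="http://old-solr:8983/solr/mycollection"   # 8.11 source (placeholder)
DST="http://new-solr:8983/solr/mycollection"   # 9.x destination (placeholder)
CURSOR="*"
while : ; do
  RESP=$(curl -s -G "$SRC/select" \
    --data-urlencode 'q=*:*' \
    --data-urlencode 'rows=500' \
    --data-urlencode 'sort=id asc' \
    --data-urlencode 'wt=json' \
    --data-urlencode "cursorMark=$CURSOR")
  # drop Solr-internal fields before re-posting the batch
  echo "$RESP" | jq '[.response.docs[] | del(._version_)]' \
    | curl -s -H 'Content-Type: application/json' --data-binary @- "$DST/update"
  NEXT=$(echo "$RESP" | jq -r '.nextCursorMark')
  [ "$NEXT" = "$CURSOR" ] && break   # the cursor stops moving once all docs are read
  CURSOR="$NEXT"
done
curl -s "$DST/update?commit=true"

Note that this only round-trips what the source Solr can return, i.e. stored 
(or docValues) fields; anything that was index-only has to come from the 
original data source.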


My 2 cents :-)

Kind regards,
Corrado

Re: Solr 8 Reference Guide Not Loading Properly

2025-03-19 Thread Alexandre Rafalovitch
Loading CSS from 3rd party sites:
https://content-security-policy.com/examples/blocked-csp/


Console error:
a-quick-overview.html:13 Refused to load the stylesheet '
https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css'
because it violates the following Content Security Policy directive:
"style-src 'self' 'unsafe-inline' data:". Note that 'style-src-elem' was
not explicitly set, so 'style-src' is used as a fallback.

a-quick-overview.html:14 Refused to load the stylesheet '
https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css'
because it violates the following Content Security Policy directive:
"style-src 'self' 'unsafe-inline' data:". Note that 'style-src-elem' was
not explicitly set, so 'style-src' is used as a fallback.


On Wed, 19 Mar 2025 at 17:36, Chris Hostetter 
wrote:

>
> : I'm not able to see any article text on the webpages for Solr Reference
> : Guide versions 8.6 through 8.11 (e.g.
> : https://solr.apache.org/guide/8_11/a-quick-overview.html), and my
> : colleagues are reporting the same issue. Did something break the
> : rendering of these pages?
>
> Hmmm, good question -- I can reproduce what you describe on multiple
> browsers, and whatever the problem is seems to be related to CSS -- with
> all stylesheets disabled the contents of all the pages are there.
>
>
> Any HTML/CSS experts out there who can help debug this?
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Solr 8 Reference Guide Not Loading Properly

2025-03-19 Thread Chris Hostetter


: I'm not able to see any article text on the webpages for Solr Reference 
: Guide versions 8.6 through 8.11 (e.g. 
: https://solr.apache.org/guide/8_11/a-quick-overview.html), and my 
: colleagues are reporting the same issue. Did something break the 
: rendering of these pages?

Hmmm, good question -- I can reproduce what you describe on multiple 
browsers, and whatever the problem is seems to be related to CSS -- with 
all stylesheets disabled the contents of all the pages are there.


Any HTML/CSS experts out there who can help debug this?



-Hoss
http://www.lucidworks.com/


index source code with solr combined with a SAST for cyber security purpose

2025-03-19 Thread anon anon
Hello,

I want to combine the power of a search engine with a SAST. I have already
started with Zoekt. I want to know whether I should fork Zoekt or Solr. I am
wondering:

- if I can rewrite it in Solr for more maintainability
- if I SHOULD actually maintain Zoekt instead of Solr in order not to have
to reinvent the wheel (exactly like Solr)
- AND THE MOST IMPORTANT: would it be easier to implement, maintain and use
the SAST from the Zoekt code base instead of Solr? Maybe I could move the
SAST part to a separate program and keep only the search from the Zoekt
software?

What is the best deal for the open source community, please?

Best regards.


Re: Solr 8 Reference Guide Not Loading Properly

2025-03-19 Thread Houston Putman
Yeah Apache turned this on recently, so there's no way around it, and
nothing that we did wrong on our end.

I'm not sure how we fix this, but it really is a bad experience for users
(and us developers).
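
One possible workaround, if we can change how the guide pages are built (just
a sketch, the local target paths are made up): vendor the two blocked
stylesheets into the site so they are served from 'self' and pass the
style-src policy, e.g.

curl -sL -o css/bootstrap.min.css \
  https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css
curl -sL -o css/font-awesome.min.css \
  https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css
# ...then point the pages' <link rel="stylesheet"> tags at these local copies
# instead of the CDN URLs.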

- Houston

On Wed, Mar 19, 2025 at 4:44 PM Alexandre Rafalovitch 
wrote:

> Loading CSS from 3rd party sites:
> https://content-security-policy.com/examples/blocked-csp/
>
>
> Console error:
> a-quick-overview.html:13 Refused to load the stylesheet '
> https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css'
> because it violates the following Content Security Policy directive:
> "style-src 'self' 'unsafe-inline' data:". Note that 'style-src-elem' was
> not explicitly set, so 'style-src' is used as a fallback.
>
> a-quick-overview.html:14 Refused to load the stylesheet '
> https://maxcdn.bootstrapcdn.com/font-awesome/4.5.0/css/font-awesome.min.css
> '
> because it violates the following Content Security Policy directive:
> "style-src 'self' 'unsafe-inline' data:". Note that 'style-src-elem' was
> not explicitly set, so 'style-src' is used as a fallback.
>
>
> On Wed, 19 Mar 2025 at 17:36, Chris Hostetter 
> wrote:
>
> >
> > : I'm not able to see any article text on the webpages for Solr Reference
> > : Guide versions 8.6 through 8.11 (e.g.
> > : https://solr.apache.org/guide/8_11/a-quick-overview.html), and my
> > : colleagues are reporting the same issue. Did something break the
> > : rendering of these pages?
> >
> > Hmmm, good question -- I can reproduce what you describe on multiple
> > browsers, and whatever the problem is seems to be related to CSS -- with
> > all stylesheets disabled the contents of all the pages are there.
> >
> >
> > Any HTML/CSS experts out there who can help debug this?
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>


Not able to optimize Solr 9.8.0 HNSW index

2025-03-19 Thread Wei
Greetings everyone,

After building the Solr HNSW index on 9.8.0, I tried to optimize it to a
single segment.  However, the optimization doesn't happen and no errors are
found in the Solr log.

Schema:

  [the vector field and fieldType definitions were stripped by the mailing list]

Solrconfig for merge:

  [mergePolicyFactory settings, partially stripped: 25, 25, <int name="maxMergedSegmentMB">10...]


In total 30M docs are indexed and I trigger optimization with
/update?optimize=true&waitSearcher=false&maxSegments=1.  When using the BYTE
type, optimization finishes successfully. However, with the FLOAT32 type it
does not work and ends up with a ~33G index in ~80 segments; I don't see any
reduction in the segment count at all.
The Solr node has ~200G of memory and plenty of disk space.
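
In case it helps with debugging, one way I could watch whether any merging
happens at all during the optimize is to poll the segments endpoint (a sketch;
the core/collection name is a placeholder):

curl -s 'http://localhost:8983/solr/mycollection/admin/segments?wt=json' \
  | jq '.segments | length'
# if this number never drops after calling /update?optimize=true..., the merge
# is not being attempted; if it drops partway and then stalls, the merge is
# hitting some limit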

Thanks,
Wei


Re: Solr 8 Reference Guide Not Loading Properly

2025-03-19 Thread Mike Drob
In the web console I see errors about being unable to find jQuery. Did we
remove it at some point, or link to a wrong version?

On Wed, Mar 19, 2025 at 4:37 PM Chris Hostetter 
wrote:

>
> : I'm not able to see any article text on the webpages for Solr Reference
> : Guide versions 8.6 through 8.11 (e.g.
> : https://solr.apache.org/guide/8_11/a-quick-overview.html), and my
> : colleagues are reporting the same issue. Did something break the
> : rendering of these pages?
>
> Hmmm, good question -- I can reproduce what you describe on multiple
> browsers, and whatever the problem is seems to be related to CSS -- with
> all stylesheets disabled the contents of all the pages are there.
>
>
> Any HTML/CSS experts out there who can help debug this?
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Performance Degradation in Solr When Using OR with frange in fq

2025-03-19 Thread David Smiley
Ideally, after the user created the JIRA issue, it would have been shared in
this thread.  Here it is:
https://issues.apache.org/jira/browse/SOLR-17699
(fix for 9.9)

Nice response Hoss, particularly sharing the "filter" trick, which may be
useful here.

On Wed, Mar 19, 2025 at 2:18 PM Chris Hostetter 
wrote:

>
> : We are experiencing high query times in Solr when using an fq filter that
> : combines an OR condition with frange. The response time significantly
> : increases compared to queries that do not use this combination.
> :
> : Query Example
> : fq={!cache=false tag=prm}field:value OR {!frange l=1 u=1 v=$funcQuery}
>
> The first thing I want to make sure you understand is that the {!...}
> syntax is prefix based, so {!cache=false tag=prm} applies to the *entire*
> "fq" param -- not just the "field:value" boolean clause -- I want to
> clarify that because the way you describe breaking the query down implies
> you think otherwise...
>
> : Observations
> : 1) When we use just {!frange l=1 u=1 v=$funcQuery}, the query executes
> : quickly[20ms].
>
> Details matter -- you say "when we use just ..." that the frange portion is
> quick -- but you're not clarifying *how* you use the frange portion by
> itself.
>
> If you mean a request with 'fq={!frange l=1 u=1 v=$funcQuery}' is quick,
> that's likely because it winds up being slow the first time, and
> then cached in the filterCache for very fast subsequent use.
>
> : Question
> : 1) Why does the OR operation with frange cause a significant increase in
> : query time?
>
> The way an frange query works is that it scans every document in the index
> to compute the function value, and then checks if it is in range.
>
> In general, when Lucene/Solr executes a two-clause boolean "AND" query,
> the searcher can tell individual clauses to "skip ahead" based on the
> current match point from the other AND clause.
>
> So in the case of "X:Y AND {!frange...}" where X:Y only matches a small
> subset of the index, the frange doesn't have to be computed for every
> document in the index, it gets to skip ahead to the first match of "X:Y"
> and then skip ahead to the second match of "X:Y", etc...
>
> With "X:Y OR {!frange...}" it still has compute the function for every
> document in the index ... and when you combine that with the "cache=false"
> (on the entire "fq")
>
> : 2) Are there any optimizations or alternative query structures that could
> : improve performance?
>
> you can use the special (and slightly odd) "filter()" syntax in the
> default parser to say that a particular boolean clause should be cached as a
> non-scoring clause (and that cache entry will be re-used even when used in
> other boolean queries):
>
> fq=X:Y OR filter({!frange...})
> fq=X:Z OR filter({!frange...}) // the frange will be a filterCache hit
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Potential bug in task list management

2025-03-19 Thread Chris Hostetter


: Essentially, whenever a query task is abnormally ended, ie either the 
: client times out and closes the connection, the query hits the 
: timeAllowed or cpuAllowed limit, or the task is cancelled through the 
: /solr/collection/tasks/cancel?queryUUID= mechanism, the task is never or 
: almost never removed from the list of tasks returned by the 
: /v2/collections/collection/tasks/list endpoint.

I'm not really familiar with the "task" management API and with canceling 
queries, nor have I tried to reproduce the behavior you are describing, 
but based on a skim of the only test I see that involves canceling 
queries, I don't see anything in that test that would rule out what you're 
describing.

TestTaskManagement.testNonExistentQuery
 - just asserts a 404 when trying to cancel an non existent UUID

TestTaskManagement.testCancellationQuery
 - runs some queries in background threads and then cancels them
 - nothing about the queries ensures they are still around to be canceled
 - test only asserts that the number of queries it created equals the 
   number of successes + failures in trying to cancel
 - so even if 100% of the queries never ran, and were never tracked, 
   this test would pass
 - and nothing in the test confirms that any tasks which *might* have been 
   tracked are removed from the list at the end of the test

TestTaskManagement.testListCancellableQueries
 - runs 50 queries in background threads and then lists current tasks
 - only asserts that the number of items in the list is: 0 <= n <= 50 
 - so again: if 100% of the queries never make it to solr the test passes
 - if 100% of the queries are stuck in the list forever, the test passes


So yeah.  There's really not much that this test actually proves.

Can you please file a Jira with the details of your observations?



If you're up for it, here's how I would approach fixing the test:


1) write a custom SearchComponent that checks for some "blockTest=true" 
request param, and if it's set...

  - calls release() on a "public static final Semaphore REQ_READY"
  - then calls acquire() on a "public static final Semaphore REQ_WAITS_FOR"

...before letting the request finish

2) register & use that component in a /blocking SearchHandler (either via 
a new configset, or via the APIs to add them at query time)


3) change the test logic:

  - use the new /blocking request handler and send blockTest=true on all N 
requests
  - wait to acquire(N) permits from REQ_READY before making any 
assertions about the task list and/or canceling X requests
  - then and only then release(N-X) on REQ_WAITS_FOR
  - wait for all the background request responses, and assert that the 
canceled ones failed, and the other ones succeeded
  - then check the task list and confirm it's now empty.



-Hoss
http://www.lucidworks.com/


Re: Solr throws errors on empty fields on ingestion

2025-03-19 Thread Colvin Cowie
Required fields need non-empty values; as far as I know there are no
exceptions to that.

Take this from the UX/end user perspective. If a document has no title, or
an empty title, what does a user expect to see and do with that?
If they expect to see *something* then yes I think you should insert a
suitable default or a fallback value like the file name or url.
If they don't expect to see something (and you can't always provide a
title), then the title shouldn't be marked as required.

On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <
robert.ehrenleit...@plus.ac.at> wrote:

>
>
> Hi all,
>
> we have a crawler of our own, built on Solarium-PHP, which feeds Solr.
> Since I have upgraded from 9.6.1 to 9.8.0, I see errors in the log of
> the crawler. It tells me that Solr complains that the field "title" is
> missing. Actually, it is part of the request, but it's just empty.
>
> This is a snippet of the request body (for this to be output, I have
> inserted a var_dump() in an appropriate place of Solarium-PHP):
>
> Content-Disposition: form-data; name="literal.publishDate"
> Content-Type: text/plain;charset=UTF-8
>
> 2023-01-12T10:25:06Z
> --0280
> Content-Disposition: form-data; name="literal.title"
> Content-Type: text/plain;charset=UTF-8
>
>
> --0280
> Content-Disposition: form-data; name="literal.number"
>
> And this is the response:
>
> Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> HTTP error: OK (400)
> {
>   "responseHeader":{
> "status":400,
> "QTime":121
>   },
>   "error":{
>
> "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> "msg":"[doc=141396] missing required field: title",
> "code":400
>   }
> }
>
> I cannot fix the PDF file having no title (for various non-technical
> reasons); nevertheless, it was working fine before the upgrade.
>
> The schema was created with this JSON data, especially its title field:
> {
> /* something left out here */
> {
> "name": "title",
> "type": "text_general",
> "stored": true,
> "indexed": true,
> "multiValued": false,
> "required": true
> },
> /* something left out here */
> }
>
> The document is not being indexed.
>
> How can I fix this? Is there perhaps something in the schema (JSON data)
> I have to change? Or is it better to replace empty titles with some
> constant non-empty string (this can be done in the crawler)?
>
> I have noticed that in the documentation regarding the field option
> "required", it says:
>
> Instructs Solr to reject any attempts to add a document which does not
> have a value for this field. This property defaults to false.
>
> This is ambiguous to me. What is meant by "does not have a value"?
> Well, the value is present, but it is an empty string.
>
> Kind regards,
>
> Mag.phil. Robert Ehrenleitner, BEng.
> --
>
> Mag.phil. Robert Ehrenleitner, BEng.
>
> Web-Developer
>
> IT-Services | Application & Digitalization Services
>
> Hellbrunner Straße 34 | 5020 Salzburg | Austria
>
> Tel.: +43/(0)662/8044 - 6778
>
> *www.plus.ac.at *
>
>
>


Re: Solr 9.7.0 - Boolean Query parser is not caching result in Query result cache in case of preFilter is used

2025-03-19 Thread Chris Hostetter


: According to an existing test this should work:
: 
https://github.com/apache/solr/blob/ac3d349dac530cf1001d5113fc21b0fd641cc9d5/solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java#L1494

FWIW: The Lucene HNSW/vector based queries do some very weird non-standard 
things in their rewrite() methods, which certainly makes it plausible that 
the Solr QueryEqualityTest might pass, but some "real world" usage might 
not cache properly.

having said that...

I tried recreating the problem described using a minimal viable manual 
test -- I happened to use 9.6.1 because I already had it running with a 
trivial vector field...




With this data...

curl -H "Content-Type: application/json" 
"http://localhost:8983/solr/techproducts/update?commit=true"; --data-binary 
'[{"id":"aaa","type_s":"xxx","vector":[1,2,3,4]},{"id":"bbb","type_s":"xxx","vector":[2,2,3,4]},{"id":"ccc","type_s":"yyy","vector":[1,2,3,5]}]'

Running this request and monitoring the cache metrics showed cache inserts 
on the filterCache (for the preFilter) and on the queryResultCache (for 
the overall query) ...

curl 'http://localhost:8983/solr/techproducts/select' \
  --form-string 'q={!bool should=$vectorQuery}' \
  --form-string 'vectorQuery={!knn f=vector topK=3 preFilter=$preFilter v=$vector}' \
  --form-string 'preFilter=type_s:xxx' \
  --form-string 'vector=[2,2,2,2]'

...re-running that exact curl command over and over again showed repeated 
cache hits from both caches (w/o any insertions).
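
(For anyone following along, the counters I was watching can be pulled with
something like this -- a sketch, adjust the host and core name to your setup:

curl -s 'http://localhost:8983/solr/admin/metrics?group=core&prefix=CACHE.searcher'

and then compare the lookups / hits / inserts numbers reported for the
filterCache and queryResultCache entries between requests.)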

So I echo Matthias's suspicion that maybe there is something about your 
particular "real world" situation that's causing the issue.

Can you provide us with detailed steps to reproduce? (exact configs, 
sample data to index, sample queries to run, what metrics you are looking 
at that convince you the caching isn't working correctly, etc...)


-Hoss
http://www.lucidworks.com/


Re: Performance Degradation in Solr When Using OR with frange in fq

2025-03-19 Thread Chris Hostetter


: We are experiencing high query times in Solr when using an fq filter that
: combines an OR condition with frange. The response time significantly
: increases compared to queries that do not use this combination.
: 
: Query Example
: fq={!cache=false tag=prm}field:value OR {!frange l=1 u=1 v=$funcQuery}

The first thing I want to make sure you understand is that the {!...} 
syntax is prefix based, so {!cache=false tag=prm} applies to the *entire* 
"fq" param -- not just the "field:value" boolean clause -- I want to 
clarify that because the way you describe breaking the query down implies 
you think otherwise...

: Observations
: 1) When we use just {!frange l=1 u=1 v=$funcQuery}, the query executes
: quickly[20ms].

Details matter -- you say "when we use just ..." that the frange portion is 
quick -- but you're not clarifying *how* you use the frange portion by 
itself.

If you mean a request with 'fq={!frange l=1 u=1 v=$funcQuery}' is quick, 
that's likely because it winds up being slow the first time, and 
then cached in the filterCache for very fast subsequent use.

: Question
: 1) Why does the OR operation with frange cause a significant increase in
: query time?

The way an frange query works is that it scans every document in the index 
to compute the function value, and then checks if it is in range.

In general, when Lucene/Solr executes a two-clause boolean "AND" query, 
the searcher can tell individual clauses to "skip ahead" based on the 
current match point from the other AND clause.

So in the case of "X:Y AND {!frange...}" where X:Y only matches a small 
subset of the index, the frange doesn't have to be computed for every 
document in the index, it gets to skip ahead to the first match of "X:Y" 
and then skip ahead to the second match of "X:Y", etc...

With "X:Y OR {!frange...}" it still has compute the function for every 
document in the index ... and when you combine that with the "cache=false" 
(on the entire "fq") 

: 2) Are there any optimizations or alternative query structures that could
: improve performance?

you can use the special (and slightly odd) "filter()" syntax in the 
default parser to say that a particular boolean clause should be cached as a 
non-scoring clause (and that cache entry will be re-used even when used in 
other boolean queries):

fq=X:Y OR filter({!frange...})
fq=X:Z OR filter({!frange...}) // the frange will be a filterCache hit
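
As a full request that would look something like this (the collection name and
the function behind $funcQuery are just placeholders for whatever you use):

curl -G 'http://localhost:8983/solr/mycollection/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq=field:value OR filter({!frange l=1 u=1 v=$funcQuery})' \
  --data-urlencode 'funcQuery=if(exists(price),1,0)'
# subsequent requests that reuse the same filter(...) clause -- even inside a
# different fq -- should show filterCache hits instead of re-scanning the index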


-Hoss
http://www.lucidworks.com/


Re: Solr throws errors on empty fields on ingestion

2025-03-19 Thread Thomas Corthals
This actually looks like a request to the /extract handler.

Can you open an issue at https://github.com/solariumphp/solarium/issues
with the code that causes this behaviour?

Thomas

On Wed, 19 Mar 2025 at 15:49, Colvin Cowie wrote:

> Hello,
>
> re the "400 OK". I don't see that happening myself locally, I have the
> correct "Bad Request" status line when making requests directly to the
> /update handler.
> Perhaps it's an issue in Solarium-PHP?
>
>
> On Wed, 19 Mar 2025 at 13:26, Ehrenleitner Robert Harald <
> robert.ehrenleit...@plus.ac.at> wrote:
>
> > Hi,
> >
> > that was fast.
> >
> > Actually, I see that the documents which do not have a title are also
> > missing in the index of the older Solr version, which is still fed by the
> > older version of Solarium-PHP. So the newer version of Solarium-PHP
> > probably exposes an error which was there before but was not logged. I
> > don't want to check this now.
> >
> > As a side note: it seems like Solr responds with the HTTP status line "400
> > OK", which is not a good idea. It should be "400 Bad Request".
> >
> > Thanks for the advice about the filename, that's a good idea. I will
> > modify the crawler to fall back to the slug (a special term from WordPress)
> > or to the filename if the title is empty.
> >
> > Kind regards,
> >
> >
> >
> >
> > Mag.phil. Robert Ehrenleitner, BEng.
> > --
> >
> > Mag.phil. Robert Ehrenleitner, BEng.
> >
> > Web-Developer
> >
> > IT-Services | Application & Digitalization Services
> >
> > Hellbrunner Straße 34 | 5020 Salzburg | Austria
> >
> > Tel.: +43/(0)662/8044 - 6778
> >
> > *www.plus.ac.at *
> >
> >
> >
> > --
> > *Von:* Colvin Cowie 
> > *Gesendet:* Mittwoch, 19. März 2025 11:51
> > *An:* users@solr.apache.org 
> > *Betreff:* Re: Solr throws errors on empty fields on ingestion
> >
> > Required fields need non-empty values; as far as I know there are no
> > exceptions to that.
> >
> > Take this from the UX/end user perspective. If a document has no title,
> or
> > an empty title, what does a user expect to see and do with that?
> > If they expect to see *something* then yes I think you should insert a
> > suitable default or a fallback value like the file name or url.
> > If they don't expect to see something (and you can't always provide a
> > title), then the title shouldn't be marked as required.
> >
> > On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <
> > robert.ehrenleit...@plus.ac.at> wrote:
> >
> > >
> > >
> > > Hi all,
> > >
> > > we have a crawler of our own, built on Solarium-PHP, which feeds Solr.
> > > Since I have upgraded from 9.6.1 to 9.8.0, I see errors in the log of
> > > the crawler. It tells me that Solr complains that the field "title" is
> > > missing. Actually, it is part of the request, but it's just empty.
> > >
> > > This is a snippet of the request body (for this to be output, I have
> > > inserted a var_dump() in an appropriate place of Solarium-PHP):
> > >
> > > Content-Disposition: form-data; name="literal.publishDate"
> > > Content-Type: text/plain;charset=UTF-8
> > >
> > > 2023-01-12T10:25:06Z
> > > --0280
> > > Content-Disposition: form-data; name="literal.title"
> > > Content-Type: text/plain;charset=UTF-8
> > >
> > >
> > > --0280
> > > Content-Disposition: form-data; name="literal.number"
> > >
> > > And this is the response:
> > >
> > > Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> > > HTTP error: OK (400)
> > > {
> > >   "responseHeader":{
> > > "status":400,
> > > "QTime":121
> > >   },
> > >   "error":{
> > >
> > >
> >
> "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > > "msg":"[doc=141396] missing required field: title",
> > > "code":400
> > >   }
> > > }
> > >
> > > I cannot fix the PDF file having no title (for various non-technical
> > > reasons); nevertheless, it was working fine before the upgrade.
> > >
> > > The schema was created with this JSON data, especially its title field:
> > > {
> > > /* something left out here */
> > > {
> > > "name": "title",
> > > "type": "text_general",
> > > "stored": true,
> > > "indexed": true,
> > > "multiValued": false,
> > > "required": true
> > > },
> > > /* something left out here */
> > > }
> > >
> > > The document is not being indexed.
> > >
> > > How can I fix this? Is there perhaps something in the schema (JSON data)
> > > I have to change? Or is it better to replace empty titles with some
> > > constant non-empty string (this can be done in the crawler)?
> > >
> > > I have noticed that in the docum

Fwd: Solr throws errors on empty fields on ingestion

2025-03-19 Thread Ehrenleitner Robert Harald


Hi all,

we have a crawler of our own, built on Solarium-PHP, which feeds Solr. 
Since I have upgraded from 9.6.1 to 9.8.0, I see errors in the log of the 
crawler. It tells me that Solr complains that the field "title" is missing. 
Actually, it is part of the request, but it's just empty.

This is a snippet of the request body (for this to be output, I have inserted a 
var_dump() in an appropriate place of Solarium-PHP):

Content-Disposition: form-data; name="literal.publishDate"
Content-Type: text/plain;charset=UTF-8

2023-01-12T10:25:06Z
--0280
Content-Disposition: form-data; name="literal.title"
Content-Type: text/plain;charset=UTF-8


--0280
Content-Disposition: form-data; name="literal.number"

And this is the response:

Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr HTTP 
error: OK (400)
{
  "responseHeader":{
"status":400,
"QTime":121
  },
  "error":{

"metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
"msg":"[doc=141396] missing required field: title",
"code":400
  }
}

I cannot fix the PDF file having no title (for various non-technical reasons); 
nevertheless, it was working fine before the upgrade.

The schema was created with this JSON data, especially its title field:
{
/* something left out here */
{
"name": "title",
"type": "text_general",
"stored": true,
"indexed": true,
"multiValued": false,
"required": true
},
/* something left out here */
}

The document is not being indexed.

How can I fix this? Is there perhaps something in the schema (JSON data) I 
have to change? Or is it better to replace empty titles with some constant 
non-empty string (this can be done in the crawler)?

I have noticed that in the documentation regarding the field option "required", 
it says:

Instructs Solr to reject any attempts to add a document which does not have a 
value for this field. This property defaults to false.

This is ambiguous to me. What is meant by "does not have a value"? Well, the 
value is present, but it is an empty string.
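
A small experiment I could run to see how Solr treats the two cases directly
against /update (the collection name and document IDs here are just
placeholders):

curl -s -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mytest/update?commit=true' \
  --data-binary '[{"id":"doc-without-title"}]'
curl -s -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mytest/update?commit=true' \
  --data-binary '[{"id":"doc-with-empty-title","title":""}]'
# comparing the two responses would show whether an empty string is treated
# the same as a missing value by the "required" check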

Kind regards,

Mag.phil. Robert Ehrenleitner, BEng.
--


Mag.phil. Robert Ehrenleitner, BEng.

Web-Developer

IT-Services | Application & Digitalization Services

Hellbrunner Straße 34 | 5020 Salzburg | Austria

Tel.: +43/(0)662/8044 - 6778

www.plus.ac.at




Re: Solr throws errors on empty fields on ingestion

2025-03-19 Thread Ehrenleitner Robert Harald
Hi,

that was fast.

Actually, I see that the documents which do not have a title are also missing 
in the index of the older Solr version, which is still fed by the older version 
of Solarium-PHP. So the newer version of Solarium-PHP probably exposes an 
error which was there before but was not logged. I don't want to check this now.

As a side note: it seems like Solr responds with the HTTP status line "400 OK", 
which is not a good idea. It should be "400 Bad Request".

Thanks for the advice about the filename, that's a good idea. I will modify the 
crawler to fall back to the slug (a special term from WordPress) or to the 
filename if the title is empty.

Kind regards,




Mag.phil. Robert Ehrenleitner, BEng.
--


Mag.phil. Robert Ehrenleitner, BEng.

Web-Developer

IT-Services | Application & Digitalization Services

Hellbrunner Straße 34 | 5020 Salzburg | Austria

Tel.: +43/(0)662/8044 - 6778

www.plus.ac.at




From: Colvin Cowie 
Sent: Wednesday, 19 March 2025 11:51
To: users@solr.apache.org 
Subject: Re: Solr throws errors on empty fields on ingestion


Required fields need non-empty values; as far as I know there are no
exceptions to that.

Take this from the UX/end user perspective. If a document has no title, or
an empty title, what does a user expect to see and do with that?
If they expect to see *something* then yes I think you should insert a
suitable default or a fallback value like the file name or url.
If they don't expect to see something (and you can't always provide a
title), then the title shouldn't be marked as required.

On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <
robert.ehrenleit...@plus.ac.at> wrote:

>
>
> Hi all,
>
> we have a crawler of our own, built on Solarium-PHP, which feeds Solr.
> Since I have upgraded from 9.6.1 to 9.8.0, I see errors in the log of
> the crawler. It tells me that Solr complains that the field "title" is
> missing. Actually, it is part of the request, but it's just empty.
>
> This is a snippet of the request body (for this to be output, I have
> inserted a var_dump() in an appropriate place of Solarium-PHP):
>
> Content-Disposition: form-data; name="literal.publishDate"
> Content-Type: text/plain;charset=UTF-8
>
> 2023-01-12T10:25:06Z
> --0280
> Content-Disposition: form-data; name="literal.title"
> Content-Type: text/plain;charset=UTF-8
>
>
> --0280
> Content-Disposition: form-data; name="literal.number"
>
> And this is the response:
>
> Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> HTTP error: OK (400)
> {
>   "responseHeader":{
> "status":400,
> "QTime":121
>   },
>   "error":{
>
> "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> "msg":"[doc=141396] missing required field: title",
> "code":400
>   }
> }
>
> I cannot fix the PDF file having no title (for various non-technical
> reasons); nevertheless, it was working fine before the upgrade.
>
> The schema was created with this JSON data, especially its title field:
> {
> /* something left out here */
> {
> "name": "title",
> "type": "text_general",
> "stored": true,
> "indexed": true,
> "multiValued": false,
> "required": true
> },
> /* something left out here */
> }
>
> The document is not being indexed.
>
> How can I fix this? Is there perhaps something in the schema (JSON data)
> I have to change? Or is it better to replace empty titles with some
> constant non-empty string (this can be done in the crawler)?
>
> I have noticed that in the documentation regarding the field option
> "required", it says:
>
> Instructs Solr to reject any attempts to add a document which does not
> have a value for this field. This property defaults to false.
>
> This is ambiguous to me. What is meant by "does not have a value"?
> Well, the value is present, but it is an empty string.
>
> Kind regards,
>
> Mag.phil. Robert Ehrenleitner, BEng.
> --
>
> Mag.phil. Robert Ehrenleitner, BEng.
>
> Web-Developer
>
> IT-Services | Application & Digitalization Services
>
> Hellbrunner Straße 34 | 5020 Salzburg | Austria
>
> Tel.: +43/(0)662/8044 - 6778
>
> *www.plus.ac.at *
>
>
>