This actually looks like a request to the /extract handler. Can you open an issue at https://github.com/solariumphp/solarium/issues with the code that causes this behaviour?
Thomas

On Wed, 19 Mar 2025 at 15:49, Colvin Cowie <colvin.cowie....@gmail.com> wrote:

> Hello,
>
> Regarding the "400 OK": I don't see that happening locally. I get the
> correct "Bad Request" status line when making requests directly to the
> /update handler.
> Perhaps it's an issue in Solarium-PHP?
>
> On Wed, 19 Mar 2025 at 13:26, Ehrenleitner Robert Harald
> <robert.ehrenleit...@plus.ac.at> wrote:
>
> > Hi,
> >
> > that was fast.
> >
> > Actually, I see that the documents which do not have a title are also
> > missing from the index of the older Solr version, which is still fed by
> > the older version of Solarium-PHP. So the newer version of Solarium-PHP
> > probably exposes an error that was already there but was not logged. I
> > don't want to check this now.
> >
> > As a side note: it seems that Solr responds with the HTTP status "400 OK",
> > which is not a good idea. It should be "400 Invalid request".
> >
> > Thanks for the advice about the filename, that's a good idea. I will
> > modify the crawler to fall back to the slug (a WordPress term) or to
> > the filename if the title is empty.
> >
> > Kind regards,
> >
> > --
> > Mag.phil. Robert Ehrenleitner, BEng.
> > Web-Developer
> > IT-Services | Application & Digitalization Services
> > Hellbrunner Straße 34 | 5020 Salzburg | Austria
> > Tel.: +43/(0)662/8044 - 6778
> > www.plus.ac.at <http://www.plus.ac.at>
> >
> > ------------------------------
> > *From:* Colvin Cowie <colvin.cowie....@gmail.com>
> > *Sent:* Wednesday, 19 March 2025 11:51
> > *To:* users@solr.apache.org <users@solr.apache.org>
> > *Subject:* Re: Solr throws errors on empty fields on ingestion
> >
> > Required fields need non-empty values; as far as I know there are no
> > exceptions to that.
> >
> > Take this from the UX/end-user perspective. If a document has no title,
> > or an empty title, what does a user expect to see and do with that?
> > If they expect to see *something*, then yes, I think you should insert a
> > suitable default or a fallback value like the file name or URL.
> > If they don't expect to see something (and you can't always provide a
> > title), then the title shouldn't be marked as required.
> >
> > On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald
> > <robert.ehrenleit...@plus.ac.at> wrote:
> >
> > > Hi all,
> > >
> > > we have a self-built crawler based on Solarium-PHP that feeds documents
> > > into Solr. Since I upgraded from 9.6.1 to 9.8.0, I see errors in the
> > > crawler's log. It tells me that Solr complains that the field "title" is
> > > missing. Actually, it is part of the request, but it's just empty.
> > >
> > > This is a snippet of the request body (to get this output, I inserted a
> > > var_dump() at an appropriate place in Solarium-PHP):
> > >
> > > Content-Disposition: form-data; name="literal.publishDate"
> > > Content-Type: text/plain;charset=UTF-8
> > >
> > > 2023-01-12T10:25:06Z
> > > --00000000000002800000000000000000
> > > Content-Disposition: form-data; name="literal.title"
> > > Content-Type: text/plain;charset=UTF-8
> > >
> > >
> > > --00000000000002800000000000000000
> > > Content-Disposition: form-data; name="literal.number"
> > >
> > > And this is the response:
> > >
> > > Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> > > HTTP error: OK (400)
> > > {
> > >   "responseHeader":{
> > >     "status":400,
> > >     "QTime":121
> > >   },
> > >   "error":{
> > >     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> > >     "msg":"[doc=141396] missing required field: title",
> > >     "code":400
> > >   }
> > > }
> > >
> > > I cannot fix the PDF file having no title (for various non-technical
> > > reasons); nevertheless, it was working fine before the upgrade.
> > >
> > > The schema was created with this JSON data; this is its title field:
> > > {
> > >   /* something left out here */
> > >   {
> > >     "name": "title",
> > >     "type": "text_general",
> > >     "stored": true,
> > >     "indexed": true,
> > >     "multiValued": false,
> > >     "required": true
> > >   },
> > >   /* something left out here */
> > > }
> > >
> > > The document is not being indexed.
> > >
> > > How can I fix this? Is there perhaps something in the schema (JSON data)
> > > that I have to change? Or is it better to replace empty titles with some
> > > constant non-empty string (this can be done in the crawler)?
> > >
> > > I have noticed that the documentation for the field option "required"
> > > says:
> > >
> > > Instructs Solr to reject any attempts to add a document which does not
> > > have a value for this field. This property defaults to false.
> > >
> > > This is ambiguous to me. What is meant by "does not have a value"? The
> > > value is present, but it is an empty string.
> > >
> > > Kind regards,
> > >
> > > Mag.phil. Robert Ehrenleitner, BEng.
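
For reference, the fallback discussed in this thread (use the title, then the slug, then the file name) could be sketched roughly as follows. This is only an illustration in Python, not the crawler's actual PHP code: `resolve_title` and its parameters are hypothetical names, and nothing here is part of Solarium-PHP or Solr. It assumes that a blank or whitespace-only title should count as missing.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_title(title: Optional[str], slug: Optional[str], path: str) -> str:
    """Return the first non-blank candidate: title, then slug, then file name.

    Hypothetical helper for illustration only; the names are assumptions.
    """
    for candidate in (title, slug):
        # Treat None, "" and whitespace-only strings as "no value".
        if candidate and candidate.strip():
            return candidate.strip()
    # Last resort: the file name without its extension.
    return PurePosixPath(path).stem


if __name__ == "__main__":
    print(resolve_title("", None, "wp-content/uploads/loremipsum.pdf"))
```

The resolved value would then be sent as `literal.title` in place of the empty string, so the required field always carries a non-empty value.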