Hi, that was fast.
Actually, I see that the documents without a title are also missing from the index of the older Solr version, which is still fed by the older version of Solarium-PHP. So the newer version of Solarium-PHP probably exposes an error that was there before but was not logged. I don't want to check this now.

As a side note: it seems that Solr responds with HTTP status "400 OK", which is not a good idea. It should be "400 Invalid request".

Thanks for the advice about the filename, that's a good idea. I will modify the crawler to fall back to the slug (a WordPress term) or to the filename if the title is empty.

Kind regards,
Mag.phil. Robert Ehrenleitner, BEng.

--
Mag.phil. Robert Ehrenleitner, BEng.
Web-Developer
IT-Services | Application & Digitalization Services
Hellbrunner Straße 34 | 5020 Salzburg | Austria
Tel.: +43/(0)662/8044 - 6778
www.plus.ac.at <http://www.plus.ac.at>

________________________________
From: Colvin Cowie <colvin.cowie....@gmail.com>
Sent: Wednesday, 19 March 2025 11:51
To: users@solr.apache.org <users@solr.apache.org>
Subject: Re: Solr throws errors on empty fields on ingestion

Required fields need non-empty values; as far as I know there are no exceptions to that.

Look at this from the UX/end-user perspective. If a document has no title, or an empty title, what does a user expect to see and do with that?

If they expect to see *something*, then yes, I think you should insert a suitable default or a fallback value such as the file name or URL. If they don't expect to see something (and you can't always provide a title), then the title shouldn't be marked as required.

On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <robert.ehrenleit...@plus.ac.at> wrote:

> Hi all,
>
> we have a self-built crawler based on Solarium-PHP which ingests documents
> into Solr. Since I upgraded from 9.6.1 to 9.8.0, I see errors in the
> crawler's log. It tells me that Solr complains that the field "title" is
> missing. Actually, it is part of the request, but it is just empty.
>
> This is a snippet of the request body (to output it, I inserted a
> var_dump() in an appropriate place in Solarium-PHP):
>
> Content-Disposition: form-data; name="literal.publishDate"
> Content-Type: text/plain;charset=UTF-8
>
> 2023-01-12T10:25:06Z
> --00000000000002800000000000000000
> Content-Disposition: form-data; name="literal.title"
> Content-Type: text/plain;charset=UTF-8
>
>
> --00000000000002800000000000000000
> Content-Disposition: form-data; name="literal.number"
>
> And this is the response:
>
> Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr HTTP error: OK (400)
> {
>   "responseHeader":{
>     "status":400,
>     "QTime":121
>   },
>   "error":{
>     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"[doc=141396] missing required field: title",
>     "code":400
>   }
> }
>
> I cannot fix the PDF file having no title (for various non-technical
> reasons), but it worked fine before the upgrade.
>
> The schema was created with this JSON data; note especially its title field:
>
> {
>   /* something left out here */
>   {
>     "name": "title",
>     "type": "text_general",
>     "stored": true,
>     "indexed": true,
>     "multiValued": false,
>     "required": true
>   },
>   /* something left out here */
> }
>
> The document is not being indexed.
>
> How can I fix this? Is there perhaps something in the schema (JSON data)
> that I have to change? Or is it better to replace empty titles with some
> constant non-empty string (this can be done in the crawler)?
>
> I have noticed that the documentation of the field option "required"
> says:
>
>   Instructs Solr to reject any attempts to add a document which does not
>   have a value for this field. This property defaults to false.
>
> This is ambiguous to me. What is meant by "does not have a value"?
> Well, the value is present, but it is an empty string.
>
> Kind regards,
>
> Mag.phil. Robert Ehrenleitner, BEng.
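
For reference, the fallback discussed in this thread (title first, then the slug, then the file name) could look roughly like the sketch below. It is written in Python for illustration only, not in the crawler's PHP. The function name pick_title and its parameters are made up and are not part of Solarium-PHP or Solr. Treating a whitespace-only title as empty is an assumption as well.

```python
from pathlib import PurePosixPath


def pick_title(title, slug=None, path=None):
    """Return the first non-blank candidate: title, then slug, then file name.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    file_name = PurePosixPath(path).name if path else None
    for candidate in (title, slug, file_name):
        # Skip None, empty strings, and strings that contain only whitespace.
        if candidate and candidate.strip():
            return candidate.strip()
    # Nothing usable: leave the decision (skip or log) to the caller.
    return None


if __name__ == "__main__":
    # A PDF without a title and without a slug falls back to its file name.
    print(pick_title("", None, "wp-content/uploads/loremipsum.pdf"))
```

The crawler would call such a helper right before setting literal.title, so that a document with an empty title is sent with a usable value instead of being rejected by the required field.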