Hello,

Regarding the "400 OK": I don't see that happening locally. I get the
correct "Bad Request" status line when making requests directly to the
/update handler.
Perhaps it's an issue in Solarium-PHP?
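
For reference, one way to look at the raw status line with no client
library in between. This is only a sketch using Python's standard
http.client; the function name fetch_status_line is made up here, and the
host, port and core name in the usage comment are placeholders, not
details from this thread.

```python
import http.client


def fetch_status_line(host, port, path, body, content_type="application/json"):
    """POST body to path and return (status code, reason phrase) as received."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("POST", path, body=body,
                     headers={"Content-Type": content_type})
        response = conn.getresponse()
        response.read()  # drain the response body before closing
        # response.reason is the text after the numeric code in the status line
        return response.status, response.reason
    finally:
        conn.close()


# Hypothetical usage (adjust host, port and core name to your installation):
# fetch_status_line("localhost", 8983, "/solr/mycore/update",
#                   b'[{"id": "test-1"}]')
```

If this shows the expected reason phrase while the crawler log still says
"OK", the text is most likely being produced on the client side.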

On Wed, 19 Mar 2025 at 13:26, Ehrenleitner Robert Harald <
robert.ehrenleit...@plus.ac.at> wrote:

> Hi,
>
> that was fast.
>
> Actually, I see that the documents which do not have a title are also
> missing in the index of the older Solr version which is still fed by the
> older version of Solarium-PHP. So, probably the newer version of
> Solarium-PHP exposes an error which was there before but was not logged. I
> don't want to check this now.
>
> As a side note: It seems like Solr responds with HTTP status "400 OK",
> which is not a good idea. It should be "400 Invalid request".
>
> Thanks for the advice with the filename, that's a good idea. I will modify
> the crawler to fall back to the slug (special term from WordPress) or to the
> filename if the title is empty.
>
> Kind regards,
>
>
>
>
> Mag.phil. Robert Ehrenleitner, BEng.
>
> Web-Developer
>
> IT-Services | Application & Digitalization Services
>
> Hellbrunner Straße 34 | 5020 Salzburg | Austria
>
> Tel.: +43/(0)662/8044 - 6778
>
> *www.plus.ac.at <http://www.plus.ac.at>*
>
>
>
> ------------------------------
> *From:* Colvin Cowie <colvin.cowie....@gmail.com>
> *Sent:* Wednesday, 19 March 2025 11:51
> *To:* users@solr.apache.org <users@solr.apache.org>
> *Subject:* Re: Solr throws errors on empty fields on ingestion
>
> Required fields need non-empty values; as far as I know there are no
> exceptions to that.
>
> Look at this from the UX/end-user perspective. If a document has no title, or
> an empty title, what does a user expect to see and do with that?
> If they expect to see *something* then yes I think you should insert a
> suitable default or a fallback value like the file name or url.
> If they don't expect to see something (and you can't always provide a
> title), then the title shouldn't be marked as required.
>
> On Wed, 19 Mar 2025 at 10:03, Ehrenleitner Robert Harald <
> robert.ehrenleit...@plus.ac.at> wrote:
>
> >
> >
> > Hi all,
> >
> > we have a self-built crawler based on Solarium-PHP which feeds documents
> > into Solr. Since I upgraded from 9.6.1 to 9.8.0, I see errors in the
> > crawler's log. It tells me that Solr complains that the field "title" is
> > missing. Actually, it is part of the request, but it's just empty.
> >
> > This is a snippet of the request body (for this to be output, I have
> > inserted a var_dump() in an appropriate place of Solarium-PHP):
> >
> > Content-Disposition: form-data; name="literal.publishDate"
> > Content-Type: text/plain;charset=UTF-8
> >
> > 2023-01-12T10:25:06Z
> > --00000000000002800000000000000000
> > Content-Disposition: form-data; name="literal.title"
> > Content-Type: text/plain;charset=UTF-8
> >
> >
> > --00000000000002800000000000000000
> > Content-Disposition: form-data; name="literal.number"
> >
> > And this is the response:
> >
> > Error indexing document 14935: wp-content/uploads/loremipsum.pdf: Solr
> > HTTP error: OK (400)
> > {
> >   "responseHeader":{
> >     "status":400,
> >     "QTime":121
> >   },
> >   "error":{
> >     "metadata":["error-class","org.apache.solr.common.SolrException","root-error-class","org.apache.solr.common.SolrException"],
> >     "msg":"[doc=141396] missing required field: title",
> >     "code":400
> >   }
> > }
> >
> > I cannot fix the PDF file having no title (for various non-technical
> > reasons); nevertheless, it was working fine before the upgrade.
> >
> > The schema was created with this JSON data, especially its title field:
> > {
> > /* something left out here */
> >         {
> >             "name": "title",
> >             "type": "text_general",
> >             "stored": true,
> >             "indexed": true,
> >             "multiValued": false,
> >             "required": true
> >         },
> > /* something left out here */
> > }
> >
> > The document is not being indexed.
> >
> > How can I fix this? Is there perhaps something in the schema (JSON data)
> > I have to change? Or is it better to replace empty titles with some
> > constant non-empty string (this can be done in the crawler)?
> >
> > I have noticed that in the documentation regarding the field option
> > "required", it says:
> >
> > Instructs Solr to reject any attempts to add a document which does not
> > have a value for this field. This property defaults to false.
> >
> > This is ambiguous to me. What is meant by "does not have a value"?
> > Well, the value is present but it is an empty string.
> >
> > Kind regards,
> >
> > Mag.phil. Robert Ehrenleitner, BEng.
> >
> > Web-Developer
> >
> > IT-Services | Application & Digitalization Services
> >
> > Hellbrunner Straße 34 | 5020 Salzburg | Austria
> >
> > Tel.: +43/(0)662/8044 - 6778
> >
> > *www.plus.ac.at <http://www.plus.ac.at>*
> >
> >
> >
>
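
P.S. On the fallback idea from the quoted mail (title, then WordPress
slug, then filename): a minimal sketch of that selection, written in
Python for illustration only since the crawler itself is PHP. The name
choose_title and its parameters are made up for this example, and
treating whitespace-only titles as empty is an assumption on my part.

```python
import os


def choose_title(title, slug, path):
    """Return the first non-blank value among title, slug and the file name."""
    candidates = (title, slug, os.path.basename(path or ""))
    for candidate in candidates:
        # Skip None, empty strings and strings containing only whitespace
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError("no usable title, slug or file name")
```

The result would then be sent as literal.title in place of the empty
value, so the required field always has content.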
