[
https://issues.apache.org/jira/browse/SOLR-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Pugh resolved SOLR-3386.
-----------------------------
Resolution: Won't Fix
In Solr 10 we are leveraging either Tika Server (running in its own separate
server process) or maybe Tika Pipes (again, running in a separate JVM).
Please revalidate your issue against Solr 10 with one of those options, and if
it is still present, happy to work with you on a fix using the new
approach for Tika.
> ExtractingRequestHandler applies fname settings to literals
> -----------------------------------------------------------
>
> Key: SOLR-3386
> URL: https://issues.apache.org/jira/browse/SOLR-3386
> Project: Solr
> Issue Type: Bug
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 3.5
> Reporter: Colin Hebert
> Priority: Minor
>
> The SolrContentHandler.addLiterals() method calls
> SolrContentHandler.addField(), which itself obtains the field name with
> SolrContentHandler.findMappedName().
> This call makes sense for SolrContentHandler.addMetadata() [and others]
> because the user can't set the names of those fields otherwise. With literals
> it isn't useful: the name of the field is given explicitly by the user and
> shouldn't be modified at all (maybe unknownFieldPrefix or defaultField could
> be applied, but even that doesn't seem quite normal).
> ----
> I ran into this issue with the following use case:
> I have a schema containing a "title" field which is mandatory and contains
> only one value.
> My documents have an internal title which is used as the value of the "title"
> field.
> When sending one of these documents (an HTML document), if it contains a
> "title" metadata I get an exception because I have multiple values for my
> "title" field (an exception I expect).
> To fix that I used "fname.title=tika_title", so the title provided by tika is
> kept under another name.
> Both titles (the original one I pass manually, and the metadata one) are now
> stored in the field "tika_title" and I get an exception because the "title"
> field hasn't been provided at all.
> ----
> An easy workaround for this bug is to send the literal as "my_title" and add
> the following fname mappings:
> "fname.my_title=title&fname.title=tika_title". A small switcheroo that puts
> the correct value back in the expected field.
> ----
> A way to fix that is to extract the first part of
> SolrContentHandler.addField() (lowerNames and findMappedName()) into a
> separate method (or put the lowerNames check in
> SolrContentHandler.findMappedName()) and call that method (or
> findMappedName()) _before_ calling SolrContentHandler.addField().
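
The behaviour, workaround, and proposed fix described in the quoted issue can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: FnameMappingSketch, buildDocument and the mapLiterals flag are hypothetical names invented for illustration, the methods merely borrow the names findMappedName() and addField() from the issue text, and none of this is the actual SolrContentHandler code. The lowerNames, unknownFieldPrefix and defaultField handling is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the name mapping described in SOLR-3386 (not the real Solr code). */
final class FnameMappingSketch {

    /** Stand-in for findMappedName(): a single lookup in the fname mappings. */
    static String findMappedName(String name, Map<String, String> fnameMap) {
        return fnameMap.getOrDefault(name, name);
    }

    /** Stand-in for addField() after the proposed refactoring: it no longer maps names. */
    static void addField(Map<String, List<String>> doc, String name, String value) {
        doc.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    /**
     * Builds a document from user literals and extracted metadata.
     * mapLiterals = true models the reported behaviour (literal names are mapped);
     * mapLiterals = false models the proposed fix (literal names are kept as given).
     */
    static Map<String, List<String>> buildDocument(
            Map<String, String> metadata,
            Map<String, String> literals,
            Map<String, String> fnameMap,
            boolean mapLiterals) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : literals.entrySet()) {
            String name = mapLiterals ? findMappedName(e.getKey(), fnameMap) : e.getKey();
            addField(doc, name, e.getValue());
        }
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            // Metadata names are always mapped, before addField() is called.
            addField(doc, findMappedName(e.getKey(), fnameMap), e.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = Map.of("title", "Title found by Tika");
        Map<String, String> fname = Map.of("title", "tika_title");

        // Reported behaviour: both values land in tika_title and "title" is missing.
        System.out.println(buildDocument(metadata, Map.of("title", "My title"), fname, true));

        // Workaround from the issue: send the literal as my_title and map it back.
        Map<String, String> swapped = Map.of("my_title", "title", "title", "tika_title");
        System.out.println(buildDocument(metadata, Map.of("my_title", "My title"), swapped, true));

        // Proposed fix: literals bypass the mapping, so "title" keeps the user's value.
        System.out.println(buildDocument(metadata, Map.of("title", "My title"), fname, false));
    }
}
```

In this model the workaround and the proposed fix produce the same document: the user's value stays in "title" and the extracted value goes to "tika_title".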