[ 
https://issues.apache.org/jira/browse/SOLR-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Pugh resolved SOLR-3386.
-----------------------------
    Resolution: Won't Fix

In Solr 10 we are leveraging either Tika Server (running in its own separate 
server process) or possibly Tika Pipes (again, running in a separate JVM). 
Please revalidate your issue against Solr 10 with one of those options. If 
the problem is still present, I am happy to work with you on a fix using the 
new approach for Tika.

> ExtractingRequestHandler applies fname settings to literals
> -----------------------------------------------------------
>
>                 Key: SOLR-3386
>                 URL: https://issues.apache.org/jira/browse/SOLR-3386
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.5
>            Reporter: Colin Hebert
>            Priority: Minor
>
> The SolrContentHandler.addLiterals() method calls 
> SolrContentHandler.addField(), which itself resolves the field name with 
> SolrContentHandler.findMappedName().
> This lookup makes sense for SolrContentHandler.addMetadata() [and others], 
> because the user can't set the names of those fields otherwise. For literals 
> it isn't useful: the name of the field is given explicitly by the user and 
> shouldn't be modified at all (maybe applying unknownFieldPrefix or 
> defaultField could be done, but even that doesn't seem quite normal).
> ----
> I ran into this issue with the following use case:
> I have a schema containing a "title" field which is mandatory and contains 
> only one value.
> My documents have an internal title which is used as the value of the "title" 
> field.
> When sending one of these documents (an HTML document), if it contains a 
> "title" metadata I get an exception because I have multiple values for my 
> "title" field (an exception I expect).
> To fix that I used "fname.title=tika_title", so the title provided by tika is 
> kept under another name.
> Both titles (the original one I pass manually, and the metadata one) are now 
> stored in the field "tika_title" and I get an exception because the "title" 
> field hasn't been provided at all.
> ----
> An easy workaround for this bug is to send the literal under the name 
> "my_title" and add the following fnames: 
> "fname.my_title=title&fname.title=tika_title". A small switcheroo which puts 
> the correct value back in the expected field.
> ----
> A way to fix this is to extract the first part of 
> SolrContentHandler.addField() (the lowerNames handling and 
> findMappedName()) into a separate method (or to put the lowerNames check in 
> SolrContentHandler.findMappedName()) and to call that separate method (or 
> findMappedName()) _before_ calling SolrContentHandler.addField().
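
For readers following the description above, here is a minimal sketch of the 
reported behaviour, the workaround, and the proposed direction. It is a 
simplified Python model, not Solr code: find_mapped_name() and build_doc() 
are hypothetical stand-ins for SolrContentHandler.findMappedName() and the 
addLiterals()/addMetadata() path, the map_literals flag is invented for 
illustration, and the mapping is assumed to be a single, non-chained lookup. 
The mapping keys mirror the report's fname.* parameters (the Solr Cell 
documentation appears to name this prefix fmap.*, so check the parameter 
name for your version).

```python
def find_mapped_name(name, fmap):
    """Return the mapped target for `name`, or `name` itself if unmapped."""
    return fmap.get(name, name)


def build_doc(literals, metadata, fmap, map_literals=True):
    """Collect field values as the issue describes them.

    map_literals=True models the reported behaviour (literal names also go
    through the mapping); map_literals=False models the proposed fix, where
    literal names are left untouched.
    """
    doc = {}
    for name, value in literals.items():
        target = find_mapped_name(name, fmap) if map_literals else name
        doc.setdefault(target, []).append(value)
    for name, value in metadata.items():
        target = find_mapped_name(name, fmap)
        doc.setdefault(target, []).append(value)
    return doc


# Hypothetical sample values, for illustration only.
metadata = {"title": "Title extracted by Tika"}
rename_tika_title = {"title": "tika_title"}

# Reported behaviour: both titles land in tika_title and "title" is missing.
reported = build_doc({"title": "My title"}, metadata, rename_tika_title)

# Workaround from the report: send the literal as my_title and map it back.
workaround = build_doc(
    {"my_title": "My title"},
    metadata,
    {"my_title": "title", "title": "tika_title"},
)

# Proposed direction: do not apply the mapping to literal names.
fixed = build_doc(
    {"title": "My title"}, metadata, rename_tika_title, map_literals=False
)

print(reported)
print(workaround)
print(fixed)
```

In this model the workaround and the proposed fix produce the same document, 
with the literal in "title" and the extracted value in "tika_title".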



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
