[ 
https://issues.apache.org/jira/browse/SOLR-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795488#comment-13795488
 ] 

Chaiyasit (Sit) Manovit commented on SOLR-1634:
-----------------------------------------------

Hello,

I am not sure that SOLR-1856 completely fixes this, particularly when 
lowernames==true comes into the picture. Consider a case where:
 1. Tika generated field "FOO=BAR" for the doc.
 2. literalsOverride==true.
 3. lowernames==true.
 4. User supplied "literal.foo=bar".

According to the rules 
(http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations),
 literalsOverride is applied before lowernames and, thus, will have no effect 
here since the field FOO from Tika and literal.foo are considered different 
fields at this stage before lowernames==true kicks in. And when 
lowernames==true kicks in, it has the effect of merging FOO into foo, giving it 
both values BAR and bar.
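
To make the sequence concrete, here is a toy Python model of the order described above (literalsOverride, then lowernames, then fmap). It is only a sketch of the behavior as I understand it from the wiki, not Solr's actual code: the function name and data shapes are made up for illustration, and lowernames is simplified to plain lowercasing.

```python
def extract_fields(tika_fields, literals, literals_override=True,
                   lowernames=True, fmap=None):
    """Toy model of the documented order: literalsOverride, lowernames, fmap.

    Hypothetical helper for illustration only; not part of Solr.
    """
    fmap = fmap or {}
    fields = {}  # field name -> list of values

    # Step 1: literalsOverride drops a Tika field only when its name matches
    # a literal exactly (case-sensitive), so FOO and foo stay separate here.
    for name, value in tika_fields.items():
        if literals_override and name in literals:
            continue
        fields.setdefault(name, []).append(value)
    for name, value in literals.items():
        fields.setdefault(name, []).append(value)

    # Step 2: lowernames merges names that differ only by case.
    if lowernames:
        merged = {}
        for name, values in fields.items():
            merged.setdefault(name.lower(), []).extend(values)
        fields = merged

    # Step 3: fmap renames whatever field names exist at this point.
    renamed = {}
    for name, values in fields.items():
        renamed.setdefault(fmap.get(name, name), []).extend(values)
    return renamed


# The scenario from this comment: the literal does not override FOO.
result = extract_fields({"FOO": "BAR"}, {"foo": "bar"})
```

In this model `result` ends up holding both values under `foo`, and passing `fmap={"foo": "tika_foo"}` merely renames the already-merged field.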

Adding "fmap.foo=tika_foo" does not help because fmap is applied even later; by 
that time, foo already contains both BAR and bar.

Adding "fmap.FOO=tika_foo" with "lowernames==false" would work (regardless of 
literalsOverride), but what if we need "lowernames==true", or the 
capitalization of FOO can vary?

Would it make sense to have an option to apply the rules in the order that they 
are specified in the config file or URL params rather than always in a static 
order?
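
As a rough illustration of what such an option could look like, here is a small Python sketch in which the rules are plain functions applied to the Tika fields in whatever order the caller lists them, with the literals added at the end. Everything here (the function names, the rule signature, and adding literals last) is a hypothetical design for discussion, not an existing Solr feature.

```python
def lowernames(tika, literals):
    # Lowercase Tika field names, merging names that differ only by case.
    out = {}
    for name, values in tika.items():
        out.setdefault(name.lower(), []).extend(values)
    return out


def literals_override(tika, literals):
    # Drop any Tika field whose current name matches a supplied literal.
    return {name: values for name, values in tika.items()
            if name not in literals}


def make_fmap(mapping):
    # Build a rule that renames Tika fields according to `mapping`.
    def fmap(tika, literals):
        out = {}
        for name, values in tika.items():
            out.setdefault(mapping.get(name, name), []).extend(values)
        return out
    return fmap


def apply_in_order(tika, literals, rules):
    # Apply the rules in the order given, then add the literal values.
    for rule in rules:
        tika = rule(tika, literals)
    result = {name: list(values) for name, values in tika.items()}
    for name, value in literals.items():
        result.setdefault(name, []).append(value)
    return result


tika = {"FOO": ["BAR"]}
literals = {"foo": "bar"}

# lowernames first, so literalsOverride sees "foo" and drops the Tika value.
wanted = apply_in_order(tika, literals, [lowernames, literals_override])

# The current static order, for comparison: both values end up in "foo".
current = apply_in_order(tika, literals, [literals_override, lowernames])
```

With lowernames listed first, the literal wins regardless of how Tika capitalizes the field name, which is the case the static order cannot express.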

Thanks.

> change order of field operations in SolrCell
> --------------------------------------------
>
>                 Key: SOLR-1634
>                 URL: https://issues.apache.org/jira/browse/SOLR-1634
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Hoss Man
>
> As noted on the mailing list, SolrCell evaluates fmap.* params AFTER 
> literal.* params.  This makes it impossible for users to map tika produced 
> fields to other names (possibly for the purpose of ignoring them completely) 
> and then using literal to provide explicit values for those fields.  At first 
> glance this seems like a bug, except that it is explicitly documented...
> http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations
> ...so i'm opening this as an "Improvement".   We should either consider 
> changing the order of operations, or find some other way to support what 
> seems like a very common use case...
> http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071



--
This message was sent by Atlassian JIRA
(v6.1#6144)
