[
https://issues.apache.org/jira/browse/SOLR-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795488#comment-13795488
]
Chaiyasit (Sit) Manovit commented on SOLR-1634:
-----------------------------------------------
Hello,
I am not sure if SOLR-1856 completely fixes this, particularly when
lowernames==true comes into the picture. Consider a case where:
1. Tika generated field "FOO=BAR" for the doc.
2. literalsOverride==true.
3. lowernames==true.
4. User supplied "literal.foo=bar".
According to the rules
(http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations),
literalsOverride is applied before lowernames and, thus, will have no effect
here since the field FOO from Tika and literal.foo are considered different
fields at this stage before lowernames==true kicks in. And when
lowernames==true kicks in, it has the effect of merging FOO into foo, giving it
both values BAR and bar.
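The scenario above can be reproduced with a small toy model. This is only a sketch of the documented ordering, not SolrCell's actual code: the function apply_static_order and its parameters are hypothetical names, and lowernames is modeled as plain case folding only.

```python
def apply_static_order(tika_fields, literals, literals_override=True, lowernames=True):
    """Toy model (hypothetical, not Solr code): literalsOverride first, lowernames after."""
    doc = {}

    # Literals are added under the names the user supplied.
    for name, value in literals.items():
        doc.setdefault(name, []).append(value)

    # literalsOverride drops a Tika field only when its name matches a literal
    # exactly, so "FOO" does not match "foo" at this stage.
    for name, value in tika_fields.items():
        if literals_override and name in literals:
            continue
        doc.setdefault(name, []).append(value)

    # lowernames runs afterwards and folds "FOO" into "foo", merging the values.
    if lowernames:
        merged = {}
        for name, values in doc.items():
            merged.setdefault(name.lower(), []).extend(values)
        doc = merged

    return doc


result = apply_static_order({"FOO": "BAR"}, {"foo": "bar"})
print(result)
```

In this model the single field foo ends up holding both the literal value and the Tika value, which is the behavior described above.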
Adding "fmap.foo=tika_foo" does not help because fmap is applied even later; by
that time, foo already contains both BAR and bar.
Adding "fmap.FOO=tika_foo" together with "lowernames==false" would work
(regardless of literalsOverride), but what if we need "lowernames==true", and
what if the capitalization of FOO can vary?
Would it make sense to have an option to apply the rules in the order that they
are specified in the config file or URL params rather than always in a static
order?
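To illustrate what a caller-specified order might look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the step functions, the run helper, and the idea of passing the order as a list are hypothetical and do not correspond to any existing SolrCell option.

```python
def literals_override(doc, literals):
    # Hypothetical step: drop extracted fields whose name exactly matches a literal.
    return {name: values for name, values in doc.items() if name not in literals}


def lowernames(doc, literals):
    # Hypothetical step: fold field names to lower case, merging values on collision.
    out = {}
    for name, values in doc.items():
        out.setdefault(name.lower(), []).extend(values)
    return out


def run(tika_fields, literals, order):
    """Apply the given steps to the extracted fields in order, then add the literals."""
    doc = {name: [value] for name, value in tika_fields.items()}
    for step in order:
        doc = step(doc, literals)
    for name, value in literals.items():
        doc.setdefault(name, []).append(value)
    return doc


tika = {"FOO": "BAR"}
user = {"foo": "bar"}

# Static order as documented: the override check sees "FOO", not "foo".
static = run(tika, user, [literals_override, lowernames])

# Order chosen by the caller: names are folded first, so the override applies.
reordered = run(tika, user, [lowernames, literals_override])

print(static)
print(reordered)
```

With the steps reversed, the Tika field is folded to foo before the override check, so only the user-supplied literal survives.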
Thanks.
> change order of field operations in SolrCell
> --------------------------------------------
>
> Key: SOLR-1634
> URL: https://issues.apache.org/jira/browse/SOLR-1634
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Hoss Man
>
> As noted on the mailing list, SolrCell evaluates fmap.* params AFTER
> literal.* params. This makes it impossible for users to map tika produced
> fields to other names (possibly for the purpose of ignoring them completely)
> and then using literal to provide explicit values for those fields. At first
> glance this seems like a bug, except that it is explicitly documented...
> http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations
> ...so i'm opening this as an "Improvement". We should either consider
> changing the order of operations, or find some other way to support what
> seems like a very common use case...
> http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071