[jira] [Commented] (SOLR-12593) Remove date parsing functionality from extraction contrib

David Smiley (JIRA) Tue, 11 Sep 2018 21:39:22 -0700


    [ https://issues.apache.org/jira/browse/SOLR-12593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16611590#comment-16611590 ]


David Smiley commented on SOLR-12593:
-------------------------------------

So I took the PR, further changed the ref guide page on this, and then changed 
the default config slightly as well. My changes grew in scope to include 
miscellaneous things I didn't like in the guide for this feature, but I hope 
other committers are happy with it. FWIW I held back from doing more :) 
[~arafalov] I know you tend to the configs, so I'm hoping you can review this 
(or anyone, of course).
 * Revamped the "Key Solr Cell Concepts" section.
 * Switched the examples / "trying out" instructions from using the 
"techproducts" example config to using our default config (via -e 
"schemaless"). Why? Firstly, I observed that the techproducts config didn't 
have the URPs I wanted. Fixable, yes, but... Secondly, I think it simply 
doesn't make sense for the "techproducts" config, by virtue of its name, to 
have things other than... you know... _tech products_.
 ** The default configset's schema oddly does not include an "ignored" field 
type and "ignored_*" dynamic field. I added them. These are useful, especially 
with Solr Cell.
 ** Minutia: removed the metadata name mapping of the metadata field "meta" to 
"ignored_" from the default parameters of the default configset's 
/update/extract request handler. I don't see the point of this, and FWIW it's 
not in the techproducts config either. Let's keep this config more minimal.
 ** The default configset is schemaless, so the "try tika" instructions were 
modified to reflect the fact that all of the metadata is now added 
automatically, whereas previously only the fields that happened to be in the 
techproducts schema were added. This is good, but there is an awkward part in 
the last step of the demo: if you want to _not_ map the metadata, you have to 
wipe the core and start over.
 * Added a tip on URPs (Update Request Processors), with an example showing 
how to specify these processors.
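For illustration only, here is a minimal Python sketch of how a client might pass per-request Solr Cell parameters to /update/extract, including a runtime list of URPs via the `processor` request parameter. Assumptions (not taken from the guide changes above): Solr runs at http://localhost:8983, the collection is the "gettingstarted" one created by `-e schemaless`, and the configset registers an update processor named "parse-date". The helper `build_extract_url` is hypothetical; it only builds the URL and sends nothing.

```python
from urllib.parse import urlencode


def build_extract_url(base, collection, doc_id, processors=(), fmap=None,
                      uprefix=None, commit=True):
    """Build a Solr Cell /update/extract URL (hypothetical helper, no I/O)."""
    # literal.<field> sets a field value directly on the extracted document.
    params = [("literal.id", doc_id)]
    # fmap.<source>=<target> renames an extracted field, e.g. meta -> ignored_.
    for source, target in (fmap or {}).items():
        params.append((f"fmap.{source}", target))
    # uprefix prefixes fields not defined in the schema (e.g. "ignored_").
    if uprefix:
        params.append(("uprefix", uprefix))
    # processor=<names> runs the named update processors for this request only;
    # the names are assumed to be registered in the configset.
    if processors:
        params.append(("processor", ",".join(processors)))
    if commit:
        params.append(("commit", "true"))
    return f"{base}/solr/{collection}/update/extract?{urlencode(params)}"


url = build_extract_url("http://localhost:8983", "gettingstarted", "doc1",
                        processors=["parse-date"], fmap={"meta": "ignored_"})
print(url)
```

Passing `fmap={"meta": "ignored_"}` per request is one way to get back the mapping that was dropped from the handler defaults, without baking it into the config.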

> Remove date parsing functionality from extraction contrib
> ---------------------------------------------------------
>
>                 Key: SOLR-12593
>                 URL: https://issues.apache.org/jira/browse/SOLR-12593
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>             Fix For: master (8.0)
>
>         Attachments: SOLR-12593.patch
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The date parsing functionality in the extraction contrib is obsoleted by 
> equivalent functionality in ParseDateFieldUpdateProcessorFactory.  It should 
> be removed.  We should add documentation within this part of the ref guide on 
> how to accomplish the same (and test it).
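As a rough illustration of the idea behind ParseDateFieldUpdateProcessorFactory, here is a minimal Python sketch: try a list of date formats in order, normalize the first match to a UTC ISO-8601 string, and leave unparseable values unchanged. This is not Solr's implementation; the function name `parse_date_field` and the format list are hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timezone

# Hypothetical format list; a real configset defines its own patterns.
DEFAULT_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_date_field(value, formats=DEFAULT_FORMATS):
    """Return value as a UTC ISO-8601 string, or unchanged if no format fits."""
    for fmt in formats:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are taken to be UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return value  # not a recognizable date: leave the field value alone


print(parse_date_field("2018-09-11T21:39:22-0700"))
```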



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
