[jira] [Commented] (SOLR-13593) Allow to specify analyzer components by their SPI names in schema definition

Tomoko Uchida (JIRA) Thu, 04 Jul 2019 09:35:16 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878792#comment-16878792
 ]


Tomoko Uchida commented on SOLR-13593:
--------------------------------------

I updated the pull request.
{quote}I am not so happy about the "spi" name; I'd prefer "name". What exactly is 
the problem with using "name"? The Solr plugin stuff should not be 
affected by this.
{quote}
+1. The PR now uses "name" to specify SPI names (as in my first proposal):
{code:xml}
<!-- managed-schema file -->
<fieldType name="text_fa_spi" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- for ZWNJ -->
    <charFilter name="persian"/>
    <tokenizer name="standard"/>
    <filter name="lowercase"/>
    <filter name="arabicNormalization"/>
    <filter name="persianNormalization"/>
    <filter name="stop" ignoreCase="true" words="lang/stopwords_fa.txt" />
  </analyzer>
</fieldType>
{code}
{code:bash}
# REST API
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
     "name":"myNewTxtField",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer" : {
        "charFilters":[{
           "name":"htmlStrip"
        }],
        "tokenizer":{
           "name":"whitespace" },
        "filters":[{
           "name":"lowercase"
        }]}}
}' http://localhost:8983/solr/techproducts/schema
{code}
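
For illustration only, here is a rough sketch of how a "name" value such as "whitespace" might be resolved to a factory class by reading each factory's static "NAME" field. The nested factory classes and the {{index}} helper are hypothetical stand-ins; this is not the lookup code in Lucene, Solr, or the PR.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SpiNameLookup {
  // Hypothetical stand-ins for analysis factories; the real ones live in Lucene.
  public static class WhitespaceTokenizerFactory {
    public static final String NAME = "whitespace";
  }

  public static class PorterStemFilterFactory {
    public static final String NAME = "porterStem";
  }

  /** Builds a map from SPI name to class by reading each class's static NAME field. */
  static Map<String, Class<?>> index(List<Class<?>> factories) {
    Map<String, Class<?>> byName = new HashMap<>();
    for (Class<?> clazz : factories) {
      try {
        Field nameField = clazz.getField("NAME");
        byName.put((String) nameField.get(null), clazz);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("No usable NAME field on " + clazz.getName(), e);
      }
    }
    return byName;
  }

  public static void main(String[] args) {
    Map<String, Class<?>> byName =
        index(List.of(WhitespaceTokenizerFactory.class, PorterStemFilterFactory.class));
    // Resolve the value of name="whitespace" to the hypothetical factory class.
    System.out.println(byName.get("whitespace").getSimpleName());
  }
}
```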

--
{quote}Another suggestion, not sure if it's already implemented: When 
persisting a managed schema after modification, it should use the provider 
names only and no longer persist class names.
{quote}
I had not noticed that.
It seems that Solr persists the factory's original properties as-is, including 
its class name ("class"). So I changed the property handling logic in 
{{o.a.s.schema.FieldType}} to discard the "class" property when an SPI name is 
given and to preserve "name" in the original properties instead, which keeps 
the managed-schema consistent.
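
As a minimal sketch of the rule described above (not the actual {{FieldType}} code), the persisted properties could be derived as follows. The class and method names here are hypothetical and only illustrate the idea: when "name" is present, "class" is dropped; otherwise the properties are kept unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PersistedAnalyzerProps {
  /**
   * Hypothetical helper: returns the properties to persist for one analyzer
   * component. If an SPI name ("name") was given, the "class" entry is
   * discarded; otherwise the original properties are kept as-is.
   */
  static Map<String, String> toPersistedProps(Map<String, String> original) {
    // Copy so the caller's map is not modified; LinkedHashMap keeps attribute order.
    Map<String, String> persisted = new LinkedHashMap<>(original);
    if (persisted.containsKey("name")) {
      persisted.remove("class");
    }
    return persisted;
  }

  public static void main(String[] args) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("name", "stop");
    props.put("class", "solr.StopFilterFactory");
    props.put("ignoreCase", "true");
    props.put("words", "lang/stopwords_fa.txt");
    System.out.println(toPersistedProps(props));
  }
}
```

A component defined only by "class" passes through unchanged, so existing schemas would keep their class names under this sketch.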

> Allow to specify analyzer components by their SPI names in schema definition
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-13593
>                 URL: https://issues.apache.org/jira/browse/SOLR-13593
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>            Reporter: Tomoko Uchida
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Each analysis factory now has an explicitly documented SPI name, which is 
> stored in the static "NAME" field (LUCENE-8778).
> Solr uses the factories' simple class names in the schema definition (like 
> class="solr.WhitespaceTokenizerFactory"), but we should also be able to use 
> the more concise SPI names (like name="whitespace").
> e.g.:
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
>     <filter class="solr.PorterStemFilterFactory" />
>   </analyzer>
> </fieldtype>
> {code}
> would be
> {code:xml}
> <fieldtype name="myfieldtype" class="solr.TextField">
>   <analyzer>
>     <tokenizer name="whitespace"/>
>     <filter name="keywordMarker" protected="protwords.txt" />
>     <filter name="porterStem" />
>   </analyzer>
> </fieldtype>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

