[jira] [Commented] (SOLR-2338) improved per-field similarity integration into schema.xml

Robert Muir (JIRA) Tue, 29 Mar 2011 18:38:49 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012809#comment-13012809
 ]


Robert Muir commented on SOLR-2338:
-----------------------------------

{quote}
i don't personally think it would be confusing, but i also don't think we need 
to advertise it in the example.

we should definitely encourage using similarity per field type, but for people 
who have used it in the past, having it continue to work as a "global default" 
when fieldTypes don't define a similarity gives us nice back-compatibility.
{quote}

I agree: this is a good compromise, and since we won't advertise it in the 
example, I have no concerns about the example being confusing.

{quote}
More generally though, i'm thinking that the same <similarity/> tag can be used 
for both the old style (global default) Similarity/SimilarityFactory and the 
new SimilarityProviderFactory using instanceof checks...
{quote}

I have to disagree on this one. The new SimilarityProvider serves a totally 
different purpose. It's not a global sim: it answers requests for sims for 
specific fields. The only reason I provided a factory for it is so that users 
can tune the parts of Lucene's relevance ranking system that are not per-field: 
coord() and queryNorm(). But it's not a way to configure tf() or idf() or 
anything like that. In the patch I added this to the example, marked as 
"expert", though we could remove it from the example entirely if it's too 
expert (it might be).

So I think we should do as you suggest and allow a global <similarity/> that is 
the default term weighting unless otherwise specified by a field, but we 
shouldn't confuse this with the parts that aren't field-specific...
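
To make that split concrete, here is a minimal sketch in plain Java. It is not 
the patch and uses no Lucene or Solr classes: FieldSimilarity, Provider, 
register() and the field names are hypothetical stand-ins, and the coord() 
formula is only assumed for illustration. It shows a per-field lookup that 
falls back to a global default, kept separate from a scoring factor that is 
not field-specific.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: these are hypothetical stand-ins, not Lucene or Solr types.
class PerFieldSimilaritySketch {

    /** Stand-in for per-field term weighting (tf(), idf(), norms). */
    interface FieldSimilarity {
        String name();
    }

    /** Stand-in for a provider: hands out a similarity per field and owns
     *  the scoring parts that are not per-field. */
    static class Provider {
        private final FieldSimilarity globalDefault;
        private final Map<String, FieldSimilarity> byField = new HashMap<>();

        Provider(FieldSimilarity globalDefault) {
            this.globalDefault = globalDefault;
        }

        /** Records the similarity chosen for one field. */
        void register(String field, FieldSimilarity sim) {
            byField.put(field, sim);
        }

        /** Per-field lookup; falls back to the global default. */
        FieldSimilarity get(String field) {
            return byField.getOrDefault(field, globalDefault);
        }

        /** Not per-field: fraction of query clauses matched (assumed formula). */
        float coord(int overlap, int maxOverlap) {
            return overlap / (float) maxOverlap;
        }
    }

    public static void main(String[] args) {
        Provider provider = new Provider(() -> "global-default");
        provider.register("title", () -> "short-text");

        // "title" has its own similarity; "body" uses the global default.
        System.out.println(provider.get("title").name());
        System.out.println(provider.get("body").name());
        System.out.println(provider.coord(2, 4));
    }
}
```

In this sketch a field with no registered similarity silently gets the global 
default, which mirrors the back-compatible behavior discussed above, while 
coord() lives on the provider and cannot be overridden per field.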

{quote}
The one other thing i just noticed is that you have 
SimilarityProviderFactory.init(SolrParams) 
{quote}

I configured it this way because this is how <similarity/> worked before (and 
it was just enough XML to not scare me away). Can we defer this improvement to 
a later issue? I think we should give this a little more thought. For example, 
if we do this, it's an API break for <similarityFactory/>, which this patch 
does not introduce: it only MOVES it to the fieldType.


> improved per-field similarity integration into schema.xml
> ---------------------------------------------------------
>
>                 Key: SOLR-2338
>                 URL: https://issues.apache.org/jira/browse/SOLR-2338
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>         Attachments: SOLR-2338.patch, SOLR-2338.patch
>
>
> Currently since LUCENE-2236, we can enable Similarity per-field, but in 
> schema.xml there is only a 'global' factory
> for the SimilarityProvider.
> In my opinion this is too low-level because to customize Similarity on a 
> per-field basis, you have to set your own
> CustomSimilarityProvider with <similarity class=.../> and manage the 
> per-field mapping yourself in java code.
> Instead I think it would be better if you just specify the Similarity in the 
> FieldType, like after <analyzer>.
> As far as the example, one idea from LUCENE-1360 was to make a "short_text" 
> or "metadata_text" used by the
> various metadata fields in the example that has better norm quantization for 
> its shortness...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
