[jira] [Updated] (SOLR-6248) MoreLikeThis Query Parser

Steve Rowe (JIRA) Tue, 15 Jul 2014 10:39:20 -0700

     [ https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Steve Rowe updated SOLR-6248:
-----------------------------

    Description: 
The MLT Component doesn't let people highlight or paginate, and the MLT Handler 
comes at the cost of maintaining another piece in the config. Also, any changes 
to the defaults of the /select handler (number of results to be fetched etc.) 
need to be copied/synced to this handler too.

Having an MLT QParser would let users get back docs as an ordinary query result 
that they can paginate, highlight etc. It would also give them the flexibility 
to use it anywhere a query is accepted, i.e. q, fq, bq etc.
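
As a rough illustration of "use it anywhere", the sketch below builds /select 
request parameters that place an MLT clause in q, fq or bq. The 
`{!mlt qf=...}` local-params form, the field name `title` and the document id 
`doc42` are assumptions made for illustration; this issue does not fix a 
syntax. Since bq is only honoured by the (e)dismax parsers, the sketch switches 
defType in that case.

```python
from urllib.parse import urlencode


def mlt_clause(doc_id, field):
    # Hypothetical local-params form for the proposed parser.
    return "{!mlt qf=%s}%s" % (field, doc_id)


def select_params(doc_id, field, where="q", rows=10, start=0, highlight=False):
    """Build /select parameters with the MLT clause in q, fq or bq."""
    if where not in ("q", "fq", "bq"):
        raise ValueError("where must be one of q, fq, bq")
    # Pagination comes from the normal request parameters.
    params = {"start": start, "rows": rows}
    clause = mlt_clause(doc_id, field)
    if where == "q":
        params["q"] = clause
    else:
        # The MLT clause only filters (fq) or boosts (bq) a match-all query.
        params["q"] = "*:*"
        params[where] = clause
        if where == "bq":
            # bq is only applied by the dismax/edismax query parsers.
            params["defType"] = "edismax"
    if highlight:
        params["hl"] = "true"
        params["hl.fl"] = field
    return params


def query_string(params):
    """URL-encode the parameters for use in a request."""
    return urlencode(params)
```

For example, `select_params("doc42", "title", where="fq", rows=5, start=10)` 
yields a paginated match-all query filtered by the MLT clause, with no separate 
handler to configure.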

A bit of history about MLT (thanks to Hoss):

The MLT Handler pre-dates the existence of QParsers and was meant to take an 
arbitrary query as input, find docs that match that query, club them together 
to find interesting terms, and then use those terms as if they were the main 
query to generate a main result set.

This result would then be used as the set to facet, highlight etc.

The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)

The MLT Component, on the other hand, serves a very different purpose: 
augmenting the main result set. It is used to get similar docs for each doc in 
the main result set.

DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
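
The two flows above can be contrasted with a toy in-memory sketch. Nothing here 
is Solr code: the corpus, the term-overlap scoring and the "most frequent 
terms" heuristic are all stand-ins chosen for illustration, and every function 
name is hypothetical.

```python
from collections import Counter

# Toy corpus standing in for an index: doc id -> text.
DOCS = {
    1: "solr search engine query parser",
    2: "solr query parser highlight",
    3: "lucene search engine index",
    4: "cooking pasta recipe",
}


def search(terms, exclude=(), limit=10):
    """Rank docs by how many of the given terms they contain."""
    scored = []
    for doc_id, text in DOCS.items():
        if doc_id in exclude:
            continue
        score = len(set(terms) & set(text.split()))
        if score:
            scored.append((score, doc_id))
    # Highest score first, ties broken by doc id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def interesting_terms(doc_ids, max_terms=3):
    """Pool the docs into one bag and keep the most frequent terms."""
    bag = Counter()
    for doc_id in doc_ids:
        bag.update(DOCS[doc_id].split())
    return [term for term, _ in bag.most_common(max_terms)]


def handler_flow(query_terms, m=2):
    """Query -> DocList(m) -> Bag (terms) -> Query -> DocList(y)."""
    seed = search(query_terms, limit=m)
    return search(interesting_terms(seed))


def component_flow(query_terms, m=2):
    """DocSet(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)."""
    results = search(query_terms)
    return {
        doc_id: search(interesting_terms([doc_id]), exclude={doc_id}, limit=m)
        for doc_id in results
    }
```

The handler flow returns one list that replaces the main result set, while the 
component flow returns one short list per doc alongside the main result set.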

The new approach:

All of this can be done more cleanly (and more sensibly) using an MLT QParser.

An important case to handle is when the user doesn't have TermVectors enabled; 
the parser then does what happens right now, i.e. parses the stored fields.

Also, in case the field to be used for MLT isn't indexed, it would need to be a 
TextField with an index analyzer defined. This analyzer will then be used to 
extract terms for MLT.

In SolrCloud mode, '/get-termvectors' can be used after checking the schema (if 
TermVectors are enabled for the field). If not, a /get call can be used to 
fetch the field and parse it.
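
The fallback order described above might be sketched as follows. `FieldInfo` 
and `term_source` are hypothetical names, the field properties are a 
simplified stand-in for the schema, and the endpoint names are simply the ones 
used in this description. Requiring a TextField with an index analyzer on the 
stored-field path is one reading of the text, not a confirmed design.

```python
from dataclasses import dataclass


@dataclass
class FieldInfo:
    # Simplified stand-in for the schema properties of one field.
    name: str
    term_vectors: bool = False
    stored: bool = False
    is_text_field: bool = False
    has_index_analyzer: bool = False


def term_source(field, cloud=False):
    """Return (where the data comes from, how terms are obtained)."""
    if field.term_vectors:
        # Preferred path: terms are already available per document.
        fetch = "/get-termvectors" if cloud else "local term vectors"
        return fetch, "read terms directly"
    if field.stored and field.is_text_field and field.has_index_analyzer:
        # Fallback: fetch the stored value and analyze it again.
        fetch = "/get" if cloud else "local stored field"
        return fetch, "run index analyzer over stored value"
    raise ValueError(
        "field %r has neither term vectors nor an analyzable stored value"
        % field.name
    )
```

Checking the schema first keeps the decision in one place, so the same logic 
can serve both the single-node and the SolrCloud case.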


> MoreLikeThis Query Parser
> -------------------------
>
>                 Key: SOLR-6248
>                 URL: https://issues.apache.org/jira/browse/SOLR-6248
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Anshum Gupta



--
This message was sent by Atlassian JIRA
(v6.2#6252)