[
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941694#comment-15941694
]
Diego Ceccarelli commented on SOLR-10359:
-----------------------------------------
Thanks for opening this issue. I like the idea and would be happy to help.
@[~arafalov]
{quote}
The second one seems to be happening well out of Solr control (UI clicks, what
user selected, etc). I am not sure if that fits into Solr itself. Commercial
platforms (such as Fusion) might be integrating it, but they control more of a
stack.
{quote}
Solr could expose an API (e.g. {{addUserInteraction}}) that could be called by
the UI when the user interacts with the results.
I like the idea of {{storeDir}} in the configuration: it would also make it
possible to import/export the collection if it ever needs to be reindexed.
Random thoughts/questions:
* How to create a unique search id? (Should it be the responsibility of Solr? I
think yes.)
* If I want to use a metric like the {{CTR}} (i.e., Click Through Rate,
{{number of clicks / number of impressions}}) in the scoring formula, how can I
do that without joining the two collections? (Maybe there could be a way to
'import' a particular metric into the main collection?)
* How could this work in case of multiple shards?
* It should be easy to implement complex metrics that are computed from
simple metrics. Some examples: *1.* the click through rate: for a document, or
a document and a particular query, collect the number of clicks and divide by
the number of impressions (ignoring multiple requests from the same user?);
*2.* time spent on a document after a query: if I log the time of the click and
the time of closure of a document, I want to compute how much time the users
spent on the document; *3.* number of clicks per query.
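To make the three example metrics concrete, here is a rough sketch of how they could be derived from simple logged events. It is not Solr code: the event dictionaries, the function names and the {{dedupe_users}} flag are all assumptions made for illustration. It only reuses the convention from the issue that a {{relevancy_rating}} of 0 marks an impression.

```python
from collections import Counter

IMPRESSION = 0  # in the issue's data model, relevancy_rating 0 marks an impression


def ctr_per_document(events, dedupe_users=False):
    """Return {result_id: clicks / impressions}.

    With dedupe_users=True, repeated events of the same kind (impression or
    click) from the same user on the same document are counted once.
    """
    seen = set()
    clicks, impressions = Counter(), Counter()
    for e in events:
        is_click = e["relevancy_rating"] != IMPRESSION
        key = (e["user_id"], e["result_id"], is_click)
        if dedupe_users:
            if key in seen:
                continue
            seen.add(key)
        (clicks if is_click else impressions)[e["result_id"]] += 1
    # Only documents with at least one impression get a CTR.
    return {doc: clicks[doc] / n for doc, n in impressions.items() if n > 0}


def clicks_per_query(events):
    """Return {query: number of non-impression events logged for that query}."""
    counts = Counter()
    for e in events:
        if e["relevancy_rating"] != IMPRESSION:
            counts[e["query"]] += 1
    return dict(counts)


def mean_dwell_seconds(visits):
    """visits: iterable of (clicked_at, closed_at) timestamps in seconds.

    Pairs where the closure precedes the click are ignored.
    """
    durations = [closed - clicked for clicked, closed in visits if closed >= clicked]
    return sum(durations) / len(durations) if durations else 0.0
```

Restricting the CTR to a document and a particular query would only need the query added to the counting key.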
With respect to the data model, I would add:
* a {{user-id}}
* a blob containing an optional payload
* score of the document
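As a sketch of what the extended record could look like, the class below combines the fields from the issue's data model with the three additions above. The class name, the Python types and the {{search_id}} field (a random UUID, one possible answer to the unique search id question) are assumptions, not an existing Solr structure.

```python
import time
import uuid
from dataclasses import dataclass, field, asdict
from typing import Optional


def new_search_id() -> str:
    """One possible server-side search id: a random UUID (an assumption)."""
    return uuid.uuid4().hex


@dataclass
class UserEvent:
    # Fields taken from the issue's data model
    query: str
    result_id: str
    result_position: int
    relevancy_rating: int  # 0 = impression, 1-5 = domain-specific user event
    test_group: Optional[str] = None
    timestamp: float = field(default_factory=time.time)
    # Fields proposed in this comment
    user_id: Optional[str] = None
    payload: Optional[bytes] = None  # optional opaque blob
    score: Optional[float] = None  # score of the document when it was returned
    # Hypothetical: links an event to the search that produced the impression
    search_id: str = field(default_factory=new_search_id)


# Example: an impression for the top result of the query "solr"
impression = UserEvent("solr", "doc1", 1, 0, user_id="u1", score=3.2)
record = asdict(impression)  # plain dict, ready to serialise
```

A click on that result would reuse the {{search_id}} of the impression rather than generate a new one, so the two records can be tied together.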
> User Events Logger Component
> ----------------------------
>
> Key: SOLR-10359
> URL: https://issues.apache.org/jira/browse/SOLR-10359
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Alessandro Benedetti
> Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and
> more important day by day.
> This issue is to put a milestone to integrate online evaluation metrics with
> Solr.
> *Scope*
> Scope of this issue is to provide a set of components able to :
> 1) Collect Search Results impressions ( results shown per query)
> 2) Collect Users events (user interactions on the search results per query,
> e.g. clicks, bookmarking, etc.)
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed :
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the data collected will be
> stored.
> Different data structures can be explored, to keep it simple, a first
> implementation can be a Lucene Index.
> *Data Model*
> The user event can be modelled in the following way :
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1-5 to identify the
> type of user event, the semantic will depend on the domain and use cases
> <test_group> - this can identify a variant, in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it
> processes a request and returns a result set to the user for a query, the
> component will collect the impressions (results returned) and index them in
> the auxiliary Lucene index.
> This will happen in parallel as soon as you return the results to avoid
> affecting the query time.
> Of course an impact on CPU load and memory is expected; it will be interesting
> to minimise it.
> *User Events Logging*
> An UpdateHandler will be exposed to accept POST requests and collect user
> events.
> Every time a request is sent, the user event will be indexed in the underlying
> auxiliary Lucene index.
> *Stats Calculation*
> A RequestHandler will be exposed to be able to calculate stats and
> aggregations for the metrics :
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for our testA and testB to compare.
> Showing stats in total and per query ( to highlight the queries with
> lower/higher CTR).
> The calculations will happen separating the <test_group> for an easy
> comparison.
> It will be important to keep it as simple as possible for a first version, and
> then extend it as much as we like.