[jira] [Commented] (SOLR-10359) User Events Logger Component

Diego Ceccarelli (JIRA) Sat, 25 Mar 2017 04:23:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941694#comment-15941694
 ]


Diego Ceccarelli commented on SOLR-10359:
-----------------------------------------

Thanks for opening this item. I like the idea and would be happy to help. 

@[~arafalov]
{quote}
The second one seems to be happening well out of Solr control (UI clicks, what 
user selected, etc). I am not sure if that fits into Solr itself. Commercial 
platforms (such as Fusion) might be integrating it, but they control more of a 
stack.
{quote}

Solr could expose an API (e.g. {{addUserInteraction}}) that could be called by 
the UI when the user interacts with the results. 
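
To make the suggestion concrete, here is a minimal sketch of what a UI-side call might look like. Everything in it is an assumption for illustration: the {{/addUserInteraction}} path, the JSON field names ({{search_id}}, {{event_type}}, ...) and the base URL are hypothetical, and no such endpoint exists in Solr today. The request is only built, not sent.

```python
import json
import time
import urllib.request


def build_interaction_request(base_url, search_id, result_id, position, event_type):
    """Build (but do not send) a POST for a hypothetical addUserInteraction API."""
    event = {
        "search_id": search_id,        # id of the search that produced the result
        "result_id": result_id,        # document the user interacted with
        "result_position": position,   # rank of the document in the result list
        "event_type": event_type,      # e.g. "click" or "bookmark" (assumed values)
        "timestamp": int(time.time() * 1000),  # epoch milliseconds
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/addUserInteraction",  # hypothetical path
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_interaction_request(
    "http://localhost:8983/solr/events", "s-42", "doc-7", 3, "click"
)
# urllib.request.urlopen(req) would send it, once such an endpoint exists.
```

The UI would only need to keep the search id it received with the results and echo it back with each interaction.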

I like the idea of {{storeDir}} in the configuration; it would also allow 
importing/exporting the collection if there is ever a need to reindex 
it. 

Random thoughts/questions:
  * How do we create a unique search id? (Should that be the responsibility of 
Solr? I think yes.)
  * If I want to use a metric like the {{CTR}} (i.e., Click Through Rate, 
{{number of clicks / number of impressions}}) in the scoring formula, how can I 
do that without joining the two collections? (Maybe there could be a way to 
'import' a particular metric into the main collection?)
  * How could this work in the case of multiple shards? 
  * It should be easy to implement complex metrics that are computed from 
simple metrics. Some examples: *1.* the click through rate: for a document, or 
a document and a particular query, collect the number of clicks and divide by 
the number of impressions (ignoring multiple requests from the same user?); 
*2.* time spent on a document after a query: if I log the time of the click and 
the time of closure of a document, I want to compute how much time the users 
spent on the document; *3.* number of clicks per query.
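
As a rough illustration of the three example metrics above, here is a small sketch that derives them from a flat list of logged events. The event shape (dict keys {{type}}, {{user}}, {{query}}, {{doc}}, {{ts}}) and the {{close}} event type are assumptions made for this example, not part of the proposal, and counting each (user, query) pair once is just one possible reading of "ignoring multiple requests from the same user".

```python
from collections import defaultdict


def click_through_rate(events, doc_id, query=None):
    """CTR for a document, optionally restricted to one query.

    Repeated impressions or clicks by the same user for the same query
    are counted once.
    """
    impressions, clicks = set(), set()
    for e in events:
        if e["doc"] != doc_id or (query is not None and e["query"] != query):
            continue
        key = (e["user"], e["query"])
        if e["type"] == "impression":
            impressions.add(key)
        elif e["type"] == "click":
            clicks.add(key)
    return len(clicks) / len(impressions) if impressions else 0.0


def time_on_document(events, doc_id):
    """Total time between each click on a document and its matching close."""
    opened, total = {}, 0
    for e in sorted(events, key=lambda ev: ev["ts"]):
        if e["doc"] != doc_id:
            continue
        key = (e["user"], e["query"])
        if e["type"] == "click":
            opened[key] = e["ts"]
        elif e["type"] == "close" and key in opened:
            total += e["ts"] - opened.pop(key)
    return total


def clicks_per_query(events):
    """Number of click events for each query."""
    counts = defaultdict(int)
    for e in events:
        if e["type"] == "click":
            counts[e["query"]] += 1
    return dict(counts)


events = [
    {"type": "impression", "user": "u1", "query": "solr", "doc": "d1", "ts": 0},
    {"type": "impression", "user": "u1", "query": "solr", "doc": "d1", "ts": 5},
    {"type": "impression", "user": "u2", "query": "solr", "doc": "d1", "ts": 6},
    {"type": "click", "user": "u1", "query": "solr", "doc": "d1", "ts": 10},
    {"type": "close", "user": "u1", "query": "solr", "doc": "d1", "ts": 40},
]

ctr = click_through_rate(events, "d1")   # 1 unique click / 2 unique impressions
dwell = time_on_document(events, "d1")   # 40 - 10
per_query = clicks_per_query(events)
```

Each metric is a small function over the same simple events, which is the property I would like the component to preserve.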
   

With respect to the data model, I would add: 
    * a {{user-id}}
    * a blob containing an optional payload 
    * score of the document
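
Put together with the fields from the issue description, the extended event record might look roughly like the sketch below. The Python types, the millisecond timestamp and the field names {{user_id}}, {{score}} and {{payload}} are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class UserEvent:
    # Fields taken from the data model in the issue description
    query: str
    result_id: str
    result_position: int
    timestamp: int           # epoch milliseconds (an assumption)
    relevancy_rating: int    # 0 for impressions, 1-5 for user events
    test_group: str
    # Additions suggested in this comment (names are hypothetical)
    user_id: str
    score: float             # score of the document for this query
    payload: Optional[bytes] = None   # optional opaque blob


impression = UserEvent(
    query="solr",
    result_id="doc-7",
    result_position=3,
    timestamp=1490440000000,
    relevancy_rating=0,
    test_group="testA",
    user_id="u1",
    score=12.5,
)
record = asdict(impression)  # plain dict, ready to be turned into a document
```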




> User Events Logger Component
> ----------------------------
>
>                 Key: SOLR-10359
>                 URL: https://issues.apache.org/jira/browse/SOLR-10359
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Alessandro Benedetti
>              Labels: CTR, evaluation
>
> *Introduction*
> Being able to evaluate the quality of your search engine is becoming more and 
> more important day by day.
> This issue is to put a milestone to integrate online evaluation metrics with 
> Solr.
> *Scope*
> Scope of this issue is to provide a set of components able to :
> 1) Collect Search Results impressions ( results shown per query)
> 2) Collect Users events ( user interactions on the search results per query 
> e.g. clicks, bookmarking, etc. )
> 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ...
> *Technical Design*
> A SearchComponent can be designed :
> *UsersEventsLoggerComponent*
> A property (such as storeDir) will define where the data collected will be 
> stored.
> Different data structures can be explored, to keep it simple, a first 
> implementation can be a Lucene Index.
> *Data Model*
> The user event can be modelled in the following way :
> <query> - the user query the event is related to
> <result_id> - the ID of the search result involved in the interaction
> <result_position> - the position in the ranking of the search result involved 
> in the interaction
> <timestamp> - time when the interaction happened
> <relevancy_rating> - 0 for impressions, a value between 1-5 to identify the 
> type of user event, the semantic will depend on the domain and use cases
> <test_group> - this can identify a variant, in A/B testing
> *Impressions Logging*
> When the SearchComponent is assigned to a request handler, every time it 
> processes a request and returns to the user a result set for a query, the 
> component will collect the impressions ( results returned) and index them in 
> the auxiliary Lucene index.
> This will happen in parallel as soon as you return the results to avoid 
> affecting the query time.
> Of course an impact on CPU load and memory is expected, will be interesting 
> to minimise it.
> * User Events Logging *
> An UpdateHandler will be exposed to accept POST requests and collect user 
> events.
> Every time a request is sent, the user event will be indexed in the underlying 
> auxiliary Lucene Index.
> * Stats Calculation *
> A RequestHandler will be exposed to be able to calculate stats and 
> aggregations for the metrics :
> /evaluation?metric=ctr&stats=query&compare=testA,testB
> This request could calculate the CTR for our testA and testB to compare.
> Showing stats in total and per query ( to highlight the queries with 
> lower/higher CTR).
> The calculations will happen separating the <test_group> for an easy 
> comparison.
> Will be important to keep it as simple as possible for a first version, to 
> then extend it as much as we like



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
