Re: Omit feature names in Learning to Rank logging

Alessandro Benedetti Fri, 06 Oct 2023 09:08:29 -0700

Hi Doug,
we have been working on this area for a while now, and our next
contribution in line is to work on the feature vector cache (so that it
caches just arrays of floats, i.e. the feature vector with no labels).

What we had to do first was to contribute a better way to handle
sparse/dense and null values (
https://issues.apache.org/jira/browse/SOLR-16759,
https://issues.apache.org/jira/browse/SOLR-16596).
Then, we have a contribution almost ready to improve the feature vector
caching (which is currently used only for logging and not reranking).
Finally, we want to split the cache (
https://issues.apache.org/jira/browse/SOLR-10448).

Your need is in line with our queued contributions, but unfortunately we
have had to pause that work for the moment because of a general lack of
sponsorship/funding.

We hope to resume the contributions soon, but in the meantime, if it helps,
we can share the draft pull request.

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io

*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>

On Wed, 4 Oct 2023 at 16:40, Doug Turnbull
<douglas.turnb...@reddit.com.invalid> wrote:

> When performing feature extraction, it can be advantageous to log many
> features in one request. As expected, with a large number of requests, the
> number of features computed times the number of results dominates the
> server-side performance. But then for a large number of results (100s or
> 1000s) 10s or 100s of ms can be spent on receiving the data. Every byte
> counts.
>
> If I'm just getting a doc id and the [features] back, I believe the order
> in the [features] value corresponds to the order I stored the features (as
> that's a flat list). Is there an option instead of
>
> title=12.34,body=5.12,recency=251.1
>
> To just get this back without the labels?
>
> 12.34,5.12,251.1
>
> I noticed dense vs sparse as an option here
> https://solr.apache.org/guide/solr/latest/query-guide/learning-to-rank.html
>
> This helps shave a tad bit of time, but it seems we could do a bit better
> eliminating the feature labels?
>
> Is there some way to do this I'm missing?
>
> Thanks!
> -Doug
>
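
To make the two formats in the quoted question concrete, here is a minimal
client-side sketch. It is not part of Solr and does not reduce the bytes sent
over the wire; it only shows how the labelled string maps to the positional
one. It assumes the logged value uses "," between features and "=" between
name and value, as in the example above, and that the dense format is in use
so every feature is present in store order. The function names split_features
and strip_labels are hypothetical.

```python
from typing import List, Tuple


def split_features(logged: str, pair_sep: str = ",",
                   kv_sep: str = "=") -> Tuple[List[str], List[float]]:
    """Split 'name=value,...' into parallel lists of names and float values.

    Assumes every pair contains kv_sep and that no feature name contains
    pair_sep (an assumption about the logged format, not a Solr guarantee).
    """
    names: List[str] = []
    values: List[float] = []
    if not logged:
        return names, values
    for pair in logged.split(pair_sep):
        name, _, raw = pair.partition(kv_sep)
        names.append(name.strip())
        values.append(float(raw))
    return names, values


def strip_labels(logged: str, pair_sep: str = ",", kv_sep: str = "=") -> str:
    """Return only the values, keeping their original text and order."""
    if not logged:
        return ""
    return pair_sep.join(
        pair.partition(kv_sep)[2] for pair in logged.split(pair_sep)
    )


if __name__ == "__main__":
    example = "title=12.34,body=5.12,recency=251.1"
    print(split_features(example))
    print(strip_labels(example))
```

With the sparse format some features may be omitted from the string, so
matching values to features by position would only be safe with dense output.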
