[ https://issues.apache.org/jira/browse/LUCENE-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-7475:
---------------------------------
Attachment: LUCENE-7475.patch
Here is a patch that:
- fixes NormValuesWriter to support sparse norms
- adds a new Lucene70NormsFormat that supports sparsity and only encodes norms
for documents that have a norm
- adds a {{codecSupportsSparsity}} method to BaseNormsFormatTestCase so that
modern norms formats can get proper testing of the sparse case
- fixes SimpleTextNormsFormat to support sparsity
- moves Lucene53NormsFormat to the backward-codecs module
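
To illustrate what "only encodes norms for documents that have a norm" can
look like, here is a minimal, self-contained Java sketch. It is not the
encoding used by Lucene70NormsFormat and it does not use any Lucene API: the
class SparseNormsSketch and its methods are hypothetical names, and the
"bit set plus packed array" layout is only an assumption made for the example.

```java
import java.util.BitSet;

/** Hypothetical sketch of sparse norms; not the actual Lucene70NormsFormat. */
public class SparseNormsSketch {
    /** Sentinel returned once the iterator is exhausted (assumption of this sketch). */
    public static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // One bit per document: set only if the document has a norm.
    private final BitSet docsWithNorm = new BitSet();
    // One slot per document that has a norm, in increasing doc ID order.
    private final long[] values;
    private int doc = -1;
    private int ord = -1;

    /** docIds must be strictly increasing and the same length as norms. */
    public SparseNormsSketch(int[] docIds, long[] norms) {
        if (docIds.length != norms.length) {
            throw new IllegalArgumentException("docIds and norms must have the same length");
        }
        for (int id : docIds) {
            docsWithNorm.set(id);
        }
        this.values = norms.clone();
    }

    /** Advances to the next document that has a norm, skipping the others. */
    public int nextDoc() {
        if (doc == NO_MORE_DOCS) {
            return doc;
        }
        int next = docsWithNorm.nextSetBit(doc + 1);
        if (next == -1) {
            doc = NO_MORE_DOCS;
        } else {
            doc = next;
            ord++;
        }
        return doc;
    }

    /** Norm of the current document; only valid after nextDoc() returned a doc ID. */
    public long longValue() {
        return values[ord];
    }

    /** Number of norm values actually stored. */
    public int storedValueCount() {
        return values.length;
    }

    public static void main(String[] args) {
        // Four documents (0..3), but only docs 0 and 3 have a norm.
        SparseNormsSketch norms = new SparseNormsSketch(new int[] {0, 3}, new long[] {12L, 7L});
        for (int d = norms.nextDoc(); d != NO_MORE_DOCS; d = norms.nextDoc()) {
            System.out.println("doc " + d + " norm " + norms.longValue());
        }
    }
}
```

In this sketch, docs 1 and 2 take no space in the values array and are never
returned by the iterator, whereas a dense representation would store a 0 for
each of them.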
Notes:
- the current patch assigns a norm value of zero to fields that generate no
tokens (this can happen e.g. with the empty string or if all tokens are stop
words) and only considers a document to have no norm if no text field was
indexed at all. We could also decide that fields that generate no tokens are
treated as missing too; I think both approaches can make sense.
- the new Lucene70NormsFormat is only a first step; it can certainly be
improved in follow-up issues
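
The two options for fields that generate no tokens can be sketched as follows.
This is only an illustration under stated assumptions, not code from the
patch: the class EmptyFieldNormPolicy, its method names and the
similarityNorm parameter (standing in for whatever norm would be computed for
a non-empty field) are all hypothetical.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the two policies discussed in the notes. */
public class EmptyFieldNormPolicy {

    /**
     * Behaviour described for the current patch: a field that was indexed but
     * produced no tokens gets a norm of 0; only a document where the field was
     * not indexed at all has no norm.
     */
    public static OptionalLong zeroForEmptyField(boolean fieldIndexed, int tokenCount,
                                                 long similarityNorm) {
        if (!fieldIndexed) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(tokenCount == 0 ? 0L : similarityNorm);
    }

    /**
     * Alternative: a field that produced no tokens is treated as missing, just
     * like a field that was not indexed.
     */
    public static OptionalLong missingForEmptyField(boolean fieldIndexed, int tokenCount,
                                                    long similarityNorm) {
        if (!fieldIndexed || tokenCount == 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(similarityNorm);
    }

    public static void main(String[] args) {
        // A field indexed with the empty string: indexed, but zero tokens.
        System.out.println(zeroForEmptyField(true, 0, 42L).isPresent());
        System.out.println(missingForEmptyField(true, 0, 42L).isPresent());
    }
}
```

The two policies only differ for fields that were indexed but produced no
tokens; in every other case they return the same result.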
> Sparse norms
> ------------
>
> Key: LUCENE-7475
> URL: https://issues.apache.org/jira/browse/LUCENE-7475
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
> Fix For: master (7.0)
>
> Attachments: LUCENE-7475.patch
>
>
> Even though norms now have an iterator API, they are still always dense in
> practice since documents that do not have a value get assigned 0 as a norm
> value.