[ 
https://issues.apache.org/jira/browse/LUCENE-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021403#comment-14021403
 ] 

Robert Muir commented on LUCENE-5743:
-------------------------------------

Adrien, it's a good idea, basically a generalization of the sparse case. I 
wanted to tackle this, but decided against it here; the idea is to just improve 
Lucene's defaults. This patch handles sparsity to some extent via low bPV and 
constant compression. Nothing sophisticated, but I think effective enough as a 
step.

> new 4.9 norms format
> --------------------
>
>                 Key: LUCENE-5743
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5743
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Robert Muir
>         Attachments: LUCENE-5743.patch
>
>
> Norms can eat up a lot of RAM, since by default it's 8 bits per field per 
> document. We rely upon users to omit them to not blow up RAM, but it's a 
> constant trap.
> Previously in 4.2, I tried to compress these by default, but it was too slow. 
> My mistakes were:
> * allowing slow bits per value like bpv=5 that are implemented with expensive 
> operations.
> * trying to wedge norms into the generalized docvalues numeric case
> * not handling "simple" degraded cases like "constant norm": the same norm 
> value for every document.
> Instead, we can just have a separate norms format that is very careful about 
> what it does, since we understand in general the patterns in the data:
> * uses CONSTANT compression (just writes the single value to metadata) when 
> all values are the same.
> * only compresses to bitsPerValue = 1,2,4 (this also happens often, for very 
> short text fields like person names and other stuff in structured data)
> * otherwise, if you would need 5,6,7,8 bits per value, we just continue to do 
> what we do today, encode as byte[]. Maybe we can improve this later, but this 
> ensures we don't have a performance impact.
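
The selection rule described above can be sketched roughly as follows. This is
only an illustration, not the code in the attached patch: the class name, the
enum, and the choose() method are hypothetical, and it assumes that the bits
per value are derived from the number of distinct norm values in the field
(each value stored as an ordinal into a small lookup table), which the
description does not spell out.

```java
public class NormsEncodingSketch {

    /** Hypothetical labels for the cases listed in the issue description. */
    public enum Encoding { CONSTANT, PACKED_1, PACKED_2, PACKED_4, RAW_BYTES }

    /**
     * Picks an encoding for one field's norms (one byte per document).
     * Assumption: values are mapped to ordinals in a lookup table, so the
     * bits needed depend only on how many distinct values occur.
     */
    public static Encoding choose(byte[] norms) {
        boolean[] seen = new boolean[256];
        int unique = 0;
        for (byte b : norms) {
            int idx = b & 0xFF; // treat the byte as unsigned for indexing
            if (!seen[idx]) {
                seen[idx] = true;
                unique++;
            }
        }
        if (unique <= 1) {
            return Encoding.CONSTANT;  // single value (or no docs): metadata only
        } else if (unique <= 2) {
            return Encoding.PACKED_1;  // 1 bit per value
        } else if (unique <= 4) {
            return Encoding.PACKED_2;  // 2 bits per value
        } else if (unique <= 16) {
            return Encoding.PACKED_4;  // 4 bits per value
        }
        // Would need 5..8 bits per value: keep the plain byte[] as today.
        return Encoding.RAW_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(choose(new byte[] {7, 7, 7, 7}));
        System.out.println(choose(new byte[] {1, 2, 1, 2}));
        System.out.println(choose(new byte[] {1, 2, 3, 4, 5}));
    }
}
```

Anything that would land on 3 bits per value is rounded up to 4 here, and 5 
through 8 all fall back to one byte per document, so the slow in-between 
widths are never used.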



