[jira] [Commented] (LUCENE-6199) Reduce per-field heap usage for indexed fields

Michael McCandless (JIRA) Sat, 31 Jan 2015 01:51:07 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299740#comment-14299740
 ]


Michael McCandless commented on LUCENE-6199:
--------------------------------------------

bq. All I want is a reasonable tradeoff.

Well, that is what I aimed for in the current patch. There are further
things we could explore (e.g. packing N longs, which are often small) that I
didn't do because that seemed like too much for this issue.
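For readers unfamiliar with the idea, here is a minimal sketch of what "packing N longs, which are often small" could look like. The patch does not do this. The class name `PackedStats` is hypothetical. The encoding is a 7-bits-per-byte variable-length scheme, similar in spirit to Lucene's `DataOutput.writeVLong`, and was chosen here only for illustration. Small values cost 1 or 2 bytes instead of 8.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: packs several non-negative longs (assumed to be mostly
// small, e.g. per-field statistics) into a single byte[] using a
// variable-length encoding: 7 payload bits per byte, high bit set on every
// byte except the last byte of each value.
final class PackedStats {
  private final byte[] packed;

  PackedStats(long... values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (long v : values) {
      if (v < 0) {
        throw new IllegalArgumentException("only non-negative values: " + v);
      }
      // Emit low 7 bits with the continuation bit while more bits remain.
      while ((v & ~0x7FL) != 0) {
        out.write((int) ((v & 0x7F) | 0x80));
        v >>>= 7;
      }
      out.write((int) v); // last byte: continuation bit clear
    }
    this.packed = out.toByteArray();
  }

  /** Decodes the value at position index by scanning from the start. */
  long get(int index) {
    int pos = 0;
    for (int i = 0; ; i++) {
      long value = 0;
      int shift = 0;
      byte b;
      do {
        b = packed[pos++];
        value |= (long) (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      if (i == index) {
        return value;
      }
    }
  }

  /** Heap used by the packed payload (excluding object/array headers). */
  int sizeInBytes() {
    return packed.length;
  }
}
```

The cost is a linear scan on every `get`, which is one reason this kind of packing adds complexity. It is only attractive when the values are read rarely.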

bq. Also why did we lose node/arc counts in stats? Was this by accident?

This wasn't by accident: you can get these from the FST.Builder, and you can
save them away if you really need them later (nothing in Lucene does).
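The pattern being described is to keep the counters on the builder, not on every built structure, and let the rare caller that wants them copy them out. The sketch below is a simplified, hypothetical stand-in. `TrieBuilder` builds a plain, non-minimized trie rather than an FST, and its `getNodeCount`/`getArcCount` methods are illustrative names, not a claim about the exact FST.Builder API. `SavedStats` is likewise hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an FST builder: builds a plain
// (non-minimized) trie and tracks node/arc counts only while building.
final class TrieBuilder {
  static final class Node {
    // Note: no statistics fields here; the built structure stays lean.
    final Map<Character, Node> children = new HashMap<>();
    boolean isFinal;
  }

  private final Node root = new Node();
  private long nodeCount = 1; // counts the root
  private long arcCount = 0;

  void add(String term) {
    Node node = root;
    for (int i = 0; i < term.length(); i++) {
      char c = term.charAt(i);
      Node next = node.children.get(c);
      if (next == null) {
        next = new Node();
        node.children.put(c, next);
        nodeCount++; // new node ...
        arcCount++;  // ... reached by one new arc
      }
      node = next;
    }
    node.isFinal = true;
  }

  long getNodeCount() { return nodeCount; }
  long getArcCount() { return arcCount; }

  /** Returns the built structure; the counters stay behind on the builder. */
  Node finish() { return root; }
}

/** Kept only by a caller that really needs the numbers, next to the result. */
final class SavedStats {
  final long nodeCount;
  final long arcCount;

  SavedStats(long nodeCount, long arcCount) {
    this.nodeCount = nodeCount;
    this.arcCount = arcCount;
  }
}
```

Usage: after adding "cat", "car" and "dog", a caller would construct `new SavedStats(b.getNodeCount(), b.getArcCount())` before calling `finish()`. That gives 8 nodes and 7 arcs for this unminimized trie, while the returned `Node` carries no per-structure counters.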

bq. I think adding Accountable to FieldInfos will ultimately be very invasive
no matter how you do it?

OK, I won't add Accountable to FIS.

bq. Honestly, if the right tests are in place, I think I would get a lot less
upset about it. But I don't like the idea of introducing bugs that hurt real
use cases, caused by the complexity of optimizing abuse cases. Do you agree or
disagree that these changes are really scary?

I agree FIS.getAttributes should not be null, and I'll add a test case for
that. I'm happy to add further tests; what do you have in mind? But net/net,
no, I don't think these changes are scary: they look low-risk to me, and they
give an enormous reduction in RAM used per indexed field.
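One common way to save per-field RAM without ever handing callers a null is to allocate the attribute map lazily and return an empty map from the getter. This is a minimal sketch under that assumption. The class `FieldMeta` and its methods are hypothetical and are not the actual FieldInfo/FieldInfos code in the patch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field metadata holder: avoids allocating a HashMap for
// the common case of a field with no attributes, but never returns null.
final class FieldMeta {
  private Map<String, String> attributes; // null until the first putAttribute

  String putAttribute(String key, String value) {
    if (attributes == null) {
      attributes = new HashMap<>(); // pay for the map only when it is used
    }
    return attributes.put(key, value);
  }

  String getAttribute(String key) {
    return attributes == null ? null : attributes.get(key);
  }

  /** Never null: a shared, unmodifiable empty map when nothing was set. */
  Map<String, String> attributes() {
    return attributes == null
        ? Collections.<String, String>emptyMap()
        : Collections.unmodifiableMap(attributes);
  }
}
```

A test like the one promised above then only has to assert that `attributes()` is non-null (and empty) on a fresh field, and that it reflects values after `putAttribute`.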

> Reduce per-field heap usage for indexed fields
> ----------------------------------------------
>
>                 Key: LUCENE-6199
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6199
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6199.patch
>
>
> Lucene uses a non-trivial baseline amount of heap for each indexed
> field, and I know it's abusive for an app to create 100K indexed
> fields, but I still think we can and should make some effort to reduce
> heap usage per unique field?
> E.g. in block tree we store 3 BytesRefs per field, when 3 byte[]s
> would do...
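To put the "3 BytesRefs vs. 3 byte[]s" point in numbers: a BytesRef wraps a `byte[] bytes` plus `int offset` and `int length`, so holding the bare arrays saves one wrapper object each. The arithmetic below is a back-of-the-envelope sketch. It assumes a 64-bit HotSpot JVM with compressed oops (12-byte object header, 4-byte references, 8-byte alignment). Real layouts vary by JVM and flags, and `PerFieldOverhead` is a hypothetical helper.

```java
// Back-of-the-envelope estimate. ASSUMES a 64-bit HotSpot JVM with compressed
// oops: 12-byte object headers, 4-byte references, 8-byte object alignment.
final class PerFieldOverhead {
  static final int OBJECT_HEADER = 12;
  static final int REFERENCE = 4;
  static final int INT = 4;

  static long alignTo8(long size) {
    return (size + 7) & ~7L;
  }

  /** Shallow size of a BytesRef-like wrapper: {byte[] bytes; int offset; int length}. */
  static long wrapperShallowSize() {
    return alignTo8(OBJECT_HEADER + REFERENCE + INT + INT);
  }

  /**
   * Bytes saved per field by holding the byte[]s directly. The owning object
   * needs one reference either way, so the saving is the wrapper objects.
   */
  static long savedPerField(int wrappersPerField) {
    return wrappersPerField * wrapperShallowSize();
  }
}
```

Under these assumptions each wrapper is 24 bytes, so 3 per field is 72 bytes, or 7,200,000 bytes across 100K fields, before counting anything else stored per field.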



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
