[jira] Commented: (LUCENE-2567) RT Terms Dictionary

Jason Rutherglen (JIRA) Tue, 27 Jul 2010 21:54:44 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893063#action_12893063
 ]


Jason Rutherglen commented on LUCENE-2567:
------------------------------------------

Perhaps we can use unchanging int arrays of term ids, sorted by
term in ascending order, as the primary terms dictionary, and
place new terms/postings into a secondary realtime
ConcurrentSkipListMap. The two can be iterated over together,
the same way MultiTermsEnum iterates over two segments. The CSLM
terms dict would still only be populated on demand (rather than
as tokenization occurs), when terms enums or IndexReaders are
instantiated.
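Below is a minimal sketch of the two-tier idea. It is not Lucene
code. The class name, the `termById` table and the `addRecent`
method are hypothetical. Terms are plain Strings for simplicity.
Lucene compares terms in byte order, and a String comparison
stands in for that here. The sketch assumes term ids are dense
ints assigned when a term is first seen. The frozen tier is a
sorted `int[]` of ids. The recent tier is a ConcurrentSkipListMap.
Both are walked in a single sorted merge, in the spirit of
MultiTermsEnum.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical two-tier terms dictionary: frozen sorted int[] + concurrent skip list. */
final class TwoTierTermsDict {
    // Term text by term id for the frozen tier (assumption: dense ids).
    private final String[] termById;
    // Primary tier: term ids sorted by term text ascending; never mutated.
    private final int[] sortedIds;
    // Secondary tier: recently added terms, kept sorted by the skip list itself.
    private final ConcurrentSkipListMap<String, Integer> recent = new ConcurrentSkipListMap<>();

    TwoTierTermsDict(String[] termById, int[] sortedIds) {
        this.termById = termById;
        this.sortedIds = sortedIds;
    }

    /** Intended to be called on demand (e.g. when a terms enum is opened), not per token. */
    void addRecent(String term, int termId) {
        recent.putIfAbsent(term, termId);
    }

    /** Merges both tiers in ascending term order; a term in both tiers is emitted once. */
    List<String> termsInOrder() {
        List<String> out = new ArrayList<>();
        Iterator<String> it = recent.keySet().iterator();
        String pending = it.hasNext() ? it.next() : null;
        for (int id : sortedIds) {
            String primary = termById[id];
            // Emit recent terms that sort before the current frozen term.
            while (pending != null && pending.compareTo(primary) < 0) {
                out.add(pending);
                pending = it.hasNext() ? it.next() : null;
            }
            // Skip a recent term that duplicates the frozen one.
            if (pending != null && pending.equals(primary)) {
                pending = it.hasNext() ? it.next() : null;
            }
            out.add(primary);
        }
        // Drain recent terms that sort after every frozen term.
        while (pending != null) {
            out.add(pending);
            pending = it.hasNext() ? it.next() : null;
        }
        return out;
    }
}
```

The skip list's iterators are weakly consistent. A merge that runs
while terms are being added may or may not see the newest ones.
That seems acceptable for a point-in-time terms enum.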

With this method, we'd get the low memory usage of an int array
approach, combined with the concurrency of Java's skip list
implementation (which uses more memory). We can then
periodically merge the CSLM terms dictionary into the int[]
terms dictionary. 
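A periodic merge could look roughly like this hypothetical
sketch. It builds a fresh frozen tier from the old one plus a
snapshot of the skip list. The `Frozen` holder and the id
numbering scheme are assumptions. New terms are assumed to get
ids numbered upward from the old table length, with no gaps.
Readers would keep using the old arrays until they switch to
the new ones.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical periodic merge of the skip-list tier into a new frozen int[] tier. */
final class TermsDictMerger {
    /** A freshly built frozen tier: term table plus ids sorted by term. */
    static final class Frozen {
        final String[] termById;
        final int[] sortedIds;
        Frozen(String[] termById, int[] sortedIds) {
            this.termById = termById;
            this.sortedIds = sortedIds;
        }
    }

    static Frozen merge(String[] oldTermById, int[] oldSortedIds,
                        ConcurrentSkipListMap<String, Integer> recent) {
        // Snapshot first so concurrent inserts don't change what we merge.
        TreeMap<String, Integer> snapshot = new TreeMap<>(recent);
        int maxId = oldTermById.length - 1;
        for (int id : snapshot.values()) maxId = Math.max(maxId, id);
        String[] termById = Arrays.copyOf(oldTermById, maxId + 1);
        int[] sortedIds = new int[maxId + 1];
        int n = 0, i = 0;
        Iterator<Map.Entry<String, Integer>> it = snapshot.entrySet().iterator();
        Map.Entry<String, Integer> pending = it.hasNext() ? it.next() : null;
        while (i < oldSortedIds.length || pending != null) {
            String primary = i < oldSortedIds.length ? oldTermById[oldSortedIds[i]] : null;
            // cmp < 0: take frozen term; cmp > 0: take recent term; 0: same term.
            int cmp = primary == null ? 1
                    : pending == null ? -1
                    : primary.compareTo(pending.getKey());
            if (cmp > 0) {
                termById[pending.getValue()] = pending.getKey();
                sortedIds[n++] = pending.getValue();
                pending = it.hasNext() ? it.next() : null;
            } else {
                sortedIds[n++] = oldSortedIds[i++];
                if (cmp == 0) pending = it.hasNext() ? it.next() : null;
            }
        }
        return new Frozen(termById, Arrays.copyOf(sortedIds, n));
    }
}
```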

I'm not immediately sure how we'd accurately estimate the memory
usage of the ConcurrentSkipListMap.
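One rough option is to keep running counters on insert and
multiply by assumed per-entry costs. It gives an estimate, not an
accurate measurement. In this hypothetical sketch, both byte
constants are parameters that would need to be calibrated
(e.g. with a heap profiler). They are not measured values. The
counters are kept separately because the ConcurrentSkipListMap
javadoc notes that its `size()` is not a constant-time operation.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical rough RAM accounting for the skip-list tier (estimate only). */
final class AccountedRecentTerms {
    private final ConcurrentSkipListMap<String, Integer> map = new ConcurrentSkipListMap<>();
    // Running totals; CSLM.size() walks the map, so avoid calling it for accounting.
    private final AtomicLong entries = new AtomicLong();
    private final AtomicLong keyChars = new AtomicLong();
    // Assumed cost per entry (nodes, index levels, boxed value, String header); calibrate.
    private final long bytesPerEntry;
    // Assumed cost per char of term text; calibrate for the JVM in use.
    private final long bytesPerChar;

    AccountedRecentTerms(long bytesPerEntry, long bytesPerChar) {
        this.bytesPerEntry = bytesPerEntry;
        this.bytesPerChar = bytesPerChar;
    }

    /** Returns true if the term was newly inserted (and therefore counted). */
    boolean add(String term, int termId) {
        // putIfAbsent returns null only for the call that inserted, so each term counts once.
        if (map.putIfAbsent(term, termId) == null) {
            entries.incrementAndGet();
            keyChars.addAndGet(term.length());
            return true;
        }
        return false;
    }

    long estimatedBytes() {
        return entries.get() * bytesPerEntry + keyChars.get() * bytesPerChar;
    }

    ConcurrentSkipListMap<String, Integer> map() {
        return map;
    }
}
```

The estimate ignores skip-list index nodes varying per entry. The
per-entry constant would have to absorb their average cost.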

> RT Terms Dictionary
> -------------------
>
>                 Key: LUCENE-2567
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2567
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>
> Implement an in-RAM terms dictionary for realtime search.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


