[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

Michael McCandless (JIRA) Sat, 25 Sep 2010 03:37:00 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914802#action_12914802 ]


Michael McCandless commented on LUCENE-2575:
--------------------------------------------

{quote}
bq. Can we just have IW allocate a new byte[][] after flush?  So then any open 
readers can keep using the one they have?

This means the prior byte[]s will still be recycled after all
active previous flush readers are closed?
{quote}

Probably we should stop reusing the byte[] with this change?  Then a given 
byte[] is reclaimed only once all readers using it have finally been GCd.
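
To make the idea concrete, here is a minimal sketch, assuming a hypothetical
SwappingBlockPool class (not an existing Lucene class, and the block size is
arbitrary).  The writer installs a fresh byte[][] on flush; a reader that
captured the old array keeps using it, and the old array becomes garbage once
the last such reader is gone.

```java
// Hypothetical sketch; SwappingBlockPool is not a Lucene class.
final class SwappingBlockPool {
    private static final int BLOCK_SIZE = 8; // arbitrary size for illustration

    // volatile so a reader thread sees the array installed by flush()
    private volatile byte[][] blocks = newBlocks();

    private static byte[][] newBlocks() {
        return new byte[][] { new byte[BLOCK_SIZE] };
    }

    /** A reader grabs the current array once and keeps using that reference. */
    byte[][] snapshot() {
        return blocks;
    }

    /** Writer side: store one byte in the current (unflushed) array. */
    void write(int block, int offset, byte value) {
        blocks[block][offset] = value;
    }

    /** After flush, install a fresh array instead of recycling the old one. */
    void flush() {
        blocks = newBlocks();
    }
}
```

This only shows the ownership hand-off.  It does not address how in-progress
writes become visible to readers, which is the larger question in this issue.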

{quote}
bq. it's possible single level skipping, with a larger skip interval, is fine 
for even large RAM buffers.

True, I'll implement a default of one level, and a default
large-ish skip interval.
{quote}

Well, I was thinking we'd only implement the single-level skip case (since it 
ought to be a lot simpler than the MLSLW/R)....

{quote}
How many scorers, or how often is skipping used? It's mostly for
disjunction queries?
{quote}

Actually, conjunction (AND) queries, and also PhraseQuery (which is really an 
AND query followed by positions checking).  One thing to remember is that 
skipping is *costly* (especially, the first time you use it) -- I think we 
over-use it today, ie, in many cases we should do a spin loop (.next()) 
instead, if your target "is not that far away".  PhraseQuery (the exact case) 
has a heuristic to do this, but really this ought to be implemented in the 
codec.
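
A rough sketch of that "scan if the target is near, skip otherwise" idea,
assuming a hypothetical in-memory postings list with single-level skip points.
SketchPostings, skipInterval and nearThreshold are made-up names and values,
not the codec API, and a real heuristic would need tuning:

```java
// Hypothetical sketch; not Lucene's postings or skip list API.
final class SketchPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs;         // sorted docIDs
    private final int skipInterval;   // one skip point every skipInterval docs
    private final int nearThreshold;  // docID distance below which we just scan
    private int pos = -1;             // index of the current doc, -1 before start
    int skipsUsed = 0;                // counts how often skip data was consulted

    SketchPostings(int[] docs, int skipInterval, int nearThreshold) {
        this.docs = docs;
        this.skipInterval = skipInterval;
        this.nearThreshold = nearThreshold;
    }

    int doc() {
        if (pos < 0) return -1;
        return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
    }

    int next() {
        pos++;
        return doc();
    }

    /** Moves to the first doc >= target that lies beyond the current doc. */
    int advance(int target) {
        int current = doc();
        if (current == NO_MORE_DOCS) return NO_MORE_DOCS;
        if ((long) target - current > nearThreshold) {
            // Far away: hop over skip points whose doc is still below target.
            skipsUsed++;
            int candidate = pos < 0 ? 0 : (pos / skipInterval + 1) * skipInterval;
            while (candidate < docs.length && docs[candidate] < target) {
                pos = candidate;
                candidate += skipInterval;
            }
        }
        // Near (or after skipping): plain spin loop on next().
        int d;
        do {
            d = next();
        } while (d < target);
        return d;
    }
}
```

Here the threshold is a docID distance purely for simplicity; the point is
only that the decision lives next to the postings data rather than in each
query.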

bq. get deletes working in the RT branch,

Do we have a design thought out for this?  The challenge is that, because 
every doc state now has its own private docID stream, we need a global 
sequence ID to track "when" a deletion arrived, to know whether or not that 
deletion applies to each docID, right?  (And each added doc must also record 
the sequence ID at which it was added.)
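
Something like the following is what I have in mind, as a minimal sketch only.
SequencedDeletes and its methods are hypothetical names, and it assumes
delete-by-term where a deletion applies to a doc only if the doc's sequence ID
is lower than the deletion's:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; not the RT branch design.
final class SequencedDeletes {
    // One global counter shared by all doc states / indexing threads.
    private final AtomicLong sequence = new AtomicLong();

    // Highest sequence ID at which each term was deleted.
    private final ConcurrentHashMap<String, Long> deletes = new ConcurrentHashMap<>();

    /** Called when a doc is added; the caller stores the ID alongside its docID. */
    long addDocument() {
        return sequence.incrementAndGet();
    }

    /** Records "when" the delete arrived, relative to all adds. */
    long deleteByTerm(String term) {
        long seqId = sequence.incrementAndGet();
        deletes.merge(term, seqId, Long::max);
        return seqId;
    }

    /** A delete applies only to docs added before it arrived. */
    boolean isDeleted(String term, long docSeqId) {
        Long deleteSeqId = deletes.get(term);
        return deleteSeqId != null && docSeqId < deleteSeqId;
    }
}
```

A doc added after the delete gets a higher sequence ID, so it survives even if
it contains the deleted term.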


> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
> LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?


