[
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906723#action_12906723
]
Michael McCandless commented on LUCENE-2573:
--------------------------------------------
bq. We probably need a test that delays the flush process, otherwise flushing
to RAM occurs too fast to proceed to the next tier.
We can modify MockRAMDir to optionally "take its sweet time" when writing
certain files?
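A minimal sketch of the "take its sweet time" idea, assuming the mock directory can wrap the stream it writes through. SlowOutputStream and its delayMillis parameter are hypothetical names for illustration, not existing Lucene test classes:

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical wrapper, not part of Lucene: delays every write so a test
// can observe RAM accounting while a flush is still in progress.
final class SlowOutputStream extends FilterOutputStream {
    private final long delayMillis;

    SlowOutputStream(OutputStream out, long delayMillis) {
        super(out);
        this.delayMillis = delayMillis;
    }

    @Override
    public void write(int b) throws IOException {
        pause();
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        pause(); // one pause per bulk write, not one per byte
        out.write(buf, off, len);
    }

    private void pause() throws IOException {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IOException("interrupted while delaying write", e);
        }
    }
}
```

A test could apply the wrapper only to selected file names, so that the flush of one DWPT stays in flight long enough for other threads to reach the next tier.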
{quote}
I'm not sure if after a DWPT is flushing we need to decrement what would
effectively be a "projected RAM usage post current DWPT flush completion".
Otherwise we could, in many cases, start the flush of most/all of the DWPTs.
{quote}
But shouldn't tiered flushing take care of this? I.e., you only decrement the
RAM consumed when the flush of the DWPT finishes, not before?
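A rough sketch of that accounting rule, under the assumption that a single shared counter tracks RAM across all DWPTs. FlushRamTracker and its method names are made up for illustration and do not exist in the patch:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical accounting helper; the names are illustrative, not Lucene API.
final class FlushRamTracker {
    private final AtomicLong bytesUsed = new AtomicLong();

    void onBytesAdded(long bytes) {
        bytesUsed.addAndGet(bytes);
    }

    // Runs the flush and releases the DWPT's bytes only after it returns.
    // While writeSegment is running, the bytes still count as used, so the
    // next tier is evaluated against the real heap footprint.
    void flush(long dwptBytes, Runnable writeSegment) {
        writeSegment.run();
        // Reached only when the flush completed without throwing.
        bytesUsed.addAndGet(-dwptBytes);
    }

    long bytesUsed() {
        return bytesUsed.get();
    }
}
```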
bq. The DWPT that happens to exceed the first tier, is flushed out. This was
easier to implement than finding the highest RAM consuming DWPT and flushing
it, from a different thread.
Hmm, but in general this won't be the most efficient? I.e., we could end up
creating tiny segments depending on the luck of the thread scheduling?
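For comparison, picking the largest RAM consumer could look roughly like the sketch below. It assumes per-DWPT byte counts are available in a map keyed by some DWPT identifier; FlushCandidatePicker and the string keys are hypothetical:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical selection helper; not part of the patch.
final class FlushCandidatePicker {
    // Returns the id of the DWPT holding the most RAM, regardless of which
    // thread happened to cross the tier. Throws NoSuchElementException if
    // the map is empty.
    static String pickLargest(Map<String, Long> bytesPerDwpt) {
        return Collections.max(bytesPerDwpt.entrySet(), Map.Entry.comparingByValue())
                .getKey();
    }
}
```

The cost is that the chosen DWPT may belong to a different thread than the one that triggered the tier, which is the coordination the quoted comment was trying to avoid.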
bq. I did a search through the code and ByteBlockAllocator.perDocAllocator has
no references, it can probably be removed, unless there was some other
intention for it.
I think this makes sense -- each DWPT now immediately flushes to its private
doc store files, so there's no longer a need to track per-doc pending RAM?
{quote}
In DocumentsWriterRAMAllocator, we're only recording the addition of more bytes
when a new block is created, however because previous blocks may be recycled,
it is the recycled blocks that are not being recorded as bytes used. Should we
record all allocated blocks as "in use" ie, count them as bytes used, or wait
until they are "in use" again to be counted as consuming RAM?
{quote}
I think we have to track both. If a buffer is not in the pool (i.e. not free),
then it's in use and we count that as RAM used, and that counter is used to
trigger tiered flushing. Separately, we have to track net allocated bytes, in
order to trim the buffers (drop them, so GC can reclaim them) when we are over
the limit set by setRAMBufferSizeMB.
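A minimal sketch of tracking both counters, assuming fixed-size blocks and a simple free list. TrackedBlockPool is a hypothetical stand-in and not the actual DocumentsWriterRAMAllocator:

```java
import java.util.ArrayDeque;

// Hypothetical block pool; illustrative only.
final class TrackedBlockPool {
    private final int blockSize;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private long bytesUsed;      // blocks handed out; drives tiered flushing
    private long bytesAllocated; // handed out plus pooled; drives trimming

    TrackedBlockPool(int blockSize) {
        this.blockSize = blockSize;
    }

    synchronized byte[] acquire() {
        byte[] block = free.poll();
        if (block == null) {
            block = new byte[blockSize];
            bytesAllocated += blockSize; // only fresh blocks grow net allocation
        }
        bytesUsed += blockSize; // recycled blocks count as used again
        return block;
    }

    synchronized void release(byte[] block) {
        free.push(block);
        bytesUsed -= blockSize;
    }

    // Drops pooled blocks until net allocation fits the budget or the pool
    // is empty, so the GC can reclaim them.
    synchronized void trimTo(long maxAllocatedBytes) {
        while (bytesAllocated > maxAllocatedBytes && !free.isEmpty()) {
            free.pop();
            bytesAllocated -= blockSize;
        }
    }

    synchronized long bytesUsed() {
        return bytesUsed;
    }

    synchronized long bytesAllocated() {
        return bytesAllocated;
    }
}
```

With this split, a recycled block raises bytesUsed without touching bytesAllocated, which addresses the under-counting described in the quoted question.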
> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
> Key: LUCENE-2573
> URL: https://issues.apache.org/jira/browse/LUCENE-2573
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Michael Busch
> Assignee: Michael Busch
> Priority: Minor
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed
> RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a
> tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: E.g. when 5 DWPTs are
> used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values
> explicitly using total values (e.g. low water mark at 120MB, high water mark
> at 140MB)? Or shall we keep for simplicity the single setRAMBufferSizeMB()
> config method and use something like 90% and 110% for the water marks?
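The linear steps in the quoted description can be worked out as in the sketch below. It assumes the simpler of the two proposals, a single RAM budget with fractional water marks. FlushTiers and its parameters are hypothetical names:

```java
// Hypothetical helper; computes one flush threshold per DWPT.
final class FlushTiers {
    // Returns thresholds in MB, spaced linearly from the low to the high
    // water mark. With a single DWPT only the low water mark is used.
    static double[] waterMarksMB(double ramBufferMB, int numDwpts,
                                 double lowFraction, double highFraction) {
        if (numDwpts < 1) {
            throw new IllegalArgumentException("numDwpts must be >= 1");
        }
        double step = numDwpts == 1
                ? 0.0
                : (highFraction - lowFraction) / (numDwpts - 1);
        double[] marks = new double[numDwpts];
        for (int i = 0; i < numDwpts; i++) {
            marks[i] = ramBufferMB * (lowFraction + i * step);
        }
        return marks;
    }
}
```

For example, waterMarksMB(100, 5, 0.90, 1.10) yields thresholds of approximately 90, 95, 100, 105 and 110 MB, matching the 5-DWPT example in the description.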