[ https://issues.apache.org/jira/browse/LUCENE-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss updated LUCENE-8406:
--------------------------------
Fix Version/s: (was: 6.7)
> Make ByteBufferIndexInput public
> --------------------------------
>
> Key: LUCENE-8406
> URL: https://issues.apache.org/jira/browse/LUCENE-8406
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
>
> The logic of handling byte buffer splits, their proper closing (cleaner) and
> all the trickery involved in slicing, cloning and proper exception handling
> is quite daunting.
> While ByteBufferIndexInput.newInstance(..) is public, the parent class
> ByteBufferIndexInput is not. I think we should make the parent class public
> to allow advanced users to make use of this (complex) piece of code to create
> an IndexInput based on a sequence of ByteBuffers.
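As a rough illustration of the addressing involved, here is a minimal sketch (not Lucene's implementation) of random-access reads over a sequence of ByteBuffers. The class ChunkedBufferReader, its methods and the chunkSizePower parameter are hypothetical names chosen for this example. It assumes every chunk except the last has the same power-of-two size, and it leaves out closing, cloning and slicing, which are the hard parts the issue refers to.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not a Lucene class): random-access reads over a sequence of ByteBuffers. */
final class ChunkedBufferReader {
    private final ByteBuffer[] chunks;
    private final int chunkSizePower; // every chunk except the last holds 2^chunkSizePower bytes
    private final long chunkSizeMask;
    private final long length;

    ChunkedBufferReader(ByteBuffer[] chunks, int chunkSizePower, long length) {
        this.chunks = new ByteBuffer[chunks.length];
        for (int i = 0; i < chunks.length; i++) {
            // duplicate() shares the bytes but gives this reader its own position and limit
            this.chunks[i] = chunks[i].duplicate();
        }
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1;
        this.length = length;
    }

    /** Splits a byte[] into fixed-size wrapped chunks; used here only to build sample input. */
    static ChunkedBufferReader split(byte[] data, int chunkSizePower) {
        int chunkSize = 1 << chunkSizePower;
        int count = (data.length + chunkSize - 1) / chunkSize;
        ByteBuffer[] chunks = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            int start = i * chunkSize;
            int len = Math.min(chunkSize, data.length - start);
            // slice() makes index 0 of the chunk correspond to data[start]
            chunks[i] = ByteBuffer.wrap(data, start, len).slice();
        }
        return new ChunkedBufferReader(chunks, chunkSizePower, data.length);
    }

    long length() {
        return length;
    }

    /** Reads one byte at an absolute position: high bits pick the chunk, low bits the offset. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        int chunk = (int) (pos >>> chunkSizePower);
        int offset = (int) (pos & chunkSizeMask);
        return chunks[chunk].get(offset); // absolute get, does not move the buffer position
    }

    /** Reads a big-endian int byte by byte, so a value may straddle a chunk boundary. */
    int readInt(long pos) {
        return ((readByte(pos) & 0xFF) << 24)
            | ((readByte(pos + 1) & 0xFF) << 16)
            | ((readByte(pos + 2) & 0xFF) << 8)
            | (readByte(pos + 3) & 0xFF);
    }
}
```

With ten bytes 0..9 split into 4-byte chunks (chunkSizePower = 2), readInt(2) reads bytes 2 and 3 from the first chunk and bytes 4 and 5 from the second.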
> One particular example here is RAMDirectory, which currently uses a custom
> IndexInput implementation, which in turn calls into RAMFile's synchronized
> methods. This causes quite dramatic lock contention on multithreaded
> systems. While we clearly discourage RAMDirectory from being used in
> production environments, there really is no need for it to be slow. If
> modified only slightly (to use ByteBuffer-based input), the performance is on
> par with FSDirectory. Here's a sample log comparing FSDirectory with
> RAMDirectory and the "modified" RAMDirectory making use of the ByteBuffer
> input:
> {code}
> 14:26:40 INFO console: FSDirectory index.
> 14:26:41 INFO console: Opened with 299943 documents.
> 14:26:50 INFO console: Finished: 8.820 s, 240000 matches.
> 14:26:50 INFO console: RAMDirectory index.
> 14:26:50 INFO console: Opened with 299943 documents.
> 14:28:50 INFO console: Finished: 2.012 min, 240000 matches.
> 14:28:50 INFO console: RAMDirectory2 index (wrapped byte[] buffers).
> 14:28:50 INFO console: Opened with 299943 documents.
> 14:29:00 INFO console: Finished: 9.215 s, 240000 matches.
> 14:29:00 INFO console: RAMDirectory2 index (direct memory buffers).
> 14:29:00 INFO console: Opened with 299943 documents.
> 14:29:08 INFO console: Finished: 8.817 s, 240000 matches.
> {code}
> Note that the performance difference is an order of magnitude on this 32-CPU
> system (2 minutes vs. 9 seconds). The tiny performance difference between the
> implementation based on direct memory buffers and the one based on buffers
> acquired via ByteBuffer.wrap(byte[]) is probably because direct buffers access
> their data via Unsafe, while wrapped buffers use regular Java array access
> (my best guess).
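The locking difference described in the issue can be sketched in plain Java, without Lucene. LockedSource is a hypothetical stand-in for a reader that funnels every access through synchronized methods, while sumView gives each thread its own ByteBuffer.duplicate() view over shared bytes, so reads need no lock. All class and method names here are made up for the example. The sketch only shows that both access paths and both buffer kinds (wrapped and direct) return the same data; it makes no attempt to reproduce the timings in the log.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: lock-based reads vs. per-thread ByteBuffer views. */
final class BufferViewsSketch {

    /** Stand-in for a source whose every read takes a shared monitor. */
    static final class LockedSource {
        private final byte[] data;

        LockedSource(byte[] data) {
            this.data = data;
        }

        synchronized byte get(int pos) {
            return data[pos];
        }

        int length() {
            return data.length;
        }
    }

    /** Sums all bytes, acquiring the shared lock once per byte. */
    static long sumLocked(LockedSource src) {
        long sum = 0;
        for (int i = 0; i < src.length(); i++) {
            sum += src.get(i);
        }
        return sum;
    }

    /** Sums all bytes through a private view; no lock is taken. */
    static long sumView(ByteBuffer shared) {
        ByteBuffer view = shared.duplicate(); // same bytes, independent position and limit
        long sum = 0;
        for (int i = 0; i < view.limit(); i++) {
            sum += view.get(i); // absolute read
        }
        return sum;
    }

    /** Copies the array into off-heap memory, as an alternative to ByteBuffer.wrap(byte[]). */
    static ByteBuffer toDirect(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip(); // position = 0, limit = data.length
        return direct;
    }

    /** Runs sumView on several threads at once; each thread works on its own duplicate. */
    static long[] sumInParallel(ByteBuffer shared, int threads) {
        long[] results = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> results[slot] = sumView(shared));
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the worker's result visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return results;
    }
}
```

Passing ByteBuffer.wrap(data) or toDirect(data) to sumInParallel exercises the two buffer kinds compared in the log, while sumLocked shows the per-read lock that the threads would otherwise queue on.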
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)