[ https://issues.apache.org/jira/browse/LUCENE-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2537:
-------------------------------

    Attachment: LUCENE-2537.patch

Patch adds the following:
* FSIndexOutput overrides copyBytes to do optimized copies as well (when 
possible).
* CompoundFileWriter now calls copyBytes instead of copying through an
intermediate buffer.
* Fixes for some minor typos and small code issues.
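To make the first two bullets concrete, here is a minimal sketch in plain java.nio (not the patch's code) of copying whole files into a single compound file with FileChannel.transferTo, so the bytes never pass through an intermediate heap buffer. The class ChannelCompoundSketch, its methods, the MAX_CHUNK cap and its 1MB value are all hypothetical; the cap only keeps each transfer call bounded, and the loop handles transfers that move fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChannelCompoundSketch {
  // Hypothetical cap on bytes per transferTo call (1MB is an arbitrary choice).
  static final long MAX_CHUNK = 1L << 20;

  /** Appends all of src at out's current position without an intermediate heap buffer. */
  static void copyBytes(FileChannel src, FileChannel out) throws IOException {
    long size = src.size();
    long pos = 0;
    while (pos < size) {
      // transferTo may move fewer bytes than requested, so loop on its return value.
      long n = src.transferTo(pos, Math.min(MAX_CHUNK, size - pos), out);
      if (n <= 0) {
        throw new EOFException("transferTo made no progress at byte " + pos);
      }
      pos += n;
    }
  }

  /** Copies each file into one compound file and returns each entry's start offset. */
  static List<Long> writeCompound(List<Path> files, Path compound) throws IOException {
    List<Long> offsets = new ArrayList<>();
    try (FileChannel out = FileChannel.open(compound, StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
      for (Path f : files) {
        offsets.add(out.position()); // entry starts where the previous one ended
        try (FileChannel in = FileChannel.open(f, StandardOpenOption.READ)) {
          copyBytes(in, out);
        }
      }
    }
    return offsets;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("cfs");
    // Larger than MAX_CHUNK, so copyBytes needs several transferTo calls.
    Path a = Files.write(dir.resolve("_0.fdt"), new byte[3_000_000]);
    Path b = Files.write(dir.resolve("_0.frq"), "hello".getBytes(StandardCharsets.US_ASCII));
    Path cfs = dir.resolve("_0.cfs");
    List<Long> offsets = writeCompound(Arrays.asList(a, b), cfs);
    System.out.println(offsets + " " + Files.size(cfs)); // prints "[0, 3000000] 3000005"
  }
}
```

The start offset of each entry is simply the output position before its copy, which is the kind of information a compound file's entry table has to record.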

*NOTE:* with the changes to CFW, CheckAbort is accessed only after an entire
file has been written, and the estimated amount of work is the number of bytes
copied. This means that if abort() is called, CFW may take slightly longer to
detect it. I don't think that's serious, since (1) abort() is not called often,
(2) for small segments it will probably have no effect (OneMerge was previously
checked roughly every 2MB copied) and (3) as reported, the optimized copy is
faster when using FileChannel, so the time between checks may not be that long.
There's also a (4): for really large segments, the work done to merge them is
far larger than the work of copying them into the CFS, so the chance that
abort() is called during the copy is relatively small.
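To illustrate the detection delay described in the note, here is a small sketch of a per-file abort check. CheckAbort and FileCopier below are hypothetical stand-ins, not Lucene's actual classes, and the file names and sizes are made up. abort() is simulated while the large second file is being copied, and it is noticed only once that whole file has been copied, so the third file is never started.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CfwAbortSketch {
  /** Hypothetical stand-in for a merge abort check: throws if the merge was aborted. */
  interface CheckAbort {
    void work(double units) throws IOException;
  }

  /** Hypothetical stand-in for copying one whole file, e.g. via copyBytes. */
  interface FileCopier {
    void copy(String name, long length) throws IOException;
  }

  /** Copies each file in one piece, then reports its length as work; aborts are seen only between files. */
  static List<String> copyIntoCompound(String[] names, long[] lengths,
      FileCopier copier, CheckAbort checkAbort) throws IOException {
    List<String> copied = new ArrayList<>();
    for (int i = 0; i < names.length; i++) {
      copier.copy(names[i], lengths[i]); // no abort check while this file is copied
      copied.add(names[i]);
      checkAbort.work(lengths[i]);       // single check, weighted by bytes copied
    }
    return copied;
  }

  public static void main(String[] args) {
    String[] names = {"_0.fdt", "_0.frq", "_0.prx"};    // illustrative file names
    long[] lengths = {10L << 20, 300L << 20, 5L << 20}; // illustrative sizes in bytes
    boolean[] aborted = {false};
    List<String> started = new ArrayList<>();
    // Simulate abort() being called while the large second file is being copied.
    FileCopier copier = (name, len) -> {
      started.add(name);
      if (name.equals("_0.frq")) aborted[0] = true;
    };
    CheckAbort check = units -> {
      if (aborted[0]) throw new IOException("merge aborted");
    };
    try {
      copyIntoCompound(names, lengths, copier, check);
    } catch (IOException e) {
      // prints "merge aborted after copying [_0.fdt, _0.frq]"
      System.out.println(e.getMessage() + " after copying " + started);
    }
  }
}
```

With this structure the worst-case detection delay is the time to copy the largest single file, which is why point (3) about the faster FileChannel copy matters.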

All tests pass.

> FSDirectory.copy() impl is unsafe
> ---------------------------------
>
>                 Key: LUCENE-2537
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2537
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 3.1, 4.0
>
>         Attachments: FileCopyTest.java, LUCENE-2537.patch, LUCENE-2537.patch
>
>
> There are a couple of issues with it:
> # FileChannel.transferFrom documents that it may not copy the number of bytes
> requested, but we don't check the return value. The code needs to be fixed
> to copy in a loop until all bytes have been copied.
> # When calling addIndexes() with very large segments (a few hundred MB in
> size), I ran into the following exception (Java 1.6; Java 1.5's exception was
> cryptic):
> {code}
> Exception in thread "main" java.io.IOException: Map failed
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:770)
>     at sun.nio.ch.FileChannelImpl.transferToTrustedChannel(FileChannelImpl.java:450)
>     at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:523)
>     at org.apache.lucene.store.FSDirectory.copy(FSDirectory.java:450)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3019)
> Caused by: java.lang.OutOfMemoryError: Map failed
>     at sun.nio.ch.FileChannelImpl.map0(Native Method)
>     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:767)
>     ... 7 more
> {code}
> I changed the impl to something like this:
> {code}
> long numWritten = 0;
> long numToWrite = input.size();
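> long bufSize = 1 << 26; // 64MB per transferFrom call
> // transferFrom may copy fewer bytes than requested, so keep looping
> while (numWritten < numToWrite) {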
>   numWritten += output.transferFrom(input, numWritten, bufSize);
> }
> {code}
> With this change, the indexes are added successfully. This code uses 64MB
> chunks, which might be too large for some applications, so we definitely
> need a smaller value. The question is how small it can be without hurting
> performance. It would be great to make it configurable, but since this API is
> called by other APIs, such as addIndexes, I'm not sure that is easy to control.
> Also, I read somewhere (I can't remember where) that on Linux the native
> impl is better and copies in chunks. So perhaps we should make a
> Linux-specific impl?
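Combining the two points of the description (check transferFrom's return value, and bound each call), a minimal sketch of a chunked whole-file copy could look like the following. The ChunkedCopySketch class is hypothetical, and passing the chunk size in as a parameter is an assumption made for illustration, since the description leaves open how it would be configured. Unlike the loop quoted above, it stops with an EOFException if transferFrom makes no progress instead of spinning forever.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class ChunkedCopySketch {
  /** Copies src to dst with FileChannel.transferFrom, at most chunkSize bytes per call. */
  static void copy(Path src, Path dst, long chunkSize) throws IOException {
    if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
    try (FileChannel input = FileChannel.open(src, StandardOpenOption.READ);
         FileChannel output = FileChannel.open(dst, StandardOpenOption.CREATE,
             StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
      long numToWrite = input.size();
      long numWritten = 0;
      while (numWritten < numToWrite) {
        // Reads from input's current position (advancing it) and writes at numWritten.
        long n = output.transferFrom(input, numWritten,
            Math.min(chunkSize, numToWrite - numWritten));
        // n may be smaller than requested; zero means no progress, so give up.
        if (n <= 0) throw new EOFException("source ended early at byte " + numWritten);
        numWritten += n;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("copy");
    byte[] data = new byte[5_000_000];
    for (int i = 0; i < data.length; i++) data[i] = (byte) i;
    Path src = Files.write(dir.resolve("in.bin"), data);
    Path dst = dir.resolve("out.bin");
    copy(src, dst, 1 << 20); // 1MB chunks, so at least five transferFrom calls
    System.out.println(Arrays.equals(data, Files.readAllBytes(dst))); // prints "true"
  }
}
```

Keeping the chunk size as an explicit argument makes it easy to benchmark different values before picking a default.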

-- 
This message is automatically generated by JIRA.